GeoTools
  1. GeoTools
  2. GEOT-2882

"Expose primary keys" is not provided as a factory property in JDBC data stores

    Details

    • Type: Improvement Improvement
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.3
    • Component/s: jdbc
    • Labels:
      None

      Description

      JDBCDataStore has a "exposePrimaryKeyColumns" property that allows for mapping the primary key columns as straigth attributes of a feature.
      This is something that has been requested many times by GeoServer users. Yet, GS cannot use that property because it's not exposed by the factory.

      1. GEOT-2882.patch
        40 kB
        Andrea Aime
      2. GEOT-2882.patch
        41 kB
        Andrea Aime
      3. GEOT-2882.patch
        25 kB
        Andrea Aime

        Activity

        Hide
        Andrea Aime added a comment -
        Here is a patch that allows to expose the primary keys as attributes.

        The patch does not limit itself to expose the primary keys because working on it I noticed exposing primary keys broke almost all write attempts to the database.

        This patch reworks the select and insert methods as well as the writer to deal with the extra columns. The update code path was not breaking.

        However there is still a difference, whilst the insert path will ignore the exposed pk attribute values and will unpack the Feature identifier to get the values for those columns instead, the update path will actually execute the update even on those columns. Not sure what to do here.
        I guess the best approach would be to use the value specified in the attribute if the column does not has a sequence or is auto-increment, and ignore it otherwise (and do so uniformly for insert and update).

        Opinions?
        Show
        Andrea Aime added a comment - Here is a patch that allows to expose the primary keys as attributes. The patch does not limit itself to expose the primary keys because working on it I noticed exposing primary keys broke almost all write attempts to the database. This patch reworks the select and insert methods as well as the writer to deal with the extra columns. The update code path was not breaking. However there is still a difference, whilst the insert path will ignore the exposed pk attribute values and will unpack the Feature identifier to get the values for those columns instead, the update path will actually execute the update even on those columns. Not sure what to do here. I guess the best approach would be to use the value specified in the attribute if the column does not has a sequence or is auto-increment, and ignore it otherwise (and do so uniformly for insert and update). Opinions?
        Hide
        Christian Mueller added a comment -
        IMHO I would borrow the concept of the EJB Spec.

        The rules are

        Supply the constructor of a business object with the primary key values.
        Afterwards, these values are exposed but read only.

        This simple concept has many advantages

        1) Avoiding the problems of updating. Since you cannot change the key, no duplicate row errors will happen

        2) It gives you a simple possibility for creating a feature cache if needed. The key is feature type + primary key values. This will not work if you can change the key attributes

        3) Changing the key could break relationship with foreign keys, this is very nasty

        I would always expose the primary key attributes, but I would make them read only.

        And finally, again IMHO, using attributes holding information relevant to the user as primary key is bad design, but seen very often. The concept of heaving a technical key (FID) with no meaning to the user keeps your objects and relations stable. Unfortunately, this design is more the exception than the rule.






        Show
        Christian Mueller added a comment - IMHO I would borrow the concept of the EJB Spec. The rules are Supply the constructor of a business object with the primary key values. Afterwards, these values are exposed but read only. This simple concept has many advantages 1) Avoiding the problems of updating. Since you cannot change the key, no duplicate row errors will happen 2) It gives you a simple possibility for creating a feature cache if needed. The key is feature type + primary key values. This will not work if you can change the key attributes 3) Changing the key could break relationship with foreign keys, this is very nasty I would always expose the primary key attributes, but I would make them read only. And finally, again IMHO, using attributes holding information relevant to the user as primary key is bad design, but seen very often. The concept of heaving a technical key (FID) with no meaning to the user keeps your objects and relations stable. Unfortunately, this design is more the exception than the rule.
        Hide
        Andrea Aime added a comment -
        If I interpret your suggestion correctly the main thing would be to never allow update to change the primary key column values. Which makes a lot of sense to me.
        On insert the interpretation is still open. Shall we use the attribute values or try to unpack the key?
        Generally speaking I don't trust users to actually follow the key building rules, for the simple fact they are not advertised, especially for multicolumn keys, thus I guess we should:
        - use the sequence or auto-increment for all the pk columns that do use them
        - use the user provided value for all other pk columns?
        Show
        Andrea Aime added a comment - If I interpret your suggestion correctly the main thing would be to never allow update to change the primary key column values. Which makes a lot of sense to me. On insert the interpretation is still open. Shall we use the attribute values or try to unpack the key? Generally speaking I don't trust users to actually follow the key building rules, for the simple fact they are not advertised, especially for multicolumn keys, thus I guess we should: - use the sequence or auto-increment for all the pk columns that do use them - use the user provided value for all other pk columns?
        Hide
        Christian Mueller added a comment -
        Yep, your interpretation is correct.

        About the insert:

        Again taken from the EJB Spec. EJB forces you to create a primary key class and on business object creation, you have to pass such a primary key object as a parameter. I think the idea is to use this primary key object as a key for all kind of internal registries.

        This concept avoids to implement public setter methods for primary key attributes. It is not possible to write code modifying the pk values.

        The other way is to have setter methods for pk attributes, but throw an exception if the method is called and the attribute value is not null . The disadvantage is that you can produce code which will fail at runtime. No support in EJB for this approach.

        In EJB, the users MUST follow the key building rules. I am open to both approaches, but prefer the way EJB does it and would rewrite your last 2 lines as

        - use the sequence or auto-increment for all the pk columns that do use them
        - force th user to the geotools key building rules







        Show
        Christian Mueller added a comment - Yep, your interpretation is correct. About the insert: Again taken from the EJB Spec. EJB forces you to create a primary key class and on business object creation, you have to pass such a primary key object as a parameter. I think the idea is to use this primary key object as a key for all kind of internal registries. This concept avoids to implement public setter methods for primary key attributes. It is not possible to write code modifying the pk values. The other way is to have setter methods for pk attributes, but throw an exception if the method is called and the attribute value is not null . The disadvantage is that you can produce code which will fail at runtime. No support in EJB for this approach. In EJB, the users MUST follow the key building rules. I am open to both approaches, but prefer the way EJB does it and would rewrite your last 2 lines as - use the sequence or auto-increment for all the pk columns that do use them - force th user to the geotools key building rules
        Hide
        Andrea Aime added a comment -
        The driving spec for GeoTools datastore is WFS, not EJB.
        Anyways, in WFS a feature id cannot be modified either once set.
        Setting it can be tricky, WFS 1.0 does not say much about that, WFS 1.1 just refers to the gml:id, with three different policies:

        {code}
        GenerateNew The web feature service shall generate unique identifiers for all newly inserted
        (default) feature instances.
                         In response to an <Insert> operation, a web feature service shall use the gml:id
        UseExisting
                         attribute values on inserted features and elements. If any of those IDs duplicates
                         the ID of a feature or element already stored in the WFS, the WFS shall raise an
                         exception.
        ReplaceDuplicate A WFS client can request that the WFS generate IDs to replace the input values of
                         gml:id attributes of feature elements that duplicate the ID of a feature or element
                         already stored in the WFS, instead of raising an exception, by setting the idgen
                         attribute of the InsertElementType to the value "ReplaceDuplicate".
        {code}

        At the moment we do a best effort to support GenerateNew.
        Not sure if we support UseExisting at all in GeoServer, have to check.
        ReplaceDuplicate is something that datastores should not be concerned with imho, if one wants to implement that he can just issues a delete before the insert.

        Long story short... I think I just have to modify the udpate path to make the exposed pk attributes read only and we're in business
        Show
        Andrea Aime added a comment - The driving spec for GeoTools datastore is WFS, not EJB. Anyways, in WFS a feature id cannot be modified either once set. Setting it can be tricky, WFS 1.0 does not say much about that, WFS 1.1 just refers to the gml:id, with three different policies: {code} GenerateNew The web feature service shall generate unique identifiers for all newly inserted (default) feature instances.                  In response to an <Insert> operation, a web feature service shall use the gml:id UseExisting                  attribute values on inserted features and elements. If any of those IDs duplicates                  the ID of a feature or element already stored in the WFS, the WFS shall raise an                  exception. ReplaceDuplicate A WFS client can request that the WFS generate IDs to replace the input values of                  gml:id attributes of feature elements that duplicate the ID of a feature or element                  already stored in the WFS, instead of raising an exception, by setting the idgen                  attribute of the InsertElementType to the value "ReplaceDuplicate". {code} At the moment we do a best effort to support GenerateNew. Not sure if we support UseExisting at all in GeoServer, have to check. ReplaceDuplicate is something that datastores should not be concerned with imho, if one wants to implement that he can just issues a delete before the insert. Long story short... I think I just have to modify the udpate path to make the exposed pk attributes read only and we're in business
        Hide
        Christian Mueller added a comment -
        Ok, the important issue is to not allow updates on pk attributes. +1
        Show
        Christian Mueller added a comment - Ok, the important issue is to not allow updates on pk attributes. +1
        Hide
        Justin Deoliveira added a comment -
        Patch looks good. The only very minor thing would be to rename the getPrimaryKeyColumnNames() method to just getColumnsNames() to match the other primary key "accessor" method, getNextValues(). Or make a note to rename the latter to getPrimaryKeyNextValues(). SOrry I know it is very anal but as that class grows (it is almost 4000 loc) I think it will be increasingly important to keep the methods as organized as possible.
        Show
        Justin Deoliveira added a comment - Patch looks good. The only very minor thing would be to rename the getPrimaryKeyColumnNames() method to just getColumnsNames() to match the other primary key "accessor" method, getNextValues(). Or make a note to rename the latter to getPrimaryKeyNextValues(). SOrry I know it is very anal but as that class grows (it is almost 4000 loc) I think it will be increasingly important to keep the methods as organized as possible.
        Hide
        Andrea Aime added a comment -
        Here is a revised patch. I've changed the method name as suggested.

        Doing my final tests I found two more hiccups I had to fix:
        - when a primary key is exposed there were troubles if not all of the pk attributes where extracted by the Query (propertyNames). Fixed that and added a JDBCFeatureSourceExposePkTest to keep that under check
        - primary key columns are exposed but treated read only, so we cannot make the "compulsory" in the WFS output, in fact we completely ignore them write wise. Changed the type building code to account for that (an exposed pk is nullable even if the underlying table says it's not)
        Show
        Andrea Aime added a comment - Here is a revised patch. I've changed the method name as suggested. Doing my final tests I found two more hiccups I had to fix: - when a primary key is exposed there were troubles if not all of the pk attributes where extracted by the Query (propertyNames). Fixed that and added a JDBCFeatureSourceExposePkTest to keep that under check - primary key columns are exposed but treated read only, so we cannot make the "compulsory" in the WFS output, in fact we completely ignore them write wise. Changed the type building code to account for that (an exposed pk is nullable even if the underlying table says it's not)
        Hide
        Andrea Aime added a comment -
        Sorry, forgot a refactor in the last patch, here is a final one (hopefully)
        Show
        Andrea Aime added a comment - Sorry, forgot a refactor in the last patch, here is a final one (hopefully)
        Hide
        Justin Deoliveira added a comment -
        Patch looks good.
        Show
        Justin Deoliveira added a comment - Patch looks good.
        Hide
        Andrea Aime added a comment -
        Fixed on 2.6.x and trunk
        Show
        Andrea Aime added a comment - Fixed on 2.6.x and trunk

          People

          • Assignee:
            Andrea Aime
            Reporter:
            Andrea Aime
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: