GeoTools
  1. GeoTools
  2. GEOT-2572

Support estimated extents in postgis-ng

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7-M3
    • Component/s: jdbc
    • Labels:
      None

      Description

      The current postgis datastore allows usage of the estimated extents function to quickly compute bounds of multi-million row datasets quickly by fast inspection of the index optimizer statistics.
      The NG one does not, and in order to support it, we'll need to make some API changes as the estimated extent is not a query that is applied on a table, but look like:

      select estimated_extent('tableName', 'columnName')
      

      The oracle plugin needs this to use the SDO_TUNE fast bbox computations as well, but with a twist, in that case the function always returns a correct result provided the data is in a projected CS (so it's always a good choice with the right data set, it's not a matter of user setup).

      In both cases it seems the dialect class should be modified to allow the specific dialect to completely take over the computation of extents for a given set of geometries.

      1. GEOT-2572-v1.patch
        10 kB
        Andrea Aime
      2. GEOT-2572-v2.patch
        27 kB
        Andrea Aime

        Issue Links

          Activity

          Hide
          Andrea Aime added a comment -
          After some discussion on IRC the proposed way is to add the following method to SQLDialect:

          {code}
            ReferencedEnvelope getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx)
          {code}

          the method will perform an optimized computation if possible, based on all geometric columns, or return null if the optimized computation is not possible.
          The passed down featureType might have less geoemtric columns than the table if the user asked for a bounds computation on a subset of columns. The eventual type and column metadata will be preserved by JDBCDataStore in that case (so that, for example, the native SRID is guaranteed to be there in any case).
          Show
          Andrea Aime added a comment - After some discussion on IRC the proposed way is to add the following method to SQLDialect: {code}   ReferencedEnvelope getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) {code} the method will perform an optimized computation if possible, based on all geometric columns, or return null if the optimized computation is not possible. The passed down featureType might have less geoemtric columns than the table if the user asked for a bounds computation on a subset of columns. The eventual type and column metadata will be preserved by JDBCDataStore in that case (so that, for example, the native SRID is guaranteed to be there in any case).
          Hide
          Christian Mueller added a comment -
          Only to be sure, Andrea a quick look at

          http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

          I think that is the equivalent for db2 ?
          Show
          Christian Mueller added a comment - Only to be sure, Andrea a quick look at http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp I think that is the equivalent for db2 ?
          Hide
          Andrea Aime added a comment -
          Christian, that link brings me to the general DB2 9.5 documentation.
          Show
          Andrea Aime added a comment - Christian, that link brings me to the general DB2 9.5 documentation.
          Hide
          Christian Mueller added a comment -
          Ups, please type in the search field

          MBR Aggregate

          and click on the second row in the result list
          Show
          Christian Mueller added a comment - Ups, please type in the search field MBR Aggregate and click on the second row in the result list
          Hide
          Andrea Aime added a comment -
          Christian, nope, it does not look like the above ones. I think you're already using ST_BuildMBRAgg in the current envelope computation implementation no?

          The two functions above are not aggregate ones, they are faster alternatives that do use the index statistics as opposed to inspecting the geometries themselves and aggregating the results. As a result they are almost instantaenous in results, thought they may return an incorrect bound (either inflated or reduced compared to the actual bounds). I have a dataset sitting here in pg with over 20 million roads, the standard aggregation takes over 2 minutes to compute the bbox, whilst the estimated_extent is instant and returns a bbox that's around 2% away from the real one.
          Show
          Andrea Aime added a comment - Christian, nope, it does not look like the above ones. I think you're already using ST_BuildMBRAgg in the current envelope computation implementation no? The two functions above are not aggregate ones, they are faster alternatives that do use the index statistics as opposed to inspecting the geometries themselves and aggregating the results. As a result they are almost instantaenous in results, thought they may return an incorrect bound (either inflated or reduced compared to the actual bounds). I have a dataset sitting here in pg with over 20 million roads, the standard aggregation takes over 2 minutes to compute the bbox, whilst the estimated_extent is instant and returns a bbox that's around 2% away from the real one.
          Hide
          Christian Mueller added a comment -
          Uaah, panic mode, no I did not use ST_BuildMBRAgg for

          public void encodeGeometryEnvelope(String tableName, String geometryColumn, StringBuffer sql)

          This was not obvious to me by reading the java doc.

          But know, I will
          Show
          Christian Mueller added a comment - Uaah, panic mode, no I did not use ST_BuildMBRAgg for public void encodeGeometryEnvelope(String tableName, String geometryColumn, StringBuffer sql) This was not obvious to me by reading the java doc. But know, I will
          Hide
          Andrea Aime added a comment -
          Here is a first patch to foster discussion. It implements the api change suggested.
          Looking at it it's evident there is duplicated logic about merging the various resultin envelopers (in the case of multiple geom columns).
          I guess it's better to:
          - have the getOptimizedBounds receive just one geometry column name
          - have JDBCDataStore loop over the geom columns (instead of making the sql dialect do so)
          - factor out the logic that merges (and reprojects) the envelopes into a single method in JDBCDataStore taking a list of ReferencedEnvelope and a target CRS, so that both code paths can use it

          Oh, by looking at the code it would seem that in case of multiple geometries the current query is built to aggregate all of the geom columsn, but in the end only the first one is used by getBounds(SimpleFeatureType featureType, Query query, Connection cx). I guess we need a test case with multiple geoms.
          Show
          Andrea Aime added a comment - Here is a first patch to foster discussion. It implements the api change suggested. Looking at it it's evident there is duplicated logic about merging the various resultin envelopers (in the case of multiple geom columns). I guess it's better to: - have the getOptimizedBounds receive just one geometry column name - have JDBCDataStore loop over the geom columns (instead of making the sql dialect do so) - factor out the logic that merges (and reprojects) the envelopes into a single method in JDBCDataStore taking a list of ReferencedEnvelope and a target CRS, so that both code paths can use it Oh, by looking at the code it would seem that in case of multiple geometries the current query is built to aggregate all of the geom columsn, but in the end only the first one is used by getBounds(SimpleFeatureType featureType, Query query, Connection cx). I guess we need a test case with multiple geoms.
          Hide
          Justin Deoliveira added a comment -
          All in all patch looks good, but agreed that the aggregating and the reprojection of envelopes could be factored out into the datastore. What about changing the method to return a list of ReferencedEnvelope, one item per geometry column. That way the dialect would not have to worry about merging them?

          One minor thing would be adding a log statement for the sql statements.
          Show
          Justin Deoliveira added a comment - All in all patch looks good, but agreed that the aggregating and the reprojection of envelopes could be factored out into the datastore. What about changing the method to return a list of ReferencedEnvelope, one item per geometry column. That way the dialect would not have to worry about merging them? One minor thing would be adding a log statement for the sql statements.
          Hide
          Andrea Aime added a comment -
          I see uDig people are getting crazy due to the lack of this optimization in Oracle.

          So let's try to resume the work on this one. I still prefer the dialect to just receive one geom column name, makes it simpler.
          If there are no strong feelings against it I'm going to move forward with it (will provide an amended patch tomorrow)
          Show
          Andrea Aime added a comment - I see uDig people are getting crazy due to the lack of this optimization in Oracle. So let's try to resume the work on this one. I still prefer the dialect to just receive one geom column name, makes it simpler. If there are no strong feelings against it I'm going to move forward with it (will provide an amended patch tomorrow)
          Hide
          Justin Deoliveira added a comment -
          Cool +1, my only issue is with the duplication, was it agreed to move out the aggregation into the datastore? And just have the dialect method do the query and return the envelope?
          Show
          Justin Deoliveira added a comment - Cool +1, my only issue is with the duplication, was it agreed to move out the aggregation into the datastore? And just have the dialect method do the query and return the envelope?
          Hide
          Andrea Aime added a comment -
          Yes, we just have a difference in the details. I would like to have the datastore scan the feature type and call the dialect once per geometry, so the dialect code gets a geom column name and returns one envelope. The perf difference is going to be negligible.
          Show
          Andrea Aime added a comment - Yes, we just have a difference in the details. I would like to have the datastore scan the feature type and call the dialect once per geometry, so the dialect code gets a geom column name and returns one envelope. The perf difference is going to be negligible.
          Hide
          Justin Deoliveira added a comment -
          Cool, that sounds good to me.
          Show
          Justin Deoliveira added a comment - Cool, that sounds good to me.
          Hide
          Andrea Aime added a comment -
          Hi, here is an improved and potentially final version of the patch. In the end I followed Justin suggestion and simplified the code while I was at it.
          Please review :-)
          (I'm setting up the layers on the benchmarking server and I don't want to wait forever for the bounds to compute, nor to manually cut and paste the bounds for 15 or so layers... :-p )
          Show
          Andrea Aime added a comment - Hi, here is an improved and potentially final version of the patch. In the end I followed Justin suggestion and simplified the code while I was at it. Please review :-) (I'm setting up the layers on the benchmarking server and I don't want to wait forever for the bounds to compute, nor to manually cut and paste the bounds for 15 or so layers... :-p )
          Hide
          Christian Mueller added a comment -
          Tested with DB2, all tests ok
          Show
          Christian Mueller added a comment - Tested with DB2, all tests ok
          Hide
          Andrea Aime added a comment -
          Fixed on trunk (including Oracle bounds computation speedup)
          Show
          Andrea Aime added a comment - Fixed on trunk (including Oracle bounds computation speedup)
          Hide
          Andrea Aime added a comment -
          Mass closing all issues that have been in "resolved" state for 2 months or more without any feedback or update
          Show
          Andrea Aime added a comment - Mass closing all issues that have been in "resolved" state for 2 months or more without any feedback or update

            People

            • Assignee:
              Andrea Aime
              Reporter:
              Andrea Aime
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: