jira.codehaus.org

  • Log In Access more options
    • Online Help
    • Keyboard Shortcuts
    • About JIRA
    • JIRA Credits
    • What?s New
  • Dashboards Access more options (Alt+d)
  • Projects Access more options (Alt+p)
  • Issues Access more options (Alt+i)
  • GeoTools
  • GEOT-2572

Support estimated extents in postgis-ng

  • Log In
  • Views
    • XML
    • Word
    • Printable

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 2.7-M3
  • Component/s: jdbc
  • Labels:
    None

Description

The current postgis datastore allows usage of the estimated extents function to quickly compute bounds of multi-million row datasets quickly by fast inspection of the index optimizer statistics.
The NG one does not, and in order to support it, we'll need to make some API changes as the estimated extent is not a query that is applied on a table, but look like:

select estimated_extent('tableName', 'columnName')

The oracle plugin needs this to use the SDO_TUNE fast bbox computations as well, but with a twist, in that case the function always returns a correct result provided the data is in a projected CS (so it's always a good choice with the right data set, it's not a matter of user setup).

In both cases it seems the dialect class should be modified to allow the specific dialect to completely take over the computation of extents for a given set of geometries.

  • Options
    • Sort By Name
    • Sort By Date
    • Ascending
    • Descending
    • Download All

Attachments

  1. Text File
    GEOT-2572-v1.patch
    03/Jul/09 10:24 AM
    10 kB
    Andrea Aime
  2. Text File
    GEOT-2572-v2.patch
    11/Aug/10 12:57 PM
    27 kB
    Andrea Aime

Issue Links

is related to

New Feature - A new feature of the product, which has yet to be developed. UDIG-1659 Relaxing requirement to know bounds

  • Major - Major loss of function.
  • Closed - The issue is considered finished, the resolution is correct. Issues which are not closed can be reopened.

Activity

Ascending order - Click to sort in descending order
  • All
  • Comments
  • History
  • Activity
Hide
Permalink
Andrea Aime added a comment - 28/Jun/09 5:19 AM
After some discussion on IRC the proposed way is to add the following method to SQLDialect:

{code}
  ReferencedEnvelope getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx)
{code}

the method will perform an optimized computation if possible, based on all geometric columns, or return null if the optimized computation is not possible.
The passed down featureType might have less geoemtric columns than the table if the user asked for a bounds computation on a subset of columns. The eventual type and column metadata will be preserved by JDBCDataStore in that case (so that, for example, the native SRID is guaranteed to be there in any case).
Show
Andrea Aime added a comment - 28/Jun/09 5:19 AM After some discussion on IRC the proposed way is to add the following method to SQLDialect: {code}   ReferencedEnvelope getOptimizedBounds(String schema, SimpleFeatureType featureType, Connection cx) {code} the method will perform an optimized computation if possible, based on all geometric columns, or return null if the optimized computation is not possible. The passed down featureType might have less geoemtric columns than the table if the user asked for a bounds computation on a subset of columns. The eventual type and column metadata will be preserved by JDBCDataStore in that case (so that, for example, the native SRID is guaranteed to be there in any case).
Hide
Permalink
Christian Mueller added a comment - 28/Jun/09 12:33 PM
Only to be sure, Andrea a quick look at

http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp

I think that is the equivalent for db2 ?
Show
Christian Mueller added a comment - 28/Jun/09 12:33 PM Only to be sure, Andrea a quick look at http://publib.boulder.ibm.com/infocenter/db2luw/v9r5/index.jsp I think that is the equivalent for db2 ?
Hide
Permalink
Andrea Aime added a comment - 29/Jun/09 1:24 AM
Christian, that link brings me to the general DB2 9.5 documentation.
Show
Andrea Aime added a comment - 29/Jun/09 1:24 AM Christian, that link brings me to the general DB2 9.5 documentation.
Hide
Permalink
Christian Mueller added a comment - 29/Jun/09 2:46 AM
Ups, please type in the search field

MBR Aggregate

and click on the second row in the result list
Show
Christian Mueller added a comment - 29/Jun/09 2:46 AM Ups, please type in the search field MBR Aggregate and click on the second row in the result list
Hide
Permalink
Andrea Aime added a comment - 29/Jun/09 3:21 AM
Christian, nope, it does not look like the above ones. I think you're already using ST_BuildMBRAgg in the current envelope computation implementation no?

The two functions above are not aggregate ones, they are faster alternatives that do use the index statistics as opposed to inspecting the geometries themselves and aggregating the results. As a result they are almost instantaenous in results, thought they may return an incorrect bound (either inflated or reduced compared to the actual bounds). I have a dataset sitting here in pg with over 20 million roads, the standard aggregation takes over 2 minutes to compute the bbox, whilst the estimated_extent is instant and returns a bbox that's around 2% away from the real one.
Show
Andrea Aime added a comment - 29/Jun/09 3:21 AM Christian, nope, it does not look like the above ones. I think you're already using ST_BuildMBRAgg in the current envelope computation implementation no? The two functions above are not aggregate ones, they are faster alternatives that do use the index statistics as opposed to inspecting the geometries themselves and aggregating the results. As a result they are almost instantaenous in results, thought they may return an incorrect bound (either inflated or reduced compared to the actual bounds). I have a dataset sitting here in pg with over 20 million roads, the standard aggregation takes over 2 minutes to compute the bbox, whilst the estimated_extent is instant and returns a bbox that's around 2% away from the real one.
Hide
Permalink
Christian Mueller added a comment - 29/Jun/09 5:24 AM
Uaah, panic mode, no I did not use ST_BuildMBRAgg for

public void encodeGeometryEnvelope(String tableName, String geometryColumn, StringBuffer sql)

This was not obvious to me by reading the java doc.

But know, I will
Show
Christian Mueller added a comment - 29/Jun/09 5:24 AM Uaah, panic mode, no I did not use ST_BuildMBRAgg for public void encodeGeometryEnvelope(String tableName, String geometryColumn, StringBuffer sql) This was not obvious to me by reading the java doc. But know, I will
Hide
Permalink
Andrea Aime added a comment - 03/Jul/09 10:24 AM
Here is a first patch to foster discussion. It implements the api change suggested.
Looking at it it's evident there is duplicated logic about merging the various resultin envelopers (in the case of multiple geom columns).
I guess it's better to:
- have the getOptimizedBounds receive just one geometry column name
- have JDBCDataStore loop over the geom columns (instead of making the sql dialect do so)
- factor out the logic that merges (and reprojects) the envelopes into a single method in JDBCDataStore taking a list of ReferencedEnvelope and a target CRS, so that both code paths can use it

Oh, by looking at the code it would seem that in case of multiple geometries the current query is built to aggregate all of the geom columsn, but in the end only the first one is used by getBounds(SimpleFeatureType featureType, Query query, Connection cx). I guess we need a test case with multiple geoms.
Show
Andrea Aime added a comment - 03/Jul/09 10:24 AM Here is a first patch to foster discussion. It implements the api change suggested. Looking at it it's evident there is duplicated logic about merging the various resultin envelopers (in the case of multiple geom columns). I guess it's better to: - have the getOptimizedBounds receive just one geometry column name - have JDBCDataStore loop over the geom columns (instead of making the sql dialect do so) - factor out the logic that merges (and reprojects) the envelopes into a single method in JDBCDataStore taking a list of ReferencedEnvelope and a target CRS, so that both code paths can use it Oh, by looking at the code it would seem that in case of multiple geometries the current query is built to aggregate all of the geom columsn, but in the end only the first one is used by getBounds(SimpleFeatureType featureType, Query query, Connection cx). I guess we need a test case with multiple geoms.
Hide
Permalink
Justin Deoliveira added a comment - 05/Jul/09 11:06 PM
All in all patch looks good, but agreed that the aggregating and the reprojection of envelopes could be factored out into the datastore. What about changing the method to return a list of ReferencedEnvelope, one item per geometry column. That way the dialect would not have to worry about merging them?

One minor thing would be adding a log statement for the sql statements.
Show
Justin Deoliveira added a comment - 05/Jul/09 11:06 PM All in all patch looks good, but agreed that the aggregating and the reprojection of envelopes could be factored out into the datastore. What about changing the method to return a list of ReferencedEnvelope, one item per geometry column. That way the dialect would not have to worry about merging them? One minor thing would be adding a log statement for the sql statements.
Hide
Permalink
Andrea Aime added a comment - 10/Sep/09 4:42 AM
I see uDig people are getting crazy due to the lack of this optimization in Oracle.

So let's try to resume the work on this one. I still prefer the dialect to just receive one geom column name, makes it simpler.
If there are no strong feelings against it I'm going to move forward with it (will provide an amended patch tomorrow)
Show
Andrea Aime added a comment - 10/Sep/09 4:42 AM I see uDig people are getting crazy due to the lack of this optimization in Oracle. So let's try to resume the work on this one. I still prefer the dialect to just receive one geom column name, makes it simpler. If there are no strong feelings against it I'm going to move forward with it (will provide an amended patch tomorrow)
Hide
Permalink
Justin Deoliveira added a comment - 10/Sep/09 7:45 AM
Cool +1, my only issue is with the duplication, was it agreed to move out the aggregation into the datastore? And just have the dialect method do the query and return the envelope?
Show
Justin Deoliveira added a comment - 10/Sep/09 7:45 AM Cool +1, my only issue is with the duplication, was it agreed to move out the aggregation into the datastore? And just have the dialect method do the query and return the envelope?
Hide
Permalink
Andrea Aime added a comment - 10/Sep/09 7:58 AM
Yes, we just have a difference in the details. I would like to have the datastore scan the feature type and call the dialect once per geometry, so the dialect code gets a geom column name and returns one envelope. The perf difference is going to be negligible.
Show
Andrea Aime added a comment - 10/Sep/09 7:58 AM Yes, we just have a difference in the details. I would like to have the datastore scan the feature type and call the dialect once per geometry, so the dialect code gets a geom column name and returns one envelope. The perf difference is going to be negligible.
Hide
Permalink
Justin Deoliveira added a comment - 10/Sep/09 9:07 AM
Cool, that sounds good to me.
Show
Justin Deoliveira added a comment - 10/Sep/09 9:07 AM Cool, that sounds good to me.
Hide
Permalink
Andrea Aime added a comment - 11/Aug/10 12:57 PM
Hi, here is an improved and potentially final version of the patch. In the end I followed Justin suggestion and simplified the code while I was at it.
Please review :-)
(I'm setting up the layers on the benchmarking server and I don't want to wait forever for the bounds to compute, nor to manually cut and paste the bounds for 15 or so layers... :-p )
Show
Andrea Aime added a comment - 11/Aug/10 12:57 PM Hi, here is an improved and potentially final version of the patch. In the end I followed Justin suggestion and simplified the code while I was at it. Please review :-) (I'm setting up the layers on the benchmarking server and I don't want to wait forever for the bounds to compute, nor to manually cut and paste the bounds for 15 or so layers... :-p )
Hide
Permalink
Christian Mueller added a comment - 12/Aug/10 3:54 AM
Tested with DB2, all tests ok
Show
Christian Mueller added a comment - 12/Aug/10 3:54 AM Tested with DB2, all tests ok
Hide
Permalink
Andrea Aime added a comment - 20/Aug/10 1:09 AM
Fixed on trunk (including Oracle bounds computation speedup)
Show
Andrea Aime added a comment - 20/Aug/10 1:09 AM Fixed on trunk (including Oracle bounds computation speedup)
Hide
Permalink
Andrea Aime added a comment - 28/Nov/10 2:57 AM
Mass closing all issues that have been in "resolved" state for 2 months or more without any feedback or update
Show
Andrea Aime added a comment - 28/Nov/10 2:57 AM Mass closing all issues that have been in "resolved" state for 2 months or more without any feedback or update

People

  • Assignee:
    Andrea Aime
    Reporter:
    Andrea Aime
Vote (1)
Watch (2)

Dates

  • Created:
    28/Jun/09 4:43 AM
    Updated:
    02/Jun/11 8:16 AM
    Resolved:
    20/Aug/10 1:09 AM
  • Atlassian JIRA (v5.0.4#731-sha1:3aa7374)
  • Report a problem
  • Powered by a free Atlassian JIRA open source license for Codehaus. Try JIRA - bug tracking software for your team.