Details
-
Type:
Improvement
-
Status:
Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: None
-
Component/s: xml
-
Labels:None
Description
To support streaming of features, GeoServer WFS writes the root element of a response and then each feature as it is created. The root element requires a numberOfFeatures element; because the number of features is not known, GeoServer must get them all to count them. It then must get them all a second time to encode them one by one. (Streaming is needed as not all responses will fit in memory.) The consequence of the double query is that WFS responses take twice as long. This is particularly bad for app-schema as it is already very slow.
In practice, GeoServer streams features to disk first, so that an error document can be returned if an error occurs. (This is a selectable output strategy.)
In theory it should be possible to stream only the features to disk as an XML fragment, then encode the root element on the fly when all the features have been seen, meaning the numberOfFeatures is known, and only one query is required.
This may require modification to Encoder, OutputStrategy, and OutputFormat APIs.
-
Hide
- GEOT-3302.zip
- 19/Jan/11 11:47 PM
- 7 kB
- Victor Tey
- Download Zip
-
Hide
- GEOT-3302 FULL PATCH.zip
- 13/Dec/10 3:25 AM
- 9 kB
- Victor Tey
- Download Zip
-
Hide
- Geot-3302 - Relavant Patch.zip
- 13/Dec/10 3:25 AM
- 3 kB
- Victor Tey
- Download Zip
-
- GEOT-3302-Revision2_002.patch
- 17/Feb/11 7:45 PM
- 8 kB
- Victor Tey
-
- GEOT-3302-Revision2.patch
- 17/Feb/11 1:14 AM
- 8 kB
- Victor Tey
-
- ResultTypeHit.patch
- 05/Apr/11 4:33 AM
- 1 kB
- Victor Tey
Activity
Of course, having the app-schema store support a real count query would be much better.
Thanks for your advice. My colleague Matt is actually working on this, and it was Ben's idea (who'll be back on Monday).
I will get Matt to start a mailing list discussion for this.
I was looking at your suggestion but serializing the feature will be equally painful as well due to the complexity of the feature. My initial though was to create a collection of the features and serialize it during the count and then deserialize it on request to getNext().
To serialize the feature, every instance in the object must be serializable as well, preferably from the superclass onwards so as not to end up with partial serialization.
Would you be able to suggest a systemic approach to making the feature serializable?
Maybe this could be solved as a string handling problem instead, and at the output format level? First write out the collection using the normal encoder, but wrap the collection into something that will count the features as they are written out.
And then read back the collection and use XSLT to inject that single extra attribute (actually, that could be done for the collection bounds as well). Since the XLST would target a well known attribute it should be fast.
Which is very close to the initial suggestion, however I'd do this as a detail of the GML output format implementation without API changes.
Full patch shows fixes for app-schema tests due to change in calculating size() in mappingFeatureCollection
I'll try to have a look at it during the weekend (work time fully booked as always...)
One point to note, I wasn't sure how in GML3OutputFormat to identify only complex feature therefore I write everything out to file and only perform the transformation if it is complex feature. Perhaps you would be able to provide some advice once again. Ideally if we can do something like
if(!complex){
encode(results,output,encoder)
}else{
complexFeatureStreamIntercept(results,output,encoder)
}
I also just notice my spelling mistake and will correct it my second revision of the patch :p
I know you have been busy but I am just wondering if you manage to have a look at my patch :)
Thanks
generally speaking the patch seems good, but you'll surely need Justin's review as well.
Quick observations:
- could you put the xslt in a separate file to be loaded from the classpath just once, statically, in memory? It would make the xslt more readable and you'd avoid parsing it over and over (don't remember how but I think you can parse it just once and keep it around)
- a collection should be returning its size always, but I think that if you don't want to you should probably follow FeatureSource.getCount() lead and return -1 instead
- to create the temporary file to dump the xml to for xslt purposes you should use File.createTempFile() and possibly move the try way up to make sure not even runtime exception can make GS leave a stale file around
* please avoid formatting only changes in patches, it makes them harder to review
* this line seems confusing:
{noformat}
+ public boolean isComplexFeature() {
+ return ids.size() != 0;
+ }
{noformat}
Why does the existence of ids imply complex features? Comments please or perhaps a better method name.
* In GML3OutputFormat the method named "complexFeatureSteamInceptor". What does it mean? Is that supposed to read "stream interceptor"?
* When you generate the temp file is there any reason not to wrap the file output stream in a buffered output stream?
* If I am reading the patch correctly this xslt transformation is applied for every single GML3 request. I have reservations about this. I could be wrong here but for fast data sources like shapefile and postgis might not the double query be faster than a full xslt transform, which are known to be notoriously slow? Maybe I am wrong but it may be worth only following this patch if the source feature source is a complex one
The only affected file should be GML3OutputFormat and MappingFeatureCollection
Changing the size in to return -1 would cause an exception to be thrown as the numberOfFeature uses XSDNonNegativeInteger class. What if i alternatively return Integer max value rather then 0?
GeoServer trunk: Completed: At revision: 15312
Geotool 2.7.x
GeoServer 2.1.x
I understand that at this current point in time, gt 2.7.x has not been created and will only commit after the release of geoserver 2.1.
I had to commit this to branch prior to getting approval from the community in order to fix the build.
At this current point in time, gs 2.1.x depends on gt trunk which was the cause of the build failure because previously I committed my changes to geotool trunk and geoserver trunk. My changes was not captured in the 2.1.x which has a dependency on gt trunk with the commit I made.
If we decide to roll back the patch I commit then we will need to branch out gt 2.7.
1) Can you provide a quick comment just describing this issue in general, more specifically why we have to count the results manually
2) It would be nice to factor out the isComplexFeature() method that appears to be copied from GML3OutputFormat. Maybe make one of them static? Not a big deal but nice to consolidate code. if we had a utility class we could throw it in there.
Anyways, your good to commit.
we have to count the number of features here manually because complex feature collection
size() now returns 0. In order to count the number of features, we have to build the features
to count them and this has great performance impact. Unless we introduce joins in our fetching of
data, we will have to count the number of features manually when needed. In GML3Outputformat
I use xslt to populate numberOfFeatures attribute.
I was unable to find a suitable util class to factor out the isComplexFeature method therefore I made the method in GML3OutputFormat static
SecuredFeatureCollection.features() is supposed to return a FeatureIterator but due to some policy setting on the server, it seem to return a Iterator instead. I was unable to replicate the issue on my local machine and it is machine setting specific.
I was also unable to get any response from Andrea with regards to this therefore had to submit this workaround.
2011-02-24 11:04:02,941 ERROR [geoserver.ows] -
java.lang.ClassCastException: org.geoserver.security.decorators.SecuredIterator cannot be cast to org.geotools.feature.FeatureIterator
at org.geoserver.security.decorators.SecuredFeatureCollection.features(SecuredFeatureCollection.java:57)
at org.geoserver.wfs.response.HitsOutputFormat.countFeature(HitsOutputFormat.java:102)
at org.geoserver.wfs.response.HitsOutputFormat.write(HitsOutputFormat.java:85)
at org.geoserver.ows.Dispatcher.response(Dispatcher.java:751)
at org.geoserver.ows.Dispatcher.handleRequestInternal(Dispatcher.java:233)
at org.springframework.web.servlet.mvc.AbstractController.handleRequest(AbstractController.java:153)
at org.springframework.web.servlet.mvc.SimpleControllerHandlerAdapter.handle(SimpleControllerHandlerAdapter.java:48)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:875)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:809)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:571)
I believe what is happening in this case is that there are iterator objects that implement both FeatureIterator and Iterator. So the security wrapper is wrapping it in this case as an Iterator when it should be wrapping it as a FeatureIterator. I am not sure exactly which iterator is doing this but one such iterator is ForceCoordinateSystemIterator.
Not sure exactly what the best way to solve that issue is... perhaps passing in the interface into SecuredObjects.secure rather than just the object. Or perhaps just hunt down feature iterators that implement both and separate them out.I think the former is a more robust solution. Would like to hear Andrea weigh in on this one.