History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: GEOT-865
Type: Sub-task Sub-task
Status: In Progress In Progress
Priority: Major Major
Assignee: Bryce Nordgren
Reporter: Bryce Nordgren
Votes: 0
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
GeoTools
GEOT-414

Collections which fire events and check types

Created: 02/Jun/06 06:31 PM   Updated: 23/Jun/06 07:23 PM
Component/s: core geometry
Affects Version/s: None
Fix Version/s: None


 Description  « Hide
Prior to accepting the geometry code, it should be thoroughly overhauled to use only implementations of the Collections interfaces which facilitate synchronization and validate that contained elements are of a particular type.

It is quite possible to use Jakarta's Commons Collections as a starting point. They already have type checking decorators and they have an abstract base upon which to write an event firing decorator.

http://jakarta.apache.org/commons/collections/



 All   Comments   Change History      Sort Order: Ascending order - Click to sort in descending order
Bryce Nordgren - 02/Jun/06 06:48 PM
They also have an "event-firing" decorator for collections interfaces which is in the "dormant" segment of their project. Checking it out now.

Martin Desruisseaux - 05/Jun/06 11:42 PM
One thing seems a little bit sad to me about the Jakarta collection framework. They provides a large set of new collection implementations, but ignore totally the J2SE standard implementation. Is there really that much advantage to use their HashedMap implementation instead of the standard HashMap? Why they ignore totally java.util.AbstractMap, which provides a great deal of basic behavior?

It was close to trivial to implements checked collection on top of J2SE standard implementation. For example org.geotools.util.CheckedHashMap had to override only one method. Same apply to java.util.CheckedHashSet. Those classes provided in Geotools are much smaller than the Jakarta implementation (14 kb uncompressed compared to 1 Mb). Jakarta seems to reinvent at least some wheels, but maybe I missed their point?

If we want to provide event-firing HashSet, HashMap and ArrayList implementations, I can volunter for adding this functionality on top of CheckedHashSet, CheckedHashMap or CheckedArrayList. It should be very easy and use few space (maybe an additional 10 kb - we still far from the 1 Mb Jarkarta framework). Should we follow this path?

Bryce Nordgren - 06/Jun/06 12:24 PM
Ah hah. I didn't know we had Checked Maps/Sets/Lists.

The main difference in strategy between {{org.geotools.util}} and the Jakarta commons-collection stuff is that Jakarta has not provided new Set/Map/List implementations. They used the decorator pattern instead. The stuff in {{org.geotools.util}} is a complete, self contained set, capable of managing storage and retrieval of elements. The commons-collections stuff does not implement storage or retrieval, it _only_ implements the one particular functionality (e.g. checking the type of the element before adding it, synchronizing, etc.) This is the same strategy as {{Collections.unmodifiableSet()}}, which has been carried forward into 1.5. Decorators are beneficial because:

1. You wrap the original set, not copy it.
2. You can chain decorators together with whatever combination you want: List decoratedList = Collections.checkedList(Collections.synchronizedList(baseList))
3. The result looks like a plain old java.util.List/Map/Set.

Whichever way we go forward, be it commons-collections or J2SE 1.5, we'll be using a decorator pattern instead of many subclasses of AbstractCollection. I'd prefer to use commons-collections because it works on 1.4+ jvms (maybe even earlier), plus it provides more functionality than the new J2SE libs (can bounds check, check for null, transform entries on the fly, has other types of collections: buffers, bidirectional Maps bags, and ordered maps/sets (for Jody and his XML fetish) :) )

As to size, 3.1 in my maven repo is 547k. That seems like a small price to pay for the use of code which someone else maintains and has a lot more use than a small component inside GT. ;)

Martin Desruisseaux - 06/Jun/06 06:02 PM
The "org.geotools.util" implementations (CheckedHashSet as HashSet subclass instead of a decorator) was choosen in order to reduce the amount of objects created in the metadata package. It doesn't involve any copy, because CheckedHashSet are private static final fields in our metadata implementation. Avoding the decorator pattern here divide by 2 the number of objects created, which may be significant when the amount of such object is large. I suspect that a similar concern may apply to geometries as well.

Note that "org.geotools.util" leave room for implementing CheckedSet, CheckedMap and CheckedList as decorators as well. First, the name "CheckedSet" (compared to "CheckedHashSet") was intentionnaly left available. Second, implementing CheckedSet as decorator on top of java.util.AbstractSet is easy too, sinced AbstractSet already do a lot of job.

Chaining decorators is a very elegant approach when there is not a huge amount of objects. If we really have a lot of geometries, it is going to put more burden on the garbage collector.

Note that to be strict, a decorator constructor would need to scan over all elements in the wrapped set for making sure that they are of the right type. This is not as costly as copying, but still more costly than what decorators usually are.

The compressed JAR is about 500 kb on my machine too. The 1 Mb cited in my previous comment was an uncompressed size (as mentionned in the comment) in order to compare it with the 14 kb size. My main grief about that is: Why Jakarta reinvent the weel? Why they don't leverage J2SE AbstractSet, HashSet and their friends?

However, if we are really going to use a lot of Jakarta extra functionalities (and not just the checked collections, which by the way are available in J2SE 1.5 as new static methods in Collections), then it is worth consideration.

Bryce Nordgren - 07/Jun/06 01:41 PM
I just went through looking for where this is going to burn us:

Large numbers of points are handled efficiently via PointArray and PointGrid.
LineStrings, ArcStrings, etc. are specified to use PointArrays, and so can contain large numbers of points. The "positions()" method converts this to a list of positions in GeoAPI, is not specified by 19107, and probably should be deprecated/removed. I suppose clever implementations could be backed by the PointArray.
A Ring aggregates one or more "OrientableCurves" into a closed cycle. Probably the most common case is that a Ring is just a single LineString. This must be backed by a List to be manipulatable.
A SurfaceBoundary combines a conditionally mandatory exterior ring with an arbitrary number of optional interior rings. The common case is probably the containment of a single exterior Ring. This doesn't use a Collection in any case, but probably should (see GEO-65).
A Polygon contains a reference to a single SurfaceBoundary.

Ergo, to create a Polygon you must create at a minimum: 1] a Polygon; 2] a SurfaceBoundary; 3] a Ring; 4] a List for the Ring; 5] a LineString (or ArcString); 6] a PointArray of the correct size. Every Polygon is a minimum of 6 objects, only one of which is a collection.

Probably the effort we would spend optimizing collections for memory utilization would be better spent trying to figure out how to combine some of these things. (e.g. Special case implementation: the Polygon manages the PointArray directly. Intermediate objects created only when needed.)

Regardless, I think efficiency is a different issue. It's important and we're going to have to address it down the road, but this jira is about synchronization and type checking. We can optimize the collections stuff when we go thru and optimize "geometry". Efficiency is a pandora's box.