Details
Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 1.8.5
Fix Version/s: None
Component/s: groovy-jdk
Labels:
Environment: OSX 10.6.8, groovy 1.8.5 from macports
Description
I'd like to enhance the Groovy additions to java.util.Iterator.
For the current state of the Groovy Iterator API, see:
http://groovy.codehaus.org/groovy-jdk/java/util/Iterator.html
I'd like to add to Iterator many of the methods from the Groovy Collection API, see:
http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html
Rationale:
The Groovy Collection API offers a rich set of methods that apply closures
for tasks such as filtering, transforming, and aggregating.
The Groovy Iterator API currently implements only a very small subset of them.
When working with large data sets,
for example when extracting data from a large log file line by line,
it is not practical to put all of that data in a collection first.
Instead, the data should stream through the processing steps,
much like a Unix pipeline built from chained grep and sed commands.
To be clear, we're not talking about filtering character data;
we're talking about a stream of objects that we want to pass from step to step.
I believe the Iterator API is suitable for this.
It may already be possible to apply some of the Collection methods to Iterators,
but they then collect the results and return a Collection.
I want each step to return another Iterator instead,
so that the data is never collected:
each step processes a single item
and passes it on to the next step through an Iterator.
I have already come up with transform() and filter();
I'll post them here once I'm done debugging.
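To make the request concrete, here is a minimal sketch of what lazy transform() and filter() steps could look like. It is written in plain Java (8 or later) rather than Groovy so that it runs without the GDK. The class name LazyIterators and both method signatures are assumptions made for illustration; this is not the reporter's implementation and not an existing Groovy API.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class: each method wraps an Iterator and returns
// another Iterator, so no intermediate collection is ever built.
final class LazyIterators {

    // Applies fn to each item only when the caller asks for it.
    static <T, R> Iterator<R> transform(Iterator<T> source,
                                        Function<? super T, ? extends R> fn) {
        return new Iterator<R>() {
            public boolean hasNext() { return source.hasNext(); }
            public R next() { return fn.apply(source.next()); }
        };
    }

    // Passes on only the items accepted by keep, pulling from the
    // source one item at a time.
    static <T> Iterator<T> filter(Iterator<T> source, Predicate<? super T> keep) {
        return new Iterator<T>() {
            private T pending;      // next accepted item, if one is buffered
            private boolean ready;  // true when pending holds a valid item

            public boolean hasNext() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = pending;
                pending = null;
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for a large log file read line by line.
        List<String> lines = Arrays.asList(
            "INFO start", "ERROR disk full", "INFO ok", "ERROR timeout");

        // filter -> transform, chained like a Unix pipe; nothing is
        // processed until the loop below pulls items through.
        Iterator<Integer> lengths = transform(
            filter(lines.iterator(), l -> l.startsWith("ERROR")),
            String::length);

        while (lengths.hasNext()) {
            System.out.println(lengths.next()); // prints 15, then 13
        }
    }
}
```

Because each step holds at most one pending item, memory use stays constant regardless of how many items flow through the chain.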
Some other thoughts:
Once we have this dataflow working in the single-threaded case,
we might want to extend it to multithreading.
Hi, this sounds potentially interesting. Before going too far, do you have some specific examples of what you want the API to achieve?
Groovy already has methods such as "findAll" and "collect" that do "filtering" and "transforming". If, for instance, an Iterator variant of findAll that returned an Iterator were deemed close enough to the existing semantics (which might be possible if we provide all of the available Collection methods on Iterators or Objects), then it might not be necessary to introduce a new name. If you give us some more examples of what you are after, we can make further suggestions.
Also, GPars already has findAll/collect (for collections and objects, but not Iterators, because iteration is in general sequential) and the parallel-array equivalents map(), filter() and reduce(), which may cover the concurrent use cases you are interested in. Again, if you have some specific examples, we might be able to point you to areas in the GPars space where improvements can be made.