Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.8.5
    • Fix Version/s: None
    • Component/s: groovy-jdk
    • Labels:
    • Environment:
      OSX 10.6.8, groovy 1.8.5 from macports
    • Number of attachments :
      0

      Description

      I'd like to enhance the Groovy additions to java.util.Iterator.

      For the current state of the Groovy Iterator API, see:
      http://groovy.codehaus.org/groovy-jdk/java/util/Iterator.html

      I'd like to add to Iterator many of the methods from the Groovy Collection API, see:
      http://groovy.codehaus.org/groovy-jdk/java/util/Collection.html

      Rationale:

      The Groovy Collection API offers a rich set of methods for applying closures
      to tasks such as filtering, transforming and aggregation.

      However, the Groovy Iterator API currently implements only a very small subset of these.

      When working with large data sets,
      for example extracting data from a large log file line by line,
      it is not practical to put all of that data into a collection first.

      Instead, the data should stream through the processing steps,
      like building a pipeline in Unix out of many chained grep and sed commands.

      Don't get me wrong here: we're not talking about filtering character data.
      We're talking about a stream of Objects that we want to pass on from step to step.

      I believe the Iterator API is suitable for this.

      It may already be possible to apply some of the Collection API methods to Iterators,
      but those methods then collect the results and return a Collection.

      I want each step to return another Iterator instead,
      so that the data is never collected:
      each step processes a single item
      and passes it on to the next step, which is again an Iterator.

      I have already come up with transform() and filter().
      I'll post them here once I'm done debugging.
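
      A minimal sketch of what lazy transform() and filter() steps could look like, written in plain Java because Iterator is a Java interface. The class name LazyIterators, the method signatures and the sample log lines are assumptions made for illustration; this is not the code referred to above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper class: each method wraps an Iterator in another
// Iterator, so no intermediate Collection is ever built.
class LazyIterators {

    // Applies fn to each element only when that element is requested.
    static <T, R> Iterator<R> transform(Iterator<T> source,
                                        Function<? super T, ? extends R> fn) {
        return new Iterator<R>() {
            @Override public boolean hasNext() { return source.hasNext(); }
            @Override public R next() { return fn.apply(source.next()); }
        };
    }

    // Passes on only the elements accepted by keep. It has to look ahead
    // by one matching element so that hasNext() can answer correctly.
    static <T> Iterator<T> filter(Iterator<T> source, Predicate<? super T> keep) {
        return new Iterator<T>() {
            private T pending;      // next matching element, if already found
            private boolean ready;  // true while pending holds an unreturned element

            @Override public boolean hasNext() {
                while (!ready && source.hasNext()) {
                    T candidate = source.next();
                    if (keep.test(candidate)) {
                        pending = candidate;
                        ready = true;
                    }
                }
                return ready;
            }

            @Override public T next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                T result = pending;
                pending = null;     // do not hold on to the element after handing it out
                return result;
            }
        };
    }

    public static void main(String[] args) {
        // Stand-in for a large log file read line by line (sample data).
        Iterator<String> lines = Arrays.asList(
                "INFO start", "ERROR disk full", "INFO ok", "ERROR timeout").iterator();

        Iterator<String> errors = filter(lines, line -> line.startsWith("ERROR"));
        Iterator<Integer> lengths = transform(errors, String::length);

        // Nothing has been read from the source yet; each pass through
        // the loop pulls just enough lines to produce one result.
        while (lengths.hasNext()) {
            System.out.println(lengths.next());
        }
    }
}
```

      Because each step only wraps the previous Iterator, chaining steps costs a constant amount of memory per step, regardless of how many items flow through.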

      Some other thoughts:
      once we have this dataflow working in the single-threaded case,
      we might want to extend it to multiple threads.

        Activity

        Paul King added a comment -

        Hi, this sounds potentially interesting. Before going too far, do you have some specific examples of what you want the api to achieve?

        Groovy already has methods like "findAll" and "collect" which do "filtering" and "transforming". As an example, if introducing an Iterator variant of findAll that returns an Iterator were deemed close enough to existing semantics (which might be possible if we provide all of the available Collection methods on Iterators or Objects), then it might not be necessary to introduce a new name. If you give us some more examples of what you are after, we could make further suggestions.

        Also, GPars already has findAll/collect (for collections and objects, but not Iterators, because in general iteration is sequential) and the parallel array equivalents map(), filter() and reduce(), which may cover the concurrent use cases you are interested in. Again, if you have some specific examples, we might be able to point you to areas in the GPars space where improvements can be made.

        blackdrag blackdrag added a comment -

        As I see it, you take a Collection and apply n chained operations to it. Instead of getting one new collection per operation, the operations themselves are collected; when the user then asks for the next element, the original Collection is processed according to those operations, so that in the end no new collection is produced, only the next element. Imagine, for example, a very large data structure that fits into memory only once, using up almost all of the available memory. In that case you cannot afford operations that produce a copy of the structure containing almost all of the elements again.

        Eike Dierks added a comment - edited

        I want to stream Objects through processing by Closures.

        Java provides the Iterator interface for this.

        Adding behaviour to an interface is hard, because there is no implementation to extend,
        but Groovy already does this somewhere.

        Most of the methods that are available for Collection
        should be available for Iterator as well.

        Whenever possible,
        those methods should return an Iterator again.

        As an exception to that rule,
        sort() might return a sorted collection.

        I was tricked by uniq() at first.
        It turns out that uniq can easily be implemented in terms of an Iterator
        (although that Iterator needs to keep some state).
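
        A rough sketch of such a stateful Iterator in plain Java. The class name UniqueIterator is hypothetical, and the choice to remember every element seen so far in a HashSet is an assumption about the intended semantics: memory then grows with the number of distinct elements, not with the length of the stream.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical wrapper that passes each distinct element on exactly once,
// in the order of first appearance.
class UniqueIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Set<T> seen = new HashSet<>(); // the state the iterator has to keep
    private T pending;                           // next unseen element, if already found
    private boolean ready;                       // true while pending is unreturned

    UniqueIterator(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            // Set.add returns false when the element was already present.
            if (seen.add(candidate)) {
                pending = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        T result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> words =
                new UniqueIterator<>(Arrays.asList("a", "b", "a", "c", "b").iterator());
        while (words.hasNext()) {
            System.out.println(words.next());
        }
    }
}
```

        A sort() step cannot be written this way, because the smallest element may be the last one delivered by the source, which is why returning a collection there is a reasonable exception.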

        Eike Dierks added a comment -

        Where should I add my code?

        Paul King added a comment -

        You could attach a patch here or create a fork on github if you prefer.

        Pascal Schumacher added a comment -

        @Eike Dierks: Do you still want to submit this enhancement or can this be closed?


          People

          • Assignee:
            Unassigned
            Reporter:
            Eike Dierks
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              1w
              Remaining:
              1w
              Logged:
              Not Specified