castor
  1. castor
  2. CASTOR-1029

allow to forget read only objects in long batch processes

    Details

    • Type: Improvement Improvement
    • Status: Open Open
    • Priority: Minor Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.9.6
    • Fix Version/s: None
    • Component/s: JDO
    • Labels:
      None
    • Environment:
      Operating System: Windows XP
      Platform: PC
    • Bugzilla Id:
      1892
    • Number of attachments :
      2

      Description

      When using Castor to process data in a long read only batch process the fetched
      objects are all stored in TransactionContext._readOnlyObjects. This hash table
      is not cleared until the end of the transaction. So there may arise problems
      running out of heap memory.
      Well, one way to overcome this would be to use shorter transactions. But I
      think it's better to provide a simple additional method to disable usage of the
      read-only-objects hash table. It avoids reconnecting to the database very
      frequently and increases the performance of the batch process.
      Attached patch implements the new function Database.setRememberReadOnlyObjects
      () to deactivate the read only hash.

      1. castor-1892-forget-ro-objects.patch
        6 kB
        Martin Fuchs
      2. patch-1892.txt
        26 kB
        Ralf Joachim

        Activity

        Hide
        Martin Fuchs added a comment -

        Created an attachment (id=943)
        proposed patch

        Show
        Martin Fuchs added a comment - Created an attachment (id=943) proposed patch
        Hide
        Martin Fuchs added a comment -

        Well, what does everybody think about this extension?
        We already used this successfully in our application.

        Show
        Martin Fuchs added a comment - Well, what does everybody think about this extension? We already used this successfully in our application.
        Hide
        Martin Fuchs added a comment -
            • Bug 1450 has been marked as a duplicate of this bug. ***
        Show
        Martin Fuchs added a comment - Bug 1450 has been marked as a duplicate of this bug. ***
        Hide
        Werner Guttmann added a comment -

        Martin, I think your patch is quite clear and precise. I wonder, though,
        whether we should actually overload Databaseload() to provide this
        functionality, and not add a property to Database(Impl) ?

        Show
        Werner Guttmann added a comment - Martin, I think your patch is quite clear and precise. I wonder, though, whether we should actually overload Databaseload() to provide this functionality, and not add a property to Database(Impl) ?
        Hide
        Werner Guttmann added a comment -

        Just a question: I think that somewhere in the PAI, Castor JDO allows you to
        elevate a lock from 'read' to something higher, e.g. 'write'. How would this
        interfer with your ideas ?

        Show
        Werner Guttmann added a comment - Just a question: I think that somewhere in the PAI, Castor JDO allows you to elevate a lock from 'read' to something higher, e.g. 'write'. How would this interfer with your ideas ?
        Hide
        Werner Guttmann added a comment -

        Btw, if you wondered about what PAI stands for, don't worry .. . I meant to
        say 'API', of course.

        Show
        Werner Guttmann added a comment - Btw, if you wondered about what PAI stands for, don't worry .. . I meant to say 'API', of course.
        Hide
        gblock@ctoforaday.com added a comment -

        I'm of two minds on this.

        Either: Read only objects are critical to keep, in which case the hashtable is a correct implementation
        and a flag either on load or on the database object is correct behavior...

        Or: Keeping these things around forever is pointless, and moving to a weak reference is the right
        solution; if the application didn't want it, and we're not referencing it, it can go away.

        The weak hash ref is probably the easiest transparent change to make, and will likely behave correctly
        in all normal circumstances; it also provides a mechanism for knowing when something used to be
        loaded but is no longer present, as the key is still in the weak hash reference, and debug output can
        inform the user that memory pressure led to the application clearing that content.

        A flag, instead of setting whether or not read only objects should be remembered, could then allow you
        to select reference type: Soft (eliminate when the JVM catches an OOM exception and wants to purge
        unnecessary objects from the system), weak (eliminate whenever a GC pass detects an unreferenced
        object), phantom (eliminate when all other references are gone, if the application uses weak
        references itself).

        Implementing a switch that allows you to select the referent style - strong, weak, soft, or phantom -
        then provides for today's behavior (a 'strong' reference, which just holds an ordinary link to the object
        in the value), and provide for three situations: Discard unreferenced on GC (soft), discard unreferenced
        under memory pressure (weak), and discard unused on GC (phantom).

        Drawback: it requires you to create reference objects, which increases the number of objects we have
        to handle.

        Show
        gblock@ctoforaday.com added a comment - I'm of two minds on this. Either: Read only objects are critical to keep, in which case the hashtable is a correct implementation and a flag either on load or on the database object is correct behavior... Or: Keeping these things around forever is pointless, and moving to a weak reference is the right solution; if the application didn't want it, and we're not referencing it, it can go away. The weak hash ref is probably the easiest transparent change to make, and will likely behave correctly in all normal circumstances; it also provides a mechanism for knowing when something used to be loaded but is no longer present, as the key is still in the weak hash reference, and debug output can inform the user that memory pressure led to the application clearing that content. A flag, instead of setting whether or not read only objects should be remembered, could then allow you to select reference type: Soft (eliminate when the JVM catches an OOM exception and wants to purge unnecessary objects from the system), weak (eliminate whenever a GC pass detects an unreferenced object), phantom (eliminate when all other references are gone, if the application uses weak references itself). Implementing a switch that allows you to select the referent style - strong, weak, soft, or phantom - then provides for today's behavior (a 'strong' reference, which just holds an ordinary link to the object in the value), and provide for three situations: Discard unreferenced on GC (soft), discard unreferenced under memory pressure (weak), and discard unused on GC (phantom). Drawback: it requires you to create reference objects, which increases the number of objects we have to handle.
        Hide
        Werner Guttmann added a comment -

        What's a good article on weak/phantom references ? It looks like I need to
        educate myself a bit .. .

        Show
        Werner Guttmann added a comment - What's a good article on weak/phantom references ? It looks like I need to educate myself a bit .. .
        Hide
        Ralf Joachim added a comment -

        The requested feature only relates to access mode ReadOnly and can not be used
        with other access modes. As suggested by Martin someone that wants to use the
        new feature would have to call setRememberReadOnlyObjects() and call load() or
        execute() with access mode ReadOnly. A drawback of the suggested solution is
        that it will not be possible to load objects with access mode ReadOnly that are
        remembert in the transaction with others that should not been remembert.

        I suggest to introduce a new access mode ReadOffTransaction to implement the
        requested feature. This would not have the disadvantages from above.

        In addition we need to think about what will happen if the requested object is
        already part of the transaction. I think it have to remain in the transaction
        and the load() or execute() with access mode ReadOffTransaction behaves as with
        access mode ReadOnly for this object only.

        I don't think we can use soft-, weak- or phantom-references in
        TransactionContext because we have to take care to commit or rollback all
        changes done during the transaction and also all locks have to be released at
        the end of the transaction.

        Show
        Ralf Joachim added a comment - The requested feature only relates to access mode ReadOnly and can not be used with other access modes. As suggested by Martin someone that wants to use the new feature would have to call setRememberReadOnlyObjects() and call load() or execute() with access mode ReadOnly. A drawback of the suggested solution is that it will not be possible to load objects with access mode ReadOnly that are remembert in the transaction with others that should not been remembert. I suggest to introduce a new access mode ReadOffTransaction to implement the requested feature. This would not have the disadvantages from above. In addition we need to think about what will happen if the requested object is already part of the transaction. I think it have to remain in the transaction and the load() or execute() with access mode ReadOffTransaction behaves as with access mode ReadOnly for this object only. I don't think we can use soft-, weak- or phantom-references in TransactionContext because we have to take care to commit or rollback all changes done during the transaction and also all locks have to be released at the end of the transaction.
        Hide
        Ralf Joachim added a comment -

        The javadoc is not to bad but I can also search for some articles in magazines
        if you like.

        Show
        Ralf Joachim added a comment - The javadoc is not to bad but I can also search for some articles in magazines if you like.
        Hide
        Werner Guttmann added a comment -

        Don't worry, Ralf. Unless a particular article springs to your mind immediately,
        I'll be happy to browse myself.

        Show
        Werner Guttmann added a comment - Don't worry, Ralf. Unless a particular article springs to your mind immediately, I'll be happy to browse myself.
        Hide
        Werner Guttmann added a comment -

        > In addition we need to think about what will happen if the requested
        > object is already part of the transaction. I think it have to remain
        > in the transaction and the load() or execute() with access mode
        > ReadOffTransaction behaves as with access mode ReadOnly for this object
        > only.
        Without commenting upon the feasability of this suggested approach, this looks
        to me like we need to add even more if-else-ifs to an already quite convoluted
        code base.

        Show
        Werner Guttmann added a comment - > In addition we need to think about what will happen if the requested > object is already part of the transaction. I think it have to remain > in the transaction and the load() or execute() with access mode > ReadOffTransaction behaves as with access mode ReadOnly for this object > only. Without commenting upon the feasability of this suggested approach, this looks to me like we need to add even more if-else-ifs to an already quite convoluted code base.
        Hide
        gblock@ctoforaday.com added a comment -

        Java in a Nutshell is my bible on references behavior.

        As for long-term stuff; Basically, it's read-only. I'm only talking about modifying the read-only portion
        of the transaction, and it's under the assumption that anything in the read-only section can safely go
        away. If that's an invalid assumption, say so.

        However, it's probably safe to use references for anything in the read-only section.

        Show
        gblock@ctoforaday.com added a comment - Java in a Nutshell is my bible on references behavior. As for long-term stuff; Basically, it's read-only. I'm only talking about modifying the read-only portion of the transaction, and it's under the assumption that anything in the read-only section can safely go away. If that's an invalid assumption, say so. However, it's probably safe to use references for anything in the read-only section.
        Hide
        Werner Guttmann added a comment -

        > ... and it's under the assumption that anything in the read-only section
        > can safely go away.
        As mentioned above, a read lock for an object can be upgraded. If we 'did away'
        with that object, this could be a problmem.

        Show
        Werner Guttmann added a comment - > ... and it's under the assumption that anything in the read-only section > can safely go away. As mentioned above, a read lock for an object can be upgraded. If we 'did away' with that object, this could be a problmem.
        Hide
        Martin Fuchs added a comment -

        > I suggest to introduce a new access mode ReadOffTransaction to implement the
        > requested feature. This would not have the disadvantages from above.

        Would be fine with me. But could you please explain what the
        name "ReadOffTransaction" stands for? I could think about a new constant with a
        name like "PlainRead" or "BatchRead" in the Database interface.

        > In addition we need to think about what will happen if the requested object is
        > already part of the transaction. I think it have to remain in the transaction
        > and the load() or execute() with access mode ReadOffTransaction behaves as
        with
        > access mode ReadOnly for this object only.

        Perhaps someone can explain me why the accessMode is translated into an
        AccessMode object in DatabaseImpl.load() ?
        Dropping this mapping and just using the enumeration values could perhaps make
        the code a bit more readable.

        Show
        Martin Fuchs added a comment - > I suggest to introduce a new access mode ReadOffTransaction to implement the > requested feature. This would not have the disadvantages from above. Would be fine with me. But could you please explain what the name "ReadOffTransaction" stands for? I could think about a new constant with a name like "PlainRead" or "BatchRead" in the Database interface. > In addition we need to think about what will happen if the requested object is > already part of the transaction. I think it have to remain in the transaction > and the load() or execute() with access mode ReadOffTransaction behaves as with > access mode ReadOnly for this object only. Perhaps someone can explain me why the accessMode is translated into an AccessMode object in DatabaseImpl.load() ? Dropping this mapping and just using the enumeration values could perhaps make the code a bit more readable.
        Hide
        Ralf Joachim added a comment -

        >Would be fine with me. But could you please explain what the
        >name "ReadOffTransaction" stands for? I could think about a new constant with a
        >name like "PlainRead" or "BatchRead" in the Database interface.

        I used 'ReadOffTransaction' as it is a read only without adding the objects read
        to the transaction but I also didn't like that name. Maybe your ideas are
        better. Let's hear suggestions from our contibutors speaking native english

        >Perhaps someone can explain me why the accessMode is translated into an
        >AccessMode object in DatabaseImpl.load() ?
        >Dropping this mapping and just using the enumeration values could perhaps make
        >the code a bit more readable.
        I guess that this has historical reasons and that the one that introduced
        AccessMode don't wanted to change Database interface. If we want to do that now
        we first need to introduce new db.load(... AccessMode) methods and declare the
        old ones as deprecated until we remove them some versions later. We also can
        make code a little easier to read by delegating the short->AccessMode
        translation to the AccessMode class.

        Show
        Ralf Joachim added a comment - >Would be fine with me. But could you please explain what the >name "ReadOffTransaction" stands for? I could think about a new constant with a >name like "PlainRead" or "BatchRead" in the Database interface. I used 'ReadOffTransaction' as it is a read only without adding the objects read to the transaction but I also didn't like that name. Maybe your ideas are better. Let's hear suggestions from our contibutors speaking native english >Perhaps someone can explain me why the accessMode is translated into an >AccessMode object in DatabaseImpl.load() ? >Dropping this mapping and just using the enumeration values could perhaps make >the code a bit more readable. I guess that this has historical reasons and that the one that introduced AccessMode don't wanted to change Database interface. If we want to do that now we first need to introduce new db.load(... AccessMode) methods and declare the old ones as deprecated until we remove them some versions later. We also can make code a little easier to read by delegating the short->AccessMode translation to the AccessMode class.
        Hide
        Martin Fuchs added a comment -

        Well, I though of keeping the Database interface as it's now (just adding a new
        constant for the read-only-dont-remember mode) and dropping the AccessMode
        class.

        Show
        Martin Fuchs added a comment - Well, I though of keeping the Database interface as it's now (just adding a new constant for the read-only-dont-remember mode) and dropping the AccessMode class.
        Hide
        Ralf Joachim added a comment -

        If you search where the AccessMode class is used you may recognize that it will
        be rather complicated to remove it (211 references). In addition I prefer
        enumerations like AccessMode over unrelated short primitives.

        I'll prepare a patch about my ideas to modify AccessMode and clean up Database
        and DatabaseImpl.

        Show
        Ralf Joachim added a comment - If you search where the AccessMode class is used you may recognize that it will be rather complicated to remove it (211 references). In addition I prefer enumerations like AccessMode over unrelated short primitives. I'll prepare a patch about my ideas to modify AccessMode and clean up Database and DatabaseImpl.
        Hide
        Ralf Joachim added a comment -

        Created an attachment (id=989)
        Patch to make usage of AccessMode public

        Show
        Ralf Joachim added a comment - Created an attachment (id=989) Patch to make usage of AccessMode public
        Hide
        Ralf Joachim added a comment -

        Moved refactoring of AccessMode to bug 1933

        Show
        Ralf Joachim added a comment - Moved refactoring of AccessMode to bug 1933

          People

          • Assignee:
            Unassigned
            Reporter:
            Martin Fuchs
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated: