castor
  1. castor
  2. CASTOR-1247

After LockNotGrantedException an attempt to re-retrieve the object results in ObjectNotFoundException

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.9.9
    • Fix Version/s: 0.9.9.1
    • Component/s: JDO
    • Labels:
      None
    • Environment:
      MySQL 4.0.25, Linux
    • Testcase included:
      yes
    • Number of attachments :
      5

      Description

      I am trying to retrieve the record CounterVar (id=4, type=1) - see mapping file description below - in one transaction. In the same program, with a different transaction, I am trying to retrieve the record CounterVar (id=1, type=1) in the DbLocked mode. Note that in both cases the type field references the same record. This results in my second transaction getting LockNotGrantedException (writeLockTimeout). When I rollback both transactions and retry the retrieve (in the context of the same process), I observed that the call to QueryResults.size() returns 1, but the call to QueryResults.hasMore() returns false. On further investigation, I found that the call to hasMore() sees the following exception stack trace:

      org.exolab.castor.jdo.ObjectNotFoundException: The object of type CounterVar with identity 1 was not found in persistent storage
      at org.exolab.castor.persist.LockEngine.load(LockEngine.java:385)
      at org.exolab.castor.persist.TransactionContext.load(TransactionContext.java:651)
      at org.exolab.castor.persist.QueryResults.fetch(QueryResults.java:229)
      at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:633)
      at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:614)
      at TestWriteLockTimeout.main(TestWriteLockTimeout.java:84)

      In looking at the code, I see that LockEngine.load() is actually converting the exception ObjectDeletedWaitingForLockException into ObjectNotFoundException.

      I am guessing that this is happening because of the previous (unsuccessful) retrieve.

      1. bug1247.tar
        20 kB
        Vishal Madan
      2. output.txt
        2 kB
        Swagata De Khan
      3. patch.C1247.20051112.bug-report.txt
        19 kB
        Werner Guttmann
      4. patch.C1247.20051112.txt
        3 kB
        Werner Guttmann
      5. TestWriteLockTimeout.java
        5 kB
        Swagata De Khan

        Activity

        Hide
        Vishal Madan added a comment -

        I will add the test case in some time.

        Show
        Vishal Madan added a comment - I will add the test case in some time.
        Hide
        Vishal Madan added a comment -

        Test case attached.

        NOTE: On Windows when I execute this program, the second transaction hangs trying to retrieve the record (id=1). It comes back from the DB retrieve and then indefinately waits to acquire the write lock (which should actually time-out after 10s). I dont know why this happens. Is this some strange Windows quirk??

        Show
        Vishal Madan added a comment - Test case attached. NOTE: On Windows when I execute this program, the second transaction hangs trying to retrieve the record (id=1). It comes back from the DB retrieve and then indefinately waits to acquire the write lock (which should actually time-out after 10s). I dont know why this happens. Is this some strange Windows quirk??
        Hide
        Werner Guttmann added a comment -

        How do you set the timeout ? At the RDBMS level, or at Castor ?

        Show
        Werner Guttmann added a comment - How do you set the timeout ? At the RDBMS level, or at Castor ?
        Hide
        Vishal Madan added a comment -

        We have set the DB timeout for write-locks to 300s. We don't set any timeout values in Castor, but I believe it defaults to 10s. Doesn't it? I have seen the 10s lock wait timeout in Castor in several of my tests.

        Show
        Vishal Madan added a comment - We have set the DB timeout for write-locks to 300s. We don't set any timeout values in Castor, but I believe it defaults to 10s. Doesn't it? I have seen the 10s lock wait timeout in Castor in several of my tests.
        Hide
        Werner Guttmann added a comment -

        Yes, I think the default value (Castor) is 10s.

        Show
        Werner Guttmann added a comment - Yes, I think the default value (Castor) is 10s.
        Hide
        Werner Guttmann added a comment -

        Vishal, just out of curiosity, is this your problem a result of switching to 0.9.9 ? Iow, can you produce the same output with an older version of castor, e.g. 0.9.7 ?

        Show
        Werner Guttmann added a comment - Vishal, just out of curiosity, is this your problem a result of switching to 0.9.9 ? Iow, can you produce the same output with an older version of castor, e.g. 0.9.7 ?
        Hide
        Vishal Madan added a comment -

        Werner, I came across the issue in 0.9.5.3. I then tried it out with 0.9.9 and the same problem exists there too. I have not tried any version in between.

        Show
        Vishal Madan added a comment - Werner, I came across the issue in 0.9.5.3. I then tried it out with 0.9.9 and the same problem exists there too. I have not tried any version in between.
        Hide
        Werner Guttmann added a comment -

        Vishal, just executing the test case on Windows Xp, and I am getting the LockNotGrantedExceptionn after 10s or so.

        Show
        Werner Guttmann added a comment - Vishal, just executing the test case on Windows Xp, and I am getting the LockNotGrantedExceptionn after 10s or so.
        Hide
        Werner Guttmann added a comment -

        Vishal, it it on intention that you start a second transaction nested within the first one. Afair, Castor does not support this, and even if it does, I'd expect the LockNotGrantedException to be thrown as a result of loading the CounterType with id 1 as there's still a write-lock associated with this record in the database (as the first transaction has not been committed/rolled back yet).

        Show
        Werner Guttmann added a comment - Vishal, it it on intention that you start a second transaction nested within the first one. Afair, Castor does not support this, and even if it does, I'd expect the LockNotGrantedException to be thrown as a result of loading the CounterType with id 1 as there's still a write-lock associated with this record in the database (as the first transaction has not been committed/rolled back yet).
        Hide
        Vishal Madan added a comment -

        Werner, Yes I have mentioned the quirky Windows behavior in one of the comments that I added to this issue. Seems like in Windows one does not get a write lock timeout exception at all. That is another thing that someone may want to look into (since we do expect to get the exception after the timeout has been reached).
        And yes again, it is my intention to have a nested transaction retrieve the dependent record. And I completely do expect to see the Writelock timeout. My problem is with the ObjectNotFoundException. When the retry is done, I should be able to repeatedly get the WriteLock exception, which is not happening.

        Show
        Vishal Madan added a comment - Werner, Yes I have mentioned the quirky Windows behavior in one of the comments that I added to this issue. Seems like in Windows one does not get a write lock timeout exception at all. That is another thing that someone may want to look into (since we do expect to get the exception after the timeout has been reached). And yes again, it is my intention to have a nested transaction retrieve the dependent record. And I completely do expect to see the Writelock timeout. My problem is with the ObjectNotFoundException. When the retry is done, I should be able to repeatedly get the WriteLock exception, which is not happening.
        Hide
        Vishal Madan added a comment -

        Oops... sorry about the Windows update. I thought you wrote that you could not see the error. Which version of JDK/JRE are you using?

        Show
        Vishal Madan added a comment - Oops... sorry about the Windows update. I thought you wrote that you could not see the error. Which version of JDK/JRE are you using?
        Hide
        Werner Guttmann added a comment -

        Vishal, just to get a couple of things straight:

        a) I do get a LockNotGrantedException upon loading the same object.
        b) Apart from this exception, the test case completes successfully.
        c) Iow, I do not get any other exceptions.

        Now, I thought that test case were meant to show something else which you consider wrong. I assume it's me who does get something wrong, rigth ?

        Show
        Werner Guttmann added a comment - Vishal, just to get a couple of things straight: a) I do get a LockNotGrantedException upon loading the same object. b) Apart from this exception, the test case completes successfully. c) Iow, I do not get any other exceptions. Now, I thought that test case were meant to show something else which you consider wrong. I assume it's me who does get something wrong, rigth ?
        Hide
        Swagata De Khan added a comment -

        Werner,

        In the 1st attempt we are getting the LockNotGrantedException from the 2nd transaction. So we are doing a rollback and retry.
        In the 2nd attemp, we are not getting this exception. Instead, the hasmore() method call on the queryResults object obtained from the 2nd transaction is returning false and the program ends there immediately after this 2nd attempt.
        We are expecting to get the same LockNotGrantedException from the 2nd transaction in all the 4 retries.

        Show
        Swagata De Khan added a comment - Werner, In the 1st attempt we are getting the LockNotGrantedException from the 2nd transaction. So we are doing a rollback and retry. In the 2nd attemp, we are not getting this exception. Instead, the hasmore() method call on the queryResults object obtained from the 2nd transaction is returning false and the program ends there immediately after this 2nd attempt. We are expecting to get the same LockNotGrantedException from the 2nd transaction in all the 4 retries.
        Hide
        Werner Guttmann added a comment -

        Thanks for explaining again ... and sorry for being so slow to pick this up. I thought that the JUnit test case would highlight this fact to me by e.g. using fail() to signal that a particular expectation has not been met. I will have a closer look later on ....

        Show
        Werner Guttmann added a comment - Thanks for explaining again ... and sorry for being so slow to pick this up. I thought that the JUnit test case would highlight this fact to me by e.g. using fail() to signal that a particular expectation has not been met. I will have a closer look later on ....
        Hide
        Swagata De Khan added a comment -

        I have modified vishal's testcode a little to format the output. I am attaching the testcode(TestWriteLockTimeout.java) as well as the output.txt so that you can understand the situation better.

        Show
        Swagata De Khan added a comment - I have modified vishal's testcode a little to format the output. I am attaching the testcode(TestWriteLockTimeout.java) as well as the output.txt so that you can understand the situation better.
        Hide
        Werner Guttmann added a comment -

        Trying to get this in for 0.9.9.1 ...

        Show
        Werner Guttmann added a comment - Trying to get this in for 0.9.9.1 ...
        Hide
        Vishal Madan added a comment -

        Werner, Any idea what is going wrong here? I've been trying to go through the code, but without any success (I kind of would expect that since the entire code is new to me - so it's taking me some time to understand how it works).
        By the way, you didnt tell me what JDK version you are using on Windows. I'm using 1.4.1 and I see the problem with the retrieve call waiting indefinately and not retruning the LockWaitTimeout exception.

        Show
        Vishal Madan added a comment - Werner, Any idea what is going wrong here? I've been trying to go through the code, but without any success (I kind of would expect that since the entire code is new to me - so it's taking me some time to understand how it works). By the way, you didnt tell me what JDK version you are using on Windows. I'm using 1.4.1 and I see the problem with the retrieve call waiting indefinately and not retruning the LockWaitTimeout exception.
        Hide
        Werner Guttmann added a comment -

        Vishal, I am using JDK 1.4.2_08.

        Show
        Werner Guttmann added a comment - Vishal, I am using JDK 1.4.2_08.
        Hide
        Werner Guttmann added a comment -

        Btw, I still have not found any time to step through your problem. By the way it looks I will be able to do so tonight.

        Show
        Werner Guttmann added a comment - Btw, I still have not found any time to step through your problem. By the way it looks I will be able to do so tonight.
        Hide
        Vishal Madan added a comment -

        Thanks. A solution will really help. I have my mgmt breathing down my neck for this one.

        Show
        Vishal Madan added a comment - Thanks. A solution will really help. I have my mgmt breathing down my neck for this one.
        Hide
        Werner Guttmann added a comment -

        I am not 100% sure, but I think that the problem is with releasing the locks properly after the LockNotGrantedxception has been thrown. Because of this, when trying to load the CounterCar instance again after having rolled back the first time, the load will fail as the ObjectLock instance stored with the transaction has (still) the deleted flag set to true .. which apparently is wrong.

        Show
        Werner Guttmann added a comment - I am not 100% sure, but I think that the problem is with releasing the locks properly after the LockNotGrantedxception has been thrown. Because of this, when trying to load the CounterCar instance again after having rolled back the first time, the load will fail as the ObjectLock instance stored with the transaction has (still) the deleted flag set to true .. which apparently is wrong.
        Hide
        Vishal Madan added a comment -

        Werner, After spending some time with the code I traced the following behavior:
        In the method LockEngine.load(...) the following things happen:

        • typeInfo.acquire(...) is called which results in the locks being taken, or it throws the LockNotGranterException
        • typeInfo.molder.load(...) is called which creates the object to be returned back to the application. This method will also recursively call LockEngine.load(...) for any dependent objects to be retrieved.

        So, in the test program the first transaction is able to get the lock on both CounterVar (id=1) and CounterType (id=1) objects. Then when the next transaction is retrieving it is able to get the lock for the CounterVar(id=4) object i.e. the call to typeInfo.acquire(...) succeeds and then while in the call to typeInfo.molder.load(...) there is a call back into LockEngine.load(...) for loading the CounterType (id=1) object. Here the call to typeInfo.acquire(...) throws LockNotGrantedException. Since no lock is taken, in the finally block, the ObjectLock.confirm(...) method is not called. When the exception propagates back again into LockEngine.load(...) for the CounterVar(id=4) object as part of unwinding, a valid lock is present as a result of which ObjectLock.confirm(...) gets called in the finally block. This sets the _deleted flag in ObjectLock.

        Now, I am assuming that this flag should normally be cleared up during the transaction rollback/commit, but since the retrieve was really never successful, the transaction has no reference to these objects. As a result the cleanup really never happens.

        I tried out some changes for this:

        In LockEngine.load(...), just before the finally block and after the catch block for ObjectDeletedWaitingForLockException I added a catch block for LockNotGrantedException as follows:

        } catch ( ObjectDeletedWaitingForLockException except )

        { // This is equivalent to object does not exist throw new ObjectNotFoundException( Messages.format("persist.objectNotFound", oid.getName(), oid.getIdentity())); }

        catch ( LockNotGrantedException except )

        { if ( lock != null ) lock.confirm( tx, succeed ); if ( lock != null ) lock.release( tx ); lock = null; throw except; }

        finally

        { if ( lock != null ) lock.confirm( tx, succeed ); }

        This does seem to work with my test program. BUT, since i am not very familiar with the Castor code, I am not sure what the possible side effects of such a change would be.

        Comments???

        Show
        Vishal Madan added a comment - Werner, After spending some time with the code I traced the following behavior: In the method LockEngine.load(...) the following things happen: typeInfo.acquire(...) is called which results in the locks being taken, or it throws the LockNotGranterException typeInfo.molder.load(...) is called which creates the object to be returned back to the application. This method will also recursively call LockEngine.load(...) for any dependent objects to be retrieved. So, in the test program the first transaction is able to get the lock on both CounterVar (id=1) and CounterType (id=1) objects. Then when the next transaction is retrieving it is able to get the lock for the CounterVar(id=4) object i.e. the call to typeInfo.acquire(...) succeeds and then while in the call to typeInfo.molder.load(...) there is a call back into LockEngine.load(...) for loading the CounterType (id=1) object. Here the call to typeInfo.acquire(...) throws LockNotGrantedException. Since no lock is taken, in the finally block, the ObjectLock.confirm(...) method is not called. When the exception propagates back again into LockEngine.load(...) for the CounterVar(id=4) object as part of unwinding, a valid lock is present as a result of which ObjectLock.confirm(...) gets called in the finally block. This sets the _deleted flag in ObjectLock. Now, I am assuming that this flag should normally be cleared up during the transaction rollback/commit, but since the retrieve was really never successful, the transaction has no reference to these objects. As a result the cleanup really never happens. I tried out some changes for this: In LockEngine.load(...), just before the finally block and after the catch block for ObjectDeletedWaitingForLockException I added a catch block for LockNotGrantedException as follows: } catch ( ObjectDeletedWaitingForLockException except ) { // This is equivalent to object does not exist throw new ObjectNotFoundException( Messages.format("persist.objectNotFound", oid.getName(), oid.getIdentity())); } catch ( LockNotGrantedException except ) { if ( lock != null ) lock.confirm( tx, succeed ); if ( lock != null ) lock.release( tx ); lock = null; throw except; } finally { if ( lock != null ) lock.confirm( tx, succeed ); } This does seem to work with my test program. BUT, since i am not very familiar with the Castor code, I am not sure what the possible side effects of such a change would be. Comments???
        Hide
        Vishal Madan added a comment -

        Werner, are you busy with something else? Please let me know if you can spend some time on this issue.

        Show
        Vishal Madan added a comment - Werner, are you busy with something else? Please let me know if you can spend some time on this issue.
        Hide
        Werner Guttmann added a comment -

        Vishal, I had a brief look at your suggested patch, and I think it looks okay. I still need to get the CTF to complete a successful run, but as I am going on a three day workshop right now, this won't be possible before Monday morning, I am afraid.

        Show
        Werner Guttmann added a comment - Vishal, I had a brief look at your suggested patch, and I think it looks okay. I still need to get the CTF to complete a successful run, but as I am going on a three day workshop right now, this won't be possible before Monday morning, I am afraid.
        Hide
        Vishal Madan added a comment -

        Werner, Thanks. I was just wondering about the long silence. I guess you are busy.
        What is a CTF???

        Show
        Vishal Madan added a comment - Werner, Thanks. I was just wondering about the long silence. I guess you are busy. What is a CTF???
        Hide
        Vishal Madan added a comment -

        While testing different combinations of lock I came across a problem with the fix that I have proposed. In the attached test case if the access mode for the second retrieve is changed from DbLocked to ReadOnly I see the following exception:

        java.lang.IllegalStateException: Internal error: Attempted to release/delete lock without first acquiring the lockCounterVar/4/2 by org.exolab.castor.jdo.engine.TransactionContextImpl@3bc257
        at org.exolab.castor.persist.ObjectLock.release(ObjectLock.java:854)
        at org.exolab.castor.persist.LockEngine.load(LockEngine.java:389)
        at org.exolab.castor.persist.TransactionContext.load(TransactionContext.java:651)
        at org.exolab.castor.persist.QueryResults.fetch(QueryResults.java:229)
        at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:631)
        at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:614)
        at TestReadLockTimeout.main(TestReadLockTimeout.java:74)

        So, I made some changes to the fix I proposed earlier:

        } catch ( ObjectDeletedWaitingForLockException except )

        { // This is equivalent to object does not exist throw new ObjectNotFoundException( Messages.format("persist.objectNotFound", oid.getName(), oid.getIdentity())); }

        catch ( LockNotGrantedException except )

        { if ( lock != null ) lock.release( tx ); throw except; }

        finally

        { if ( lock != null ) lock.confirm( tx, succeed ); }

        So, now in the newly added catch block for LockNotGrantedException I just call release(...). Then confirm(...) will get called in the finally block.

        For some reason the change in order of the call seems to solve the problem. I am not quiet sure I understand why!! I'm digging deeper into what is going on over here. Comments??

        Show
        Vishal Madan added a comment - While testing different combinations of lock I came across a problem with the fix that I have proposed. In the attached test case if the access mode for the second retrieve is changed from DbLocked to ReadOnly I see the following exception: java.lang.IllegalStateException: Internal error: Attempted to release/delete lock without first acquiring the lockCounterVar/4/2 by org.exolab.castor.jdo.engine.TransactionContextImpl@3bc257 at org.exolab.castor.persist.ObjectLock.release(ObjectLock.java:854) at org.exolab.castor.persist.LockEngine.load(LockEngine.java:389) at org.exolab.castor.persist.TransactionContext.load(TransactionContext.java:651) at org.exolab.castor.persist.QueryResults.fetch(QueryResults.java:229) at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:631) at org.exolab.castor.jdo.engine.OQLQueryImpl$OQLEnumeration.hasMore(OQLQueryImpl.java:614) at TestReadLockTimeout.main(TestReadLockTimeout.java:74) So, I made some changes to the fix I proposed earlier: } catch ( ObjectDeletedWaitingForLockException except ) { // This is equivalent to object does not exist throw new ObjectNotFoundException( Messages.format("persist.objectNotFound", oid.getName(), oid.getIdentity())); } catch ( LockNotGrantedException except ) { if ( lock != null ) lock.release( tx ); throw except; } finally { if ( lock != null ) lock.confirm( tx, succeed ); } So, now in the newly added catch block for LockNotGrantedException I just call release(...). Then confirm(...) will get called in the finally block. For some reason the change in order of the call seems to solve the problem. I am not quiet sure I understand why!! I'm digging deeper into what is going on over here. Comments??
        Hide
        Werner Guttmann added a comment -

        Vishal, just trying to run your patch against the CTF now. CTF, btw, stands for the Castor Test Framework, and is a set of functional (JUnit) tests to verify that Castor is (still) working ,,, to run the complete test suite against e.g mysql, first execute > build jar.jdotests and second >test castor.mysql.

        Show
        Werner Guttmann added a comment - Vishal, just trying to run your patch against the CTF now. CTF, btw, stands for the Castor Test Framework, and is a set of functional (JUnit) tests to verify that Castor is (still) working ,,, to run the complete test suite against e.g mysql, first execute > build jar.jdotests and second >test castor.mysql.
        Hide
        Werner Guttmann added a comment -

        All 53 tests complete succesfullyy after applying this patch. I'd like to see the test case provided by you to be integrated with the CTF after commiting this patch. Still need to make up my mind as to where I'd like to see this added.

        Show
        Werner Guttmann added a comment - All 53 tests complete succesfullyy after applying this patch. I'd like to see the test case provided by you to be integrated with the CTF after commiting this patch. Still need to make up my mind as to where I'd like to see this added.
        Hide
        Werner Guttmann added a comment -

        Final patch, incl. commit preparations. As already reported, all 53 CTF tests pass successfuly.

        Show
        Werner Guttmann added a comment - Final patch, incl. commit preparations. As already reported, all 53 CTF tests pass successfuly.
        Hide
        Werner Guttmann added a comment -

        The complete bug report ... for future reference.

        Show
        Werner Guttmann added a comment - The complete bug report ... for future reference.
        Hide
        Vishal Madan added a comment -

        Werner, Would you like me to write the JUnit tests for the issue? I modified the test that I had attached to use multiple threads rather than nested transactions for my own testing. I could convert that into a JUnit test and attach it.

        Show
        Vishal Madan added a comment - Werner, Would you like me to write the JUnit tests for the issue? I modified the test that I had attached to use multiple threads rather than nested transactions for my own testing. I could convert that into a JUnit test and attach it.
        Hide
        Werner Guttmann added a comment -

        Thanks for the offer, but I think I'll do this myself, as I need to make up my mind where exactly to include this within the CTF.

        Show
        Werner Guttmann added a comment - Thanks for the offer, but I think I'll do this myself, as I need to make up my mind where exactly to include this within the CTF.
        Hide
        Werner Guttmann added a comment -

        Vishal, can I come back to your offer to supply me with a JUnit test case (based upon your re-write) ?

        Show
        Werner Guttmann added a comment - Vishal, can I come back to your offer to supply me with a JUnit test case (based upon your re-write) ?
        Hide
        Vishal Madan added a comment -

        Sure. I am a little swamped right now, but I should be able to write it for you in the next couple of days.

        Show
        Vishal Madan added a comment - Sure. I am a little swamped right now, but I should be able to write it for you in the next couple of days.
        Hide
        Vishal Madan added a comment -

        For whatever reason I am unable to figure out how to attach something to this report. I know it is possible because I've done it before!! Werner.... HELP! Or I could just e-mail it to you?

        Show
        Vishal Madan added a comment - For whatever reason I am unable to figure out how to attach something to this report. I know it is possible because I've done it before!! Werner.... HELP! Or I could just e-mail it to you?
        Hide
        Ralf Joachim added a comment -

        Vishal, as this issues is closed you can not attach new files to it. If you found that the original issue has not been resolved it should be reopened which would allow you to attach new files thereafter. If you found something that relates to the original issue or you have created a test case I suggest to create a new issue and link it to this one as related.

        Show
        Ralf Joachim added a comment - Vishal, as this issues is closed you can not attach new files to it. If you found that the original issue has not been resolved it should be reopened which would allow you to attach new files thereafter. If you found something that relates to the original issue or you have created a test case I suggest to create a new issue and link it to this one as related.

          People

          • Assignee:
            Werner Guttmann
            Reporter:
            Vishal Madan
          • Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Time Tracking

              Estimated:
              Original Estimate - Not Specified
              Not Specified
              Remaining:
              Remaining Estimate - 0 minutes
              0m
              Logged:
              Time Spent - 20 minutes
              20m