BTM
  1. BTM
  2. BTM-39

Improper HeuristicMixedException

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 1.3.2
    • Fix Version/s: 1.3.3
    • Labels:
      None
    • Number of attachments :
      0

      Description

      I tested the following failure scenario (Bitronix 1.3.2):

      • Connect to two different instances of PostgreSQL (using driver 8.3-604) with XA connections
      • Insert records into a table in each database
      • Call commit on UserTransaction
      • Break-point was set in BitronixTransaction.commit() approx line 180 - between prepare and commit
      • While paused on break point (after prepare) I killed one of the databases
      • Then I resumed execution
      • The result was a HeuristicMixedException
      • When I started the Postgresql instance back up, the commit recovered (as expected), and I ended up with the record in both databases - as should be, since prepare succeeded

      But why did my application get a HeuristicMixedException ???

      Per spec, this exception is defined as:
      > thrown when a heuristic decision was made and some updates have been committed and others were rolled back

      But no rollback occurred. Both resorces voted commit on prepare, only the second commit failed due to the resource being down. And in fact the commit happened when the resource was started back up.

      BTM threw this to my application:
      ===============================================
      bitronix.tm.internal.BitronixHeuristicMixedException: transaction failed during commit of a Bitronix Transaction with GTRID [6D79544D546573740000011F4883413200000008], status=UNKNOWN, 2 resource(s) enlisted (started Thu Feb 05 15:16:15 MST 2009): resource(s) [datasource/ds2] improperly unilaterally rolled back (or hazard happened)
      at bitronix.tm.twopc.Committer.throwException(Committer.java:104)
      at bitronix.tm.twopc.Committer.commit(Committer.java:63)
      at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:183)
      at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:96)
      at com.foo.TestTM.main(TestTM.java:116)

      BTM logged this when the commit failed:
      =====================================
      SEVERE: resource datasource/ds2 failed on a Bitronix XID [6D79544D546573740000011F4883413200000008 : 6D79544D546573740000011F488341350000000B]
      bitronix.tm.internal.BitronixXAException: resource reported Unable to render embedded object: File (invalid error code (0)) not found. when asked to commit transaction branch
      at bitronix.tm.twopc.Committer$CommitJob.handleXAException(Committer.java:173)
      at bitronix.tm.twopc.Committer$CommitJob.commitResource(Committer.java:156)
      at bitronix.tm.twopc.Committer$CommitJob.run(Committer.java:142)
      at bitronix.tm.twopc.executor.SyncExecutor.submit(SyncExecutor.java:12)
      at bitronix.tm.twopc.AbstractPhaseEngine.runJobsForPosition(AbstractPhaseEngine.java:108)
      at bitronix.tm.twopc.AbstractPhaseEngine.executePhase(AbstractPhaseEngine.java:70)
      at bitronix.tm.twopc.Committer.commit(Committer.java:59)
      at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:183)
      at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:96)
      at com.foo.TestTM.main(TestTM.java:116)
      Caused by: javax.transaction.xa.XAException: org.postgresql.util.PSQLException: An I/O error occured while sending to the backend.
      at org.postgresql.xa.PGXAConnection.commitPrepared(PGXAConnection.java:444)
      at org.postgresql.xa.PGXAConnection.commit(PGXAConnection.java:371)
      at bitronix.tm.twopc.Committer$CommitJob.commitResource(Committer.java:153)
      ... 8 more

        Issue Links

          Activity

          Hide
          James House added a comment -

          scratch that last comment!

          I had fat-fingered copying the new jar, so the test used the old version.

          With the new build properly in place it all seems to work great now!

          Show
          James House added a comment - scratch that last comment! I had fat-fingered copying the new jar, so the test used the old version. With the new build properly in place it all seems to work great now!
          Hide
          James House added a comment -

          I'm now rather happy with the behavior from this failure case, though I have one remaining suggestion for improvement:

          In Committer.handleXAException() the logged report of the failure (on line 192) looks like this:

          log.error("resource '" + failedResourceHolder.getUniqueName() + "' reported " + Decoder.decodeXAExceptionErrorCode(xaException) + " when asked to commit transaction branch", xaException);

          ... and similar handling in Rollbacker at line 173

          Might I suggest the following small changes:

          • Since the tx actually did commit (or more particularly the tx is prepared and state recorded in the journal so that it will commit when the resource returns) - can this logging be at the warning level, rather than error? To me error means that something failed, but in fact, in this case, everything's okay - so warning is more appropriate.
          • On a similar note, it may be nice to add some more text to that message to let the reader know that things are okay (in terms of the state of the tx).

          With these two recommendations, I suggest changing the logging statement to be something like this:

          log.warn("resource '" + failedResourceHolder.getUniqueName() + "' reported " + Decoder.decodeXAExceptionErrorCode(xaException) + " when asked to commit transaction branch. Transaction is prepared and will commit via recovery service when resource availability allows.", xaException);

          Show
          James House added a comment - I'm now rather happy with the behavior from this failure case, though I have one remaining suggestion for improvement: In Committer.handleXAException() the logged report of the failure (on line 192) looks like this: log.error("resource '" + failedResourceHolder.getUniqueName() + "' reported " + Decoder.decodeXAExceptionErrorCode(xaException) + " when asked to commit transaction branch", xaException); ... and similar handling in Rollbacker at line 173 Might I suggest the following small changes: Since the tx actually did commit (or more particularly the tx is prepared and state recorded in the journal so that it will commit when the resource returns) - can this logging be at the warning level, rather than error? To me error means that something failed, but in fact, in this case, everything's okay - so warning is more appropriate. On a similar note, it may be nice to add some more text to that message to let the reader know that things are okay (in terms of the state of the tx). With these two recommendations, I suggest changing the logging statement to be something like this: log.warn("resource '" + failedResourceHolder.getUniqueName() + "' reported " + Decoder.decodeXAExceptionErrorCode(xaException) + " when asked to commit transaction branch. Transaction is prepared and will commit via recovery service when resource availability allows.", xaException);
          Hide
          Ludovic Orban added a comment -

          I've applied your proposed debug log change with a slight improvement: an error is still logged when background recoverer is disabled.

          Another bug that managed to sneak in has been fixed as well: even if the in-flight transactions weren't terminated by the recovery the journal was still updated.

          Here's the final snapshot build with the complete fix of this issue:
          http://snapshots.repository.codehaus.org/org/codehaus/btm/btm/1.3.3-20090218/btm-1.3.3-20090218.jar

          Show
          Ludovic Orban added a comment - I've applied your proposed debug log change with a slight improvement: an error is still logged when background recoverer is disabled. Another bug that managed to sneak in has been fixed as well: even if the in-flight transactions weren't terminated by the recovery the journal was still updated. Here's the final snapshot build with the complete fix of this issue: http://snapshots.repository.codehaus.org/org/codehaus/btm/btm/1.3.3-20090218/btm-1.3.3-20090218.jar
          Hide
          James House added a comment -

          I spent some time testing this latest build today, and everything looks fine.

          Thank you very much.

          Show
          James House added a comment - I spent some time testing this latest build today, and everything looks fine. Thank you very much.
          Hide
          Ludovic Orban added a comment -

          Recovery implementation changed again as described here: http://www.nabble.com/Re%3A-BTM.getCurrentTransaction()-a-bottleneck-at-load-p22807783.html

          Here is the relevant part:

          The recoverer must always ignore in-flight transactions no matter how. The current implementation I made for BTM-39 locks the in-flight transactions map while its collecting in-flight XIDs avoiding any new transaction from starting while it's collecting. In-flight XIDs are then filtered out of the recovered XIDs by the recoverer.

          Instead of collecting an exact list of in-flight XIDs, we could instead collect the start date of the oldest running transaction then assume that any XID newer than that is still in-flight. This can be done as the Bitronix XID implementation contains an encoded timestamp.

          Show
          Ludovic Orban added a comment - Recovery implementation changed again as described here: http://www.nabble.com/Re%3A-BTM.getCurrentTransaction()-a-bottleneck-at-load-p22807783.html Here is the relevant part: The recoverer must always ignore in-flight transactions no matter how. The current implementation I made for BTM-39 locks the in-flight transactions map while its collecting in-flight XIDs avoiding any new transaction from starting while it's collecting. In-flight XIDs are then filtered out of the recovered XIDs by the recoverer. Instead of collecting an exact list of in-flight XIDs, we could instead collect the start date of the oldest running transaction then assume that any XID newer than that is still in-flight. This can be done as the Bitronix XID implementation contains an encoded timestamp.

            People

            • Assignee:
              Ludovic Orban
              Reporter:
              James House
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: