BTM-28

Race condition when mixing global and local transactions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.2, 1.3
    • Fix Version/s: 1.3.1
    • Labels:
      None
    • Environment:
      Java 6 with Postgres 8.3, JDBC driver postgresql-8.3-603.jdbc4.jar
    • Testcase included:
      yes
    • Patch Submitted:
      Yes
    • Number of attachments:
      1

      Description

      Load testing one of our applications, we noticed that after some time, there were no more connections available in the pool, effectively disabling the application.

      Thanks to BTM's excellent logging (thank you!), I was able to trace the problem to a race condition when mixing local and global transactions.

      What seems to happen is basically the following (take this with a grain of salt, I haven't studied the BTM source much):

      1. Thread T gets a connection in a transaction
      2. Thread U doesn't use a transaction and waits for a connection.
      3. Thread T closes the connection.
      4. BTM registers 2 synchronization handlers: DeferredReleaseSynchronization and BTM$ClearContextSynchronization
      5. Thread T finishes the transaction
      6. The DeferredReleaseSynchronization fires, setting the State to IN_POOL
      7. Before ClearContextSynchronization can fire, a context switch occurs
      8. Thread U gets the connection, performs some work with it (successfully), and then closes the Connection
      9. When trying to requeue, TransactionContextHelper.requeue checks whether the connection is part of a global transaction, which is not the case (as the transaction was committed at step 5). However, the connection is still marked as associated with an unfinished transaction, since ClearContextSynchronization (CCS) hasn't fired yet.
      10. TCH barfs and throws an exception, leaving the connection unavailable for good.
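
The interleaving in steps 5 to 10 can be replayed deterministically in a small model. This is only a sketch under stated assumptions: `PooledConnectionModel`, `RaceModel`, and the two boolean flags are hypothetical stand-ins invented for illustration, not BTM classes, and the real state handling in BTM is more involved. The model runs the two completion callbacks in a chosen order and gives "thread U" a turn between them, which is where the context switch in step 7 would occur.

```java
// Simplified, hypothetical model of steps 5 to 10. None of these names are BTM classes.
class PooledConnectionModel {
    enum State { IN_POOL, ACCESSIBLE }

    State state = State.ACCESSIBLE;
    // Assumption: stands in for "the connection is enlisted in a global transaction".
    boolean enlistedInGlobalTx = false;
    // Assumption: stands in for "the context still refers to an unfinished transaction".
    boolean contextStillHasTx = true;

    // Models the check described in step 9.
    void requeue() {
        if (!enlistedInGlobalTx && contextStillHasTx) {
            throw new IllegalStateException("context still refers to an unfinished transaction");
        }
        state = State.IN_POOL;
    }
}

class RaceModel {
    // Thread U's turn: take the connection if it is in the pool, use it, try to requeue it.
    static void threadUTurn(PooledConnectionModel c) {
        if (c.state != PooledConnectionModel.State.IN_POOL) {
            return; // nothing available yet, U keeps waiting
        }
        c.state = PooledConnectionModel.State.ACCESSIBLE; // step 8
        try {
            c.requeue(); // step 9
        } catch (IllegalStateException e) {
            // step 10: the connection is never returned to the pool
        }
    }

    // Returns true if the connection ends up back in the pool.
    static boolean connectionReturnsToPool(boolean clearContextFirst) {
        PooledConnectionModel c = new PooledConnectionModel();
        Runnable release = () -> c.state = PooledConnectionModel.State.IN_POOL;
        Runnable clearContext = () -> c.contextStillHasTx = false;

        (clearContextFirst ? clearContext : release).run();
        threadUTurn(c); // the context switch of step 7 happens here
        (clearContextFirst ? release : clearContext).run();
        threadUTurn(c); // U gets another turn once both callbacks have run
        return c.state == PooledConnectionModel.State.IN_POOL;
    }

    public static void main(String[] args) {
        System.out.println("release first:       " + connectionReturnsToPool(false));
        System.out.println("clear context first: " + connectionReturnsToPool(true));
    }
}
```

In this model, running the release callback first leaves the connection stuck outside the pool, while clearing the context first lets thread U requeue it normally.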

      I've included a testcase that reproduces the problem most of the time (not every time, since it is a race condition).
      To force a context switch at the critical section, I've used an AspectJ aspect; alternatively, inserting a Thread.sleep at the beginning of BTM$CCS.afterCompletion() also works.

      The log that is produced is also included, hopefully giving someone familiar with the code a better idea what's going wrong.

      The problem exists with 1.3 and 1.2, other versions weren't tested. Although I left the priority at Major, for us it's critical; we can't use BTM until this is fixed.

        Activity

        Ludovic Orban added a comment -

        I haven't yet tried to reproduce your problem but your description is quite clear and seems accurate.

        I think the issue can be fixed by making sure the ClearContextSynchronization executes before the DeferredReleaseSynchronization.

        Please try patching the BTM code yourself. This can be done by changing 2 lines of code:

        in BitronixTransactionManager:86 you have:
        currentTx.getSynchronizationScheduler().add(new ClearContextSynchronization(currentTx), Scheduler.ALWAYS_LAST_POSITION);

        change it to:
        currentTx.getSynchronizationScheduler().add(new ClearContextSynchronization(currentTx), Scheduler.ALWAYS_LAST_POSITION - 1);

        and in TransactionContextHelper:128 you have:
        currentTransaction.getSynchronizationScheduler().add(synchronization, Scheduler.DEFAULT_POSITION);

        change it to:
        currentTransaction.getSynchronizationScheduler().add(synchronization, Scheduler.ALWAYS_LAST_POSITION);

        then rebuild BTM, give it a try and let me know if that helped.
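
Why changing the two positions reorders the callbacks can be illustrated with a small position-ordered scheduler. This is a sketch, not BTM's `Scheduler`: `PositionedScheduler` and `OrderingDemo` are hypothetical, the constant values (`0` and `Integer.MAX_VALUE`) are assumptions chosen for the example, and the registration order used here is arbitrary. The only behaviour assumed is that callbacks fire in ascending position order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-in for a position-ordered synchronization scheduler.
class PositionedScheduler {
    // Assumed values, chosen for this sketch only.
    static final int DEFAULT_POSITION = 0;
    static final int ALWAYS_LAST_POSITION = Integer.MAX_VALUE;

    // TreeMap iterates its keys in ascending order, so lower positions fire first.
    private final TreeMap<Integer, List<Runnable>> byPosition = new TreeMap<>();

    void add(Runnable callback, int position) {
        byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(callback);
    }

    void fireAll() {
        for (List<Runnable> callbacks : byPosition.values()) {
            for (Runnable callback : callbacks) {
                callback.run();
            }
        }
    }
}

class OrderingDemo {
    // Registers both callbacks at the given positions and returns the order in which they fired.
    static List<String> firingOrder(int releasePosition, int clearContextPosition) {
        List<String> log = new ArrayList<>();
        PositionedScheduler scheduler = new PositionedScheduler();
        scheduler.add(() -> log.add("release"), releasePosition);
        scheduler.add(() -> log.add("clearContext"), clearContextPosition);
        scheduler.fireAll();
        return log;
    }

    public static void main(String[] args) {
        // Positions before the proposed change.
        System.out.println(firingOrder(PositionedScheduler.DEFAULT_POSITION,
                PositionedScheduler.ALWAYS_LAST_POSITION));
        // Positions after the proposed change.
        System.out.println(firingOrder(PositionedScheduler.ALWAYS_LAST_POSITION,
                PositionedScheduler.ALWAYS_LAST_POSITION - 1));
    }
}
```

With the original positions the release callback fires before the context is cleared; with the proposed positions the context is cleared first and the release runs last, so the connection only becomes available once nothing still ties it to the finished transaction.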

        Dennis Brakhane added a comment -

        Thanks for the quick response. Your proposed fix seems to work.

        Ludovic Orban added a comment -

        Fixed in head.


          People

          • Assignee:
            Ludovic Orban
            Reporter:
            Dennis Brakhane
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: