Affects Version/s: 1.2, 1.3
Fix Version/s: 1.3.1
Environment: Java 6 with Postgres 8.3, JDBC driver postgresql-8.3-603.jdbc4.jar
Number of attachments :
Load testing one of our applications, we noticed that after some time, there were no more connections available in the pool, effectively disabling the application.
Thanks to BTM's excellent logging (thank you!), I was able to trace the problem to a race condition when mixing local and global transactions.
What seems to happen is basically this (take it with a grain of salt, I haven't studied the BTM source much):
1. Thread T gets a connection in a transaction
2. Thread U doesn't use a transaction and waits for a connection.
3. Thread T closes the connection.
4. BTM registers 2 synchronization handlers: DeferredReleaseSynchronization and BTM$ClearContextSynchronization
5. Thread T finishes the transaction
6. The DeferredReleaseSynchronization fires, setting the State to IN_POOL
7. Before ClearContextSynchronization can fire, a context switch occurs
8. Thread U gets the connection, performs some work with it (successfully), and then closes the Connection
9. When trying to requeue, TransactionContextHelper.requeue checks whether the connection is part of a global transaction, which is not the case (the transaction was committed at step 5). However, the connection is still marked as associated with an unfinished transaction, since ClearContextSynchronization (CCS) hasn't fired yet.
10. TransactionContextHelper (TCH) barfs and throws an exception, leaving the connection unavailable for good.
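The interleaving in steps 5 to 10 can be sketched as a small self-contained simulation. This is not the attached testcase and it does not use any BTM classes: `PooledConnection`, `hasTxContext`, `requeue`, and the `IN_USE` state are hypothetical stand-ins, and the two latches only exist to force the ordering that occurs by chance in the real race.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

public class RequeueRaceSketch {
    enum State { IN_POOL, IN_USE }

    /** Hypothetical stand-in for a pooled connection; not BTM's actual class. */
    static class PooledConnection {
        final AtomicReference<State> state = new AtomicReference<>(State.IN_USE);
        // True while the connection is still marked as tied to a transaction.
        final AtomicBoolean hasTxContext = new AtomicBoolean(true);

        // Assumed check, modeled on step 9: a caller outside any global
        // transaction must not return a connection that still has a context.
        void requeue(boolean callerInGlobalTx) {
            if (!callerInGlobalTx && hasTxContext.get()) {
                throw new IllegalStateException(
                        "connection still associated with an unfinished transaction");
            }
            state.set(State.IN_POOL);
        }
    }

    /** Returns true if the connection ends up stuck outside the pool. */
    public static boolean reproduce() {
        PooledConnection conn = new PooledConnection();
        CountDownLatch released = new CountDownLatch(1);
        CountDownLatch uDone = new CountDownLatch(1);
        AtomicBoolean requeueFailed = new AtomicBoolean(false);

        Thread t = new Thread(() -> {
            // Steps 5-6: the transaction completes and the first callback
            // puts the connection back into the pool.
            conn.state.set(State.IN_POOL);
            released.countDown();
            // Step 7: simulated context switch before the second callback.
            try {
                uDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // The context is cleared only now, which is too late for thread U.
            conn.hasTxContext.set(false);
        });

        Thread u = new Thread(() -> {
            try {
                released.await();
                // Step 8: thread U takes the connection without a transaction.
                if (conn.state.compareAndSet(State.IN_POOL, State.IN_USE)) {
                    try {
                        conn.requeue(false); // steps 9-10
                    } catch (IllegalStateException e) {
                        requeueFailed.set(true);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                uDone.countDown();
            }
        });

        t.start();
        u.start();
        try {
            t.join();
            u.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return requeueFailed.get() && conn.state.get() == State.IN_USE;
    }

    public static void main(String[] args) {
        System.out.println("connection leaked: " + reproduce());
    }
}
```

In this sketch the failed requeue leaves the connection in the `IN_USE` state forever, which mirrors the pool exhaustion described above. Whether the real fix should clear the context before releasing the connection, or make the two callbacks atomic, is a question for someone who knows the BTM code.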
I've included a testcase that reproduces the problem most of the time (not every time, since it is a race condition).
To force a context switch in the critical section, I used an AspectJ aspect; alternatively, inserting a Thread.sleep at the beginning of BTM$ClearContextSynchronization.afterCompletion() also works.
The log that is produced is also included, hopefully giving someone familiar with the code a better idea what's going wrong.
The problem exists with 1.3 and 1.2; other versions weren't tested. Although I left the priority at Major, for us it's critical: we can't use BTM until this is fixed.