Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 2.1.0
Fix Version/s: 2.1.1
Labels: None
Description
An application using BTM gets stuck quite quickly when run on Sun JVM 1.6.0_24. This does not happen on JVM 1.6.0_18.
Eventually, all threads get stuck in the method XAResourceManager.collectUniqueNames(). They appear to be running in an infinite loop, so I suspect SchedulerNaturalOrderIterator is the culprit. Here is a stack dump of a stuck thread:
"Thread-12" prio=10 tid=0x9f049c00 nid=0xdf6 runnable [0x9dd01000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap.put(HashMap.java:372)
at java.util.HashSet.add(HashSet.java:200)
at bitronix.tm.internal.XAResourceManager.collectUniqueNames(XAResourceManager.java:272)
at bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:323)
at bitronix.tm.twopc.Preparer.prepare(Preparer.java:64)
at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:225)
at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:120)
Some extra observations:
1) It happens only in the thread with the most complicated transaction. In my case, that means a transaction with two datasources (JMS [swiftmq], DB [oracle]). The JMS datasource is enlisted 4 times because 4 different JMS queues participate in the transaction (i.e. TMJOIN is used).
2) It can be reproduced with a single thread, so concurrency is not the problem.
3) It happens when XAResourceManager.enlist() or XAResourceManager.delist() is called.
4) It fails quite fast; 3000 transactions should do the job.
This is now fixed, and two impacted users have confirmed the fix.
Strictly speaking, this bug is apparently not a race condition, but it has the same symptoms. The root cause was a lack of synchronization in the iterator of a collection shared by different threads under very special conditions (i.e. SchedulerNaturalOrderIterator and SchedulerReverseOrderIterator need synchronized blocks on Scheduler.this).
Only some aggressive memory optimizations in the JVM could make this bug surface, which is probably what happened here because:
See: http://old.nabble.com/SchedulerNaturalOrderIterator-causes-infinite-loop-to31136251.html
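For readers unfamiliar with the pattern mentioned in the fix, here is a minimal sketch of an inner-class iterator that takes the outer object's monitor on every access. It is not BTM's actual source: SketchScheduler, its buckets field and the method bodies are hypothetical stand-ins, and only the idea of synchronizing on the enclosing instance (Scheduler.this in the comment above) comes from this ticket.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a scheduler that orders items by position.
class SketchScheduler<T> implements Iterable<T> {
    // position -> items registered at that position, kept in ascending key order
    private final SortedMap<Integer, List<T>> buckets = new TreeMap<>();

    // Writers take the same lock as the iterator below.
    public synchronized void add(T item, int position) {
        buckets.computeIfAbsent(position, k -> new ArrayList<>()).add(item);
    }

    @Override
    public Iterator<T> iterator() {
        return new NaturalOrderIterator();
    }

    private final class NaturalOrderIterator implements Iterator<T> {
        private Integer key; // position of the bucket being walked; null before the first call
        private int index;   // index of the next element inside that bucket

        // Moves (key, index) to the next available element.
        // The caller must hold the lock on SketchScheduler.this.
        private boolean advance() {
            SortedMap<Integer, List<T>> rest = (key == null) ? buckets : buckets.tailMap(key);
            for (Map.Entry<Integer, List<T>> e : rest.entrySet()) {
                if (!e.getKey().equals(key)) {
                    key = e.getKey();
                    index = 0;
                }
                if (index < e.getValue().size()) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public boolean hasNext() {
            // Every read of the shared map happens under the outer object's monitor.
            synchronized (SketchScheduler.this) {
                return advance();
            }
        }

        @Override
        public T next() {
            synchronized (SketchScheduler.this) {
                if (!advance()) {
                    throw new NoSuchElementException();
                }
                return buckets.get(key).get(index++);
            }
        }
    }
}
```

Because add() and the iterator methods lock the same object, each hasNext() or next() call sees a fully published state of the map. The sketch re-reads the map on every call instead of holding a java.util iterator across calls, so an add() between two calls does not raise ConcurrentModificationException. A reverse-order iterator would follow the same locking pattern.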