Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 1.3.3
Fix Version/s: 2.0.0
Labels: None
Environment: BTM SVN Snapshot from 2009-03-22
Description
(We are still running the BTM snapshot version from 2009-03-22, but judging from the code, the bug is still present in the newest version.)
I am analyzing a problem that occurred on a few of our servers, where we got many "transaction timed out" exceptions until the server was restarted.
It looks like all of these exceptions occur in the same thread, after an OOM occurred:
Caused by: java.lang.OutOfMemoryError: PermGen space
at bitronix.tm.utils.ManagementRegistrar.buildObjectName(ManagementRegistrar.java:87)
at bitronix.tm.utils.ManagementRegistrar.register(ManagementRegistrar.java:57)
at bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:278)
at bitronix.tm.BitronixTransaction.setStatus(BitronixTransaction.java:261)
at bitronix.tm.BitronixTransaction.setActive(BitronixTransaction.java:256)
at bitronix.tm.BitronixTransactionManager.begin(BitronixTransactionManager.java:87)
at XXX.DelegatingTransactionManagerInterceptor.begin(DelegatingTransactionManagerInterceptor.java:131)
at org.springframework.transaction.jta.JtaTransactionManager.doJtaBegin(JtaTransactionManager.java:886)
at org.springframework.transaction.jta.JtaTransactionManager.doBegin(JtaTransactionManager.java:843)
If my analysis is correct, an exception in BTM.begin() after the call to createTransaction() leaves the failed transaction associated with the ThreadContext. As a result, after a while every call to TransactionContextHelper.enlistInCurrentTransaction from this thread fails, because it picks up the old transaction (which was never used because of the exception in begin(), but apparently is still marked as timed out by the TransactionTimeoutTask). Since that transaction is marked as timed out, the call fails.
The stack trace then looks like this:
Caused by: java.sql.SQLException: error enlisting a JdbcConnectionHandle of a JdbcPooledConnection from datasource user2 in state ACCESSIBLE wrapping org.postgresql.xa.PGXAConnection@74187f3e on Pooled connection wrapping physical connection org.postgresql.jdbc4.Jdbc4Connection@7449fafd
at bitronix.tm.resource.jdbc.JdbcConnectionHandle.enlistResource(JdbcConnectionHandle.java:61)
at bitronix.tm.resource.jdbc.JdbcConnectionHandle.prepareStatement(JdbcConnectionHandle.java:200)
at sun.reflect.GeneratedMethodAccessor72.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.jdbc.datasource.TransactionAwareDataSourceProxy$TransactionAwareInvocationHandler.invoke(TransactionAwareDataSourceProxy.java:225)
at $Proxy10.prepareStatement(Unknown Source)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.prepareStatement(SqlExecutor.java:494)
at com.ibatis.sqlmap.engine.execution.SqlExecutor.executeQuery(SqlExecutor.java:176)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.sqlExecuteQuery(GeneralStatement.java:205)
at com.ibatis.sqlmap.engine.mapping.statement.GeneralStatement.executeQueryWithCallback(GeneralStatement.java:173)
... 45 more
Caused by: bitronix.tm.internal.BitronixSystemException: transaction timed out
at bitronix.tm.resource.common.TransactionContextHelper.enlistInCurrentTransaction(TransactionContextHelper.java:44)
at bitronix.tm.resource.jdbc.JdbcConnectionHandle.enlistResource(JdbcConnectionHandle.java:59)
... 55 more
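To illustrate the suspected failure mode, here is a minimal sketch. It does not use BTM's real classes: `Tx`, `CURRENT`, `activate`, `beginLeaky` and `beginSafe` are hypothetical stand-ins for the transaction, the thread context, and the steps inside begin(). The "safe" variant is only one possible way to undo the association, and not necessarily how the actual fix was implemented.

```java
/** Hypothetical stand-ins only; none of these types or methods are BTM's real API. */
final class BeginCleanupSketch {

    /** Minimal placeholder for a transaction object. */
    static final class Tx {
    }

    /** Per-thread association, playing the role of the thread context. */
    static final ThreadLocal<Tx> CURRENT = new ThreadLocal<>();

    /** Simulates the work done after createTransaction(), which failed in the report. */
    static void activate(Tx tx, boolean fail) {
        if (fail) {
            // Stand-in for the OutOfMemoryError thrown during setActive().
            throw new Error("simulated failure while activating " + tx);
        }
    }

    /** Leaky variant: if activate() throws, the transaction stays bound to the thread. */
    static void beginLeaky(boolean fail) {
        Tx tx = new Tx();
        CURRENT.set(tx);
        activate(tx, fail);
    }

    /** Defensive variant: clear the association if anything after it throws. */
    static void beginSafe(boolean fail) {
        Tx tx = new Tx();
        CURRENT.set(tx);
        boolean started = false;
        try {
            activate(tx, fail);
            started = true;
        } finally {
            if (!started) {
                CURRENT.remove();
            }
        }
    }

    /** Runs a failing begin and reports whether a transaction is still bound afterwards. */
    static boolean staleAfterFailedBegin(boolean useSafe) {
        CURRENT.remove();
        try {
            if (useSafe) {
                beginSafe(true);
            } else {
                beginLeaky(true);
            }
        } catch (Error simulated) {
            // Expected: the simulated failure propagates to the caller.
        }
        boolean stale = CURRENT.get() != null;
        CURRENT.remove();
        return stale;
    }

    public static void main(String[] args) {
        System.out.println("leaky begin leaves stale tx: " + staleAfterFailedBegin(false)); // true
        System.out.println("safe begin leaves stale tx:  " + staleAfterFailedBegin(true));  // false
    }
}
```

With the leaky variant, every later operation on the same thread would see the abandoned transaction, which matches the repeated "transaction timed out" failures once the timeout has elapsed.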
I'll weigh in on this issue. I do not consider this a bug in BTM. OutOfMemory is a non-recoverable Error.
http://eyalsch.wordpress.com/2009/06/17/oome/
OOM is an Error, not an Exception, and again, it is not considered recoverable. It is a sign of a misconfigured or improperly designed application. A properly configured (and designed) server, with no leaks, can run until the Sun (or processor) burns out. An OOM error can occur anywhere, at any time, under low-memory conditions. An attempt to catch or handle an OOM can itself result in another OOM error. Even something as simple as trying to log the error can throw another OOM Error.
An internet search for OutOfMemory and recoverable (or non-recoverable) will back up what I'm saying. Sun's official stance is that OOM is non-recoverable, which is why it has been a subclass of VirtualMachineError, not Exception, from day one.
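As a quick check of the class hierarchy point, the following small sketch (the class name `OomHierarchySketch` is made up for illustration) shows that an OutOfMemoryError passes straight through a `catch (Exception e)` block. The error is thrown manually here; no memory is actually exhausted.

```java
/** Illustration only: shows where OutOfMemoryError sits in the Throwable hierarchy. */
final class OomHierarchySketch {

    /** Returns the name of the handler that ends up catching an OutOfMemoryError. */
    static String whichHandlerCatches() {
        try {
            try {
                // Thrown manually for illustration; the heap is not really exhausted.
                throw new OutOfMemoryError("simulated");
            } catch (Exception e) {
                // Never reached: OutOfMemoryError is not an Exception.
                return "Exception handler";
            }
        } catch (Error e) {
            return "Error handler";
        }
    }

    public static void main(String[] args) {
        System.out.println(VirtualMachineError.class.isAssignableFrom(OutOfMemoryError.class)); // true
        System.out.println(Exception.class.isAssignableFrom(OutOfMemoryError.class));           // false
        System.out.println(whichHandlerCatches());                                              // Error handler
    }
}
```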
Sun has added an option to let you take some action, like restart your server, when an OOM occurs:
-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
See http://java.sun.com/javase/technologies/hotspot/vmoptions.jsp
Suggested close.