castor

Polymorphism Issues w/ ClassMolder.load

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Fixed
  • Affects Version/s: 1.1.1, 1.1.2, 1.1.2.1
  • Fix Version/s: 1.3.2
  • Component/s: JDO
  • Labels:
    None
  • Environment:
    Solaris 10, Java 1.5, Tomcat
  • Number of attachments :
    6

Description

I've just started working with the Polymorphism features recently - and
at first they worked wonderfully, but recently I started seeing the
following error when trying to load an object after writing to it a few
times. I saw a lot of talk about this problem in 2005 - but then all
went quiet on this issue. Has there been are any resolution as to why
these kinds of bugs pop up since the days of bugs CASTOR-1195 getting
solved?

Also, I cannot seem to recreate this bug in a deterministic fashion.

java.lang.ArrayIndexOutOfBoundsException: 7
org.castor.persist.ProposedEntity.getField(ProposedEntity.java:103)
org.castor.persist.resolver.ManyRelationResolver.load(ManyRelationResolver.java:254)
org.exolab.castor.persist.ClassMolder.load(ClassMolder.java:638)
org.exolab.castor.persist.LockEngine.load(LockEngine.java:396)
org.castor.persist.AbstractTransactionContext.load(AbstractTransactionContext.java:568)
org.castor.persist.AbstractTransactionContext.load(AbstractTransactionContext.java:431)
org.castor.persist.resolver.ManyRelationResolver.load(ManyRelationResolver.java:272)
org.exolab.castor.persist.ClassMolder.load(ClassMolder.java:638)
org.exolab.castor.persist.LockEngine.load(LockEngine.java:396)
org.castor.persist.AbstractTransactionContext.load(AbstractTransactionContext.java:568)
org.castor.persist.AbstractTransactionContext.load(AbstractTransactionContext.java:431)
org.exolab.castor.jdo.engine.AbstractDatabaseImpl.load(AbstractDatabaseImpl.java:301)
org.exolab.castor.jdo.engine.AbstractDatabaseImpl.load(AbstractDatabaseImpl.java:272)
com.codemagi.servlets.NavigationController.dispatchCommand(NavigationController.java:108)
com.codemagi.servlets.NavigationController.service(NavigationController.java:56)
javax.servlet.http.HttpServlet.service(HttpServlet.java:802)

And once I get this (polymorphism specific) bug on one object, the
entire engine starts to break down, and I get errors like this at login.

org.exolab.castor.jdo.TransactionAbortedException: Nested error:
org.exolab.castor.jdo.LockNotGrantedException:
persist.writeTimeoutcom.codemagi.login.model.User/<1(1)>/232 by
org.castor.persist.LocalTransactionContext@2d6271:
persist.writeTimeoutcom.mycode.login.model.User/<1(1)>/232 by
org.castor.persist.LocalTransactionContext@2d6271

My polymorphic mapping is relatively simple (MySQL).

I have a core Node:

<!-- Key generator for Node -->
<key-generator name="IDENTITY" alias="NODE_SEQ"/>

<!-- Mapping for Node -->
<class name="com.mycode.servlets.model.Node" identity="id"
key-generator="NODE_SEQ" cache-type="none">

That I can xref with any other node

<field name="children" type="com.mycode.servlets.model.Node"
collection="arraylist">
<sql dirty="ignore" name="child_id"
many-table="node_node_xref" many-key="parent_id" />
<bind-xml name="terms"/>
</field>

That all other objects inherit from

<!-- Mapping for Person -->
<class name="com.myclientscode.model.Person"
extends="com.mycode.servlets.model.Node" identity="id" cache-type="none">

Any help, advice or counsel on this matter would be greatly appreciated.

  1. castor_1.3rc1_diff.txt
    04/Nov/08 5:12 PM
    4 kB
    August Detlefsen
  2. patch.c2045.20070630.txt
    13/Jul/07 4:00 PM
    7 kB
    Werner Guttmann
  3. patch.c2045.20070802.txt
    02/Aug/07 4:37 PM
    9 kB
    Werner Guttmann
  4. patch-C2045-20100120.txt
    20/Jan/10 2:01 PM
    17 kB
    Clovis Wichoski
  5. patch-C2045-20100125.txt
    25/Jan/10 2:38 AM
    40 kB
    Ralf Joachim
  6. patch-C2045-20100131.txt
    30/Jan/10 7:09 PM
    25 kB
    Ralf Joachim

Issue Links

Activity

Hide
Werner Guttmann added a comment -

Initial patch for review.

Show
Werner Guttmann added a comment - Initial patch for review.
Hide
Werner Guttmann added a comment -

Updated patch to accommodate for creation of CPA module, etc.

Show
Werner Guttmann added a comment - Updated patch to accommodate for creation of CPA module, etc.
Hide
Werner Guttmann added a comment -

I am basically ready to commit this patch, as it only mirrors some code moves within the SVN repo. I'd still appreciate any additional testing (if possible).

Show
Werner Guttmann added a comment - I am basically ready to commit this patch, as it only mirrors some code moves within the SVN repo. I'd still appreciate any additional testing (if possible).
Hide
Ralf Joachim added a comment -

Assuming that you have tested this with our ctf suite and that there had been no response from August on your initial patch from13. July, I would suggest to commit as is.

Show
Ralf Joachim added a comment - Assuming that you have tested this with our ctf suite and that there had been no response from August on your initial patch from13. July, I would suggest to commit as is.
Hide
James Manico added a comment -

The AIOOB seems to be fixed this this patch, but I am seeing other errors that seem related.

When I have a transaction where I load both the Sub and Super class in one TX, after a while I get what I think to be a concurrency exception.

In the following code, the exact same ID fails after a while. The exception is throw - but only after many reloads.

More specifically, when I load the parent Node, it successfully does a instanceof check and knows its an Organization or Campus. But after a while, that Node is loaded and is no longer recognize as the proper class type.

<code>
//load the Node
Node node = (Node)db.load(Node.class, id, Database.ReadOnly);

//check for access to the node
Organization org = null;
if (node instanceof Organization) { org = (Organization)db.load(Organization.class, id, Database.ReadOnly); } else if (node instanceof Campus) { org = admin.getCampusOrganizationReadOnly( (Campus)node, db ); } else { throw new AppException("Only Campus and Orginization are supported"); }
</code>

Show
James Manico added a comment - The AIOOB seems to be fixed this this patch, but I am seeing other errors that seem related. When I have a transaction where I load both the Sub and Super class in one TX, after a while I get what I think to be a concurrency exception. In the following code, the exact same ID fails after a while. The exception is throw - but only after many reloads. More specifically, when I load the parent Node, it successfully does a instanceof check and knows its an Organization or Campus. But after a while, that Node is loaded and is no longer recognize as the proper class type. <code> //load the Node Node node = (Node)db.load(Node.class, id, Database.ReadOnly); //check for access to the node Organization org = null; if (node instanceof Organization) { org = (Organization)db.load(Organization.class, id, Database.ReadOnly); } else if (node instanceof Campus) { org = admin.getCampusOrganizationReadOnly( (Campus)node, db ); } else { throw new AppException("Only Campus and Orginization are supported"); } </code>
Hide
Ralf Joachim added a comment -

I executed CTF tests and saw TC97 failing. In addition one of the test cases not integrated in CTF (ctf.jdo.special.test1379.Test1379) also fails with that patch applied but succeeds without.

I thought of committing this patch before next cleanup of checkstyle warnings but according to the problems seen this is not possible yet. Therefore the patch needs to be updated to HEAD once again.

Show
Ralf Joachim added a comment - I executed CTF tests and saw TC97 failing. In addition one of the test cases not integrated in CTF (ctf.jdo.special.test1379.Test1379) also fails with that patch applied but succeeds without. I thought of committing this patch before next cleanup of checkstyle warnings but according to the problems seen this is not possible yet. Therefore the patch needs to be updated to HEAD once again.
Hide
August Detlefsen added a comment -

I am fairly confident that this patch breaks certain aspects of polymorphism.

We have a URL that loads a polymorphic object, marshals it to XML and returns it to the browser. This object contains an ArrayList of other polymorphic objects as children.

On the first hit to this URL, everything is returned correctly. On subsequent hits, the child objects are not loaded correctly. The XML signature of a child object on the first hit would be:

<children xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" start-date="2007-05-01T00:00:00.000-07:00" usage-amount="4.562" xsi:type="energy-usage">

on subsequent loads, we just get:

<children>

When we roll back to castor-1.1.1.jar from the patched castor-1.1.2.1.jar we can no longer reproduce this issue. But now we are risking the AIOOBE again.

Show
August Detlefsen added a comment - I am fairly confident that this patch breaks certain aspects of polymorphism. We have a URL that loads a polymorphic object, marshals it to XML and returns it to the browser. This object contains an ArrayList of other polymorphic objects as children. On the first hit to this URL, everything is returned correctly. On subsequent hits, the child objects are not loaded correctly. The XML signature of a child object on the first hit would be: <children xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" start-date="2007-05-01T00:00:00.000-07:00" usage-amount="4.562" xsi:type="energy-usage"> on subsequent loads, we just get: <children> When we roll back to castor-1.1.1.jar from the patched castor-1.1.2.1.jar we can no longer reproduce this issue. But now we are risking the AIOOBE again.
Hide
James Manico added a comment -

After even just a little concurrency testing, I can get the AIOOBE to fire again after we roll back to castor-1.1.1.jar! Ouch!

Show
James Manico added a comment - After even just a little concurrency testing, I can get the AIOOBE to fire again after we roll back to castor-1.1.1.jar! Ouch!
Hide
August Detlefsen added a comment -

We are able to get the app to run steadily with Werner's patch if we map out cache usage.

I added the following to every polymorphic class definition (and to the superclass) in the mapping:

<cache-type type="none"/>

Show
August Detlefsen added a comment - We are able to get the app to run steadily with Werner's patch if we map out cache usage. I added the following to every polymorphic class definition (and to the superclass) in the mapping: <cache-type type="none"/>
Hide
Werner Guttmann added a comment -

As such, I suggest to commit the patch as is, and create a follow-up issue for the cache inconsistencies ( as already discussed on IRC or via email).

Show
Werner Guttmann added a comment - As such, I suggest to commit the patch as is, and create a follow-up issue for the cache inconsistencies ( as already discussed on IRC or via email).
Hide
Ralf Joachim added a comment -

Before committing this patch we have to fix the problems that break 2 of our tests as explained above. As such I vote against committing the patch as is.

Show
Ralf Joachim added a comment - Before committing this patch we have to fix the problems that break 2 of our tests as explained above. As such I vote against committing the patch as is.
Hide
Werner Guttmann added a comment -

As talked about yesterday, the tests don't fail for me (against mySQL). What tests are actually failing ? I will have a look, but without failing tests this is going to be hard.

Show
Werner Guttmann added a comment - As talked about yesterday, the tests don't fail for me (against mySQL). What tests are actually failing ? I will have a look, but without failing tests this is going to be hard.
Hide
August Detlefsen added a comment -

Maybe when the mapping is loaded you can just implicitly set the cache type to 'none' for polymorphic objects?

This would (seem to) solve the issue, and it doesn't require anyone already using Castor to make major modifications to their mappings.

Show
August Detlefsen added a comment - Maybe when the mapping is loaded you can just implicitly set the cache type to 'none' for polymorphic objects? This would (seem to) solve the issue, and it doesn't require anyone already using Castor to make major modifications to their mappings.
Hide
Werner Guttmann added a comment -

Not sure how to go about this, to be honest. But I appreciate the input ....

Show
Werner Guttmann added a comment - Not sure how to go about this, to be honest. But I appreciate the input ....
Hide
Ralf Joachim added a comment -

Maybe a test case would allow everybody to understand the original problem. It may also help to find a solution that works for our current tests and this issue, especially as we have 2 versions of the code that work in one or the other case.

Show
Ralf Joachim added a comment - Maybe a test case would allow everybody to understand the original problem. It may also help to find a solution that works for our current tests and this issue, especially as we have 2 versions of the code that work in one or the other case.
Hide
Ralf Joachim added a comment -

Simply forcing cache type none does not seam to be a solution that is acceptable for other users as it may break applications that rely on caching. Having said that I am not sure if that works when your classes implement Timestampable interface. If they do not, caching may not work as expected anyway.

Show
Ralf Joachim added a comment - Simply forcing cache type none does not seam to be a solution that is acceptable for other users as it may break applications that rely on caching. Having said that I am not sure if that works when your classes implement Timestampable interface. If they do not, caching may not work as expected anyway.
Hide
August Detlefsen added a comment -

OK, here is a test case that fails under load. This is one dispatcher from within a servlet we are working on:

<code>
protected String dispatchCompareRemove(HttpServletRequest request, HttpServletResponse response)
throws LoginException, AccessException {

log.debug("***** dispatchCompareRemove *****");

String nextPage = VIEW_NONE;

HttpSession session = request.getSession();

//see if there is already a Set of IDs in the session
HashSet compareIds = (HashSet)session.getAttribute("compareIds");
if (compareIds == null) compareIds = new HashSet();

Database db = null;

//the reply to send back
AjaxReply reply = new AjaxReply();

try {
//get the ID to compare from the request
Integer newId = convertInteger(request.getParameter("id"));

db = getJDODatabase();
db.begin();

//load the node
Node nodeToAdd = (Node)db.load(Node.class, newId, Database.ReadOnly);

//no sense keeping a tx open
db.rollback();

//determine type, and put Buildings into the set
if ("org.myclient.model.Organization".equals(nodeToAdd.getClassName())) { compareIds.removeAll( ((Organization)nodeToAdd).getBuildingIds() ); //ClassCastException here under load } else if ("org.myclient.model.Campus".equals(nodeToAdd.getClassName())) { compareIds.removeAll( ((Campus)nodeToAdd).getBuildingIds() ); //ClassCastException here under load } else if ("org.myclient.model.Building".equals(nodeToAdd.getClassName())) { compareIds.remove( newId ); }

//success!
reply.setId(newId);
reply.setName(MSG_COMPARE_REMOVE_SUCCESS);

} catch (AppException ae) { log.debug("", ae); reply.setSuccess(false); reply.setName(ae.getMessage()); } catch (PersistenceException pe) { log.debug("Exception attempting to remove Ids from compare", pe); reply.setSuccess(false); reply.setName(MSG_DATABASE_ERROR); } finally { closeJDODatabase(db); }

//set the modified set back in the session
session.setAttribute("compareIds", compareIds);

//marshal the AjaxReply to the response
try { log.debug("REPLY: " + marshal(reply, mapping) ); marshal(reply, response, mapping); } catch (Exception e) { log.debug("Unable to marshal reply: ", e); }

return nextPage;
}
</code>

Let's say you are trying to load a Campus object (which extends from Node). The first few times you call the method everything works properly and we are able to cast Node to Campus. However, once you reach a certain level of load, the campus no longer loads as a Campus, but only as a Node. Attempting to cast to Campus will cause a ClassCastException.

I can show you a working demo of this if you would like to observe it live on our servers.

Show
August Detlefsen added a comment - OK, here is a test case that fails under load. This is one dispatcher from within a servlet we are working on: <code> protected String dispatchCompareRemove(HttpServletRequest request, HttpServletResponse response) throws LoginException, AccessException { log.debug("***** dispatchCompareRemove *****"); String nextPage = VIEW_NONE; HttpSession session = request.getSession(); //see if there is already a Set of IDs in the session HashSet compareIds = (HashSet)session.getAttribute("compareIds"); if (compareIds == null) compareIds = new HashSet(); Database db = null; //the reply to send back AjaxReply reply = new AjaxReply(); try { //get the ID to compare from the request Integer newId = convertInteger(request.getParameter("id")); db = getJDODatabase(); db.begin(); //load the node Node nodeToAdd = (Node)db.load(Node.class, newId, Database.ReadOnly); //no sense keeping a tx open db.rollback(); //determine type, and put Buildings into the set if ("org.myclient.model.Organization".equals(nodeToAdd.getClassName())) { compareIds.removeAll( ((Organization)nodeToAdd).getBuildingIds() ); //ClassCastException here under load } else if ("org.myclient.model.Campus".equals(nodeToAdd.getClassName())) { compareIds.removeAll( ((Campus)nodeToAdd).getBuildingIds() ); //ClassCastException here under load } else if ("org.myclient.model.Building".equals(nodeToAdd.getClassName())) { compareIds.remove( newId ); } //success! reply.setId(newId); reply.setName(MSG_COMPARE_REMOVE_SUCCESS); } catch (AppException ae) { log.debug("", ae); reply.setSuccess(false); reply.setName(ae.getMessage()); } catch (PersistenceException pe) { log.debug("Exception attempting to remove Ids from compare", pe); reply.setSuccess(false); reply.setName(MSG_DATABASE_ERROR); } finally { closeJDODatabase(db); } //set the modified set back in the session session.setAttribute("compareIds", compareIds); //marshal the AjaxReply to the response try { log.debug("REPLY: " + marshal(reply, mapping) ); marshal(reply, response, mapping); } catch (Exception e) { log.debug("Unable to marshal reply: ", e); } return nextPage; } </code> Let's say you are trying to load a Campus object (which extends from Node). The first few times you call the method everything works properly and we are able to cast Node to Campus. However, once you reach a certain level of load, the campus no longer loads as a Campus, but only as a Node. Attempting to cast to Campus will cause a ClassCastException. I can show you a working demo of this if you would like to observe it live on our servers.
Hide
August Detlefsen added a comment -

Any chance to get this patch included in the 1.3 release?

Show
August Detlefsen added a comment - Any chance to get this patch included in the 1.3 release?
Hide
Werner Guttmann added a comment -

I will have a look to see whether the patch can be applied against SVN trunk, first of all. given that Ralf has changed a few things.

Show
Werner Guttmann added a comment - I will have a look to see whether the patch can be applied against SVN trunk, first of all. given that Ralf has changed a few things.
Hide
August Detlefsen added a comment -

I can give it a shot.. Do both of the two patches attached here need to be applied, or does patch.c2045.20070802.txt override patch.c2045.20070630.txt ?

Show
August Detlefsen added a comment - I can give it a shot.. Do both of the two patches attached here need to be applied, or does patch.c2045.20070802.txt override patch.c2045.20070630.txt ?
Hide
Werner Guttmann added a comment -

The latter.

Show
Werner Guttmann added a comment - The latter.
Hide
August Detlefsen added a comment -

I got it to compile and we are testing the jars against our web app now. Attached is a diff against the SVN trunk.

I would appreciate a code review, especially in the places where I diverged from Werner's original patch.

Thanks,
August

Show
August Detlefsen added a comment - I got it to compile and we are testing the jars against our web app now. Attached is a diff against the SVN trunk. I would appreciate a code review, especially in the places where I diverged from Werner's original patch. Thanks, August
Hide
Werner Guttmann added a comment -

Let me know the test results first, and I'll review the code.

Show
Werner Guttmann added a comment - Let me know the test results first, and I'll review the code.
Hide
August Detlefsen added a comment -

Thus far we have not gotten the AIOOBE with the patch in place.

Show
August Detlefsen added a comment - Thus far we have not gotten the AIOOBE with the patch in place.
Hide
Werner Guttmann added a comment -

In that case, I will briefly review the code tonight, and commit if there's no substantial flaws. But please remember that I will have to address the problems with the non-running test cases first. If that patch breaks anything, we will have to investigate.

Show
Werner Guttmann added a comment - In that case, I will briefly review the code tonight, and commit if there's no substantial flaws. But please remember that I will have to address the problems with the non-running test cases first. If that patch breaks anything, we will have to investigate.
Hide
August Detlefsen added a comment -

This patch makes solves the AIOOBE, but makes it so that persistent objects in a for() loop will not update:

Here's the scenario:

We are storing CO2 emissions over the course of a span of months. For each month, there is either an existing emission to be updated, or we need to create a new one.

Any new objects we create in the course of the loop are getting added to the database properly, but existing objects that updated are not getting persisted and the changes do not show up in the DB.

The code works like this:

//load up all existing emissions in the span
HashMap existingEmissions = new HashMap();
oql = db.getOQLQuery(" SELECT e FROM org.openeco.footprint.model.Emissions e " +
" WHERE e.sourceName.id = $1 " +
" AND e.startDate >= $2 AND e.startDate <= $3 ");
oql.bind(name.getId());
oql.bind(startDate);
oql.bind(endDate);

results = oql.execute();
while (results.hasMore()) { Emissions e = (Emissions)results.next(); existingEmissions.put(e.getStartDate(), e); }

// Explicitly close the QueryResults
results.close();

// Explicitly close the OQLQuery
oql.close();

//for each month in the span, create or edit emission
for (int i = 0; i < span; i++) {

Date emissionDate = DateUtils.addMonths(startDate, i);
log.debug("Checking for emission with name: " + name.getId() + " and date: " + emissionDate);

Emissions emission = (Emissions)existingEmissions.get(emissionDate);
log.debug("Existing emissions: " + emission + " persistent? " + db.isPersistent(emission));
if (emission == null) { emission = new Emissions(); emission.setSourceName(name); emission.setStartDate( emissionDate ); }

//set new values into emission
emission.setCO2(CO2);
emission.setCH4(CH4);

emission.setN2O(N2O);
emission.setHFC(HFC);
emission.setPFC(PFC);
emission.setSF6(SF6);
emission.setTotalCO2(totalCO2);

if (emission.getId() == null) { log.debug("Creating new emission: " + emission); db.create(emission); }

}

//save changes to db
log.debug("About to commit...");
db.commit();
log.debug("Done.");

Show
August Detlefsen added a comment - This patch makes solves the AIOOBE, but makes it so that persistent objects in a for() loop will not update: Here's the scenario: We are storing CO2 emissions over the course of a span of months. For each month, there is either an existing emission to be updated, or we need to create a new one. Any new objects we create in the course of the loop are getting added to the database properly, but existing objects that updated are not getting persisted and the changes do not show up in the DB. The code works like this: //load up all existing emissions in the span HashMap existingEmissions = new HashMap(); oql = db.getOQLQuery(" SELECT e FROM org.openeco.footprint.model.Emissions e " + " WHERE e.sourceName.id = $1 " + " AND e.startDate >= $2 AND e.startDate <= $3 "); oql.bind(name.getId()); oql.bind(startDate); oql.bind(endDate); results = oql.execute(); while (results.hasMore()) { Emissions e = (Emissions)results.next(); existingEmissions.put(e.getStartDate(), e); } // Explicitly close the QueryResults results.close(); // Explicitly close the OQLQuery oql.close(); //for each month in the span, create or edit emission for (int i = 0; i < span; i++) { Date emissionDate = DateUtils.addMonths(startDate, i); log.debug("Checking for emission with name: " + name.getId() + " and date: " + emissionDate); Emissions emission = (Emissions)existingEmissions.get(emissionDate); log.debug("Existing emissions: " + emission + " persistent? " + db.isPersistent(emission)); if (emission == null) { emission = new Emissions(); emission.setSourceName(name); emission.setStartDate( emissionDate ); } //set new values into emission emission.setCO2(CO2); emission.setCH4(CH4); emission.setN2O(N2O); emission.setHFC(HFC); emission.setPFC(PFC); emission.setSF6(SF6); emission.setTotalCO2(totalCO2); if (emission.getId() == null) { log.debug("Creating new emission: " + emission); db.create(emission); } } //save changes to db log.debug("About to commit..."); db.commit(); log.debug("Done.");
Hide
Ralf Joachim added a comment -

August, would you please take a look at my comment and the test case under castor-2568.

Show
Ralf Joachim added a comment - August, would you please take a look at my comment and the test case under castor-2568.
Hide
August Detlefsen added a comment -

Just saw that this issue was reassigned. Let me know if there is any more information I can provide to help get it resolved.

Show
August Detlefsen added a comment - Just saw that this issue was reassigned. Let me know if there is any more information I can provide to help get it resolved.
Hide
Ralf Joachim added a comment -

Hi August,

another user (Clovis) has been hit by the same issue and had been able to reproduce the problem in a multi threaded environment. He will attach a test case the next days.

What would be interesting is, if the original problem for you only apears with multilpe threads also or if you got this problem also with a single thread. At the moment I am not sure if both of you face the same issue but according to the stacktrace I got from Clovis it looks this way.

In addition I would ask you if you could test that your problem still exists in current SVN head. Since this issue has been reported I did big changes to load methods of LockEngine, ClassMolder and AbstractTransactionContext to fix some other bugs. Maybe you also want to take a look at CASTOR-2871 and CASTOR-1217 where the major changes at these classes had been made.

Regards Ralf

Show
Ralf Joachim added a comment - Hi August, another user (Clovis) has been hit by the same issue and had been able to reproduce the problem in a multi threaded environment. He will attach a test case the next days. What would be interesting is, if the original problem for you only apears with multilpe threads also or if you got this problem also with a single thread. At the moment I am not sure if both of you face the same issue but according to the stacktrace I got from Clovis it looks this way. In addition I would ask you if you could test that your problem still exists in current SVN head. Since this issue has been reported I did big changes to load methods of LockEngine, ClassMolder and AbstractTransactionContext to fix some other bugs. Maybe you also want to take a look at CASTOR-2871 and CASTOR-1217 where the major changes at these classes had been made. Regards Ralf
Hide
Clovis Wichoski added a comment -

Patch that simulates the problem with current SVN version.

Show
Clovis Wichoski added a comment - Patch that simulates the problem with current SVN version.
Hide
August Detlefsen added a comment -

All of our issues have been in a multi-threaded (servlet) environment. The exceptions seem to show up when there is load and multiple threads are trying to load the same extended object (same ID) at the same time.

I would see it as an improvement if at the very least the AIOOBE could be caught in Castor and a PersistenceException thrown instead. Then we could catch the PersistenceException and execute CacheManager.expireCache();

Expiring the cache seems to alleviate the problem to some extent.

Show
August Detlefsen added a comment - All of our issues have been in a multi-threaded (servlet) environment. The exceptions seem to show up when there is load and multiple threads are trying to load the same extended object (same ID) at the same time. I would see it as an improvement if at the very least the AIOOBE could be caught in Castor and a PersistenceException thrown instead. Then we could catch the PersistenceException and execute CacheManager.expireCache(); Expiring the cache seems to alleviate the problem to some extent.
Hide
Ralf Joachim added a comment -

Initial patch for review.

Major change is the synchronization in LockEngine.load(). As I feared at my last changes to this method, it seams to causes problems that we need to release and reacquire a lock for extended hierarchies. Having said that I'm not sure for 100% on that.

The other changes are more or less refactorings to eliminate other sideeffects.

Drawbacks of the patch are a valuable performance penalty and that deadlock detection does not work any more. Currently I do not see another solution to be implemented in short time. For the long term I think we need to refactor our lock engine to acquire locks on base classes always.

Please let me know your thoughths and results of your tests.

Show
Ralf Joachim added a comment - Initial patch for review. Major change is the synchronization in LockEngine.load(). As I feared at my last changes to this method, it seams to causes problems that we need to release and reacquire a lock for extended hierarchies. Having said that I'm not sure for 100% on that. The other changes are more or less refactorings to eliminate other sideeffects. Drawbacks of the patch are a valuable performance penalty and that deadlock detection does not work any more. Currently I do not see another solution to be implemented in short time. For the long term I think we need to refactor our lock engine to acquire locks on base classes always. Please let me know your thoughths and results of your tests.
Hide
Ralf Joachim added a comment -

The refactorings included in the previous patch have already been committed with separate issues.

Slightly improved patch by synchronizing on ClassMolder of base class of an extends hierarchy instead of the global LockEngine.

Show
Ralf Joachim added a comment - The refactorings included in the previous patch have already been committed with separate issues. Slightly improved patch by synchronizing on ClassMolder of base class of an extends hierarchy instead of the global LockEngine.

People

Vote (0)
Watch (1)

Dates

  • Created:
    Updated:
    Resolved: