Description
Castor appears to have a memory leak when unmarshalling XML files. Large files cause OutOfMemoryErrors. This is a critical problem that renders Castor unusable for high-volume data processing.
After enabling debug logging for the Castor packages, we were able to see the log messages from the Castor classes. Most of these messages concerned adding descriptors through different lookup methods (by package name, by element name, ...) to resolve the mapping when no mapping is found for an element. This happens when an element name in the XML has no direct mapping in the mapping files.
For example, the XML contains an element called person-info, but no class is mapped directly to this element. Instead, the location attribute is used to bind a different object to the XML element.
We also looked into the Castor class XMLClassDescriptorResolverImpl, where these messages are logged. The missing classes are added to an ArrayList, and we watched that list grow steadily over time as the process parsed more and more data.
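The growth pattern described above can be modelled in a few lines of plain Java. This is a hypothetical sketch, not Castor's XMLClassDescriptorResolverImpl: the class LeakyResolver, its `strategies` list (standing in for the lookup-by-package-name and lookup-by-element-name steps) and the `missingClasses` field are assumptions made for illustration. It shows why a list that records every failed lookup grows with the volume of input rather than with the number of distinct unmapped elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the reported symptom. Names are illustrative only;
// this is NOT Castor's XMLClassDescriptorResolverImpl.
class LeakyResolver {
    // Lookup strategies tried in order; each returns null when it finds nothing.
    private final List<Function<String, Object>> strategies;
    // Grows by one entry on every failed lookup, duplicates included.
    final List<String> missingClasses = new ArrayList<>();

    LeakyResolver(List<Function<String, Object>> strategies) {
        this.strategies = strategies;
    }

    Object resolve(String xmlName) {
        for (Function<String, Object> strategy : strategies) {
            Object descriptor = strategy.apply(xmlName);
            if (descriptor != null) {
                return descriptor;
            }
        }
        // The miss is recorded but never consulted, so the next occurrence of
        // the same element repeats every strategy and appends another entry.
        missingClasses.add(xmlName);
        return null;
    }

    public static void main(String[] args) {
        // Two strategies that never resolve anything (hypothetical stand-ins).
        List<Function<String, Object>> strategies = List.of(name -> null, name -> null);
        LeakyResolver resolver = new LeakyResolver(strategies);
        // Simulate 100,000 records, each containing one unmapped person-info element.
        for (int i = 0; i < 100_000; i++) {
            resolver.resolve("person-info");
        }
        System.out.println("missing-class entries: " + resolver.missingClasses.size());
    }
}
```

Run with Java 9 or later (for `List.of`). It reports 100000 entries even though only one element name is ever missing, which matches the kind of ever-growing ArrayList seen in the heap snapshot.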
We used JProbe to confirm this behavior with a heap snapshot. The snapshot showed this ArrayList occupying a very large amount of heap memory.
My colleague Ed and I fixed a closely related problem about six months ago, and the required patch was committed to SVN trunk at about the same time. In other words, Castor 1.3 should contain a fix for that problem (or a very similar one). If I remember correctly, the fact that no ClassDescriptor had been defined for a given XML artefact, either through a mapping or through a generated class descriptor, was not being added to a cache (where it should have been).
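Caching the fact that a lookup failed is usually called negative caching: once a name is known to have no descriptor, later lookups for it return immediately instead of re-running every strategy and recording the miss again. The sketch below is a hypothetical illustration of that technique; CachingResolver, its `strategies` list and the `strategyCalls` counter are assumptions, and this is not the patch that was committed to Castor's trunk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of negative caching for descriptor lookups.
// Names are illustrative and are not Castor's API.
class CachingResolver {
    // Lookup strategies tried in order; each returns null when it finds nothing.
    private final List<Function<String, Object>> strategies;
    // Positive cache: XML name -> descriptor that was found.
    private final Map<String, Object> found = new HashMap<>();
    // Negative cache: XML names already known to have no descriptor.
    private final Set<String> notFound = new HashSet<>();
    // Instrumentation for the demo only: how often any strategy actually ran.
    int strategyCalls = 0;

    CachingResolver(List<Function<String, Object>> strategies) {
        this.strategies = strategies;
    }

    Object resolve(String xmlName) {
        Object cached = found.get(xmlName);
        if (cached != null) {
            return cached;
        }
        if (notFound.contains(xmlName)) {
            return null; // known miss: skip the strategies entirely
        }
        for (Function<String, Object> strategy : strategies) {
            strategyCalls++;
            Object descriptor = strategy.apply(xmlName);
            if (descriptor != null) {
                found.put(xmlName, descriptor);
                return descriptor;
            }
        }
        notFound.add(xmlName); // recorded once per distinct name, not once per lookup
        return null;
    }

    int notFoundSize() {
        return notFound.size();
    }

    public static void main(String[] args) {
        // Two strategies that never resolve person-info, mirroring the report.
        List<Function<String, Object>> strategies = List.of(name -> null, name -> null);
        CachingResolver resolver = new CachingResolver(strategies);
        for (int i = 0; i < 100_000; i++) {
            resolver.resolve("person-info");
        }
        System.out.println("strategy calls: " + resolver.strategyCalls
                + ", cached misses: " + resolver.notFoundSize());
    }
}
```

With 100,000 lookups of the unmapped person-info element, the two strategies run only once in total and the negative cache holds a single entry, so memory use is bounded by the number of distinct unmapped names. A real resolver would also have to decide whether cached misses must be invalidated when new mappings or descriptors are loaded; this sketch ignores that.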
Could you please run your test(s) against 1.3 and report back your findings?
In addition, which ArrayList instance precisely was growing ad infinitum?