Castor appears to have a memory leak when unmarshalling XML files. Large files cause OutOfMemoryErrors. This is a critical problem and renders Castor unusable for high volume data processing.
After turning on debug logging for the Castor packages, we were able to see the log messages from the Castor classes. Most of them concerned adding descriptors by different methods (by package name, by element name, ...) in an attempt to resolve the mapping when no mapping for a node element is found. This happens when the element name in the XML has no direct mapping in the mapping files.
For example, an XML element called person-info exists in the XML, but we don't have a class mapped directly to this element. Instead, the location attribute is used to bind a different object to the XML element.
We also looked into the Castor class XMLClassDescriptorResolverImpl, where these messages are logged. The missing classes are added to an ArrayList, and we saw it grow larger and larger over time as the process parses more and more data.
We used JProbe to take a heap snapshot and match it against the behavior described above. The snapshot revealed the ArrayList object with a very large size in heap memory.
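To illustrate the growth pattern we believe we are seeing, here is a minimal standalone sketch. It is not Castor's code: the class and method names are hypothetical, and it assumes that every failed lookup appends the element name to the list, even when the same name has already been recorded. Under that assumption, a list grows with the number of elements parsed, while a set stays bounded by the number of distinct unmapped names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the suspected pattern; none of these names come from Castor.
class MissingDescriptorSketch {

    // Assumed leaky variant: every failed lookup is appended to a list.
    static class LeakyResolver {
        private final List<String> missing = new ArrayList<>();

        Object resolve(String elementName) {
            // Assumption: no descriptor is found, so the name is recorded again.
            missing.add(elementName);
            return null;
        }

        int missingCount() {
            return missing.size();
        }
    }

    // Possible alternative: a set records each missing name only once.
    static class BoundedResolver {
        private final Set<String> missing = new HashSet<>();

        Object resolve(String elementName) {
            // add() leaves the set unchanged if the name is already present.
            missing.add(elementName);
            return null;
        }

        int missingCount() {
            return missing.size();
        }
    }
}
```

If this assumption holds, resolving person-info once per record in a large file makes the leaky variant hold one entry per record, while the set-based variant holds a single entry no matter how large the file is.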