XStream
  1. XStream
  2. XSTR-395

PermGen Space - OutOfMemory due to String.intern() in StringConverter

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      when loading huge XStream data sets from disk, holding mainly Strings, after a while, you get a PermGen Space - OutOfMemory.

      Increasing the PermSpace of the JVM with -XX:MaxPermSize=256M or higher only postpones the problem.

      First I assumed that the amount of classes created caused the VM to choke. Additionally I found a VM defect, that describes an "PermHeap bloat in and only in server VM". http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4957990

      So I stopped reusing my XStream instance and constructed always a new one. This improved the situation further but at the end, there was always a "PermGen Space" exception. Days ago I found some more infos about the content of the perm gen space and the information, that also interned strings are kept there. So I searched through the code of XStream and found StringConverter.java.

      StringConverter is interning strings to unify them to reduce the memory footprint. This is basically not a bad ideas but I have the feeling that String.intern() still behaves strange in Java 5. It does not seem to be a real weak storage and additionally String.intern() is a native call.

      Luckily all converters are pluggable and therefore I wrote a new StringConverter and plugged it in. It works just perfect! Basically I use a WeakHashMap or a LRUHashMap to unify the string instances and my PermSpace stays low and my program runs through (of course some additional heap might be required but this is set high already in my case).

      Find the new StringConverter attached as well as a usage example. The cache to use is defined globally in the program to reuse it across several instances of XStream. This has the best effect and saves quite some memory. If you gonna use the cache in a multi-threaded szenario, do not forget to wrap it with Collections.synchronizedMap(...).

      /**

      • Cache for unifying strings.
        */
        private static final Map<String, String> stringCache = new WeakHashMap<String, String>();

      XStream xstream = new XStream();
      xstream.registerConverter(new SRStringConverter(stringCache));

        Activity

        Hide
        Jörg Schaible added a comment -

        Hi Rene,

        after some benchmarks it was visible that even the synchronized WeakHashMap is most times a lot faster than String.intern(). Therefore I've replaced our current implementation with yours and have the additional benefit of saving the limited PermGen space. This was really a great contribution. Thanks a lot! You may checkout the current version from Subversion to test yourself.

        • Jörg
        Show
        Jörg Schaible added a comment - Hi Rene, after some benchmarks it was visible that even the synchronized WeakHashMap is most times a lot faster than String.intern(). Therefore I've replaced our current implementation with yours and have the additional benefit of saving the limited PermGen space. This was really a great contribution. Thanks a lot! You may checkout the current version from Subversion to test yourself. Jörg
        Hide
        Tatu Saloranta added a comment -

        I know I'm bit late given this is closed, but would it have been possible to make this configurable? While String.intern() is slow, it can have other performance benefits beyond just reduced memory usage. As such there are cases where I would prefer to get intern()ed names back (apologies if I misunderstood this issue and it's only for internal usage of unified/canonical Strings). Basically nice thing about intern()ed Strings is that they can be == compared with String constants. This can not be done by library-unified Strings.

        So it'd be nice if one could choose interning/non-interning behavior; defaulting to non-interning is fine since it's safer.

        Show
        Tatu Saloranta added a comment - I know I'm bit late given this is closed, but would it have been possible to make this configurable? While String.intern() is slow, it can have other performance benefits beyond just reduced memory usage. As such there are cases where I would prefer to get intern()ed names back (apologies if I misunderstood this issue and it's only for internal usage of unified/canonical Strings). Basically nice thing about intern()ed Strings is that they can be == compared with String constants. This can not be done by library-unified Strings. So it'd be nice if one could choose interning/non-interning behavior; defaulting to non-interning is fine since it's safer.
        Hide
        Jörg Schaible added a comment -

        The current implementation ensures that always the same string is used. So there's no reason to use inter(). It is slower, allocated valuable space and you gain nothing from such an option. However, you can easily register your own StringConverter if you insist on such a behaviour.

        Show
        Jörg Schaible added a comment - The current implementation ensures that always the same string is used. So there's no reason to use inter(). It is slower, allocated valuable space and you gain nothing from such an option. However, you can easily register your own StringConverter if you insist on such a behaviour.
        Hide
        Rene Schwietzke added a comment -

        There is a small error in my implementation. The new WeakHashMap<String, String> is wrong, it has to be new WeakHashMap<String, WeakReference<String>>. Otherwise the map has strong references to its own weak keys. Check Java doc for more information.

        So, the Converter code should look like this:

        /**

        • A Map to store strings as long as needed to map similar
        • strings onto the same instance and conserve memory. The map can be set
        • from the outside during construction, so it can be a lru map or a
        • weak map, sychronised or not.
          */
          private final Map<String, WeakReference<String>> cache;

        public SRStringConverter(Map<String, WeakReference<String>> cache)

        { this.cache = cache; }

        public boolean canConvert(Class type)

        { return type.equals(String.class); }

        public Object fromString(String str)
        {
        String result = null;

        WeakReference<String> ref = cache.get(str);
        if (ref != null)

        { result = ref.get(); }

        if (result == null)

        { // fill cache cache.put(str, new WeakReference<String>(str)); result = str; }

        return result;
        }

        Show
        Rene Schwietzke added a comment - There is a small error in my implementation. The new WeakHashMap<String, String> is wrong, it has to be new WeakHashMap<String, WeakReference<String>>. Otherwise the map has strong references to its own weak keys. Check Java doc for more information. So, the Converter code should look like this: /** A Map to store strings as long as needed to map similar strings onto the same instance and conserve memory. The map can be set from the outside during construction, so it can be a lru map or a weak map, sychronised or not. */ private final Map<String, WeakReference<String>> cache; public SRStringConverter(Map<String, WeakReference<String>> cache) { this.cache = cache; } public boolean canConvert(Class type) { return type.equals(String.class); } public Object fromString(String str) { String result = null; WeakReference<String> ref = cache.get(str); if (ref != null) { result = ref.get(); } if (result == null) { // fill cache cache.put(str, new WeakReference<String>(str)); result = str; } return result; }
        Hide
        Rene Schwietzke added a comment -

        Check the last comment. It might be necessary to check my findings.

        Show
        Rene Schwietzke added a comment - Check the last comment. It might be necessary to check my findings.
        Hide
        Jörg Schaible added a comment -

        This is what already has been implemented. You might check the latest SNAPSHOT.

        Show
        Jörg Schaible added a comment - This is what already has been implemented. You might check the latest SNAPSHOT.
        Hide
        Rene Schwietzke added a comment -

        Thanks. So it was a problem in my code because I still use my own converter and have not upgraded yet.

        Show
        Rene Schwietzke added a comment - Thanks. So it was a problem in my code because I still use my own converter and have not upgraded yet.
        Hide
        Jörg Schaible added a comment -

        Closing issues before next release.

        Show
        Jörg Schaible added a comment - Closing issues before next release.

          People

          • Assignee:
            Jörg Schaible
            Reporter:
            Rene Schwietzke
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: