Jetty
  1. Jetty
  2. JETTY-515

Timer is missing scavenging Task in HashSessionManager

    Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 6.1.7
    • Fix Version/s: 6.2.0pre0, 6.1.8
    • Component/s: Servlet
    • Labels:
      None
    • Environment:
      Windows Server 2003, jdk1.6.0, jetty 6.1.7
    • Number of attachments :
      4

      Description

      Sporadically our production environment goes out of Heap, because expired sessions are not scavenged.
      To validate this we have performed a heap dump when the server is hitting its maximum heap usage, before crashing the server.
      In this .hprof heap dump we used SAP Memory Analyzer to do post-mortem analysis. (screenshot of problem in attachment)
      What happens is that somehow the java.util.Timer object (_timer in the HashSessionManager) ends up not having any TimerTask assigned to it (we are absolutely sure about this - in other heap dumps the TimerTask is clearly there - pointing to the anonymous inner class that calls the scavenge() method).
      Hence timed-out sessions are not garbage collected as they remain referenced through the SessionManager.
      Analyzing the Session objects in the heap dump reveals a very large amount of HashSessionManager$$Session objects (about 10 times more then normal system usage). Ffurthermore the oldest Session objects have lastAccessed times that were more than 24 hours before the system heap problems.

      Often the system can run weeks before experiencing this problem.

      If more information is needed please contact me at dimitry_dhondt [at] yahoo.com

      Dimitry.

      1. heap.jpg
        191 kB
      2. scavenge.jpg
        201 kB
      3. TimerTask.jpg
        175 kB
      4. TimerThread.jpg
        214 kB

        Activity

        Hide
        Dimitry D'hondt added a comment -

        Added info about TimerThread.

        Taking a look at the TimerThread that should be executing the Timer's Tasks (and thus the scavenge() method), reveales that this thread is probably broken :

        1) The variables underlined in red are null which indicates that likely the Thread.exit() method has run :

        private void exit() {
        if (group != null)

        { group.remove(this); group = null; }

        /* Aggressively null out all reference fields: see bug 4006245 */
        target = null;
        /* Speed the release of some of these resources */
        threadLocals = null;
        inheritableThreadLocals = null;
        inheritedAccessControlContext = null;
        blocker = null;
        uncaughtExceptionHandler = null;
        }

        And a very curious variable (not references from the java portion of Thread - probably from the JVM native code) stillborn indicates true.

        Dimitry.

        Show
        Dimitry D'hondt added a comment - Added info about TimerThread. Taking a look at the TimerThread that should be executing the Timer's Tasks (and thus the scavenge() method), reveales that this thread is probably broken : 1) The variables underlined in red are null which indicates that likely the Thread.exit() method has run : private void exit() { if (group != null) { group.remove(this); group = null; } /* Aggressively null out all reference fields: see bug 4006245 */ target = null; /* Speed the release of some of these resources */ threadLocals = null; inheritableThreadLocals = null; inheritedAccessControlContext = null; blocker = null; uncaughtExceptionHandler = null; } And a very curious variable (not references from the java portion of Thread - probably from the JVM native code) stillborn indicates true. Dimitry.
        Hide
        Jan Bartel added a comment -

        Hi Dimitry,

        Thanks for the thorough investigation!

        We released jetty-6.1.8 yesterday, which includes what I still believe is the fix for the scavenger thread dying. If you are able to try out 6.1.8 and report back, that would be most useful. If the problem re-occurs, can you also please attach the jetty log file.

        cheers
        Jan

        Show
        Jan Bartel added a comment - Hi Dimitry, Thanks for the thorough investigation! We released jetty-6.1.8 yesterday, which includes what I still believe is the fix for the scavenger thread dying. If you are able to try out 6.1.8 and report back, that would be most useful. If the problem re-occurs, can you also please attach the jetty log file. cheers Jan
        Hide
        Dimitry D'hondt added a comment -

        Jan,

        We're currently quite convinced that the Timer background thread was broken from the start of the application server. This would mean that Jetty had no influence on this (as it has no way to control the Timer's thread - which is started from the constructor of the Timer object - called from the HashSessionManager). The stillborn flag clearly indicated that this thread failed to start in the first place. This seems to be a JVM related issue. We're upgrading from JRE 1.6.0 to 1.6.0_04. Other events (unrelated to the problem described here) are requiring us to redeploy the application on Tomcat 5.5 (Some people believe that it would be easier on the people maintaining the production environment - if all applications were running on Tomcat).

        That is not to say that the fix you developed would most surely fix some cases. I'm also assuming that for some of the crashes we've had in the past this race condition may indeed have been a cause.

        Thanks for the support you've given us !

        Dimitry.

        Show
        Dimitry D'hondt added a comment - Jan, We're currently quite convinced that the Timer background thread was broken from the start of the application server. This would mean that Jetty had no influence on this (as it has no way to control the Timer's thread - which is started from the constructor of the Timer object - called from the HashSessionManager). The stillborn flag clearly indicated that this thread failed to start in the first place. This seems to be a JVM related issue. We're upgrading from JRE 1.6.0 to 1.6.0_04. Other events (unrelated to the problem described here) are requiring us to redeploy the application on Tomcat 5.5 (Some people believe that it would be easier on the people maintaining the production environment - if all applications were running on Tomcat). That is not to say that the fix you developed would most surely fix some cases. I'm also assuming that for some of the crashes we've had in the past this race condition may indeed have been a cause. Thanks for the support you've given us ! Dimitry.
        Hide
        Dimitry D'hondt added a comment -

        Jan,

        Apparently we're now back on jetty
        We'll deploy 6.1.8 this week, and keep you posted.

        Dimitry.

        Show
        Dimitry D'hondt added a comment - Jan, Apparently we're now back on jetty We'll deploy 6.1.8 this week, and keep you posted. Dimitry.
        Hide
        Dimitry D'hondt added a comment -

        Jan,

        We've been running for quite a while now in production (about 2 months), and the issue has never returned after your fix.

        Thx,

        Dimi.

        Show
        Dimitry D'hondt added a comment - Jan, We've been running for quite a while now in production (about 2 months), and the issue has never returned after your fix. Thx, Dimi.

          People

          • Assignee:
            Jan Bartel
            Reporter:
            Dimitry D'hondt
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: