Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.1.4
    • Fix Version/s: 7.4.0
    • Component/s: Bayeux, Continuations, HTTP, NIO, Servlet
    • Labels:
      None
    • Environment:
      Linux 2.6.21.7-2.ec2.v1.2.fc8xen
    • Number of attachments:
      0

      Description

      I have a CometD 1.1.1 application running on Jetty 7.1.4.

      I am currently load testing this application using the CometD java client. The test setup is as follows:

      1) 3500 clients (split over two machines)
      2) Broadcasts to all clients every second; each broadcast message's data size is approx. 1000 bytes unencrypted, not including the other CometD message data or HTTP request and response headers.

      I am using the SslSelectChannelConnector, 4 acceptors, 50 threads.

      The Jetty startup parameters are:

       
      -Xms1g
      -Xmx2g
      -XX:NewRatio=2
      -XX:+UseParallelGC
      -XX:+UseParallelOldGC
      -XX:MaxGCPauseMillis=25
      

      With this setup, the Jetty server resident memory grows and grows very quickly until it starts producing errors like:

       
      [2010-07-02 21:07:28,762][qtp2661678-76][WARN ][org.eclipse.jetty.util.log] handle failed
      java.lang.OutOfMemoryError: null
              at sun.misc.Unsafe.allocateMemory(Native Method) ~[na:1.6.0_20]
              at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:99) ~[na:1.6.0_20]
              at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
              at org.eclipse.jetty.io.nio.DirectNIOBuffer.<init>(DirectNIOBuffer.java:46) ~[jetty-io-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.server.nio.AbstractNIOConnector.newRequestBuffer(AbstractNIOConnector.java:53) ~[jetty-server-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.http.HttpBuffers$1.newBuffer(HttpBuffers.java:34) ~[jetty-http-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.io.ThreadLocalBuffers.getBuffer(ThreadLocalBuffers.java:60) ~[jetty-io-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:557) ~[jetty-http-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) ~[jetty-http-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:424) ~[jetty-server-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) ~[jetty-io-7.1.4.v20100610.jar:7.1.4.v20100610]
              at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436) [jetty-util-7.1.4.v20100610.jar:7.1.4.v20100610]
              at java.lang.Thread.run(Thread.java:619) [na:1.6.0_20]
      

      and finally the JVM usually crashes with:

       
      #
      # A fatal error has been detected by the Java Runtime Environment:
      #
      # java.lang.OutOfMemoryError: requested 32756 bytes for ChunkPool::allocate. Out of swap space?
      #
      #  Internal Error (allocation.cpp:117), pid=5925, tid=1771887504
      #  Error: ChunkPool::allocate
      #
      # JRE version: 6.0_20-b02
      # Java VM: Java HotSpot(TM) Server VM (16.3-b01 mixed mode linux-x86 )
      # An error report file with more information is saved as:
      

      I have even tried this with a 64-bit JVM, setting -XX:MaxDirectMemorySize=6g, and the error still happened once the JVM resident size had grown to approximately 6g plus the heap size.

      I noted that if the maximum heap memory size (-Xmx) is set small enough that the system has to do Full GCs every few seconds (or Full GCs are forced to run every 30s or so via System.gc()), then the problem does not happen. The problem only happens if the system runs for about a minute without a Full GC (with my normal application setting of 2G, Full GCs only happen every 15-20 minutes or so).
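
      A minimal sketch of the forced-GC workaround described above (a hypothetical helper class, not part of the application; the 30s interval comes from the observation above, and it only helps if -XX:+DisableExplicitGC is not set):

      import java.util.concurrent.Executors;
      import java.util.concurrent.ScheduledExecutorService;
      import java.util.concurrent.TimeUnit;

      // Hypothetical workaround sketch: force a Full GC periodically so that
      // unreachable DirectByteBuffer heap stubs are collected and the native
      // memory behind them is released.
      public class ForcedGcWorkaround
      {
          public static void start()
          {
              ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
              scheduler.scheduleAtFixedRate(new Runnable()
              {
                  public void run()
                  {
                      System.gc(); // releases native memory held by dead direct buffers
                  }
              }, 30, 30, TimeUnit.SECONDS);
          }
      }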

      With a smaller message size (between 100-130 bytes) this did not seem to be a consistent problem, though I did get the OOM error occasionally. However, with the larger message size I can consistently trigger the OOM.

      This error is very similar to http://jira.codehaus.org/browse/JETTY-102, and I suspect a direct buffer leak somewhere.
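
      To confirm that the growth is in NIO direct buffers rather than heap, the direct pool can be sampled via JMX. Note this probe is a sketch that assumes Java 7 or later (BufferPoolMXBean does not exist on the 1.6.0_20 JVM used here):

      import java.lang.management.BufferPoolMXBean;
      import java.lang.management.ManagementFactory;
      import java.util.List;

      // Sketch (Java 7+ only): periodically print how much native memory is
      // held by direct and mapped NIO buffers.
      public class DirectMemoryProbe
      {
          public static void main(String[] args) throws InterruptedException
          {
              while (true)
              {
                  List<BufferPoolMXBean> pools =
                      ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
                  for (BufferPoolMXBean pool : pools)
                  {
                      System.out.printf("%s: count=%d used=%d capacity=%d%n",
                          pool.getName(), pool.getCount(),
                          pool.getMemoryUsed(), pool.getTotalCapacity());
                  }
                  Thread.sleep(5000);
              }
          }
      }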

        Activity

        Greg Wilkins added a comment -

        Sorry, but I have to ask the dumb question... does the machine actually have enough real memory and swap to run a 2G JVM with the other processes that are running? Are you running the client and server on the same machine?

        Raman Gupta added a comment - edited

        Hi Greg. Yup, the server machine has 7G of RAM (c1.xlarge on Amazon EC2). Both the 32-bit JVM and the 64-bit JVM were set up with -ms/-mx of 1g/2g. The 64-bit JVM had an additional parameter of -XX:MaxDirectMemorySize=6g so that Jetty had 6G of native memory available for direct buffers. No swap was used during any of these tests.

        The clients were run across 4 different machines, each with 7G of memory (all c1.xlarge). As an aside, when I enabled SSL I had to set useDirectBuffers=false on the client side. With SSL enabled and useDirectBuffers=true on the client side, my clients were crashing. I could file a separate bug report for that, but if any of the SSL client and server code is shared, I wonder if it's the same underlying issue.

        Tomorrow I can run a test with the non-SSL connector to check if this happens only with SSL.

        Raman Gupta added a comment -

        BTW, I just finished an 8-hour test with the exact same codebase and settings, with the only difference being that the broadcast message data was about 60-130 bytes each, but the message rate was 38 messages/s per client. In this case, no leak in direct buffers was noticed – at least it wasn't major enough to cause an OOM in an eight hour test. The process resident memory at the end of the test was 1.7G, with the java heap fluctuating between 160MB and 750MB.

        However, as soon as the message size is raised to 1000 or so bytes, even with a reduced message rate of 1 message/s per client, the native memory usage climbs very quickly (but not heap) and within one to three minutes the JVM fails with OOM.

        Raman Gupta added a comment -

        I have confirmed that the problem occurs only when using the SslSelectChannelConnector.

        Once I configured the test clients to connect to the server using the non-SSL SelectChannelConnector, there is no direct buffer leak. The process resident size fluctuates between 615MB and 680MB. The test has now been running for ten minutes and it has not gone beyond 690MB resident.

        When changing nothing except switching clients to use the SslSelectChannelConnector, the resident size grows to about 2.4GB within about 40 seconds, and then the JVM fails because it runs out of 32-bit address space.

        Raman Gupta added a comment -

        Here is my connector configuration:

        <Call name="addConnector">
          <Arg>
            <New class="org.eclipse.jetty.server.ssl.SslSelectChannelConnector">
              <Set name="host">
                <Property name="jetty.host"/>
              </Set>
              <Set name="port">
                <Property name="jetty.ssl.port" default="8443"/>
              </Set>
              <Set name="acceptors">4</Set>
              <Set name="statsOn">false</Set>
              <Set name="lowResourcesConnections">30000</Set>
              <Set name="lowResourcesMaxIdleTime">2000</Set>
              <Set name="keystore">
                <SystemProperty name="cometd.home"/>/config/cometd.ks
              </Set>
              <Set name="password">xxx</Set>
              <Set name="keyPassword">xxx</Set>
              <Set name="truststore">
                <SystemProperty name="cometd.home"/>/config/cometd.ks
              </Set>
              <Set name="trustPassword">xxx</Set>
              <Set name="includeCipherSuites">
                <Array type="java.lang.String">
                  <!-- Allow only 128-bit or 256-bit cipher suites. Configure as desired. See
                        http://java.sun.com/javase/6/docs/technotes/guides/security/SunProviders.html#SunJSSEProvider for a complete list. -->
                  <Item>SSL_RSA_WITH_RC4_128_MD5</Item>
                  <Item>SSL_RSA_WITH_RC4_128_SHA</Item>
                  <Item>TLS_RSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_RSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_ECDH_ECDSA_WITH_RC4_128_SHA</Item>
                  <Item>TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_ECDH_RSA_WITH_RC4_128_SHA</Item>
                  <Item>TLS_ECDH_RSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_ECDH_RSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_ECDHE_ECDSA_WITH_RC4_128_SHA</Item>
                  <Item>TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_ECDHE_RSA_WITH_RC4_128_SHA</Item>
                  <Item>TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_DHE_RSA_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_DHE_RSA_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_DHE_DSS_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_DHE_DSS_WITH_AES_256_CBC_SHA</Item>
                  <Item>SSL_RSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>SSL_DHE_RSA_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>SSL_RSA_WITH_DES_CBC_SHA</Item>
                  <Item>SSL_DHE_RSA_WITH_DES_CBC_SHA</Item>
                  <Item>SSL_DHE_DSS_WITH_DES_CBC_SHA</Item>
                  <Item>TLS_KRB5_WITH_RC4_128_SHA</Item>
                  <Item>TLS_KRB5_WITH_RC4_128_MD5</Item>
                  <Item>TLS_KRB5_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>TLS_KRB5_WITH_3DES_EDE_CBC_MD5</Item>
                  <Item>TLS_KRB5_WITH_DES_CBC_SHA</Item>
                  <Item>TLS_KRB5_WITH_DES_CBC_MD5</Item>
                </Array>
              </Set>
              <Set name="excludeCipherSuites">
                <Array type="java.lang.String">
                  <!-- Exclude anon or 40-bit ciphers. Configure as desired. -->
                  <Item>SSL_DH_anon_WITH_RC4_128_MD5</Item>
                  <Item>TLS_DH_anon_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_DH_anon_WITH_AES_256_CBC_SHA</Item>
                  <Item>SSL_DH_anon_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>SSL_DH_anon_WITH_DES_CBC_SHA</Item>
                  <Item>TLS_ECDH_anon_WITH_RC4_128_SHA</Item>
                  <Item>TLS_ECDH_anon_WITH_AES_128_CBC_SHA</Item>
                  <Item>TLS_ECDH_anon_WITH_AES_256_CBC_SHA</Item>
                  <Item>TLS_ECDH_anon_WITH_3DES_EDE_CBC_SHA</Item>
                  <Item>SSL_DH_anon_EXPORT_WITH_RC4_40_MD5</Item>
                  <Item>SSL_DH_anon_EXPORT_WITH_DES40_CBC_SHA</Item>
                  <Item>TLS_ECDH_anon_WITH_NULL_SHA</Item>
                  <Item>SSL_RSA_EXPORT_WITH_RC4_40_MD5</Item>
                  <Item>SSL_RSA_EXPORT_WITH_DES40_CBC_SHA</Item>
                  <Item>SSL_DHE_RSA_EXPORT_WITH_DES40_CBC_SHA</Item>
                  <Item>SSL_DHE_DSS_EXPORT_WITH_DES40_CBC_SHA</Item>
                  <Item>SSL_RSA_WITH_NULL_MD5</Item>
                  <Item>SSL_RSA_WITH_NULL_SHA</Item>
                  <Item>TLS_ECDH_ECDSA_WITH_NULL_SHA</Item>
                  <Item>TLS_ECDH_RSA_WITH_NULL_SHA</Item>
                  <Item>TLS_ECDHE_ECDSA_WITH_NULL_SHA</Item>
                  <Item>TLS_ECDHE_RSA_WITH_NULL_SHA</Item>
                  <Item>TLS_KRB5_EXPORT_WITH_RC4_40_SHA</Item>
                  <Item>TLS_KRB5_EXPORT_WITH_RC4_40_MD5</Item>
                  <Item>TLS_KRB5_EXPORT_WITH_DES_CBC_40_SHA</Item>
                  <Item>TLS_KRB5_EXPORT_WITH_DES_CBC_40_MD5</Item>
                </Array>
              </Set>
            </New>
          </Arg>
        </Call>
        
        Raman Gupta added a comment -

        Is there any more information I can provide to make this easier to isolate?

        Greg Wilkins added a comment -

        Is it possible to try 7.1.5?
        It has a change in the way SSL connectors work, so they are much more aggressive about sharing buffers. This should greatly reduce the memory footprint.

        Note that 7.1.5 has a few issues with idle timeouts, so I would not put it into production... but if it does resolve the OOM issue, then we will soon be doing 7.1.6.

        Raman Gupta added a comment -

        Hi Greg, I just tested this with Jetty 7.1.5 and have exactly the same problem. With SSL on, process resident memory grows and grows until about 2.7GB and then dies due to the 32-bit address space limitation. It took about 1 minute, 47 seconds to go from startup to failure (about the same as with 7.1.4).

        With SSL off, process resident memory stays around 650-700 MB.

        Greg Wilkins added a comment -

        Simone,

        if you have a moment in your travels, can you look at this one?

        Raman Gupta added a comment -

        Here is a reproduction recipe using the CometD2 chat client (latest trunk version as of today).

        Ensure the server machine has at least 3-4 GB of free memory. Then:

        1) Modify BayeuxLoadServer to configure an SslSelectChannelConnector.

        2) Modify the pom.xml server profile to use the following arguments – with the bigger JVM size and larger young generation, a Full GC is done less often, which better shows the problem because direct buffers are only collected at Full GCs:

        <arguments>
        <argument>-XX:AdaptiveSizePolicyOutputInterval=1</argument>
        <argument>-XX:+PrintGCDateStamps</argument>
        <argument>-XX:+PrintGCTimeStamps</argument>
        <argument>-XX:+PrintGCDetails</argument>
        <argument>-XX:+UseParallelGC</argument>
        <argument>-XX:+UseParallelOldGC</argument>
        <argument>-XX:MaxGCPauseMillis=25</argument>
        <argument>-Xms1g</argument>
        <argument>-Xmx2g</argument>
        <argument>-XX:NewRatio=2</argument>
        <argument>-XX:+PrintCommandLineFlags</argument>
        <argument>-cp</argument>
        <classpath />
        <argument>org.cometd.client.BayeuxLoadServer</argument>
        </arguments>

        3) Run the BayeuxLoadServer.

        4) On a different machine, run the BayeuxLoadGenerator with the default settings, except for the following:

        protocol: https
        server: <server machine>
        clients: 2000
        batch count: 20000
        message size: 1000

        Set up top or another tool to view the RSS of the server process. Watch it climb to 2.5-2.7 GB within a couple of minutes, and then crash.

        SIDE NOTE: The direct memory error on the client-side (as I mentioned in an earlier comment) can be reproduced simply by choosing 3500 or more clients when running BayeuxLoadTestGenerator with SSL. The client will crap out with a direct memory error just while setting up the clients even before the test begins. Again, this may be caused by the same code that is problematic on the server side.

        Greg Wilkins added a comment -

        can you try setting useDirectBuffers to false on your ssl connector, like:

        <Call name="addConnector">
        <Arg>
        <New class="org.eclipse.jetty.server.ssl.SslSelectChannelConnector">
        <Set name="Port">8443</Set>
        <Set name="maxIdleTime">30000</Set>
        <Set name="Acceptors">2</Set>
        <Set name="AcceptQueueSize">100</Set>
        <Set name="Keystore"><Property name="jetty.home" default="." />/etc/keystore</Set>
        <Set name="Password">OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4</Set>
        <Set name="KeyPassword">OBF:1u2u1wml1z7s1z7a1wnl1u2g</Set>
        <Set name="truststore"><Property name="jetty.home" default="." />/etc/keystore</Set>
        <Set name="trustPassword">OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4</Set>
        <Set name="useDirectBuffers">false</Set>
        </New>
        </Arg>
        </Call>
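
        For anyone configuring the connector in embedded code rather than jetty.xml, this is the equivalent (a sketch against the Jetty 7.1 connector API; the keystore path and passwords are placeholders):

        import org.eclipse.jetty.server.Server;
        import org.eclipse.jetty.server.ssl.SslSelectChannelConnector;

        // Sketch: programmatic equivalent of the XML above (Jetty 7.1 API).
        public class SslConnectorSetup
        {
            public static void main(String[] args) throws Exception
            {
                Server server = new Server();

                SslSelectChannelConnector connector = new SslSelectChannelConnector();
                connector.setPort(8443);
                connector.setMaxIdleTime(30000);
                connector.setAcceptors(2);
                connector.setAcceptQueueSize(100);
                connector.setKeystore("etc/keystore");   // placeholder path
                connector.setPassword("changeit");       // placeholder password
                connector.setKeyPassword("changeit");    // placeholder password
                connector.setTruststore("etc/keystore"); // placeholder path
                connector.setTrustPassword("changeit");  // placeholder password
                connector.setUseDirectBuffers(false);    // the setting under test

                server.addConnector(connector);
                server.start();
                server.join();
            }
        }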

        Greg Wilkins added a comment -

        In my own testing, I've been running cometd-2 over SSL on 7.1.5. With 100 clients the base memory usage is 13MB, and I can always return to that after a GC. With SSL the memory usage while running is indeed greater (up to 60MB), but the heap can always be GC'd down to 13MB after a run. So there is no heap leak.

        There could be a JVM direct buffer leak, but I see no evidence of growth in the process size either.

        There have been some JVM bugs with direct buffer usage, so my recommendation is to not use them for SSL (where they have little benefit, as the SslEngine needs to write directly to the buffer... they are probably detrimental).

        Raman Gupta added a comment -

        Tried using server-side useDirectBuffers=false with the CometD2 chat load test. RSS seemed to stabilize around 1.1G, and everything seems OK other than the GC overhead being way higher than with direct buffers (the overhead was around 6.4%, which is unacceptable).

        FYI, other tests I just ran for comparison:

        1) useDirectBuffers=false, regular non-SSL SelectChannelConnector: RSS seemed to stabilize around 1.1G, but was still climbing slowly – probably because the java heap size was growing rather than direct buffers. Full GCs much less often than with SSL on. Only one noticed in a test of about 10 minutes. GC overhead around 4%.

        2) useDirectBuffers=true, regular non-SSL SelectChannelConnector: RSS stabilizes at less than 1G, again still climbing slowly with the java heap size. GC overhead around 2%.

        So the only combination that seems to cause the out-of-control memory growth is SSL on, directBuffers=true.

        Raman Gupta added a comment -

        In the test you did with 100 clients, what message size did you use? I noticed that the direct buffer leak with SSL turned on is only noticeable when the message size is much larger than the default, say 1000 bytes or greater. Also, I am testing with many more clients than 100, say 2000-5000 – 100 clients probably isn't driving enough load to really notice an increase in the process size unless you run the test for a while.

        I did compare SSL with and without direct buffers using the Cometd2 chat demo on 7.1.5 with a smaller test (1000 clients, 5000 batches) so that it would complete successfully with direct buffers on. The results show that for smaller loads, SSL without direct buffers is indeed slightly faster.

        However, for the longer test with more clients and larger messages, I'm concerned about the GC overhead when direct buffers are enabled, which seemed to be very high (but of course I can't compare with directBuffers=true because that test always craps out, unless the message size is small).

        Tomorrow, I'll go back to testing my own application with useDirectBuffers=false and see what kind of results I get on the larger message sizes. With the smaller message size the memory problem isn't noticeable – with SSL and direct buffers enabled, even with 2500-3500 clients everything works perfectly fine.

        Simone Bordet added a comment -

        I think that this bug is involved here: http://bugs.sun.com/view_bug.do?bug_id=6210541
        I will do more testing to confirm; for now I just wanted to have a reference to that bug to avoid forgetting it.
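
        That JVM behavior can be reproduced outside Jetty with a short sketch (illustrative only; the 1MB buffer size is arbitrary). Each iteration allocates a direct buffer and immediately drops the reference; the tiny heap stub creates almost no heap pressure, so Full GCs are rare while native memory accumulates:

        import java.nio.ByteBuffer;

        // Sketch of the behavior in Sun bug 6210541. Run with a large -Xmx and
        // -XX:+DisableExplicitGC (which also disables the allocator's internal
        // System.gc() fallback) to hit OutOfMemoryError quickly even though the
        // heap stays nearly empty.
        public class DirectBufferChurn
        {
            public static void main(String[] args)
            {
                for (int i = 0; ; i++)
                {
                    ByteBuffer.allocateDirect(1024 * 1024); // native memory, tiny heap stub
                    if (i % 1000 == 0)
                        System.out.println("allocated " + i + " buffers");
                }
            }
        }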

        Raman Gupta added a comment -

        Simone, I think that makes sense. The behavior I am seeing matches – i.e. if the JVM is sized with a smaller ms/mx, memory pressure increases and Full GCs run more often. When a Full GC runs, the direct buffer stubs in heap memory that are garbage are collected, and therefore the underlying native memory is released as well.

        Note that I normally run my application with ms/mx of 1g/2g. With this configuration, Full GCs happen in the app about every 20-25 minutes. Even with the SSL connector configured with useDirectBuffers=false, I still get an out-of-memory error after a few minutes.

         
        [2010-08-05 23:59:22,895][qtp798740-295][WARN ][org.eclipse.jetty.util.log] handle failed
        java.lang.OutOfMemoryError: null
                at sun.misc.Unsafe.allocateMemory(Native Method) ~[na:1.6.0_20]
                at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:99) ~[na:1.6.0_20]
                at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
                at org.eclipse.jetty.io.nio.DirectNIOBuffer.<init>(DirectNIOBuffer.java:46) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.server.ssl.SslSelectChannelConnector$1.newBuffer(SslSelectChannelConnector.java:621) ~[jetty-server-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.io.ThreadLocalBuffers.getBuffer(ThreadLocalBuffers.java:95) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.io.nio.SslSelectChannelEndPoint.needInBuffer(SslSelectChannelEndPoint.java:94) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.io.nio.SslSelectChannelEndPoint.unwrap(SslSelectChannelEndPoint.java:624) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.io.nio.SslSelectChannelEndPoint.fill(SslSelectChannelEndPoint.java:298) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:289) ~[jetty-http-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) ~[jetty-http-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:424) ~[jetty-server-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) ~[jetty-io-7.1.5.v20100705.jar:7.1.5.v20100705]
                at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436) [jetty-util-7.1.5.v20100705.jar:7.1.5.v20100705]
                at java.lang.Thread.run(Thread.java:619) [na:1.6.0_20]
        

        It seems that SslSelectChannelConnector is still allocating direct buffers at line 621 even though useDirectBuffers=false.

        The workaround, as stated previously, is to size the JVM smaller so that Full GCs happen more often. This works perfectly well but is sub-optimal.

        David Phillips added a comment -

        SslSelectChannelConnector always allocates DIRECT byte buffers on a per-thread basis despite setting
        <Set name="useDirectBuffers">false</Set>
        in the config file.

        ThreadLocalBuffers buffers = new ThreadLocalBuffers()
        {
            @Override
            protected Buffer newBuffer(int size)
            {
                // TODO indirect?
                return new DirectNIOBuffer(size);
            }

            @Override
            protected Buffer newHeader(int size)
            {
                // TODO indirect?
                return new DirectNIOBuffer(size);
            }

            @Override
            protected boolean isHeader(Buffer buffer)
            {
                return true;
            }
        };
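
        What honoring the flag would presumably look like is a one-line change per factory method; a sketch, assuming this code stays inside SslSelectChannelConnector (so getUseDirectBuffers() is in scope) and uses the heap-backed IndirectNIOBuffer from org.eclipse.jetty.io.nio:

        ThreadLocalBuffers buffers = new ThreadLocalBuffers()
        {
            @Override
            protected Buffer newBuffer(int size)
            {
                // Honor the connector's useDirectBuffers setting.
                return getUseDirectBuffers() ? new DirectNIOBuffer(size) : new IndirectNIOBuffer(size);
            }

            @Override
            protected Buffer newHeader(int size)
            {
                return getUseDirectBuffers() ? new DirectNIOBuffer(size) : new IndirectNIOBuffer(size);
            }

            @Override
            protected boolean isHeader(Buffer buffer)
            {
                return true;
            }
        };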

        Greg Wilkins added a comment -

        I've fixed those TODOs for Jetty 7.2

        Greg Wilkins added a comment -

        Simone did an analysis that indicates part of this problem comes down to the poor hit/miss ratio of the ThreadLocalBuffers implementation, resulting in too many direct buffers being discarded for no purpose.

        At his prompting, I've completely reviewed and refactored this code:

        + There is now a static BuffersFactory class that allows different Buffers implementations to be used.
        + The ThreadLocalBuffers and HttpBuffers classes are no longer abstract; instead of using overridden methods to pick buffer types, the buffer types are now constructor-injected into the Buffers instance.
        + There is a new PooledBuffers class that uses concurrent queues for headers, buffers, and others, with a maxSize applied to all. Buffers of arbitrary sizes are stored in a single queue, and buffers are consumed until one of the right size is found. This makes sure that no strange-sized buffers are put in the pool and never taken out (a simplified sketch of this idea follows below).

        Check-in coming soon.
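
        A minimal sketch of the pooling idea from the third point above (hypothetical and heavily simplified relative to the actual PooledBuffers class; only the bounded queue with size-checked reuse is shown, using plain byte arrays instead of Jetty's Buffer type):

        import java.util.concurrent.ConcurrentLinkedQueue;
        import java.util.concurrent.atomic.AtomicInteger;

        // Hypothetical, simplified sketch: a bounded concurrent queue of
        // buffers; wrong-sized buffers taken from the queue are discarded
        // rather than re-pooled, so the pool cannot fill with buffers that
        // nobody ever asks for.
        public class SimpleBufferPool
        {
            private final ConcurrentLinkedQueue<byte[]> pool = new ConcurrentLinkedQueue<byte[]>();
            private final AtomicInteger size = new AtomicInteger();
            private final int maxSize;

            public SimpleBufferPool(int maxSize)
            {
                this.maxSize = maxSize;
            }

            public byte[] acquire(int wantedSize)
            {
                byte[] buffer;
                // Consume until a buffer of the right size is found; others
                // are dropped and left to the garbage collector.
                while ((buffer = pool.poll()) != null)
                {
                    size.decrementAndGet();
                    if (buffer.length == wantedSize)
                        return buffer;
                }
                return new byte[wantedSize]; // pool miss: allocate fresh
            }

            public void release(byte[] buffer)
            {
                if (size.incrementAndGet() <= maxSize)
                    pool.offer(buffer);
                else
                    size.decrementAndGet(); // pool full: discard
            }
        }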

        Greg Wilkins added a comment -

        Committed r2885.

        Probably needs more exposure of the max pool size so that it can be configured.
        If the pool size is set to -1, then the ThreadLocalBuffers impl is used.

        Simone Bordet added a comment -

        I tested the latest code from Greg; direct memory usage is now stable and low, and the new Buffers implementation pools and reuses buffers efficiently.


          People

          • Assignee:
            Simone Bordet
          • Reporter:
            Raman Gupta
          • Votes:
            0
          • Watchers:
            0
