Jetty / JETTY-999

Number of file handles grows with time

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: 6.1.16
    • Fix Version/s: None
    • Component/s: HTTP, NIO
    • Labels:
      None
    • Environment:
      Linux xxx 2.6.16.54-0.2.3-smp #1 SMP Thu Nov 22 18:32:07 UTC 2007 x86_64 x86_64 x86_64 GNU/Linux
    • Number of attachments:
      1

      Description

      Dear Jetty developers,
      We found an issue in our project which appears to be Jetty related. Can you help us understand what we can do about it, and whether it is possible to solve?

      In short, we have a simple Axis web service hosted in Jetty (6.1.5). Our customers use C# clients to connect and retrieve data from our service. While a client retrieves data from our service with a lot of requests, Jetty sometimes, completely unpredictably, throws an exception (it is visible only if debug logging is enabled for Jetty).

      2009-04-03 14:51:20.254::WARN: /services/MyService
      java.nio.channels.CancelledKeyException
      at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
      at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
      at org.mortbay.io.nio.SelectChannelEndPoint.updateKey(SelectChannelEndPoint.java:311)
      at org.mortbay.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:265)
      at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:540)
      at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:567)
      at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:907)
      at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:628)
      at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:575)
      at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:202)
      at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:263)
      at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:106)
      at java.io.OutputStreamWriter.write(OutputStreamWriter.java:190)
      at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
      at java.io.BufferedWriter.write(BufferedWriter.java:212)
      at java.io.Writer.write(Writer.java:140)
      at org.apache.axis.SOAPPart.writeTo(SOAPPart.java:277)
      at org.apache.axis.Message.writeTo(Message.java:440)
      at org.apache.axis.transport.http.AxisServlet.sendResponse(AxisServlet.java:1018)
      at org.apache.axis.transport.http.AxisServlet.doPost(AxisServlet.java:895)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:767)
      at org.apache.axis.transport.http.AxisServletBase.service(AxisServletBase.java:339)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:860)
      at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
      at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:362)
      at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
      at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
      at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
      at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
      at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
      at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
      at org.mortbay.jetty.handler.StatisticsHandler.handle(StatisticsHandler.java:89)
      at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
      at org.mortbay.jetty.Server.handle(Server.java:309)
      at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:506)
      at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:844)
      at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:644)
      at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:205)
      at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:381)
      at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:396)
      at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
      2009-04-03 14:51:20.255::DEBUG: Response already committed for handling java.nio.channels.CancelledKeyException

      After this exception happens, the client receives an incomplete response: the XML we sent from the server arrives truncated by an EOF. From the network-traffic point of view, Jetty sends the response with both the PSH and ACK flags set (usually it sends only ACKs).

      After this problem was found, we tried the latest version of Jetty (6.1.16) and were happy to find that the bug does not reproduce with this version.

      However, after some time we noticed that the number of file handles grows slowly (after one day under usual load we saw about 2K file handles, while version 6.1.5 stays at around 300). So now we are a little puzzled about which Jetty version to use. I tried version 6.1.11: the bug with interrupted responses still reproduces, while the handles do not grow (it behaves the same way as 6.1.5).

      There are several oddities about the growing handles:
      We had a handle leak before, and Java crashed when the number reached some critical value (~1K or 2K). Now we clearly have more than that, and the system keeps working without problems.
      The handles themselves look somewhat strange in the lsof output:

      lsof -p <process-pid>

      COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
      //some ordinary files
      java 10426 zrinst mem REG 253,3 111970 34599 /data/zephyr/jdk/jdk1.6.0_11/jre/lib/amd64/libnet.so
      ...
      //some IPv files/sockets
      java 10426 zrinst 216u IPv6 2730598894 TCP *:opsession-srvr (LISTEN)
      ...
      //and the ones which are growing
      java 10426 zrinst 765u unix 0xffff81041a651a40 2730601387 socket
      java 10426 zrinst 766u unix 0xffff81041a651a40 2730601387 socket
      java 10426 zrinst 767u unix 0xffff81041a651a40 2730601387 socket
      java 10426 zrinst 768u unix 0xffff81041a651a40 2730601387 socket
      java 10426 zrinst 769u unix 0xffff81041a651a40 2730601387 socket
      java 10426 zrinst 770u unix 0xffff81041a651a40 2730601387 socket

      Is it possible to understand what is wrong and to recommend which Jetty version to use?
      I can provide more information about the issue if you need it.

        Activity

        Greg Wilkins added a comment -

        Jetty 6.1.16 is definitely the version to use.
        I have not seen any evidence of FD usage growth with 6.1.16, at least not to the level of being a problem.

        What does netstat report? Are there connections to match those FDs?
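
        As a rough sketch of how those two numbers could be correlated (the FdAndSocketCount class name is illustrative, assuming a Linux host where /proc/self/fd and the netstat command are available), the following counts the JVM's open file descriptors and the ESTABLISHED and TIME_WAIT TCP connections reported by netstat:

        import java.io.BufferedReader;
        import java.io.File;
        import java.io.InputStreamReader;

        // Illustrative diagnostic helper: compares this JVM's open file descriptors
        // (entries under /proc/self/fd) with the TCP connection states reported by
        // netstat, so FD growth can be checked against real connections over time.
        public class FdAndSocketCount {
            public static void main(String[] args) throws Exception {
                // Each entry under /proc/self/fd is one open descriptor of this process.
                String[] fds = new File("/proc/self/fd").list();
                System.out.println("open fds: " + (fds == null ? "n/a" : String.valueOf(fds.length)));

                // Count TCP connections per state from "netstat -tan" output.
                Process p = new ProcessBuilder("netstat", "-tan").start();
                BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
                int established = 0, timeWait = 0;
                for (String line; (line = r.readLine()) != null; ) {
                    String s = line.trim();
                    if (s.endsWith("ESTABLISHED")) established++;
                    if (s.endsWith("TIME_WAIT")) timeWait++;
                }
                p.waitFor();
                System.out.println("ESTABLISHED: " + established + ", TIME_WAIT: " + timeWait);
            }
        }

        Running this (or the equivalent lsof and netstat commands) periodically under load shows whether the growing descriptors correspond to real connections.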

        Greg Wilkins added a comment -

        The large number of TIME_WAIT connections indicates that this is probably due to a large number of connections
        being created and then closed.

        So firstly, you should try to use persistent connections. Investigate why your connections are being closed!

        Do you have setSoLingerTime configured on your Jetty connector? By default, Jetty will set sockets not to linger (in TIME_WAIT).
        Try setting that to -1, or at least to a very short time.

        You can also configure Linux to use a short TIME_WAIT period.

        I have marked this issue resolved for now. Please re-open if the above steps do not help with the problem.
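
        For reference, a minimal sketch of an embedded Jetty 6 server with that setting applied (the port and the value 0 are examples only; in a jetty.xml or jboss-service.xml connector definition the equivalent is <Set name="soLingerTime">0</Set>, as a later comment shows):

        import org.mortbay.jetty.Server;
        import org.mortbay.jetty.nio.SelectChannelConnector;

        // Embedded Jetty 6 setup showing where soLingerTime is configured on the
        // NIO connector. Greg suggests -1 or a very short time; Daryl reports below
        // that 0 is what actually worked on a Linux 2.6 kernel.
        public class LingerExample {
            public static void main(String[] args) throws Exception {
                Server server = new Server();

                SelectChannelConnector connector = new SelectChannelConnector();
                connector.setPort(8080);       // example port only
                connector.setSoLingerTime(0);  // linger value discussed in this issue
                server.addConnector(connector);

                server.start();
                server.join();
            }
        }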

        Daryl Beattie added a comment -

        Note that you will have to set the soLingerTime to 0, as setting it to -1 doesn't work (at least it didn't for us, with a Linux 2.6 kernel).

        Daryl Beattie added a comment -

        Also, you may see this exception caused by the same issue (I'm putting the stack-trace in here so that people will find it when searching for it):

        Caused by: java.net.SocketException: Too many open files
        at java.net.Socket.createImpl(Socket.java:389)
        at java.net.Socket.connect(Socket.java:518)
        at java.net.Socket.connect(Socket.java:470)
        at sun.net.NetworkClient.doConnect(NetworkClient.java:157)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:388)
        at sun.net.www.http.HttpClient.openServer(HttpClient.java:523)
        at sun.net.www.http.HttpClient.<init>(HttpClient.java:231)
        at sun.net.www.http.HttpClient.New(HttpClient.java:304)
        at sun.net.www.http.HttpClient.New(HttpClient.java:321)
        at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:830)
        at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:782)
        at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:707)
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:951)
        at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:367)

        Greg Wilkins added a comment -

        Daryl,

        Are you saying that you think there is a Jetty bug?
        Do you believe that you have more sockets in TIME_WAIT than you should?

        Can you gather some statistics from your server about the number and duration of connections, and then relate that to the number of file descriptors in use and in TIME_WAIT?

        Daryl Beattie added a comment -

        Hi Greg,

        Actually, I think this is a problem in the default configuration of the Linux machines or in the default configuration of NIO. In most Linux distros (all that I have seen), the default file limit is set low (usually 100). Also, the socket linger time is probably defaulted to something large like 30s (the RFC says it can be up to 240s; http://tools.ietf.org/html/rfc793). Due to these two common configuration defaults in Linux, when NIO is using sockets the OS may hold them in TIME_WAIT (during which time they are counted as "open files") long after Jetty has closed the connection. This means that it is very easy for a server to hit the "too many open files" socket exception.

        So I do not consider it a "Jetty bug", no. If Jetty is trying to set the socket linger time to 0 through NIO, then it is indeed failing to do so, and that would be a bug (in either Jetty or NIO). For us, the only solution was to add this to our SelectChannelConnector in jboss-service.xml (we're using Jetty in JBoss): <Set name="soLingerTime">0</Set>

        Yes, we definitely have more sockets in TIME_WAIT than we should (or than we'd like). Let's say we have a limit of 10,000 open files in Linux, and we have lots of short-lived connections. If we have 3,000 simultaneous short-lived open connections at any time, we may see "too many open files" exceptions in our logs. When we check lsof, it will happily report 1,000 or so other open files (jars, static resources, etc.), which is clearly fewer than 10,000. But if we use netstat, we will find that we have 3,000 open sockets and perhaps 6,000 in the TIME_WAIT state, meaning the 1,000 files, 3,000 sockets, and 6,000 TIME_WAIT sockets together hit our limit.

        About the statistics: I no longer have our exact figures. I can tell you that we had about one third of our file limit in open HTTP connections, and the rest were in TIME_WAIT. Thus, if a server had 500 open connections, it would throw these socket errors if our ulimit was set to 1500. (It also depended on the number of classes loaded by the classloader, obviously.) Setting the socket linger time to 0 (in Jetty's config) solved this problem completely.

        To make Jetty more usable by the general public, I suggest adding a socket linger time of 0 as a commented-out configuration option in the default Jetty configuration, with a comment next to it saying that users should probably enable it if they are running on Linux and experience "too many open files" socket exceptions. The same commented-out configuration section could be added to the default jboss-service.xml file in contrib\jboss\src\main\resources\META-INF. Although this may not be a Jetty bug, such a configuration comment may greatly help the users of Jetty, who don't care where the bug comes from; all they think is "Jetty isn't working". Clearly, the real solution to this problem is to configure the OS to have a higher open file limit and to not hold onto TIME_WAIT connections for so long.


          People

          • Assignee:
            Greg Wilkins
          • Reporter:
            Kirill Lastovirya
          • Votes:
            0
          • Watchers:
            2
