Jetty / JETTY-1270

100% of CPU used in WebSocketConnection.handle()

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 7.1.6
    • Fix Version/s: None
    • Component/s: Servlet
    • Labels:
      None
    • Environment:
      Sun Solaris 11. Jetty 7.1.6.
    • Number of attachments:
      0

      Description

      I'm using WebSockets in production. Over time threads will start to hang in the following place:

      WebSocketConnection.handle() line: 168
      SelectChannelEndPoint.run() line: 506
      QueuedThreadPool$2.run() line: 436
      Thread.run() line: 619

      A new thread hangs about once a day, and after a few days I'm forced to restart my server.
      Please let me know if there's any other info I can provide to help diagnose this bug!
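
      One way to confirm which threads are spinning is the standard java.lang.management API. The following is a minimal sketch, not from the report: the class name and the 0.9s-of-CPU-per-1s-of-wall-clock threshold are invented for illustration, and it must run inside the affected JVM (e.g. from a management servlet):

      import java.lang.management.ManagementFactory;
      import java.lang.management.ThreadInfo;
      import java.lang.management.ThreadMXBean;

      public class HotThreadFinder {
          public static void main(String[] args) throws InterruptedException {
              ThreadMXBean mx = ManagementFactory.getThreadMXBean();
              long[] ids = mx.getAllThreadIds();
              long[] before = new long[ids.length];
              for (int i = 0; i < ids.length; i++) {
                  before[i] = mx.getThreadCpuTime(ids[i]); // -1 if the thread has died
              }
              Thread.sleep(1000L);
              for (int i = 0; i < ids.length; i++) {
                  long after = mx.getThreadCpuTime(ids[i]);
                  // more than 0.9s of CPU in a 1s window: effectively spinning
                  if (before[i] >= 0 && after >= 0 && after - before[i] > 900000000L) {
                      ThreadInfo info = mx.getThreadInfo(ids[i], 32);
                      if (info == null) continue;
                      System.out.println("Busy thread: " + info.getThreadName());
                      for (StackTraceElement e : info.getStackTrace()) {
                          System.out.println("    at " + e);
                      }
                  }
              }
          }
      }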

          Activity

          Andrew Lee added a comment -

          Greg - We had to take WebSockets out of our production code; our server just couldn't stay up. Unfortunately, so far I have only been able to reproduce this issue in production.

          My usage looks like:
          - Sending messages of varying size (from a few bytes up to maybe 50k)
          - Receiving small messages (< 1k)
          - Sending "lots" of messages; a busy user would need to be sent several per second, I think.

          If I can reproduce this in testing I'll try what you describe about disconnecting the server from the network (it's a virtualized server, though, so I'll have to figure out how to do that). If I manage it I'll report back, though it may be a while.
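
          A rough sketch of that traffic shape, assuming the early Jetty 7 WebSocket.Outbound API and its sendMessage(String) method; the class name, payload contents, and pacing are invented for illustration:

          import java.io.IOException;
          import org.eclipse.jetty.websocket.WebSocket;

          public class BusyUserPusher {
              private final WebSocket.Outbound outbound; // saved from onConnect()

              public BusyUserPusher(WebSocket.Outbound outbound) {
                  this.outbound = outbound;
              }

              // Invoked several times per second for a busy user: a tiny
              // frame followed by a large one, mimicking the mix above.
              public void push() throws IOException {
                  outbound.sendMessage("tick");             // a few bytes
                  outbound.sendMessage(payload(50 * 1024)); // up to ~50k
              }

              private static String payload(int size) {
                  StringBuilder b = new StringBuilder(size);
                  for (int i = 0; i < size; i++) {
                      b.append('x');
                  }
                  return b.toString();
              }
          }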

          Andre Ben Hamou added a comment - edited

          I wonder if this relates to a bizarre bug I've been seeing...

          1. WS onConnect
          2. WS client sends a ~50-byte message
          3. WS client sends a ~500-byte message
          4. WS client sends a ~50-byte message
          5. WS client sends a ~100-byte message

          If the client is 'fast' (e.g. a snippet of JavaScript on localhost), the whole socket will block around steps 4-5 and only flush when the client disconnects (at which point the remaining messages are suddenly handed to my WebSocket subclass).

          Of interest is that the point at which the blocking occurs always seems to have a different thread ID from the previous step. Could both my problem and the one Andrew reported come down to an unsynchronized race hazard in something like WebSocketParser.parseNext()? Does the _state manipulation, for example, wrongly assume that a given thread will run to completion on receipt of new data before the next thread's contribution is read in?

          Additional note: if I make my test app respond with a ~50-byte message after every one it receives, the blockage no longer occurs - perhaps because the synchronized blocks on the generator are creating a mutex-by-proxy effect?
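
          To make the suspected hazard concrete, here is a hypothetical illustration (not the actual Jetty source; class, field, and constant names are invented) of how an unsynchronized _state field can wedge a frame parser:

          class FrameParser { // hypothetical, for illustration only
              private static final int STATE_OPCODE = 0;
              private static final int STATE_LENGTH = 1;
              private static final int STATE_DATA   = 2;

              private int _state = STATE_OPCODE; // neither volatile nor lock-guarded

              void parseNext(byte b) {
                  // Two selector threads delivering consecutive buffers can
                  // both read the same _state value before either writes its
                  // transition; one update is then lost, a length byte gets
                  // consumed as an opcode, and parsing stalls or spins.
                  switch (_state) {
                      case STATE_OPCODE:
                          _state = STATE_LENGTH;
                          break;
                      case STATE_LENGTH:
                          _state = STATE_DATA;
                          break;
                      case STATE_DATA:
                          _state = STATE_OPCODE;
                          break;
                  }
              }
          }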

          Andre Ben Hamou added a comment -

          Hiya - I wondered if there is a game plan for attacking this bug? It's bitten us again in production (we just came back from the weekend to discover most of our servers at max CPU with 0-2 active connections apiece). I could try to have a go myself, but I'm concerned that, given the complexity of the state machine and the need to understand the Jetty acceptor/handler lifecycle, I would be far less effective than the code's original author.

          Greg Wilkins added a comment -

          Well, we need a tame version of the bug that we can conduct scientific experiments on while in captivity.
          Failing that, we need some thread dumps - about 1s apart - of a suffering server.
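
          If attaching jstack <pid> once a second is awkward on the box, the same dumps can be captured in-process. A minimal sketch (the class name and the ten-dump count are invented, and it must run inside the affected JVM):

          import java.lang.management.ManagementFactory;
          import java.lang.management.ThreadInfo;
          import java.lang.management.ThreadMXBean;

          public class DumpLoop {
              public static void main(String[] args) throws InterruptedException {
                  ThreadMXBean mx = ManagementFactory.getThreadMXBean();
                  for (int i = 0; i < 10; i++) { // ten dumps, ~1s apart
                      System.out.println("=== dump " + i + " ===");
                      for (ThreadInfo t : mx.dumpAllThreads(false, false)) {
                          System.out.println("\"" + t.getThreadName() + "\" " + t.getThreadState());
                          for (StackTraceElement e : t.getStackTrace()) {
                              System.out.println("    at " + e);
                          }
                      }
                      Thread.sleep(1000L);
                  }
              }
          }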

          Note that we have fixed a couple of issues in Jetty's selector that may improve the WebSocket experience (we have also implemented draft 6 WebSockets).

          So it may be worthwhile testing 7.3.1 (to be released in a few days) and trying to replicate on that.

          Greg Wilkins added a comment -

          WebSocket support has been substantially reworked in 7.4.x, so I'm closing this one.

          Please open a new issue if you can reproduce on 7.4.1 or later.


            People

            • Assignee:
              Greg Wilkins
            • Reporter:
              Andrew Lee
            • Votes:
              0
            • Watchers:
              0

              Dates

              • Created:
              • Updated:
              • Resolved: