Jetty
  1. Jetty
  2. JETTY-920

Deadlock while disconnecting Client

    Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 6.1.14
    • Fix Version/s: 6.1.15.rc5
    • Component/s: Bayeux
    • Labels:
      None
    • Environment:
    • Number of attachments :
      1

      Description

      Replaced generated identifiers from attached deadlock.log with human-readable identifiers:
      "1255795388@qtp0-3431" => thread0
      "987957938@qtp0-3446" => thread1

      <0x00007fbff86e77d0> (a org.mortbay.cometd.ChannelImpl) => channel0
      <0x00007fbffef26790> (a org.mortbay.cometd.ChannelImpl) => channel1
      <0x00007fbffef22af8> (a org.mortbay.cometd.continuation.ContinuationClient) => client0

      Found one Java-level deadlock:

       thread0:
        waiting to lock monitor channel1 (a org.mortbay.cometd.ChannelImpl),
        which is held by thread1
      thread1:
        waiting to lock monitor channel0 (a org.mortbay.cometd.ChannelImpl),
        which is held by thread0

      Java stack information for the threads listed above:

      thread0 (server-side disconnect after ajax-clicking signout link):
      	at org.mortbay.cometd.ChannelImpl.doRemove(ChannelImpl.java:193)
      	waiting to lock <channel1>
      	Locks in acquired order: <channel0>
      
      thread1 (client-side disconnect):
      	at org.mortbay.cometd.ChannelImpl.doRemove(ChannelImpl.java:191)
      	waiting to lock <channel0>
      	Locks in acquired order: <client1> <client1> <channel1>

      Analysis:
      channel0 has channelId "/notifications"
      channel1 has channelId "/notifications/foo"

      client-side disconnect causes call to ClientImpl.remove(true) which than removes the now empty channel.

      server-side disconnect directly removes the channel (always only a single connected client)

      I simplified the stack into a Java-like representation:

      // client-side
      client.remove(false) {
      	_channel.unsubscribe(client) {
      		synchronized(_channel) { // "/notifications/foo"
      			_abstractBayeux.removeChannel(_channel) {
      				_rootChannel.doRemove(_channel) {
      					for (ChannelImpl child : _children.values()) {
      						child.doRemove(_channel) {
      							channelId = channel.getChannelId();
      							// deadlock
      							synchronized(child._children.get(channelId.getSegment(channelId.depth()-1))) { // "/notifications"
      								...
      							}
      						}
      					}
      				}
      			}
      		}
      	}
      }
      
      // server-side
      channelId = "/notifications/foo"
      abstractBayeux.removeChannel(channelId) {
      	channel = getChannel(channelId);
      	channel.remove() {
      		_rootChannel.doRemove(channel) {
      			channelId = channel.getChannelId();
      			synchronized (rootChannel) { // "/notification"
      				// deadlock
      				synchronized (_children.get(channelId.getSegment(channelId.depth()-1))) { // "/notifications/foo"
      
      				}
      			}
      		}
      	}
      }

      Now it's - kind of - easy to see that the channels are locked in different order. I was also able to verify this deadlock in the debugger. It's easiest to put a breakpoint into ChannelImpl.doRemove(channel) and to wait until both disconnect threads - client-side and server-side - are suspended. Then, resume both threads, server-side disconnect first. This will cause the deadlock that will also block any following thread that is trying to lock the "/notifications" channel - which was basically every user in our app.

      As a quick workaround, I'd suggest to remove all clients from a channel instead of the channel itself. This should avoid any deadlocks. In order to get consistent locking (and to save others from spending some hours debugging), I'd suggest to add a synchronized block in AbstractBayeux.removeChannel(String), i.e. to lock the channel that's going to be removed. Maybe somebody who knows the big picture can confirm this?

      1. deadlock.log
        3 kB
        Stefan Fussenegger

        Issue Links

          Activity

          Hide
          Greg Wilkins added a comment -

          snap!

          Show
          Greg Wilkins added a comment - snap!
          Hide
          Greg Wilkins added a comment -

          the remove client is now done outside of the sync block holding the client lock

          Show
          Greg Wilkins added a comment - the remove client is now done outside of the sync block holding the client lock

            People

            • Assignee:
              Unassigned
              Reporter:
              Stefan Fussenegger
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: