Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Critical
-
Resolution: Fixed
-
Affects Version/s: 6.1.14
-
Fix Version/s: 6.1.15.rc5
-
Component/s: Bayeux
-
Labels:None
-
Environment:Hidejetty-6.1.14
cometd-jetty-6.1.14
using jetty embedded
$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
Showjetty-6.1.14 cometd-jetty-6.1.14 using jetty embedded $ java -version java version "1.6.0_07" Java(TM) SE Runtime Environment (build 1.6.0_07-b06) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
-
Number of attachments :
Description
Replaced generated identifiers from attached deadlock.log with human-readable identifiers:
"1255795388@qtp0-3431" => thread0
"987957938@qtp0-3446" => thread1
<0x00007fbff86e77d0> (a org.mortbay.cometd.ChannelImpl) => channel0
<0x00007fbffef26790> (a org.mortbay.cometd.ChannelImpl) => channel1
<0x00007fbffef22af8> (a org.mortbay.cometd.continuation.ContinuationClient) => client0
Found one Java-level deadlock:
thread0: waiting to lock monitor channel1 (a org.mortbay.cometd.ChannelImpl), which is held by thread1 thread1: waiting to lock monitor channel0 (a org.mortbay.cometd.ChannelImpl), which is held by thread0
Java stack information for the threads listed above:
thread0 (server-side disconnect after ajax-clicking signout link): at org.mortbay.cometd.ChannelImpl.doRemove(ChannelImpl.java:193) waiting to lock <channel1> Locks in acquired order: <channel0> thread1 (client-side disconnect): at org.mortbay.cometd.ChannelImpl.doRemove(ChannelImpl.java:191) waiting to lock <channel0> Locks in acquired order: <client1> <client1> <channel1>
Analysis:
channel0 has channelId "/notifications"
channel1 has channelId "/notifications/foo"
client-side disconnect causes call to ClientImpl.remove(true) which than removes the now empty channel.
server-side disconnect directly removes the channel (always only a single connected client)
I simplified the stack into a Java-like representation:
// client-side client.remove(false) { _channel.unsubscribe(client) { synchronized(_channel) { // "/notifications/foo" _abstractBayeux.removeChannel(_channel) { _rootChannel.doRemove(_channel) { for (ChannelImpl child : _children.values()) { child.doRemove(_channel) { channelId = channel.getChannelId(); // deadlock synchronized(child._children.get(channelId.getSegment(channelId.depth()-1))) { // "/notifications" ... } } } } } } } } // server-side channelId = "/notifications/foo" abstractBayeux.removeChannel(channelId) { channel = getChannel(channelId); channel.remove() { _rootChannel.doRemove(channel) { channelId = channel.getChannelId(); synchronized (rootChannel) { // "/notification" // deadlock synchronized (_children.get(channelId.getSegment(channelId.depth()-1))) { // "/notifications/foo" } } } } }
Now it's - kind of - easy to see that the channels are locked in different order. I was also able to verify this deadlock in the debugger. It's easiest to put a breakpoint into ChannelImpl.doRemove(channel) and to wait until both disconnect threads - client-side and server-side - are suspended. Then, resume both threads, server-side disconnect first. This will cause the deadlock that will also block any following thread that is trying to lock the "/notifications" channel - which was basically every user in our app.
As a quick workaround, I'd suggest to remove all clients from a channel instead of the channel itself. This should avoid any deadlocks. In order to get consistent locking (and to save others from spending some hours debugging), I'd suggest to add a synchronized block in AbstractBayeux.removeChannel(String), i.e. to lock the channel that's going to be removed. Maybe somebody who knows the big picture can confirm this?
Issue Links
- is duplicated by
-
JETTY-921
Deadlock with cometd remove and unsubscribe all
-
snap!