Affects Version/s: 6.1.14
Fix Version/s: 6.1.15.rc5
using jetty embedded
$ java -version
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
Showjetty-6.1.14 cometd-jetty-6.1.14 using jetty embedded $ java -version java version "1.6.0_07" Java(TM) SE Runtime Environment (build 1.6.0_07-b06) Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
Number of attachments :
Replaced generated identifiers from attached deadlock.log with human-readable identifiers:
"1255795388@qtp0-3431" => thread0
"987957938@qtp0-3446" => thread1
<0x00007fbff86e77d0> (a org.mortbay.cometd.ChannelImpl) => channel0
<0x00007fbffef26790> (a org.mortbay.cometd.ChannelImpl) => channel1
<0x00007fbffef22af8> (a org.mortbay.cometd.continuation.ContinuationClient) => client0
Found one Java-level deadlock:
Java stack information for the threads listed above:
channel0 has channelId "/notifications"
channel1 has channelId "/notifications/foo"
client-side disconnect causes call to ClientImpl.remove(true) which than removes the now empty channel.
server-side disconnect directly removes the channel (always only a single connected client)
I simplified the stack into a Java-like representation:
Now it's - kind of - easy to see that the channels are locked in different order. I was also able to verify this deadlock in the debugger. It's easiest to put a breakpoint into ChannelImpl.doRemove(channel) and to wait until both disconnect threads - client-side and server-side - are suspended. Then, resume both threads, server-side disconnect first. This will cause the deadlock that will also block any following thread that is trying to lock the "/notifications" channel - which was basically every user in our app.
As a quick workaround, I'd suggest to remove all clients from a channel instead of the channel itself. This should avoid any deadlocks. In order to get consistent locking (and to save others from spending some hours debugging), I'd suggest to add a synchronized block in AbstractBayeux.removeChannel(String), i.e. to lock the channel that's going to be removed. Maybe somebody who knows the big picture can confirm this?