Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Cannot Reproduce
-
Affects Version/s: 6.1.15.rc5
-
Component/s: NIO
-
Labels:None
-
Environment:CentOS 5.x, 32/64-bit, 32/64-bit JDK 1.6.0_12 and older, Kernel 2.6.18-92.1.22.el5 and also older versions, Jetty 6.1.14, 6.1.15 trunk
-
Number of attachments :
Description
The problem itself is described at http://markmail.org/message/zudwd2ve5l2qijbw . There has been no reply more than a month.
We are having exactly the same problem in a similar environment.
It can be caused by various JDK/NIO bugs which are still not fixed in the JDK 1.6.0_12, like: http://bugs.sun.com/view_bug.do?bug_id=6693490 .
At some point Acceptor thread starts to use 100% CPU and doesn't exit this state until application is restarted. It appears that it's spinning in the tight loop. Some other applications provide workaround for this problem like recycling the acceptor and canceling the keys when it happens, so probably the same approach should be implemented in Jetty. I'm not a big expert in this area.
I've contacted Chris directly and he told me that they are still having the same problem and found no solutions/workarounds. Interesting thing is that our applications have some common components: they both use Jetty and MINA framework in the same application, so it can be related as MINA has its own acceptors which may somehow affect Jetty NIO acceptors. MINA versions are 1.x and 2.x, so could be that the problem is not related to the MINA as well.
In our application we had to switch Jetty to the old SocketConnector instead of the SelectChannelConnector. This fixed the problem, but it may be not an option for those using Continuations and lots of idle connections.
MINA itself doesn't seem to be affected by this problem, so it could be Jetty/JDK/OS specific.
Attached are the screenshots from the profiler before and after the problem.
The most common traces from this thread using 100% CPU (50% on the screenshots as it's a dual core machine):
Stacks at 1:27:10
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 1:22
sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.select(long)
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Stacks at 1:28:39
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 2:33
sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.selectNow()
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Stacks at 1:36:09
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 9:00
sun.misc.Unsafe.getInt(long)
sun.nio.ch.NativeObject.getInt(int)
sun.nio.ch.EPollArrayWrapper.getDescriptor(int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.selectNow()
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Activity
| Field | Original Value | New Value |
|---|---|---|
| Assignee | Greg Wilkins [ gregw ] |
| Remaining Estimate | 0 minutes [ 0 ] | |
| Original Estimate | 0 minutes [ 0 ] |
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Resolution | Fixed [ 1 ] |
| Attachment | rso-stack.txt [ 43918 ] |
| Attachment | jrockit-recording.zip [ 45132 ] |
| Attachment | jrockit-dump.txt [ 45136 ] |
| Fix Version/s | 6.1.22 [ 15722 ] | |
| Fix Version/s | 7.0.1 [ 15875 ] |
| Resolution | Fixed [ 1 ] | |
| Status | Resolved [ 5 ] | Reopened [ 4 ] |
| Status | Reopened [ 4 ] | Closed [ 6 ] |
| Resolution | Cannot Reproduce [ 5 ] |
The jetty select manager does have handling for JVM bugs that cause 100%.
You can see this at http://jetty.mortbay.org/jetty/jetty-7/xref/org/mortbay/io/nio/SelectorManager.html#408
and it was put in place for http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
This code attempts to detect a select that returns quickly with no keys selected, and if so it
cancels all keys with 0 interestedops.
Perhaps for this bug, we should cancel and remake all keys? perhaps trash the selector and
start again?
Anyway, I just tried to reproduce on my system for 10 minutes and have not been able to.
I'll keep trying, but it would be really good if you are able to reproduce easily if you could
compile Jetty with a println in that loop detection to see if it is being triggered.
Unfortunately we are too deep into the 6.1.15 cycle to hold the release for this.