Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Cannot Reproduce
-
Affects Version/s: 6.1.15.rc5
-
Component/s: NIO
-
Labels:None
-
Environment:CentOS 5.x, 32/64-bit, 32/64-bit JDK 1.6.0_12 and older, Kernel 2.6.18-92.1.22.el5 and also older versions, Jetty 6.1.14, 6.1.15 trunk
-
Number of attachments :5
Description
The problem itself is described at http://markmail.org/message/zudwd2ve5l2qijbw . There has been no reply more than a month.
We are having exactly the same problem in a similar environment.
It can be caused by various JDK/NIO bugs which are still not fixed in the JDK 1.6.0_12, like: http://bugs.sun.com/view_bug.do?bug_id=6693490 .
At some point Acceptor thread starts to use 100% CPU and doesn't exit this state until application is restarted. It appears that it's spinning in the tight loop. Some other applications provide workaround for this problem like recycling the acceptor and canceling the keys when it happens, so probably the same approach should be implemented in Jetty. I'm not a big expert in this area.
I've contacted Chris directly and he told me that they are still having the same problem and found no solutions/workarounds. Interesting thing is that our applications have some common components: they both use Jetty and MINA framework in the same application, so it can be related as MINA has its own acceptors which may somehow affect Jetty NIO acceptors. MINA versions are 1.x and 2.x, so could be that the problem is not related to the MINA as well.
In our application we had to switch Jetty to the old SocketConnector instead of the SelectChannelConnector. This fixed the problem, but it may be not an option for those using Continuations and lots of idle connections.
MINA itself doesn't seem to be affected by this problem, so it could be Jetty/JDK/OS specific.
Attached are the screenshots from the profiler before and after the problem.
The most common traces from this thread using 100% CPU (50% on the screenshots as it's a dual core machine):
Stacks at 1:27:10
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 1:22
sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.select(long)
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Stacks at 1:28:39
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 2:33
sun.nio.ch.EPollArrayWrapper.epollWait(long, int, long, int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.selectNow()
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Stacks at 1:36:09
21684929@qtp-7760420-1 - Acceptor0 SelectChannelConnector@0.0.0.0:8000 [RUNNABLE] CPU time: 9:00
sun.misc.Unsafe.getInt(long)
sun.nio.ch.NativeObject.getInt(int)
sun.nio.ch.EPollArrayWrapper.getDescriptor(int)
sun.nio.ch.EPollArrayWrapper.poll(long)
sun.nio.ch.EPollSelectorImpl.doSelect(long)
sun.nio.ch.SelectorImpl.lockAndDoSelect(long)
sun.nio.ch.SelectorImpl.selectNow()
org.mortbay.io.nio.SelectorManager$SelectSet.doSelect()
org.mortbay.io.nio.SelectorManager.doSelect(int)
org.mortbay.jetty.nio.SelectChannelConnector.accept(int)
org.mortbay.jetty.AbstractConnector$Acceptor.run()
org.mortbay.thread.QueuedThreadPool$PoolThread.run()
-
- jrockit-dump.txt
- 09/Oct/09 7:06 AM
- 120 kB
- Daniel Peters
-
Hide
- jrockit-recording.zip
- 09/Oct/09 5:58 AM
- 1.44 MB
- Daniel Peters
-
- recording2972822076224932408.jra 1.46 MB
-
- rso-stack.txt
- 20/Aug/09 11:01 AM
- 125 kB
- Damian Bradicich
-
- normal.png
- 23 kB
- 04/Mar/09 3:30 AM
-
- problem.png
- 35 kB
- 04/Mar/09 3:30 AM
Activity
I did what you have suggested and just after 5 minutes of running the log got filled with 2MB of:
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
INFO | jvm 1 | 2009/03/05 19:11:55 | 1236269515093] JVM BUG: 6403933 - 1000
This is for line
System.err.println(now + "] JVM BUG: 6403933 - " + wait);
added just after
if (_jvmBug++>5)
Obviously, the current workaround doesn't work.
Rabbit HTTP proxy has the following workaround code for this JDK bug which seems to work fine:
http://pastie.org/408410 .
I'll try applying it to Jetty and see how it works.
Unfortunately, that code it didn't help. Probably the selector needs to be destroyed and created from scratch when it happens. Will try other workarounds.
what operating system are you using? on linux with java 1.6.0_10 the bug is detected frequently and the work around works.
at least the good news is that the bug is detected!
bad news is that it is detected frequently enough on my system that I don't really want a draconian fix of killing the entire selector, as that will slow things up a lot..
I'll look at those links now.
I just tried jdk1.6.0_12 and still not 100%.
The problem occurs more frequently, but the workaround does seam to work.
So I still can't reproduce on my system.
Operating system, kernel and JDK versions are specified in the Environment field of this bug report. You can also find this information in the original report to the jetty-support newsgroup which is linked in the first sentence of this report.
I can trigger the workaround code even on Windows by quickly refreshing a Servlet in the browser (Chrome).
We're experiencing this issue as well with 6.1.15 and JDK 1.6.0. Same OS. It's less frequent than with the older version of Jetty we were running (1.6.2.rc1, I believe), but it still occurs..
Serge and Jason,
I'm going to spend time on this bug this week. Can either of you schedule some time to
try out some fixes (as I still can't reproduce)
cheers
Sure, I'll be glad to try any fix.
BTW, this issue also affects MINA so it's definitely a JDK bug, but it's much easier for us to reproduce it with Jetty.
You may want to check this discussion: http://www.nabble.com/Comet-handler-starts-terminating-TCP-connections-with-RST--td20337445.html . Grizzly guys already fixed similar problem. Their workaround is to trash the selector, you may look at the code if you like.
Jeanfrancois Arcand, who provided the workaround for Grizzly told me that JDK guys do not plan to fix it in 1.6 branch at all and claimed that it's fixed in 1.7 (though, didn't specify in which build). I've tried the latest available JDK 1.7 build (b50) and it also has the same problem for me. So, we either have some different bug or it's not fixed in 1.7 b50 yet.
I've submitted a similar issue to the MINA project: https://issues.apache.org/jira/browse/DIRMINA-678 .
Sun has a patch for this bug, more information is available in the MINA issue linked above.
I just tried jdk 1.6.0_13-b03 and I was unable to detect any select busy loops.
Note that I have not been able to ever reproduce the 100% failure, only the less severe case that
can be fixed by cancelling the key.
But considering that the patch was available against _12, it would be reasonable to think that _13 has the fix.
Could all please try this?
thanks
I'm seeing this issue on a production server. A few specifics:
- jdk1.6.0_13-b03 (latest stable version)
- Linux 2.6.15-29-server i686
- 4 core Intel(R) Xeon E5320
- Using org.mortbay.thread.QueuedThreadPool, Jetty jetty-6.1.15
When I restart my application Java seems well-behaved for a number of minutes before the process starts using a ton of CPU. There is no heavy processing in our application, it's very much a typical I/O bound web application.
Any additional pointers are greatly appreciated.
thanks!
hernan
I just read the comments on the issue Serge opened against MINA (https://issues.apache.org/jira/browse/DIRMINA-678) and there's some good information there. Sounds like JDK 1.6.0_13 doesn't have the fix yet and folks are working around it in various ways until it becomes available (jdk 1.6.0_14 or _15).
thnx...
hernan
Dang, so 1.6.0_13 still does not fix it.
I'm still not inclined to fix this in jetty since the extra work needed to detect, throw away the selector and re-register is probably going to slow the server more than just running on 1.5.
If you really want a fix for this and can't run 1.5, please vote for this issue and we'll reconsider.
I've checked in an attempted work around for this bug into jetty-7 at eclipse.
Could somebody who can reproduce the bug is able to checkout and build
http://dev.eclipse.org/svnroot/rt/org.eclipse.jetty/jetty/trunk/
and then test it?
If the fix triggers you should see "SUN BUG WORKAROUND!!!!" on stderr
and the CPU should not go to 100%.
If this can be shown to work, I'll backport to jetty-6
Thanks for the response, Greg. I've checked out and built the latest Jetty-7 code from the eclipse repo as you suggested. The code I'm working with makes heavy use of Jetty 6 continuations though and dropping my application into Jetty 7 doesn't seem to work. I'm looking at the code now to see what it will take to get it to work with the standard suspend/resume API supported by Jetty 7. If I can avoid this work, please let me know. I'm getting ClassNotFoundExceptions on org/mortbay/util/ajax/ContinuationSupport and I don't see that class in any of the jars generated by mvn.
thanks!
hernan
org.eclipse.jetty.continuation.ContinuationSupport in the jetty-continuations package ![]()
cheers
jesse
Thanks, Jesse. I had already tried using org.eclipse.jetty.continuation.ContinuationSupport instead of org.mortbay.util.ajax.ContinuationSupport but it's apparently not a drop-in replacement.
ContinuationSupport.getContinuation() takes just a ServletRequest now. I can work around that.
The org.eclipse.jetty.continuation.Continuation instance returned by ContinuationSupport.getContinuation() is quite different than what was being returned by org.mortbay.util.ajax.ContinuationSupport.getContinuation(). Specifically, I don't see these:
isPending()
get/setObject()
reset()
Any ideas? I don't see analogous methods on org.eclipse.jetty.continuation.Continuation.
thnx...
hernan
The jetty-7 continuation API is a simplification of the previous API, so that it can better work with the new servlet 3.0 APIs when they are available.
after suspending you have to return yourself. Because you can execute code after the suspend, there is no need to pass in the mutex.
instead of isPending you can use isInitial or request attributes
instead of get/set object, user request attributes
reset should not be needed now.
if you send me a code snippet, I'll help rework it. I'll also do a blog/wiki page on this soon.
I've just committed a possible fix for this bug to both jetty-6 and jetty-7.
I've confirmed that if I force the "remedy" it can recreate a selector without harm (other than time).
So if the detection of the problem is correct, the work around should work around.
PLLLLLLLEEEEEAAAAAAAAAAASSSSSSSSSSEEEEEEEEE if you can reproduce this problem,
can you run jetty built after rxxxx
You will see an info message the first time the remedy is applied.
Fixed be trashing the selector if we go infinite with zero selected
Hey Greg, thanks for your help and for following up on this issue. I'd love to put together a Jetty 6 build to test this fix with. I'm not familiar with your SVN layout, should I pull from the Jetty 6.1 branch? Also not sure which repo to get it from (codehaus/eclipse)... if you can pass along the right SVN URL that would be great and hopefully I can give you feedback on the remedy code.
thnx...
hernan
Hernan,
This patch made it into the 6.1.17 release (as it seamed to at least do no harm).
Please give that release a go,
but to answer your question, jetty is really easy to build:
svn co http://svn.codehaus.org/jetty/jetty/branches/jetty-6.1
cd jetty-6.1
mvn
java -jar start.jar
It is in 6.1.17.... but I don't know if it solves the whole issue. So far it does not appear to hurt.
http://svn.codehaus.org/jetty/jetty/tags/jetty-6.1.17/VERSION.txt
second issue from the top!
% grep 'JVM BUG' logs/2009_05_12.stderrout.log
2009-05-12 11:23:35.794::INFO: seeing JVM BUG(s) - cancelling interestOps==0
2009-05-12 11:49:36.136::INFO: seeing JVM BUG(s) - recreating selector
Logging is set to INFO on this server. CPU usage is way down again where it should be and it's staying there.
Specifics:
Jetty 6.1.17
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
CentoOS 5.1 (2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:57:43 EST 2008 x86_64 x86_64 x86_64 GNU/Linux)
4 X Intel(R) Xeon(R) CPU E5320 @ 1.86GHz
Thanks for taking the time to put this workaround in place and being patient while I found an opportunity to push it to an affected server. You turned my frown upside-down.
hernan
I hope this one is fixed on 6.1.17.
but we are having a hard time to test it.
Can anyone help me to reproduce the bug on previous release than 6.1.17?
Any codes would be appreciated.
One more thing, is this bug fixed in lastest version of Sun Java VM?
I'm kind of confused because they said it is already fixed and delievered on this link (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933)
But we still have this problem even on the latest Java version (6u13)
Thanks.
The work around is in 6.1.17 and we have had some reports of success.
Their is a fix for the JVM bug, but last I heard it was only going to be made available in 1.7
but we can hope it might be in a 1.6.x release soon?
Hi es kim,
I also had a hard time reproducing this bug. I would happen on our production servers all the time but only happened a couple of times in our staging environment--the only difference between the two was the number of users, our production environment is really busy. We're not doing anything out of the ordinary in our code, just writing servlet handlers. So if you're seeing the bug in a production environment and you absolutely need to test the workaround in staging I recommend you generate a representative load, I think that will help your chances of tripping over the JVM bug.
The workaround Greg provided in 6.1.17 is working for us and our experience with 6.1.17 so far is that it's every bit as stable as 6.1.16.
thnx...
Hernan
This is affecting my project as well. I've had to roll back to 6.1.14 from 6.1.15 (not an RC)
OS: RHEL 5: 2.6.18-128.1.1.el5
java version "1.6.0_14-ea"
Java(TM) SE Runtime Environment (build 1.6.0_14-ea-b06)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b15, mixed mode)
No luck,
We still got same problem even on 6.1.17 that is supposed to fix this issue.
jstack show like this
"22188335@qtp-33439957-0 - Acceptor0 SelectChannelConnector@0.0.0.0:7070" prio=10 tid=0x641eb000 nid=0x3986 runnable [0x639f3000..0x639f4040]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
- locked <0x767a6f58> (a sun.nio.ch.Util$1)
- locked <0x767a6f48> (a java.util.Collections$UnmodifiableSet)
- locked <0x767a6de0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:436)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:189)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:706)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)
That thread has almost 1 day running time
which is kind of problem, I guess.
And with this RetryContinuation loop,
....
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@21661635
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@9131473
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@21661635
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@9131473
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@21661635
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@9131473
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@6446596
2009.05.19 11:28:12 JettyLog: continuation RetryContinuation@19584892
....
And one more notes is that
Log that you put in SelectManager.java class isn't printed out.
It seems that it doesn't get into 'if' statemnt below
// Look for JVM bugs
// http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
// http://bugs.sun.com/view_bug.do?bug_id=6693490
if (selected==0 && wait>0 && (now-before)<wait/2)
Thanks.
es kim,
I think you have a separate problem, probably with regards to your use of Continuations rather than a java 1.6 bug.
Can you open a new JIRA and tell us more about how your are using continuations, perhaps with a code snippet?
This is a stack output of Nexus running on one of our servers, nexus is using
Jetty 6.1.17
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) Server VM (build 14.0-b16, mixed mode)
And after running for a 1+ days we start seeing the nexus process taking up 100%+ of cpu. I found this ticket, and appears to be exact problem we are having, so i upgrade to 6.1.17 jetty (from 6.1.12) but still the problem persists.
Any thoughts of a new step to try ?
Talked with jesse, and have upgraded to 6.1.19, we will see how that goes over the next few days
I also stuck on this trouble - at least it's good I'm not alone in whole world...
I tried to update to latest java & jetty just a few days ago - seems like it didn't help.
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
Linux bizmania 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC x86_64 x86_64 x86_64 GNU/Linux
Jetty 1.6.19
We use Jetty embedded in our own application. Untill some users appeared everything was O.K. but some time ago the server started to hang day over day.
The workaround seems to trigger - sometimes I see in a console: "seeing JVM BUG(s) - cancelling interestOps==0". But still time over time server completely hangs in Acceptor thread - and application begin to slow down.
Don't know what to do - we planed to go for production - but with these results... What shell we try - downgrade to java 1.5? Any advice is desperate required!!!
The JDK bug seems to be addressed in 1.6.0_18 which is available at https://jdk6.dev.java.net/6uNea.html .
I still have this NIO issue quite regularly, too.
It occurs on a quad core machine, so the CPU usage is always >= 25% until you restart the JVM.
The dumps/graphs look just like the ones posted above.
Environment:
-------------------
Jetty:
Version 7.0.0 (but it also happend in 6.1.21)
JVM:
java version "1.6.0_05"
Java(TM) SE Runtime Environment (build 1.6.0_05-b13)
BEA JRockit(R) (build R27.6.0-50_o-100423-1.6.0_05-20080626-2104-linux-ia32, compiled mode)
Mina is also used in this JVM.
If you need anything (Threaddumps etc) just let me know.
This is a really elusive jvmbug and it appears that it presents itself in a number of different ways.
We have been working with a client who also was still seeing this issue - but for them it self corrects after about 15 minutes.
In response, we have committed a patch to trunk of jetty-6 and jetty-7 (not in any release yet), that is better at
detecting the busy acceptor threads and more proactive at applying the work arounds (either cancelling
keys or recreating the selector).
While the work arounds are "effective", we found that sometimes the jvm bug just reoccurs less than a
second later - so there is little gain.
Thus we have also put in a last line of defence, so that the acceptor will pause for 50ms on each iteration
for the next second after it has seen excessive looping. This will cost 25ms of average latency processing
requests, but will limit the acceptor to 20 iterations per second. These can be configured so you can have
less latency and more iterations if you like.
With this mode, the servers that suffer from this bug continue to work pretty well and may even get better
throughput due to the latency tradeoff.
Oh I just noticed that you are using JRocket! hmmmm does that share code with Sun's JVM? it would be
very strange for it to have exactly the same bug.... can you do some thread dumps and confirm that it is indeed
the acceptor threads that are going 100% - also try a build from the head would be good.
it seems that they share code, it says "sun.nio.ch..." in the stacktrace, i added it to this issue.
I noticed that my JRockit version wasnt the latest, so i updated it to:
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
BEA JRockit(R) (build R27.6.3-40_o-112056-1.6.0_11-20090318-2103-linux-ia32, compiled mode)
lets see if this helps... it can take some days until it happens on my server...
There have been several "fixes" over several versions.
But 7.0.1 and 6.1.22 contain the best fix to date.
I'm seeing this bug now as well. I'm running Java 1.6.0_15 with Jetty 7.0.1 on OpenSolaris. Every couple days when under heavy load Jetty will get stuck in AbstractConnector$Acceptor.run() and eat up 100% CPU.
Has this issue been resolved? What is the optimal Jetty / Java version combination for me to run to avoid this issue? Thanks! I'm going to upgrade to Jetty 7.1.5 in the meantime just to be safe.
If it reoccurs with 7.1.5, then please immediately comment here with a stack trace.
thanks
Ok. I'll keep my eyes open. Unfortunately I've been only able to reproduce it in a production setting so I'm loathe to try to reproduce the issue there. I'm using Jetty 7.1.5 now with JVM version 1.6.0_06 now and it appears to be stable.
hi. we were using jetty 6.1.5 and the cpu went 100% high on windows server 2003.
the thread dump had this like as mentioned in http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
but the only difference is that we got it in windows.
are there any steps to reproduce this bug in windows.
will an upgrade help us to see past this issue.
thanks and regards,
ch vishnu
"btpool0-9 - Acceptor0 SelectChannelConnector@0.0.0.0:8000" prio=6 tid=0x545c6c10 nid=0x1a80 runnable [0x559af000..0x559afa6c]
at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)
at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)
- locked <0x09c509f0> (a sun.nio.ch.Util$1)
- locked <0x09c50a00> (a java.util.Collections$UnmodifiableSet)
- locked <0x09c50978> (a sun.nio.ch.WindowsSelectorImpl)
at sun.nio.ch.SelectorImpl.select(Unknown Source)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:416)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:168)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:514)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
yes, there are a number of fixes for a number of these jvm bugs in more recent versions of jetty, update to 6.1.5 or the new jetty7 series which is up to 7.1.6 at this point
cheers
as mentioned above, the problem is supposed to be a jvm bug.
our environment was a single cpu box with Java 5- windows server 2003, running as service using the "Wrapper"(wrapper.tanukisoftware.com)
so is necessary to upgrade to java6 along with jetty?
or will a jetty upgrade suffice?
thanks in advance,
ch vishnu
Vishnu,
I am also having the same problem in our production environment.
So what version did you decide to go with ? Did it fix your issue ?
my environment is : java 1.5.0, jetty-6.1.20, linux.
thanks in advance !
- sunil
Greg / Serge / Chris,
as per the jira fix I upgraded to Jetty 6.1.22 . But we are still facing the same issue in our production enviroment. CPU usage goes to 100% sometimes.
Has anybody found a fix for it ?
Should I be using SOCKET CONNECTOR instead of SELECT CHANNEL CONNECTOR ?
thanks,
Sunil
A few issues have been found and fixed in jetty-7 with regards to 100% CPU.
I would strongly urge and upgrade to 7.3.0 or the soon to be release 7.3.1
failing that, we will try to look at if these issues exist in jetty-6 and if the fixes can be back ported, but this will be at least several weeks, if not a month or two.
Hi Greg,
We recently upgraded from Jetty 6.1.14 to Jetty 6.1.26. We also upgraded Sun JVM 1.6.0_12 to 1.6.0_21. We are using Linux 2.6.18-164.
In our production boxes, we have seen some of the jetty servers taking 90-100% CPU. Also, jetty http port stops responding. We verified that one of the Jetty server threads is consuming the CPU. Following is the threaddump
runnable [0xaddfb000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:62)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
locked <0xb6524f58> (a sun.nio.ch.Util$1)
locked <0xb6524f68> (a java.util.Collections$UnmodifiableSet)
locked <0xb6400930> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
As I understand, this issue is well known from 6.1.5 release of Jetty. Could it be that due to Jetty upgrade to 6.1.26, we are encountering this issue more often?
Are there any workarounds as we are encountering this in production. Is restart the only option?
I am seeing a similar issue with Java 1.6.0_24 and Jetty 6.1.26 on RHEL 6.0. In this case strace shows epoll_wait returning EINVAL in a tight loop. My project's ticket is here: https://issues.apache.org/jira/browse/MAPREDUCE-2386
Stack trace for the exception:
"1989851698@qtp-2059580150-0 - Acceptor0 SelectChannelConnector@0.0.0.0:57866" prio=10 tid=0x00007f535c016800 nid=0x4c75 runnable [0x00007f533208f000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
- locked <0x00000000ebcdbbe8> (a java.io.IOException)
at java.lang.Throwable.<init>(Throwable.java:196)
at java.lang.Exception.<init>(Exception.java:41)
at java.io.IOException.<init>(IOException.java:41)
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0x00000000c7749038> (a sun.nio.ch.Util$2)
- locked <0x00000000c7749048> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000c7748ff0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
I've just read the entire thread and I have exactly the same problem. My env:
- Ubuntu 10.04.2 LTS
- Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
- Jetty Version: 6.1.22-1ubuntu1
100% CPU till a server restart. The logs say this:
2011_05_23.stderrout.log:1231061538 [857966469@qtp-1111869819-9 - Acceptor0 SelectChannelConnector@0.0.0.0:8080] INFO org.mortbay.log - org.mortbay.io.nio.SelectorManager$SelectSet@52dac7a1 JVM BUG(s) - recreating selector 59 times, canceled keys 944 times
2011_05_23.stderrout.log:1231061545 [2008400431@qtp-1111869819-8 - Acceptor1 SelectChannelConnector@0.0.0.0:8080] INFO org.mortbay.log - org.mortbay.io.nio.SelectorManager$SelectSet@7e6dd7a8 JVM BUG(s) - injecting delay59 times
2011_05_23.stderrout.log:1231061545 [2008400431@qtp-1111869819-8 - Acceptor1 SelectChannelConnector@0.0.0.0:8080] INFO org.mortbay.log - org.mortbay.io.nio.SelectorManager$SelectSet@7e6dd7a8 JVM BUG(s) - recreating selector 59 times, canceled keys 944 times
2011_05_23.stderrout.log:1231122137 [857966469@qtp-1111869819-9 - Acceptor0 SelectChannelConnector@0.0.0.0:8080] INFO org.mortbay.log - org.mortbay.io.nio.SelectorManager$SelectSet@52dac7a1 JVM BUG(s) - injecting delay27 times
As a workaround, I switched to org.mortbay.jetty.bio.SocketConnector.
Just to take a note, I found Tomcat users that have the same problem when using their NIO Connector(which is not the default): http://marc.info/?l=tomcat-user&m=119369624323640
This is not going to make it into 1.6.3, but we'll need to investigate a bit more. Does it sound (to those of you who have done some investigation) like this is a problem in JRuby or a problem outside of JRuby?
Since this issue has more then 2 years and nobody could provide a reproducible way of hitting this bug(and it's probably jvm related), I think we should close it as "cannot reproduce".
Even though this issue is closed I would like to post an update here, as this bug hits me as well.
When I start my application it just takes a few (<50) requests. I can just hit refresh in my browser for some minutes and baaam the app does not respond anymore and I have a java process with 100% CPU running.
Environment:
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
Debian 6.0.2 Linux 2.6.32-5-xen-amd64 x86_64
Embedded Jetty in Eclipse Equinox as part of the Eclipse Gyrex Project.
I have the following jetty related bundles running:
org.eclipse.equinox.http.jetty_2.0.100.v20110502 org.eclipse.gyrex.admin.ui.http.jetty_1.0.0.v20110907-2000 org.eclipse.gyrex.http.jetty_1.0.0.v20110923-1955 org.eclipse.jetty.client_7.4.2.v20110526 org.eclipse.jetty.continuation_7.4.2.v20110526 org.eclipse.jetty.http_7.4.2.v20110526 org.eclipse.jetty.io_7.4.2.v20110526 org.eclipse.jetty.rewrite_7.4.2.v20110526 org.eclipse.jetty.security_7.4.2.v20110526 org.eclipse.jetty.server_7.4.2.v20110526 org.eclipse.jetty.servlet_7.4.2.v20110526 org.eclipse.jetty.servlets_7.4.2.v20110526 org.eclipse.jetty.util_7.4.2.v20110526
And I also see stack traces from a thread dump which look similar to the ones presented here:
"jetty-server-admin-71 Acceptor0 SelectChannelConnector@0.0.0.0:3110 STARTED" prio=10 tid=0x0000000041c41000 nid=0x451c runnable [0x00007f84fe7e3000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152) - locked <0x0000000780a1ddf0> (a java.lang.Object) at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:92) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:830) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662) "jetty-server-admin-70 Selector0 SelectChannelConnector@0.0.0.0:3110 STARTED" prio=10 tid=0x000000004156c800 nid=0x451b runnable [0x00007f84fe8e4000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0x0000000780a1d948> (a sun.nio.ch.Util$2) - locked <0x0000000780a1d938> (a java.util.Collections$UnmodifiableSet) - locked <0x0000000780a1d710> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:504) at org.eclipse.jetty.io.nio.SelectorManager.doSelect(SelectorManager.java:228) at org.eclipse.jetty.server.nio.SelectChannelConnector$1.run(SelectChannelConnector.java:268) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662) "jetty-server-69" prio=10 tid=0x0000000041710000 nid=0x451a waiting on condition [0x00007f84fe9e5000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000078099ea28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:320) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:512) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:38) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:558) at java.lang.Thread.run(Thread.java:662) "jetty-server-68" prio=10 tid=0x0000000042c18000 nid=0x4519 waiting on condition [0x00007f84feae6000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000078099ea28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:320) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:512) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:38) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:558) at java.lang.Thread.run(Thread.java:662) "jetty-server-66" prio=10 tid=0x00007f850086b000 nid=0x4517 waiting on condition [0x00007f84fece8000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x000000078099ea28> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2025) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:320) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:512) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$600(QueuedThreadPool.java:38) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:558) at java.lang.Thread.run(Thread.java:662) "jetty-server-65 Selector1 SelectChannelConnector@0.0.0.0:8081 STARTED" prio=10 tid=0x00007f8500b69800 nid=0x4516 runnable [0x00007f84fede9000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0x0000000780986678> (a sun.nio.ch.Util$2) - locked <0x0000000780986668> (a java.util.Collections$UnmodifiableSet) - locked <0x0000000780986000> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:504) at org.eclipse.jetty.io.nio.SelectorManager.doSelect(SelectorManager.java:228) at org.eclipse.jetty.server.nio.SelectChannelConnector$1.run(SelectChannelConnector.java:268) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662) "jetty-server-64 Selector0 SelectChannelConnector@0.0.0.0:8081 STARTED" prio=10 tid=0x00007f850028f800 nid=0x4515 runnable [0x00007f84feeea000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69) - locked <0x000000078099e728> (a sun.nio.ch.Util$2) - locked <0x000000078099e718> (a java.util.Collections$UnmodifiableSet) - locked <0x000000078097fc08> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80) at org.eclipse.jetty.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:504) at org.eclipse.jetty.io.nio.SelectorManager.doSelect(SelectorManager.java:228) at org.eclipse.jetty.server.nio.SelectChannelConnector$1.run(SelectChannelConnector.java:268) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662) "jetty-server-63 Acceptor1 SelectChannelConnector@0.0.0.0:8081 STARTED" prio=10 tid=0x00007f850093b000 nid=0x4514 runnable [0x00007f84fefeb000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:152) - locked <0x000000078097f1a0> (a java.lang.Object) at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:92) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:830) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662) "jetty-server-62 Acceptor0 SelectChannelConnector@0.0.0.0:8081 STARTED" prio=10 tid=0x00007f8500536000 nid=0x4513 waiting for monitor entry [0x00007f84ff0ec000] java.lang.Thread.State: BLOCKED (on object monitor) at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:136) - waiting to lock <0x000000078097f1a0> (a java.lang.Object) at org.eclipse.jetty.server.nio.SelectChannelConnector.accept(SelectChannelConnector.java:92) at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:830) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529) at java.lang.Thread.run(Thread.java:662)
Any more thoughts ideas on this?
Just an update: jetty 7 has a lot of improvements handling closed connections. If you have mobile clients that drop connections, it's likely that you hit this bug. Updating to jetty 7.5.2(or the to-be-released 7.5.3) can help.
Related: JETTY-1441
Commits that may helped to fix this:
@7.5.2
https://github.com/eclipse/jetty.project/commit/633f3b15886664e116478145a3e0139fc8f89847
https://github.com/eclipse/jetty.project/commit/d0a2557527ed7d660ee22f10741ec8d245008ede
@7.5.3
https://github.com/eclipse/jetty.project/commit/ea56eaff005e6b14c3f3464adf95e1e13742df68
https://github.com/eclipse/jetty.project/commit/06f4ada935de096869177cfd60e62a667a14f321
https://github.com/eclipse/jetty.project/commit/3476887f8a1e2281d9ec7ea85a0d3f4dd5017cc8
So, if you hit this bug, try updating to 7.5.3 and it will (probably) be solved.
The jetty select manager does have handling for JVM bugs that cause 100%.
You can see this at http://jetty.mortbay.org/jetty/jetty-7/xref/org/mortbay/io/nio/SelectorManager.html#408
and it was put in place for http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6403933
This code attempts to detect a select that returns quickly with no keys selected, and if so it
cancels all keys with 0 interestedops.
Perhaps for this bug, we should cancel and remake all keys? perhaps trash the selector and
start again?
Anyway, I just tried to reproduce on my system for 10 minutes and have not been able to.
I'll keep trying, but it would be really good if you are able to reproduce easily if you could
compile Jetty with a println in that loop detection to see if it is being triggered.
Unfortunately we are too deep into the 6.1.15 cycle to hold the release for this.