Issue Details (XML | Word | Printable)

Key: JETTY-256
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Greg Wilkins
Reporter: Boyce Lee
Votes: 0
Watchers: 2
Operations

If you were logged in you would be able to see more operations.
Jetty

continuation may cause 100%cpu occasionally!

Created: 09/Mar/07 05:36 AM   Updated: 20/Jul/09 06:47 PM   Resolved: 24/Mar/07 11:18 AM
Component/s: Continuations
Affects Version/s: 6.0.0beta8, 6.0.0beta9, 6.0.0beta10, 6.0.0beta11, 6.0.0beta12, 6.0.0beta14, 6.0.0beta15, 6.0.0beta16, 6.0.0beta17, 6.0.0beta18, 6.0.0RC0, 6.0.0rc1, 6.0.0rc2, 6.0.0rc3, 6.0.0rc4, 6.0.0, 6.0.1, 6.0.2, 6.0.3, 6.1.0pre0, 6.1.0pre1, 6.1.0pre2, 6.1.0pre3, 6.1.0rc0, 6.1.0rc1, 6.1.0rc2, 6.1.0rc3, 6.1.0, 6.1.1rc0, 6.1.1rc1, 6.1.1, 6.1.2rc0
Fix Version/s: 6.1.2rc2

Time Tracking:
Not Specified

File Attachments: None
Image Attachments:

1. 20070309192117750.png
(65 kB)
Environment: jdk1.5 gentoo-linux


 Description  « Hide

continuation may cause 100%cpu occasionally.

sun.nio.ch.SelectorImpl.select(long) invoked 314658 times in 16889ms ! (see the YourKit snapshot )

may be the blocking Selector.select method stop blocking occasionally.
CODE SelectorManager.doSelect [
if (wait > 0)
_selector.select(wait); <====================== may return immediately.
else if (wait == 0)
_selector.selectNow();
else
_selector.select();
]

suggest code [
if (wait > 0) { int ret = _selector.select(wait); if (ret == 0) Thread.sleep(1000); //do something }
else if (wait == 0)
_selector.selectNow();
else
_selector.select();

]



Jan Bartel added a comment - 11/Mar/07 12:41 AM

Boyce,

What is the configuration on your SelectChannelConnector? Also, what kind of io load were you running during the profiling?

regards, Jan


Boyce Lee added a comment - 12/Mar/07 02:00 AM

the SelectChannelConnector configuration :
<Call name="addConnector">
<Arg>
<New class="org.mortbay.jetty.nio.SelectChannelConnector">
<Set name="port"><SystemProperty name="jetty.port" default="80"/></Set>
<Set name="maxIdleTime">30000</Set>
<Set name="Acceptors">1</Set>
<Set name="StatsOn">true</Set>
</New>
</Arg>
</Call>

the JAVA_OPTIONS:
-agentlib:yjpagent -Xmn256M -Xms512M -Xmx512M -Xss256k -XX:SurvivorRatio=6 -XX:+DisableExplicitGC -XX:MaxDirectMemorySize=128M

when i start jetty without yourkit, it still cause 100%cpu occasionally.

the duration of full cpu occupation(100% cpu) is close to timeout of 'Continuation.suspend(timeout)' by value,and after a while(the timeout) the cpu usage come down to the normal level(0%).


Greg Wilkins added a comment - 18/Mar/07 07:15 PM

Can't reproduce.

Can you tell us more about how you are using continuations? any sample code?

It should be OK for select(wait) to return immediately - so long as it eventually sleeps
for the wait time.


Boyce Lee added a comment - 19/Mar/07 12:20 AM

I found the reason why CPU usage was 100%.

When Continuation is in suspend, if users with some ie versions click the Stop button of ie toolbar or press the Esc key, CPU usage will be 100%.

Through analysing the tcp packet with net sniffer, i know that when the tcp packet with tcp flag RST and ACK set are sent to jetty, CPU usage will be 100%.

And i also found that the isResumed method of RetryContinuation class is not correct. See the followings:

public boolean isResumed()
{
return _pending;
}

My temporary resolution is:

Add rules to iptables: iptables -A INPUT -p tcp --dport 80 --tcp-flags RST,ACK RST,ACK -j DROP
Modify the isResumed method of RetryContinuation:

public boolean isResumed()
{
return _resumed;
}


Greg Wilkins added a comment - 21/Mar/07 12:14 AM

I have fixed the isResumed method (thanks),

but I can't reproduce the 100% CPU.

Can you reproduce with the demo chat room?


Boyce Lee added a comment - 22/Mar/07 01:09 AM

To reproduce the bug, please do as followings:

1. Write a simple servlet program, the code is:

public class TestServlet extends HttpServlet {
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
Continuation continuation = ContinuationSupport.getContinuation(request,null);
if (continuation.isPending()) { continuation.reset(); System.out.println("============= resumed or timeout ================"); PrintWriter out = null; out = response.getWriter(); out.write("OK, resumed: " + continuation.isResumed()); out.flush(); }
else { System.out.println("============= suspend(60000) ================"); continuation.suspend(60000); }
}
}

2. Deploy the jetty server in Linux (the tested Linux versions are gentoo 2.6.17-gentoo-r8,redhat 2.6.9-5.EL, the bug may not be shown in Windows)

3. Use the IE browser in Windows 2003 to access to the servlet, and click the Stop button on ie toolbar during the process of waiting (servlet request is in suspend). (the ie behavior of winxp is different with that of win2003, it won't send the tcp packet which tcp flag RST and ACK are set to server)


Greg Wilkins added a comment - 22/Mar/07 03:09 AM

It is going to take me some time to find a windows 2003 setup.... if anybody has a legal CD they can sell me then I will run it on vmware.

In the mean time, I can suggest that you restructure your code a little bit:

<pre>

protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
{
Continuation continuation = ContinuationSupport.getContinuation(request,null);

if (!continuation.isPending())

{ System.err.println("============= suspend(5000) ================"); continuation.suspend(5000); }

continuation.reset();
System.err.println("============= resumed or timeout ================");
PrintWriter out = null;
response.setContentType("text/plain");
out = response.getWriter();
out.write("OK, resumed: " + continuation.isResumed());
out.flush();

}

</pre>

This reflects the fact that suspend may actually block. It probably wont fix the 100% loop.
Can you put a println in the start of the doGet method to see if the 100% loop includes the servlet


Boyce Lee added a comment - 22/Mar/07 03:40 AM

i put a println in the start of the doGet method ,and the println appeared only once in console – so the 100% loop doesn't include the servlet invocation.

"It is going to take me some time to find a windows 2003 setup" . it's unnecessary to test it in win2003 ,you can test it in any operation system that the explorer can send the special packet i mentioned before(tcp packet which tcp flag RST and ACK are set to server)


Boyce Lee added a comment - 22/Mar/07 03:51 AM

if you can put a linux server on internet,and give me url where the test servlet is running, i can help to test from my computer which causes the bug now.


Boyce Lee added a comment - 22/Mar/07 04:32 AM

the console output:

1 2007-03-22 17:24:54.788::INFO: Started SelectChannelConnector @ 0.0.0.0:80
2 ========== doGet ==============
3 ============= suspend(5000) ================
4 ========== doGet ==============
5 ============= resumed or timeout ================

the busy loop occurred between the first two invocation of doGet method,and the
duration of cpu100% is exactly the timeout of suspend().

(2)>(3)>cpu100%( duration: 5000)>(4)>(5)


Boyce Lee added a comment - 22/Mar/07 04:35 AM

the console output:

1 2007-03-22 17:24:54.788::INFO: Started SelectChannelConnector @ 0.0.0.0:80
2 ========== doGet ==============
3 ============= suspend(5000) ================
4 ========== doGet ==============
5 ============= resumed or timeout ================

the busy loop occurred between the only two invocation of doGet method,and the
duration of cpu100% is exactly the timeout of suspend().

(2)>(3)>cpu100%( duration: 5000)>(4)>(5)


Greg Wilkins added a comment - 22/Mar/07 04:28 PM

Boyce,

OK this sounds like the closed channel gets into a state that is selected but not closed or something?

I will put up a test server later today and get you to hit it with your windows 2003 browser.

thanks for the offer.


floyd.yao added a comment - 22/Mar/07 06:50 PM

is it a bug caused by the new feature of jetty 6? recently i found it in redhat server also


Greg Wilkins added a comment - 24/Mar/07 10:08 AM

Thanks all for the information on this one! It does appear to be a JVM bug that is fixed in java6

I have finally been able to reproduce the problem and it is a real shocker!! It is like somebody designed a JVM
bug specifically to make life REALLY hard for the way Jetty uses NIO.

I can detect that the failure is occuring (select returns for no reason before timeout), and I can inspect every
channel in the select set. But unless I write something, they all look perfectly fine (and open) - so I have no
way of detecting which channel is the one that has been badly closed.

So the only solution that I have found is to cancel all keys that have zero interested ops and to recreate
them again when there are indeed needed again. This appears to be working, but the cancelled keys
appears to provoke a couple of issues in Jetty that assumed key valid == channel open

Anyway.... I am very close to a fix ( I think ) and currently I can't produce any failures. But I'm going to
sleep on it and test again in the morning to be doubly certain.

thanks again!


Greg Wilkins added a comment - 24/Mar/07 11:18 AM

I have checked in what I think is a resolution for this issue.

However, some more stress testing of the server is not required as some additional
instabilities may have been introduced as a result!

This will put back 6.1.2rc2 to the end of the month


Doug Daniels added a comment - 20/Jul/09 06:35 PM

I'm running JETTY 7.0.0pre5 Continuations and I'm running into this issue when running in a Redhat Linux VMWare environment. Is the 6.1.2rc2 resolution merged into the 7.0.0pre5 code?

We believe we are seeing the RST, ACK issue on our network, and we're seeing 2 threads causing100% CPU utilization:

"qtp0-9 - Acceptor0 SelectChannelConnector@0.0.0.0:8080" - Thread t@28
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)

  • locked sun.nio.ch.Util$1@46f8d1d0
  • locked java.util.Collections$UnmodifiableSet@1cfc87b7
  • locked sun.nio.ch.EPollSelectorImpl@67454dc9
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:403)
    at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:174)
    at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:123)
    at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:698)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)

"qtp0-8 - Acceptor1 SelectChannelConnector@0.0.0.0:8080" - Thread t@27
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)

  • locked sun.nio.ch.Util$1@f79455e
  • locked java.util.Collections$UnmodifiableSet@697ea809
  • locked sun.nio.ch.EPollSelectorImpl@17b49fcf
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
    at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:403)
    at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:174)
    at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:123)
    at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:698)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)

Greg Wilkins added a comment - 20/Jul/09 06:47 PM

Doug,

the jetty-7 development has moved to eclipse and the latest release is a 7.0.0.RC1, which has handling for these JVM bugs.