
Key: JETTY-748
Type: Task
Status: Reopened
Priority: Minor
Assignee: Joakim Erdfelt
Reporter: Greg Wilkins
Votes: 1
Watchers: 5

Jetty

Assist Hadoop to port to jetty-6 and optimize

Created: 16/Oct/08 11:34 PM   Updated: Tuesday 04:45 PM
Component/s: HTTP
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. Java Source File OpenCloseTest.java (2 kB)



Greg Wilkins added a comment - 14/Nov/08 05:13 AM

Added RandomAccessFileBuffer and test harness.
Updated AbstractGenerator so that such buffers passed with sendContent are efficiently sent.

Now testing; I will then advise Hadoop on how to use it.


Chris Douglas added a comment - 10/Dec/08 02:38 AM

We're hitting a wall with transferTo in Java SE 6 (HADOOP-4700). It looks like a particularly buggy Linux implementation, and most of the issues aren't fixed until JDK7. There are a few more things to try, but we may have to abandon this particular route for now. Has anyone had more success with this on Linux? Is it worth experimenting with blocking I/O, or is that a non-starter for Jetty6?
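For readers unfamiliar with the call in question, here is a minimal sketch of sending a file through `FileChannel.transferTo` using only the JDK. It is not the Hadoop or Jetty code; `ZeroCopySend` and `send` are hypothetical names, it uses Java 7 syntax for brevity, and it assumes a blocking target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration only.
final class ZeroCopySend {
    /** Sends the whole file to the target channel and returns the number of bytes sent. */
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            // Assumes a blocking target; a non-blocking channel could keep returning 0
            // and would need selector handling instead of this loop.
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

Whether the transfer actually avoids copying through user space depends on the platform and on the type of the target channel.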

On the bright side, dropping the transferTo experiment means we can tune the Server and MapOutputServlet in more conventional ways. Where should we start?

There's currently an open issue (HADOOP-4744) that may be related to the Jetty6 migration. In allocating a medium/large cluster (~400-500 nodes), it looks like some of the jetty servers are failing to start. If there are strategies for troubleshooting this, any guidance would be appreciated.


Greg Wilkins added a comment - 10/Dec/08 05:02 PM

David,

Can you have a quick look at these issues and summarize them for me?
Also, can you make sure that the RandomAccessFileBuffer and its support are
backported from jetty-7 to jetty-6?


Devaraj Das added a comment - 11/Dec/08 01:10 PM

I recently commented on HADOOP-4744. Please have a look. Thanks!


Chris Douglas added a comment - 16/Dec/08 03:10 AM

We're seeing this pair of exceptions frequently:

org.mortbay.jetty.EofException
        at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
        at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
        at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
        at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
        at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2872)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:324)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
Caused by: java.io.IOException: Broken pipe
        at sun.nio.ch.FileDispatcher.write0(Native Method)
        at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
        at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
        at sun.nio.ch.IOUtil.write(IOUtil.java:75)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
        at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
        at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
        at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
        ... 24 more
java.lang.IllegalStateException: Committed
        at org.mortbay.jetty.Response.resetBuffer(Response.java:994)
        at org.mortbay.jetty.Response.sendError(Response.java:240)
        at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2895)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:324)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

This is running trunk without attempting to use RandomAccessFileBuffer. Have we misconfigured something? Is it worth worrying about?


Greg Wilkins added a comment - 16/Dec/08 07:07 AM

The EofException implies that your client is closing the connection while content is still being sent. Can this happen?
If so, then you can silently ignore this exception (but log it at debug level), though it is best not to close while content is being written.

The second exception looks like an attempt to generate an error page for the first exception - should not be a problem if you silently ignore the first exception.
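A minimal sketch of that advice, assuming the disconnect surfaces as an IOException from the output stream. `MiniResponse` is a hypothetical stand-in for the two HttpServletResponse methods involved (`isCommitted` and `sendError`) so that the example compiles without the Servlet API; `MapOutputWriter` and `writeQuietly` are illustrative names, not Hadoop or Jetty code.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical stand-in for the two HttpServletResponse methods used below. */
interface MiniResponse {
    boolean isCommitted();
    void sendError(int status) throws IOException;
}

// Hypothetical helper for illustration only.
final class MapOutputWriter {
    private static final Logger LOG = Logger.getLogger(MapOutputWriter.class.getName());

    /** Writes data to out; returns false if the write failed (e.g. the client went away). */
    static boolean writeQuietly(OutputStream out, byte[] data, MiniResponse response) {
        try {
            out.write(data);
            out.flush();
            return true;
        } catch (IOException e) {
            // Client closed the connection mid-write: log at debug level only.
            LOG.log(Level.FINE, "client closed connection during write", e);
            // An error page can only be generated before the response is committed;
            // calling sendError afterwards is what raises "IllegalStateException: Committed".
            if (!response.isCommitted()) {
                try {
                    response.sendError(500);
                } catch (IOException e2) {
                    LOG.log(Level.FINE, "could not send error response", e2);
                }
            }
            return false;
        }
    }
}
```

The `isCommitted()` guard is the part that would avoid the second stack trace; the first one simply moves from the error log to the debug log.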


Jan Bartel added a comment - 23/Dec/08 10:14 AM

Greg,

I checked in the change to AbstractGenerator.isBufferFull(); see svn rev 4213. Please review.

Jan


David Yu added a comment - 01/Jul/09 03:42 AM

The invalid port returned from Connector.getLocalPort() was extremely random.
The accepted solution was simply to restart the connector and get the new valid port once it has restarted.


Owen O'Malley added a comment - 01/Jul/09 12:04 PM

That is an unfortunate response to an observed race condition.


Greg Wilkins added a comment - 02/Jul/09 03:40 AM

Owen,

We are working to remove this close race.
It's not yet 100% fixed, but we will address it soon.

I'll reopen this issue so that we can close it when we have the 100% fix, but I'll reduce the priority because the exception thrown is essentially just noise (in the case that we understand it).


Greg Wilkins added a comment - 26/Aug/09 03:37 AM

Oh, my previous comment was about the wrong race condition... I was talking about the issue where somebody closes a connection as it is being written to.

David was discussing a getLocalPort issue that I believe was fixed.
We have certainly not observed getLocalPort returning incorrect values and we heavily use it in our test harnesses.

But I'll reinvestigate anyway.


Greg Wilkins added a comment - 26/Aug/09 03:50 AM

https://issues.apache.org/jira/browse/HADOOP-4744 is closed, so I think the getLocalPort issue is indeed fixed.
The open issue associated with this is https://issues.apache.org/jira/browse/MAPREDUCE-258
Apparently an int is not large enough for Hadoop, so perhaps another approach is required.


Chris Douglas added a comment - 26/Aug/09 05:49 AM

As David said earlier, the "resolution" for HADOOP-4744 was to check if the race was lost, and retry. It looks like the race condition still exists.
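The check-and-retry workaround can be sketched roughly as follows. This is an illustration, not the actual HADOOP-4744 patch: `PortRetry` and `openWithRetry` are hypothetical names, and `restartAndGetPort` is an assumed hook that closes the connector, reopens it, and returns whatever getLocalPort() then reports.

```java
import java.util.function.IntSupplier;

// Hypothetical helper for illustration only.
final class PortRetry {
    /**
     * Calls restartAndGetPort until it reports a positive port or the attempts run out.
     * restartAndGetPort is an assumed hook: close the connector, reopen it,
     * and return the value of getLocalPort().
     */
    static int openWithRetry(IntSupplier restartAndGetPort, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            int port = restartAndGetPort.getAsInt();
            if (port > 0) {
                return port; // race won: the connector reports a usable port
            }
            // race lost (e.g. -1 was returned): loop and restart again
        }
        throw new IllegalStateException("no valid port after " + maxAttempts + " attempts");
    }
}
```

A loop like this only hides the race: the connector can still report an invalid port, and the caller just tries again until it does not.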


Greg Wilkins added a comment - 26/Aug/09 08:50 AM

OK, so I've looked at this again, and I just cannot see how the localPort can be -1 unless the open fails somehow.

I've written a test harness to try to reproduce the problem, but it does not reproduce for me.
Could somebody who sees the problem on Hadoop run the OpenCloseTest and see if they can reproduce it?

thanks


Greg Wilkins added a comment - 26/Aug/09 08:51 AM

Pass this test a list of interface addresses on the command line.


Chris Douglas added a comment - 31/Aug/09 05:29 AM

I couldn't reproduce this with OpenCloseTest and our default interface. I'll try to reproduce in our specific use of Jetty in the next few weeks.


Joakim Erdfelt added a comment - 02/Mar/10 03:11 PM

Looks like https://issues.apache.org/jira/browse/HADOOP-6528 is a re-emergence of this problem.


Joakim Erdfelt added a comment - 02/Mar/10 03:12 PM

Adding tracking for similar (now closed) issue at https://issues.apache.org/jira/browse/HADOOP-6386


Greg Wilkins added a comment - 03/Mar/10 01:26 AM

Joakim,

Can you have a look at this? I've pored over the code and tried numerous tests and I fail to see how we can return -1 there.

thanks


Steve Loughran added a comment - 16/Mar/10 04:45 PM

One possibility is that something needs to be marked as volatile but isn't, so it's being cached or prefetched, and the change isn't being picked up. I've encountered that in some projects.
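To illustrate the kind of fix this hypothesis points at (not Jetty's actual code; `PortHolder` and its methods are hypothetical), here is a minimal sketch of a port field written by the thread that opens the socket and read by other threads:

```java
// Hypothetical holder for illustration only.
final class PortHolder {
    // volatile: a write by the opening thread is guaranteed to be visible to
    // subsequent reads on other threads. Without it, a reader may keep seeing
    // a stale value such as the initial -1.
    private volatile int localPort = -1;

    /** Called by the thread that opened the socket. */
    void opened(int port) {
        localPort = port;
    }

    /** Called when the socket is closed. */
    void closed() {
        localPort = -1;
    }

    /** May be called from any thread. */
    int getLocalPort() {
        return localPort;
    }
}
```

If the real field is already written and read under a common lock, this would not be the cause, so it is only one candidate to rule out.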