Jetty / JETTY-748

Assist Hadoop to port to jetty-6 and optimize

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1.26, 7.2.1
    • Component/s: HTTP
    • Labels: None
    • Number of attachments: 2
    1. JETTY-748.patch
      5 kB
      Greg Wilkins
    2. OpenCloseTest.java
      2 kB
      Greg Wilkins

      Activity

      Greg Wilkins added a comment -

      Added RandomAccessFileBuffer and test harness.
      Updated AbstractGenerator so that such buffers passed with sendContent are efficiently sent.

      Now testing... then I will advise Hadoop how to use it.
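
      A rough sketch of the usage pattern Greg describes, as it might look from a servlet such as Hadoop's MapOutputServlet. The RandomAccessFileBuffer constructor, its package, and the cast to HttpConnection.Output are assumptions for illustration, not confirmed API from this issue:

      // Hypothetical sketch only: shows the intended usage pattern described above.
      // The RandomAccessFileBuffer constructor, its package, and the cast to
      // HttpConnection.Output are assumptions, not confirmed API from this issue.
      import java.io.File;
      import java.io.IOException;
      import javax.servlet.ServletException;
      import javax.servlet.ServletOutputStream;
      import javax.servlet.http.HttpServlet;
      import javax.servlet.http.HttpServletRequest;
      import javax.servlet.http.HttpServletResponse;

      import org.mortbay.io.nio.RandomAccessFileBuffer; // assumed location of the new class
      import org.mortbay.jetty.HttpConnection;

      public class MapOutputSketchServlet extends HttpServlet {
          protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                  throws ServletException, IOException {
              File mapOutput = new File("/tmp/map_0.out"); // hypothetical path
              resp.setContentType("application/octet-stream");
              resp.setContentLength((int) mapOutput.length());

              ServletOutputStream out = resp.getOutputStream();
              // Hand the file-backed buffer straight to Jetty's generator so it can
              // be sent efficiently instead of being copied through byte[] writes.
              RandomAccessFileBuffer buffer =
                  new RandomAccessFileBuffer(mapOutput); // assumed constructor
              ((HttpConnection.Output) out).sendContent(buffer);
          }
      }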

      Chris Douglas added a comment -

      We're hitting a wall with transferTo in Java SE 6 (HADOOP-4700). It looks like a particularly buggy Linux implementation, and most of the issues aren't fixed until JDK7. There are a few more things to try, but we may have to abandon this particular route for now. Has anyone had more success with this on Linux? Is it worth experimenting with blocking I/O, or is that a non-starter for Jetty6?

      On the bright side, dropping the transferTo experiment means we can tune the Server and MapOutputServlet in more conventional ways. Where should we start?

      There's currently an open issue (HADOOP-4744) that may be related to the Jetty6 migration. When allocating a medium/large cluster (~400-500 nodes), it looks like some of the Jetty servers are failing to start. If there are strategies for troubleshooting this, any guidance would be appreciated.

      Greg Wilkins added a comment -

      David,

      Can you have a quick look at these issues and summarize them for me?
      Also, can you make sure that the RandomAccessFileBuffer and its support are
      backported from jetty-7 to jetty-6.

      Devaraj Das added a comment -

      I recently commented on HADOOP-4744. Please have a look. Thanks!

      Chris Douglas added a comment -

      We're seeing this pair of exceptions frequently:

      org.mortbay.jetty.EofException
              at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
              at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
              at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:569)
              at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
              at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:646)
              at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:577)
              at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2872)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
              at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
              at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
              at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
              at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
              at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
              at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
              at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
              at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
              at org.mortbay.jetty.Server.handle(Server.java:324)
              at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
              at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
              at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
              at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
              at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
              at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
              at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
      Caused by: java.io.IOException: Broken pipe
              at sun.nio.ch.FileDispatcher.write0(Native Method)
              at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
              at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
              at sun.nio.ch.IOUtil.write(IOUtil.java:75)
              at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
              at org.mortbay.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:169)
              at org.mortbay.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:221)
              at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:721)
              ... 24 more
      
      java.lang.IllegalStateException: Committed
              at org.mortbay.jetty.Response.resetBuffer(Response.java:994)
              at org.mortbay.jetty.Response.sendError(Response.java:240)
              at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2895)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
              at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
              at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
              at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
              at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
              at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
              at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
              at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
              at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
              at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
              at org.mortbay.jetty.Server.handle(Server.java:324)
              at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
              at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
              at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
              at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
              at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
              at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
              at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)
      

      This is running trunk without attempting to use RandomAccessFileBuffer. Have we misconfigured something? Is it worth worrying about?

      Greg Wilkins added a comment -

      The Eof exception implies that your client is closing the connection while content is still being sent. Can this happen?
      If so, then you can silently ignore this exception (but log it at debug level), though it is best not to close while content is still being written.

      The second exception looks like an attempt to generate an error page for the first exception; it should not be a problem if you silently ignore the first one.
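
      A minimal sketch of this advice as it might be applied in the servlet's write path, assuming the Jetty 6 classes visible in the stack traces above; the logger usage and the helper's name are illustrative only:

      // Minimal sketch of the advice above, assuming the Jetty 6 classes seen in
      // the stack traces; the logger and helper names are illustrative only.
      import java.io.IOException;
      import java.io.OutputStream;
      import javax.servlet.http.HttpServletResponse;

      import org.mortbay.jetty.EofException;
      import org.mortbay.log.Log;

      public final class ShuffleWriteHelper {
          private ShuffleWriteHelper() {}

          static void writeMapOutput(HttpServletResponse response, byte[] chunk)
                  throws IOException {
              OutputStream out = response.getOutputStream();
              try {
                  out.write(chunk);
                  out.flush();
              } catch (EofException e) {
                  // Client closed the connection mid-transfer: expected noise,
                  // so log at debug level and drop the request quietly.
                  Log.debug("client closed connection during map output transfer", e);
              } catch (IOException e) {
                  // Only attempt an error page if the response is not yet committed;
                  // otherwise sendError throws the IllegalStateException seen above.
                  if (!response.isCommitted()) {
                      response.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
                  } else {
                      Log.debug("I/O error after response committed", e);
                  }
              }
          }
      }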

      Jan Bartel added a comment -

      Greg,

      I checked in the change to AbstractGenerator.isBufferFull(); see svn rev 4213. Please review.

      Jan

      David Yu added a comment -

      The invalid port returned from Connector.getLocalPort() appeared at random.
      The accepted solution was simply to restart the connector and get the new valid port once it has restarted.
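
      A sketch of that restart-and-retry workaround, assuming Jetty 6's org.mortbay.jetty.Connector lifecycle methods; the retry count is arbitrary:

      // Sketch of the restart-and-retry workaround described above, assuming the
      // Jetty 6 org.mortbay.jetty.Connector API; the retry count is arbitrary.
      import org.mortbay.jetty.Connector;

      public final class ConnectorPortWorkaround {
          private ConnectorPortWorkaround() {}

          /** Returns a valid local port, restarting the connector if it reports -1. */
          static int getLocalPortWithRetry(Connector connector) throws Exception {
              for (int attempt = 0; attempt < 3; attempt++) {
                  int port = connector.getLocalPort();
                  if (port > 0) {
                      return port;
                  }
                  // Lost the race: restart the connector and ask again.
                  connector.stop();
                  connector.start();
              }
              throw new IllegalStateException("connector never reported a valid port");
          }
      }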

      Owen O'Malley added a comment -

      That is an unfortunate response to an observed race condition.

      Greg Wilkins added a comment -

      Owen,

      We are working to remove this close race.
      It's not yet 100% fixed, but we will address it soon.

      I'll reopen this issue so that we can close it when we have the 100% fix, but I'll reduce the priority because the exception thrown is essentially just noise (in the case that we understand it).

      Greg Wilkins added a comment -

      Oh, my previous comment was about the wrong race condition... I was talking about the issue where somebody closes a connection while it is being written to.

      David was discussing a getLocalPort issue that I believe was fixed.
      We have certainly not observed getLocalPort returning incorrect values and we heavily use it in our test harnesses.

      But I'll reinvestigate anyway.

      Greg Wilkins added a comment -

      https://issues.apache.org/jira/browse/HADOOP-4744 is closed, so I think the getLocalPort issue is indeed fixed.
      The open issue associated with this is https://issues.apache.org/jira/browse/MAPREDUCE-258
      Apparently an int is not a sufficient size for Hadoop, so perhaps another approach is required.

      Chris Douglas added a comment -

      As David said earlier, the "resolution" for HADOOP-4744 was to check if the race was lost, and retry. It looks like the race condition still exists.

      Greg Wilkins added a comment -

      OK, so I've looked at this again, and I just cannot see how the localPort can be -1 unless the open fails somehow.

      I've written a test harness to try to reproduce the problem, but it does not reproduce for me.
      Could somebody who sees the problem on Hadoop run the OpenCloseTest and see if they can reproduce it?

      thanks

      Greg Wilkins added a comment -

      Pass this test a list of interface addresses on the command line.

      Chris Douglas added a comment -

      I couldn't reproduce this with OpenCloseTest and our default interface. I'll try to reproduce in our specific use of Jetty in the next few weeks.

      Joakim Erdfelt added a comment -

      Looks like https://issues.apache.org/jira/browse/HADOOP-6528 is a re-emergence of this problem.

      Joakim Erdfelt added a comment -

      Adding tracking for similar (now closed) issue at https://issues.apache.org/jira/browse/HADOOP-6386

      Greg Wilkins added a comment -

      Joakim,

      Can you have a look at this? I've pored over the code and tried numerous tests, and I fail to see how we can return -1 there.

      thanks

      Steve Loughran added a comment -

      One possibility is that something needs to be marked as volatile but isn't, so it's being cached or prefetched and the change isn't being picked up. I've encountered that in some projects.

      Greg Wilkins added a comment -

      Hi,

      We've been looking at the Jetty code again and again and can't see how a volatile, a synchronized block, or any memory barrier could help - unless it is an issue within the JVM libraries themselves.

      We've been running stop/start tests but also cannot reproduce it.

      Is it possible that something else in Hadoop is stopping the server before the call to get the port?

      Can we get full details of the platforms on which this fails - operating system, JVM version, etc.?

      Greg Wilkins added a comment -

      Attached is a patch that changes the way local ports are obtained.
      Instead of calling socket.getLocalPort() when connector.getLocalPort() is called, this patch calls socket.getLocalPort() in the open method and holds the result in a volatile int.

      If close is called, the int is set to -2, so this will tell us if the port has not been opened, or has been closed before it was used.

      This patch is not yet in a release, but we can produce a snapshot build for the hadoop guys to test if that helps.
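
      A sketch of the mechanism the patch describes, with illustrative names rather than the actual code from JETTY-748.patch:

      // Sketch of the mechanism described in the patch: capture the port once in
      // open(), hold it in a volatile field, and mark it -2 on close. Field and
      // method names are illustrative, not copied from JETTY-748.patch.
      import java.io.IOException;
      import java.net.InetSocketAddress;
      import java.net.ServerSocket;

      public class LocalPortSketchConnector {
          private ServerSocket serverSocket;
          // -1 = never opened, -2 = closed, otherwise the bound port.
          private volatile int localPort = -1;

          public void open(int requestedPort) throws IOException {
              serverSocket = new ServerSocket();
              serverSocket.bind(new InetSocketAddress(requestedPort));
              // Read the real port once, while the socket is known to be open.
              localPort = serverSocket.getLocalPort();
          }

          public void close() throws IOException {
              if (serverSocket != null) {
                  serverSocket.close();
              }
              // Distinguish "closed before use" (-2) from "never opened" (-1).
              localPort = -2;
          }

          public int getLocalPort() {
              return localPort;
          }
      }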

      Greg Wilkins added a comment -

      It looks like we have the same failure in https://bugs.eclipse.org/bugs/show_bug.cgi?id=310634,
      so I'm definitely going to change the way we return the local port and get it at the time we open the server socket.

      Sachin Bochare added a comment -

      Hi Greg,

      I am running a few tests on a 4-node Hadoop cluster and hitting this problem at will.
      You mentioned a snapshot build in your last comment. Could you please provide me with the snapshot build so that I can try it out on my cluster?

      The details of my environment are as follows:
      OS : Ubuntu 8.04.1
      Hadoop Version : 0.20.1
      Jetty Version : 6.1.14

      Error Messages:
      2010-05-16 12:32:53,282 WARN org.apache.hadoop.mapred.ReduceTask: java.net.ConnectException: Connection refused
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
      at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1368)
      at java.security.AccessController.doPrivileged(Native Method)
      at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1362)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1016)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
      Caused by: java.net.ConnectException: Connection refused
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
      at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
      at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
      at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
      at java.net.Socket.connect(Socket.java:525)
      at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
      at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
      at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
      at sun.net.www.http.HttpClient.<init>(HttpClient.java:233)
      at sun.net.www.http.HttpClient.New(HttpClient.java:306)
      at sun.net.www.http.HttpClient.New(HttpClient.java:323)
      at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:860)
      at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:801)
      at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:726)
      at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1049)
      ... 4 more

      Thanks,
      Sachin

      Joakim Erdfelt added a comment -

      We have started to see this port issue crop up in the past 2 months during our ever-increasing unit testing.

      The latest spate of problems was, however, fixed simply by configuring our Linux instances with a higher number of file descriptors.

      On one of the systems showing this local port issue, we noticed that the default limit on open files was a mere 2,000.

      [joakim@lapetus jetty]$ lsb_release -a
      No LSB modules are available.
      Distributor ID:	Ubuntu
      Description:	Ubuntu 10.04 LTS
      Release:	10.04
      Codename:	lucid
      
      [joakim@lapetus jetty]$ ulimit -n
      2000
      

      So we updated /etc/security/limits.conf to bump this number up to 20,000, and this solved our bad local port issues (after a reboot).

      [joakim@lapetus jetty]$ grep nofile /etc/security/limits.conf 
      #        - nofile - max number of open files
      *	hard	nofile	40000
      *	soft	nofile	20000
      
      [joakim@lapetus jetty]$ ulimit -a | grep -i file
      core file size          (blocks, -c) 0
      file size               (blocks, -f) unlimited
      open files                      (-n) 20000
      file locks                      (-x) unlimited
      

      This new ulimit helped our unit testing on the systems that were having issues.
      Our analysis shows that our aggressive unit testing, which starts and stops a Jetty server (on a system-assigned port, using the special port #0), consumes sockets faster than they can be recycled back under the "open files" ulimit, which eventually caused all of our unit tests to fail due to a "-1" local port.
      See examples of the error messages at https://bugs.eclipse.org/bugs/show_bug.cgi?id=310634

      Having a loop that keeps attempting to start the server while looking for a valid port number (as seen in the Hadoop codebase) will not help when the "open files" ulimit is exhausted.
      The best solution we've been able to come up with is simply to increase the "open files" ulimit.

      • Joakim
      Joakim Erdfelt added a comment -

      Sachin,

      Greg has posted a patch against Jetty 6 (attachment JETTY-748.patch) that backports the fixes for this from Jetty 7.
      I can build a patched Jetty 6 for you to test against if you need it. (Note: Jetty 6 is in maintenance mode)

      Greg Wilkins added a comment -

      I think I've finally found the underlying cause of the -1 return for the local port.

      In the finally block of the run method of AbstractConnector, there is a check to see if the thread is the one for acceptor==0 and, if so, it closes the server socket. The problem is that the acceptor thread can apparently complete after the next open of the connector (in a stop/start scenario), so the socket is opened and then closed a short time later.

      The quick fix is to remove this close (it is not needed, as doStop already does a close).

      But a better fix will be to make sure that calls to Server.doStop wait until the thread pool has stopped all threads before returning.
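
      An illustrative sketch of one way the quick fix could be made safe against the race, by having each acceptor close only the server socket it was started with; this is not the actual AbstractConnector code:

      // Illustrative sketch of the race and one way to defuse it: the acceptor keeps
      // a reference to the ServerSocket it was started with and only closes that
      // instance, so a late-finishing acceptor cannot close a socket created by a
      // subsequent restart. This is not the actual AbstractConnector code.
      import java.io.IOException;
      import java.net.ServerSocket;
      import java.net.Socket;

      class AcceptorSketch implements Runnable {
          private final ServerSocket acceptingOn; // the socket current at start time

          AcceptorSketch(ServerSocket socket) {
              this.acceptingOn = socket;
          }

          public void run() {
              try {
                  while (!acceptingOn.isClosed()) {
                      Socket connection = acceptingOn.accept();
                      // hand "connection" off to the handling code here...
                      connection.close();
                  }
              } catch (IOException ignored) {
                  // accept() throws once the socket is closed during stop()
              } finally {
                  try {
                      // Close only the socket this acceptor was using; a socket
                      // opened by a later restart is a different instance and is
                      // left alone, avoiding the "-1 local port" race.
                      acceptingOn.close();
                  } catch (IOException ignored) {
                  }
              }
          }
      }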


        People

        • Assignee: Joakim Erdfelt
        • Reporter: Greg Wilkins
        • Votes: 1
        • Watchers: 6
