JRuby (please use github issues at http://bugs.jruby.org)
  1. JRuby (please use github issues at http://bugs.jruby.org)
  2. JRUBY-5514

Errno::EBADF is sometimes raised instead of IOError when TCPSocket#readline is called after TCPSocket#close

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.5.1, JRuby 1.5.6, JRuby 1.6RC2
    • Fix Version/s: JRuby 1.6RC3
    • Component/s: None
    • Labels:
      None
    • Environment:
    • Number of attachments :
      0

      Description

      When TCPSocket#readline is called after a TCPSocket#close on the same client socket, if the #close comes from another thread, sometimes an Errno::EBADF is the result instead of an IOError: closed stream.

      Tracing through the code, what's happening is that the ChannelDescriptor responds to the getline() call with a BadDescriptorException because the underlying Channel responds false to isOpen(), which on the face of it seems wrong but I'm not familiar enough with the semantics to make a convincing argument. Also, given that the readStream.getline() call is guarded by a beforeBlockingCall(), I suspect that this actually indicates a race somewhere else in the code.

      It is proving difficult to narrow this down to a single test case, but the following code at least demonstrates the problem:

        require 'socket'
      
        server = TCPServer.new(0)
        server_thread = Thread.new{ server.accept while true }
      
        exceptions = []
      
        1000.times do |i|
          client_socket = TCPSocket.new("localhost", server.addr[1])
          close_thread = Thread.new{ sleep(Kernel.rand/100); client_socket.close }
          read_thread = Thread.new do 
            sleep(Kernel.rand/100); 
            begin
              client_socket.readline
            rescue Exception => e
              exceptions << e
            end
          end
        
          close_thread.join
          read_thread.join
        end
      
        p exceptions.map{|e| e.class.to_s}.inject({}){|m,r| m.merge r => true}.keys.sort
      

      The final output is, for me:

      ["EOFError", "Errno::EBADF", "Errno::EPIPE", "IOError"]

      EOFError and IOError are expected. EBADF and EPIPE are not.

        Activity

        Hide
        Charles Oliver Nutter added a comment -

        I get even worse output:

        ["EOFError", "Errno::EBADF", "Errno::EPIPE", "IOError", "Java::JavaLang::NullPointerException"]

        Digging in.

        Show
        Charles Oliver Nutter added a comment - I get even worse output: ["EOFError", "Errno::EBADF", "Errno::EPIPE", "IOError", "Java::JavaLang::NullPointerException"] Digging in.
        Hide
        Charles Oliver Nutter added a comment -

        Full native backtraces of the exceptions I'm getting locally: https://gist.github.com/832196

        Show
        Charles Oliver Nutter added a comment - Full native backtraces of the exceptions I'm getting locally: https://gist.github.com/832196
        Hide
        Charles Oliver Nutter added a comment -

        Fixed the NPE, looking at the others now. Our logic for this area is mostly ported from MRI, but we have to simulate errno exceptions atop NIO.

        Show
        Charles Oliver Nutter added a comment - Fixed the NPE, looking at the others now. Our logic for this area is mostly ported from MRI, but we have to simulate errno exceptions atop NIO.
        Hide
        Charles Oliver Nutter added a comment -

        FWIW, Ruby 1.8.7 appears to hang on this script and 1.9 only produces IOError:

        ~/projects/jruby &#10132; ruby blah.rb
        IOError
        closed stream
        blah.rb:15:in `readline'
        blah.rb:15
        blah.rb:12:in `initialize'
        blah.rb:12:in `new'
        blah.rb:12
        blah.rb:9:in `times'
        blah.rb:9
        
        ^Cblah.rb:10: Interrupt
        	from blah.rb:9:in `times'
        	from blah.rb:9
        
        
        ~/projects/jruby &#10132; ruby1.9 blah.rb
        IOError
        closed stream
        blah.rb:15:in `readline'
        blah.rb:15:in `block (2 levels) in <main>'
        
        ["IOError"]
        
        Show
        Charles Oliver Nutter added a comment - FWIW, Ruby 1.8.7 appears to hang on this script and 1.9 only produces IOError: ~/projects/jruby &#10132; ruby blah.rb IOError closed stream blah.rb:15:in `readline' blah.rb:15 blah.rb:12:in `initialize' blah.rb:12:in `new' blah.rb:12 blah.rb:9:in `times' blah.rb:9 ^Cblah.rb:10: Interrupt from blah.rb:9:in `times' from blah.rb:9 ~/projects/jruby &#10132; ruby1.9 blah.rb IOError closed stream blah.rb:15:in `readline' blah.rb:15:in `block (2 levels) in <main>' ["IOError"]
        Hide
        Charles Oliver Nutter added a comment -

        It seems like we leaned on our synthetic EBADF exception too heavily. EBADF is likely what would be thrown by the underlying IO calls in MRI if MRI didn't already track whether a stream had been closed manually. By changing some of our "already-closed" checks to throw a new ConnectionClosedException (that can be turned into IOError as MRI does it) we can probably come closer to matching MRI's output for all types of closed streams.

        However, I don't think we can match MRI exactly. The problem here is one of timing. Because you have a separate thread firing off the "close" call, it's impossible to know where in execution the main thread might be. It may be in the IO logic, it may be in the OpenFile logic that underlies that (which is where the NPE came from), or it may be in the stream logic (which is where the EBADF comes from). Concurrently closing a stream on a real concurrent implementation is likely to produce some EBADF no matter what we do.

        EPIPE I'm not sure about. Still investigating and tweaking.

        Show
        Charles Oliver Nutter added a comment - It seems like we leaned on our synthetic EBADF exception too heavily. EBADF is likely what would be thrown by the underlying IO calls in MRI if MRI didn't already track whether a stream had been closed manually. By changing some of our "already-closed" checks to throw a new ConnectionClosedException (that can be turned into IOError as MRI does it) we can probably come closer to matching MRI's output for all types of closed streams. However, I don't think we can match MRI exactly. The problem here is one of timing. Because you have a separate thread firing off the "close" call, it's impossible to know where in execution the main thread might be. It may be in the IO logic, it may be in the OpenFile logic that underlies that (which is where the NPE came from), or it may be in the stream logic (which is where the EBADF comes from). Concurrently closing a stream on a real concurrent implementation is likely to produce some EBADF no matter what we do. EPIPE I'm not sure about. Still investigating and tweaking.
        Hide
        Charles Oliver Nutter added a comment -

        the EPIPE/ESPIPE errors were incorrect; they were being raised when attempting to seek on a socket in a call that should ignore such errors. Remaining notes for the fixes are in the commit below.

        commit 15832d3b4dc4f4023b40b710db43f66ce9d0a2b7
        Author: Charles Oliver Nutter <headius@headius.com>
        Date: Thu Feb 17 14:46:38 2011 -0600

        Fix JRUBY-5514: Errno::EBADF is sometimes raised instead of IOError when TCPSocket#readline is called after TCPSocket#close

        The problem here is that under real concurrency, EBADF or equivalent are perfectly valid to see. Eventually you have to make a low-level IO call, and under concurrent load that could happen just slightly after another thread calls #close. There's little we can do to eliminate EBADF, but I have cleaned up some paths that raised it unnecessarily and others that raised EPIPE when they should not.

        I also cleaned up several places that could cause a NullPointerException when under concurrent load, because a stream gets nulled out after it has already been checked for non-nullness.

        Show
        Charles Oliver Nutter added a comment - the EPIPE/ESPIPE errors were incorrect; they were being raised when attempting to seek on a socket in a call that should ignore such errors. Remaining notes for the fixes are in the commit below. commit 15832d3b4dc4f4023b40b710db43f66ce9d0a2b7 Author: Charles Oliver Nutter <headius@headius.com> Date: Thu Feb 17 14:46:38 2011 -0600 Fix JRUBY-5514 : Errno::EBADF is sometimes raised instead of IOError when TCPSocket#readline is called after TCPSocket#close The problem here is that under real concurrency, EBADF or equivalent are perfectly valid to see. Eventually you have to make a low-level IO call, and under concurrent load that could happen just slightly after another thread calls #close. There's little we can do to eliminate EBADF, but I have cleaned up some paths that raised it unnecessarily and others that raised EPIPE when they should not. I also cleaned up several places that could cause a NullPointerException when under concurrent load, because a stream gets nulled out after it has already been checked for non-nullness.
        Hide
        brian walsh added a comment -

        Greetings. Appreciate your time.

        I'm running into a problem with a NPE in this area. Not reproducible yet.
        However, I thought I'd include a log snippet here in case it prompts any advice.

        Confused, in that as far as I can tell for jruby 1.6.5.1, line 356 in RubyFile.java is a case statement?
        356: case LOCK_UN | LOCK_NB:

        From a failure occurrence...

        • jruby.rack.jruby.version : 1.6.5.1
        • jruby.rack.rack.release : 1.2
        • jruby.rack.version : 1.0.10

        A Java::JavaLang::NullPointerException occurred in XXXXX#yyyyy:

        org.jruby.RubyFile.flock(RubyFile.java:356)

        -------------------------------
        Backtrace:
        -------------------------------

        org.jruby.RubyFile.flock(RubyFile.java:356)
        org.jruby.RubyFile$i$1$0$flock.call(RubyFile$i$1$0$flock.gen:65535)
        org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:167)
        org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36)
        org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
        org.jruby.ast.EnsureNode.interpret(EnsureNode.java:98)
        org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75)
        .....

        Show
        brian walsh added a comment - Greetings. Appreciate your time. I'm running into a problem with a NPE in this area. Not reproducible yet. However, I thought I'd include a log snippet here in case it prompts any advice. Confused, in that as far as I can tell for jruby 1.6.5.1, line 356 in RubyFile.java is a case statement? 356: case LOCK_UN | LOCK_NB: From a failure occurrence... jruby.rack.jruby.version : 1.6.5.1 jruby.rack.rack.release : 1.2 jruby.rack.version : 1.0.10 A Java::JavaLang::NullPointerException occurred in XXXXX#yyyyy: org.jruby.RubyFile.flock(RubyFile.java:356) ------------------------------- Backtrace: ------------------------------- org.jruby.RubyFile.flock(RubyFile.java:356) org.jruby.RubyFile$i$1$0$flock.call(RubyFile$i$1$0$flock.gen:65535) org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:167) org.jruby.ast.FCallOneArgNode.interpret(FCallOneArgNode.java:36) org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104) org.jruby.ast.EnsureNode.interpret(EnsureNode.java:98) org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:75) .....

          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Alex Young
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: