JRuby (please use github issues at http://bugs.jruby.org)
JRUBY-6607

TCPSocket connect failure inside Timeout leaves orphaned sockets

      Description

      This is the same problem as described in http://jira.codehaus.org/browse/JRUBY-6498.

      If you wrap a connect inside a Timeout and the timeout expires, the socket is orphaned in whatever state it was in at that moment. In the following code, the socket is still in SYN_SENT even after the timeout, and a short time later lsof lists it as an orphaned socket with the status "can't identify protocol". In our case this took down a cluster of 80 JRuby servers when our memcache servers reached their connection limit: the memcache client kept trying to connect, creating thousands of these orphaned sockets, and we ran out of open file descriptors within minutes.

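      <code>
      require 'timeout'
      require 'socket'

      begin
        Timeout.timeout(1) do
          # 10.255.255.1 is not expected to answer, so the connect hangs in SYN_SENT
          sock = TCPSocket.new('10.255.255.1', 80)
        end
      rescue Timeout::Error => e
        puts e.message
      end
      # Keep the process alive so the orphaned socket can be inspected with lsof
      sleep 1000
      </code>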
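
On an affected JRuby version, one possible workaround is to avoid Timeout around the connect altogether and bound the connect yourself, so the caller always owns the socket and can close it. The following is a minimal sketch, not the fix that went into JRuby. The helper name connect_with_timeout is hypothetical, it assumes an IPv4 address rather than a hostname, and it assumes a Ruby that provides Socket#connect_nonblock and IO::WaitWritable.

```ruby
require 'socket'

# Hypothetical helper (not part of JRuby or the Ruby standard library).
# Connects to an IPv4 address with an upper bound on the connect time and
# closes the socket on any failure, so no descriptor is left behind.
def connect_with_timeout(host, port, timeout_seconds)
  addr = Socket.pack_sockaddr_in(port, host)
  sock = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM, 0)
  connected = false
  begin
    begin
      sock.connect_nonblock(addr)
    rescue IO::WaitWritable
      # The connect is in progress; wait until the socket becomes writable
      # or the time limit passes (IO.select returns nil on timeout).
      unless IO.select(nil, [sock], nil, timeout_seconds)
        raise Errno::ETIMEDOUT, "connect to #{host}:#{port}"
      end
      begin
        # A second call reports the outcome of the pending connect.
        sock.connect_nonblock(addr)
      rescue Errno::EISCONN
        # Already connected, which is the success case here.
      end
    end
    connected = true
    sock
  ensure
    # Runs on timeout, refusal, or any other exception.
    sock.close unless connected
  end
end
```

A caller would use it as `sock = connect_with_timeout('10.255.255.1', 80, 1)` and rescue Errno::ETIMEDOUT (or SystemCallError more broadly) instead of Timeout::Error.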

          Activity

          Chris Ochs added a comment -

          On Linux, lsof shows the sockets as "can't identify protocol", and they stay there until the process is killed, sometimes days later.

          Chris

          Charles Oliver Nutter added a comment -

          Ok, this may have a simpler explanation than I realized.

          When a TimeoutError is raised for the timeout call, we are not doing anything to guarantee the socket is closed. 1.6 doesn't appear to ever close the socket for an exceptional condition, and master only does it for expected exceptions from the Java API.

          I added a finally block that closes the channel if it did not connect successfully, and that appears to prevent sockets from getting stuck in SYN_SENT or TIME_WAIT on my system (OS X SL).

          Will test this out and also patch the 1.6 branch a similar way (though we are not planning a 1.6.8 release).
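
The close-on-failure idea described above can be sketched in Ruby terms, with ensure playing the role of Java's finally. This is only an illustration of the pattern under stated assumptions: the real change is in JRuby's Java sources, and FakeChannel and connect_or_close are hypothetical names invented for this example.

```ruby
require 'timeout'

# Hypothetical stand-in for a socket channel, used only to make the
# sketch self-contained. It can be told to fail during connect.
class FakeChannel
  attr_reader :closed

  def initialize(fail_with: nil)
    @fail_with = fail_with
    @closed = false
  end

  def connect
    raise @fail_with if @fail_with
  end

  def close
    @closed = true
  end
end

# Success-flag pattern: the channel is closed unless connect completed,
# regardless of which exception interrupted it (including a timeout).
def connect_or_close(channel)
  success = false
  begin
    channel.connect
    success = true
    channel
  ensure
    channel.close unless success
  end
end
```

With this shape, `connect_or_close(FakeChannel.new(fail_with: Timeout::Error))` still raises Timeout::Error to the caller, but the channel has been closed by the time the exception propagates.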

          Charles Oliver Nutter added a comment -

          The fix is in and looks good. It's on master in the commit below, and snuck into ef59f99 on the 1.6 branch along with a fix for JRUBY-6619. I added 6f398e8 on 1.6 branch to make the code match master a bit better.

          With the fix in place, I no longer see sockets stuck in SYN_SENT or TIME_WAIT after they've timed out.

          commit 3172fee546eb81e3e7814fdc38eabaaefc2d16c5
          Author: Charles Oliver Nutter <headius@headius.com>
          Date:   Thu May 17 10:52:35 2012 -0500
          
              Fix JRUBY-6607, JRUBY-6498
              
              We did not have a finally block to ensure that unsuccessful
              connections were properly closed. Timeout errors would cause us
              to leave the partially-initialized connection dangling, and the
              JVM does not clean up socket connections on GC. Adding a finally
              block and success flag appears to have tidied up lingering
              sockets.
          
          Chris Ochs added a comment -

          Thank you Charles! Once again you guys are awesome.

          Charles Oliver Nutter added a comment -

          No problem, Chris! I'm just sorry this simple fix took a while to find!


            People

            • Assignee:
              Charles Oliver Nutter
              Reporter:
              Chris Ochs