Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Not A Bug
-
Affects Version/s: JRuby 1.5.3
-
Fix Version/s: None
-
Component/s: Ruby 1.8.7
-
Labels:None
-
Environment:Ubuntu 10.04 LTS
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.1) (6b18-1.8.1-0ubuntu1)
OpenJDK 64-Bit Server VM (build 16.0-b13, mixed mode)
jruby-openssl (0.7.2.dev)
-
Number of attachments :0
Description
I have a jruby rest service which retrieves email from different imap accounts using ruby imap. After being up for a few hours, all of the worker threads become stuck within get_tagged_response on "@response_arrival.wait"
def get_tagged_response(tag)
until @tagged_responses.key?(tag)
@response_arrival.wait
end
return pick_up_tagged_response(tag)
end
They seem to be waiting on their @receiver_thread which is retrieving data from the imap server. When i do a thread dump, half of the threads are stuck here:
"Thread-9802" daemon prio=10 tid=0x0000000003e71000 nid=0x5e78 runnable [0x00007f3f6e9c6000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:83)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked <0x00007f3ff95fbe48> (a sun.nio.ch.Util$1)
- locked <0x00007f3ff95fbe30> (a java.util.Collections$UnmodifiableSet)
- locked <0x00007f3ff95fba70> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
at org.jruby.ext.openssl.SSLSocket.waitSelect(SSLSocket.java:240)
at org.jruby.ext.openssl.SSLSocket.sysread(SSLSocket.java:448)
at org.jruby.ext.openssl.SSLSocket$i_method_0_1$RUBYINVOKER$sysread.call(org/jruby/ext/openssl/SSLSocket$i_method_0_1$RUBYINVOKER$sysread.gen:65535)
at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:630)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:186)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007.rescue_1$RUBY$_rescue__0(buffering.rb:35)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007._file_(buffering.rb:34)
at ruby.jit.fill_rbuff_0CEA65EC1C909A5CF6CB7E8100C08A3F334C6007._file_(buffering.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
at ruby.jit.gets_78EF17A1CC25A7841D76040F4F007ECC23A22E41._file_(buffering.rb:106)
at ruby.jit.gets_78EF17A1CC25A7841D76040F4F007ECC23A22E41._file_(buffering.rb)
at org.jruby.ast.executable.AbstractScript._file_(AbstractScript.java:39)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:153)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:309)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:148)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611._file_(imap.rb:994)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611._file_(imap.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
at org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:289)
Could this be a problem with nio or open_ssl? I am using jruby-openssl (0.7.2.dev).
I originally submitted a ticket with the ruby developers but they referred me to jruby
.
http://redmine.ruby-lang.org/issues/show/3999
thx.
Activity
I saw that.. sorry.. i thought it would be best to create an issue after i got a reply to the ruby issue yesterday.
Right.. trinidad rest service connects to imap service for a bunch of different accounts.
It used to run for a day or so before i noticed but I noticed that it seems to start happening pretty soon with some connections. Eventually all the connections are hung up and that's when i begin to notice. It seems that lately that all the threads start to get hung up within a couple of hours.
My code is really split between multiple classes for imap connectivity but i will try to paste in some snippets:
Timeout.timeout(60) do
connection = Net::IMAP.new(host, port, ssl)
connection.login(email, password)
connection.examine(mailbox)
connection
end
Sometimes i fetch the headers:
imap.uid_fetch(uids, ['BODY.PEEK[HEADER]', 'RFC822.SIZE', 'UID'])
Then sometimes i fetch the bodies:
imap.uid_fetch(uids.collect(&:to_i), ['UID', 'BODY.PEEK[]'])
Then i logout and disconnect when the connection is destroyed:
connection.logout() rescue nil
connection.disconnect() rescue nil
Incidentally, the timeout does not seem to work for the login. I think it's because imap spawns another thread for receiving the data.
Thanks for your help!
From the backtraces, a thread is trying to 'gets' (imap.rb:994) and waiting for a socket to be readable. It's not 'hung'.
So we need to find the reason why the IMAP server is not responding.
Can you check the logfile of IMAP server you're connecting? The IMAP server might be closing connections from an error, or just normally.
And please check that the patch at JRUBY-5019 is applied to your jruby. Disconnecting IMAPS connection may not work on JRuby 1.5.3.
It seems mainly to hang when connecting to Yahoo imap server which unfortunately i cannot see the logs for.
Shouldn't the gets timeout after a while if the server is not responding?
Regarding 5019, that patch was way back in 1.5.1. I am using 1.5.3 so should i not be ok?
Thanks!
Ah, the server is not under control. I should have been aware of that.
The patch for net/imap.rb at JRUBY-5019 is applied at 2010-10-22 (1 week ago) so it's not yet released. Please apply the patch.
Timeout should work as expected. You're doing something like this, right?
begin
timeout(TIMEOUT) do
imap = init_imap_connection(...)
(do something good with imap)
end
ensure
imap.disconnect if imap
end
Oh.. i didn't realize that patch was not yet applied. I guess i should monkey patch it in? Do you think this patch could help?
In regards to timeout, i am doing something like that but it's not quite as simple as i am pooling imap connections using apache commons pool.
I have tried timeout in several areas and it does not seem to kick in when imap is hanging in this situation.
Thanks again!
One more thing. I am realizing that most of the locked up threads are not using ssl.
"Thread-140320" daemon prio=10 tid=0x00007f7cac477800 nid=0x6eff runnable [0x00007f7ca6075000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
at sun.nio.ch.IOUtil.read(IOUtil.java:224)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
- locked <0x00007f7d2116e158> (a java.lang.Object)
at org.jruby.util.io.ChannelStream.refillBuffer(ChannelStream.java:172)
at org.jruby.util.io.ChannelStream.bufferedRead(ChannelStream.java:916)
at org.jruby.util.io.ChannelStream.bufferedRead(ChannelStream.java:878)
at org.jruby.util.io.ChannelStream.fread(ChannelStream.java:1270) - locked <0x00007f7d21180388> (a org.jruby.util.io.ChannelStream)
at org.jruby.RubyIO.readNotAll(RubyIO.java:2745)
at org.jruby.RubyIO.read(RubyIO.java:2671)
at org.jruby.RubyIO$i_method_multi$RUBYINVOKER$read.call(org/jruby/RubyIO$i_method_multi$RUBYINVOKER$read.gen:65535)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:146)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611._file_(imap.rb:998)
at ruby.jit.get_response_9C1FCDC0EA0182083B570911F4E3FECA09F08611._file_(imap.rb)
at org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
at org.jruby.ast.VCallNode.interpret(VCallNode.java:85)
at org.jruby.ast.LocalAsgnNode.interpret(LocalAsgnNode.java:123)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.RescueNode.executeBody(RescueNode.java:199)
at org.jruby.ast.RescueNode.interpretWithJavaExceptions(RescueNode.java:118)
at org.jruby.ast.RescueNode.interpret(RescueNode.java:110)
at org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.ast.WhileNode.interpret(WhileNode.java:131)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
at org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:139)
at org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:159)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
at org.jruby.ast.VCallNode.interpret(VCallNode.java:85)
at org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
at org.jruby.runtime.InterpretedBlock.evalBlockBody(InterpretedBlock.java:373)
at org.jruby.runtime.InterpretedBlock.yield(InterpretedBlock.java:327)
at org.jruby.runtime.BlockBody.call(BlockBody.java:78)
at org.jruby.runtime.Block.call(Block.java:89)
at org.jruby.RubyProc.call(RubyProc.java:224)
at org.jruby.RubyProc.call(RubyProc.java:207)
at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:94)
at java.lang.Thread.run(Thread.java:636)
Still stuck reading from the imap server, but not over ssl.
Thanks again for your help!
So there're 2 problems here;
1. timeout does not work as expected.
2. IMAP server get silent suddenly.
Let's try to solve issues in this order.
Can you find "ScheduledThreadPoolExecutor" in your thread dump? Should be named as "Thread-n". The executor should handle timeout. And please try to narrow timeout block region of your code.
In my threaddumps, I see the ScheduledThreadPoolExecutor:
"Thread-17" daemon prio=10 tid=0x0000000002381000 nid=0x1c04 waiting on condition [0x00007f7cb1656000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007f7cc3c3f168> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2081)
at java.util.concurrent.DelayQueue.take(DelayQueue.java:193)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:688)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:681)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
In regards to timeout, i have expanded my usage of it to all imap calls where as before, I was just using it for login.
I have been restarting my server today every hour to get around this problem, but i have reduced that to every 3 hours as i want to reproduce the error again with all threads going through a timeout block.
Thanks again!
-karl
I realized that backtraces you posted do not include 'org.jruby.ext.Timeout.timeout(Timeout.java:94)'. And I still have a feeling that there's something missing in the backtraces.
Can you try to pass '-J-Djruby.compile.mode=OFF' to jruby to start Trinidad and see how it goes?
Ok. It looks to me like timeout is working properly.
My code looks like:
def within_open_connection(& block)
connection = nil
Timeout.timeout(200) do
connection = borrow_connection
block.call(connection)
end
ensure
return_connection(connection)
end
And I see the following exception from apache commons pool:
E, 2010-10-31T03:14:53.385000 #23516 ERROR – : java.util.NoSuchElementException: Could not create a validated object, cause: execution expired:
org/apache/commons/pool/impl/GenericKeyedObjectPool.java:1221:in `borrowObject'
sun.reflect.GeneratedMethodAccessor52:-1:in `invoke'
sun/reflect/DelegatingMethodAccessorImpl.java:43:in `invoke'
java/lang/reflect/Method.java:616:in `invoke'
org/jruby/javasupport/JavaMethod.java:525:in `invokeDirectWithExceptionHandling'
./lib/mailbox.rb:22:in `borrow_connection'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/newrelic_rpm-2.13.2/lib/new_relic/agent/method_tracer.rb:319:in `borrow_connection_with_trace_Mailbox_borrow_connection'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/newrelic_rpm-2.13.2/lib/new_relic/agent/method_tracer.rb:141:in `trace_execution_scoped'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/newrelic_rpm-2.13.2/lib/new_relic/agent/method_tracer.rb:314:in `borrow_connection_with_trace_Mailbox_borrow_connection'
./lib/mailbox.rb:8:in `within_open_connection'
./lib/mailbox.rb:7:in `within_open_connection'
./lib/imap/mailbox.rb:19:in `uids'
./lib/email_account.rb:54:in `uids'
./lib/email_gateway.rb:93:in `POST /mailbox/uids/?'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/sinatra-1.1.0/lib/sinatra/base.rb:1032:in `compile!'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/sinatra-1.1.0/lib/sinatra/base.rb:620:in `instance_eval'
/home/ubuntu/.rvm/gems/jruby-1.5.3/gems/sinatra-1.1.0/lib/sinatra/base.rb:620:in `route_eval'
So to me it seems like the Timeout is working properly as the "execution expired" messages comes from the Timeout.timeout method. I will leave things running as is tonight and check for hung threads tomorrow.
Thanks!
I am not sure it's related but i am suspicious because it looks like commons pool is throwing errors closing the imap connections to yahoo. (Most of the hanging threads are from yahoo imap connections).
E, [2010-10-31T17:58:17.515000 #10659] ERROR -- : exception destroying connection for email somebody@ymail.com
Java::JavaLang::NullPointerException :
org.jruby.ext.socket.RubyBasicSocket.shutdownInternal(RubyBasicSocket.java:633)
org.jruby.ext.socket.RubyBasicSocket.shutdownInternal(RubyBasicSocket.java:666)
org.jruby.ext.socket.RubyBasicSocket.shutdown(RubyBasicSocket.java:626)
org.jruby.ext.socket.RubyBasicSocket$i_method_0_1$RUBYINVOKER$shutdown.call(org/jruby/ext/socket/RubyBasicSocket$i_method_0_1$RUBYINVOKER$shutdown.gen:65535)
org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:626)
org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:178)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:106)
ruby.jit.disconnect_6D825BAB1C35249C11645A1C8B095585FD377C37.rescue_line_289_1(imap.rb:292)
ruby.jit.disconnect_6D825BAB1C35249C11645A1C8B095585FD377C37.rescue_1$RUBY$__rescue___0(imap.rb:289)
ruby.jit.disconnect_6D825BAB1C35249C11645A1C8B095585FD377C37.__file__(imap.rb:287)
ruby.jit.disconnect_6D825BAB1C35249C11645A1C8B095585FD377C37.__file__(imap.rb)
ruby.jit.disconnect_6D825BAB1C35249C11645A1C8B095585FD377C37.__file__(imap.rb)
org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:85)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:141)
org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:101)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:269)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:77)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:82)
ruby.jit.disconnect_with_trace_Net_IMAP_disconnect_4C18AA3A1535134399BCDC02EC50E7828886246F.block_0$RUBY$__block__(method_tracer.rb:319)
ruby.jit.disconnect_with_trace_Net_IMAP_disconnect_4C18AA3A1535134399BCDC02EC50E7828886246FBlockCallback$block_0$RUBY$__block__xx1.call(Unknown Source)
org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:115)
org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:98)
org.jruby.runtime.CompiledBlock.yieldSpecific(CompiledBlock.java:78)
org.jruby.runtime.Block.yieldSpecific(Block.java:99)
org.jruby.ast.ZYieldNode.interpret(ZYieldNode.java:25)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
org.jruby.ast.EnsureNode.interpret(EnsureNode.java:96)
org.jruby.ast.BeginNode.interpret(BeginNode.java:83)
org.jruby.ast.NewlineNode.interpret(NewlineNode.java:104)
org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:242)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:199)
org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:195)
org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:212)
ruby.jit.disconnect_with_trace_Net_IMAP_disconnect_4C18AA3A1535134399BCDC02EC50E7828886246F.__file__(method_tracer.rb:314)
ruby.jit.disconnect_with_trace_Net_IMAP_disconnect_4C18AA3A1535134399BCDC02EC50E7828886246F.__file__(method_tracer.rb)
org.jruby.ast.executable.AbstractScript.__file__(AbstractScript.java:35)
org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:119)
org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:159)
org.jruby.internal.runtime.methods.AliasMethod.call(AliasMethod.java:56)
org.jruby.runtime.callsite.CachingCallSite.cacheAndCall(CachingCallSite.java:289)
org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:108)
ruby.jit.destroy_object_576FE2178C45FC66092AF2E32A10494A189FAB33.rescue_3$RUBY$__rescue___2(poolable_object_factory.rb)
ruby.jit.destroy_object_576FE2178C45FC66092AF2E32A10494A189FAB33.rescue_1$RUBY$__rescue___0(poolable_object_factory.rb:18)
ruby.jit.destroy_object_576FE2178C45FC66092AF2E32A10494A189FAB33.block_0$RUBY$__block__(poolable_object_factory.rb:15)
The offending code is:
connection.disconnect() rescue nil
Thx!
Is it possible the Yahoo imap server is not particularly reliable?
If you're satisfied that this issue is resolved by the imap patch, perhaps we could move investigation of other problems to a different bug.
I agree Charles. I have added a more consistent Timeout block to all of my imap calls and I have not seen the problem in the last day and a half. I think the problem is probably yahoo imap not responding. As long as i have a proper timeout, i think i am ok.
I think maybe the imap library should have timeout baked in, but that's probably a feature request
.
Let's close this and i will let you know if i come across anything else.
Thanks for all of your help on this!
Finally I've understood what you're doing. Net::IMAP connection pooling with Commons Pool, right?
And the reason why timeout did not work is there were paths which were not covered timeout block if I understood correctly. So this problem is gone, OK?
The rest problem is that some IMAP servers go quiet suddenly for your service. I think it might be related to IMAP connection pooling. I don't know IMAP well but you may need to do something link 'ping' for keeping connection I guess.
Anyway, I agree that it's not a JRuby issue. Resolving this as 'Not A Bug'.
By the way, Karl, can I see the Ruby code 'borrow_connection'? I found the NullPointerException from org.jruby.ext.socket.RubyBasicSocket.shutdownInternal(RubyBasicSocket.java:633), which should not be happened I think.
Here's the complete code for borrow connection.
def within_open_connection(& block)
connection = nil
Timeout.timeout(200) do
connection = borrow_connection
block.call(connection)
end
ensure
return_connection(connection)
end
def connection_pool
@connection_pool ||= ConnectionPoolFactory.connection_pool_for_email(email_address)
end
def borrow_connection
connection_pool.borrow_object(email_address)
end
def return_connection(connection)
if (connection)
connection_pool.return_object(email_address, connection)
end
end
Here is the code that was throwing the null pointer:
def destroy_object(email_address, connection)
connection.logout()
connection.disconnect()
end
But i have changed it to this and it seems to have gone away:
def destroy_object(email_address, connection)
connection.logout()
connection.disconnect() unless connection.disconnected?
end
Thanks again!
Thank you. I'll check it and file the new issue if I find a bug of JRuby. connection.disconnect when already disconnected, might cause NPE inside.
Please use 'unless conn.disconnected?' workaround at present.
I just replied on the jruby-user list.
The stack trace looks that the client is waiting for the socket because the server does not send any data.
Let's pull things together.