Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: JRuby 1.5
-
Component/s: Performance
-
Labels:None
-
Number of attachments :4
Description
Based on bench_io_foreach under test/bench:
~/NetBeansProjects/jruby ➔ jruby --server test/bench/bench_io_foreach.rb
user system total real
IO.foreach(file) 9.223000 0.000000 9.223000 ( 9.222471)
user system total real
IO.foreach(file) 7.922000 0.000000 7.922000 ( 7.921815)
user system total real
IO.foreach(file) 7.939000 0.000000 7.939000 ( 7.939023)
user system total real
IO.foreach(file) 8.071000 0.000000 8.071000 ( 8.071060)
user system total real
IO.foreach(file) 7.893000 0.000000 7.893000 ( 7.892786)
~/NetBeansProjects/jruby ➔ ruby test/bench/bench_io_foreach.rb
user system total real
IO.foreach(file) 6.110000 0.340000 6.450000 ( 6.521132)
user system total real
IO.foreach(file) 6.100000 0.310000 6.410000 ( 6.463776)
user system total real
IO.foreach(file) 6.070000 0.300000 6.370000 ( 6.410429)
user system total real
IO.foreach(file) 6.060000 0.300000 6.360000 ( 6.373790)
user system total real
IO.foreach(file) 6.120000 0.320000 6.440000 ( 6.503263)
Attachments
-
- jruby-2810_a.patch
- 16/Oct/09 11:11 PM
- 13 kB
- R.J. Lorimer
-
- jruby-2810_b.patch
- 17/Oct/09 2:42 PM
- 13 kB
- R.J. Lorimer
-
- jruby-2810-backtraces.png
- 38 kB
- 15/Oct/09 10:55 PM
-
- jruby-2810-profile.png
- 52 kB
- 15/Oct/09 10:55 PM
Issue Links
| This issue depends upon: | ||||
| JRUBY-4117 | RubyIO#incrementLineno is Very Slow |
|
|
|
Activity
I had Joseph run some comparisons against a file with long lines, so that the cost/benefit of block dispatch did not skew the results, and it seems that we're still a bit slower for IO. But it's good to see that we're making up some ground.
Yea, there is an updated benchmark attached to JRUBY-3298
It's on the order of MRI 10s vs JRuby 120s.
An overwhelming majority of the time in the new benchmark (over 75%) is spent in the ByteList.grow routine, copying arrays - mostly caused by the append call that accepts a ByteBuffer (triggered by ChannelStream.getline(..)).
I'm going to spend some time analyzing why this is called so much with this benchmark. Looking at the numbers shows a bad fact: ByteList.grow is called over 1 million times, and ChannelStream.getline is called less than 25,000 times.
The ByteList growth factor is currently 150%; given that, it seems that the number of calls to ByteList.grow should be able to be much smaller than ChannelStream.getline.
This particular example creates really wide files - each line is several kilobytes in size. Since tall files are equally important to test, on my local I have split the benchmarks into two files to include one that has a lot of fairly short lines (10 phrases, 10,000 lines), and then this one, which has a few very long lines (10 lines, 10,000 phrases)
The problem with this benchmark is definitely byte[] management, as indicated by the 1,000,000+ grow count in the profiler.
The current 'getline' routine works in 4k chunks (the buffer for ChannelStream was changed from 16k to 4k as a legitimate fix to JRUBY-2657). The smaller the buffer size, the larger the impact to the grow routine on ByteList. Here is a stock run with the 4k buffer on my Ubuntu VM:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 64.752000 0.000000 64.752000 ( 64.531000)
user system total real
IO.foreach(file) 31.674000 0.000000 31.674000 ( 31.674000)
user system total real
IO.foreach(file) 31.954000 0.000000 31.954000 ( 31.954000)
user system total real
IO.foreach(file) 29.359000 0.000000 29.359000 ( 29.358000)
user system total real
IO.foreach(file) 28.333000 0.000000 28.333000 ( 28.333000)
Here is my MRI 1.8.6 numbers:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ruby bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 1.630000 0.880000 2.510000 ( 2.645813)
user system total real
IO.foreach(file) 1.630000 0.820000 2.450000 ( 2.563233)
user system total real
IO.foreach(file) 1.640000 0.910000 2.550000 ( 2.705807)
user system total real
IO.foreach(file) 1.520000 0.900000 2.420000 ( 2.550748)
user system total real
IO.foreach(file) 1.690000 1.000000 2.690000 ( 2.843453)
Here is the run again, with the buffer bumped to 16k (just to explore what would happen):
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 46.379000 0.000000 46.379000 ( 46.156000)
user system total real
IO.foreach(file) 18.629000 0.000000 18.629000 ( 18.629000)
user system total real
IO.foreach(file) 18.438000 0.000000 18.438000 ( 18.438000)
user system total real
IO.foreach(file) 18.400000 0.000000 18.400000 ( 18.400000)
user system total real
IO.foreach(file) 18.538000 0.000000 18.538000 ( 18.539000)
While that's a significant improvement, it's not fixing the root problem. The profiler indicates that the improvement is not because less data is being loaded at a time, but rather using a 16K buffer forces larger grow steps in the ByteList, which in this case cauess a lot fewer grow calls, and subsequently less array-to-array data copying. To be clear, I'm not suggesting the current 4k buffer be manipulated.
Today, each iteration of 'getline' creates and begins allocating a new ByteList, which is subsequently filled from the ByteBuffer. This is the most expensive part. One possible optimization would be to reuse the same underlying byte[] for all lines processed in the same thread (basically, for all iterations of the 'each'). This would prevent the reallocation and subsequent growing of the array for every line, at the expense of holding on to the potentially large single byte[] cache until the loop is complete.
This requires:
- A shared/passed-around byte[] pool/context from the top of the call-stack. This could be a parameter to the getline API, or a ThreadLocal-like pool could be used.
- Changes to ByteList to support a soft reset (that either moves the realSize indicator, without aggressively clearing the rest of the array out, or initializes off of a pre-filled by).
- Support baked into RubyIO to reuse/pool ByteList/byte[] objects.
ByteList is already coded in such a way that anything outside of the reserved range (everything above 'realSize') is ignored, so it's already very close to being able to reuse underlying byte[]s.
I hacked this in to the code as minimally invasive as I could, just to see what is possible. Basically, I created a static method that creates and caches a ByteList object. When asked to create another ByteList object, it copies the byte[] over to the new list, allowing the latter ByteList to be "pre-allocated" on the same array. The ByteList is then made the 'active' byte list. This is in no-way production ready code, but it at least exercises the idea reasonably well. Here is the hack class:
public class ByteListHack { public static ByteList temp; public static ByteList newByteList(int size){ ByteList bl = new ByteList(size); if(temp != null) { bl.bytes = temp.bytes; temp = bl; } else { temp = new ByteList(0); bl = temp; } return bl; } }
In the two 'getline' variants on RubyIO I replaced calls to "new ByteList(0)" with ByteListHack.newByteList(0);
As I said, very much a hack. The results of using this approach on the benchmark (which is single-threaded, and therefore safe in this isolated case to use with this static monstrosity) were surprising:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 4.914000 0.000000 4.914000 ( 4.752000)
user system total real
IO.foreach(file) 3.978000 0.000000 3.978000 ( 3.978000)
user system total real
IO.foreach(file) 6.004000 0.000000 6.004000 ( 6.004000)
user system total real
IO.foreach(file) 4.556000 0.000000 4.556000 ( 4.556000)
user system total real
IO.foreach(file) 4.055000 0.000000 4.055000 ( 4.055000)
While those aren't quite MRI #s, they are a lot closer. A real implementation won't be quite this fast (the array won't be quite as long-to-live, nor will it be able to forgo thread safety), but it should still be a significant improvement, with the trade off being a little more memory cached for a short time. This also knocked a couple seconds off of the tall-file scenario, although it wasn't nearly as dramatic (as that works with relatively small byte[]s anyway, so reallocation isn't nearly as expensive).
I'm going to work through providing a real patch for this that is thread-safe and not as hacky. Can someone point me to the right place to check out and build ByteList? I tried here: http://github.com/jruby/bytelist but couldn't get the Ant script to build. Looks like it may need Maven?
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 64.752000 0.000000 64.752000 ( 64.531000)
user system total real
IO.foreach(file) 31.674000 0.000000 31.674000 ( 31.674000)
user system total real
IO.foreach(file) 31.954000 0.000000 31.954000 ( 31.954000)
user system total real
IO.foreach(file) 29.359000 0.000000 29.359000 ( 29.358000)
user system total real
IO.foreach(file) 28.333000 0.000000 28.333000 ( 28.333000)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ruby bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 1.630000 0.880000 2.510000 ( 2.645813)
user system total real
IO.foreach(file) 1.630000 0.820000 2.450000 ( 2.563233)
user system total real
IO.foreach(file) 1.640000 0.910000 2.550000 ( 2.705807)
user system total real
IO.foreach(file) 1.520000 0.900000 2.420000 ( 2.550748)
user system total real
IO.foreach(file) 1.690000 1.000000 2.690000 ( 2.843453)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 46.379000 0.000000 46.379000 ( 46.156000)
user system total real
IO.foreach(file) 18.629000 0.000000 18.629000 ( 18.629000)
user system total real
IO.foreach(file) 18.438000 0.000000 18.438000 ( 18.438000)
user system total real
IO.foreach(file) 18.400000 0.000000 18.400000 ( 18.400000)
user system total real
IO.foreach(file) 18.538000 0.000000 18.538000 ( 18.539000)
- A shared/passed-around byte[] pool/context from the top of the call-stack. This could be a parameter to the getline API, or a ThreadLocal-like pool could be used.
- Changes to ByteList to support a soft reset (that either moves the realSize indicator, without aggressively clearing the rest of the array out, or initializes off of a pre-filled by).
- Support baked into RubyIO to reuse/pool ByteList/byte[] objects.
public class ByteListHack { public static ByteList temp; public static ByteList newByteList(int size){ ByteList bl = new ByteList(size); if(temp != null) { bl.bytes = temp.bytes; temp = bl; } else { temp = new ByteList(0); bl = temp; } return bl; } }
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 4.914000 0.000000 4.914000 ( 4.752000)
user system total real
IO.foreach(file) 3.978000 0.000000 3.978000 ( 3.978000)
user system total real
IO.foreach(file) 6.004000 0.000000 6.004000 ( 6.004000)
user system total real
IO.foreach(file) 4.556000 0.000000 4.556000 ( 4.556000)
user system total real
IO.foreach(file) 4.055000 0.000000 4.055000 ( 4.055000)
Attached is a first attempt at a more reasonable implementation. This implementation isolates the byte[] caching inside the individual looping methods (specifically each_line and foreach). This ensures thread-safety and limits the possibility for memory leaks. It also has the least potential for improvement as compared to some less conservative implementations, but the difference is already apparent.
In short, there is easily more CPU cycles to be saved here, but this is a fairly safe patch that seems to shave a large amount of time off of this algorithm, and doesn't really affect anything except the RubyIO internal methods.
Please note that I didn't know the best way to patch ByteList, so this version uses the ByteList API as is, which feels a little hack-ish (manipulates the public 'bytes' instance var). I have put a TODO in the code accordingly, and would be happy to adjust with some guidance on how. It's isolated to a single method, so it's not a major mess.
Here is a timing for the 'wide' scenario prior to the patch:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 63.970000 0.000000 63.970000 ( 63.764000)
user system total real
IO.foreach(file) 30.212000 0.000000 30.212000 ( 30.212000)
user system total real
IO.foreach(file) 30.973000 0.000000 30.973000 ( 30.973000)
user system total real
IO.foreach(file) 30.768000 0.000000 30.768000 ( 30.767000)
user system total real
IO.foreach(file) 32.813000 0.000000 32.813000 ( 32.813000)
Here is a timing after:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 8.151000 0.000000 8.151000 ( 7.978000)
user system total real
IO.foreach(file) 7.193000 0.000000 7.193000 ( 7.193000)
user system total real
IO.foreach(file) 7.224000 0.000000 7.224000 ( 7.225000)
user system total real
IO.foreach(file) 7.205000 0.000000 7.205000 ( 7.205000)
user system total real
IO.foreach(file) 7.338000 0.000000 7.338000 ( 7.338000)
A 75% reduction in time spent.
Here is a timing for the 'tall' scenario prior to the patch:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 11.514000 0.000000 11.514000 ( 11.322000)
user system total real
IO.foreach(file) 10.215000 0.000000 10.215000 ( 10.214000)
user system total real
IO.foreach(file) 10.284000 0.000000 10.284000 ( 10.284000)
user system total real
IO.foreach(file) 10.226000 0.000000 10.226000 ( 10.226000)
user system total real
IO.foreach(file) 10.202000 0.000000 10.202000 ( 10.202000)
Here is a timing after:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 9.500000 0.000000 9.500000 ( 9.302000)
user system total real
IO.foreach(file) 8.309000 0.000000 8.309000 ( 8.310000)
user system total real
IO.foreach(file) 8.163000 0.000000 8.163000 ( 8.163000)
user system total real
IO.foreach(file) 7.983000 0.000000 7.983000 ( 7.983000)
user system total real
IO.foreach(file) 7.948000 0.000000 7.948000 ( 7.948000)
A 23% reduction in time spent.
In both cases, a lot less memory churn is going to occur as well, which should reduce GC effort.
This patch also includes the split of the benchmark files so both are available.
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 63.970000 0.000000 63.970000 ( 63.764000)
user system total real
IO.foreach(file) 30.212000 0.000000 30.212000 ( 30.212000)
user system total real
IO.foreach(file) 30.973000 0.000000 30.973000 ( 30.973000)
user system total real
IO.foreach(file) 30.768000 0.000000 30.768000 ( 30.767000)
user system total real
IO.foreach(file) 32.813000 0.000000 32.813000 ( 32.813000)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 8.151000 0.000000 8.151000 ( 7.978000)
user system total real
IO.foreach(file) 7.193000 0.000000 7.193000 ( 7.193000)
user system total real
IO.foreach(file) 7.224000 0.000000 7.224000 ( 7.225000)
user system total real
IO.foreach(file) 7.205000 0.000000 7.205000 ( 7.205000)
user system total real
IO.foreach(file) 7.338000 0.000000 7.338000 ( 7.338000)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 11.514000 0.000000 11.514000 ( 11.322000)
user system total real
IO.foreach(file) 10.215000 0.000000 10.215000 ( 10.214000)
user system total real
IO.foreach(file) 10.284000 0.000000 10.284000 ( 10.284000)
user system total real
IO.foreach(file) 10.226000 0.000000 10.226000 ( 10.226000)
user system total real
IO.foreach(file) 10.202000 0.000000 10.202000 ( 10.202000)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 9.500000 0.000000 9.500000 ( 9.302000)
user system total real
IO.foreach(file) 8.309000 0.000000 8.309000 ( 8.310000)
user system total real
IO.foreach(file) 8.163000 0.000000 8.163000 ( 8.163000)
user system total real
IO.foreach(file) 7.983000 0.000000 7.983000 ( 7.983000)
user system total real
IO.foreach(file) 7.948000 0.000000 7.948000 ( 7.948000)
I took a quick look at the patch, and the biggest issue so far: LOTS of Rubyspec failures for IO (and in some other areas, related to IO).
For a full list of failures in IO, take a look here: http://gist.github.com/212301
Total about 50 new failures.
I looked at the patch too. I think the spec failures are from not copying the bytelist before passing it to RubyString#newString(); hence each string passed to the block is sharing the exact same byte array, and it gets overwritten for each line.
Wrapping buf with new ByteList(buf) should be enough, as that will copy the byte[] array.
btw, RJ, you should be able to use new ByteList(byte[] array, int offset, int len, boolean copy) constructor to wrap the array in a new ByteList(), instead of fiddling with the bytes array directly.
Yes - thanks for the comments. It occurred to me when I woke up this morning that the RubyString#newString was probably grabbing the byte[] directly and using it - woke me out of sleep, hah. The benchmarks were passing because the string doesn't change line by line, doh!
I'm thinking most of the benefits will be nuked by having to copy the array at the end, but we shall see. Round 2.
Okay, round 2 patch - aka "It's easy to be fast if you break the spec". This time I did run ant spec (Thanks, Vladimir), and only had one error, which I have with or without my patch related to enumerators - doesn't appear to be related to this code in anyway.
Also, I used the copy constructor this time through (thanks for pointing that out Wayne; exactly what I was looking for).
Here is the tall scenario:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 11.047000 0.000000 11.047000 ( 10.928000)
user system total real
IO.foreach(file) 10.163000 0.000000 10.163000 ( 10.163000)
user system total real
IO.foreach(file) 10.018000 0.000000 10.018000 ( 10.018000)
user system total real
IO.foreach(file) 10.740000 0.000000 10.740000 ( 10.740000)
user system total real
IO.foreach(file) 9.927000 0.000000 9.927000 ( 9.928000)
Not much of an improvement here; as expected copying the array at the end for the tall scenario is where most of the time is anyway.
Here is the wide scenario:
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 18.539000 0.000000 18.539000 ( 18.361000)
user system total real
IO.foreach(file) 11.258000 0.000000 11.258000 ( 11.258000)
user system total real
IO.foreach(file) 11.464000 0.000000 11.464000 ( 11.464000)
user system total real
IO.foreach(file) 11.334000 0.000000 11.334000 ( 11.334000)
user system total real
IO.foreach(file) 11.195000 0.000000 11.195000 ( 11.195000)
As expected - still a decent improvement; more like 65% reduction.
New patch is a replacement for the prior patch. Thanks again for reviewing!
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach.rb
user system total real
IO.foreach(file) 11.047000 0.000000 11.047000 ( 10.928000)
user system total real
IO.foreach(file) 10.163000 0.000000 10.163000 ( 10.163000)
user system total real
IO.foreach(file) 10.018000 0.000000 10.018000 ( 10.018000)
user system total real
IO.foreach(file) 10.740000 0.000000 10.740000 ( 10.740000)
user system total real
IO.foreach(file) 9.927000 0.000000 9.927000 ( 9.928000)
rjlorimer@rjlorimer-ruby-img:~/workspace/jruby/bench$ ../bin/jruby --server bench_io_foreach_wide.rb
user system total real
IO.foreach(file) 18.539000 0.000000 18.539000 ( 18.361000)
user system total real
IO.foreach(file) 11.258000 0.000000 11.258000 ( 11.258000)
user system total real
IO.foreach(file) 11.464000 0.000000 11.464000 ( 11.464000)
user system total real
IO.foreach(file) 11.334000 0.000000 11.334000 ( 11.334000)
user system total real
IO.foreach(file) 11.195000 0.000000 11.195000 ( 11.195000)
Looking forward to seeing the round 2 version; hopefully the time we're losing in duping those strings is not going to outweight the gains.
Oops, there is the round two, only 5 minutes before I posted my comment
I will have a look.
Well the patch seems ok, but I actually got a perf degradation for the "tall" case. My results before+after and for MRI 1.8.6 are here: http://gist.github.com/212440
It's depressing we can't approach MRI speed here even with an improved version ![]()
Okay - I may know what happened on your machine (at least, if you made the same mistake I did).
When I moved to the mac, I ran the bench_io_foreach.rb benchmark, and then patched and ran it again. In so doing, the patched version was slower, just like you saw. What I discovered is that by patching, I also changed the benchmark file (the old bench_io_foreach.rb is the new bench_io_foreach_wide.rb, and has VERY different performance characteristics).
After I found that, I reset my local git, copied in the patched benchmark files (apples to apples), and tried again. This time, both the tall and wide scenarios ran anywhere from 10-40% faster with the patch: http://gist.github.com/212470.
I'm hopeful to see if you had the same problem.
MRI is still faster for me, so I'm going to keep profiling - surprisingly the new hotspot is "RubyIO.incrementlineno"
To give some perspective, commenting out the "global line number variable" setting knocks another 2 seconds off of the "tall" benchmark. If we can find a way to optimize that code path, we will be much better off.
I've committed the changes by RJ Lorimer and made some improvements for JRUBY-4117, and things are looking a lot better. For the tall bench, we're almost as fast as MRI. For the wide bench, we're quite a bit faster.
Should I go ahead and resolve this, or do you guys want to keep digging around and get us convincingly faster than MRI? Since our block iteration should be pretty snappy, we're actually still a fair bit slower than MRI for the actual non-block-related parts of IO.foreach.
RJ's improvements committed in 1bd4ca2.
Resolving this. Any follow up work can be a new issue since I suspect it will require a different aspect of foreach or a different optimization than the one that ended happening and was applied for this issue.
This may not be valid anymore. More recent timings:
qbproger@linux-3r1v:~/src/jruby/bench> jruby --server bench_io_foreach.rb user system total real IO.foreach(file) 13.424000 0.000000 13.424000 ( 13.423922) user system total real IO.foreach(file) 9.707000 0.000000 9.707000 ( 9.707292) user system total real IO.foreach(file) 9.673000 0.000000 9.673000 ( 9.672721) user system total real IO.foreach(file) 9.662000 0.000000 9.662000 ( 9.661599) user system total real IO.foreach(file) 9.713000 0.000000 9.713000 ( 9.712665) qbproger@linux-3r1v:~/src/jruby/bench> ruby bench_io_foreach.rb user system total real IO.foreach(file) 10.750000 0.540000 11.290000 ( 11.751917) user system total real IO.foreach(file) 10.800000 0.470000 11.270000 ( 11.611140) user system total real IO.foreach(file) 10.780000 0.520000 11.300000 ( 11.613197) user system total real IO.foreach(file) 10.850000 0.500000 11.350000 ( 11.654579) user system total real IO.foreach(file) 10.820000 0.480000 11.300000 ( 11.626552)qbproger@linux-3r1v:~/src/jruby/bench> jruby --server bench_io_foreach.rb user system total real IO.foreach(file) 13.424000 0.000000 13.424000 ( 13.423922) user system total real IO.foreach(file) 9.707000 0.000000 9.707000 ( 9.707292) user system total real IO.foreach(file) 9.673000 0.000000 9.673000 ( 9.672721) user system total real IO.foreach(file) 9.662000 0.000000 9.662000 ( 9.661599) user system total real IO.foreach(file) 9.713000 0.000000 9.713000 ( 9.712665) qbproger@linux-3r1v:~/src/jruby/bench> ruby bench_io_foreach.rb user system total real IO.foreach(file) 10.750000 0.540000 11.290000 ( 11.751917) user system total real IO.foreach(file) 10.800000 0.470000 11.270000 ( 11.611140) user system total real IO.foreach(file) 10.780000 0.520000 11.300000 ( 11.613197) user system total real IO.foreach(file) 10.850000 0.500000 11.350000 ( 11.654579) user system total real IO.foreach(file) 10.820000 0.480000 11.300000 ( 11.626552)