Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Major
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: JRuby 1.7.0.pre1
-
Component/s: Performance
-
Labels:None
-
Number of attachments :3
Description
I just added a new benchmark called test/bench/bench_file_writing.rb that duplicates the way Rails log flushing works. JRuby is between 120% and 200% slower than MRI in most cases. This adds up to the Rails dark matter.
-
- add_lseek.patch
- 07/Feb/09 5:59 PM
- 8 kB
- Charles Oliver Nutter
-
- append.patch
- 10/Jul/08 3:00 AM
- 9 kB
- Wayne Meissner
-
- lseek_to_avoid_filechannel_size.patch
- 07/Feb/09 5:59 PM
- 23 kB
- Charles Oliver Nutter
Activity
Not sure how related it is, but I assume unpacking a gem is related to file writing.
Some stats (7x difference between MRI and RC3):
$ time warble war
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 rails
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rails-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 0.8.1 rake
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rake-0.8.1'
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 activesupport
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activesupport-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 activerecord
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activerecord-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 actionpack
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/actionpack-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 actionmailer
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/actionmailer-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 2.0.2 activeresource
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activeresource-2.0.2'
/usr/bin/ruby1.8 -S gem unpack -v 0.7.2 activerecord-jdbc-adapter
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activerecord-jdbc-adapter-0.7.2'
/usr/bin/ruby1.8 -S gem unpack -v 0.2.3 chronic
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/chronic-0.2.3'
/usr/bin/ruby1.8 -S gem unpack -v 1.5.1 hoe
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/hoe-1.5.1'
/usr/bin/ruby1.8 -S gem unpack -v 0.4.5 rubyforge
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rubyforge-0.4.5'
/usr/bin/ruby1.8 -S gem unpack -v 0.6 hpricot
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/hpricot-0.6'
/usr/bin/ruby1.8 -S gem unpack -v 1.0.0 rest-open-uri
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rest-open-uri-1.0.0'
/usr/bin/ruby1.8 -S gem unpack -v 0.4.0 postgres-pr
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/postgres-pr-0.4.0'
jar cf nedss.war -C tmp/war .
real 0m6.791s
user 0m5.972s
sys 0m0.840s
$ time jruby -S warble war
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 rails
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rails-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 0.8.1 rake
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rake-0.8.1'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 activesupport
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activesupport-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 activerecord
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activerecord-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 actionpack
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/actionpack-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 actionmailer
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/actionmailer-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 2.0.2 activeresource
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/activeresource-2.0.2'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 0.2.3 chronic
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/chronic-0.2.3'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 1.5.1 hoe
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/hoe-1.5.1'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 0.4.5 rubyforge
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rubyforge-0.4.5'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 0.6.161 hpricot
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/hpricot-0.6.161-java'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 1.0.0 rest-open-uri
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/rest-open-uri-1.0.0'
/home/mike/opt/jruby-1.1RC3/bin/jruby -S gem unpack -v 0.4.0 postgres-pr
Unpacked gem: '/home/mike/projects/projectfoo/appname/rails/nedss/tmp/war/WEB-INF/gems/gems/postgres-pr-0.4.0'
jar cf nedss.war -C tmp/war .
real 0m50.198s
user 0m56.632s
sys 0m1.712s
Quick profile suggests ChannelDescriptor#internalWrite is very expensive. The offending code is:
if (isSeekable() && originalModes.isAppendable()) {
FileChannel fileChannel = (FileChannel)channel;
fileChannel.position(fileChannel.size());
}
I tried a couple of things, with varying degrees of success.
1) Use a FileOutputStream in append mode instead of a RandomAccessFile to get a FileChannel in ChannelDescriptor.open(). This makes an auto-append FileChannel, but the speed was the same.
Its not surprising, since looking at the internals of FileChanelImpl, it does exactly what ChannelDescriptor#internalWrite was doing - sets the file position to FileChannel#size(), then does the write.
2) Use the absolute position version of FileChannel#write(). This produced about a 10% improvement.
3) Wrote a FileChannel implementation that only handles append-mode files. It assumes that the current file position is the end of the file, so doesn't bother seeking when doing a relative write. Any call to a method that will change the current position (e.g. read, position(long)), sets a flag, which causes FileChannel#write to reset the file position to the end of file.
Option #3 is what is in append.patch - with it, and jruby --server, jruby is about 10-20% slower than MRI on MacOS.
I've tried the patch, and the results seem mixed. Total times for test/bench/bench_file_writing.rb:
original patched mri 100k io.write 3.464000 2.745000 0.990000 100k io.write 1.186000 0.944000 1.000000 100k io.write 1.306000 3.791000 0.990000 100k io.write 1.098000 1.688000 1.030000 100k io.write 2.686000 1.760000 0.980000 100k io.write 1.108000 3.801000 0.970000 100k io.write 1.114000 0.923000 1.010000 100k io.write 1.881000 0.876000 0.960000 100k io.write 1.100000 2.712000 1.000000 100k io.write 1.119000 1.357000 0.99000
(java 6, ubuntu, -J-server)
Seems to be a lot of variance in the times. Strange.
The problem with option three is that the append logic (always seeking to end) was explicitly added because of cases where there may be multiple separate IO channels writing to the same file. We had a reported bug where append didn't write from the actual end and only wrote from what end each channel saw for itself. This caused all sorts of interesting problems. So I'm not sure making the assumption about being at the end of the file is going to work.
Given how variable the times are I have to wonder if perhaps there's excess object allocation going on. It starts to look like a GC pausing issue. The other option would be that the variability comes from the size of the write buffer being inappropriate and causing long pauses when the contents are flushed.
I just ran the numbers again myself and I'm not seeing that we're 120%-200% slower at all. It could certainly be that we've improved since this benchmark was created, but at the moment the difference is more like 12-15% slower, and that's not too bad considering we're going through NIO:
~/NetBeansProjects/jruby ➔ jruby --server test/bench/bench_file_writing.rb
user system total real
100k io.write 3.225000 0.000000 3.225000 ( 3.225000)
user system total real
100k io.write 2.135000 0.000000 2.135000 ( 2.136000)
user system total real
100k io.write 2.237000 0.000000 2.237000 ( 2.237000)
user system total real
100k io.write 2.045000 0.000000 2.045000 ( 2.045000)
user system total real
100k io.write 2.118000 0.000000 2.118000 ( 2.118000)
user system total real
100k io.write 2.108000 0.000000 2.108000 ( 2.108000)
user system total real
100k io.write 2.059000 0.000000 2.059000 ( 2.059000)
user system total real
100k io.write 2.095000 0.000000 2.095000 ( 2.094000)
user system total real
100k io.write 2.056000 0.000000 2.056000 ( 2.056000)
user system total real
100k io.write 2.066000 0.000000 2.066000 ( 2.066000)
~/NetBeansProjects/jruby ➔ ruby test/bench/bench_file_writing.rb
user system total real
100k io.write 0.280000 0.790000 1.070000 ( 1.863983)
user system total real
100k io.write 0.280000 0.790000 1.070000 ( 1.786511)
user system total real
100k io.write 0.280000 0.800000 1.080000 ( 1.762645)
user system total real
100k io.write 0.280000 0.800000 1.080000 ( 1.829995)
user system total real
100k io.write 0.280000 0.800000 1.080000 ( 1.774617)
user system total real
100k io.write 0.280000 0.790000 1.070000 ( 1.811901)
user system total real
100k io.write 0.280000 0.810000 1.090000 ( 1.779497)
user system total real
100k io.write 0.280000 0.790000 1.070000 ( 1.787534)
user system total real
100k io.write 0.270000 0.780000 1.050000 ( 1.797646)
user system total real
100k io.write 0.280000 0.790000 1.070000 ( 1.878580)
We're not faster, so this bug is still valid...but it seems like it's not so major a difference anymore. Still interested in suggestions, or perhaps in a patch based on Wayne's #2 option above.
I ran this again with -rffi/times and oddly we have higher times for both user and system, which means we're not just impled badly, we're asking more of the kernel for our write logic. I'm going to look into it a bit.
Attached another attempt that seems to have failed. add_lseek.patch adds lseek to JNA-POSIX, and the lseek_to_avoid...patch uses it for seeking to the end of the file in ChannelDescriptor.internalWrite. The idea was that it would improve performance by eliminating the call fo FileChannel.size, which does a stat under the covers. But it did not appear to help performance at all, and actually hurt it. I'm not sure if this is because the stat call doesn't amount to much or because JNA is slow.
Circling back to an old bug. Still seems to be valid...we're about 50% slower than 1.8 or 1.9. Here's a runhprof profile of the top 20 methods:
CPU TIME (ms) BEGIN (total = 145906) Thu Jul 14 15:35:15 2011 rank self accum count trace method 1 3.43% 3.43% 250780 344208 org.jruby.RubyIO.write 2 2.35% 5.77% 250781 344182 bench.bench_file_writing.block_3$RUBY$__file__ 3 2.08% 7.85% 250781 344165 sun.nio.ch.IOUtil.write 4 2.06% 9.91% 250782 344097 sun.nio.ch.FileChannelImpl.size 5 1.89% 11.80% 250781 344172 sun.nio.ch.FileChannelImpl.write 6 1.88% 13.68% 250783 344191 org.jruby.runtime.CompiledBlock.yield 7 1.85% 15.54% 250781 344157 sun.nio.ch.FileDispatcher.write 8 1.72% 17.25% 250782 344116 sun.nio.ch.FileChannelImpl.position 9 1.49% 18.74% 250781 344146 java.nio.DirectByteBuffer.put 10 1.44% 20.18% 250789 342191 org.jruby.util.io.ChannelDescriptor.internalWrite 11 1.43% 21.61% 250780 344204 org.jruby.RubyIO.fwrite 12 1.33% 22.94% 250781 344159 sun.nio.ch.IOUtil.writeFromNativeBuffer 13 1.33% 24.27% 250789 342192 org.jruby.util.io.ChannelDescriptor.write 14 1.27% 25.54% 250782 344055 org.jruby.ast.executable.RuntimeCache.getVariable 15 1.25% 26.79% 250782 344034 org.jruby.runtime.ThreadContext.pushFrameForBlock 16 1.20% 27.99% 501774 304736 java.lang.ThreadLocal$ThreadLocalMap.getEntry 17 1.17% 29.16% 501579 342136 java.lang.ThreadLocal.get 18 1.09% 30.25% 250781 344148 java.nio.DirectByteBuffer.put 19 1.03% 31.28% 250781 344136 sun.nio.ch.Util$BufferCache.get 20 0.83% 32.11% 250788 336712 org.jruby.runtime.CompiledBlock.prepareSelf
I made an interesting discovery! The file in this benchmark is set to sync = true, which means it's flushing after every write. Removing that and letting it buffer and flush naturally puts JRuby's performance on par with 1.8 and faster than 1.9. So there's something about sync mode that means we're doing too much.
With non-sync writes looking as fast as MRI, I almost want to resolve this, but I'll look into the sync stuff too.
Another discovery: removing File::APPEND also brings us on par with 1.8.7 and faster than 1.9.2.
Ok, a final discovery, and I think I can resolve this in a moment.
I realized that there must be a way to make the Java FileChannel do the append for us, since that's a pretty standard thing to want to do when writing to a file. It turns out that yes, if you get the FileChannel from a FileOutputStream that was opened in append mode, the underlying channel will do the append for you. I modified the file-opening logic in ChannelDescriptor to use new FileOutputStream(..., true).getChannel() when the modes did not include reading (since FileOutputStream is write-only. At first this didn't help on Java 6, perhaps because they're doing the same manual seek that we were doing. But on Java 7, which has had a good amount of NIO work, it brings our execution time to just slightly slower than 1.8.7, which is comfortably faster than 1.9.
Since it has now passed tests and specs, I'm going to go with it.
Shut 'er down! I've pushed the described change to master in the commit below. We're close enough, and considering MRI is only a couple hops from the kernel, we're doing darn good.
I feel this daily with Warbler. Try running Warbler with MRI vs. JRuby. It's at least 200% difference.