Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: JRuby 1.6RC1
Fix Version/s: JRuby 1.7.0.pre1
Component/s: Performance
Labels: None
Description
There are probably a lot of things that could be improved on the redblack benchmark that's been floating around.
gist/repo: git://gist.github.com/44301.git
numbers:
~/projects/gist-44301 ➔ jruby -v --server bm1.rb
jruby 1.6.0.RC1 (ruby 1.8.7 patchlevel 330) (2011-01-26 80ab30a) (Java HotSpot(TM) Server VM 1.6.0_22) [darwin-i386-java]
./red_black_tree.rb:37 warning: instance variable @instance not initialized
5.216
2.908
2.943
2.913
3.051
~/projects/gist-44301 ➔ jruby --1.9 -v --server bm1.rb
jruby 1.6.0.RC1 (ruby 1.9.2 patchlevel 136) (2011-01-26 80ab30a) (Java HotSpot(TM) Server VM 1.6.0_22) [darwin-i386-java]
/Users/headius/projects/gist-44301/red_black_tree.rb:37 warning: instance variable @instance not initialized
5.799
3.627
3.566
3.664
3.585
The 1.9 numbers are discussed in JRUBY-5427, but both numbers could probably be improved a lot. Rubinius numbers for this benchmark (which is admittedly very numeric and Ruby object-heavy) are in the neighborhood of 1s, so we're at best about 3x slower.
Top offenders from an allocation profile (for just the first two iterations) show a ton of Object[] being created (for ivar tables) along with the usual RubyFixnum objects and many, many IRubyObject[]:
rank | self % | accum % | live bytes | live objs | alloc'ed bytes | alloc'ed objs | stack trace | class name
1 33.78% 33.78% 43668384 909758 103320192 2152504 302748 org.jruby.RubyFixnum
2 12.59% 46.38% 16275392 254303 22672256 354254 314598 java.lang.Object[]
3 9.98% 56.35% 12896800 403025 28579072 893096 314705 org.jruby.runtime.builtin.IRubyObject[]
4 7.87% 64.22% 10172520 254313 14172560 354314 311802 org.jruby.RubyObject
5 6.33% 70.56% 8187136 255848 32000064 1000002 313755 org.jruby.runtime.builtin.IRubyObject[]
6 3.82% 74.38% 4937344 154292 11336064 354252 314707 org.jruby.runtime.builtin.IRubyObject[]
7 3.74% 78.12% 4839552 8 11362632 91 312361 org.jruby.runtime.builtin.IRubyObject[]
8 2.48% 80.60% 3201344 100042 6400736 200023 314747 org.jruby.runtime.builtin.IRubyObject[]
9 2.48% 83.07% 3200832 100026 6401920 200060 313772 org.jruby.runtime.builtin.IRubyObject[]
10 2.48% 85.55% 3200000 100000 6398400 199950 314745 org.jruby.runtime.builtin.IRubyObject[]
11 2.48% 88.03% 3199904 99997 8001472 250046 314746 org.jruby.runtime.builtin.IRubyObject[]
12 2.28% 90.31% 2948024 46003 2948024 46003 300000 java.lang.Object[]
13 1.41% 91.72% 1827880 45697 1827880 45697 300000 org.jruby.RubyObject
14 1.34% 93.06% 1737728 54304 8137696 254303 314751 org.jruby.runtime.builtin.IRubyObject[]
15 0.48% 93.55% 625968 9585 625968 9585 300000 char[]
16 0.30% 93.85% 383656 9574 383656 9574 300000 java.lang.String
17 0.11% 93.95% 140240 3506 166320 4158 308716 org.jruby.runtime.callsite.NormalCachingCallSite
18 0.10% 94.05% 125376 3918 435584 13612 308555 org.jruby.lexer.yacc.SimpleSourcePosition
19 0.09% 94.14% 115360 2884 142120 3553 308693 org.jruby.ast.ArrayNode
20 0.09% 94.23% 115360 2884 142120 3553 308696 java.util.ArrayList
In fact, the allocation numbers are totally dominated by fixnums, Object[], and IRubyObject[].
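To put a number on "dominated", the self percentages in the profile can be summed per class. The sketch below is only an illustration: the class and method names are made up for this note, the figures are copied from ranks 1 to 14 of the table above, and ranks 15 to 20 are left out. Percentages are held as hundredths of a percent so the sums stay exact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for this note; not part of JRuby or the benchmark.
final class AllocationShareSketch {
    private static final String FIXNUM = "org.jruby.RubyFixnum";
    private static final String OBJ_ARRAY = "java.lang.Object[]";
    private static final String IRO_ARRAY = "org.jruby.runtime.builtin.IRubyObject[]";
    private static final String RUBY_OBJECT = "org.jruby.RubyObject";

    // Class name for each of ranks 1-14, in rank order.
    private static final String[] NAMES = {
        FIXNUM, OBJ_ARRAY, IRO_ARRAY, RUBY_OBJECT,
        IRO_ARRAY, IRO_ARRAY, IRO_ARRAY, IRO_ARRAY, IRO_ARRAY, IRO_ARRAY, IRO_ARRAY,
        OBJ_ARRAY, RUBY_OBJECT, IRO_ARRAY
    };

    // "self" column for the same ranks, in hundredths of a percent (33.78% -> 3378).
    private static final int[] SELF = {
        3378, 1259, 998, 787,
        633, 382, 374, 248, 248, 248, 248,
        228, 141, 134
    };

    /** Sums the self percentages per class, keeping first-seen order. */
    static Map<String, Integer> totals() {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            totals.merge(NAMES[i], SELF[i], Integer::sum);
        }
        return totals;
    }

    /** Total for one class in hundredths of a percent, or 0 if it is not listed. */
    static int hundredthsFor(String className) {
        return totals().getOrDefault(className, 0);
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : totals().entrySet()) {
            int h = e.getValue();
            System.out.printf("%-42s %d.%02d%%%n", e.getKey(), h / 100, h % 100);
        }
    }
}
```

With those figures, IRubyObject[] comes to 35.13%, RubyFixnum to 33.78%, Object[] to 14.87%, and RubyObject to 9.28%. The four together give 93.06%, which matches the accum column at rank 14.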
Some of the IRubyObject[] are due to blocks not being arity split. Some are due to dispatches going through array or varargs paths within JRuby.
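For anyone unfamiliar with what arity splitting buys, here is a minimal sketch of the idea. None of these names are JRuby's real API: Value stands in for IRubyObject, Body for a block or method body, and the counter is a crude stand-in for an allocation profile. A body that only has the array entry point forces the caller to box its single argument, while a body that overrides the one-argument entry point needs no array.

```java
// Hypothetical sketch; names are illustrative and not taken from JRuby.
final class AritySplitSketch {
    /** Stand-in for a Ruby value. */
    interface Value {}

    static final class Num implements Value {
        final long n;
        Num(long n) { this.n = n; }
    }

    // Number of argument arrays created through pack() since the last reset.
    static int argArrays = 0;

    /** Boxes arguments into an array; the varargs call allocates it. */
    static Value[] pack(Value... args) {
        argArrays++;
        return args;
    }

    /** A callable body with a generic array path and a one-argument path. */
    abstract static class Body {
        abstract Value call(Value[] args);

        /** One-argument entry point; by default it falls back to the array path. */
        Value call1(Value arg0) {
            return call(pack(arg0));
        }
    }

    /** Not arity split: reachable only through the array path. */
    static final class IncrementUnsplit extends Body {
        @Override Value call(Value[] args) {
            return new Num(((Num) args[0]).n + 1);
        }
    }

    /** Arity split: the one-argument path is overridden, so no array is built. */
    static final class IncrementSplit extends Body {
        @Override Value call(Value[] args) {
            return call1(args[0]);
        }

        @Override Value call1(Value arg0) {
            return new Num(((Num) arg0).n + 1);
        }
    }

    /** Invokes the body `calls` times with one argument and reports arrays created. */
    static int arraysFor(Body body, int calls) {
        argArrays = 0;
        Value v = new Num(0);
        for (int i = 0; i < calls; i++) {
            v = body.call1(v);
        }
        return argArrays;
    }

    public static void main(String[] args) {
        System.out.println("unsplit: " + arraysFor(new IncrementUnsplit(), 1000) + " arrays");
        System.out.println("split:   " + arraysFor(new IncrementSplit(), 1000) + " arrays");
    }
}
```

In this toy version the unsplit body creates one array per call and the split body creates none. Each of those short-lived arrays would show up in a profile like the one above.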
I'm not sure whether the allocation overhead is the primary bottleneck, but it seems very likely at this point given the massive number of objects being created and thrown away.
We should explore this benchmark in detail and see what we can improve.
Playing with this a bit today...
I made some modifications that specialize any object whose native root is Object, giving it a class with a fixed number of fields rather than an array to hold the values. The first attempt was much slower than the original, which I thought might be due to the switches in the IRubyObject.getVariable implementations in my specialized versions. I then tried specialized VariableAccessor subclasses, plus IRubyObject getVariable methods that go straight to the fields in the specialized classes. This improved things a bit, but not much.
Ultimately the specialized classes did not make an improvement. Although I have not dug into it, I assume this is due to having polymorphic paths for looking up variables...some going to the array, and some going to fields. So at least the simple approach to specializing object sizes does not have a large positive impact.
This may also indicate that having monomorphic paths all the way to array offsets is pretty fast; in some cases, the specialized logic was a good 1/6 slower than the monomorphic array logic.
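For reference, the two layouts being compared look roughly like the sketch below. This is not the actual patch: Obj, ArrayObj, Obj2, and Accessor are hypothetical stand-ins for IRubyObject, the array-backed RubyObject, a two-variable specialized class, and VariableAccessor. It shows the switch-based getVariable from the first attempt and a field-direct accessor like the second.

```java
// Hypothetical sketch of the two storage layouts; not JRuby source.
final class IvarStorageSketch {
    /** Minimal stand-in for the variable-access part of an object. */
    interface Obj {
        Object getVariable(int index);
        void setVariable(int index, Object value);
    }

    /** Array-backed layout: every instance carries a separate Object[] table. */
    static final class ArrayObj implements Obj {
        private final Object[] vars;
        ArrayObj(int size) { vars = new Object[size]; }
        @Override public Object getVariable(int index) { return vars[index]; }
        @Override public void setVariable(int index, Object value) { vars[index] = value; }
    }

    /** Specialized two-variable layout: no table, but indexed access needs a switch. */
    static final class Obj2 implements Obj {
        Object var0, var1;

        @Override public Object getVariable(int index) {
            switch (index) {
                case 0: return var0;
                case 1: return var1;
                default: throw new IndexOutOfBoundsException("no slot " + index);
            }
        }

        @Override public void setVariable(int index, Object value) {
            switch (index) {
                case 0: var0 = value; break;
                case 1: var1 = value; break;
                default: throw new IndexOutOfBoundsException("no slot " + index);
            }
        }
    }

    /** Reads one variable; a call site would cache one of these per variable name. */
    interface Accessor {
        Object get(Obj self);
    }

    /** Generic accessor: works for any layout through the indexed method. */
    static Accessor byIndex(int index) {
        return self -> self.getVariable(index);
    }

    /** Specialized accessor: skips the switch, but only accepts Obj2 receivers. */
    static final Accessor OBJ2_VAR0 = self -> ((Obj2) self).var0;

    public static void main(String[] args) {
        Obj array = new ArrayObj(2);
        Obj fields = new Obj2();
        array.setVariable(0, "left");
        fields.setVariable(0, "left");
        System.out.println(byIndex(0).get(array));   // array path
        System.out.println(byIndex(0).get(fields));  // switch path
        System.out.println(OBJ2_VAR0.get(fields));   // field-direct path
    }
}
```

Once both layouts are live, the same accessor call can land on different implementations depending on the receiver, which is the polymorphism I suspect is eating the gain. The sketch only shows the shape of the code and says nothing about how the JIT actually treats it.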
Continuing to explore what might be making this slow.