Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6RC1
    • Fix Version/s: JRuby 1.7.0.pre1
    • Component/s: Performance
    • Labels: None
    • Number of attachments: 1

      Description

      There are probably a lot of things that could be improved on the redblack benchmark that's been floating around.

      gist/repo: git://gist.github.com/44301.git

      numbers:

      ~/projects/gist-44301 ➔ jruby -v --server bm1.rb
      jruby 1.6.0.RC1 (ruby 1.8.7 patchlevel 330) (2011-01-26 80ab30a) (Java HotSpot(TM) Server VM 1.6.0_22) [darwin-i386-java]
      ./red_black_tree.rb:37 warning: instance variable @instance not initialized
      5.216
      2.908
      2.943
      2.913
      3.051
      
      ~/projects/gist-44301 ➔ jruby --1.9 -v --server bm1.rb
      jruby 1.6.0.RC1 (ruby 1.9.2 patchlevel 136) (2011-01-26 80ab30a) (Java HotSpot(TM) Server VM 1.6.0_22) [darwin-i386-java]
      /Users/headius/projects/gist-44301/red_black_tree.rb:37 warning: instance variable @instance not initialized
      5.799
      3.627
      3.566
      3.664
      3.585
      

      The 1.9 numbers are discussed in JRUBY-5427, but both numbers could probably be improved a lot. Rubinius numbers for this benchmark (which is admittedly very numeric and Ruby object-heavy) are in the neighborhood of 1s, so we're at best about 3x slower.

      Top offenders from an allocation profile (for just the first two iterations) show a ton of Object[] being created (for ivar tables) along with the usual RubyFixnum objects and many, many IRubyObject[]:

                percent          live          alloc'ed  stack class
       rank   self  accum     bytes objs     bytes  objs trace name
          1 33.78% 33.78%  43668384 909758 103320192 2152504 302748 org.jruby.RubyFixnum
          2 12.59% 46.38%  16275392 254303  22672256 354254 314598 java.lang.Object[]
          3  9.98% 56.35%  12896800 403025  28579072 893096 314705 org.jruby.runtime.builtin.IRubyObject[]
          4  7.87% 64.22%  10172520 254313  14172560 354314 311802 org.jruby.RubyObject
          5  6.33% 70.56%   8187136 255848  32000064 1000002 313755 org.jruby.runtime.builtin.IRubyObject[]
          6  3.82% 74.38%   4937344 154292  11336064 354252 314707 org.jruby.runtime.builtin.IRubyObject[]
          7  3.74% 78.12%   4839552    8  11362632    91 312361 org.jruby.runtime.builtin.IRubyObject[]
          8  2.48% 80.60%   3201344 100042   6400736 200023 314747 org.jruby.runtime.builtin.IRubyObject[]
          9  2.48% 83.07%   3200832 100026   6401920 200060 313772 org.jruby.runtime.builtin.IRubyObject[]
         10  2.48% 85.55%   3200000 100000   6398400 199950 314745 org.jruby.runtime.builtin.IRubyObject[]
         11  2.48% 88.03%   3199904 99997   8001472 250046 314746 org.jruby.runtime.builtin.IRubyObject[]
         12  2.28% 90.31%   2948024 46003   2948024 46003 300000 java.lang.Object[]
         13  1.41% 91.72%   1827880 45697   1827880 45697 300000 org.jruby.RubyObject
         14  1.34% 93.06%   1737728 54304   8137696 254303 314751 org.jruby.runtime.builtin.IRubyObject[]
         15  0.48% 93.55%    625968 9585    625968  9585 300000 char[]
         16  0.30% 93.85%    383656 9574    383656  9574 300000 java.lang.String
         17  0.11% 93.95%    140240 3506    166320  4158 308716 org.jruby.runtime.callsite.NormalCachingCallSite
         18  0.10% 94.05%    125376 3918    435584 13612 308555 org.jruby.lexer.yacc.SimpleSourcePosition
         19  0.09% 94.14%    115360 2884    142120  3553 308693 org.jruby.ast.ArrayNode
         20  0.09% 94.23%    115360 2884    142120  3553 308696 java.util.ArrayList
      

      In fact, the allocation numbers are totally dominated by fixnums, Object[], and IRubyObject[].
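
      As a quick check of that claim, the "self" percentages in the table can be summed per class. The sketch below copies the rows with a share of at least 1% (ranks 1 to 14) and stores each share in hundredths of a percent so the sums are exact. The class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Sums the "self" column of the profile rows above, grouped by class name.
// Shares are in hundredths of a percent (33.78% is stored as 3378).
final class AllocationShare {
    // Ranks 1..14 from the table, in order.
    static final String[] NAMES = {
        "RubyFixnum", "Object[]", "IRubyObject[]", "RubyObject", "IRubyObject[]",
        "IRubyObject[]", "IRubyObject[]", "IRubyObject[]", "IRubyObject[]",
        "IRubyObject[]", "IRubyObject[]", "Object[]", "RubyObject", "IRubyObject[]"
    };
    static final int[] SHARES = {
        3378, 1259, 998, 787, 633, 382, 374, 248, 248, 248, 248, 228, 141, 134
    };

    // Total share per class, keyed in first-seen order.
    static Map<String, Integer> byClass() {
        Map<String, Integer> totals = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            totals.merge(NAMES[i], SHARES[i], Integer::sum);
        }
        return totals;
    }

    // Formats hundredths of a percent, e.g. 8378 as "83.78%".
    static String percent(int hundredths) {
        return String.format(Locale.ROOT, "%d.%02d%%", hundredths / 100, hundredths % 100);
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = byClass();
        totals.forEach((name, share) -> System.out.println(name + " " + percent(share)));
        int topThree = totals.get("RubyFixnum") + totals.get("Object[]") + totals.get("IRubyObject[]");
        System.out.println("fixnums + Object[] + IRubyObject[]: " + percent(topThree));
    }
}
```

      With the numbers above, RubyFixnum is 33.78%, Object[] is 14.87%, and IRubyObject[] is 35.13%, so the three together account for 83.78% of live bytes in the profile.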

      Some of the IRubyObject[] are due to blocks not being arity split. Some are due to dispatches going through array or varargs paths within JRuby.
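
      To make the arity-splitting point concrete, here is a minimal sketch of the difference between a generic array-based call path and an arity-split one. It is not JRuby's actual dispatch code: Value, Fix, and every method name below are hypothetical stand-ins, and the counter exists only to make the per-call array allocation visible.

```java
// Hypothetical stand-ins for a runtime's object interface and a boxed integer.
interface Value {}

final class Fix implements Value {
    final long n;
    Fix(long n) { this.n = n; }
}

final class AritySplitSketch {
    // Number of arrays allocated purely to carry arguments.
    static long argArrays = 0;

    // Boxes two arguments into a fresh array, as a generic call path must.
    static Value[] pack(Value a, Value b) {
        argArrays++;
        return new Value[] { a, b };
    }

    // Generic path: arguments arrive in an array, whatever the arity.
    static long addGeneric(Value[] args) {
        return ((Fix) args[0]).n + ((Fix) args[1]).n;
    }

    // Arity-split path: a dedicated two-argument entry point, no array needed.
    static long addTwo(Value a, Value b) {
        return ((Fix) a).n + ((Fix) b).n;
    }

    static long runGeneric(int calls) {
        Value one = new Fix(1), two = new Fix(2);
        long sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += addGeneric(pack(one, two)); // one short-lived array per call
        }
        return sum;
    }

    static long runSplit(int calls) {
        Value one = new Fix(1), two = new Fix(2);
        long sum = 0;
        for (int i = 0; i < calls; i++) {
            sum += addTwo(one, two); // arguments passed directly
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("generic sum: " + runGeneric(1000) + ", arrays: " + argArrays);
        argArrays = 0;
        System.out.println("split sum:   " + runSplit(1000) + ", arrays: " + argArrays);
    }
}
```

      Both loops compute the same result, but the generic one creates one argument array per call. The JVM may be able to eliminate some of these allocations after inlining, so treat the counter as an upper bound on what the generic path costs rather than a measurement.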

      I'm not sure whether the allocation overhead is the primary bottleneck, but it seems very likely at this point given the massive number of objects being created and thrown away.

      We should explore this benchmark in detail and see what we can improve.

        Activity

        Charles Oliver Nutter added a comment -

        Playing with this a bit today...

        I made some modifications that specialized any objects that descend from Object as their native root to have specific-sized objects rather than using an array to hold the values. The first attempt was much slower than the original, which I thought might be due to having switches in the IRubyObject.getVariable implementations in my specialized versions. Then I tried to make specialized VariableAccessor subclasses and IRubyObject getVariable methods that in the specialized classes go straight to the fields. This improved things a bit, but not much.

        Ultimately the specialized classes did not make an improvement. Although I have not dug into it, I assume this is due to having polymorphic paths for looking up variables...some going to the array, and some going to fields. So at least the simple approach to specializing object sizes does not have a large positive impact.

        This may also indicate that having monomorphic paths all the way to array offsets is pretty fast; in some cases, the specialized logic was a good 1/6 slower than the monomorphic array logic.

        Continuing to explore what might be making this slow.
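
        For readers unfamiliar with the two storage layouts being compared, here is a rough sketch. It is not the attached patch and not JRuby's real classes: VarHolder, ArrayBackedObject, and TwoFieldObject are hypothetical names, and the field-backed version uses the switch-style lookup described for the first attempt.

```java
import java.util.Arrays;

// Hypothetical common interface for reading and writing instance variables by slot.
interface VarHolder {
    Object getVariable(int index);
    void setVariable(int index, Object value);
}

// Array layout: every object carries a separate Object[] as its variable table.
final class ArrayBackedObject implements VarHolder {
    private Object[] vars = new Object[0];

    public Object getVariable(int index) {
        return index < vars.length ? vars[index] : null;
    }

    public void setVariable(int index, Object value) {
        if (index >= vars.length) {
            vars = Arrays.copyOf(vars, index + 1); // grow the table on demand
        }
        vars[index] = value;
    }
}

// Specialized layout: a fixed number of variables stored directly in fields,
// so no extra array is allocated per object.
final class TwoFieldObject implements VarHolder {
    private Object var0, var1;

    public Object getVariable(int index) {
        switch (index) {
            case 0: return var0;
            case 1: return var1;
            default: throw new IndexOutOfBoundsException("no slot " + index);
        }
    }

    public void setVariable(int index, Object value) {
        switch (index) {
            case 0: var0 = value; break;
            case 1: var1 = value; break;
            default: throw new IndexOutOfBoundsException("no slot " + index);
        }
    }
}

final class IvarStorageSketch {
    // A shared access path: once both layouts flow through here,
    // the getVariable call site sees more than one receiver type.
    static Object read(VarHolder holder, int index) {
        return holder.getVariable(index);
    }

    public static void main(String[] args) {
        VarHolder a = new ArrayBackedObject();
        VarHolder f = new TwoFieldObject();
        a.setVariable(1, "right");
        f.setVariable(1, "right");
        System.out.println(read(a, 1) + " " + read(f, 1));
    }
}
```

        The field layout saves one array per object, but mixing both layouts behind one lookup path makes that path polymorphic, which is the suspected reason the specialization did not pay off here.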

        Charles Oliver Nutter added a comment -

        Attached patch for specialized objects.

        Charles Oliver Nutter added a comment -

        JRuby 1.7 supports invokedynamic, which makes us comfortably the fastest Ruby implementation on this benchmark. We've also narrowed the gap considerably without invokedynamic, so I'm calling this fixed.


          People

          • Assignee: Charles Oliver Nutter
          • Reporter: Charles Oliver Nutter
          • Votes: 0
          • Watchers: 1

            Dates

            • Created:
              Updated:
              Resolved: