JRuby (please use github issues at http://bugs.jruby.org)
  1. JRuby (please use github issues at http://bugs.jruby.org)
  2. JRUBY-6766

--profile slows down Array#join , which distorts profiling results

    Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.7
    • Fix Version/s: JRuby 1.7.0.pre2
    • Component/s: Performance
    • Labels:
    • Environment:
      jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04) [darwin-x86_64-java]
    • Number of attachments :
      0

      Description

      When code is being profiled, Array#join becomes slower. In addition, the slowness of Array#join is recorded in the profile data, rather than treated as overhead.

      cat > array_join.rb
      array = Array.new(100_000, nil)
      array.join
      $ cat > array_to_s.rb
      array = Array.new(100_000, nil)
      array.map(&:to_s)
      $ ruby --1.9 --version
      jruby 1.6.7.2 (ruby-1.9.2-p312) (2012-05-01 26e08ba) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_04) [darwin-x86_64-java]
      $ time ruby --1.9 /dev/null
      
      real	0m1.000s
      user	0m1.471s
      sys	0m0.089s
      $ time ruby --1.9 array_join.rb
      
      real	0m1.099s
      user	0m1.834s
      sys	0m0.095s
      $ time ruby --1.9 array_to_s.rb
      
      real	0m1.173s
      user	0m1.741s
      sys	0m0.092s
      $ time ruby --1.9 --profile /dev/null 2> dev_null.txt
      
      real	0m1.316s
      user	0m2.128s
      sys	0m0.110s
      $ time ruby --1.9 --profile array_join.rb 2> array_join.txt
      
      real	0m5.769s
      user	0m7.145s
      sys	0m0.141s
      $ time ruby --1.9 --profile array_to_s.rb 2> array_to_s.txt
      
      real	0m1.430s
      user	0m2.410s
      sys	0m0.108s
      
      # array_join.txt
      main thread profile results:
      Total time: 5.19
      
           total        self    children       calls  method
      ----------------------------------------------------------------
            4.47        0.30        4.17           4  Array#join
            4.07        4.04        0.03      100002  BasicObject#method_missing
            0.69        0.10        0.59          56  Kernel#require
            0.29        0.00        0.29          15  Kernel#require
            0.08        0.06        0.02      200014  Kernel#respond_to?
            0.06        0.06        0.00           1  Time.now
            0.05        0.01        0.04          53  Object.method_added
            0.04        0.00        0.04         111  Object.method_added
            0.03        0.01        0.03           6  Array#each
            0.03        0.03        0.00      100002  NoMethodError#initialize
            0.02        0.01        0.01          12  JavaUtilities#get_proxy_or_package_under_package
            0.02        0.02        0.00      100000  NilClass#to_s
            0.02        0.01        0.01          30  Module#module_eval
            0.02        0.02        0.00      200013  Kernel#respond_to_missing?
            0.02        0.00        0.02          13  Class#new
            0.02        0.02        0.00          32  Module#initialize_copy
            0.02        0.02        0.00         184  String#=~
            0.01        0.01        0.00           1  Array#initialize
            0.01        0.00        0.01           5  Java::Java.method_missing
            0.01        0.00        0.01           1  Gem.win_platform?
            0.01        0.00        0.01           1  Enumerable#find
            0.01        0.01        0.00          16  Module#private
            0.01        0.01        0.00           1  Kernel#trap
            0.01        0.01        0.00          85  Module#alias_method
            0.01        0.00        0.00          25  Gem::Deprecate#deprecate
            0.00        0.00        0.00           1  Java::OrgJrubyLexerYacc.method_missing
            0.00        0.00        0.00          31  Module#attr_accessor
            0.00        0.00        0.00          18  JavaUtilities#get_package_module_dot_format
            0.00        0.00        0.00           1  Module#instance_methods
            0.00        0.00        0.00          14  Module#include
            0.00        0.00        0.00           2  Java::OrgJruby.method_missing
            0.00        0.00        0.00          70  Module#undef_method
            0.00        0.00        0.00          16  Kernel#java
            0.00        0.00        0.00           2  Gem::Requirement.default
            0.00        0.00        0.00           1  Java::JavaUtil.method_missing
            0.00        0.00        0.00           2  Gem::Requirement#initialize
            0.00        0.00        0.00           1  Hash#reject
            0.00        0.00        0.00          10  JavaPackageModuleTemplate.__block__
            0.00        0.00        0.00          87  BasicObject#!=
            0.00        0.00        0.00           1  Java::JavaNio.method_missing
            0.00        0.00        0.00           2  JavaInterfaceTemplate.append_features
            0.00        0.00        0.00           1  Java::Org.method_missing
            0.00        0.00        0.00           2  Array#map!
            0.00        0.00        0.00           2  Gem::Requirement.parse
            0.00        0.00        0.00          32  Module#define_method
            0.00        0.00        0.00           2  Kernel#org
            0.00        0.00        0.00           4  Kernel.method_added
            0.00        0.00        0.00           1  Enumerable#sort_by
            0.00        0.00        0.00           1  JavaUtilities#get_package_module
            0.00        0.00        0.00           2  Gem::Version#initialize
      
      # array_to_s.txt
      main thread profile results:
      Total time: 0.84
      
           total        self    children       calls  method
      ----------------------------------------------------------------
            0.70        0.11        0.59          56  Kernel#require
            0.29        0.00        0.29          15  Kernel#require
            0.12        0.08        0.03           4  Array#collect
            0.06        0.06        0.00           1  Time.now
            0.04        0.01        0.04          53  Object.method_added
            0.04        0.00        0.04         111  Object.method_added
            0.03        0.01        0.03           6  Array#each
            0.03        0.03        0.00      100000  NilClass#to_s
            0.02        0.01        0.01          12  JavaUtilities#get_proxy_or_package_under_package
            0.02        0.02        0.01          30  Module#module_eval
            0.02        0.02        0.00          32  Module#initialize_copy
            0.02        0.00        0.02          13  Class#new
            0.02        0.02        0.00         184  String#=~
            0.01        0.01        0.00           1  Array#initialize
            0.01        0.00        0.01           5  Java::Java.method_missing
            0.01        0.00        0.01           1  Gem.win_platform?
            0.01        0.00        0.01           1  Enumerable#find
            0.01        0.01        0.00          16  Module#private
            0.01        0.01        0.00           1  Kernel#trap
            0.01        0.01        0.00          85  Module#alias_method
            0.01        0.00        0.00          25  Gem::Deprecate#deprecate
            0.00        0.00        0.00           2  Java::OrgJruby.method_missing
            0.00        0.00        0.00           1  Java::OrgJrubyLexerYacc.method_missing
            0.00        0.00        0.00           1  Hash#reject
            0.00        0.00        0.00          31  Module#attr_accessor
            0.00        0.00        0.00          18  JavaUtilities#get_package_module_dot_format
            0.00        0.00        0.00          87  BasicObject#!=
            0.00        0.00        0.00           1  Module#instance_methods
            0.00        0.00        0.00          14  Module#include
            0.00        0.00        0.00          70  Module#undef_method
            0.00        0.00        0.00          16  Kernel#java
            0.00        0.00        0.00           2  Gem::Requirement.default
            0.00        0.00        0.00           2  Gem::Requirement#initialize
            0.00        0.00        0.00           1  Java::JavaUtil.method_missing
            0.00        0.00        0.00          10  JavaPackageModuleTemplate.__block__
            0.00        0.00        0.00           1  Java::JavaNio.method_missing
            0.00        0.00        0.00           1  Java::Org.method_missing
            0.00        0.00        0.00           2  JavaInterfaceTemplate.append_features
            0.00        0.00        0.00           2  Array#map!
            0.00        0.00        0.00           2  Gem::Requirement.parse
            0.00        0.00        0.00          32  Module#define_method
            0.00        0.00        0.00           1  Enumerable#sort_by
            0.00        0.00        0.00           2  Kernel#org
            0.00        0.00        0.00           4  Kernel.method_added
            0.00        0.00        0.00           1  JavaUtilities#get_package_module
            0.00        0.00        0.00           2  Gem::Version#initialize
            0.00        0.00        0.00         223  Class#inherited
            0.00        0.00        0.00           2  Comparable#==
            0.00        0.00        0.00         432  Module#method_added
            0.00        0.00        0.00           2  Gem::Requirement#<=>
      

        Activity

        Hide
        Charles Oliver Nutter added a comment -

        I'm not sure I really see the problem here...

        Array#join is substantially more compute-intensive than Array#map, since it must construct a String and append all elements to it in addition to turning them into strings. Array#map simply adds modified elements to an array, which in this case simply does the to_s call. I would expect heavier methods like join to slow down more than map under profiling.

        Not also that while Array#join does show up at the top of the profiling results, I would not expect its "self" time to be substantially higher under profiling.

        In any case, I'll poke around Array#join a bit to see if there's an explanation for the slowdown.

        Show
        Charles Oliver Nutter added a comment - I'm not sure I really see the problem here... Array#join is substantially more compute-intensive than Array#map, since it must construct a String and append all elements to it in addition to turning them into strings. Array#map simply adds modified elements to an array, which in this case simply does the to_s call. I would expect heavier methods like join to slow down more than map under profiling. Not also that while Array#join does show up at the top of the profiling results, I would not expect its "self" time to be substantially higher under profiling. In any case, I'll poke around Array#join a bit to see if there's an explanation for the slowdown.
        Hide
        Charles Oliver Nutter added a comment -

        Ok, so a big part of this overhead in #join is the fact that in 1.9 mode we do a "checked" to_str on the elements. Checked means that it first checks whether the target object implements the coercion method, and then goes ahead and invokes it. This involves additional uncached method table hits compared to 1.8 mode.

        Perf numbers comparing 1.8 and 1.9 modes, profiled and unprofiled:

        system ~/projects/jruby $ jruby --1.8 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }"
          0.450000   0.020000   0.470000 (  0.209000)
          0.380000   0.020000   0.400000 (  0.190000)
          0.240000   0.020000   0.260000 (  0.119000)
          0.160000   0.000000   0.160000 (  0.107000)
          0.190000   0.000000   0.190000 (  0.108000)
          0.240000   0.010000   0.250000 (  0.121000)
          0.250000   0.020000   0.270000 (  0.122000)
          0.240000   0.010000   0.250000 (  0.120000)
          0.230000   0.010000   0.240000 (  0.116000)
          0.230000   0.000000   0.230000 (  0.118000)
        
        system ~/projects/jruby $ jruby --1.9 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }"
          0.680000   0.030000   0.710000 (  0.316000)
          0.480000   0.010000   0.490000 (  0.212000)
          0.280000   0.020000   0.300000 (  0.164000)
          0.280000   0.010000   0.290000 (  0.164000)
          0.290000   0.010000   0.300000 (  0.168000)
          0.290000   0.010000   0.300000 (  0.165000)
          0.270000   0.000000   0.270000 (  0.165000)
          0.270000   0.010000   0.280000 (  0.164000)
          0.280000   0.010000   0.290000 (  0.162000)
          0.270000   0.000000   0.270000 (  0.163000)
        
        system ~/projects/jruby $ jruby --profile --1.8 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }"
        Profiling enabled; ^C shutdown will now dump profile info
          0.850000   0.010000   0.860000 (  0.485000)
          0.570000   0.020000   0.590000 (  0.378000)
          0.470000   0.020000   0.490000 (  0.349000)
          0.520000   0.010000   0.530000 (  0.359000)
          0.560000   0.010000   0.570000 (  0.374000)
          0.530000   0.020000   0.550000 (  0.375000)
          0.530000   0.010000   0.540000 (  0.365000)
          0.530000   0.010000   0.540000 (  0.375000)
          0.530000   0.020000   0.550000 (  0.367000)
          0.520000   0.010000   0.530000 (  0.371000)
        <profile elided>
        
        system ~/projects/jruby $ jruby --profile --1.9 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }"
        Profiling enabled; ^C shutdown will now dump profile info
         94.430000   0.620000  95.050000 ( 92.819000)
        ^C
        
        Show
        Charles Oliver Nutter added a comment - Ok, so a big part of this overhead in #join is the fact that in 1.9 mode we do a "checked" to_str on the elements. Checked means that it first checks whether the target object implements the coercion method, and then goes ahead and invokes it. This involves additional uncached method table hits compared to 1.8 mode. Perf numbers comparing 1.8 and 1.9 modes, profiled and unprofiled: system ~/projects/jruby $ jruby --1.8 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }" 0.450000 0.020000 0.470000 ( 0.209000) 0.380000 0.020000 0.400000 ( 0.190000) 0.240000 0.020000 0.260000 ( 0.119000) 0.160000 0.000000 0.160000 ( 0.107000) 0.190000 0.000000 0.190000 ( 0.108000) 0.240000 0.010000 0.250000 ( 0.121000) 0.250000 0.020000 0.270000 ( 0.122000) 0.240000 0.010000 0.250000 ( 0.120000) 0.230000 0.010000 0.240000 ( 0.116000) 0.230000 0.000000 0.230000 ( 0.118000) system ~/projects/jruby $ jruby --1.9 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }" 0.680000 0.030000 0.710000 ( 0.316000) 0.480000 0.010000 0.490000 ( 0.212000) 0.280000 0.020000 0.300000 ( 0.164000) 0.280000 0.010000 0.290000 ( 0.164000) 0.290000 0.010000 0.300000 ( 0.168000) 0.290000 0.010000 0.300000 ( 0.165000) 0.270000 0.000000 0.270000 ( 0.165000) 0.270000 0.010000 0.280000 ( 0.164000) 0.280000 0.010000 0.290000 ( 0.162000) 0.270000 0.000000 0.270000 ( 0.163000) system ~/projects/jruby $ jruby --profile --1.8 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }" Profiling enabled; ^C shutdown will now dump profile info 0.850000 0.010000 0.860000 ( 0.485000) 0.570000 0.020000 0.590000 ( 0.378000) 0.470000 0.020000 0.490000 ( 0.349000) 0.520000 0.010000 0.530000 ( 0.359000) 0.560000 0.010000 0.570000 ( 0.374000) 0.530000 0.020000 0.550000 ( 0.375000) 0.530000 0.010000 0.540000 ( 0.365000) 0.530000 0.010000 0.540000 ( 0.375000) 0.530000 0.020000 0.550000 ( 0.367000) 0.520000 0.010000 0.530000 ( 0.371000) <profile elided> system ~/projects/jruby $ jruby --profile --1.9 -rbenchmark -e "10.times { ary = Array.new(1_000_000, nil); puts Benchmark.measure { ary.join } }" Profiling enabled; ^C shutdown will now dump profile info 94.430000 0.620000 95.050000 ( 92.819000) ^C
        Hide
        Charles Oliver Nutter added a comment -

        Actually, I was wrong; this logic does not go through the checkedCallMethod path, so I'm not yet certain why 1.9 mode is so much slower under profiling.

        Show
        Charles Oliver Nutter added a comment - Actually, I was wrong; this logic does not go through the checkedCallMethod path, so I'm not yet certain why 1.9 mode is so much slower under profiling.
        Hide
        Charles Oliver Nutter added a comment -

        So...I profiled the profiler (chuckle) and saw something very interesting. It appears in 1.9 mode we're generating a tremendous amount of backtraces in this scenario:

                Stub + native   Method                        
         66.5%     0  +  3174    java.lang.Throwable.getStackTraceElement
          8.0%     0  +   383    java.lang.Throwable.fillInStackTrace
          4.8%     0  +   231    java.lang.System.identityHashCode
          0.8%     0  +    39    java.lang.System.arraycopy
          0.4%     0  +    19    java.lang.System.nanoTime
          0.3%     0  +    14    java.lang.Throwable.getStackTraceDepth
          0.3%     0  +    12    java.lang.Thread.currentThread
          0.1%     0  +     4    sun.misc.Unsafe.compareAndSwapInt
          0.0%     0  +     2    java.lang.String.intern
          0.0%     0  +     2    java.util.zip.Inflater.inflateBytes
          0.0%     0  +     1    sun.misc.Unsafe.getInt
         81.3%     0  +  3881    Total stub
        

        Going to investigate. I'm 99% sure this is where the slowdown comes from.

        Show
        Charles Oliver Nutter added a comment - So...I profiled the profiler ( chuckle ) and saw something very interesting. It appears in 1.9 mode we're generating a tremendous amount of backtraces in this scenario: Stub + native Method 66.5% 0 + 3174 java.lang.Throwable.getStackTraceElement 8.0% 0 + 383 java.lang.Throwable.fillInStackTrace 4.8% 0 + 231 java.lang.System.identityHashCode 0.8% 0 + 39 java.lang.System.arraycopy 0.4% 0 + 19 java.lang.System.nanoTime 0.3% 0 + 14 java.lang.Throwable.getStackTraceDepth 0.3% 0 + 12 java.lang.Thread.currentThread 0.1% 0 + 4 sun.misc.Unsafe.compareAndSwapInt 0.0% 0 + 2 java.lang.String.intern 0.0% 0 + 2 java.util.zip.Inflater.inflateBytes 0.0% 0 + 1 sun.misc.Unsafe.getInt 81.3% 0 + 3881 Total stub Going to investigate. I'm 99% sure this is where the slowdown comes from.
        Hide
        Charles Oliver Nutter added a comment -
        commit 846be19993cc1f76d0bf0d1c9ec4ed8ad0772793
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Tue Jul 24 16:56:16 2012 -0500
        
            Fix JRUBY-6766
            
            --profile slows down Array#join , which distorts profiling results
            
            This turned out to be a really good find.
            
            In order to avoid the cost of dispatching to methods like
            respond_to? and method_missing for value coercion, we pre-cache
            those methods in org.jruby.Ruby and use identity checks to
            shortcut faster, non-dynamic logic. This works well during normal
            execution, but under profiling all method objects are wrapped
            with ProfilingDynamicMethod objects that do the profile gathering.
            Because the identity of the original method then becomes obscured
            by the wrapper (which is created on a per-class basis) we always
            followed the slow path. The slow path in the case of
            method_missing raises an exception, and the exception backtraces
            were the source of profiling slowdown.
            
            I modified DelegatingDynamicMethod, the superclass of
            ProfilingDynamicMethod, to override equals() to consider only the
            wrapped method and fixed all related == comparisons to use
            equals(), ensuring that regardless of which wrapper was
            encountered the identity of the method would not be obscured. This
            returned profiled performance for e.g. Array#join to normal
            levels.
            
            I should note that this fix improves profiling performance for all
            coercion paths and methods that do coercion, so it has far-
            reaching implications.
            
            Thanks for reporting this :)
        
        :100644 100644 9bf8d7b... 5cb2feb... M	src/org/jruby/BasicObjectStub.java
        :100644 100644 9ad7c1c... d7919d8... M	src/org/jruby/Ruby.java
        :100644 100644 3d22552... 5792fa8... M	src/org/jruby/RubyBasicObject.java
        :100644 100644 21c5f7b... 1e4e5dc... M	src/org/jruby/RubyClass.java
        :100644 100644 8768ed4... 28b1768... M	src/org/jruby/RubyKernel.java
        :100644 100644 eb68310... c6ff2f2... M	src/org/jruby/internal/runtime/methods/DelegatingDynamicMethod.java
        :100644 100644 4594e1c... e63ffda... M	src/org/jruby/javasupport/util/RuntimeHelpers.java
        :100644 100644 5fee035... cb2b168... M	src/org/jruby/runtime/callsite/RespondToCallSite.java
        
        Show
        Charles Oliver Nutter added a comment - commit 846be19993cc1f76d0bf0d1c9ec4ed8ad0772793 Author: Charles Oliver Nutter <headius@headius.com> Date: Tue Jul 24 16:56:16 2012 -0500 Fix JRUBY-6766 --profile slows down Array#join , which distorts profiling results This turned out to be a really good find. In order to avoid the cost of dispatching to methods like respond_to? and method_missing for value coercion, we pre-cache those methods in org.jruby.Ruby and use identity checks to shortcut faster, non-dynamic logic. This works well during normal execution, but under profiling all method objects are wrapped with ProfilingDynamicMethod objects that do the profile gathering. Because the identity of the original method then becomes obscured by the wrapper (which is created on a per-class basis) we always followed the slow path. The slow path in the case of method_missing raises an exception, and the exception backtraces were the source of profiling slowdown. I modified DelegatingDynamicMethod, the superclass of ProfilingDynamicMethod, to override equals() to consider only the wrapped method and fixed all related == comparisons to use equals(), ensuring that regardless of which wrapper was encountered the identity of the method would not be obscured. This returned profiled performance for e.g. Array#join to normal levels. I should note that this fix improves profiling performance for all coercion paths and methods that do coercion, so it has far- reaching implications. Thanks for reporting this :) :100644 100644 9bf8d7b... 5cb2feb... M src/org/jruby/BasicObjectStub.java :100644 100644 9ad7c1c... d7919d8... M src/org/jruby/Ruby.java :100644 100644 3d22552... 5792fa8... M src/org/jruby/RubyBasicObject.java :100644 100644 21c5f7b... 1e4e5dc... M src/org/jruby/RubyClass.java :100644 100644 8768ed4... 28b1768... M src/org/jruby/RubyKernel.java :100644 100644 eb68310... c6ff2f2... M src/org/jruby/internal/runtime/methods/DelegatingDynamicMethod.java :100644 100644 4594e1c... e63ffda... M src/org/jruby/javasupport/util/RuntimeHelpers.java :100644 100644 5fee035... cb2b168... M src/org/jruby/runtime/callsite/RespondToCallSite.java

          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Andrew Grimm
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: