Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.7.0.pre1
    • Fix Version/s: JRuby 1.7.0.pre2
    • Component/s: Core Classes/Modules
    • Labels:
      None
    • Environment:
      Java 1.7.0_05-b05
    • Number of attachments :
      0

      Description

      Reading a larger number of files (27K in our case) is about 10 times slower with JRuby trunk compared to 1.6.x. Both interpreters were in 1.9 mode. Running the same benchmark in 1.8 mode reduces the difference significantly, or even removes it.

      script

      require 'json'
      start = Time.now
      entries = Dir.glob("../var/jobs/*/*/*/*/*/request")
      puts "Number of entries: #{entries.length}"
      File.open("urls", "w") do |fh|
        entries.each do |file|
          content = File.read(file)
          # json = JSON.load(content)
          # url = json['uri']
          # raise if url =~ /\n/
          # fh.puts url
        end
      end
      
      puts Time.now - start
      

      benchmark

      vjruby -v --1.9 extract.rb; jruby -v --1.9 extract.rb
      jruby 1.7.0.preview2.dev (ruby-1.9.3-p203) (2012-07-26 3ae8885) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
      Number of entries: 27077
      29.691
      jruby 1.6.4 (ruby-1.9.2-p136) (2011-08-23 17ea768) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
      Number of entries: 27077
      3.417
      

      sample

      vjruby --sample -v --1.9 extract.rb;
      jruby 1.7.0.preview2.dev (ruby-1.9.3-p203) (2012-07-26 3ae8885) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
      Number of entries: 27077
      28.908
      
      Flat profile of 30,25 secs (2711 total ticks): main
      
        Interpreted + native Method
       13,3% 361 + 0 sun.nio.cs.ext.ExtendedCharsets.<init>
        0,6% 0 + 17 sun.nio.ch.FileDispatcherImpl.preClose0
        0,6% 16 + 0 org.jruby.Ruby.getClassLoader
        0,4% 0 + 12 java.io.RandomAccessFile.open
        0,4% 0 + 12 java.lang.Throwable.fillInStackTrace
        0,4% 0 + 10 java.lang.Class.forName0
        0,4% 10 + 0 sun.net.www.MessageHeader.grow
        0,3% 0 + 9 java.io.UnixFileSystem.getBooleanAttributes0
        0,3% 8 + 1 com.kenai.jffi.Foreign.invokeN2O2
        0,3% 7 + 0 java.util.Arrays.copyOfRange
        0,3% 7 + 0 java.util.regex.Matcher.<init>
        0,2% 6 + 0 java.nio.HeapCharBuffer.<init>
        0,2% 6 + 0 jnr.ffi.provider.AbstractArrayMemoryIO.<init>
        0,2% 0 + 6 sun.misc.Unsafe.getObject
        0,2% 6 + 0 org.jruby.internal.runtime.methods.DynamicMethod.call
        0,2% 1 + 4 java.security.AccessController.doPrivileged
        0,2% 5 + 0 org.jruby.javasupport.util.RuntimeHelpers.restructureBlockArgs19
        0,2% 5 + 0 org.jruby.util.Dir.glob_helper
        0,2% 2 + 3 java.io.UnixFileSystem.list
        0,2% 5 + 0 java.nio.Bits.reserveMemory
        0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.size0
        0,1% 4 + 0 org.jruby.RubyHash.alloc
        0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.read0
        0,1% 4 + 0 org.jruby.RubyIO.read19
        0,1% 4 + 0 org.jruby.util.ByteList.<init>
       25,0% 552 + 125 Total interpreted (including elided)
      
           Compiled + native Method
       24,0% 645 + 6 java.util.TreeMap.put
       23,4% 635 + 0 sun.nio.cs.AbstractCharsetProvider.charset
        0,5% 0 + 13 sun.nio.cs.StreamDecoder.implRead
        0,3% 0 + 8 org.jruby.util.io.ChannelStream.readall
        0,2% 0 + 5 java.lang.ClassLoader.getSystemClassLoader
        0,2% 0 + 5 java.lang.Object.<init>
        0,1% 0 + 4 org.jruby.util.io.ChannelStream.fclose
        0,1% 4 + 0 sun.nio.cs.ext.ExtendedCharsets.<init>
        0,1% 0 + 3 java.util.zip.ZipFile$ZipFileInputStream.close
        0,1% 0 + 2 org.jruby.RubyFile$INVOKER$i$0$2$initialize19.call
        0,0% 0 + 1 java.net.URL.openConnection
        0,0% 0 + 1 java.util.jar.JarFile.getEntry
        0,0% 1 + 0 java.util.HashMap.put
       49,2% 1285 + 48 Total compiled
      
               Stub + native Method
        9,4% 20 + 235 java.security.AccessController.doPrivileged
        5,7% 13 + 142 java.io.UnixFileSystem.getBooleanAttributes0
        3,9% 0 + 105 java.util.zip.ZipFile.getEntry
        0,7% 0 + 20 java.lang.Throwable.fillInStackTrace
        0,7% 0 + 19 java.lang.Class.forName0
        0,6% 10 + 5 java.security.AccessController.doPrivileged
        0,6% 0 + 15 java.lang.System.arraycopy
        0,5% 0 + 14 java.io.RandomAccessFile.open
        0,5% 0 + 13 java.util.zip.ZipFile.read
        0,4% 0 + 10 java.lang.Thread.currentThread
        0,4% 0 + 10 com.kenai.jffi.Foreign.invokeN2O2
        0,3% 0 + 9 sun.nio.ch.FileDispatcherImpl.preClose0
        0,2% 0 + 6 java.lang.Class.getClassLoader0
        0,2% 0 + 6 sun.misc.Unsafe.getObjectVolatile
        0,2% 0 + 6 sun.misc.Unsafe.setMemory
        0,2% 0 + 5 java.lang.Object.hashCode
        0,1% 0 + 4 sun.nio.ch.FileDispatcherImpl.read0
        0,1% 0 + 3 java.io.RandomAccessFile.close0
        0,1% 0 + 2 java.util.zip.ZipFile.freeEntry
        0,1% 0 + 2 sun.nio.ch.FileDispatcherImpl.size0
        0,1% 0 + 2 java.lang.String.intern
        0,1% 0 + 2 java.util.zip.Inflater.inflateBytes
        0,1% 0 + 2 java.util.zip.ZipFile.getEntryCSize
        0,0% 0 + 1 java.util.zip.ZipFile.getEntryTime
        0,0% 0 + 1 java.util.zip.ZipFile.getEntryCrc
       25,3% 43 + 644 Total stub (including elided)
      
        Thread-local ticks:
        0,5% 14 Class loader
      
      
      Flat profile of 0,00 secs (1 total ticks): DestroyJavaVM
      
        Thread-local ticks:
      100,0% 1 Blocked (of total)
      
      
      Global summary of 30,25 seconds:
      100,0% 2733 Received ticks
        0,7% 19 Received GC ticks
       14,5% 395 Compilation
        0,0% 1 Other VM operations
        0,5% 14 Class loader
      

      adhoc irc analysis

      18:11 < enebo> reto_, headius: This is likely because we have no caching CharsetTranscoder atm
      18:11 < reto_> okay, forget 1.8 mode completly, thats just startup
      18:12 < headius> so we could cache based on to/from in the future, I get that
      18:12 < headius> but I'm confused why we can't trust jcodings Encoding.getCharset first…are they reporting wrong Charset?
      18:12 < enebo> Many common IO cases we can make a fully caching CharsetTranscoder but I did not go there yet since I suspect transcoding fixes are not complete yet
      18:13 < headius> I see your comment about aliases, but if jcodings Encoding.getCharset is giving back a wrong charset, that's a bug
      18:13 < headius> it *should* be the proper Charset at least
      18:13 < enebo> headius: I don't remember exact details (that fix was from a long time ago) but it was definitely a problem with what jcodings had versus what we could get with forName
      18:13 < headius> hmm
      18:13 < enebo> Oh I remember
      18:14 < headius> I suspect that Charset.forName is a lot more expensive than we realize, based on profiles
      18:14 < enebo> ascii-8bit ends up as an alias for lots of ascii subsets
      18:14 < headius> I remember at one point in the past it was actually a synchronization bottleneck
      18:14 < enebo> It only has ONE charset associated with it
      18:14 < enebo> All aliases actually have their own charset
      18:14 < headius> ahhh sure
      18:14 < enebo> So the solution would be to have no aliases in jcodings
      18:14 < enebo> Or a map to charsets
      18:14 < enebo> for aliases
      18:15 < headius> hmmm yeah
      18:15 < enebo> big change to jcodings
      18:15 < enebo> but I think this problem is somewhat mitigated by specialized charsettranscoder
      18:15 < headius> yeah, I think it would be
      18:15 < enebo> Assuming this is not a tight loop doing n million new io and one op on it
      18:15 < headius> one that has Charset fields for to and from, and then we just save the CharsetTranscoder instances
      18:15 < enebo> Even then I suspect the charset lookup won't be the big cost
      18:16 < headius> it doesn't mutate
      18:16 < enebo> headius: yeah we just need one subclass when we know both to and from
      18:16 < enebo> In that case it will only happen once at construction of transcoder
      18:16 < headius> I don't think that would be a difficult addition
      18:16 < enebo> yeah it is simple
      18:16 < headius> subclass?
      18:16 < enebo> subclas of CharsetTranscoder
      18:16 < headius> I figured we'd have a cache in Ruby somewhere
      18:17 < headius> if not there, create a transcoder and save it
      18:17 < headius> it just seems simplest to me
      18:17 < enebo> This is just because we don't want it for n io ops on once io object?
      18:17 < headius> right
      18:18 < enebo> saving it from all io creation seems like way more complexity
      18:18 < enebo> Since you now have to manage the cache
      18:18 < headius> manage schmanage
      18:18 < enebo> and you need to key off to,from,codingactions
      18:18 < enebo> I guess charset itself could be cached simply too
      18:18 < headius> we could also just try a static forName cache first
      18:18 < enebo> but heh
      18:18 < headius> it could be static because we'd cache string => charset…all bootclasses
      

        Activity

        Hide
        Charles Oliver Nutter added a comment -

        This commit will probably resolve the bulk of your perf issues. There's more tuning needed in our transcoding subsystem, but we'll call this particular issue resolved.

        commit a00d2a223def2da87d105252b857c953de69a2e0
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Tue Jul 31 12:28:01 2012 -0500
        
            Fix JRUBY-6802 (and probably JRUBY-6803)
            
            JSON generation for largish data structures v. slow under 1.7 (JDK 1.6)
            Reading thousands of files significantly slower with JRuby 1.7
            
            This isn't a complete fix, but it solves the most egregious perf
            issue.
            
            The transcoding logic in JRuby currently does Charset.forName()
            lookups repeatedly to get the proper Java Charset for a given
            jcodings Encoding. Unfortunately, the "ASCII-8BIT" name used by
            Ruby (and jcodings) does not map to a Charset, resulting in
            every transcoding operation triggering an exception from forName.
            Short-circuiting this logic explicitly for ASCII-8BIT avoids that
            issue.
            
            We're aware that there are other encodings that will also trigger
            exceptions, and that we're repeatedly looking up the same Charsets
            over and over. We will fix that in separate commits. Because the
            bulk of the performance issues in these two bugs are fixed by this
            patch, I'm going to call them resolved.
        
        :100644 100644 6183077... 5d626b2... M	src/org/jruby/util/CharsetTranscoder.java
        
        Show
        Charles Oliver Nutter added a comment - This commit will probably resolve the bulk of your perf issues. There's more tuning needed in our transcoding subsystem, but we'll call this particular issue resolved. commit a00d2a223def2da87d105252b857c953de69a2e0 Author: Charles Oliver Nutter <headius@headius.com> Date: Tue Jul 31 12:28:01 2012 -0500 Fix JRUBY-6802 (and probably JRUBY-6803) JSON generation for largish data structures v. slow under 1.7 (JDK 1.6) Reading thousands of files significantly slower with JRuby 1.7 This isn't a complete fix, but it solves the most egregious perf issue. The transcoding logic in JRuby currently does Charset.forName() lookups repeatedly to get the proper Java Charset for a given jcodings Encoding. Unfortunately, the "ASCII-8BIT" name used by Ruby (and jcodings) does not map to a Charset, resulting in every transcoding operation triggering an exception from forName. Short-circuiting this logic explicitly for ASCII-8BIT avoids that issue. We're aware that there are other encodings that will also trigger exceptions, and that we're repeatedly looking up the same Charsets over and over. We will fix that in separate commits. Because the bulk of the performance issues in these two bugs are fixed by this patch, I'm going to call them resolved. :100644 100644 6183077... 5d626b2... M src/org/jruby/util/CharsetTranscoder.java

          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Reto Schttel
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: