jira.codehaus.org

  • Log In Access more options
    • Online Help
    • Keyboard Shortcuts
    • About JIRA
    • JIRA Credits
    • What?s New
  • Dashboards Access more options (Alt+d)
  • Projects Access more options (Alt+p)
  • Issues Access more options (Alt+i)
  • JRuby
  • JRUBY-335

Slow startup caused by over-complicated loading heuristic, ClassLoader.getResource being slow

  • Log In
  • Views
    • XML
    • Word
    • Printable

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Major Major
  • Resolution: Fixed
  • Affects Version/s: JRuby 0.9.1
  • Fix Version/s: JRuby 0.9.8
  • Component/s: Java Integration
  • Labels:
    None
  • Environment:
    Windows, Rails 1.1.6, Java 1.3-1.5

Description

When I start up a Rails app under JRuby with a large classpath, startup takes over an hour on a 2Ghz P4. Start-up is ten seconds or less with only the large classpath, or only Rails.

Large classpath = 100+ jars and class folders.

The immediate culprit seems to be Java's ClassLoader.getResource(path). Typically, this call takes less than a millisecond, but if 'path' starts with a drive letter, the call takes at least 700 ms, and seems to get slower over time. This seems to happen on many current versions of Java.

Paths that start with 'C:' are slow; paths that start with '/C:' are fast.

Dirty workaround: patch JRuby's LoadService to not on the classpath look for resources start with '[a-z]:'

The JRuby resource loading code appears over-complicated to me.

  • Options
    • Sort By Name
    • Sort By Date
    • Ascending
    • Descending
    • Download All

Attachments

  1. Text File
    loadservice_cleanup_optz.patch
    22/Dec/06 1:04 AM
    10 kB
    Charles Oliver Nutter
  2. Text File
    windows.patch
    27/Dec/06 3:47 PM
    2 kB
    Scott Willson

Activity

Ascending order - Click to sort in descending order
  • All
  • Comments
  • Work Log
  • History
  • Activity
Hide
Permalink
Charles Oliver Nutter added a comment - 08/Dec/06 5:50 AM

Updating title to be more generic. We need a general refactoring of the loading heuristic so that commonly-accessed resources on cheap load locations (like filesystem) are fast-pathed before slow load paths like classloader resources or URL/JAR loads.

A couple ideas:

  • search filesystem locations before jar locations for all possible filename + extensions. Currently, it searches all load mechanisms for each filename + extension in turn
  • confirm the proper order of paths to look at for resources (site_ruby, ruby/1.8, etc) and make sure we're not searching more than is necessary
  • require that for classloader/jar resources, a load path entry be created for such resources. This would eliminate classloader/jar searching for most lookups and allow users to explicitly specify that classloader/jar should be searched (rather than implicitly doing it as now)
Show
Charles Oliver Nutter added a comment - 08/Dec/06 5:50 AM Updating title to be more generic. We need a general refactoring of the loading heuristic so that commonly-accessed resources on cheap load locations (like filesystem) are fast-pathed before slow load paths like classloader resources or URL/JAR loads. A couple ideas:
  • search filesystem locations before jar locations for all possible filename + extensions. Currently, it searches all load mechanisms for each filename + extension in turn
  • confirm the proper order of paths to look at for resources (site_ruby, ruby/1.8, etc) and make sure we're not searching more than is necessary
  • require that for classloader/jar resources, a load path entry be created for such resources. This would eliminate classloader/jar searching for most lookups and allow users to explicitly specify that classloader/jar should be searched (rather than implicitly doing it as now)
Hide
Permalink
Charles Oliver Nutter added a comment - 22/Dec/06 12:59 AM
Documentation on LoadService behavior pre-patch and post-patch, in an attempt to simplify and speed loading. The two issues addressed are JRUBY-335 and JRUBY-288. I will wiki this and post to the bugs, but it is posted here for discussion.

The altered version is simpler, more correct, and faster in some cases (rails, specifically). It's a pretty tangled web regardless.

0.9.2 behavior:

Kernel.load calls "findLibrary" with an exact filename only.
Kernel.require calls "findLibrary" with the specified filename appended with four possible extensions in turn:
1. rb
2. ast.ser
3. rb.ast.ser
4. jar
require also tries the filename as-is, which is incorrect behavior.

Default load path includes:
1. current dir (implicit, during searches)
2. the <JRuby lib> dir (as specified by jruby.lib system property)
3. <JRuby home>/lib/ruby/site_ruby/1.8 (note this and subsequent paths are using jruby.home property, not jruby.lib property)
4. <JRuby home>/lib/ruby/site_ruby/1.8/java
5. <JRuby home>/lib/ruby/site_ruby
6. <JRuby home>/lib/ruby/1.8
7. <JRuby home>/lib/ruby/site_ruby/1.8/java
8. <JRuby home>/lib/ruby
9. '.'

findLibrary:
1. checks builtin libraries first for supplied filename
2. calls "findFile" with given filename, returning null if result is null
3. if file (resource) is found:
3.1. if file ends with ".jar", returns a JarredScript with specified resource
3.2. if file ends with ".rb.ast.ser", returns a PreparsedScript with specified resource
3.3. otherwise returns ExternalScript with resource and supplied filename

findFile:
1. if file starts with "jar:" returns LoadServiceResource using supplied filename directly as a URL
2. if file is found in current dir and resulting current-dir + filename is absolute, returns LoadServiceResource with current-dir + filename as URL
3. retrieves a classloader from the JavaSupport object for this runtime (which ends up using the classloader in which that JavaSupport was loaded, usually System classloader)
4. replaces \\ with / in filename
5. if filename starts with /, attempts to load from classloader resources (this is incorrect; classloader resources *can not* start with a '/'.
6. Iterates over all entries in the load path:
6.1. If load path entry starts with "jar:" attempts to load from an associated JarFile, caching JarFile objects as they are encountered and instantiated.
6.2. Attempts to load file from the absolute path of (current dir + load path entry) (seems wrong to me, and creates two JRubyFile objects every time)
6.3. Attempts to load file from load path entry with '\\' substituted with '/' (also seems quite wrong, since most files' load paths will be absolute paths for ruby/1.8 and similar dirs, which would never work in a classloader)
7. attempts to load the filename directly from the classloader (again) regardless of whether it starts with / or not.
8. Failing all of the above, returns null, usually resulting in findLibrary returning nil. For require, this will then advance to the next extension

Problems with 0.9.2 behavior:

- require should never attempt to load the supplied name without appending extensions. With a file 'foo' in the current dir, require 'foo' should still fail, only accepting foo with some extension.
- For each extension, all possible locations are checked first, many involving classloader resources. This means that resources on the filesystem may take longer to lookup since they may come after many classloader hits.
- We have no files with extension ".ast.ser" that aren't ".rb.ast.ser", and I know of no reason why we ever would.
- Classloader resources are checked for all supplied filenames that start with '/', even though this is not valid for classloader resources.
- Classloader resources are (possibly again) checked for each load path, even though most load paths will be absolute paths to ruby/1.8 and related dirs.
- Classloader resources are checked a third time if nothing has been found using the same bare filename. If the file starts with '/' and nothing was found for the first classloader resource check above, this results in a repeat of the same search.
- We do not have a site_ruby/1.8/java dir. I'm not sure we ever will need it.
- We should not search ".", since changing the current dir (Dir.chdir) should cause files in our initial PWD to no longer be implicitly locatable (and "." will always resolve to the JVM's startup PWD)

New behavior (provided by supplied patch):

- Kernel.load behavior is unchanged, other than incidental changes in classloader resource lookup as a result of calling findLibrary and findLibraryWithClassloaders (documented below)
- Kernel.require behavior excludes as-is filenames from search
- the ast.ser extension is eliminated
- the site_ruby/1.8/java load path entry is removed
- the '.' load path entry is removed

The "smartLoad" method used for searching all expected extensions is changed as follows:

- it performs one search of all extensions against only filesystem-based locations
- it performs a second search of all extensions using classloader resource lookups or using direct URL-based resources (i.e. jar: urls)

Both searches:

- eliminate substituting '\\' for '/' in filenames (JRubyFile and Java's own logic should handle this ok, but this should be tested on Windows)

The filesystem-based search:

1. checks for the file in the current directory
2. checks for the file using each load path entry in turn

The classloader and URL-based search:

1. checks for jar: URLs and loads if found
2. retrieves Thread's context classloader (rather than always using the classloader JRuby was loaded in)
3. For each entry in the load path:
3.1. if entry is a jar: URL, uses that to load and cache JarFile objects that are then used to load the file if possible
3.2. if entry does not start with /, attempts to load from classloader resources
4. if filename does not start with /, attempts to load from classloader resources
5. if all else fails, returns null, usually resulting in the smartLoad search proceeding to the next extension

Justification for changes:

The changes:

- simplify the loading process
- eliminate extraneous paths that can not or should not be searched
- eliminate repetitive classloader resource searching
- eliminate searching of classloaders or jar loading until all potential filesystem locations have been exhausted
- appear to speed a simple "jirb" load by a very small amount
- improves the load time of a rails script from 8.7 seconds to 7.7 seconds (time jruby script/console --help)
- improves the load time of the same script with a very large classpath from 9 seconds to 7.8 seconds
Show
Charles Oliver Nutter added a comment - 22/Dec/06 12:59 AM
Documentation on LoadService behavior pre-patch and post-patch, in an attempt to simplify and speed loading. The two issues addressed are JRUBY-335 and JRUBY-288. I will wiki this and post to the bugs, but it is posted here for discussion.

The altered version is simpler, more correct, and faster in some cases (rails, specifically). It's a pretty tangled web regardless.

0.9.2 behavior:

Kernel.load calls "findLibrary" with an exact filename only.
Kernel.require calls "findLibrary" with the specified filename appended with four possible extensions in turn:
1. rb
2. ast.ser
3. rb.ast.ser
4. jar
require also tries the filename as-is, which is incorrect behavior.

Default load path includes:
1. current dir (implicit, during searches)
2. the <JRuby lib> dir (as specified by jruby.lib system property)
3. <JRuby home>/lib/ruby/site_ruby/1.8 (note this and subsequent paths are using jruby.home property, not jruby.lib property)
4. <JRuby home>/lib/ruby/site_ruby/1.8/java
5. <JRuby home>/lib/ruby/site_ruby
6. <JRuby home>/lib/ruby/1.8
7. <JRuby home>/lib/ruby/site_ruby/1.8/java
8. <JRuby home>/lib/ruby
9. '.'

findLibrary:
1. checks builtin libraries first for supplied filename
2. calls "findFile" with given filename, returning null if result is null
3. if file (resource) is found:
3.1. if file ends with ".jar", returns a JarredScript with specified resource
3.2. if file ends with ".rb.ast.ser", returns a PreparsedScript with specified resource
3.3. otherwise returns ExternalScript with resource and supplied filename

findFile:
1. if file starts with "jar:" returns LoadServiceResource using supplied filename directly as a URL
2. if file is found in current dir and resulting current-dir + filename is absolute, returns LoadServiceResource with current-dir + filename as URL
3. retrieves a classloader from the JavaSupport object for this runtime (which ends up using the classloader in which that JavaSupport was loaded, usually System classloader)
4. replaces \\ with / in filename
5. if filename starts with /, attempts to load from classloader resources (this is incorrect; classloader resources *can not* start with a '/'.
6. Iterates over all entries in the load path:
6.1. If load path entry starts with "jar:" attempts to load from an associated JarFile, caching JarFile objects as they are encountered and instantiated.
6.2. Attempts to load file from the absolute path of (current dir + load path entry) (seems wrong to me, and creates two JRubyFile objects every time)
6.3. Attempts to load file from load path entry with '\\' substituted with '/' (also seems quite wrong, since most files' load paths will be absolute paths for ruby/1.8 and similar dirs, which would never work in a classloader)
7. attempts to load the filename directly from the classloader (again) regardless of whether it starts with / or not.
8. Failing all of the above, returns null, usually resulting in findLibrary returning nil. For require, this will then advance to the next extension

Problems with 0.9.2 behavior:

- require should never attempt to load the supplied name without appending extensions. With a file 'foo' in the current dir, require 'foo' should still fail, only accepting foo with some extension.
- For each extension, all possible locations are checked first, many involving classloader resources. This means that resources on the filesystem may take longer to lookup since they may come after many classloader hits.
- We have no files with extension ".ast.ser" that aren't ".rb.ast.ser", and I know of no reason why we ever would.
- Classloader resources are checked for all supplied filenames that start with '/', even though this is not valid for classloader resources.
- Classloader resources are (possibly again) checked for each load path, even though most load paths will be absolute paths to ruby/1.8 and related dirs.
- Classloader resources are checked a third time if nothing has been found using the same bare filename. If the file starts with '/' and nothing was found for the first classloader resource check above, this results in a repeat of the same search.
- We do not have a site_ruby/1.8/java dir. I'm not sure we ever will need it.
- We should not search ".", since changing the current dir (Dir.chdir) should cause files in our initial PWD to no longer be implicitly locatable (and "." will always resolve to the JVM's startup PWD)

New behavior (provided by supplied patch):

- Kernel.load behavior is unchanged, other than incidental changes in classloader resource lookup as a result of calling findLibrary and findLibraryWithClassloaders (documented below)
- Kernel.require behavior excludes as-is filenames from search
- the ast.ser extension is eliminated
- the site_ruby/1.8/java load path entry is removed
- the '.' load path entry is removed

The "smartLoad" method used for searching all expected extensions is changed as follows:

- it performs one search of all extensions against only filesystem-based locations
- it performs a second search of all extensions using classloader resource lookups or using direct URL-based resources (i.e. jar: urls)

Both searches:

- eliminate substituting '\\' for '/' in filenames (JRubyFile and Java's own logic should handle this ok, but this should be tested on Windows)

The filesystem-based search:

1. checks for the file in the current directory
2. checks for the file using each load path entry in turn

The classloader and URL-based search:

1. checks for jar: URLs and loads if found
2. retrieves Thread's context classloader (rather than always using the classloader JRuby was loaded in)
3. For each entry in the load path:
3.1. if entry is a jar: URL, uses that to load and cache JarFile objects that are then used to load the file if possible
3.2. if entry does not start with /, attempts to load from classloader resources
4. if filename does not start with /, attempts to load from classloader resources
5. if all else fails, returns null, usually resulting in the smartLoad search proceeding to the next extension

Justification for changes:

The changes:

- simplify the loading process
- eliminate extraneous paths that can not or should not be searched
- eliminate repetitive classloader resource searching
- eliminate searching of classloaders or jar loading until all potential filesystem locations have been exhausted
- appear to speed a simple "jirb" load by a very small amount
- improves the load time of a rails script from 8.7 seconds to 7.7 seconds (time jruby script/console --help)
- improves the load time of the same script with a very large classpath from 9 seconds to 7.8 seconds
Hide
Permalink
Charles Oliver Nutter added a comment - 22/Dec/06 1:04 AM

Scott, if you could try this patch it would expediate its being included into trunk. On my system with about twenty jars in the classpath it improved load time for a simple rails script (--help) from 9 seconds down to 7.8. The same load without so many jars improves from 8.7 to 7.7. The primary feature of this patch is that it forces filesystem locations to be searched first, so it may help your situation. However, it's possible that with gems installed classloader resources will be searched first anyway since a require must fail for a gem to automatically start up.

A second suggestion for you to try would be to "rake rails:freeze:gems" in your Rails app, which will pull all gem contents into the vendor/rails subdir of the app, potentially allowing those resources to be loaded sooner in the search sequence.

An alternative solution for these issues has also been debated: only turn on classloader resource loading for specific load paths with a particular prefix, so that classloader is not searched most of the time.

Show
Charles Oliver Nutter added a comment - 22/Dec/06 1:04 AM Scott, if you could try this patch it would expediate its being included into trunk. On my system with about twenty jars in the classpath it improved load time for a simple rails script (--help) from 9 seconds down to 7.8. The same load without so many jars improves from 8.7 to 7.7. The primary feature of this patch is that it forces filesystem locations to be searched first, so it may help your situation. However, it's possible that with gems installed classloader resources will be searched first anyway since a require must fail for a gem to automatically start up. A second suggestion for you to try would be to "rake rails:freeze:gems" in your Rails app, which will pull all gem contents into the vendor/rails subdir of the app, potentially allowing those resources to be loaded sooner in the search sequence. An alternative solution for these issues has also been debated: only turn on classloader resource loading for specific load paths with a particular prefix, so that classloader is not searched most of the time.
Hide
Permalink
Charles Oliver Nutter added a comment - 23/Dec/06 7:26 PM

I have committed my changes along with additional fixes to allow require to handle files with extensions properly. I would like more independent testing of this before resolving, so I have committed to trunk for the original bug reporter and others to test it.

Remaining issues:

  • We should use our own file extension for extensions rather than .so
  • We may not need the JarredScript functionality if it's not being used. It would simplify loads to eliminate it (and we love deleting code!)
  • It may be worth providing a mechanism to turn off filesystem loading or classpath loading for cases where we know they'll fail or where we want to avoid their overhead.
Show
Charles Oliver Nutter added a comment - 23/Dec/06 7:26 PM I have committed my changes along with additional fixes to allow require to handle files with extensions properly. I would like more independent testing of this before resolving, so I have committed to trunk for the original bug reporter and others to test it. Remaining issues:
  • We should use our own file extension for extensions rather than .so
  • We may not need the JarredScript functionality if it's not being used. It would simplify loads to eliminate it (and we love deleting code!)
  • It may be worth providing a mechanism to turn off filesystem loading or classpath loading for cases where we know they'll fail or where we want to avoid their overhead.
Hide
Permalink
Scott Willson added a comment - 27/Dec/06 1:08 PM

This improves the start time greatly: from over an hour to 'just' 14 minutes. Of course, 14 minutes is still to long, but my situation (Windows, Rails, many jars) is an edge case. I will put together a patch that fixes my problem. I am afraid it's a bit of a hack, but at least it will illustrate the fix.

Show
Scott Willson added a comment - 27/Dec/06 1:08 PM This improves the start time greatly: from over an hour to 'just' 14 minutes. Of course, 14 minutes is still to long, but my situation (Windows, Rails, many jars) is an edge case. I will put together a patch that fixes my problem. I am afraid it's a bit of a hack, but at least it will illustrate the fix.
Hide
Permalink
Scott Willson added a comment - 27/Dec/06 3:47 PM

This patch drops my startup time from 14 minutes to 27 seconds. The basic point is to not call ClassLoader#getResource for paths that begin with a Windows drive letter (c:, d:, etc...).

It's several lines of code to deal with just one outlier case. I am not sure how it could be less intrusive or significantly faster. Perhaps we could keep a list of load paths that start with a Windows drive letter and those that do not.

Show
Scott Willson added a comment - 27/Dec/06 3:47 PM This patch drops my startup time from 14 minutes to 27 seconds. The basic point is to not call ClassLoader#getResource for paths that begin with a Windows drive letter (c:, d:, etc...). It's several lines of code to deal with just one outlier case. I am not sure how it could be less intrusive or significantly faster. Perhaps we could keep a list of load paths that start with a Windows drive letter and those that do not.
Hide
Permalink
Ola Bini added a comment - 02/Jan/07 4:27 AM

I have commited a partial fix to trunk for the Windows issue. It's pretty close to your solution, and will not try to load through a classloader anything that looks like a windows path. Hopefully that will improve the situation.

Show
Ola Bini added a comment - 02/Jan/07 4:27 AM I have commited a partial fix to trunk for the Windows issue. It's pretty close to your solution, and will not try to load through a classloader anything that looks like a windows path. Hopefully that will improve the situation.
Hide
Permalink
Scott Willson added a comment - 02/Jan/07 11:36 AM

I like the Ola's patch. Startup time is about 18 seconds, so I am happy!

Show
Scott Willson added a comment - 02/Jan/07 11:36 AM I like the Ola's patch. Startup time is about 18 seconds, so I am happy!
Hide
Permalink
Charles Oliver Nutter added a comment - 02/Jan/07 3:53 PM

The combined fixes seem to have resolved the issue, and the additional fix by Scott+Ola brings the startup time right back down to a reasonable number. Great teamwork on this one, from over an hour startup to 18 seconds!

Show
Charles Oliver Nutter added a comment - 02/Jan/07 3:53 PM The combined fixes seem to have resolved the issue, and the additional fix by Scott+Ola brings the startup time right back down to a reasonable number. Great teamwork on this one, from over an hour startup to 18 seconds!

People

  • Assignee:
    Charles Oliver Nutter
    Reporter:
    Scott Willson
Vote (1)
Watch (3)

Dates

  • Created:
    07/Dec/06 5:53 PM
    Updated:
    30/Apr/07 3:12 AM
    Resolved:
    02/Jan/07 3:53 PM
  • Atlassian JIRA (v5.0.4#731-sha1:3aa7374)
  • Report a problem
  • Powered by a free Atlassian JIRA open source license for Codehaus. Try JIRA - bug tracking software for your team.