
Key: JRUBY-3299
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Nick Sieger
Reporter: Nick Sieger
Votes: 7
Watchers: 12
JRuby

Unable to embed Hpricot gem in a jar file

Created: 10/Jan/09 01:43 PM   Updated: 09/Feb/10 03:28 AM   Resolved: 08/Feb/10 09:58 PM
Component/s: Embedding
Affects Version/s: JRuby 1.1.6
Fix Version/s: JRuby 1.5

Time Tracking:
Not Specified

File Attachments: 1. Java Source File CompoundJarURLStreamHandler.java (4 kB)
2. Java Source File CompundJarClassLoader.java (9 kB)
3. Text File hack_JRubyClassLoader.patch (14 kB)
4. Text File JRUBY-3299-tests.patch (2 kB)
5. Text File JRUBY_3299_tempfile_solution.patch (5 kB)
6. File testgem-0.0.1.gem (3 kB)



Description

Hpricot and other gems that include embedded Java code in jar files don't seem to work with the gems-in-jars feature.

$ java -jar jruby-complete-1.1.6.jar -S gem install -i ./hpricot hpricot  --no-rdoc --no-ri
JRuby limited openssl loaded. gem install jruby-openssl for full support.
http://wiki.jruby.org/wiki/JRuby_Builtin_OpenSSL
Successfully installed hpricot-0.6.164-java
1 gem installed
[13:32:24][/tmp/hjar]
$ l
total 18224
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:32:45][/tmp/hjar]
$ jar cf hpricot.jar -C hpricot  .
[13:33:06][/tmp/hjar]
$ l
total 19336
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel   568961 Jan 10 13:33 hpricot.jar
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:33:07][/tmp/hjar]
$ jar tf hpri
hpricot/     hpricot.jar  
[13:33:07][/tmp/hjar]
$ jar tf hpricot.jar | more
META-INF/
META-INF/MANIFEST.MF
cache/
cache/hpricot-0.6.164-java.gem
doc/
gems/
gems/hpricot-0.6.164-java/
gems/hpricot-0.6.164-java/.require_paths
gems/hpricot-0.6.164-java/CHANGELOG
gems/hpricot-0.6.164-java/COPYING
...
specifications/
specifications/hpricot-0.6.164-java.gemspec
[13:33:16][/tmp/hjar]
$ l
total 19336
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel   568961 Jan 10 13:33 hpricot.jar
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:33:16][/tmp/hjar]
$ java -jar jruby-complete-1.1.6.jar -rhpricot.jar -S irb
irb(main):001:0> require 'hpricot'
=> false
irb(main):002:0> h = Hpricot.parse(File.read("index.html"))
NameError: uninitialized constant Hpricot
	from file:/private/tmp/hjar/jruby-complete-1.1.6.jar!/irb/ruby-token.rb:102:in `const_missing'
	from (irb):3:in `irb_binding'
Maybe IRB bug!!


Stephen Bannasch added a comment - 17/Jan/09 06:34 PM

I wanted to document some of my work in this area.

RedCloth (4.1.1) is a gem quite similar to Hpricot. It was originally written by Why, and its build uses Ragel to generate either Java or C. Jason Garber has since taken over development.

When building redcloth under JRuby the Java classes are archived into lib/redcloth_scan.jar which is included with the JRuby version of the gem.

I chose RedCloth to work on first because it uses a somewhat simpler (and to me hackable) Rakefile for building the gem based on echoe.

I'm experimenting to see if I can get a gem that normally uses a jar to use the .class files instead. If this works, I could package the gem with .class files instead of the jar and then more easily bundle it into jruby-complete.jar (avoiding the jar-within-a-jar problem).

I modified the Rake task so the classes are copied to lib/ also:

[redcloth.git (nojar)]$ ls -l lib/
total 3360
-rw-r--r--  1 stephen  staff   24976 Jan 12 14:32 RedclothAttributes.class
-rw-r--r--  1 stephen  staff  560002 Jan 12 14:32 RedclothInline.class
-rw-r--r--  1 stephen  staff    9548 Jan 12 14:32 RedclothScanService$Base.class
-rw-r--r--  1 stephen  staff  597763 Jan 12 14:32 RedclothScanService$Transformer.class
-rw-r--r--  1 stephen  staff    5423 Jan 12 14:32 RedclothScanService.class
drwxr-xr-x  3 stephen  staff     102 Jul 27 21:23 case_sensitive_require
drwxr-xr-x  7 stephen  staff     238 Nov 20 17:59 redcloth
-rw-r--r--@ 1 stephen  staff    1522 Jan 12 18:39 redcloth.rb
-rw-r--r--  1 stephen  staff  506272 Jan 12 14:32 redcloth_scan.jar

RedclothScanService implements BasicLibraryService – it's adding methods to RedCloth::TextileDoc
You can see the RedclothScanService.java code ragel generated here: http://gist.github.com/46177

When JRuby requires a jar that contains a class implementing BasicLibraryService, that class's basicLoad method gets called. It looks like it's called from LoadService.smartLoad, which calls the private method tryLoadingLibraryOrScript.

If the classes are made available in lib/ then this statement:

require 'redcloth_scan'

can be replaced by these, and the gem operates and passes all its tests:

require 'jruby'
$CLASSPATH <<  File.dirname(File.expand_path(__FILE__)) + '/'
Java::RedclothScanService.new.basicLoad(JRuby.runtime)

So, at least for the redcloth gem, which includes redcloth_scan.jar in its lib/ dir: if you instead put the class files in lib/ and replace require 'redcloth_scan' with the three lines above, it should work fine when embedded into a jar with the rest of JRuby.

It's not obvious to me yet how this could be adapted into a generalized solution which doesn't require changes to the gems.


Matt Burke added a comment - 19/Jan/09 11:50 AM

I started looking into this issue, too.

One thing I noticed in the example that Nick provided is that the "require 'hpricot'" near the end of the session returned false. I assume this is because JRuby considers 'hpricot' already loaded after processing the -rhpricot.jar option. So one caveat of using the gem-in-a-jar feature is that the jar shouldn't share its name with any library inside it. For example, renaming hpricot.jar to hpricot-0.6.164.jar changes the IRB session to this:

>>> ./bin/jruby -rhpricot-0.6.164.jar -S jirb
irb(main):001:0> require 'hpricot'
LoadError: no such file to load -- hpricot
        from (irb):2:in `require'
        from (irb):2
irb(main):002:0>

My thoughts on resolving this issue start with JRubyClassLoader looking for anything that's added with a url protocol of "jar". From there, it seems like there are a couple of approaches:

  • Preload everything in the jar when it's required / added to the class path.
  • Add the jar-in-a-jar to a list of search paths, and override findClass(String) to search the jars-in-jars for the required class.
  • Extract the jar-in-a-jar to a temp dir. Permissions and cleanup are obvious issues with this.
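
All three approaches start from the same step: recognizing that a URL handed to the classloader points inside another archive. A minimal sketch of that check, assuming URLs of the form jar:file:/tmp/outer.jar!/inner.jar; the class JarUrlInspector and its methods are hypothetical names for illustration and are not part of JRuby:

```java
import java.net.URL;

// Hypothetical helper for illustration only; not part of JRuby.
class JarUrlInspector {
    private static final String SEPARATOR = "!/";

    /** True when the URL uses the "jar" protocol, i.e. it points inside an archive. */
    static boolean isInsideJar(URL url) {
        return "jar".equals(url.getProtocol());
    }

    /** Splits a jar: URL into { location of the outer archive, entry path inside it }. */
    static String[] split(URL url) {
        if (!isInsideJar(url)) {
            throw new IllegalArgumentException("not a jar: URL: " + url);
        }
        // Drop the leading "jar:" and cut at the first "!/" separator.
        String spec = url.toExternalForm().substring("jar:".length());
        int sep = spec.indexOf(SEPARATOR);
        if (sep < 0) {
            throw new IllegalArgumentException("no !/ separator in: " + url);
        }
        return new String[] {
            spec.substring(0, sep),
            spec.substring(sep + SEPARATOR.length())
        };
    }

    public static void main(String[] args) throws Exception {
        URL nested = new URL("jar:file:/tmp/outer.jar!/inner.jar");
        String[] parts = split(nested);
        System.out.println(parts[0] + " contains " + parts[1]);
    }
}
```

Once the outer archive and the entry name are known, the classloader can decide whether to preload, search lazily, or extract the inner jar.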

Charles Oliver Nutter added a comment - 20/Jan/09 03:37 AM

We do already have our own classloader for loading things, so it seems like making that smart about jars-in-jars would be the easiest way to go.


Charles Oliver Nutter added a comment - 25/Jan/09 09:52 PM

Ok, I've spent a few minutes playing with this. I think what we're looking for here is a way to have nested jar URLs. Currently, the logic in LoadService works fine loading stuff out of one level of jar, but it does not add that jar directly to the load path, so only classpath searches can search it...which do not support loading additional jar files. If I modify it to also add the jar URL to LoadPath, as follows:

diff --git a/src/org/jruby/runtime/load/JarredScript.java b/src/org/jruby/runtime/load/JarredScript.java
index 3299b67..9383688 100644
--- a/src/org/jruby/runtime/load/JarredScript.java
+++ b/src/org/jruby/runtime/load/JarredScript.java
@@ -63,6 +63,7 @@ public class JarredScript implements Library {
 
         // Make Java class files in the jar reachable from Ruby
         runtime.getJRubyClassLoader().addURL(jarFile);
+        runtime.getLoadService().getLoadPathArray().append(runtime.newString(jarFile.toString()));
 
         try {
             JarInputStream in = new JarInputStream(new BufferedInputStream(jarFile.openStream()));

...I can get it to load two levels of jars, which might be enough for Hpricot (since the inner jar does get added to the classloader). But I couldn't get it to require additional files out of the innermost jar.

To test whether this is good enough for Hpricot, I tried creating a simple Java class and putting it in a jar-in-a-jar:

$ cat TestNested.java
public class TestNested {
  public String hello() { return "hello"; }
}
$ javac TestNested.java
$ jar cvf inner.jar TestNested.class
added manifest
adding: TestNested.class(in = 269) (out= 199)(deflated 26%)
$ jar cvf outer.jar inner.jar
added manifest
adding: inner.jar(in = 657) (out= 437)(deflated 33%)
$ jruby -rjava -e "require 'outer.jar'; require 'inner.jar'; puts Java::TestNested.new.hello"
hello

Huzzah! This appears to work reasonably well. So we can at least get a jar full of classes to work from within another jar.

However, I'd like to make this a bit more general-purpose, so that it supports any number of nested jars, and load path and require still work correctly for anything within the innermost jar file. So...help?

  • We need specs for various levels of nesting, so we know what we're working toward
  • We need to make the jar-file loading and searching logic cleaner than it is currently and general-purpose enough that it can nest deeply without breaking.

So, team, anyone up for the challenge?


Charles Oliver Nutter added a comment - 25/Jan/09 09:54 PM

Er, drat...I realized I didn't delete the files after jarring them, so it still was finding them on the . classpath:

$ rm inner.jar
$ rm TestNested.class
$ jruby -rjava -e "require 'outer.jar'; require 'inner.jar'; puts Java::TestNested.new.hello"
Java::TestNestedNewHello

So I think we're close but not quite there.


Matt Burke added a comment - 27/Jan/09 12:43 PM

I had started playing with this, too, and had gotten this far before asking my earlier question.

Index: src/org/jruby/util/JRubyClassLoader.java
===================================================================
--- src/org/jruby/util/JRubyClassLoader.java    (revision 8786)
+++ src/org/jruby/util/JRubyClassLoader.java    (working copy)
@@ -3,21 +3,49 @@
 import java.net.URL;
 import java.net.URLClassLoader;
 import java.security.ProtectionDomain;
+import java.io.InputStream;
+import java.util.ArrayList;
+import java.util.List;
 
 public class JRubyClassLoader extends URLClassLoader {
     private final static ProtectionDomain DEFAULT_DOMAIN
             = JRubyClassLoader.class.getProtectionDomain();
 
+    private List<URL> embeddedJars;
+
     public JRubyClassLoader(ClassLoader parent) {
         super(new URL[0], parent);
+        embeddedJars = new ArrayList<URL>();
     }
 
     // Change visibility so others can see it
     @Override
     public void addURL(URL url) {
-        super.addURL(url);
+        if(url.getProtocol().equals("jar")) {
+            // TODO: something like http://one-jar.cvs.sourceforge.net/viewvc/one-jar/one-jar/src/com/simontuffs/onejar/JarClassLoader.java?revision=1.35&view=markup
+            // 1A. Add protocol to start of url to indicate another classloader.
+            // 2A. When classes are pulled from that classloader, crack open the Jar file and read them ?
+            // 1B. Add jar file to a list to scan when asked for a class.
+            // 1C. Define all classes RIGHT NOW.
+            System.out.println(" loading embedded jar file " + url + " manually.");
+//            InputStream jarStream = url.openStream();
+            synchronized (embeddedJars) {
+                embeddedJars.add(url);
+            }
+        } else {
+            super.addURL(url);
+        }
     }
 
+    @Override
+    public Class findClass(java.lang.String className)
+      throws java.lang.ClassNotFoundException
+    {
+        // See if the class is in any of the embedded jars, and load it from there.
+        System.out.println("findClass(" + className + ")");
+        return super.findClass(className);
+    }
+
     public Class<?> defineClass(String name, byte[] bytes) {
         return super.defineClass(name, bytes, 0, bytes.length, DEFAULT_DOMAIN);
      }

It's really rough, and obviously spews tons of crap to stdout, but I think it just needs a little more work to finish. My first attempt was going to be an implementation of findClass that searches the embeddedJars for the requested class. I'll do this as I find free time.
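
A findClass that searches an embedded jar could look roughly like the following. This is a minimal sketch under stated assumptions, not the code from any attached patch: the inner jar has already been read into a byte array, and the class name NestedJarClassLoader and the helpers readEntry and jarOf are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical sketch; names are not taken from JRuby or the attached patches.
class NestedJarClassLoader extends ClassLoader {
    private final byte[] innerJar; // raw bytes of the jar-in-a-jar

    NestedJarClassLoader(byte[] innerJar, ClassLoader parent) {
        super(parent);
        this.innerJar = innerJar;
    }

    /** Looks for the class file inside the embedded jar and defines it from there. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try {
            byte[] bytes = readEntry(new ByteArrayInputStream(innerJar), path);
            if (bytes == null) {
                throw new ClassNotFoundException(name);
            }
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }

    /** Scans a jar stream and returns the bytes of the named entry, or null if absent. */
    static byte[] readEntry(InputStream jarStream, String entryName) throws IOException {
        try (JarInputStream jar = new JarInputStream(jarStream)) {
            for (JarEntry e = jar.getNextJarEntry(); e != null; e = jar.getNextJarEntry()) {
                if (e.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    for (int n = jar.read(buf); n != -1; n = jar.read(buf)) {
                        out.write(buf, 0, n);
                    }
                    return out.toByteArray();
                }
            }
            return null;
        }
    }

    /** Builds a one-entry jar in memory; only used to make the demo self-contained. */
    static byte[] jarOf(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(out)) {
            jar.putNextEntry(new JarEntry(entryName));
            jar.write(content);
            jar.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] inner = jarOf("hello.txt", "hello".getBytes("UTF-8"));
        byte[] outer = jarOf("inner.jar", inner);
        // Two passes: pull inner.jar out of the outer jar, then the entry out of inner.jar.
        byte[] extractedInner = readEntry(new ByteArrayInputStream(outer), "inner.jar");
        byte[] text = readEntry(new ByteArrayInputStream(extractedInner), "hello.txt");
        System.out.println(new String(text, "UTF-8"));
    }
}
```

The linear scan on every lookup keeps the sketch short; an index of entry names built once per embedded jar would likely be needed for anything beyond a proof of concept.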


Matt Burke added a comment - 28/Jan/09 02:22 PM

Here are some tests, and supporting java and ruby files, that should pass when this issue is resolved.


Matt Burke added a comment - 11/Feb/09 09:42 AM

This patch passes the tests, so it works as a proof-of-concept. It dumps the embedded jar files to disk (and never cleans them up), though, so it's not really production-quality.
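
For comparison, a temp-file approach that at least requests cleanup might be sketched as follows. This is an illustration under assumptions, not the attached patch: NestedJarExtractor and extract are hypothetical names, and the outer jar is assumed to be a regular file on disk.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical sketch; not the code from the attached patch.
class NestedJarExtractor {
    /** Copies an embedded jar out of outerJar into a temp file and returns its path. */
    static Path extract(Path outerJar, String innerJarName) throws IOException {
        try (JarFile outer = new JarFile(outerJar.toFile())) {
            JarEntry entry = outer.getJarEntry(innerJarName);
            if (entry == null) {
                throw new FileNotFoundException(innerJarName + " not found in " + outerJar);
            }
            Path extracted = Files.createTempFile("nested-", ".jar");
            // Ask the JVM to remove the file on normal shutdown.
            extracted.toFile().deleteOnExit();
            try (InputStream in = outer.getInputStream(entry)) {
                Files.copy(in, extracted, StandardCopyOption.REPLACE_EXISTING);
            }
            return extracted;
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway outer jar so the demo is self-contained.
        Path outer = Files.createTempFile("outer-", ".jar");
        outer.toFile().deleteOnExit();
        try (JarOutputStream jar = new JarOutputStream(Files.newOutputStream(outer))) {
            jar.putNextEntry(new JarEntry("inner.jar"));
            jar.write(new byte[] {1, 2, 3}); // placeholder payload, not a real jar
            jar.closeEntry();
        }
        Path extracted = extract(outer, "inner.jar");
        System.out.println("extracted " + Files.size(extracted) + " bytes to " + extracted);
    }
}
```

The returned path can be converted with toUri().toURL() and passed to a URLClassLoader. Note that deleteOnExit only runs on normal JVM shutdown, and deletion may fail on Windows while a classloader still holds the file open, so the cleanup problem is reduced rather than solved.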


László Bácsi added a comment - 16/Mar/09 04:01 AM - edited

I tried some ideas last week trying to solve this issue in another way. What I did was to look for jar files in the gems before packing them up in a jar and unpack those jars in the directory that would become the jar's root directory. That worked fine until I bumped into another issue.

It seems that gems-in-a-jar can't access files packaged with that gem. For example, if I get the gem's version from a file named VERSION in the root directory of my gem, I would do something like this:

File.read(File.join(File.dirname(__FILE__), "..", "VERSION")).strip

This will fail with the following exception:

file:/Users/LacKac/Working/Lab/tmp/testgem/test/gems.jar!/gems/testgem-0.0.1/lib/testgem.rb:6:in `version': No such file or directory - File not found - file:/Users/LacKac/Working/Lab/tmp/testgem/test/gems.jar!/gems/testgem-0.0.1/lib/../VERSION (Errno::ENOENT)
	from test.rb:6

I attached the testgem gem and I used this for testing:

require 'rubygems'
gem 'testgem'
require 'testgem'

p TestGem.test
p TestGem.version

László Bácsi added a comment - 16/Mar/09 04:06 AM

sorry for the formatting mess, I should've looked up the formatting help instead of trying to remember the syntax


Stuart Sierra added a comment - 17/Mar/09 11:30 AM

Could this be fixed in the Hpricot gem by not placing the Java .class files in JARs?

If I create a new JAR containing both the .rb and .class files from Hpricot, it works.

<target name="hpricot-jar" description="Create JAR for Hpricot">
  <jar destfile="${lib.dir}/hpricot-for-jruby.jar" compress="true" index="true">
    <fileset dir="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib">
      <include name="**/*"/>
      <exclude name="**/*.jar"/>
    </fileset>
    <zipfileset src="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6/hpricot_scan.jar"/>
    <zipfileset src="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6/fast_xs.jar"/>
  </jar>
</target>

Nick Sieger added a comment - 17/Mar/09 12:27 PM

Stuart, you could do that, and you're certainly welcome to do so. However, we're hoping to find a solution that doesn't involve modifying existing gems, so that anyone can easily package any arbitrary gem that includes Java code.


Gerald Boersma added a comment - 11/Jun/09 02:44 AM

Stuart, copying the jar files in ${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6 into the ${lib.dir} worked for me. This avoids the approach of having to create a special jar file. I have not tested it, but perhaps symlinks would work as well.


Jens-Christian Fischer added a comment - 13/Nov/09 03:18 AM

That seems to be related to the problems we are having with DB2 driver jars not being found since 1.4RC2. There was a change in the JRuby loader: before, it would add all jars in the jruby/lib directory to the classpath (which is what makes the workaround by @Gerald work), but this has been deprecated.

Under Windows, the JARs aren't picked up if they are on the CLASSPATH.


Stas Garifulin added a comment - 06/Feb/10 09:14 AM

URL ClassLoader implementation for compound jars


Stas Garifulin added a comment - 06/Feb/10 09:22 AM

Hi JRuby team,

I attached a generic classloader implementation that can load classes/resources from embedded jars with an arbitrary nesting level.

1. No temp files, all processing is performed in memory.
2. The stream I/O API is used, so the code should not consume much memory.
3. Custom URLStreamHandler is provided to support findResource(String) and findResources(String) methods.

I couldn't access the JRuby git source repository to provide a ready-to-use diff, although integration should be simple.
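
To illustrate point 3 for readers without the attachments: a classloader can return resource URLs backed by its own URLStreamHandler, so that openStream() on a URL from findResource(String) reads from nested content instead of the file system. The following is a minimal sketch, not the attached CompoundJarURLStreamHandler; the class InMemoryJarHandler, the method urlFor, and the "nestedjar" protocol name are all hypothetical, and entries are served from a plain in-memory map for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the attached CompoundJarURLStreamHandler.
class InMemoryJarHandler extends URLStreamHandler {
    private final Map<String, byte[]> entries; // path -> resource bytes

    InMemoryJarHandler(Map<String, byte[]> entries) {
        this.entries = entries;
    }

    /** Builds a URL bound to this handler; a findResource override could return it. */
    URL urlFor(String path) throws MalformedURLException {
        // "nestedjar" is a made-up protocol name; supplying the handler avoids any lookup.
        return new URL("nestedjar", "", -1, path, this);
    }

    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        final byte[] data = entries.get(url.getPath());
        if (data == null) {
            throw new FileNotFoundException(url.toString());
        }
        return new URLConnection(url) {
            @Override
            public void connect() {
                connected = true; // nothing to open; the bytes are already in memory
            }

            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }
        };
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> entries = new HashMap<String, byte[]>();
        entries.put("/inner.jar!/hello.txt", "hello".getBytes("UTF-8"));
        URL url = new InMemoryJarHandler(entries).urlFor("/inner.jar!/hello.txt");
        try (InputStream in = url.openStream()) {
            byte[] buf = new byte[16];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, "UTF-8"));
        }
    }
}
```

A streaming implementation would replace the map lookup with a scan of the nested jar streams, which is closer to what points 1 and 2 describe.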


Matt Fletcher added a comment - 06/Feb/10 03:37 PM

So I went ahead and jammed Stas's code in as quickly as I could to see if this will work. Charles' example is now working. For giggles, I also put a "goodbye.rb" file into inner.jar that prints out goodbye. Sure enough, requiring goodbye results in the expected message.

Attached is hack_JRubyClassLoader.patch. This is definitely not a real patch but it's something to play with. I'm going to try using this in the place I needed it and see what happens.

Thanks Stas.


Nick Sieger added a comment - 08/Feb/10 09:58 PM

Applied in 0226925, with some tests added. Looks great, thanks very much!


Vladimir Sizikov added a comment - 09/Feb/10 03:28 AM

Looks like this change caused CI failure on JDK5:

http://ci.jruby.org/job/jruby-test-java5/232/console

[apt] /builds/jobs/jruby-test-java5/workspace/src/org/jruby/util/JRubyClassLoader.java:144:
      method does not override a method from its superclass
[apt]                 @Override
[apt]                  ^
[apt] /builds/jobs/jruby-test-java5/workspace/src/org/jruby/util/JRubyClassLoader.java:153:
      method does not override a method from its superclass
[apt]                 @Override
[apt]                  ^
[apt] Note: Some input files use unchecked or unsafe operations.
[apt] Note: Recompile with -Xlint:unchecked for details.
[apt] 2 errors