Key: JRUBY-3299
Type: Bug
Status: Open
Priority: Major
Assignee: Nick Sieger
Reporter: Nick Sieger
Votes: 4
Watchers: 8
JRuby

Unable to embed Hpricot gem in a jar file

Created: 10/Jan/09 01:43 PM   Updated: 11/Jun/09 02:44 AM
Component/s: Embedding
Affects Version/s: JRuby 1.1.6
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
1. JRUBY-3299-tests.patch (2 kB)
2. JRUBY_3299_tempfile_solution.patch (5 kB)
3. testgem-0.0.1.gem (3 kB)



Description
Hpricot and other gems that include embedded Java code in jar files don't seem to work with the gems-in-jars feature.
$ java -jar jruby-complete-1.1.6.jar -S gem install -i ./hpricot hpricot  --no-rdoc --no-ri
JRuby limited openssl loaded. gem install jruby-openssl for full support.
http://wiki.jruby.org/wiki/JRuby_Builtin_OpenSSL
Successfully installed hpricot-0.6.164-java
1 gem installed
[13:32:24][/tmp/hjar]
$ l
total 18224
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:32:45][/tmp/hjar]
$ jar cf hpricot.jar -C hpricot  .
[13:33:06][/tmp/hjar]
$ l
total 19336
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel   568961 Jan 10 13:33 hpricot.jar
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:33:07][/tmp/hjar]
$ jar tf hpri
hpricot/     hpricot.jar  
[13:33:07][/tmp/hjar]
$ jar tf hpricot.jar | more
META-INF/
META-INF/MANIFEST.MF
cache/
cache/hpricot-0.6.164-java.gem
doc/
gems/
gems/hpricot-0.6.164-java/
gems/hpricot-0.6.164-java/.require_paths
gems/hpricot-0.6.164-java/CHANGELOG
gems/hpricot-0.6.164-java/COPYING
...
specifications/
specifications/hpricot-0.6.164-java.gemspec
[13:33:16][/tmp/hjar]
$ l
total 19336
drwxr-xr-x  6 nicksieger  wheel      204 Jan 10 13:32 hpricot/
-rw-r--r--  1 nicksieger  wheel   568961 Jan 10 13:33 hpricot.jar
-rw-r--r--  1 nicksieger  wheel  9328431 Jan 10 13:31 jruby-complete-1.1.6.jar
[13:33:16][/tmp/hjar]
$ java -jar jruby-complete-1.1.6.jar -rhpricot.jar -S irb
irb(main):001:0> require 'hpricot'
=> false
irb(main):002:0> h = Hpricot.parse(File.read("index.html"))
NameError: uninitialized constant Hpricot
	from file:/private/tmp/hjar/jruby-complete-1.1.6.jar!/irb/ruby-token.rb:102:in `const_missing'
	from (irb):3:in `irb_binding'
Maybe IRB bug!!


Stephen Bannasch added a comment - 17/Jan/09 06:34 PM

I wanted to document some of my work in this area.

RedCloth (4.1.1) is a gem quite similar to Hpricot. It was originally written by Why, and when building the gem it uses ragel to produce either Java or C. Jason Garber has taken over development.

When building RedCloth under JRuby, the Java classes are archived into lib/redcloth_scan.jar, which is included with the JRuby version of the gem.

I chose RedCloth to work on first because it uses a somewhat simpler (and to me hackable) Rakefile for building the gem based on echoe.

I'm experimenting to see if I can get a gem that normally uses a jar to use the .class files instead. If this works, I could package the gem with .class files instead of the jar and then more easily include it in jruby-complete.jar (avoiding the jar-within-a-jar problem).

I modified the Rake task so the classes are copied to lib/ also:

[redcloth.git (nojar)]$ ls -l lib/
total 3360
-rw-r--r--  1 stephen  staff   24976 Jan 12 14:32 RedclothAttributes.class
-rw-r--r--  1 stephen  staff  560002 Jan 12 14:32 RedclothInline.class
-rw-r--r--  1 stephen  staff    9548 Jan 12 14:32 RedclothScanService$Base.class
-rw-r--r--  1 stephen  staff  597763 Jan 12 14:32 RedclothScanService$Transformer.class
-rw-r--r--  1 stephen  staff    5423 Jan 12 14:32 RedclothScanService.class
drwxr-xr-x  3 stephen  staff     102 Jul 27 21:23 case_sensitive_require
drwxr-xr-x  7 stephen  staff     238 Nov 20 17:59 redcloth
-rw-r--r--@ 1 stephen  staff    1522 Jan 12 18:39 redcloth.rb
-rw-r--r--  1 stephen  staff  506272 Jan 12 14:32 redcloth_scan.jar

RedclothScanService implements BasicLibraryService; it adds methods to RedCloth::TextileDoc.
You can see the RedclothScanService.java code that ragel generated here: http://gist.github.com/46177

When JRuby requires a jar that contains a class implementing BasicLibraryService, that class's basicLoad method gets called. It looks like the call comes from LoadService.smartLoad, which calls the private method tryLoadingLibraryOrScript.

If the classes are made available in lib/ then this statement:

require 'redcloth_scan'

can be replaced by these three lines, and the gem operates and passes all its tests:

require 'jruby'
$CLASSPATH <<  File.dirname(File.expand_path(__FILE__)) + '/'
Java::RedclothScanService.new.basicLoad(JRuby.runtime)

So at least for the redcloth gem, which includes redcloth_scan.jar in its lib/ dir: if you instead put the class files in lib/ and replace require 'redcloth_scan' with the three lines above, it should work fine when embedded into a jar with the rest of JRuby.

It's not obvious to me yet how this could be adapted into a generalized solution which doesn't require changes to the gems.


Matt Burke added a comment - 19/Jan/09 11:50 AM
I started looking into this issue, too.

One thing I noticed in the example that Nick provided is that the "require 'hpricot'" near the end of the session returned false. I assume this is because JRuby considers 'hpricot' already loaded once it has processed the -rhpricot.jar option. So one caveat of using the gem-in-a-jar feature is that the jar shouldn't have the same name as any library inside it. For example, renaming hpricot.jar to hpricot-0.6.164.jar changes the IRB session to this:

>>> ./bin/jruby -rhpricot-0.6.164.jar -S jirb
irb(main):001:0> require 'hpricot'
LoadError: no such file to load -- hpricot
        from (irb):2:in `require'
        from (irb):2
irb(main):002:0>

My thoughts on resolving this issue start with JRubyClassLoader looking for anything that's added with a url protocol of "jar". From there, it seems like there are a couple of approaches:

  • Preload everything in the jar when it's required / added to the class path.
  • Add the jar-in-a-jar to a list of search paths, and override findClass(String) to search the jars-in-jars for the required class.
  • Extract the jar-in-a-jar to a temp dir. Permissions and cleanup are obvious issues with this.

Charles Oliver Nutter added a comment - 20/Jan/09 03:37 AM
We do already have our own classloader for loading things, so it seems like making that smart about jars-in-jars would be the easiest way to go.

Charles Oliver Nutter added a comment - 25/Jan/09 09:52 PM
Ok, I've spent a few minutes playing with this. I think what we're looking for here is a way to have nested jar URLs. Currently, the logic in LoadService works fine loading stuff out of one level of jar, but it does not add that jar directly to the load path, so only classpath searches can search it...which do not support loading additional jar files. If I modify it to also add the jar URL to LoadPath, as follows:
diff --git a/src/org/jruby/runtime/load/JarredScript.java b/src/org/jruby/runtime/load/JarredScript.java
index 3299b67..9383688 100644
--- a/src/org/jruby/runtime/load/JarredScript.java
+++ b/src/org/jruby/runtime/load/JarredScript.java
@@ -63,6 +63,7 @@ public class JarredScript implements Library {
 
         // Make Java class files in the jar reachable from Ruby
         runtime.getJRubyClassLoader().addURL(jarFile);
+        runtime.getLoadService().getLoadPathArray().append(runtime.newString(jarFile.toString()));
 
         try {
             JarInputStream in = new JarInputStream(new BufferedInputStream(jarFile.openStream()));

...I can get it to load two levels of jars, which might be enough for Hpricot (since the inner jar does get added to the classloader). But I couldn't get it to require additional files out of the innermost jar.

To test whether this is good enough for Hpricot, I tried creating a simple Java class and putting it in a jar-in-a-jar:

$ cat TestNested.java
public class TestNested {
  public String hello() { return "hello"; }
}
$ javac TestNested.java
$ jar cvf inner.jar TestNested.class
added manifest
adding: TestNested.class(in = 269) (out= 199)(deflated 26%)
$ jar cvf outer.jar inner.jar
added manifest
adding: inner.jar(in = 657) (out= 437)(deflated 33%)
$ jruby -rjava -e "require 'outer.jar'; require 'inner.jar'; puts Java::TestNested.new.hello"
hello

Huzzah! This appears to work reasonably well. So we can at least get a jar full of classes to work from within another jar.

However, I'd like to make this a bit more general-purpose, so that it supports any number of nested jars, and so that the load path and require still work correctly for anything within the innermost jar file. So...help?

  • We need specs for various levels of nesting, so we know what we're working toward
  • We need to make the jar-file loading and searching logic cleaner than it is currently and general-purpose enough that it can nest deeply without breaking.

So, team, anyone up for the challenge?


Charles Oliver Nutter added a comment - 25/Jan/09 09:54 PM
Er, drat...I realized I didn't delete the files after jarring them, so it still was finding them on the . classpath:
$ rm inner.jar
$ rm TestNested.class
$ jruby -rjava -e "require 'outer.jar'; require 'inner.jar'; puts Java::TestNested.new.hello"
Java::TestNestedNewHello

So I think we're close but not quite there.


Matt Burke added a comment - 27/Jan/09 12:43 PM
I had started playing with this, too, and had gotten this far before asking my earlier question.
Index: src/org/jruby/util/JRubyClassLoader.java
===================================================================
--- src/org/jruby/util/JRubyClassLoader.java    (revision 8786)
+++ src/org/jruby/util/JRubyClassLoader.java    (working copy)
@@ -3,21 +3,47 @@
 import java.net.URL;
 import java.net.URLClassLoader;
 import java.security.ProtectionDomain;
+import java.io.InputStream;
 
 public class JRubyClassLoader extends URLClassLoader {
     private final static ProtectionDomain DEFAULT_DOMAIN
             = JRubyClassLoader.class.getProtectionDomain();
 
+    private java.util.List<URL> embeddedJars;
+
     public JRubyClassLoader(ClassLoader parent) {
         super(new URL[0], parent);
+        embeddedJars = new java.util.ArrayList<URL>();
     }
 
     // Change visibility so others can see it
     @Override
     public void addURL(URL url) {
-        super.addURL(url);
+        if(url.getProtocol().equals("jar")) {
+            // TODO: something like http://one-jar.cvs.sourceforge.net/viewvc/one-jar/one-jar/src/com/simontuffs/onejar/JarClassLoader.java?revision=1.35&view=markup
+            // 1A. Add protocol to start of url to indicate another classloader.
+            // 2A. When classes are pulled from that classloader, crack open the Jar file and read them ?
+            // 1B. Add jar file to a list to scan when asked for a class.
+            // 1C. Define all classes RIGHT NOW.
+            System.out.println(" loading embedded jar file " + url + " manually.");
+//            InputStream jarStream = url.openStream();
+            synchronized (embeddedJars) {
+                embeddedJars.add(url);
+            }
+        } else {
+            super.addURL(url);
+        }
     }
 
+    @Override
+    public Class findClass(java.lang.String className)
+      throws java.lang.ClassNotFoundException
+    {
+        // See if the class is in any of the embedded jars, and load it from there.
+        System.out.println("findClass(" + className + ")");
+        return super.findClass(className);
+    }
+
     public Class<?> defineClass(String name, byte[] bytes) {
         return super.defineClass(name, bytes, 0, bytes.length, DEFAULT_DOMAIN);
      }

It's really rough, and obviously spews tons of crap to stdout, but I think it just needs a little more work to finish. My first attempt was going to be an implementation of findClass that searches the embeddedJars for the requested class. I'll do this as I find free time.
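
As a rough illustration of the "search the embedded jars for the requested class" idea, here is a minimal Java sketch that looks up an entry inside a jar nested in another jar by streaming through both levels. It is not part of the patch above: the class name NestedJarLookup and its methods are hypothetical, and it only returns the raw bytes. A class loader could hand such bytes to a defineClass(name, bytes) method like the one JRubyClassLoader already exposes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical helper, for illustration only; not JRuby API.
final class NestedJarLookup {

    // Scans a jar stream and returns the bytes of the named entry, or null if absent.
    static byte[] findEntry(InputStream jar, String entryName) {
        try (JarInputStream in = new JarInputStream(jar)) {
            for (JarEntry e = in.getNextJarEntry(); e != null; e = in.getNextJarEntry()) {
                if (e.getName().equals(entryName)) {
                    return readFully(in); // reads up to the end of the current entry
                }
            }
            return null;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Looks for entryName inside the jar stored as innerJarName within outerJar.
    static byte[] findNested(byte[] outerJar, String innerJarName, String entryName) {
        byte[] inner = findEntry(new ByteArrayInputStream(outerJar), innerJarName);
        return inner == null ? null : findEntry(new ByteArrayInputStream(inner), entryName);
    }

    // Builds a one-entry jar in memory; used here only to create demo input.
    static byte[] jarOf(String entryName, byte[] content) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream out = new JarOutputStream(bytes)) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content);
            out.closeEntry();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    private static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real compiled class file.
        byte[] inner = jarOf("TestNested.class", new byte[] {1, 2, 3});
        byte[] outer = jarOf("inner.jar", inner);
        byte[] found = findNested(outer, "inner.jar", "TestNested.class");
        System.out.println(found == null ? "not found" : "found " + found.length + " bytes");
    }
}
```

Scanning the whole stream on every lookup is simple but slow; a real loader would probably index the nested entries once when the jar is added.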


Matt Burke added a comment - 28/Jan/09 02:22 PM
Here are some tests, and supporting java and ruby files, that should pass when this issue is resolved.

Matt Burke added a comment - 11/Feb/09 09:42 AM
This patch passes the tests, so it works as a proof-of-concept. It dumps the embedded jar files to disk (and never cleans them up), though, so it's not really production-quality.
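
For reference, the temp-file approach with basic cleanup could look roughly like the Java sketch below. This is not the attached patch: TempJarExtractor and extractToTemp are hypothetical names, and deleteOnExit only removes the file on a normal JVM shutdown, so it is a partial answer to the cleanup problem at best.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not JRuby API.
final class TempJarExtractor {

    // Copies an embedded jar's bytes to a temp file and schedules it for
    // deletion when the JVM exits normally.
    static Path extractToTemp(InputStream embeddedJar, String prefix) {
        try {
            Path tmp = Files.createTempFile(prefix, ".jar");
            tmp.toFile().deleteOnExit();
            Files.copy(embeddedJar, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for the contents of a real embedded jar.
        InputStream fake = new ByteArrayInputStream(new byte[] {80, 75, 5, 6});
        Path extracted = extractToTemp(fake, "embedded-");
        System.out.println("extracted to " + extracted);
    }
}
```

The resulting path can be converted with toUri().toURL() and passed to an ordinary URLClassLoader, which is what makes this approach easy despite the permissions and cleanup concerns mentioned earlier.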

László Bácsi added a comment - 16/Mar/09 04:01 AM - edited
Last week I tried some ideas for solving this issue in another way. Before packing the gems up in a jar, I looked for jar files inside them and unpacked those jars into the directory that would become the jar's root directory. That worked fine until I bumped into another issue.

It seems that gems-in-a-jar can't access files packaged with that gem. For example, if I get the gem's version from a file named VERSION in the root directory of my gem, I would do something like this:

File.read(File.join(File.dirname(__FILE__), "..", "VERSION")).strip

This will fail with the following exception:

file:/Users/LacKac/Working/Lab/tmp/testgem/test/gems.jar!/gems/testgem-0.0.1/lib/testgem.rb:6:in `version': No such file or directory - File not found - file:/Users/LacKac/Working/Lab/tmp/testgem/test/gems.jar!/gems/testgem-0.0.1/lib/../VERSION (Errno::ENOENT)
	from test.rb:6
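
One possible cause, and this is only an assumption since the trace does not confirm it, is that the ".." segment after the "!/" separator is never collapsed before the entry is looked up in the jar, where entry names are matched literally. A minimal Java sketch of normalizing the entry part of such a path follows; JarPathNormalizer is a hypothetical name and not JRuby code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, for illustration only; not JRuby API.
final class JarPathNormalizer {

    // Collapses "." and ".." segments in the part of the path after "!/".
    // Paths without "!/" are returned unchanged.
    static String normalize(String jarPath) {
        int bang = jarPath.indexOf("!/");
        if (bang < 0) {
            return jarPath;
        }
        String prefix = jarPath.substring(0, bang + 2);
        String entry = jarPath.substring(bang + 2);

        Deque<String> parts = new ArrayDeque<>();
        for (String segment : entry.split("/")) {
            if (segment.isEmpty() || segment.equals(".")) {
                continue; // skip empty and current-directory segments
            }
            if (segment.equals("..")) {
                if (!parts.isEmpty()) {
                    parts.removeLast(); // step up one level, never above the jar root
                }
            } else {
                parts.addLast(segment);
            }
        }
        return prefix + String.join("/", parts);
    }

    public static void main(String[] args) {
        System.out.println(
            normalize("file:/tmp/gems.jar!/gems/testgem-0.0.1/lib/../VERSION"));
    }
}
```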

I attached the testgem gem and I used this for testing:

require 'rubygems'
gem 'testgem'
require 'testgem'

p TestGem.test
p TestGem.version

László Bácsi added a comment - 16/Mar/09 04:06 AM
Sorry for the formatting mess. I should've looked up the formatting help instead of trying to remember the syntax.

Stuart Sierra added a comment - 17/Mar/09 11:30 AM
Could this be fixed in the Hpricot gem by not placing the Java .class files in JARs?

If I create a new JAR containing both the .rb and .class files from Hpricot, it works.

<target name="hpricot-jar" description="Create JAR for Hpricot">
  <jar destfile="${lib.dir}/hpricot-for-jruby.jar" compress="true" index="true">
    <fileset dir="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib">
      <include name="**/*"/>
      <exclude name="**/*.jar"/>
    </fileset>
    <zipfileset src="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6/hpricot_scan.jar"/>
    <zipfileset src="${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6/fast_xs.jar"/>
  </jar>
</target>

Nick Sieger added a comment - 17/Mar/09 12:27 PM
Stuart, you could do that, and you're certainly welcome to. However, we're hoping to find a solution that doesn't involve modifying existing gems, so that anyone can easily package any arbitrary gem that includes Java code.

Gerald Boersma added a comment - 11/Jun/09 02:44 AM
Stuart, copying the jar files from ${lib.dir}/ruby/gems/1.8/gems/hpricot-0.6.164-java/lib/universal-java1.6 into ${lib.dir} worked for me. This avoids having to create a special jar file. I have not tested it, but perhaps symlinks would work as well.