Details
-
Type:
Bug
-
Status:
Closed
-
Priority:
Blocker
-
Resolution: Fixed
-
Affects Version/s: JRuby 0.9.8, JRuby 0.9.9, JRuby 1.0.0RC1, JRuby 1.0.0RC2
-
Fix Version/s: JRuby 1.0.0RC3
-
Component/s: Java Integration
-
Labels:None
Description
After adding support for interfaces-as-modules, we found that a const_missing definition in that code was breaking rails. The purpose of it was to allow access to packages like JavaLang::Runnable to get at the Runnable interface, for example. We would like to re-expose this behavior in one of three ways:
- The original way
- With a module per package element, as in Java::Lang::Runnable (this may not work because we can't distinguish classes and modules)
- Using the lower-case package naming with double-colons, as in java::lang::Runnable (this has the added advantage that we can use it in class reopening, like class java::lang::Runnable)
Need a solution for 1.0. My vote goes for the third option.
-
- package_module_syntax_2.patch
- 17/May/07 8:17 AM
- 25 kB
- Bill Dortch
-
- package_module_syntax_3.patch
- 17/May/07 4:42 PM
- 33 kB
- Bill Dortch
-
- package_module_syntax.patch
- 17/May/07 4:00 AM
- 14 kB
- Bill Dortch
Issue Links
- is related to
-
JRUBY-1007
Not printing \XXX characters properly
-
-
JRUBY-1008
Rails pages that include \000 characters are including headers and chopping off the end of the page
-
Activity
I've been experimenting with approach #3, but hit a snag that is probably a stopper. I'll go into it below, but first comments on the other two approaches:
1. The original way. Even though top-level references are currently disabled due to a (possibly unresolvable) conflict with Rails use of Object#const_missing, it is still usable when qualified with Java::
class Java::JavaLang::String def raw_bytes b = Java::byte[length].new # yeah, it's deprecated, sue me get_bytes(0,length-1,b,0) b end end
I like this approach because:
- It's regular: Java::CamelCasePackage::ClassName.
- It's readable, and not nearly as painful to look at as my earlier JAVA::LANG::String approach.
- It can be used to open classes without having to pre-assign a constant, directly or via import/include_class.
- It won't break any tools that look at class/module names and expect a certain format.
- It won't break any tools (including a certain unpublished one, ahem) that start at Module.constants to discover classes/modules defined in the system (in preference to plowing through ObjectSpace).
- It doesn't require any special hackery to get module/class names to display.
- It makes for reasonably efficient references; the number of hops (method calls) required to get to a class does not increase with the number of segments in the package name.
- It works.
The downsides:
- It will be jarring to Java programmers new to (J)Ruby. (To which I say 'welcome to Ruby', and 'get over it')
- The Java:: prefix is a pain; I had hoped to allow its omission
- It won't work with non-standard (mixed- or upper-case) package names. (The other approaches have a similar problem.) I pondered this for a while a few weeks ago, and concluded: Do you really feel compelled to go out of your way to help that person? Sheesh, get a clue.
2. The Capitalized::Mod::Per::Package::Segment approach. This is where I actually started a while back, but quickly (if somewhat reluctantly) concluded it was unusable.
The merits:
- The biggest positive is that it appears at first blush to bridge the gap nicely between Java and Ruby; package names are segmented as in Java, but adhere to Ruby rules. Java programmers won't be altogether nonplussed.
- It shares many of the benefits of the current approach: it is readable (though less so as the number of package segments increases), can be used when opening classes, won't break any tools, and doesn't require any special hacks to implement.
The downsides:
- The big one, terminal as far as I'm concerned: There is no (good) way to disambiguate package and class names that are identical when rendered in Upper::First case. Consider the following very common pattern - I've seen variations on this virtually every place I've worked since there was a Java:
import java.awt.Color; // #=> Java::Awt::Color, a Class import java.awt.color.*; // #=> Java::Awt::Color, a Module import java.awt.color.ColorSpace; // #=> Java::Awt::Color::ColorSpace import java.awt.Event; import java.awt.event.*;
One suggestion was that we could just append constants for classes to classes that shared names with package nodes; for example, we would append class ColorSpace to class Java::Awt::Color. Aside from the fact that that's just sort of distasteful, unseemly really (and icky), consider the following:
include Java CS = Java::Awt::Color::ColorSpace
Ok, so does Java::Awt::Color now refer to a class, or a module? If you said class, then think about how we got there. We had to check whether class java.awt.Color existed before we decided. We also needed to check whether there was a class named java.Awt. And one in the default package named Java. (And maybe there wasn't a java.Awt today when we did our normal build, but what about tomorrow when we run ant test, or drop a new package in jruby/lib? Did we code anything that assumed that was a module? Hmm...).
- Not to mercilessly flog a moribund horse, but: the component packages/module nodes don't come free. There's a lookup operation to resolve each successive node name. Not terribly costly if used here and there, but you wouldn't want to use this syntax in code that was called frequently. As with the package.dot.Name syntax, the price goes up as the number of nodes increases. (I'm not sure if the interpreter/compiler optimizes around this at all.)
- Lastly: there's no compelling logical or functional reason to create a separate module for each node. Apart from aesthetic considerations (which are subjective in any case), there is no real benefit this approach would confer, were it workable, over the first approach.
3. The lower::case::colon::Approach. I think it would be useful to be able to do this; I use the package.dot.Name syntax all the time during ad-hoc console sessions, though never in real code due to performance issues. This would be about the same, with the added benefit that you could use it to open classes.
If that actually worked.
My experiments so far suggest that it will not work. I'll get to the details below, but first the pros and cons as I see them:
Pluses:
- Java-like and Java-looking syntax. Not quite as close as the package.dot.Name syntax, but nice if you can open classes with it. I wouldn't want to replace the dot syntax, though. (In fact, I really don't see this approach standing on its own; I'd see it as an adjunct to approach #1; reasons below.)
- Readable; though as with the Cap::Mod::Node approach, gets less readable as the number of nodes increases.
- Theory In Peril of Imminent Refutation: it can be used to open classes.
Downsides:
- If used exclusively, no real package/module names created in the Ruby space.
- Would require a hack to get it to appear in class/module name output (inspect, etc.) (I posted such a hack a while back to display the dot syntax package/class names.)
- Might break tools that examine the output of inspect, etc., if used for class/module name display.
- Suffers from the same efficiency problems as the dot syntax; you wouldn't want to use it in frequently-called code.
In spite of the downsides, I felt this would be a useful syntax to support, in addition to #1. So I set about trying to make it work. Below is some preliminary code to replace the current Package class. It solves one problem, which is that:
1. The class statement wants each node in the name to be a module. The current Package implementation returns an instance of Package.
module Pkg
class << self
keep = /^(__|class|inspect|object_id|to_s|equal\?|respond_to\?|dup|instance_|module_eval|const_|private|public)/
instance_methods.each do |m|
undef_method m unless m =~ keep
end
def package!(sym,parent)
pkg = self.dup
# real version will look at parent name
if (n = @name)
pkg.instance_variable_set(:@name,"#{n}.#{sym}")
else
pkg.instance_variable_set(:@name,"#{sym}")
end
parent.module_eval <<-EOM
def #{sym}; pkg; end
EOM
pkg
end
def method_missing(sym,*r)
if (s = sym.to_s[0,1]) == s.downcase
package! sym, self
else
JavaUtilities.get_proxy_class "#{@name}.#{sym}"
end
end
private :method_missing
def const_missing(sym)
p = JavaUtilities.get_proxy_class "#{@name}.#{sym}"
const_set(sym,p)
p
end
private :const_missing
end
end
So, let's try it out:
> S = Pkg::java::lang::String => Java::JavaLang::String > s = Pkg::java::lang::String.new 'xxx' => #<Java::JavaLang::String:0x161f39e @java_object=xxx> > s.instance_of?Pkg::java::lang::String => true > Java::JavaLang::String == Pkg::java::lang::String => true > java.lang.String == Pkg::java::lang::String => true
So far, so good. Let's open up a class:
> class Pkg::java::lang::String
> def boo
> 'boo!'
> end
> end
=> nil
No complaints, let's try it out:
> s.boo NoMethodError: undefined method `boo' for #<Java::JavaLang::String:0x161f39e @java_object=xxx> from (irb):1:in `method_missing' from (irb):1:in `binding'
Hmm...
> class Pkg::java::lang::String > puts self > end #<Module:01x178b64b>::String => nil > class Java::JavaLang::String > puts self > end Java::JavaLang::String => nil > class Java::JavaLang::String > def boo > 'boo!' > end > end => nil > s.boo => "boo!"
So, from this we've learned:
2. The class statement isn't fooled by our charade.
I don't know yet if there is a problem with my code, or if this just isn't possible. Play around with it if you have a few minutes, there might be an easy fix. It would be nice to be able to support this syntax. But it doesn't look promising right now.
-Bill
class Java::JavaLang::String def raw_bytes b = Java::byte[length].new # yeah, it's deprecated, sue me get_bytes(0,length-1,b,0) b end end
- It's regular: Java::CamelCasePackage::ClassName.
- It's readable, and not nearly as painful to look at as my earlier JAVA::LANG::String approach.
- It can be used to open classes without having to pre-assign a constant, directly or via import/include_class.
- It won't break any tools that look at class/module names and expect a certain format.
- It won't break any tools (including a certain unpublished one, ahem) that start at Module.constants to discover classes/modules defined in the system (in preference to plowing through ObjectSpace).
- It doesn't require any special hackery to get module/class names to display.
- It makes for reasonably efficient references; the number of hops (method calls) required to get to a class does not increase with the number of segments in the package name.
- It works.
- It will be jarring to Java programmers new to (J)Ruby. (To which I say 'welcome to Ruby', and 'get over it')
- The Java:: prefix is a pain; I had hoped to allow its omission
- It won't work with non-standard (mixed- or upper-case) package names. (The other approaches have a similar problem.) I pondered this for a while a few weeks ago, and concluded: Do you really feel compelled to go out of your way to help that person? Sheesh, get a clue.
- The biggest positive is that it appears at first blush to bridge the gap nicely between Java and Ruby; package names are segmented as in Java, but adhere to Ruby rules. Java programmers won't be altogether nonplussed.
- It shares many of the benefits of the current approach: it is readable (though less so as the number of package segments increases), can be used when opening classes, won't break any tools, and doesn't require any special hacks to implement.
- The big one, terminal as far as I'm concerned: There is no (good) way to disambiguate package and class names that are identical when rendered in Upper::First case. Consider the following very common pattern - I've seen variations on this virtually every place I've worked since there was a Java:
One suggestion was that we could just append constants for classes to classes that shared names with package nodes; for example, we would append class ColorSpace to class Java::Awt::Color. Aside from the fact that that's just sort of distasteful, unseemly really (and icky), consider the following:
import java.awt.Color; // #=> Java::Awt::Color, a Class import java.awt.color.*; // #=> Java::Awt::Color, a Module import java.awt.color.ColorSpace; // #=> Java::Awt::Color::ColorSpace import java.awt.Event; import java.awt.event.*;
Ok, so does Java::Awt::Color now refer to a class, or a module? If you said class, then think about how we got there. We had to check whether class java.awt.Color existed before we decided. We also needed to check whether there was a class named java.Awt. And one in the default package named Java. (And maybe there wasn't a java.Awt today when we did our normal build, but what about tomorrow when we run ant test, or drop a new package in jruby/lib? Did we code anything that assumed that was a module? Hmm...).include Java CS = Java::Awt::Color::ColorSpace
- Not to mercilessly flog a moribund horse, but: the component packages/module nodes don't come free. There's a lookup operation to resolve each successive node name. Not terribly costly if used here and there, but you wouldn't want to use this syntax in code that was called frequently. As with the package.dot.Name syntax, the price goes up as the number of nodes increases. (I'm not sure if the interpreter/compiler optimizes around this at all.)
- Lastly: there's no compelling logical or functional reason to create a separate module for each node. Apart from aesthetic considerations (which are subjective in any case), there is no real benefit this approach would confer, were it workable, over the first approach.
- Java-like and Java-looking syntax. Not quite as close as the package.dot.Name syntax, but nice if you can open classes with it. I wouldn't want to replace the dot syntax, though. (In fact, I really don't see this approach standing on its own; I'd see it as an adjunct to approach #1; reasons below.)
- Readable; though as with the Cap::Mod::Node approach, gets less readable as the number of nodes increases.
- Theory In Peril of Imminent Refutation: it can be used to open classes.
- If used exclusively, no real package/module names created in the Ruby space.
- Would require a hack to get it to appear in class/module name output (inspect, etc.) (I posted such a hack a while back to display the dot syntax package/class names.)
- Might break tools that examine the output of inspect, etc., if used for class/module name display.
- Suffers from the same efficiency problems as the dot syntax; you wouldn't want to use it in frequently-called code.
module Pkg
class << self
keep = /^(__|class|inspect|object_id|to_s|equal\?|respond_to\?|dup|instance_|module_eval|const_|private|public)/
instance_methods.each do |m|
undef_method m unless m =~ keep
end
def package!(sym,parent)
pkg = self.dup
# real version will look at parent name
if (n = @name)
pkg.instance_variable_set(:@name,"#{n}.#{sym}")
else
pkg.instance_variable_set(:@name,"#{sym}")
end
parent.module_eval <<-EOM
def #{sym}; pkg; end
EOM
pkg
end
def method_missing(sym,*r)
if (s = sym.to_s[0,1]) == s.downcase
package! sym, self
else
JavaUtilities.get_proxy_class "#{@name}.#{sym}"
end
end
private :method_missing
def const_missing(sym)
p = JavaUtilities.get_proxy_class "#{@name}.#{sym}"
const_set(sym,p)
p
end
private :const_missing
end
end
> S = Pkg::java::lang::String => Java::JavaLang::String > s = Pkg::java::lang::String.new 'xxx' => #<Java::JavaLang::String:0x161f39e @java_object=xxx> > s.instance_of?Pkg::java::lang::String => true > Java::JavaLang::String == Pkg::java::lang::String => true > java.lang.String == Pkg::java::lang::String => true
> class Pkg::java::lang::String
> def boo
> 'boo!'
> end
> end
=> nil
> s.boo NoMethodError: undefined method `boo' for #<Java::JavaLang::String:0x161f39e @java_object=xxx> from (irb):1:in `method_missing' from (irb):1:in `binding'
> class Pkg::java::lang::String > puts self > end #<Module:01x178b64b>::String => nil > class Java::JavaLang::String > puts self > end Java::JavaLang::String => nil > class Java::JavaLang::String > def boo > 'boo!' > end > end => nil > s.boo => "boo!"
I think the issue is that we return an anonymous module for package that's not the actual module where we want to find the class. See this IRB session, playing around with modules, classes, and the lower-cased colon syntax:
~/NetBeansProjects/jruby $ jirb --simple-prompt >> module Foo >> def self.bar; Bar; end >> module Bar >> def self.hello; puts 'hello'; end >> end >> end => nil >> def foo; Foo; end => nil >> foo::bar::hello hello => nil >> class foo::bar::Baz; def boo; puts 'boo'; end; end => nil >> foo::bar::Baz.new.boo boo => nil >> Foo::Bar::Baz.new.boo boo => nil >> class foo::bar::Baz; puts self; end Foo::Bar::Baz => nil
I'm sure this is valid, but we need the "package methods" to return the same thing we'd get using normal colon scoping. That seems to be the missing piece.
~/NetBeansProjects/jruby $ jirb --simple-prompt >> module Foo >> def self.bar; Bar; end >> module Bar >> def self.hello; puts 'hello'; end >> end >> end => nil >> def foo; Foo; end => nil >> foo::bar::hello hello => nil >> class foo::bar::Baz; def boo; puts 'boo'; end; end => nil >> foo::bar::Baz.new.boo boo => nil >> Foo::Bar::Baz.new.boo boo => nil >> class foo::bar::Baz; puts self; end Foo::Bar::Baz => nil
Ah, I think I may know how to make this work. Will experiment and get back shortly...
Hit some little issues in testing, will resolve this evening. Looks like I missed the RC2 cutoff anyway
...
The attached patch implements the lower::case::colon::package::Syntax. The dotted.package.Syntax and Java::CamelCasePackage::Syntax have been retained (the latter being part of the solution to implementing the new syntax). Tests included.
The attached patch (package_module_syntax_2.patch) supersedes the previous patch. It includes the functionality of that patch, plus:
1. Changed undef logic to use HashSet instead of Pattern/Matcher to select method names. Should be faster.
2. Miscellaneous cleanup and some minor performance tweaks.
Ughhh. Serious problem.
Neither the new lower::colon::Syntax nor the existing Java::CamelCase::Syntax works for opening classes unless the class constants have already been set on the package module by some other means (just referencing a class will do it). (In fact, trying to open a class that way before it is referenced otherwise results in the wrong class – a new class – being bound to the constant.) I don't know why this didn't turn up before; I was sure I'd opened classes that way immediately upon starting a JIRB session, but maybe I hadn't.
What happens is that const_missing is never called when the class is referenced in a class statement. I've been trying to track down how it actually works, but trying to do System.out.println from inside certain promising methods (RubyModule#getConstantInner or #setConstant, for example) results in this error during the build:
generate-method-classes:
[touch] Creating C:\jruby\jruby-1.0.0\build\__empty.rb
[java] Exception in thread "main" java.lang.NullPointerException
[java] at org.jruby.RubyClass.callMethod(RubyClass.java:149)
[java] at org.jruby.RubyObject.callMethod(RubyObject.java:473)
[java] at org.jruby.RubyObject.toString(RubyObject.java:230)
[java] at java.lang.String.valueOf(String.java:2827)
[java] at java.lang.StringBuffer.append(StringBuffer.java:219)
[java] at org.jruby.RubyModule.setConstant(RubyModule.java:409)
[java] at org.jruby.Ruby.initCoreClasses(Ruby.java:700)
[java] at org.jruby.Ruby.init(Ruby.java:611)
[java] at org.jruby.Ruby.newInstance(Ruby.java:246)
[java] at org.jruby.Main.runInterpreter(Main.java:169)
[java] at org.jruby.Main.run(Main.java:120)
[java] at org.jruby.Main.main(Main.java:95)
[java] Java Result: 1
[delete] Deleting: C:\jruby\jruby-1.0.0\build\__empty.rb
Anyway, this appears to be correct (or at least conformant) behavior; MRI doesn't call const_missing either from a class statement. The package.dot.Name syntax continues to work (even after the wrong class has been bound to a constant) because it relies on method_missing.
I still think this can be made to work, if in a slightly hacky way, by hooking the constant lookup from the Java side, in somewhat the way LoadService does it. This could be done pretty cleanly by defining a ConstantRequestListener interface and registering listeners with RubyModule for package modules, so there wouldn't be any direct references in core code to Java support code.
I'll trace around and see if I can figure out where/how the constants get read and set. If anyone has any ideas about how I can work around the problem above, let me know (though I can always just log to a file).
BTW, try hooking RubyModule#getConstantAt sometime (which does work for some reason, but is apparently not called by the class statement) and check out the thousands of calls during JRuby/JIRB startup, mostly for the same classes/modules over and over. Definitely an area where some efficiency can be gained.
Any ides, suggestions, moral support appreciated. Ugghh.
-Bill
generate-method-classes:
[touch] Creating C:\jruby\jruby-1.0.0\build\__empty.rb
[java] Exception in thread "main" java.lang.NullPointerException
[java] at org.jruby.RubyClass.callMethod(RubyClass.java:149)
[java] at org.jruby.RubyObject.callMethod(RubyObject.java:473)
[java] at org.jruby.RubyObject.toString(RubyObject.java:230)
[java] at java.lang.String.valueOf(String.java:2827)
[java] at java.lang.StringBuffer.append(StringBuffer.java:219)
[java] at org.jruby.RubyModule.setConstant(RubyModule.java:409)
[java] at org.jruby.Ruby.initCoreClasses(Ruby.java:700)
[java] at org.jruby.Ruby.init(Ruby.java:611)
[java] at org.jruby.Ruby.newInstance(Ruby.java:246)
[java] at org.jruby.Main.runInterpreter(Main.java:169)
[java] at org.jruby.Main.run(Main.java:120)
[java] at org.jruby.Main.main(Main.java:95)
[java] Java Result: 1
[delete] Deleting: C:\jruby\jruby-1.0.0\build\__empty.rb
Ok, it comes down to method RubyModule#defineOrGetClassUnder. I can pretty easily make this work, in a somewhat hacky way (it won't be really bad, though; pretty clean as hacks go).
I vote we do it, and look at less-hacky solutions after 1.0 (when I think much of Java support may change).
-Bill
Ok, made it work. The attached patch (package_module_syntax_3.patch) supersedes the previous patches. It includes the functionality of those patches, and adds:
1. Um, actually working. As noted earlier, the previous implementations did not work if a class was opened before it was otherwise referenced. That applied to the CamelCase syntax as well as the new lower::colon syntax (hmm, sounds like an intestinal issue).
Getting this to work required what could be considered a bit of a hack, though as with my earlier change to RubyClass to suppress calls to #inherited when installing classes into the Rubified Java class hierarchy, I think it is A Reasonable Thing To Do in the service of mapping an existing Java class hierarchy into Ruby space. (And it pales in comparison to some notions I have about the future of Java support, in which, among other things, a very different JavaClass is derived from RubyClass; but that's a discussion for another day.)
The fix entails defining an interface called ClassProvider, and adding some methods to RubyModule to add and search ClassProviders. I probably went a little overboard with the implementation, as it provides for multiple ClassProviders, when right now there will be, at most, one. Still, it isn't much code, and very straightforward. It doesn't add any significant processing overhead to RubyModule (in fact, I actually fixed a small inefficiency while I was there).
I did define a field in RubyModule to hold a List of ClassProviders; I'm not sure whether that might be an issue, or not. If it is, there are some other (less efficient) ways to link package modules with ClassProviders (a ClassProvider lookup/Map in Ruby.java, for example). We'd want to avoid direct references from RubyModule to Java support code.
Oh, and ClassProvider is currently defined in org.jruby.util; there might be a better spot for it. Or not.
Anyhow, it seems to work now, passes all tests and so on. (There are some new tests included in the patch, but there should probably be a few more.)
As far as I know, I still don't have commit rights, so someone else will have to apply this. (I submitted my request a while back, I think. The Codehaus/Xircles site isn't big on those sissy amenities like, you know, visual feedback, or navigability, or, heaven forbid, user-friendliness. I actually gave up on getting on the user/dev mailing lists at one point last fall, though I think that might have actually been broken at the time. Hard to tell.)
-Bill
Well, great work Bill...we'll have a look at this and get it in. I would like others to have a look as well. It certainly fixes the issue with opening a class with the longhand syntax, and it gives reasonable access to classes through a package-module mechanism.
Also see the discussion from this comment down: http://jira.codehaus.org/browse/JRUBY-903#action_94746