Key: JRUBY-4037
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Major
Assignee: Charles Oliver Nutter
Reporter: Kornelius Kalnbach
Votes: 0
Watchers: 0
JRuby

Ruby 1.8 compatibility: /#{/\w/}/uo does not work as expected

Created: 03/Oct/09 07:56 PM   Updated: 16/Oct/09 03:39 AM   Resolved: 15/Oct/09 12:28 PM
Component/s: Core Classes/Modules
Affects Version/s: JRuby 1.3.1
Fix Version/s: JRuby 1.4

Time Tracking:
Not Specified

Environment: Mac OS X


Description

JRuby gives me different results than Ruby 1.8.6 (and 1.8.7) here:

~/ruby/coderay ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
~/ruby/coderay jruby -e 'p "ä"[/#{/\w/}/uo]'
nil
~/ruby/coderay jruby -v
jruby 1.3.1 (ruby 1.8.6p287) (2009-09-08 6586) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]

The problem seems to be that /o somehow treats the embedded regexp differently.
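
For readers unfamiliar with the 1.8 output format: "\303\244" is the octal-escaped form of the two UTF-8 bytes of "ä", so MRI is returning the whole character where JRuby returns no match. A minimal sketch in present-day Ruby that reproduces the notation (the helper name octal_escape is made up for illustration and is not part of any library):

```ruby
# Hypothetical helper: render each byte of a string as a backslash-octal
# escape, similar to how Ruby 1.8's String#inspect showed non-ASCII bytes.
def octal_escape(str)
  str.bytes.map { |byte| format("\\%03o", byte) }.join
end

puts octal_escape("\u00E4")  # "ä" as UTF-8; prints \303\244
```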



Charles Oliver Nutter made changes - 14/Oct/09 10:37 AM
Field: Description
Change: the console transcript was wrapped in {noformat} markers; the wording is otherwise unchanged.
Charles Oliver Nutter added a comment - 14/Oct/09 10:38 AM

Confirmed in both compiler and interpreter:

~/projects/jruby ➔ jruby -e 'p "ä"[/#{/\w/}/uo]'
nil

~/projects/jruby ➔ jruby -X-C -e 'p "ä"[/#{/\w/}/uo]'
nil

Charles Oliver Nutter added a comment - 14/Oct/09 10:40 AM

My rough guess would be that we're not honoring /u properly, since international characters on the command line should be getting in OK now. Start with RubyString.op_aref, follow the RubyRegexp path, and see what happens.


James Abley added a comment - 14/Oct/09 03:14 PM - edited
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.6
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.7
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.9.1
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"ä"

I've tried debugging, and I'm not sure that the text is coming through fine. The ByteList passed to the RubyRegexp.search() method seems to have an Encoding of ASCIIEncoding?


James Abley added a comment - 14/Oct/09 03:57 PM - edited

Thanks to lopex for the instruction.

$ jruby -e 'p "ä"[/#{/\w/}/u]'
"\303\244"

And trying 1.9 for completeness:

$ jruby --1.9 -e 'p "ä"[/#{/\w/}/u]'
-e:1: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) (Encoding::CompatibilityError)

So, at the very least, the 'once' modifier isn't being handled properly.
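
As background on what the 'once' modifier is supposed to do, here is a small sketch in present-day Ruby. It only illustrates the caching semantics of /o and says nothing about the encoding bug itself; the helper name cached_pattern is hypothetical.

```ruby
# Hypothetical helper: with /o, the #{...} interpolation is evaluated only
# the first time this regexp literal is reached; later calls reuse the
# already-compiled Regexp object.
def cached_pattern(fragment)
  /#{fragment}/o
end

first  = cached_pattern("a")
second = cached_pattern("b")  # "b" is ignored, the cached regexp is returned

puts first.source           # prints a
puts second.source          # prints a
puts first.equal?(second)   # prints true
```

The modifier should change only when interpolation happens, so a match that succeeds without /o would be expected to succeed with it too.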


Charles Oliver Nutter added a comment - 14/Oct/09 11:51 PM

I've committed a fix to master in ea77572. The problem was that when the "once" flag was set in the options, it screwed up the masking that determined the correct encoding to use for the regexp. Masking out the "once" bit solved it, and I did a little perf cleanup in the bytecode as well.
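
The kind of masking problem described above can be sketched as follows. This is not the code from ea77572: the constant values and method names are assumptions chosen for illustration, and the real JRuby implementation is in Java.

```ruby
# Hypothetical option bits, for illustration only.
OPT_ONCE   = 0x80
KCODE_NONE = 0x10
KCODE_UTF8 = 0x40

# Naive lookup: compares the whole options value against the encoding
# constants, so any extra bit (such as "once") makes the lookup miss.
def encoding_unmasked(options)
  case options
  when KCODE_UTF8 then :utf8
  when KCODE_NONE then :none
  else :default
  end
end

# Same lookup, but with the "once" bit cleared first.
def encoding_masked(options)
  encoding_unmasked(options & ~OPT_ONCE)
end

flags = KCODE_UTF8 | OPT_ONCE   # the equivalent of /.../uo
puts encoding_unmasked(flags)   # prints default
puts encoding_masked(flags)     # prints utf8
```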

Anyone want to write up a quick spec for this? Comparing a "once" + "UTF-8" dynamic regexp's inspect output with a "UTF-8"-only dynamic regexp ought to do it.
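
A rough sketch of such a check, written as plain Ruby rather than in RubySpec syntax. It is not the spec that was eventually committed, and the helper names are made up. Because inspect output on current Ruby versions does not necessarily show the encoding flag, the sketch also compares Regexp#encoding, which does not exist in Ruby 1.8.

```ruby
# Hypothetical helpers: the same dynamic regexp, with and without "once".
def utf8_once_regexp
  /#{/\w/}/uo
end

def utf8_regexp
  /#{/\w/}/u
end

once  = utf8_once_regexp
plain = utf8_regexp

# If "once" leaks into the encoding selection, these comparisons differ.
puts once.inspect == plain.inspect
puts once.encoding == plain.encoding
```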


Charles Oliver Nutter made changes - 15/Oct/09 12:01 PM
Fix Version/s JRuby 1.5 [ 15792 ]
Charles Oliver Nutter added a comment - 15/Oct/09 12:05 PM

Merged to 1.4 branch as well, since it's a trivial fix.


Charles Oliver Nutter made changes - 15/Oct/09 12:05 PM
Fix Version/s JRuby 1.5 [ 15792 ]
Fix Version/s JRuby 1.4 [ 15285 ]
Charles Oliver Nutter added a comment - 15/Oct/09 12:28 PM

Specs added and revision marker updated in b6a22bd.


Charles Oliver Nutter made changes - 15/Oct/09 12:28 PM
Resolution Fixed [ 1 ]
Assignee Charles Oliver Nutter [ headius ]
Status Open [ 1 ] Resolved [ 5 ]
James Abley added a comment - 16/Oct/09 01:59 AM - edited

I've added specs as well: http://github.com/jabley/rubyspec/commit/0adaa2a3b4968cd7d0bafd9df1c5f948ac7ea79f

Did I misunderstand?

UPDATE: never mind, your specs are a lot better!