JRuby

Ruby 1.8 compatibility: /#{/\w/}/uo does not work as expected

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: JRuby 1.3.1
  • Fix Version/s: JRuby 1.4
  • Component/s: Core Classes/Modules
  • Labels: None
  • Environment: Mac OS X
  • Number of attachments: 0

Description

JRuby gives me different results than Ruby 1.8.6 (and 1.8.7) here:

~/ruby/coderay ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
~/ruby/coderay jruby -e 'p "ä"[/#{/\w/}/uo]'
nil
~/ruby/coderay jruby -v
jruby 1.3.1 (ruby 1.8.6p287) (2009-09-08 6586) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_15) [x86_64-java]

The problem seems to be that /o somehow treats the embedded regexp differently.

Activity

Charles Oliver Nutter added a comment -

Confirmed in both compiler and interpreter:

~/projects/jruby ➔ jruby -e 'p "ä"[/#{/\w/}/uo]'
nil

~/projects/jruby ➔ jruby -X-C -e 'p "ä"[/#{/\w/}/uo]'
nil
Charles Oliver Nutter added a comment -

My rough guess would be that we're not honoring /u properly, since international characters on the command line should be getting in ok now. Start with RubyString.op_aref, along the RubyRegexp path, and see what happens.

James Abley added a comment - edited

jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.6
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.7
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.9.1
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"ä"

I've tried debugging, and I'm not sure the text is coming through intact. The ByteList passed to the RubyRegexp.search() method appears to have an Encoding of ASCIIEncoding?

James Abley added a comment - edited

Thanks to lopex for the instruction.

$ jruby -e 'p "ä"[/#{/\w/}/u]'
"\303\244"

And trying 1.9 for completeness

$ jruby --1.9 -e 'p "ä"[/#{/\w/}/u]'
-e:1: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) (Encoding::CompatibilityError)

So, at the very least, the 'once' modifier isn't being handled properly.
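
For readers unfamiliar with the 'once' modifier: /o tells Ruby to perform the interpolation in a regexp literal only the first time that literal is evaluated, and to reuse the resulting regexp afterwards. A minimal sketch of that behaviour follows; the method names are made up for illustration and do not come from JRuby or from this report.

```ruby
# Hypothetical helper: the literal carries /o, so the interpolation is
# evaluated on the first call only and the same regexp is reused later.
def once_regexp(source)
  /#{source}/o
end

# Hypothetical helper: without /o the interpolation runs on every call.
def plain_regexp(source)
  /#{source}/
end

first  = once_regexp("a")
second = once_regexp("b")   # still built from "a", because of /o
third  = plain_regexp("b")  # built from "b"

puts first.source
puts second.source
puts third.source
```

The modifier only changes when interpolation happens, so it should have no effect on which encoding the resulting regexp uses.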

Charles Oliver Nutter added a comment -

I've committed a fix to master in ea77572. The problem was that when the "once" flag was set in the options, it screwed up the masking that determined the correct encoding to use for the regexp. Masking out the "once" bit solved it, and I did a little perf cleanup in the bytecode as well.
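
The kind of masking bug described above can be sketched roughly as follows. This is an illustration only: the constant names, the bit values, and the lookup table are assumptions modelled loosely on Ruby 1.8-style regexp option bits, and none of it is taken from JRuby's source or from commit ea77572.

```ruby
# Hypothetical flag layout (illustrative values, not JRuby's actual ones).
OPT_ONCE   = 0x80   # the "once" (/o) bit
KCODE_NONE = 0x10   # /n
KCODE_UTF8 = 0x40   # /u
LOW_OPTS   = 0x0f   # ignorecase / extended / multiline bits

# Hypothetical mapping from the encoding bits to an encoding name.
ENCODINGS = { KCODE_NONE => :ascii, KCODE_UTF8 => :utf8 }

# Buggy variant: only the low option bits are stripped, so a set "once"
# bit leaks into the lookup key and no encoding entry is found.
def encoding_buggy(options)
  ENCODINGS.fetch(options & ~LOW_OPTS, :default)
end

# Fixed variant: the "once" bit is masked out before the lookup as well.
def encoding_fixed(options)
  ENCODINGS.fetch(options & ~LOW_OPTS & ~OPT_ONCE, :default)
end

p encoding_buggy(KCODE_UTF8 | OPT_ONCE)
p encoding_fixed(KCODE_UTF8 | OPT_ONCE)
```

Under these assumed values, /uo falls through to the default encoding in the buggy variant, which would be consistent with /u working while /uo returned nil.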

Anyone want to write up a quick spec for this? Comparing a "once" + "UTF-8" dynamic regexp's inspect output with a "UTF-8"-only dynamic regexp ought to do it.
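
A check along those lines might look like the sketch below. It is written in plain Ruby rather than in any spec framework, the method name is invented for illustration, and it is not the spec that was eventually committed.

```ruby
# Hypothetical helper: builds a "once" + UTF-8 dynamic regexp and a
# UTF-8-only dynamic regexp from the same inner pattern, then compares
# their inspect output. The two should not differ because of /o.
def once_matches_utf8_only?
  inner     = /\w/
  with_once = /#{inner}/uo
  utf8_only = /#{inner}/u
  with_once.inspect == utf8_only.inspect
end

if once_matches_utf8_only?
  puts "ok: /uo and /u dynamic regexps inspect identically"
else
  puts "mismatch: /o changed the regexp's inspect output"
end
```

On Ruby 1.8 the inspect output includes the encoding flag, so a lost /u would show up as a difference; on later Ruby versions the comparison may be less sensitive.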

Charles Oliver Nutter added a comment -

Merged to 1.4 branch as well, since it's a trivial fix.

Charles Oliver Nutter added a comment -

Specs added and revision marker updated in b6a22bd.

James Abley added a comment - edited

I've added specs as well: http://github.com/jabley/rubyspec/commit/0adaa2a3b4968cd7d0bafd9df1c5f948ac7ea79f

Did I misunderstand?

UPDATE: never mind, your specs are a lot better!

