Charles Oliver Nutter made changes - 14/Oct/09 10:37 AM
My rough guess would be that we're not honoring /u properly, since international characters on the command line should be getting in ok now. Start with RubyString.op_aref, along the RubyRegexp path, and see what happens.
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.6
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.8.7
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"\303\244"
jabley@miq-jabley:~/work/eclipse/git/jruby$ rvm use 1.9.1
jabley@miq-jabley:~/work/eclipse/git/jruby$ ruby -e 'p "ä"[/#{/\w/}/uo]'
"ä"
I've tried debugging. I'm not sure that the text is coming through fine. The ByteList getting passed to the RubyRegexp.search() method has an Encoding of ASCIIEncoding?
Thanks to lopex for instruction.
$ jruby -e 'p "ä"[/#{/\w/}/u]'
"\303\244"
And trying 1.9 for completeness:
$ jruby --1.9 -e 'p "ä"[/#{/\w/}/u]'
-e:1: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string) (Encoding::CompatibilityError)
So, at the very least, the 'once' modifier isn't being handled properly.
I've committed a fix to master in ea77572. The problem was that when the "once" flag was set in the options, it screwed up the masking that determined the correct encoding to use for the regexp. Masking out the "once" bit solved it, and I did a little perf cleanup in the bytecode as well.
Anyone want to write up a quick spec for this? Comparing a "once" + "UTF-8" dynamic regexp's inspect output with a "UTF-8"-only dynamic regexp ought to do it.
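To illustrate how an extra "once" bit can throw off an encoding lookup, here is a rough sketch. The flag values and method names are hypothetical and chosen only for illustration; they are not JRuby's actual constants or code, and the real fix in ea77572 may look quite different.

```ruby
# Hypothetical option bits, for illustration only (not JRuby's real values).
UTF8 = 0x40
ONCE = 0x80

# Picks an encoding by comparing the options against the UTF8 flag.
# Any extra bit, such as ONCE, makes the comparison fail.
def encoding_for(options)
  case options
  when UTF8 then :utf8
  else :none
  end
end

# Same lookup, but with the ONCE bit cleared first.
def encoding_for_masked(options)
  encoding_for(options & ~ONCE)
end

unmasked = encoding_for(UTF8 | ONCE)
masked   = encoding_for_masked(UTF8 | ONCE)
puts "unmasked lookup: #{unmasked.inspect}"
puts "masked lookup:   #{masked.inspect}"
```

With the "once" bit left in, the lookup in this sketch falls through to the default; clearing the bit first lets the UTF-8 flag be recognized.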
Charles Oliver Nutter made changes - 15/Oct/09 12:01 PM
Merged to 1.4 branch as well, since it's a trivial fix.
Charles Oliver Nutter made changes - 15/Oct/09 12:05 PM
Specs added and revision marker updated in b6a22bd.
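For reference, the kind of check suggested earlier in this thread (comparing inspect output for /uo and /u dynamic regexps) could be sketched roughly as below. This is not the spec committed in b6a22bd; the helper name is made up for illustration, and it uses plain Ruby rather than the RubySpec matchers.

```ruby
# -*- coding: utf-8 -*-

# Hypothetical helper (not part of RubySpec): builds a dynamic regexp with
# /uo and one with /u only, and returns both inspect strings.
def dynamic_regexp_inspects
  with_once = /#{/\w/}/uo
  utf8_only = /#{/\w/}/u
  [with_once.inspect, utf8_only.inspect]
end

once_inspect, plain_inspect = dynamic_regexp_inspects
puts "with /uo: #{once_inspect}"
puts "with /u:  #{plain_inspect}"

# The "once" flag should not change how the regexp reports itself.
if once_inspect == plain_inspect
  puts "inspect output matches"
else
  puts "inspect output differs"
end
```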
Charles Oliver Nutter made changes - 15/Oct/09 12:28 PM
I've added specs as well: http://github.com/jabley/rubyspec/commit/0adaa2a3b4968cd7d0bafd9df1c5f948ac7ea79f
Did I misunderstand?
UPDATE: never mind, your specs are a lot better!
Confirmed in both compiler and interpreter:
~/projects/jruby ➔ jruby -e 'p "ä"[/#{/\w/}/uo]'
nil
~/projects/jruby ➔ jruby -X-C -e 'p "ä"[/#{/\w/}/uo]'
nil