Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: JRuby 0.9.9, JRuby 1.0.0RC1, JRuby 1.0.0RC2
Fix Version/s: JRuby 1.0.0RC3
Component/s: Core Classes/Modules
Labels: None
Description
Regular expressions in UTF-8 (created like /.../u) don't work properly.
A test rb file is attached because I'm not sure what this form would do to the text, which is in UTF-8.
MRI works as expected.
Issue Links
This issue is related to:
| JRUBY-820 | Net::HTTP.get behaves differently form MRI, failing to get UTF8 properly |
| JRUBY-1007 | Not printing \XXX characters properly |
| JRUBY-1008 | Rails pages that include \000 characters are including headers and chopping off the end of the page |
Here's a first attempt at a patch...unfortunately it exposes an interesting problem.
The first line prints out "2", which is actually correct...if you're working with characters instead of bytes. Unfortunately, we want the byte position here, not the character position. Obviously passing the raw bytes to jregex isn't an option...it doesn't accept bytes, and it doesn't appear to like us directly casting the bytes into individual characters.
The second line prints out "1", but I don't have a good explanation for that one.
So this will require a bit of thought.
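To illustrate the character-versus-byte distinction described above, here is a minimal Ruby sketch. It is not the attached test file or the patch: the sample string and the `char_to_byte_offset` helper are hypothetical, and it assumes a current Ruby in which strings are encoding-aware and `=~` returns a character offset. It shows one way a character position could be mapped back to a UTF-8 byte position.

```ruby
# Hypothetical helper (not part of the patch): convert a character offset
# into the corresponding UTF-8 byte offset by measuring the prefix in bytes.
def char_to_byte_offset(str, char_offset)
  str[0, char_offset].bytesize
end

# Sample data, assumed for illustration: two 2-byte characters, then ASCII "x".
text = "\u00e9\u00e9x"

char_pos = text =~ /x/u                        # match position counted in characters
byte_pos = char_to_byte_offset(text, char_pos) # same position counted in bytes

puts char_pos  # 2
puts byte_pos  # 4
```

The two numbers differ because each "é" occupies two bytes in UTF-8. A matcher that works on characters would report 2, while byte-oriented code would expect 4.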