Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Major
-
Resolution: Incomplete
-
Affects Version/s: JRuby 1.6.6
-
Fix Version/s: JRuby 1.7.0.pre1
-
Component/s: None
-
Labels: None
-
Environment: Mac OS X Lion.
java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)
JRuby 1.6.5 / 1.6.6
-
Testcase included: yes
-
Number of attachments :
Description
While running the tests for the Ruby library 'mustache' (https://github.com/defunkt/mustache), one test in particular fails:
https://github.com/defunkt/mustache/blob/master/test/mustache_test.rb#L510-522
JRuby dies calling StringScanner#scan_until here:
https://github.com/defunkt/mustache/blob/master/lib/mustache/parser.rb#L231
You can reproduce the issue with the following:
require 'strscan'
regex = /(^[ \t]*)?\{\{/
text = "<h1>中文 {{test}}</h1>\n\n{{> utf8_partial}}\n"
text.force_encoding 'BINARY'
scanner = StringScanner.new(text)
scanner.scan_until(regex) # Fans spin up, and this method never returns.
This seems to happen whether JRuby is in 1.8 or 1.9 mode. I am running the test like so:
JRUBY_OPTS=--1.9 ruby -I"lib:test" test/mustache_test.rb -n test_utf8 -v
I've also run it with: JRUBY_OPTS="--1.9 LC_ALL=en_US.UTF-8"
It appears that this is triggered by multibyte UTF-8 characters. If I replace the Chinese characters with "foo bar", there is no problem.
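To make the multibyte observation concrete, here is a small sketch of how the sample text is laid out in bytes. It assumes a shortened copy of the test string (the trailing partial line is omitted), and `first_high` is just an illustrative name. Each Chinese character occupies three bytes in UTF-8, and the first byte above 0x7F sits at offset 4, directly after the `<h1>` tag.

```ruby
# Shortened copy of the test string; \u escapes keep this file ASCII-only.
text = "<h1>\u4E2D\u6587 {{test}}</h1>"

# Same bytes, relabelled as binary, as the reproduction does.
binary = text.dup.force_encoding('BINARY')

# Offset of the first byte that is not plain ASCII.
first_high = binary.bytes.to_a.index { |b| b >= 0x80 }

puts "characters: #{text.length}"
puts "bytes:      #{binary.length}"
puts "first non-ASCII byte at offset #{first_high}"
```

Offset 4 is also where the positions in the log output start to bounce back and forth.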
The bug appears to live in JRuby's Joni library:
https://github.com/jruby/joni/blob/master/src/org/joni/Matcher.java#L460-464
In short, successive calls to `enc.length(bytes, s, end)` alternate between 1 and -1, so `s` moves forward one byte and then back one byte, and the loop never terminates. To see this, I added some logging to the loop:
Config.log.println("entering loop...");
do {
    Config.log.println("start: s='" + s + "', prev='" + prev + "'");
    if (matchCheck(origRange, s, prev)) return match(s);
    prev = s;
    s += enc.length(bytes, s, end);
    // Note: enc.length is logged at the already-advanced s, i.e. the step the next iteration will take.
    Config.log.println("end: enc.length='" + enc.length(bytes, s, end) + "', s='" + s + "', prev='" + prev + "'");
} while (s < range);
The output is basically:
entering loop...
start: s='0', prev='0'
end: enc.length='1', s='1', prev='0'
start: s='1', prev='0'
end: enc.length='1', s='2', prev='1'
start: s='2', prev='1'
end: enc.length='1', s='3', prev='2'
start: s='3', prev='2'
end: enc.length='-1', s='4', prev='3'
start: s='4', prev='3'
end: enc.length='1', s='3', prev='4'
start: s='3', prev='4'
end: enc.length='-1', s='4', prev='3'
start: s='4', prev='3'
end: enc.length='1', s='3', prev='4'
start: s='3', prev='4'
end: enc.length='-1', s='4', prev='3'
start: s='4', prev='3'
end: enc.length='1', s='3', prev='4'
start: s='3', prev='4'
end: enc.length='-1', s='4', prev='3'
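The oscillation in the log can be reproduced outside JRuby with a small model of the loop. This is only a sketch: `char_length` is a hypothetical stand-in for `enc.length` that is assumed to return -1 for any byte above 0x7F, and `guarded_step` is one possible way to force forward progress, not necessarily how Joni handles it.

```ruby
# Hypothetical stand-in for enc.length(bytes, s, end): 1 for an ASCII byte,
# -1 for any byte above 0x7F (an assumption based on the log, not Joni's code).
def char_length(bytes, pos)
  bytes[pos] < 0x80 ? 1 : -1
end

# One possible guard (hypothetical): never step by less than one byte.
def guarded_step(bytes, pos)
  [char_length(bytes, pos), 1].max
end

# Models the do/while loop, recording each value of s. The iteration cap
# exists only so that the model itself cannot hang.
def trace_positions(bytes, range, max_iterations, step)
  s = 0
  visited = []
  max_iterations.times do
    visited << s
    s += send(step, bytes, s)
    break unless s < range
  end
  visited
end

bytes = "<h1>\u4E2D\u6587 {{test}}</h1>".bytes.to_a
unguarded = trace_positions(bytes, bytes.length, 12, :char_length)
guarded   = trace_positions(bytes, bytes.length, 100, :guarded_step)

puts unguarded.inspect
puts guarded.last
```

With the unguarded step, the recorded positions climb to 4 and then alternate between 3 and 4 until the cap is hit, matching the log. With the guard, the model walks every byte offset once and exits.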