Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: JRuby 1.7.0
Component/s: None
Labels: None
Description
This was raised as a comment after we closed https://github.com/jruby/jruby/issues/314.
irb(main):022:0> s = "\u{00F6}"
"\u00F6"
irb(main):023:0> yml = YAML.dump s
"--- \xF6\n"
irb(main):024:0> yml.encoding
#<Encoding:UTF-8>
irb(main):025:0> yml.encode("windows-1252")
Encoding::UndefinedConversionError: Input length = 2
from org/jruby/RubyString.java:7479:in `encode'
from (irb):25:in `evaluate'
from org/jruby/RubyKernel.java:1065:in `eval'
from org/jruby/RubyKernel.java:1390:in `loop'
from org/jruby/RubyKernel.java:1173:in `catch'
from org/jruby/RubyKernel.java:1173:in `catch'
from C:\jruby-1.7.0.RC2\/bin/jirb_swing:54:in `(root)'
irb(main):026:0>
I opened this as a separate issue because it concerns dumping rather than loading. MRI will dump the characters as \xF6.
This appears to be a case where SnakeYAML does not properly handle a non-ASCII string when emitting YAML. We decode the character content from RubyString into java.lang.String correctly regardless of the external encoding, but SnakeYAML then appears to encode that string to bytes according to the JVM's file.encoding setting. When file.encoding is a non-UTF-8 encoding, the resulting bytes are not valid UTF-8, even though the dumped string is tagged as UTF-8 (in the session above, "ö" comes out as the single byte \xF6).
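To make the suspected mechanism concrete at the byte level, here is a minimal Ruby sketch that leaves YAML out entirely. It assumes the non-UTF-8 default is windows-1252, which is only a guess based on the target encoding used in the session above. The helper `mislabel_as_utf8` is hypothetical and is not part of JRuby or SnakeYAML; it simply mimics bytes written in one charset and then tagged as UTF-8.

```ruby
# Hypothetical helper (not a JRuby or SnakeYAML API): encode the string with
# some other charset, then re-tag the resulting bytes as UTF-8 without
# converting them. This mimics output written using a non-UTF-8 default
# encoding but labelled as UTF-8.
def mislabel_as_utf8(str, actual_encoding)
  str.encode(actual_encoding).force_encoding(Encoding::UTF_8)
end

s = "\u00F6"                                # "ö", U+00F6

good = s.encode(Encoding::UTF_8)            # correct UTF-8 bytes
bad  = mislabel_as_utf8(s, "windows-1252")  # assumed windows-1252 default

p good.bytes            # [195, 182]  (0xC3 0xB6)
p bad.bytes             # [246]       (0xF6, the windows-1252 byte for "ö")
p bad                   # "\xF6", like the dumped YAML in the session above
p bad.valid_encoding?   # false: a lone 0xF6 is not valid UTF-8

# Properly encoded UTF-8 converts to windows-1252 without trouble...
p good.encode("windows-1252").bytes  # [246]

# ...but the mislabelled string cannot be converted. The exact exception
# class and message may differ between MRI and JRuby, so rescue the common
# superclass EncodingError.
begin
  bad.encode("windows-1252")
rescue EncodingError => e
  puts "#{e.class}: #{e.message}"
end
```

The correctly encoded string round-trips to windows-1252, while the mislabelled one fails conversion, which is consistent with the error in the irb session.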
This needs a SnakeYAML fix, as far as I can tell.