JRuby (please use github issues at http://bugs.jruby.org)
  JRUBY-6325

Java::OrgYamlSnakeyamlReader::ReaderException: special characters are not allowed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.7.0.pre1
    • Fix Version/s: JRuby 1.7.0.pre1
    • Labels:
      None
    • Environment:
      > jruby -v
      jruby 1.7.0.dev (ruby-1.9.3-p6) (2012-01-08 43db071) (Java HotSpot(TM) Client VM 1.6.0_29) [darwin-i386-java]
    • Number of attachments :
      0

      Description

      When running `bundle install` on 1.7.dev, it fails. It works on 1.6.5.1.

      > bundle install
      Fetching git://github.com/dnagir/neo4j.git
      Installing multi_json (1.0.4)
      ....etc....
      Installing addressable (2.2.6)
      Installing diff-lcs (1.1.3)

      Java::OrgYamlSnakeyamlReader::ReaderException: special characters are not allowed
      An error occured while installing gherkin (2.6.9), and Bundler cannot continue.
      Make sure that `gem install gherkin -v '2.6.9'` succeeds before bundling.

      > gem install gherkin
      Fetching: gherkin-2.7.2-java.gem (100%)
      StreamReader.java:98:in `checkPrintable': unacceptable character '?' (0xFFFD) special characters are not allowed
      in "<reader>", position 187
      	from StreamReader.java:191:in `update'
      	from StreamReader.java:63:in `<init>'
      	from PsychParser.java:115:in `parse'
      	from PsychParser$INVOKER$i$1$0$parse.gen:65535:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from CallOneArgNode.java:57:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from FCallOneArgNode.java:36:in `interpret'
      	from CallNoArgNode.java:63:in `interpret'
      	from LocalAsgnNode.java:123:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from FCallOneArgNode.java:36:in `interpret'
      	from LocalAsgnNode.java:123:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from CallOneArgNode.java:57:in `interpret'
      	from LocalAsgnNode.java:123:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from CallOneArgNode.java:57:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from FCallOneArgNode.java:36:in `interpret'
      	from InstAsgnNode.java:95:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from EnsureNode.java:96:in `interpret'
      	from BeginNode.java:83:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from WhenOneArgNode.java:36:in `whenSlowTest'
      	from WhenOneArgNode.java:46:in `when'
      	from CaseNode.java:133:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
      	from Interpreted19Block.java:203:in `evalBlockBody'
      	from Interpreted19Block.java:154:in `yield'
      	from Interpreted19Block.java:132:in `yieldSpecific'
      	from Block.java:108:in `yieldSpecific'
      	from YieldOneNode.java:35:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
      	from Interpreted19Block.java:203:in `evalBlockBody'
      	from Interpreted19Block.java:154:in `yield'
      	from Interpreted19Block.java:127:in `yieldSpecific'
      	from Block.java:99:in `yieldSpecific'
      	from RubyKernel.java:1337:in `loop'
      	from RubyKernel$INVOKER$s$0$0$loop.gen:65535:in `call'
      	from CachingCallSite.java:302:in `cacheAndCall'
      	from CachingCallSite.java:144:in `callBlock'
      	from CachingCallSite.java:153:in `callIter'
      	from FCallNoArgBlockNode.java:32:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:169:in `call'
      	from DefaultMethod.java:191:in `call'
      	from CachingCallSite.java:302:in `cacheAndCall'
      	from CachingCallSite.java:144:in `callBlock'
      	from CachingCallSite.java:153:in `callIter'
      	from CallNoArgBlockNode.java:64:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:255:in `call'
      	from DefaultMethod.java:223:in `call'
      	from CachingCallSite.java:342:in `cacheAndCall'
      	from CachingCallSite.java:212:in `callBlock'
      	from CachingCallSite.java:216:in `call'
      	from RubyClass.java:824:in `newInstance'
      	from RubyClass$INVOKER$i$newInstance.gen:65535:in `call'
      	from JavaMethod.java:290:in `call'
      	from WrapperMethod.java:62:in `call'
      	from CachingCallSite.java:332:in `cacheAndCall'
      	from CachingCallSite.java:203:in `call'
      	from FCallTwoArgNode.java:38:in `interpret'
      	from LocalAsgnNode.java:123:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from EnsureNode.java:96:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:255:in `call'
      	from DefaultMethod.java:223:in `call'
      	from CachingCallSite.java:342:in `cacheAndCall'
      	from CachingCallSite.java:212:in `callBlock'
      	from CachingCallSite.java:216:in `call'
      	from CallTwoArgBlockPassNode.java:62:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:298:in `call'
      	from DefaultMethod.java:239:in `call'
      	from CachingCallSite.java:362:in `cacheAndCall'
      	from CachingCallSite.java:246:in `callBlock'
      	from CachingCallSite.java:255:in `callIter'
      	from CallThreeArgBlockNode.java:64:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:276:in `call'
      	from DefaultMethod.java:231:in `call'
      	from CachingCallSite.java:352:in `cacheAndCall'
      	from CachingCallSite.java:237:in `call'
      	from FCallThreeArgNode.java:40:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
      	from Interpreted19Block.java:203:in `evalBlockBody'
      	from Interpreted19Block.java:154:in `yield'
      	from Block.java:130:in `yield'
      	from RubyIO.java:1131:in `open'
      	from RubyKernel.java:285:in `open'
      	from RubyKernel.java:295:in `open19'
      	from RubyKernel$INVOKER$s$0$2$open19.gen:65535:in `call'
      	from DynamicMethod.java:224:in `call'
      	from CachingCallSite.java:342:in `cacheAndCall'
      	from CachingCallSite.java:212:in `callBlock'
      	from CachingCallSite.java:221:in `callIter'
      	from FCallTwoArgBlockNode.java:34:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from BeginNode.java:83:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from IfNode.java:119:in `interpret'
      	from IfNode.java:119:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:233:in `call'
      	from DefaultMethod.java:215:in `call'
      	from CachingCallSite.java:332:in `cacheAndCall'
      	from CachingCallSite.java:203:in `call'
      	from CallTwoArgNode.java:59:in `interpret'
      	from InstAsgnNode.java:95:in `interpret'
      	from OpAsgnOrNode.java:100:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from BeginNode.java:83:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:147:in `call'
      	from DefaultMethod.java:183:in `call'
      	from CachingCallSite.java:292:in `cacheAndCall'
      	from CachingCallSite.java:135:in `call'
      	from VCallNode.java:86:in `interpret'
      	from CallNoArgNode.java:63:in `interpret'
      	from InstAsgnNode.java:95:in `interpret'
      	from OpAsgnOrNode.java:100:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:147:in `call'
      	from DefaultMethod.java:183:in `call'
      	from CachingCallSite.java:292:in `cacheAndCall'
      	from CachingCallSite.java:135:in `call'
      	from VCallNode.java:86:in `interpret'
      	from CallNoArgNode.java:63:in `interpret'
      	from LocalAsgnNode.java:123:in `interpret'
      	from IfNode.java:111:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:147:in `call'
      	from DefaultMethod.java:183:in `call'
      	from CachingCallSite.java:292:in `cacheAndCall'
      	from CachingCallSite.java:135:in `call'
      	from VCallNode.java:86:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from IfNode.java:119:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from EnsureNode.java:96:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:147:in `call'
      	from DefaultMethod.java:183:in `call'
      	from CachingCallSite.java:292:in `cacheAndCall'
      	from CachingCallSite.java:135:in `call'
      	from CallNoArgNode.java:63:in `interpret'
      	from DAsgnNode.java:110:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
      	from Interpreted19Block.java:203:in `evalBlockBody'
      	from Interpreted19Block.java:191:in `yield'
      	from Interpreted19Block.java:174:in `yield'
      	from BlockBody.java:131:in `call'
      	from Block.java:111:in `call'
      	from RubyEnumerable.java:874:in `call'
      	from CallBlock.java:88:in `yield'
      	from Block.java:130:in `yield'
      	from RubyArray.java:1602:in `eachCommon'
      	from RubyArray.java:1609:in `each'
      	from RubyArray$INVOKER$i$0$0$each.gen:65535:in `call'
      	from JavaMethod.java:434:in `call'
      	from RubyClass.java:561:in `finvoke'
      	from RuntimeHelpers.java:533:in `invoke'
      	from RubyEnumerable.java:114:in `callEach'
      	from RubyEnumerable.java:917:in `each_with_indexCommon19'
      	from RubyEnumerable.java:933:in `each_with_index19'
      	from RubyEnumerable$INVOKER$s$0$0$each_with_index19.gen:65535:in `call'
      	from DynamicMethod.java:208:in `call'
      	from CachingCallSite.java:302:in `cacheAndCall'
      	from CachingCallSite.java:144:in `callBlock'
      	from CachingCallSite.java:153:in `callIter'
      	from CallNoArgBlockNode.java:64:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:233:in `call'
      	from DefaultMethod.java:215:in `call'
      	from CachingCallSite.java:332:in `cacheAndCall'
      	from CachingCallSite.java:203:in `call'
      	from CallTwoArgNode.java:59:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from BeginNode.java:83:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from ASTInterpreter.java:112:in `INTERPRET_BLOCK'
      	from Interpreted19Block.java:203:in `evalBlockBody'
      	from Interpreted19Block.java:154:in `yield'
      	from Block.java:130:in `yield'
      	from RubyArray.java:1602:in `eachCommon'
      	from RubyArray.java:1609:in `each'
      	from RubyArray$INVOKER$i$0$0$each.gen:65535:in `call'
      	from CachingCallSite.java:302:in `cacheAndCall'
      	from CachingCallSite.java:144:in `callBlock'
      	from CachingCallSite.java:153:in `callIter'
      	from CallNoArgBlockNode.java:64:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:147:in `call'
      	from DefaultMethod.java:183:in `call'
      	from CachingCallSite.java:292:in `cacheAndCall'
      	from CachingCallSite.java:135:in `call'
      	from VCallNode.java:86:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from IfNode.java:119:in `interpret'
      	from IfNode.java:119:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from CallSpecialArgNode.java:67:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from CaseNode.java:138:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from FCallOneArgNode.java:36:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from RescueNode.java:216:in `executeBody'
      	from RescueNode.java:120:in `interpretWithJavaExceptions'
      	from RescueNode.java:110:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from CallOneArgNode.java:57:in `interpret'
      	from NewlineNode.java:104:in `interpret'
      	from BlockNode.java:71:in `interpret'
      	from ASTInterpreter.java:75:in `INTERPRET_METHOD'
      	from InterpretedMethod.java:190:in `call'
      	from DefaultMethod.java:199:in `call'
      	from CachingCallSite.java:312:in `cacheAndCall'
      	from CachingCallSite.java:169:in `call'
      	from /Users/dnagir/.rvm/rubies/jruby-head/bin/gem:25:in `chained_0_rescue_1$RUBY$SYNTHETIC__file__'
      	from /Users/dnagir/.rvm/rubies/jruby-head/bin/gem:24:in `__file__'
      	from /Users/dnagir/.rvm/rubies/jruby-head/bin/gem:-1:in `load'
      	from Ruby.java:732:in `runScript'
      	from Ruby.java:725:in `runScript'
      	from Ruby.java:632:in `runNormally'
      	from Ruby.java:481:in `runFromMain'
      	from Main.java:343:in `doRunFromMain'
      	from Main.java:255:in `internalRun'
      	from Main.java:221:in `run'
      	from Main.java:205:in `run'
      	from Main.java:185:in `main'
      

        Activity

        Hiro Asari added a comment -

        Strange. It works for me.

        $ jruby -v
        jruby 1.7.0.dev (ruby-1.8.7-p357) (2012-01-08 43db071) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_29) [darwin-x86_64-java]
        $ jruby -S gem install gherkin
        Fetching: gherkin-2.7.2-java.gem (100%)
        Successfully installed gherkin-2.7.2-java
        1 gem installed
        

        Does it work without RVM?

        Dmytrii Nagirniak added a comment -

        Use 1.9 mode.

        Hiro Asari added a comment -

        Ah. Thanks. I confirmed it. I'm looking into this. Probably a fix similar to JRUBY-6200 is feasible.

        Dmytrii Nagirniak added a comment -

        Looks like a duplicate except that 6200 applied to 1.6.5, and this one to 1.7.dev.

        Dmytrii Nagirniak added a comment (edited) -

        NOTICE that it DID work on 1.6.5.1. It is broken on 1.7.dev.

        Dmytrii Nagirniak added a comment -

        Also, the workaround of setting the `-J-Dfile.encoding=UTF-8` option doesn't work here.

        Hiro Asari added a comment -

        This is a little stranger than I first thought.

        The exception is thrown from line 115 in PsychParser.java, when we read the YAML data from an object that does not respond to `read` (a String, in this particular case). As noted, the 1.6 branch works and master doesn't. As far as I can tell, PsychParser.java is essentially unchanged for the purpose of this ticket.

        The problematic input is the gemspec for gherkin (which contains Aslak's last name). The data goes through different transformations by the time it gets to PsychParser. So, that's where the real problem lies.

        Dmytrii Nagirniak added a comment -

        Ok. So Aslak's name is the problem. Just don't tell him.

        But seriously, is there any workaround/fix?

        Hiro Asari added a comment -

        Wait until I put my name in gemspec (in Japanese, that is)!

        Well, for a workaround, install the gem in the 1.8 mode. JRuby will pick up the same gem when running in the 1.9 mode, so that should work.

        I'll take a deeper look tomorrow. My hunch is that the problem lies in Zlib somewhere, which is responsible for unpacking the gem file and feeding gemspec data to Psych.

        Dmytrii Nagirniak added a comment (edited) -

        As long as your name won't break anything.

        Thanks. Using 1.8 mode for install seems to work. Now have to figure out how to install it with Bundler. (Suggestions?)

        Charles Oliver Nutter added a comment -

        The character complained about here is the one substituted when decoding a Unicode stream fails. In this case, it's the � character (U+FFFD).

        My guess is that when we're decoding the string before passing it to SnakeYAML, we're not handling the source encoding properly.
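To illustrate the point about the replacement character, here is a minimal sketch in plain Ruby. It is not the JRuby or SnakeYAML code path, and the byte string and variable names are made up for the example. It assumes a Ruby with `String#b` and `String#scrub` (2.1 or later). Bytes that are not valid UTF-8 get replaced with U+FFFD when they are cleaned up as UTF-8, which is the character named in the exception.

```ruby
# Latin-1 bytes for "Hellesøy": the ø is the single byte 0xF8,
# which is not a valid UTF-8 sequence on its own.
latin1_bytes = "Helles\xF8y".b

# Mislabel the bytes as UTF-8. No transcoding happens here,
# only the encoding tag on the string changes.
mislabeled = latin1_bytes.dup.force_encoding("UTF-8")
valid = mislabeled.valid_encoding?

# String#scrub replaces invalid sequences. For Unicode encodings
# the default replacement is U+FFFD.
scrubbed = mislabeled.scrub

puts valid
puts scrubbed.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

A YAML reader that rejects U+FFFD would refuse the scrubbed string, even though the original bytes were a perfectly good name in another encoding.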

        Hiro Asari added a comment -

        You mean \ufffd? I don't think that's in gemspec. \ufffd is the special character that SnakeYAML throws at us. This is the character that apparently JVM uses to indicate "data corruption". See JRUBY-6317.

        Charles Oliver Nutter added a comment -

        I assume from comments that this works on jruby-1_6 branch, but I would appreciate someone confirming that with a JRuby 1.6.6.dev build (http://ci.jruby.org/snapshots/release). For master, I have pushed a fix:

        commit dc1741a55917cb71bf311bb63ed9c92d7e423863
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Mon Jan 9 01:09:31 2012 -0600
        
            Fix JRUBY-6325: Java::OrgYamlSnakeyamlReader::ReaderException: special characters are not allowed
            
            I do not have a clear explanation for why this was failing before,
            but by using an InputStreamReader with the charset from the Ruby
            string, rather than calling toString, we are able to deal with
            characters from the input strema. I will investigate further why
            the toString path did not work properly, but this fixes the issue.
            
            Can someone stick a test case into JRuby's regression tests or
            into RubySpec?
        
        Hiro Asari added a comment -

        Charlie, I tested the 1.6 branch many times while I was looking at this issue.

        The commit indeed fixes this problem, but I still wonder how it works in the 1.6 branch: https://github.com/jruby/jruby/blob/b924ceff59e224731071df3bdfb29e14ba6aeed4/src/org/jruby/ext/psych/PsychParser.java#L107

        Am I missing something?

        Charles Oliver Nutter added a comment -

        I investigated a bit more. It seems like the specification is not entering JRuby with a proper encoding, either because of a bug in RubyGems or a bug in JRuby. If I print out the Charset associated with the incoming String, this is what I see at the top of the gemspec:

        ISO-8859-1
        --- !ruby/object:Gem::Specification
        name: gherkin
        version: !ruby/object:Gem::Version
          version: 2.7.2
          prerelease: 
        platform: java
        authors:
        - Mike Sassak
        - Gregory Hnatiuk
        - Aslak Hellesøy
        

        This is after my fix. Note that Aslak's name is still mangled. It is encoded as UTF-8, but the encoding is incorrect in whatever produces the actual YAML content here.

        My fix basically just lets the ISO-8859-1 encoding pass the bytes through unmodified, even though they're still wrong for the resulting String. This could still cause problems, so I'm temporarily reopening while I investigate further.
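The mangling described here can be reproduced in a few lines of plain Ruby. This is a sketch for illustration only, not the code involved in the ticket, and the variable names are invented. UTF-8 bytes that are labelled ISO-8859-1 pass through byte for byte, but each byte is then treated as its own character, so the two-byte ø shows up as two characters.

```ruby
# The author's name as correctly decoded text (ø is U+00F8).
name = "Aslak Helles\u00F8y"

# Its UTF-8 byte sequence: ø becomes the two bytes 0xC3 0xB8.
utf8_bytes = name.encode("UTF-8").b

# Relabel those bytes as ISO-8859-1, then transcode to UTF-8 for display.
# Every byte is now interpreted as one Latin-1 character.
mangled = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts mangled
puts mangled.length - name.length
```

No bytes are lost in the process, which is why relabelling the mangled text back to UTF-8 would recover the name, and why the pass-through avoided the exception without actually producing a correct String.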

        Charles Oliver Nutter added a comment -

        Ok, so the specification is not entering JRuby with the right encoding because this fails to associate the correct encoding with the resulting String:

        $ jruby --1.9 -rzlib -e "p Zlib::GzipReader.new(File.open('metadata.gz'), :external_encoding => 'UTF-8').read.encoding"
        UTF-8, null
        #<Encoding:ASCII-8BIT>
        

        The "UTF-8, null" is me logging what external and internal encoding is detected by the logic in RubyIO called by GzipReader to determine what encodings to use. This seems correct, but the resulting string should be left as UTF-8 since there's no internal encoding to which we are supposed to transcode.
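The same check can be written as a self-contained script that does not need the gem's `metadata.gz`. This is a sketch under assumptions: the YAML snippet is a made-up stand-in for the real gemspec, and the data is gzipped in memory with `StringIO`. On a runtime that honors `:external_encoding`, the string returned by `read` is expected to be tagged UTF-8; the build discussed in this ticket reported ASCII-8BIT instead.

```ruby
require "zlib"
require "stringio"

# Hypothetical stand-in for the gemspec YAML stored in metadata.gz.
yaml = "authors:\n- Aslak Helles\u00F8y\n"

# Gzip the text in memory.
buffer = StringIO.new(String.new)
writer = Zlib::GzipWriter.new(buffer)
writer.write(yaml)
writer.finish # write the gzip trailer without closing the StringIO
compressed = buffer.string

# Read it back, asking for UTF-8 as the external encoding.
reader = Zlib::GzipReader.new(StringIO.new(compressed), external_encoding: "UTF-8")
data = reader.read
reader.close

p data.encoding
p data.valid_encoding?
```

Calling `read` with no length argument matters here, since a length-limited read is documented to return binary data regardless of the encoding options.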

        Continuing to investigate.

        Charles Oliver Nutter added a comment (edited) -

        Ok, I have the correct fix now.

        The strings coming out of GzipReader weren't marked as the external encoding at all, even though we processed the encoding options. I modified GzipReader to construct the ByteLists it reads using external_encoding, if provided, and that actually made the original code in PsychParser work fine.
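The idea behind the fix can be sketched at the Ruby level. The actual change was made to JRuby's Java implementation of GzipReader and its ByteLists, which is not shown in this ticket, so the helper `tag_bytes` below is hypothetical and only mirrors the behavior described: label the bytes with the external encoding when one is provided, and leave them binary otherwise.

```ruby
# Hypothetical helper: label raw bytes with the requested external encoding,
# or leave them binary when none was given. The bytes themselves never change.
def tag_bytes(raw, external_encoding = nil)
  raw.dup.force_encoding(external_encoding || Encoding::ASCII_8BIT)
end

raw = "Aslak Helles\u00F8y".b

untagged = tag_bytes(raw)
tagged = tag_bytes(raw, "UTF-8")

p untagged.encoding, untagged.length
p tagged.encoding, tagged.length
p untagged.bytes == tagged.bytes
```

Only the encoding tag differs between the two strings, yet that tag decides how a consumer such as a YAML parser decodes the bytes into characters.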

        Because the bug was actually in GzipReader and not in PsychParser, I have reverted the fix to the latter.

        I also made some minor tweaks to GzipReader's processing of encoding options and to our cached UTF-8 decoder to make it match more exactly the decoding options of the JDK's default decoders.

        commit a8dc697b17804e7f2a05bdd8f29c59d939a74265
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Mon Jan 9 02:30:58 2012 -0600
        
            Revert "Fix JRUBY-6325: Java::OrgYamlSnakeyamlReader::ReaderException: special characters are not allowed"
            
            This reverts commit dc1741a55917cb71bf311bb63ed9c92d7e423863.
        
        commit ae12c0487cd62199b64c0d8640ce9d543401fbd8
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Mon Jan 9 02:09:52 2012 -0600
        
            GzipReader should start bytelists using specified ext encoding.
        
        commit 86c50d8e7bcaac549c58ba665cae8fcb19de7091
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Mon Jan 9 02:09:15 2012 -0600
        
            Make our caching UTF-8 decoder replace unmappable the same as default decoder.
        
        commit e11f2d9752e1cc0aeb7c67fa167de5b64be1d9b8
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Mon Jan 9 01:50:46 2012 -0600
        
            Add null check for encoding options in zlib.
        
        Hiro Asari added a comment -

        For tests, see https://github.com/rubyspec/rubyspec/pull/108

          People

          • Assignee: Charles Oliver Nutter
          • Reporter: Dmytrii Nagirniak
          • Votes: 0
          • Watchers: 0
