Details

    • Number of attachments :
      0

      Description

      As shown in the comments for JRUBY-6200, JRuby's "Psych" wrapper around SnakeYAML can't handle some Unicode characters. This may be a SnakeYAML bug, or something in the YAML spec we don't know about:

      system ~/projects/jruby $ jruby --1.9 -ryaml -e 'YAML.load("\ufffd".to_yaml)'
      StreamReader.java:98:in `checkPrintable': unacceptable character '�' (0xFFFD) special characters are not allowed
      in "<reader>", position 4
      	from StreamReader.java:191:in `update'
      	from StreamReader.java:63:in `<init>'
      	from PsychParser.java:115:in `parse'
      	from PsychParser$INVOKER$i$1$0$parse.gen:65535:in `call'
      

      I'm filing this because the original cases in JRUBY-6200 are resolved, and this seems to be a separate issue that won't affect most people.

          Activity

          Hiro Asari added a comment -

          My understanding is that this is SnakeYAML's design. SnakeYAML follows PyYAML and raises an error if it sees "nonprintable" characters.

          http://code.google.com/p/snakeyaml/source/browse/src/main/java/org/yaml/snakeyaml/reader/StreamReader.java#33

          As I noted in JRUBY-6200, MRI dutifully prints any Unicode character we throw at it.

          $ ruby2.0 -v -ryaml -e 'p YAML.load("\ufffe".to_yaml)'
          ruby 2.0.0dev (2011-12-31 trunk 34165) [x86_64-darwin11.2.0]
          "\uFFFE"
          

          It seems to me that the best we can do is to catch org.yaml.snakeyaml.reader.ReaderException and inform the user of the problem (noting that \ufffd may be an indication that the JVM might have thrown an exception). To accept the characters that SnakeYAML deems nonprintable, I think we would need to parse the YAML input ourselves before passing it to SnakeYAML, which defeats the purpose of using SnakeYAML in the first place.
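A rough sketch of the "catch and inform" idea at the Ruby level, rather than inside JRuby itself. It assumes the parser failure surfaces as Psych::SyntaxError (as in the later traces in this thread); the helper name load_yaml_with_hint and the wording of the hint are made up for illustration.

```ruby
require 'yaml'

REPLACEMENT_CHAR = "\uFFFD"

# Hypothetical helper (not part of Psych or JRuby): loads YAML and, if the
# parser rejects the input, re-raises with a hint about U+FFFD when present.
def load_yaml_with_hint(source)
  YAML.load(source)
rescue Psych::SyntaxError => e
  hint = "YAML parse failed: #{e.message}"
  index = source.index(REPLACEMENT_CHAR)
  if index
    hint += " (input contains U+FFFD at character #{index}; " \
            "this often means an earlier decoding step replaced invalid bytes)"
  end
  raise ArgumentError, hint
end
```

The original parser message is kept in the new message, so nothing is lost; the hint only adds the position of the first replacement character.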

          Hiro Asari added a comment -

          I opened http://code.google.com/p/snakeyaml/issues/detail?id=138 for ReaderException improvement, so that we may inform the user better.

          Charles Oliver Nutter added a comment -

          This appears to only affect 0xFFFD. 0xFFFC works, and 0xFFFE and 0xFFFF do not but are disallowed by the YAML spec (http://yaml.org/spec/1.1/#id868518). Interestingly, the error for "E" and "F" is the same as for "D", but the spec appears to explicitly allow the "D" character. We may need to take this up with SnakeYAML.
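For reference, the four code points above can be checked against the spec's set of printable characters. This is only a sketch: the ranges are transcribed from the YAML 1.1 "printable" production as I read it, and the names YAML_PRINTABLE_RANGES and yaml_printable? are illustrative, not part of any library.

```ruby
# Code point ranges of the YAML 1.1 "printable" character production
# (transcribed by hand; worth double-checking against the spec).
YAML_PRINTABLE_RANGES = [
  0x09..0x0A, 0x0D..0x0D, 0x20..0x7E, 0x85..0x85,
  0xA0..0xD7FF, 0xE000..0xFFFD, 0x10000..0x10FFFF
].freeze

# Hypothetical helper: true if the spec allows this code point in a stream.
def yaml_printable?(code_point)
  YAML_PRINTABLE_RANGES.any? { |range| range.cover?(code_point) }
end

[0xFFFC, 0xFFFD, 0xFFFE, 0xFFFF].each do |cp|
  puts format('U+%04X printable per spec: %s', cp, yaml_printable?(cp))
end
```

By this reading, 0xFFFC and 0xFFFD fall inside the allowed ranges while 0xFFFE and 0xFFFF do not, which matches the observation that only the "D" rejection disagrees with the spec.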

          I am making the exception a bit better, anyway, and will mark this minor. We have fixed other bugs that should prevent the "D" character from appearing in typical usage (it is the Unicode replacement character, U+FFFD, and was coming up because we decoded/encoded incorrectly in other places).

          Charles Oliver Nutter added a comment -

          Improvement to the exception committed to master@ae48d95 and jruby-1_6@d03248a.

          Fooled By Primes added a comment -

          I believe some relevant information can be found here regarding a fix in snakeyaml:
          http://code.google.com/p/snakeyaml/issues/detail?id=147&can=1&colspec=ID%20Type%20Status%20Priority%20Owner%20Summary

          Fooled By Primes added a comment -

          I believe the fix I mentioned above will be available in SnakeYAML 1.11.

          Lenny Marks added a comment -

          Stumbled on this after getting "Psych::SyntaxError: special characters are not allowed" from a serialized ActiveRecord attribute (JRuby 1.6.7.2, Rails 3.0.11). Most search results for "Psych::SyntaxError" are about the now-resolved gem installation problem (e.g. JRUBY-6401 and related). Hopefully this will pop up in future searches about serialized Rails attributes. Anyway, for now, it's probably best to avoid YAML for serialized ActiveRecord attributes.

          irb(main):063:0> Foo.create!(:authors => ["\ufffdric"])
          Psych::SyntaxError: special characters are not allowed
          	from org/jruby/ext/psych/PsychParser.java:282:in `parse'
          	from /projects/eds_test/share/jruby/lib/ruby/1.9/psych.rb:148:in `parse_stream'
          	from /projects/eds_test/share/jruby/lib/ruby/1.9/psych.rb:119:in `parse'
          	from /projects/eds_test/share/jruby/lib/ruby/1.9/psych.rb:106:in `load'
          	from /Users/Shared/lenny/work/PRISM/rails/Submissions/vendor/bundle/jruby/1.9/gems/activerecord-3.0.11/lib/active_record/base.rb:1870:in `object_from_yaml'
          	from /Users/Shared/lenny/work/PRISM/rails/Submissions/vendor/bundle/jruby/1.9/gems/activerecord-3.0.11/lib/active_record/attribute_methods/read.rb:129:in `unserialize_attribute'
          	from /Users/Shared/lenny/work/PRISM/rails/Submissions/vendor/bundle/jruby/1.9/gems/activerecord-3.0.11/lib/active_record/attribute_methods/read.rb:112:in `_read_attribute'
          	from /Users/Shared/lenny/work/PRISM/rails/Submissions/vendor/bundle/jruby/1.9/gems/activerecord-3.0.11/lib/active_record/attribute_methods/read.rb:101:in `read_attribute'
          	from /Users/Shared/lenny/work/PRISM/rails/Submissions/vendor/bundle/jruby/1.9/gems/activerecord-3.0.11/lib/active_record/base.rb:1593:in `attribute_for_inspect'
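One way to avoid YAML for such an attribute is to store it as JSON instead. The following is a plain-Ruby sketch only: JsonColumnCoder is a hypothetical class, and whether a given Rails version lets you plug an object like this into its attribute serialization is an assumption to verify for your setup.

```ruby
require 'json'

# Hypothetical coder (not part of Rails or JRuby): converts a value to a
# JSON string for storage and back again, bypassing the YAML parser.
class JsonColumnCoder
  def self.dump(value)
    JSON.generate(value)
  end

  def self.load(text)
    return nil if text.nil? || text.empty?
    JSON.parse(text)
  end
end

stored   = JsonColumnCoder.dump(["\uFFFDric"])
restored = JsonColumnCoder.load(stored)
puts restored == ["\uFFFDric"]
```

Note that JSON only covers basic types (strings, numbers, arrays, hashes with string keys), so this is not a drop-in replacement for everything YAML can serialize.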
          
          
          

          Interesting differences between JRuby-1.6.7.2 and JRuby-1.7.0-preview. The results are reversed.

          $ jruby-1.7.0.preview1> bin/jruby --1.9 -r yaml -e 'YAML.load(["\u263A"].to_yaml)'
          
          $ jruby-1.7.0.preview1> bin/jruby --1.9 -r yaml -e 'YAML.load(["\ufffd"].to_yaml)'
          Psych::SyntaxError: special characters are not allowed
                   parse at org/jruby/ext/psych/PsychParser.java:293
            parse_stream at /Network/projects/eds_test/share/jruby-1.7.0.preview1/lib/ruby/1.9/psych.rb:203
                   parse at /Network/projects/eds_test/share/jruby-1.7.0.preview1/lib/ruby/1.9/psych.rb:151
                    load at /Network/projects/eds_test/share/jruby-1.7.0.preview1/lib/ruby/1.9/psych.rb:127
                  (root) at -e:1
          
          $ jruby-1.6.7.2> bin/jruby --1.9 -r yaml -e 'YAML.load(["\u263A"].to_yaml)'
          Psych::SyntaxError: special characters are not allowed
                   parse at org/jruby/ext/psych/PsychParser.java:282
            parse_stream at /Network/projects/eds_test/share/jruby-1.6.7.2/lib/ruby/1.9/psych.rb:148
                   parse at /Network/projects/eds_test/share/jruby-1.6.7.2/lib/ruby/1.9/psych.rb:119
                    load at /Network/projects/eds_test/share/jruby-1.6.7.2/lib/ruby/1.9/psych.rb:106
                  (root) at -e:1
          
          $ jruby-1.6.7.2> bin/jruby --1.9 -r yaml -e 'YAML.load(["\ufffd"].to_yaml)'
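To compare versions without reading stack traces, the same round trip can be wrapped in a small probe and run under each JRuby. This is a sketch; yaml_round_trip is an illustrative name, and it assumes the failure is raised as Psych::SyntaxError as in the traces above.

```ruby
require 'yaml'

# Hypothetical probe: dumps a one-element array to YAML, loads it back,
# and reports whether the string survived, changed, or was rejected.
def yaml_round_trip(text)
  YAML.load([text].to_yaml) == [text] ? :ok : :mismatch
rescue Psych::SyntaxError
  :rejected
end

["\u263A", "\uFFFD"].each do |text|
  puts format('U+%04X: %s', text.ord, yaml_round_trip(text))
end
```

Running it under both JRuby versions should make the reversal described above visible as a swap of :ok and :rejected between the two lines.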
          
          Charles Oliver Nutter added a comment -

          Perhaps I should have looked at Hiro's comment sooner.

          The lines in question in SnakeYAML clearly state that the rejection of 0xFFFD is intentional since Java uses that character to indicate data corruption. I doubt this is something we could get SnakeYAML to change.

          I also doubt this is something anyone will run into during normal YAML use.

          I'm going to mark this resolved, since as far as we know this is the only remaining difference from libyaml in accepted characters.

          Charles Oliver Nutter added a comment -

          Oh...and Fooled's comment points at a SnakeYAML discussion about removing the 0xFFFD limitation, so I guess they decided Java doesn't do this after all.

          We will update SnakeYAML when it comes along. I'll just leave this resolved.


            People

            • Assignee:
              Charles Oliver Nutter
              Reporter:
              Charles Oliver Nutter
            • Votes:
              0
              Watchers:
              3
