Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.7.0.pre1
    • Fix Version/s: JRuby 1.7.0.pre2
    • Component/s: Encoding, Ruby 1.9.3, Windows
    • Labels:
      None
    • Environment:
      Windows 7 x64
    • Number of attachments:
      0

      Description

      When trying to install the nokogiri gem in 1.9 mode, JRuby complains about an invalid byte sequence. I am not sure whether the problem is with JRuby or the gemspec, but the gem installs in 1.9 mode using JRuby 1.6.7.

      jruby -v --1.9 -S gem install nokogiri

      produces output:

      jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java HotSpot(TM) Client VM 1.7.0_04) [Windows 7-x86-java]
      Fetching: nokogiri-1.5.2-java.gem (100%)
      Invalid gemspec in [C:/Dev/sdk/jruby-1.7.0-preview1/lib/ruby/gems/shared/specifications/nokogiri-1.5.2-java.gemspec]: invalid byte sequence in UTF-8
      Successfully installed nokogiri-1.5.2-java

      The main problem is the "invalid byte sequence in UTF-8" error.

      If I try installing the gem in 1.8 mode then there are no problems.

        Issue Links

          Activity

          Charles Oliver Nutter added a comment -

          Should be fixed by JRUBY-6739; linking.

          Thomas E Enebo added a comment -

          I am looking into this one today and I will state what the problem is without the actual fix. Zlib is not using the default external encoding (which was my original thought); it is using the file.encoding property of Java itself. I can reproduce this problem on Mac OS:

          jruby -J-Dfile.encoding=Windows-1252 -S gem install nokogiri
          Fetching: nokogiri-1.5.5-java.gem (100%)
          Invalid gemspec in [/Users/enebo/work/jruby/lib/ruby/gems/shared/specifications/nokogiri-1.5.5-java.gemspec]: invalid byte sequence in UTF-8
          Successfully installed nokogiri-1.5.5-java
          1 gem installed
          

          I suspect we are calling something in Java and doing a getBytes() and assuming it is the right encoding on the Ruby side. We shall see....
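
The suspected failure mode can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the actual JRuby code path: the helper `bytes_via_platform_charset` is hypothetical and stands in for a Java `getBytes()` call that uses the platform default charset, and the sample character is chosen for the demonstration rather than taken from the gemspec.

```ruby
# Hypothetical stand-in for Java's String#getBytes() with no charset argument:
# it encodes text with whatever the platform default charset happens to be.
def bytes_via_platform_charset(text, platform_charset)
  text.encode(platform_charset).b # .b returns a binary-tagged copy of the bytes
end

text = "\u00E9" # e with acute accent: one byte in Windows-1252, two in UTF-8

# Bytes produced under file.encoding=Windows-1252, then assumed to be UTF-8.
raw = bytes_via_platform_charset(text, "Windows-1252")
mislabeled = raw.dup.force_encoding("UTF-8")

# Same steps when the platform charset really is UTF-8.
utf8_raw = bytes_via_platform_charset(text, "UTF-8")
relabeled = utf8_raw.dup.force_encoding("UTF-8")

p raw.bytes                  # => [233]
p mislabeled.valid_encoding? # => false
p relabeled.valid_encoding?  # => true
```

A lone 0xE9 byte is a UTF-8 lead byte with no continuation bytes, which is the kind of input that produces "invalid byte sequence in UTF-8" once the string is tagged as UTF-8.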

          Thomas E Enebo added a comment -

          This seems to be a bug in our YAML support:

          jruby -J-Dfile.encoding=Windows-1252 snippets/ym.rb
          

          ym.rb:

          # encoding: utf-8
          
          require "yaml"
          
          y = YAML.load <<EOS
          --- 
          description: "Nokogiri (\xE9\x8B\xB8) is an HTML, ..."
          EOS
          
          p y
          

          I believe YAML should always be UTF-8 and never use the locale encoding. Looking into a fix.

          Thomas E Enebo added a comment -

          Fixed in commit 407c88c. The underlying issue is that the locale encoding (or file.encoding) was being used to construct new strings. I did minor String construction gymnastics to generate UTF-8 strings instead. In theory, YAML could also produce UTF-16 or UTF-32 encodings, but I could see no way of knowing which one is present. Worst case, you get UTF-8 but it still works. Possibly I am missing something about SnakeYAML.
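
The idea behind the fix (tag parser output explicitly as UTF-8 instead of inheriting the locale encoding) can be shown with a small Ruby sketch. It does not reproduce commit 407c88c, which changes JRuby's Java code; the helper name `utf8_string` is hypothetical, and the bytes are the three escaped bytes from the ym.rb snippet above.

```ruby
# Hypothetical helper: build a Ruby string from raw parser bytes and always
# tag it as UTF-8, regardless of the locale or file.encoding setting.
def utf8_string(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

saw_bytes = [0xE9, 0x8B, 0xB8] # "\xE9\x8B\xB8" from the ym.rb description

as_utf8 = utf8_string(saw_bytes)

# For contrast: the same bytes tagged with a locale encoding (assumed here to
# be Windows-1252) are read as three unrelated single-byte characters.
as_locale = saw_bytes.pack("C*").force_encoding("Windows-1252")

p as_utf8.encoding        # => #<Encoding:UTF-8>
p as_utf8.valid_encoding? # => true
p as_utf8.length          # => 1
p as_locale.length        # => 3
```

The bytes are identical in both cases; only the encoding tag differs, which is why forcing UTF-8 at string construction time is enough for this sample.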

          Craig Waterman added a comment -

          Works for me on Windows 7 64-bit, compiled from master. Nokogiri installs, no byte-sequence errors.


            People

            • Assignee:
              Thomas E Enebo
              Reporter:
              Andrew Davey
            • Votes:
              0
              Watchers:
              9

              Dates

              • Created:
                Updated:
                Resolved: