JRuby (please use github issues at http://bugs.jruby.org)
  JRUBY-6572

File.open with the 'b' option should always make 'gets' return a string with 'ASCII-8BIT' encoding, regardless of the value of 'Encoding.default_internal'

    Details

    • Number of attachments: 2

      Description

      If I set Encoding.default_internal to 'utf-8', I can't read binary data from a file, even when I open the file with the 'b' option.
      File#gets always returns a string encoded with Encoding.default_internal.
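A minimal sketch of the expected behaviour, assuming Ruby 2.1 or later for `Tempfile.create` (the thread itself used 1.9-era runtimes). The helper name `first_line_in_binary_mode` and the temporary file are illustrative and are not part of the attached script. On MRI, a file opened with 'rb' and no explicit encoding should yield ASCII-8BIT strings with the original bytes, whatever Encoding.default_internal is set to.

```ruby
require 'tempfile'

# Hypothetical helper (not from the attached script): reads the first line of
# `path` in binary mode while Encoding.default_internal is temporarily set.
def first_line_in_binary_mode(path, default_internal)
  saved_internal = Encoding.default_internal
  saved_verbose, $VERBOSE = $VERBOSE, nil # silence the warning Ruby emits under -w
  Encoding.default_internal = default_internal
  File.open(path, 'rb') { |file| file.gets }
ensure
  # Restore global state so the rest of the process is unaffected.
  Encoding.default_internal = saved_internal
  $VERBOSE = saved_verbose
end

Tempfile.create('zh_cn-utf8') do |tmp|
  tmp.binmode
  tmp.write("\u4e2d\n") # the UTF-8 bytes e4 b8 ad followed by a newline
  tmp.flush

  line = first_line_in_binary_mode(tmp.path, 'utf-8')
  puts "encoding: #{line.encoding}"
  puts "bytes:    #{line.bytes.map { |b| format('%02x', b) }.join(' ')}"
end
```

If the reported bug is present, the encoding would be UTF-8 and the byte sequence would be longer than four bytes.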

      1. run_this_reappear.rb
        0.9 kB
        Hao Yuan
      2. zh_cn-utf8.txt
        0.0 kB
        Hao Yuan

        Activity

        Hao Yuan added a comment - - edited

        I think this may be an encoding problem.
        See the script:

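        # Concatenates each byte as hex WITHOUT zero-padding, so a trailing
        # newline (0x0a) appears as a single "a" in the output.
        def to_hex(str)
          str.bytes.map { |b| b.to_s(16) }.join
        end

        # Reads the first line in binary mode and prints its bytes and encodings.
        def show
          File.open('zh_cn-utf8.txt', 'rb') do |file|
            line = file.gets
            puts "line: #{line}"
            puts "!!!to_hex(line): #{to_hex(line)}"
            puts "!!!line.encoding.name: #{line.encoding.name}"
            puts "!!!Encoding.default_internal: #{Encoding.default_internal}"
          end
        end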
        

        Here is the output (note the hex: the UTF-8 encoding of '中' is 'e4b8ad', and the trailing 'a' is the newline byte 0x0a printed without zero-padding):

        <<<Encoding.default_external = nil
        line: &#20013;
        !!!to_hex(line): e4b8ada
        !!!line.encoding.name: ASCII-8BIT
        !!!Encoding.default_internal:
        ----------------------------
        <<<Encoding.default_external = "utf8"
        line: 中
        !!!to_hex(line): c3a4c2b8c2ada
        !!!line.encoding.name: UTF-8
        !!!Encoding.default_internal: UTF-8
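The second hex string is consistent with each raw byte having been treated as a single ISO-8859-1 character and then transcoded to UTF-8. That reading is an assumption and is not confirmed anywhere in this thread. The sketch below reproduces both hex strings under that assumption. `unpadded_hex` is a hypothetical helper that mirrors the unpadded formatting of `to_hex` above, and `String#b` requires Ruby 2.0 or later.

```ruby
# Mirrors the script's to_hex: each byte as hex, with no zero-padding.
def unpadded_hex(str)
  str.bytes.map { |b| b.to_s(16) }.join
end

# The UTF-8 bytes of '中' plus a newline, tagged as ASCII-8BIT.
raw = "\xE4\xB8\xAD\n".b

# Assumption: reinterpret every byte as an ISO-8859-1 character, then
# transcode those characters to UTF-8 (a classic double encoding).
double_encoded = raw.dup
                    .force_encoding(Encoding::ISO_8859_1)
                    .encode(Encoding::UTF_8)

puts unpadded_hex(raw)            # prints e4b8ada
puts unpadded_hex(double_encoded) # prints c3a4c2b8c2ada
```

Each byte at or above 0x80 becomes two bytes (e4 to c3 a4, b8 to c2 b8, ad to c2 ad), while the newline stays a single byte.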
        
        Diego Plentz added a comment -

        Reproducible here using JRuby 1.6.7 (1.9 mode) and MRI 1.9.3p194. I added a test for this bug here: https://github.com/plentz/jruby_report/commit/9920a7e8df19017cd753e4361be47ad0b0369076

        Charles Oliver Nutter added a comment -

        Here's my output from the attached script. It definitely seems to be an encoding problem when default_external gets changed.

        system ~/projects/jruby/tmp $ jruby run_this_reappear.rb 
        <<<Encoding.default_external = nil
        line: &#20013;
        !!!to_hex(line): e4b8ada
        !!!line.encoding.name: ASCII-8BIT
        !!!Encoding.default_internal: 
        file.external_encoding: ASCII-8BIT
        file.internal_encoding: 
        Encoding.default_external: UTF-8
        ----------------------------
        <<<Encoding.default_external = "utf8"
        line: 中
        !!!to_hex(line): c3a4c2b8c2ada
        !!!line.encoding.name: UTF-8
        !!!Encoding.default_internal: UTF-8
        file.external_encoding: ASCII-8BIT
        file.internal_encoding: UTF-8
        Encoding.default_external: UTF-8
        
        system ~/projects/jruby/tmp $ ruby1.9.3 run_this_reappear.rb 
        <<<Encoding.default_external = nil
        line: &#20013;
        !!!to_hex(line): e4b8ada
        !!!line.encoding.name: ASCII-8BIT
        !!!Encoding.default_internal: 
        file.external_encoding: ASCII-8BIT
        file.internal_encoding: 
        Encoding.default_external: UTF-8
        ----------------------------
        <<<Encoding.default_external = "utf8"
        line: &#20013;
        !!!to_hex(line): e4b8ada
        !!!line.encoding.name: ASCII-8BIT
        !!!Encoding.default_internal: UTF-8
        file.external_encoding: ASCII-8BIT
        file.internal_encoding: 
        Encoding.default_external: UTF-8
        

        We match MRI when default_external is nil, but when it is set we appear to honor it where we should not.
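The `file.external_encoding` and `file.internal_encoding` lines in the output above can be checked in isolation. The following is a sketch only: `binary_mode_encodings` is a hypothetical helper, and it assumes Ruby 2.1 or later for `Tempfile.create`. In the MRI output above, a file opened with 'rb' reports an ASCII-8BIT external encoding and no internal encoding even when Encoding.default_internal is UTF-8, which is the behaviour JRuby needed to match.

```ruby
require 'tempfile'

# Hypothetical helper: opens `path` with mode 'rb' while
# Encoding.default_internal is temporarily UTF-8, and returns the
# encodings attached to the File object itself.
def binary_mode_encodings(path)
  saved_internal = Encoding.default_internal
  saved_verbose, $VERBOSE = $VERBOSE, nil # silence the warning Ruby emits under -w
  Encoding.default_internal = Encoding::UTF_8
  File.open(path, 'rb') do |file|
    { external: file.external_encoding, internal: file.internal_encoding }
  end
ensure
  # Restore global state changed above.
  Encoding.default_internal = saved_internal
  $VERBOSE = saved_verbose
end

Tempfile.create('jruby-6572') do |tmp|
  report = binary_mode_encodings(tmp.path)
  puts "file.external_encoding: #{report[:external].inspect}"
  puts "file.internal_encoding: #{report[:internal].inspect}"
end
```

A non-nil internal encoding here would mean that reads are being transcoded, which matches the symptom in the JRuby run.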

        Charles Oliver Nutter added a comment -
        commit 8fecee704ce6c7441fa3d907ee09113f44b89977
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Tue May 15 23:08:38 2012 -0500
        
            Fix JRUBY-6572 and unexclude a bunch of passing tests.
            
            I mostly just traced through the MRI logic for handling binary
            mode and tried to match up code. There's also a few fixes to make
            some of our ported logic match MRI more exactly.
            
            We have accumulated enough fixes that test_io_m17n has many tests
            working now. I removed them from excludes.
        

          People

          • Assignee: Charles Oliver Nutter
          • Reporter: Hao Yuan
          • Votes: 1
          • Watchers: 2
