Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.7.0.pre1
    • Fix Version/s: JRuby 1.7.0.pre2
    • Component/s: Encoding
    • Labels:
      None
    • Environment:
      OSX
    • Number of attachments:
      0

      Description

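      On JRuby, reading a UTF-16LE file with each_char does not return the correct characters. MRI prints one character per line, while JRuby prints an extra blank line after each character: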

      Jeffs-MacBook-Pro:~ jeffwilliams$ ruby -v
      ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-darwin11.2.0]
      Jeffs-MacBook-Pro:~ jeffwilliams$ ruby write-utf16le.rb
      Jeffs-MacBook-Pro:~ jeffwilliams$ od -t x1 test1
      0000000 55 00 54 00 46 00 2d 00 31 00 36 00 4c 00 45 00
      0000020 0a 00
      0000022
      Jeffs-MacBook-Pro:~ jeffwilliams$ ruby read-utf16le.rb
      U
      T
      F
      -
      1
      6
      L
      E

      Jeffs-MacBook-Pro:~ jeffwilliams$ ruby -v
      jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_05) [darwin-x86_64-java]
      Jeffs-MacBook-Pro:~ jeffwilliams$ ruby read-utf16le.rb
      U

      T

      F

      -

      1

      6

      L

      E

      Jeffs-MacBook-Pro:~ jeffwilliams$

      Where the relevant files are:

      write-utf16le.rb:
      File.open("test1", "wb:UTF-16LE") do |f|
        f.puts "UTF-16LE"
      end

      read-utf16le.rb:
      File.open("test1", "rb:UTF-16LE") do |f|
        f.each_char do |char|
          puts char
        end
      end
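
The same write/read pair can be folded into one self-checking script, which is easier to compare across interpreters than terminal output. This is a minimal sketch and not part of the original report: the helper name utf16le_round_trip and the temporary directory are assumptions, and it uses write with an explicit newline rather than puts. On a correct implementation the raw bytes should match the od dump above and each_char should yield nine characters.

```ruby
require "tmpdir"

# Hypothetical helper (not from the report): writes text through a UTF-16LE
# IO, then returns the raw bytes on disk and the characters read back.
def utf16le_round_trip(text)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "test1")
    # The IO's external encoding is UTF-16LE, so the string should be
    # transcoded on the way out.
    File.open(path, "wb:UTF-16LE") { |f| f.write(text) }
    raw = File.binread(path)
    chars = File.open(path, "rb:UTF-16LE") { |f| f.each_char.to_a }
    { raw: raw, chars: chars }
  end
end

result = utf16le_round_trip("UTF-16LE\n")
# Hex dump of the file contents, comparable to the od output.
puts result[:raw].unpack("C*").map { |b| format("%02x", b) }.join(" ")
# Characters as each_char returned them, converted to UTF-8 for display.
puts result[:chars].map { |c| c.encode("UTF-8").inspect }.join(" ")
```

Checking the byte dump and the character list separately shows whether a failure is on the write side, the read side, or both.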

          Activity

          Charles Oliver Nutter added a comment -

          Confirmed on master. Also, the write side of this does not seem to properly write UTF-16 to the file. I had to use MRI to produce the test file.

          Charles Oliver Nutter added a comment -

          Marking as a blocker for 1.7.0 final. We need to be able to read UTF-16 properly.

          Thomas E Enebo added a comment -

          Looked at this a little bit. It appears our "puts" will literally call obj.toString().getByteList(). I see no code path that deals with transcoding as it winds its way toward performing a write(). This is also a little funky in that we dyn-dispatch from puts to write. Even if we are supposed to do this for compatibility reasons, it seems we could check isBuiltin to avoid the actual dispatch (though that is for another time and another bug).

          I will examine MRI a little bit to see if there is a more systemic mechanism for the transcode. I am also going to dig into the read.
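
The puts-to-write dispatch mentioned above is visible from plain Ruby. The sketch below shows only the Ruby-level behaviour, not JRuby's internal Java code path; RecordingIO is a hypothetical class introduced for illustration. It overrides write on a StringIO subclass and records what puts hands to it.

```ruby
require "stringio"

# Hypothetical recorder class: overrides write to log what puts passes along.
class RecordingIO < StringIO
  attr_reader :write_calls

  def initialize(*args)
    super
    @write_calls = []
  end

  # *chunks keeps this compatible with Rubies that pass several strings to
  # a single write call.
  def write(*chunks)
    @write_calls << chunks.map(&:to_s)
    super
  end
end

io = RecordingIO.new
io.puts "UTF-16LE"
puts "write was called #{io.write_calls.size} time(s)"
puts "data seen by write: #{io.write_calls.flatten.join.inspect}"
```

Because the overridden write sees every byte that puts emits, any transcoding has to happen at or below write for it to apply to puts as well.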

          Charles Oliver Nutter added a comment -

          Information from my examination yesterday...

          JRuby mostly mimics MRI's IO API layout. These writes eventually call into io_fwrite(), which does the actual writing to the stream. In 1.9, the content is passed through do_writeconv(), which uses the NEED_WRITECONV macro and numerous other checks to determine whether transcoding is needed and how it should proceed.

          I attempted to wire in a naive version of this logic that only checks whether the incoming bytelist encoding differs from the IO external encoding. That fixed the UTF-16 writing issue, but broke other write cases where transcoding should not happen. We do not appear to have enough flags to indicate whether transcoding should happen, unlike the MRI IO "fptr" structure, which carries them.

          Talked to Tom and he's looking into this today.
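
The naive rule described above can be sketched in a few lines of Ruby. This is an illustration only: naive_writeconv is a hypothetical helper, not the code from JRuby or from MRI's do_writeconv(), and the failing case is one plausible example of a write that should not be transcoded, not necessarily the one that broke.

```ruby
# Hypothetical helper illustrating the naive rule: transcode whenever the
# string's encoding differs from the stream's external encoding.
def naive_writeconv(str, external_encoding)
  return str if external_encoding.nil? || str.encoding == external_encoding
  str.encode(external_encoding)
end

# Case the naive rule fixes: a UTF-8 string headed for a UTF-16LE stream.
fixed = naive_writeconv("UTF-16LE\n", Encoding::UTF_16LE)
puts fixed.unpack("C*").map { |b| format("%02x", b) }.join(" ")

# Case where it goes wrong: a binary stream (external encoding ASCII-8BIT)
# should receive the string's bytes as they are, but the naive rule attempts
# a UTF-8 to ASCII-8BIT conversion, which is undefined for non-ASCII text.
begin
  naive_writeconv("caf\u00e9", Encoding::ASCII_8BIT)
rescue Encoding::UndefinedConversionError => e
  puts "naive rule failed: #{e.class}"
end
```

A single encoding comparison cannot tell these two cases apart, which is why the decision needs the additional per-stream state the comment refers to.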

          Thomas E Enebo added a comment -

          Big bomb of commits this morning fixed this and many other transcoding issues.


            People

            • Assignee:
              Thomas E Enebo
              Reporter:
              Jeff Williams
            • Votes:
              0
              Watchers:
              4

              Dates

              • Created:
                Updated:
                Resolved: