Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.7
    • Fix Version/s: JRuby 1.7.0.pre2
    • Component/s: None
    • Labels:
      None
    • Environment:
      jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java HotSpot(TM) Client VM 1.6.0_23) [Windows 7-x86-java]
    • Number of attachments:
      0

      Description

      Hi,

      I'm seeing different default external encodings between JRuby and MRI,
      which is causing me a bit of grief while converting some win32 code
      to FFI.

      ruby 1.9.3p194 (2012-04-20) [i386-mingw32]

      C:\Users\djberge\Repositories\win32-dir>irb
      irb(main):001:0> Encoding.default_external
      => #<Encoding:IBM437>

      jruby 1.7.0.preview1 (ruby-1.9.3-p203) (2012-05-19 00c8c98) (Java
      HotSpot(TM) Client VM 1.6.0_23) [Windows 7-x86-java]

      C:\Users\djberge\Repositories\win32-dir>jirb
      irb(main):001:0> Encoding.default_external
      => #<Encoding:Windows-1252>

      It seems MRI uses the console code page (IBM437) as the default external
      encoding, while JRuby uses the system encoding (Windows-1252). Which is
      correct?
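      To compare the two runtimes more fully, you can print every encoding Ruby
      derives at startup, not just the default external one. The sketch below
      uses only core `Encoding` methods. The `encoding_report` name is
      hypothetical, and no output is shown because the values depend on the
      console and system code pages of the machine running it.

```ruby
# Hypothetical diagnostic helper: collect the encodings Ruby picks at startup
# so that an MRI run and a JRuby run can be compared side by side.
def encoding_report
  {
    external:       Encoding.default_external,
    internal:       Encoding.default_internal,   # usually nil unless set (e.g. with -E or -U)
    locale:         Encoding.find('locale'),     # encoding derived from the locale charmap
    locale_charmap: Encoding.locale_charmap,     # raw charmap name, may be nil
    filesystem:     Encoding.find('filesystem')  # encoding used for file names
  }
end

encoding_report.each { |name, value| puts "#{name}: #{value.inspect}" }
```

      Running the same script under `ruby` and `jruby` in the same console
      window shows which of these values diverge.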

      Park Heesob was kind enough to link me to a couple of related articles:

      http://en.wikipedia.org/wiki/Code_page
      http://blogs.msdn.com/b/michkap/archive/2005/02/08/369197.aspx

      Regards,

      Dan

        Activity

        Daniel Berger added a comment - edited

        Here's an example of problematic code that I think is caused by this:

        require 'ffi'
        
        class Windows
          extend FFI::Library
          ffi_lib :shell32
        
          attach_function :SHGetFolderPath, :SHGetFolderPathW, [:ulong, :int, :ulong, :ulong, :buffer_out], :ulong
        
          def self.folder
            buf = 0.chr * 1024
            buf.encode!('UTF-16LE')
        
            if SHGetFolderPath(0, 47, 0, 1, buf) != 0
              raise SystemCallError, FFI.errno, "SHGetFolderPath"
            end
        
            buf.encode(Encoding.default_external).strip
          end
        end
        
        p Windows.folder
        
        # MRI
        
        "C:\\Documents and Settings\\All Users\\Start Menu\\Programs\\Administrative Tools"
        
        # JRuby
        
        Encoding::UndefinedConversionError: Input length = 1
          encode at org/jruby/RubyString.java:7446
          folder at folders.rb:17
          (root) at folders.rb:21
        

        With 1.6.7.2 in --1.9 mode I was getting an InvalidByteSequence error, btw.
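        Independent of which default JRuby picks, converting the wide buffer
        straight to a legacy code page such as IBM437 or Windows-1252 can raise
        whenever the path contains a character that code page cannot represent.
        A more defensive conversion is sketched below, assuming the buffer holds
        NUL-terminated UTF-16LE text as SHGetFolderPathW writes it.
        `wide_buffer_to_string` is a hypothetical helper, not part of the ffi
        gem, and replacing unmappable characters with `?` is a design choice
        that trades fidelity for not raising.

```ruby
# Hypothetical helper (not part of the ffi gem): convert a NUL-padded
# UTF-16LE buffer, as filled in by a W-suffixed Win32 call, into a string
# in the target encoding.
def wide_buffer_to_string(buf, target = Encoding.default_external)
  # Treat the raw bytes as UTF-16LE regardless of the buffer's current tag.
  wide = buf.dup.force_encoding(Encoding::UTF_16LE)

  # Keep only the characters before the first UTF-16 NUL terminator.
  nul = "\0".encode(Encoding::UTF_16LE)
  idx = wide.index(nul)
  wide = wide[0, idx] if idx

  # Replace characters the target code page cannot represent instead of raising.
  wide.encode(target, invalid: :replace, undef: :replace, replace: '?')
end
```

        Truncating at the first NUL keeps only the characters the API actually
        wrote, rather than depending on `strip` to clean up the padding.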

        Daniel Berger added a comment -

        Alright, it seems that if I force JRuby to use the ffi gem instead of the ffi that ships with JRuby, the example above works. However, my original point about default string encodings still stands.

        ryenus added a comment - edited

        Java always uses the Windows system locale for its default encoding.

        See http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4153167

        We probably need to use `java.io.Console` (Java 1.6+) via `System.console()`.

        ryenus added a comment -

        Yes, this can work with `java.io.Console`.

        See https://github.com/jruby/jruby/pull/186
        and https://gist.github.com/2819224

        Charles Oliver Nutter added a comment -

        Exploring the native parts of Console a bit...

        Console's constructor initializes the charset (cs field) using Charset.forName(encoding()) where encoding() is a native method.

        Windows version:

        JNIEXPORT jstring JNICALL
        Java_java_io_Console_encoding(JNIEnv *env, jclass cls)
        {
            char buf[64];
            int cp = GetConsoleCP();
            if (cp >= 874 && cp <= 950)
                sprintf(buf, "ms%d", cp);
            else
                sprintf(buf, "cp%d", cp);
            return JNU_NewStringPlatform(env, buf);
        }
        

        UNIX version:

        JNIEXPORT jstring JNICALL
        Java_java_io_Console_encoding(JNIEnv *env, jclass cls)
        {
            return NULL;
        }
        

        So only Windows has special behavior here; the UNIX version just returns NULL. I am somewhat inclined to limit the Console wrangling to Windows.

        I will see if we can pull in ryenus's PR with the additional tweak to only do it on Windows.

        Charles Oliver Nutter added a comment -

        Ok, I went with ryenus's PR plus the following commit:

        commit 9a0113b61de235286703789840b73b2e6209b031
        Author: Charles Oliver Nutter <headius@headius.com>
        Date:   Tue Jul 3 14:01:49 2012 -0500
        
            Additional tweak for JRUBY-6679
            
            Since the special-cased behavior for code page is only necessary
            on Windows, I limited it to Windows. I also modified the exception
            handling to simply allow the null return, which then falls back on
            normal default Java charset.
        
        :100644 100644 b815a6c... e162de4... M	src/org/jruby/runtime/encoding/EncodingService.java
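        The selection logic described in the commit message can be summarized as
        a small decision function. This is a sketch of the described behavior
        only, not the actual EncodingService code; the method and parameter
        names are hypothetical, and charset names are passed as plain strings.

```ruby
# Hypothetical sketch of the behavior described in the commit message;
# not JRuby's actual EncodingService implementation.
def pick_default_external(windows:, console_charset:, default_charset:)
  # Only Windows distinguishes the console code page from the system one.
  return default_charset unless windows

  # A nil console charset (e.g. the lookup returned null) falls back
  # to the normal default charset.
  console_charset || default_charset
end
```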
        

          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Daniel Berger
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved: