JRuby (please use github issues at http://bugs.jruby.org)
  JRUBY-3053

Fix all cases in RubyFile where byte[]-based strings are being passed to Java file APIs as-is

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Core Classes/Modules
    • Labels:
      None
    • Number of attachments :
      3

      Description

      As part of JRUBY-3050, a number of unfixed cases in RubyFile were marked with "UNICODE?". These cases are passing the byte[]-based string to Java File APIs without decoding them, which generally causes unicode filenames to be broken.

      We need to fix all such cases, probably similar to the fixes for JRUBY-3050.
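To make the failure mode concrete, here is a minimal Java sketch; it is not the JRuby code itself. It assumes the Ruby string's bytes are UTF-8 and that passing them "as-is" behaves like a one-byte-per-char ISO-8859-1 decode. `PathDecodingSketch`, `decodePath`, and `passAsIs` are hypothetical names used only for illustration.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

class PathDecodingSketch {
    // Hypothetical helper: decode the raw bytes as UTF-8 before they reach
    // a Java file API. The UTF-8 choice is an assumption of this sketch.
    static String decodePath(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    // Assumed "as-is" behaviour: every byte becomes exactly one char.
    static String passAsIs(byte[] rawBytes) {
        return new String(rawBytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = "build.xml\u00fc".getBytes(StandardCharsets.UTF_8);
        // Decoded: 10 chars, ending in U+00FC (the intended file name).
        System.out.println(new File(decodePath(raw)).getName().length());
        // As-is: 11 chars, ending in U+00C3 U+00BC (a different file name).
        System.out.println(new File(passAsIs(raw)).getName().length());
    }
}
```

The two `File` objects name different files on disk, which is why undecoded unicode filenames tend to break.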

      1. begin_fixing_encoding_for_filenames.patch
        47 kB
        Charles Oliver Nutter
      2. patch_3053_r7980.diff
        20 kB
        Christian Seiler
      3. patch_3053.diff
        27 kB
        Christian Seiler

        Activity

        Charles Oliver Nutter added a comment -

        I went through all cases and they all appeared to be perfectly valid places we needed to use getUnicodeValue. So I changed them all, and all tests run. Hopefully Christian Seiler can whip together a set of tests for us to include, or we could consider reworking all our existing File-related tests to work against files with unicode characters in them.

        Charles Oliver Nutter added a comment -

        Committed fixes in r7951.

        Christian Seiler added a comment -

        Ah, so I'm a little late... Attached all fixes. There were a few changes in RubyDir.java, too. Most of the work went into the test cases, which should be pretty comprehensive now and increase general test coverage a little bit.

        I added two tests which still fail:

        1) Failure:
        test_dirname_extname_unicode(TestFile) [test/test_file.rb:463]:
        <".xml\303\274"> expected but was
        <".xml\374">.

        2) Failure:
        test_openfile_path(TestFile) [test/test_file.rb:448]:
        <"build.xml\303\274"> expected but was
        <"build.xml\374">.

        In these cases the coercion back to UTF-8 probably doesn't happen. I didn't investigate any further for now. You may simply comment them out (since they are new tests and just indicate an existing bug).
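The two byte sequences in these failures can be reproduced with a short Java sketch. It only illustrates the encodings involved and makes no claim about where in JRuby the conversion is missing; `EncodeBackSketch` and `octalEscape` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class EncodeBackSketch {
    // Render bytes the way the test output does: ASCII as-is,
    // everything else as a backslash followed by the octal byte value.
    static String octalEscape(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('\\').append(Integer.toOctalString(v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ext = ".xml\u00fc";
        // Encoded as UTF-8: .xml\303\274 (what the tests expect)
        System.out.println(octalEscape(ext.getBytes(StandardCharsets.UTF_8)));
        // Encoded as ISO-8859-1: .xml\374 (what the tests got)
        System.out.println(octalEscape(ext.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

So `\374` is what results when a correctly decoded Java string is turned back into bytes one char per byte instead of being encoded as UTF-8.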

        And there is a third new test case which fails. I wanted to test the readlink method, which doesn't work on my machine at all. Maybe I don't get what this method is about.

        3) Error:
        test_readlink(TestFile):
        Errno::EINVAL: Invalid argument - invalid argument - build.xml
        test/test_file.rb:666:in `test_readlink'
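One possible explanation, offered as an assumption because the test source is not shown here: POSIX readlink reports EINVAL when the path is not a symbolic link, so calling it on a regular file such as build.xml would fail this way. The sketch below shows the same distinction with `java.nio.file`, an API that postdates this thread; `ReadlinkSketch` and `targetOrNull` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NotLinkException;
import java.nio.file.Path;

class ReadlinkSketch {
    // Returns the target of a symbolic link, or null when the path
    // exists but is not a symbolic link (the EINVAL case in POSIX).
    static Path targetOrNull(Path p) throws IOException {
        try {
            return Files.readSymbolicLink(p);
        } catch (NotLinkException e) {
            return null;
        }
    }
}
```

If that is the cause, a test for readlink would need to create a symlink first and read that, rather than the regular file.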

        As said before, these are new test cases; you can safely comment them out for now. One more thing to mention: I ran the tests on Ubuntu only. Hopefully there are no bad surprises on other platforms.

        Charles Oliver Nutter added a comment -

        Reopening to examine the additional cases. I tried to apply the patch as-is, but it failed many cases.

        Christian Seiler added a comment -

        The current state (as of r7980) is fine for me (attachment_fu), so there is no need to rush that into 1.1.5. I will attach the new diffs against r7980 since your changes in r7951 created a bunch of conflicts (no big deal, just what you expect when two people edit the same lines).

        Of course, I'm curious about the failing tests. Are there more than the ones mentioned above?

        Christian Seiler added a comment -

        Here are my diffs against r7980. One more thing: test_dir.rb reports an error, but it has been doing that since I tried the first time without any changes from my side:

        .Exception in thread "main" java.lang.NoClassDefFoundError: org/jruby/util/Pwd
        Caused by: java.lang.ClassNotFoundException: org.jruby.util.Pwd
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        E............
        Finished in 18.366867 seconds.

        1) Error:
        test_chdir_and_pwd(TestDir):
        NoMethodError: undefined method `strip' for nil:NilClass
        test/test_dir.rb:125:in `test_chdir_and_pwd'
        test/test_dir.rb:114:in `chdir'
        test/test_dir.rb:114:in `test_chdir_and_pwd'

        Obviously some lib or something is missing.

        Charles Oliver Nutter added a comment -

        Ah-ha, I think I found where my problem was. When native functions are available, we use the native system's mkdir (for example) to be able to chmod at the same time. Unfortunately it appears that the native layer we're using (jna-posix) may not be converting the string back into the appropriate encoding for the platform. So the dir names come out garbled and are usually impossible to create.

        If I run test_dir with native functions disabled, I get only one failure in test_dir. However, test_file still seems to have a bunch of failures.

        At this point, I think we have too much to fix for 1.1.5, since this is now getting into jna-posix. I'm going to draw that to Tom Enebo's attention, but I think we need to take a step back and make sure encodings are being handled correctly through the full pipeline.

        Are there any fixes that you think are critical to get into 1.1.5?

        FYI, here are the errors I get on both test_dir and test_file, with and without native enabled:

        ◆ jruby test/test_file.rb
        Loaded suite test/test_file
        Started
        .E.F.E....................E..F.E..........E..E.FFE.......
        Finished in 2.630624 seconds.
        
          1) Error:
        test_chmod(TestFile):
        Errno::ENOENT: No such file or directory - No such file or directory - /Users/headius/projects/jruby/build.xml
            test/test_file.rb:656:in `test_chmod'
        
          2) Failure:
        test_chown_unicode(TestFile) [test/test_file.rb:872]:
        <1> expected but was
        <0>.
        
          3) Error:
        test_dirname_extname_unicode(TestFile):
        Errno::ENOENT: No such file or directory -  No such file or directory - "./testDir_1ü/build.xmlü"
            test/test_file.rb:469:in `test_dirname_extname_unicode'
        
          4) Error:
        test_file_symlink(TestFile):
        Errno::EEXIST: File exists - File exists - build.xml or build.xml.linkü
            test/test_file.rb:675:in `test_file_symlink'
        
          5) Failure:
        test_file_times(TestFile) [test/test_file.rb:693]:
        Exception raised:
        Class: <Errno::ENOENT>
        Message: <"No such file or directory - No such file or directory - /Users/headius/projects/jruby/build.xml\303\274">
        ---Backtrace---
        test/test_file.rb:695:in `test_file_times'
        test/test_file.rb:693:in `test_file_times'
        ---------------
        
          6) Error:
        test_file_truncated_after_changing_directory(TestFile):
        Errno::ENOENT: No such file or directory - No such file or directory - ./testDir_1
            test/test_file.rb:510:in `test_file_truncated_after_changing_directory'
        
          7) Error:
        test_ftype_unicode(TestFile):
        Errno::ENOENT: No such file or directory - No such file or directory - /Users/headius/projects/jruby/build.xml
            test/test_file.rb:477:in `test_ftype_unicode'
        
          8) Error:
        test_mkdir(TestFile):
        Errno::ENOENT: No such file or directory - No such file or directory - dir_tmp
            test/test_file.rb:379:in `test_mkdir'
        
          9) Failure:
        test_openfile_path(TestFile) [test/test_file.rb:448]:
        <"build.xml\303\274"> expected but was
        <"build.xml\374">.
        
         10) Failure:
        test_opening_readonly_file_for_write_raises_eacces(TestFile) [test/test_file.rb:862]:
        <Errno::EACCES> exception expected but none was thrown.
        
         11) Error:
        test_readlink(TestFile):
        Errno::EINVAL: Invalid argument - invalid argument - build.xml
            test/test_file.rb:666:in `test_readlink'
        
        57 tests, 260 assertions, 4 failures, 7 errors
        [headius @ cnutter:~/projects/jruby]
        ◆ jruby -J-Djruby.native.enabled=false test/test_dir.rb
        Loaded suite test/test_dir
        Started
        .F............
        Finished in 2.981045 seconds.
        
          1) Failure:
        test_chdir_and_pwd(TestDir)
            [test/test_dir.rb:125:in `test_chdir_and_pwd'
             test/test_dir.rb:114:in `chdir'
             test/test_dir.rb:114:in `test_chdir_and_pwd']:
        <"testDir\303\234_4"> expected but was
        <"testDirU?_4">.
        
        14 tests, 25 assertions, 1 failures, 0 errors
        [headius @ cnutter:~/projects/jruby]
        ◆ jruby -J-Djruby.native.enabled=false test/test_file.rb
        Loaded suite test/test_file
        Started
        .FFF.F.........................................F.E......F
        Finished in 3.093983 seconds.
        
          1) Failure:
        test_chmod(TestFile) [test/test_file.rb:658]:
        <33206> expected to be != to
        <33206>.
        
          2) Failure:
        test_chown_accepts_nil_and_minus_one(TestFile) [test/test_file.rb:877]:
        <1> expected but was
        <0>.
        
          3) Failure:
        test_chown_unicode(TestFile) [test/test_file.rb:872]:
        <1> expected but was
        <0>.
        
          4) Failure:
        test_dirname_extname_unicode(TestFile) [test/test_file.rb:463]:
        <".xml\303\274"> expected but was
        <".xml\374">.
        
          5) Failure:
        test_openfile_path(TestFile) [test/test_file.rb:448]:
        <"build.xml\303\274"> expected but was
        <"build.xml\374">.
        
          6) Error:
        test_readlink(TestFile):
        Errno::EINVAL: Invalid argument - invalid argument - build.xml
            test/test_file.rb:666:in `test_readlink'
        
          7) Failure:
        test_umask_noarg_does_not_zero(TestFile) [test/test_file.rb:908]:
        <128> expected but was
        <0>.
        
        57 tests, 267 assertions, 6 failures, 1 errors
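A note on the test_dir failure above, where "testDir\303\234_4" came back as "testDirU?_4". One plausible reading, stated as an assumption and not confirmed anywhere in this thread, is Unicode normalization: the Mac OS X file system stores names in decomposed form, so "Ü" comes back as "U" plus a combining diaeresis, and the combining mark turns into "?" when the name is encoded with a charset that cannot represent it. The Java sketch below reproduces exactly that string; `DecomposedNameSketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

class DecomposedNameSketch {
    // Canonical decomposition (NFD): U+00DC becomes 'U' followed by U+0308.
    static String decompose(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD);
    }

    // Round trip through ISO-8859-1; unmappable chars are replaced with '?'.
    static String toLatin1Lossy(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String created = "testDir\u00dc_4";
        // prints testDirU?_4
        System.out.println(toLatin1Lossy(decompose(created)));
    }
}
```

If this is what happens, comparing names after normalizing both sides to the same form (for example with `Normalizer.Form.NFC`) would be one way to make such a test platform-independent.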
        
        Charles Oliver Nutter added a comment -

        BTW, what platform are you running this on? It's odd that you wouldn't have native support, so it must be a more obscure setup.

        Christian Seiler added a comment -

        It's Ubuntu 8.04. Maybe my builds are broken in regards to JNA? I'll pick up a dist from the CI server and check out if r7951 changes work with that build, too (i.e. my issues with attachment_fu are fixed). If this works, I don't have any need to push further changes into 1.1.5.

        BTW errors 4), 5) and 6) in your second test (without JNA) happen on my system too (as I've written above). 4) and 5) expose as-yet-unfixed bugs; I wasn't quite sure how to fix them for now (likely some missing coercion of UTF-16 back to UTF-8 or something like that). These are simply new test cases from my side (and can be deactivated for 1.1.5).

        6) is also caused by a new test case. I simply wanted to have the readlink method tested, but for some reason the test fails. Maybe I don't get what readlink is about...

        Christian Seiler added a comment -

        Update: It seems my builds work fine. Running test_file.rb without JNA I get exactly the same results/errors as you (I explained errors 4-6 in my previous comment, the other errors simply happen because of missing functionality when JNA is disabled).

        My theory: The native / POSIX layer seems to work fine with my setup unicode-wise, and doesn't with yours (Mac OS X, I assume). How does MRI behave? Maybe you can try:

        ruby test/test_file.rb --name test_dir_enumerable

        Works with MRI on my machine.

        Charles Oliver Nutter added a comment -

        That must be the problem here then...after applying your patch, MRI can't run test_file cleanly either. So I suspect we have a case here where just translating to and from UTF-8 is no longer the right path.

        I'm going to play with it a bit and see if I can track down what's going on.

        Christian Seiler added a comment -

        Yep, RubyFile#getUnicodeValue simply assumes the incoming strings to be encoded as UTF-8, which is probably not the case in all environments. The "locale" command shows that my Ubuntu box uses UTF-8.

        There should be a way to find out the appropriate encoding, shouldn't there?
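On the JVM side, one way to look up a candidate encoding is sketched below. `Charset.defaultCharset()` and the `file.encoding` system property are standard. Whether either matches what the file system or the native layer actually expects is an open assumption, and `sun.jnu.encoding`, mentioned in the usage note, is an undocumented, JVM-specific property. `PlatformEncodingSketch` and its methods are hypothetical names.

```java
import java.nio.charset.Charset;

class PlatformEncodingSketch {
    // The JVM's default charset, normally derived from the locale at startup.
    static Charset platformDefault() {
        return Charset.defaultCharset();
    }

    // Charset named by a system property, falling back to the JVM default
    // when the property is unset or does not name a supported charset.
    static Charset fromProperty(String property) {
        String name = System.getProperty(property);
        if (name != null) {
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: fall through to the default.
            }
        }
        return Charset.defaultCharset();
    }
}
```

For example, `fromProperty("file.encoding")` or `fromProperty("sun.jnu.encoding")` could be compared against the output of the "locale" command on the same machine.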

        BTW you can easily play with the special character; it's defined as a constant at the top of the test classes. As an easy "fix" to make the tests succeed for now, you could simply use a trivial character.

        Charles Oliver Nutter added a comment -

        It seems like this is some problem on OS X. When I run with native enabled, I'm getting a ton of errors, almost all cases where it calls into our POSIX layer. I think we're going to have to punt on this and see if we can figure out what's wrong in the POSIX layer or in JNA.

        The changes are all good, but seem to have exposed a problem in the POSIX layer. Sorry that we won't be able to get these additional changes into 1.1.5.

        Charles Oliver Nutter added a comment -

        Will examine this and the possibility of making getUnicodeValue, or a new method, use the default platform encoding. That should fix the remaining issues.

        Charles Oliver Nutter added a comment -

        I made another valiant attempt at this. It seems to be a very tricky problem to solve.

        My effort is attached. Largely it moves from assuming unicode to using platform-default encoding for everything. Unfortunately this seemed to help some things and failed to help others. It worked well when the loaded file was encoded in the same encoding, but it was extremely easy for it to get messed up, since both NetBeans and TextEdit chose some other encoding. There also seems to be some trouble between Java APIs and JNA-based APIs...one seemed to like the normal "encode" method's output when the other did not.

        I think this is a larger issue than simply file APIs. Our string transcoding logic, throughout JRuby, works in many different ways. Sometimes we use platform default, sometimes UTF-8, sometimes "plain" or ISO-8859-1. There's very little rhyme or reason to any of it.

        I believe what we need here is to make some real decisions about encoding and look at each place in JRuby where we transcode to decide how to support those decisions. For example:

        1. What encoding do we assume files are in if no -K is specified? Platform default? UTF-8? There's no one right answer, so we need one that's reasonably consistent and easy to follow. Currently we appear to assume ISO-8859-1, which does not always handle extended characters properly. Platform default on Mac would be MacRoman, but we have no jcodings support for that or other platform-specific encodings.
        2. We must decode strings from source consistently using the encoding we decide on for #1. This means that in 1.8.6 mode we should always represent strings in the same encoding, and always decode them as that encoding. In 1.9 mode, our default encoding should be whatever we choose in #1, and otherwise we use the specified encoding for a file or individual string.
        3. When calling Java APIs, our properly-decoded strings should work fine...this appeared to be the case on every instance I worked on last night.
        4. When calling native APIs, we need to make sure that JNA/JFFI are properly encoding the string on the way out to whatever the platform API expects. In this case, it was unclear whether that should be UTF-8 or the extended ISO-8859-1 variant that TextEdit saves as.

        I leave this comment here because filesystem interaction is probably the most common place where we see extended character problems.
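As a rough illustration of what one consistent decision could look like, the Java sketch below routes every path conversion through a single object that holds the chosen charset. This is a hypothetical design, not how JRuby is or was implemented; `PathTranscoder` and its methods are invented names, and which charset to pass in is exactly the open question from point 1.

```java
import java.nio.charset.Charset;

final class PathTranscoder {
    private final Charset charset;

    PathTranscoder(Charset charset) {
        this.charset = charset;
    }

    // Ruby string bytes -> Java String, for use with Java file APIs.
    String decode(byte[] rawBytes) {
        return new String(rawBytes, charset);
    }

    // Java String -> bytes, for results handed back to Ruby code.
    byte[] encode(String path) {
        return path.getBytes(charset);
    }

    // True when the charset can represent the path without loss.
    boolean roundTrips(String path) {
        return decode(encode(path)).equals(path);
    }
}
```

With `StandardCharsets.UTF_8`, `roundTrips("build.xml\u00fc")` holds, while with `StandardCharsets.ISO_8859_1` a name containing "\u20ac" (the euro sign) does not survive. That is a small instance of ISO-8859-1 not always handling extended characters properly.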

        Charles Oliver Nutter added a comment -

        I'm afraid this is still a bigger job than we can do right now. I think along with continued encoding work in 1.3 we'll be forcing the issue of what encoding strings are "actually" in, and along with this we'll start transcoding everything the right way. This bug will serve as the marker that we need to propagate encoding-related fixes/cleanup to do the right thing when calling Java APIs.

        Charles Oliver Nutter added a comment -

        Punt again, and I feel bad for doing so (I really do).

        Thomas E Enebo added a comment -

        Another punt unfortunately...

        Charles Oliver Nutter added a comment -

        Most of these had to be fixed to support 1.9 encoding stuff, and any remaining issues should be or will be reported as individual issues.


          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Charles Oliver Nutter
          • Votes:
            0
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved: