Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: JRuby 1.1.2
Fix Version/s: JRuby 1.1.3
Component/s: Interpreter
Labels: None
Environment: WinXP SP2
Description
I'm a Chinese user, and my system language is set to Simplified Chinese.
For example, here is test.rb:
#------------------------------
ARGV.each do |arg|
  puts "#{arg}\n"
end
#-----------------------------
Below, I use @ in place of each Simplified Chinese character, in case your system has no Chinese support.
Suppose test.rb is located in D:\@@\
The command is: jruby.bat D:\@@\test.rb arg1 @@@ arg3
JRuby then tells me that it can't find D:\??\test.rb. As you can see, JRuby replaces each Chinese character with a ?, which causes the error.
Next, I moved test.rb to D:\test\ and ran: jruby.bat D:\test\test.rb arg1 @@@ arg3
Now the output is:
arg1
???
arg3
This bug seems to affect every case that involves command-line arguments.
Yesterday I downloaded the source code and found something that may be useful (I'm on another machine now, so I'm writing this from memory):
In org\jruby\utils\RubyFile.java, there is a createfile method. After the line "filepath=new String(filepath.getBytes("ISO-8859-1"),"UTF-8")", the Chinese characters in the String "filepath" have been replaced with '?'. I don't understand the purpose of this line, but I know that ISO-8859-1 (Latin-1) is a single-byte encoding for Western European languages. It has no code points for Chinese characters, so when "filepath" contains one, getBytes() cannot encode it and puts a '?' in the returned bytes instead.
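To see why the paths break, here is a minimal standalone Java sketch of the conversion quoted from RubyFile.java. It is not JRuby's actual code, and the class and method names are hypothetical. The first case uses a path that already contains Chinese characters: ISO-8859-1 cannot represent them, so getBytes() substitutes '?' for each one. The second case is only a guess at why the line exists: if every char of the String holds one raw UTF-8 byte, the same conversion restores the original text.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {
    // Hypothetical helper mirroring the quoted line:
    // re-encode as ISO-8859-1, then decode the resulting bytes as UTF-8.
    static String reinterpretAsUtf8(String s) {
        // Chars above U+00FF are unmappable in Latin-1 and become '?' (0x3F).
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // D:\中文\test.rb, written with escapes so the source stays ASCII
        String path = "D:\\" + "\u4e2d\u6587" + "\\test.rb";

        // Case 1: the String holds real Chinese characters, so they are lost.
        System.out.println(reinterpretAsUtf8(path)); // D:\??\test.rb

        // Case 2 (assumed purpose): each char holds one raw UTF-8 byte,
        // e.g. because the bytes were earlier decoded as ISO-8859-1.
        String bytesAsChars = new String(path.getBytes(StandardCharsets.UTF_8),
                                         StandardCharsets.ISO_8859_1);
        System.out.println(reinterpretAsUtf8(bytesAsChars).equals(path)); // true
    }
}
```

If the guess in case 2 is right, the line only works for strings produced that way, which would explain why commenting it out fixed paths that arrive with proper Chinese characters.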
When I commented out this line, it worked: JRuby could run .rb files whose paths contain Chinese characters. But perhaps this line is needed somewhere else; I hope you can tell me.
Now the only remaining problem is the arguments.
I traced it from org\jruby\RubyGlobal.java into build_lib\ByteList.jar, for which I couldn't find the source code (why?). I'm now sure the problem lies in ByteList.jar, and that it is something like "getBytes("ISO-8859-1")".
In org\jruby\RubyGlobal.java, there's a RubyArray called argvArray; the program adds each argument to it and then defines it as "ARGV". Right after "argvArray.add(runtime.newString(argv[i]))", the Chinese characters in argv[i] have been replaced with '?'. I tried stepping into runtime.newString, but found nothing, because I could only trace it as far as ByteList.jar.
Then I replaced that line with simply "argvArray.add(argv[i])". On my WinXP SP2 machine it still didn't work (with the same example, the output was arg1\n<TextOfAMess>\narg3\n), but things were getting better because the '?'s were gone. Later I restarted my computer into Linux, whose locale is UTF-8, and to my surprise everything worked. Since my WinXP locale is GB2312 (Simplified Chinese), maybe JRuby had converted the arguments in ARGV to UTF-8, and the problem on XP is simply that it can't display them correctly.
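The guess about XP can be illustrated with a small Java sketch (an illustration only, not JRuby code). The same UTF-8 bytes display correctly when the console also decodes them as UTF-8, as on the Linux machine, but come out garbled when the console decodes them as GBK, the Windows code page 936 used for Simplified Chinese and a superset of GB2312. The class and helper names are hypothetical, and the sketch assumes the JVM includes the GBK charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleMojibake {
    // Hypothetical helper: encode text with one charset, then decode the
    // bytes with another, as a console using a different encoding would.
    static String render(String text, Charset written, Charset console) {
        return new String(text.getBytes(written), console);
    }

    public static void main(String[] args) {
        String arg = "\u4e2d\u6587"; // 中文 ("Chinese")
        // Assumes the GBK charset is available (standard JDK builds ship it).
        Charset gbk = Charset.forName("GBK");

        // Linux with a UTF-8 locale: bytes and console agree.
        System.out.println(render(arg, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(arg)); // true
        // Simplified Chinese XP console: UTF-8 bytes read as GBK are garbled.
        System.out.println(render(arg, StandardCharsets.UTF_8, gbk).equals(arg)); // false
    }
}
```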
But if I use an argument as a path, a more complex problem occurs.
For example: jruby.bat test2.rb "D:\test\@@\readme.txt"
test2.rb tries to read the contents of readme.txt, but even on Linux, JRuby tells me it can't find D:\test\<TextOfAMess>\readme.txt.
At that point I had to stop, because things were worse than I had expected.
My English is poor, but I hope you can understand what I'm writing about. Thanks for reading, and thanks for your help.
Very helpful, thank you! I think we should definitely modify that filePath line to use whatever the host encoding is. As for the arguments, I think the only issue is that when feeding them into ARGV we need to make sure they're converted from the host encoding to UTF-8, rather than assuming they're coming in as ISO-8859-1 as on Western systems.
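The proposed ARGV change can be sketched in plain Java. This is a minimal sketch, not JRuby's actual patch, and all names are hypothetical. It assumes that by the time argv[i] reaches RubyGlobal, the JVM has already decoded it from the host encoding into a Java String, so the remaining step is to encode that String as UTF-8 rather than ISO-8859-1 when building the Ruby string's bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ArgvEncoding {
    // Hypothetical: the old behaviour, treating each char as a Latin-1 byte.
    // Any char above U+00FF (such as a Chinese character) becomes '?' (63).
    static byte[] argToBytesLatin1(String arg) {
        return arg.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical: the proposed behaviour. The String already holds the
    // correct characters, so encode them as UTF-8 for the Ruby string.
    static byte[] argToBytesUtf8(String arg) {
        return arg.getBytes(StandardCharsets.UTF_8);
    }

    // Turning those bytes back into a Java String (for example, to open a
    // file named by an ARGV entry) must decode them as UTF-8 as well.
    static String bytesToJavaString(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String arg = "\u4e2d"; // 中

        System.out.println(Arrays.toString(argToBytesLatin1(arg))); // [63]
        System.out.println(Arrays.toString(argToBytesUtf8(arg)));   // [-28, -72, -83]
        System.out.println(bytesToJavaString(argToBytesUtf8(arg)).equals(arg)); // true
    }
}
```

The last line hints at the path case from the report: if the UTF-8 bytes were decoded with any other charset on the way back to a Java String, the resulting path would no longer match the file on disk.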
This shouldn't be hard to fix, so I'm marking it for 1.1.3 and we'll try to get it in soon. Thank you for the report!