JRuby

Iconv character set option //translit is not supported

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: JRuby 0.9.8
  • Fix Version/s: JRuby 1.1RC2
  • Component/s: Core Classes/Modules
  • Labels:
    None
  • Environment:
    OS X 10.4.9, Intel Core 2 Duo (MacBook Pro)
  • Number of attachments:
    0

Description

When 'ascii//ignore//translit' is passed as one of the character sets in a conversion, Iconv under JRuby does not recognize the //translit (allow transliteration) and //ignore (ignore conversion errors) options appended to the encoding name. Instead it raises java.nio.charset.IllegalCharsetNameException. Stack trace:

java.nio.charset.IllegalCharsetNameException: ascii//ignore//translit

Charset.java:285:in `java.nio.charset.Charset.checkName'
Charset.java:459:in `java.nio.charset.Charset.lookup2'
Charset.java:438:in `java.nio.charset.Charset.lookup'
Charset.java:497:in `java.nio.charset.Charset.forName'
app/models/question.rb:4:in `call'
app/controllers/questions_controller.rb:58:in `transaction'
app/controllers/questions_controller.rb:58:in `create'
app/controllers/questions_controller.rb:60:in `create'

The same Iconv call works under the standard Ruby 1.8.5 interpreter (MRI).
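The root cause can be reproduced outside JRuby with nothing but the JDK. Java charset names may contain only letters, digits, '-', '+', ':', '_' and '.', so the '/' in an iconv-style name is rejected as syntactically illegal during the lookup that Charset.forName performs (the checkName frame at the top of the trace). The following is a minimal sketch. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetNameCheck {
    /** Returns true if Charset.forName rejects the name as syntactically illegal. */
    static boolean isIllegalName(String name) {
        try {
            Charset.forName(name);
            return false;
        } catch (IllegalCharsetNameException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // '/' is not a legal charset-name character, so the suffixed name fails.
        System.out.println(isIllegalName("ascii//ignore//translit"));
        // The bare name is legal and resolves to US-ASCII.
        System.out.println(isIllegalName("ascii"));
    }
}
```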

Activity

Charles Oliver Nutter added a comment -

The problem here is that we aren't even handling this formatted input string for encodings. This is how iconv specifies appropriate settings for the conversion, much like Java's charset stuff does. We just need to find all the options, handle the parsing, and set up charset appropriately.

Finish for 1.0...shouldn't be a huge deal now that we know about it, but any information on the various iconv settings would speed the process along.
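The parsing step described above might look roughly like the following. This is a minimal sketch, not JRuby's actual code. The IconvName class, its fields, and the choice to reject unknown options are all hypothetical. It splits the name on "//" so that only the bare encoding name would ever be handed to Java's charset lookup.

```java
import java.util.Locale;

/**
 * Hypothetical helper: splits an iconv-style encoding name such as
 * "ascii//ignore//translit" into the bare charset name and option flags.
 */
public class IconvName {
    final String charsetName; // bare encoding name, e.g. "ascii"
    final boolean ignore;     // true if //IGNORE was given
    final boolean translit;   // true if //TRANSLIT was given

    IconvName(String charsetName, boolean ignore, boolean translit) {
        this.charsetName = charsetName;
        this.ignore = ignore;
        this.translit = translit;
    }

    static IconvName parse(String spec) {
        String[] parts = spec.split("//");
        boolean ignore = false;
        boolean translit = false;
        // This sketch treats options case-insensitively and in any order.
        for (int i = 1; i < parts.length; i++) {
            String opt = parts[i].toUpperCase(Locale.ROOT);
            if (opt.equals("IGNORE")) {
                ignore = true;
            } else if (opt.equals("TRANSLIT")) {
                translit = true;
            } else if (!opt.isEmpty()) {
                // Design choice for this sketch: reject options it does not know.
                throw new IllegalArgumentException("unknown iconv option: " + parts[i]);
            }
        }
        return new IconvName(parts[0], ignore, translit);
    }

    public static void main(String[] args) {
        IconvName n = parse("ascii//ignore//translit");
        System.out.println(n.charsetName + " ignore=" + n.ignore + " translit=" + n.translit);
    }
}
```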

Thomas E Enebo added a comment -

Bumping (to be done with other iconv issues)

Thomas E Enebo added a comment -

Soichiro Ohba added support for //ignore. Transliteration appears to be a tougher problem. Marking this as a 1.x feature, since we will not get it done before the final 1.0 release (unless someone knows an easy way to add transliteration).
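The comment does not say how //ignore support was implemented. One plausible mapping onto Java's charset API is to configure the encoder with CodingErrorAction.IGNORE, so that characters the target cannot represent are dropped rather than reported. The following is a minimal sketch under that assumption. The IgnoreEncode class and its encode method are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class IgnoreEncode {
    /**
     * Encodes text into the target charset. With ignore=true, characters the
     * target cannot represent are silently dropped (roughly what //IGNORE asks
     * for). Otherwise the encoder reports them as a CharacterCodingException.
     */
    static String encode(String text, Charset target, boolean ignore)
            throws CharacterCodingException {
        CodingErrorAction action = ignore ? CodingErrorAction.IGNORE : CodingErrorAction.REPORT;
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
        // Decode back only so the result is easy to print and compare.
        return target.decode(bytes).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        Charset ascii = Charset.forName("US-ASCII");
        // The accented e has no US-ASCII mapping and is dropped.
        System.out.println(encode("caf\u00e9", ascii, true)); // prints "caf"
        try {
            encode("caf\u00e9", ascii, false);
        } catch (CharacterCodingException e) {
            System.out.println("strict mode rejected the input: " + e);
        }
    }
}
```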

Charles Oliver Nutter added a comment -

Revisit for 1.1. Maybe not fixed, but we should give it a go.

Thomas E Enebo added a comment -

Unless we can find an outside library that does transliteration, I don't think we can do this, since Java does not support it.

Charles Oliver Nutter added a comment -

Is any additional work needed to make the iconv string at least not blow up? Perhaps we could ignore the translit part or display a warning and continue.

Koichiro Ohba added a comment -

How about the use of ICU4J?
http://www.icu-project.org/index.html

Charles Oliver Nutter added a comment -

Very interesting Koichiro...I did not know this project existed. Would it provide the missing functionality we need? Is there any down side to using it instead of the Java Charset classes?

Koichiro Ohba added a comment -

Yes, ICU4J would provide the translit functionality for Iconv on JRuby.
(For further information, see http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/Transliterator.html )

But there is a downside: it's very large (icu4j.jar is 4.4 MB, icu4j-charsets.jar is 2.4 MB).
It might be overkill to include such a large library just for the translit functionality, which is rarely used.
I think we can just avoid throwing fatal errors and leave translit unsupported.

Charles Oliver Nutter added a comment -

I would be satisfied if we can fix it to just ignore the translit and perhaps warn. If anyone complains about translit missing we'll bother them to write us a gem-based Iconv extension with ICU4J.
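The "ignore translit and perhaps warn" approach could be sketched as a lenient encoder lookup. It honours //IGNORE, drops //TRANSLIT with a warning on stderr, and passes only the bare name to Charset.forName. This is a hypothetical illustration, not JRuby's actual fix. The thread only says that the values are ignored, not whether a warning is printed. The LenientIconv class, its method, and the warning text are all assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;

/**
 * Hypothetical lenient lookup: accepts an iconv-style target such as
 * "ascii//ignore//translit", honours //IGNORE, and drops //TRANSLIT with a
 * warning instead of failing.
 */
public class LenientIconv {
    static CharsetEncoder encoderFor(String spec) {
        String[] parts = spec.split("//");
        boolean ignore = false;
        for (int i = 1; i < parts.length; i++) {
            String opt = parts[i].toUpperCase(Locale.ROOT);
            if (opt.equals("IGNORE")) {
                ignore = true;
            } else if (opt.equals("TRANSLIT")) {
                // No transliteration available: warn and continue without it.
                System.err.println("warning: //TRANSLIT is not supported, ignoring it in \"" + spec + "\"");
            }
            // Any other option is silently skipped in this sketch.
        }
        CodingErrorAction action = ignore ? CodingErrorAction.IGNORE : CodingErrorAction.REPORT;
        // Only the bare name reaches Charset.forName, so the '/' suffixes can
        // no longer trigger IllegalCharsetNameException.
        return Charset.forName(parts[0]).newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
    }

    public static void main(String[] args) {
        CharsetEncoder enc = encoderFor("ascii//ignore//translit");
        System.out.println(enc.charset().name() + " " + enc.unmappableCharacterAction());
    }
}
```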

Ola Bini added a comment -

Have added tests to trunk to make sure that we actually ignore these values.

