JRuby (please use github issues at http://bugs.jruby.org)
  JRUBY-6637

String encoding difference between jruby --1.9 and MRI 1.9 (force_encoding a no-op in this case?)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.7, JRuby 1.7.0.pre1
    • Fix Version/s: JRuby 1.7.0.pre1
    • Component/s: Encoding
    • Labels:
    • Environment:
      OSX 10.6.8, jruby 1.7.0.dev (ruby-1.9.3-p139) (2012-04-24 6a30d22) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_31) [darwin-x86_64-java] and
    • Number of attachments: 1

      Description

      Under JRuby, strings seem to stay Unicode despite an attempt to force another encoding; under MRI they do not. For example:

      cp866 = "\x8D".force_encoding("IBM866") # "\x8D" under MRI, "\u008D" under jruby
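      A minimal sketch of what the line above is expected to do on MRI, for comparison. It assumes a UTF-8 source file; the variable `utf8` and the `.dup` call (which avoids mutating a string literal on newer Rubies) are additions for illustration and are not part of the attached script. Byte 0x8D in IBM866 (CP866) is the Cyrillic capital letter EN, U+041D.

```ruby
# force_encoding only relabels the bytes; it does not change them.
cp866 = "\x8D".dup.force_encoding("IBM866")

cp866.encoding          # Encoding::IBM866
cp866.bytes             # [141], the single original byte 0x8D
cp866.valid_encoding?   # true, 0x8D is a valid IBM866 character

# encode performs the actual transcoding to UTF-8.
utf8 = cp866.encode("UTF-8")

utf8.encoding           # Encoding::UTF_8
utf8.bytes              # [208, 157], i.e. 0xD0 0x9D
utf8 == "\u041D"        # true
```

      If `cp866.bytes` is unchanged but `utf8` is not U+041D, the problem is in transcoding rather than in `force_encoding` itself.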

      See the attached script for a longer explanation and sample code from jrochkind.

      I don't understand why the attached script is problematic but the encoding tests in the jruby source work.

      Unfortunately, I don't really know what I'm doing – just trying to patch a gem and noticing the difference.

        Activity

        Thomas E Enebo added a comment -

        The first output from running this script appears to be fixed now: inspect prints "\x8D" instead of a Unicode escape. However, the encode("UTF-8") call and everything below it are still wrong. It is as if we are not properly transcoding this value and are just slapping some byte in front of it.

        Thomas E Enebo added a comment -

        Fixed [master da288e0]. The root of this problem was that our encoding object still remembered it was IBM866, but it was associated with a more generic encoding (ISO-8859-1). As a result, we lost the ability to transcode the bytes that IBM866 does not share with ISO-8859-1. This would have affected the ability of many encodings to transcode non-ASCII characters.
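        To illustrate why the choice of source encoding matters, here is a small sketch that transcodes the same byte twice, once labeled IBM866 and once labeled ISO-8859-1. This is plain Ruby, not JRuby internals, and the assumption that the broken output matched the ISO-8859-1 result is inferred from the comments in this issue rather than confirmed by it. The variable names are hypothetical.

```ruby
byte = "\x8D"

# Correct interpretation: 0x8D in IBM866 is U+041D (Cyrillic capital EN).
as_cp866 = byte.dup.force_encoding("IBM866").encode("UTF-8")

# Generic interpretation: 0x8D in ISO-8859-1 is the control character U+008D.
as_latin1 = byte.dup.force_encoding("ISO-8859-1").encode("UTF-8")

as_cp866.bytes    # [208, 157]
as_latin1.bytes   # [194, 141], the original byte with 0xC2 in front of it

as_cp866 == as_latin1   # false
```

        The ISO-8859-1 result keeps the original byte and prepends 0xC2, which resembles the "slapping some byte in front of it" symptom described earlier in this issue.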

        If you have any other issues please open a new issue.


          People

          • Assignee: Thomas E Enebo
          • Reporter: Bill Dueber
          • Votes: 0
          • Watchers: 1
