Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.5
    • Fix Version/s: JRuby 1.7.0.pre1
    • Component/s: Ruby 1.9.2
    • Labels:
      None
    • Environment:
      Mac OSX Lion
      Jruby 1.6.5.1
      JRUBY_OPTS="--1.9"
    • Number of attachments :
      0

      Description

      File.open accepts the mode string "r:bom|utf-8" in 1.8 mode, but not in 1.9 mode.

      I ran into this while using the Hoe library (see https://github.com/seattlerb/hoe/blob/master/lib/hoe.rb, method read_utf), which works in 1.8 mode but not in 1.9 mode.

      `File.open 'Manifest.txt', "r:bom|utf-8"` raises the following error:

      ArgumentError: unknown encoding name - bom|utf-8
      from org/jruby/RubyIO.java:1066:in `set_encoding'
      from org/jruby/RubyFile.java:464:in `initialize'
      from org/jruby/RubyIO.java:1111:in `open'
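
For comparison, here is a minimal sketch of the behavior I would expect, based on how MRI 1.9 treats the same mode string. The helper names (write_bom_file, read_with_mode) and the file contents are made up for this report and are not part of Hoe or JRuby.

```ruby
require "tmpdir"

# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = [0xEF, 0xBB, 0xBF].pack("C*")

# Hypothetical helper: writes a UTF-8 BOM followed by the given text.
def write_bom_file(path, text)
  File.open(path, "wb") do |f|
    f.write(UTF8_BOM)
    f.write(text)
  end
end

# Hypothetical helper: reads the whole file using the given mode string.
def read_with_mode(path, mode)
  File.open(path, mode) { |f| f.read }
end

if __FILE__ == $0
  Dir.mktmpdir do |dir|
    path = File.join(dir, "Manifest.txt")
    write_bom_file(path, "lib/hoe.rb\n")

    # With a plain "r:utf-8" the BOM is kept as the first character (U+FEFF).
    p read_with_mode(path, "r:utf-8")

    # With "r:bom|utf-8" MRI strips the BOM; JRuby 1.6.5 in 1.9 mode raises
    # ArgumentError here instead.
    p read_with_mode(path, "r:bom|utf-8")
  end
end
```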

        Activity

        Matt Hauck added a comment -

        To clarify: it appears that these "bom|" file encodings are not supported at all in 1.8 mode, and that they have not yet been added in 1.9 mode.

        Charles Oliver Nutter added a comment - edited

        This is still a problem. From what I can gather from the MRI sources, the bom| prefix indicates that the file's BOM should be used to detect its external encoding, falling back on the given encoding if no BOM is present. Here are the docs from MRI's IO.open:

         *  If ext_enc starts with 'BOM|', check whether the input has a BOM. If
         *  there is a BOM, strip it and set external encoding as
         *  what the BOM tells. If there is no BOM, use ext_enc without 'BOM|'.
        

        We also can't simply ignore this, since ignoring it would mean we read the file's BOM (in the case of e.g. UTF-16) as normal bytes.
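
The rule quoted above could be sketched in plain Ruby roughly as follows. This is only an illustration under some assumptions: the helper names (split_bom_prefix, detect_bom, read_with_bom) are hypothetical, the BOM table lists the usual Unicode BOMs rather than being copied from MRI, and the whole file is read into memory instead of being handled at the IO level as a real fix would need to.

```ruby
# BOM byte sequences and the encodings they indicate. The 4-byte BOMs come
# first because the UTF-32LE BOM starts with the same bytes as UTF-16LE's.
BOM_TABLE = [
  [[0x00, 0x00, 0xFE, 0xFF], Encoding::UTF_32BE],
  [[0xFF, 0xFE, 0x00, 0x00], Encoding::UTF_32LE],
  [[0xEF, 0xBB, 0xBF],       Encoding::UTF_8],
  [[0xFE, 0xFF],             Encoding::UTF_16BE],
  [[0xFF, 0xFE],             Encoding::UTF_16LE],
].freeze

# Hypothetical helper: "bom|utf-8" => [true, "utf-8"]; "utf-8" => [false, "utf-8"].
def split_bom_prefix(ext_enc)
  stripped = ext_enc.sub(/\Abom\|/i, "")
  [stripped != ext_enc, stripped]
end

# Hypothetical helper: given the leading bytes of the input, returns
# [encoding, number_of_bom_bytes_to_skip]. Without a BOM it falls back
# to the named encoding and skips nothing.
def detect_bom(head_bytes, fallback_name)
  BOM_TABLE.each do |bom, enc|
    return [enc, bom.length] if head_bytes[0, bom.length] == bom
  end
  [Encoding.find(fallback_name), 0]
end

# Hypothetical helper: reads a file the way a "r:bom|..." mode is described.
def read_with_bom(path, ext_enc)
  use_bom, name = split_bom_prefix(ext_enc)
  data = File.open(path, "rb") { |f| f.read }
  if use_bom
    enc, skip = detect_bom(data.byteslice(0, 4).bytes.to_a, name)
  else
    enc, skip = Encoding.find(name), 0
  end
  # Drop the BOM bytes (if any) and tag the rest with the chosen encoding.
  data.byteslice(skip, data.bytesize - skip).force_encoding(enc)
end
```

If the prefix were simply dropped and the file opened with the fallback encoding, the skip step above would never happen, which is the "BOM read as normal bytes" problem.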

        Charles Oliver Nutter added a comment -

        This should have been marked fixed in 1.7.0.pre1.


          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            Matt Hauck
          • Votes:
            1
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: