Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: JRuby 1.6.4
    • Fix Version/s: JRuby 1.6.5, JRuby 1.7.0.pre1
    • Component/s: None
    • Labels:
      None
    • Environment:
      JRuby 1.6.4, 1.8.7 mode
      Rails 3.0.7
      Java version "1.6.0_24"
      Ubuntu 10.04.1 LTS
    • Number of attachments:
      0

      Description

      After updating from JRuby 1.6.2 to 1.6.4, queries containing strings that are not UTF-8 encoded started raising exceptions. The same queries worked without any problems on 1.6.2.

      Example string causing the exception (taken from an email's From field): =?iso-8859-1?Q?Tony_Banz=E1_K=E9lluz?=

      Exception message:

      ActiveRecord::JDBCError: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near...

      Top of the backtrace:

      activerecord (3.0.7) lib/active_record/connection_adapters/abstract_adapter.rb:207:in `log'
      activerecord-jdbc-adapter (1.1.3) lib/arjdbc/jdbc/adapter.rb:183:in `execute'
      activerecord-jdbc-adapter (1.1.3) lib/arjdbc/jdbc/adapter.rb:199:in `jdbc_update'
      arel (2.0.10) lib/arel/crud.rb:20:in `update'
      ...

        Activity

        Anthony Juckel added a comment -

        I've tried replicating this issue locally with jruby-1.6.4, with activerecord-jdbcmysql-adapter 1.1.3, rails et al. 3.0.7, and have been unsuccessful. Can you make a small test script that reproduces this error?

        Michal Ochman added a comment -

        Hi,

        Here is an example script to reproduce it. Please make sure you have the "mail" gem installed (I'm on 2.2.19), change the query (it depends on your DB schema, but it can be any query you want, as long as it passes the string extracted from the email), and run it using the rails runner command. On 1.6.4 it fails, while on 1.6.2 it works properly.

        require 'rubygems'
        require 'mail'

        mail_data = <<-END
        From: =?iso-8859-1?Q?Tony_Banz=E1_K=E9lluz?= <some@email.com>

        Email body
        END

        mail = Mail.read_from_string(mail_data)
        from_field = mail['from'].addrs[0].decoded

        # change this query to any query you want - just pass from_field to it
        User.find_by_email(from_field)

        puts 'ok, worked properly'

        Charles Oliver Nutter added a comment -

        Played with this briefly.

        Anthony was on IRC and told us the problem comes when calling getUnicodeString on the RubyString that holds that decoded mail result. The problem seems to be that although we have an iso-8859-1 encoded String, we can't know that when we need to turn the string into a Java UTF-16 string because 1.8 mode has no concept of per-string encodings. So getUnicodeString always assumes you've got UTF-8 bytes, and that's probably where the string is getting demolished.
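
To illustrate the mismatch, here is a minimal sketch. It assumes a Ruby with per-string encodings (1.9 or later; `String#b` needs 2.0+), which 1.8 mode does not have. The variable names are made up for the example, and it does not call JRuby's getUnicodeString; it only shows why ISO-8859-1 bytes do not survive being read as UTF-8.

```ruby
# "Banzá" in ISO-8859-1: the á is the single byte 0xE1.
latin1_bytes = "Banz\xE1".b

# Treating those bytes as UTF-8 (the assumption described above) is invalid:
# 0xE1 opens a three-byte UTF-8 sequence, but no continuation bytes follow.
as_utf8 = latin1_bytes.dup.force_encoding("UTF-8")
valid_as_utf8 = as_utf8.valid_encoding?

# Declaring the real encoding first lets the bytes convert cleanly.
converted = latin1_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts "valid as UTF-8: #{valid_as_utf8}"
puts "converted: #{converted}"
```

In 1.8 mode there is no encoding tag to consult, so a conversion layer has to guess, and a UTF-8 guess fails on exactly this kind of input.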

        Anthony Juckel added a comment -

        After poking around for a while, it appears that JRuby 1.6.2 matched the behavior of MRI 1.8.7 more closely, in that it would happily accept an invalidly encoded string and pass it down the layers as UTF-8. MySQL doesn't appear to care either, though had you actually stored the value of the test case above in your DB as UTF-8, MySQL wouldn't find it (at least, in my simple test DB).

        JRuby 1.6.4 got stricter in its handling, and unfortunately the UTF-8 parser stops when it finds invalid input (and doesn't pass that failure up the stack), so the query is truncated by the time it gets to MySQL.

        To work around this, I'd suggest something like the following to do a simple scrub of your data if it's in an unknown encoding; the behavior then matches between MRI and JRuby.

        u1 = User.find_by_email(from_field) # returns nil
        puts "u1: #{u1.to_json}"
        u2 = User.find_by_email(ActiveSupport::Multibyte::Chars.new(from_field).tidy_bytes) # finds the record
        puts "u2: #{u2.to_json}"

        If you actually know the encoding of the value, use Iconv or something similar to transcode the data to UTF-8 before passing it on to ActiveRecord.
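
As a rough sketch of the known-encoding case: the example below uses `String#encode` rather than Iconv, because Iconv is no longer in the standard library on Ruby 2.0 and later, and `String#encode` itself requires Ruby 1.9+, so it is not available in 1.8 mode. The helper name `to_utf8` is hypothetical, and the input bytes are an assumption about what the decoded header contains.

```ruby
# Hypothetical helper, not part of ActiveRecord or activerecord-jdbc-adapter.
# Transcodes raw bytes from a known source encoding into UTF-8.
def to_utf8(raw, source_encoding)
  # Work on a copy so the caller's string keeps its original encoding tag.
  tagged = raw.dup.force_encoding(source_encoding)
  # Raises an Encoding error if the bytes cannot be converted.
  tagged.encode("UTF-8")
end

# Assumed ISO-8859-1 bytes for the name in the reported From header.
from_field = "Tony Banz\xE1 K\xE9lluz".b

clean = to_utf8(from_field, "ISO-8859-1")
puts clean

# The query from the report would then receive valid UTF-8:
# User.find_by_email(clean)
```

Transcoding before the query means the string that reaches the JDBC layer is already valid UTF-8, so nothing downstream has to guess.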

        Hiro Asari added a comment -

        I merged Anthony's pull request.


          People

          • Assignee:
            Hiro Asari
            Reporter:
            Michal Ochman
          • Votes:
            0
            Watchers:
            0
