Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: JRuby 1.1.4
    • Fix Version/s: None
    • Component/s: Core Classes/Modules
    • Labels:
      None
    • Environment:
      using ubuntu 8.04
    • Number of attachments :
      2

      Description

      ###############################################################
      #a simple rails controller , rails 2.0.2 and jruby 1.1.4
      ###############################################################

      class TesterController < ApplicationController
        require 'jcode'
        def index(len=8)
          chars = ("a".."z").to_a + ("A".."Z").to_a + ("0".."9").to_a
          guid = ""
          1.upto(len) { |i| guid << chars[rand(chars.size-1)] }
          render :text => guid
        end
      end

      ###############################################################
      #I am using JRuby 1.1.4
      #and get this error when I add require 'jcode' on line 2;
      #when I remove it, everything works:
      #too short multibyte code string: #/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/
      #/home/khelll/projects/jruby/lib/ruby/1.8/jcode.rb:66:in `end_regexp'
      #/home/khelll/projects/jruby/lib/ruby/1.8/jcode.rb:79:in `succ!'
      #/home/khelll/projects/jruby/lib/ruby/1.8/jcode.rb:94:in `succ'
      #app/controllers/tester_controller.rb:4:in `each'
      #app/controllers/tester_controller.rb:4:in `index'
      #:1:in `initialize'
      ###############################################################
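
      The stack trace shows the failure coming from String#succ while a String range is expanded. As a minimal sketch only (the names ALPHABET and random_token are hypothetical, and this has not been checked against JRuby 1.1.4), the same generator can be written without String ranges, so that String#succ is never called:

```ruby
# Hypothetical helper, not part of the original report.
# The alphabet is built from integer code points (a-z, A-Z, 0-9),
# so no String range and no String#succ call is involved.
ALPHABET = [(97..122), (65..90), (48..57)].flat_map { |r| r.map(&:chr) }.freeze

def random_token(len = 8)
  # rand(ALPHABET.size) can return every index, including the last one.
  Array.new(len) { ALPHABET[rand(ALPHABET.size)] }.join
end

puts random_token
```

      Note that rand(n) already excludes n, so the sketch indexes with ALPHABET.size rather than size - 1.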

      1. jcode.rb.patch
        0.7 kB
        lunlumo
      2. pastie-263899.rb
        1 kB
        khaled al habache

        Activity

        Marcin Mielżyński added a comment -

        Hmm, reduced testcase:

        require 'jcode'
        $KCODE = "utf-8"
        p "\x80".succ

        IMHO, it should fail under MRI, but it gives: "x\201"

        1.9 even more surprisingly:

        # coding: utf-8
        p "\x80".succ

        -> "\x01\x80"

        Thomas E Enebo made changes -
        Field Original Value New Value
        Assignee Thomas E Enebo [ enebo ]
        Component/s Core Classes/Modules [ 12230 ]
        Aslak Hellesøy added a comment -

        I can confirm that this bug is apparent on Rails 2.2 and JRuby 1.1.5. I'm guessing it has to do with Rails Unicode library not working on JRuby with jcode activated.

        Charles Oliver Nutter added a comment -

        Confirmed on trunk/1.2...it appears to be something regexp related. Here's a reduced case:

        $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/"
        -e:1: too short multibyte code string: /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/ (RegexpError)
        

        The regexp in question is hardcoded in jcode.rb.
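
        For context, the pattern in the error message describes a two- or three-byte UTF-8 sequence at the end of a string. The following is a minimal sketch of that idea, assuming a current MRI where the /n flag makes a regexp literal byte-oriented; the names TRAILING_MULTIBYTE and trailing_multibyte_bytes are hypothetical, and it says nothing about how JRuby compiles the literal:

```ruby
# Byte-oriented pattern: with /n the \x escapes are treated as raw bytes.
# Unlike the pattern in the report, both alternatives are grouped so the
# end anchor applies to each of them.
TRAILING_MULTIBYTE = /(?:[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf])\z/n

# Returns the bytes of a trailing 2- or 3-byte sequence, or nil.
def trailing_multibyte_bytes(str)
  m = TRAILING_MULTIBYTE.match(str.b) # match against a binary copy
  m ? m[0].bytes : nil
end

p trailing_multibyte_bytes("a\u00e9")
p trailing_multibyte_bytes("abc")
```

        Matching against str.b keeps the regexp and the string in the same binary encoding, which avoids encoding-compatibility errors on current MRI.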

        Luke Monahan added a comment -

        An even quicker way to get this to happen, I think it's the same:

        jruby -Ku -e "/\375/"
        
        Marcin Mielżyński added a comment -

        This is kinda problematic since \375 is not a valid utf-8 character and it actually doesn't make sense to build character classes (ranges) from it.
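
        The claim that a lone \375 byte is not valid UTF-8 can be checked on a current MRI. This is a small sketch; the helper name valid_utf8? is hypothetical and is not part of jcode or JRuby:

```ruby
# Hypothetical helper: interprets an array of byte values as UTF-8
# and reports whether they form a well-formed string.
def valid_utf8?(bytes)
  bytes.pack("C*").force_encoding(Encoding::UTF_8).valid_encoding?
end

p valid_utf8?([0o375])       # the single byte \375 (0xFD)
p valid_utf8?([0x80])        # a lone continuation byte
p valid_utf8?([0xC3, 0xA9])  # a complete two-byte sequence
```

        The first two calls print false and the last prints true, which is consistent with the comment above: single bytes in that range are not characters on their own.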

        Charles Oliver Nutter added a comment -

        The original reproduction stands though...it's in actual code. It certainly could be flawed, though.

        JT Zemp added a comment -

        I've confirmed it running
        Using jruby 1.3.1 (ruby 1.8.6p287) (2009-06-15 2fd6c3d) (Java HotSpot(TM) Client VM 1.5.0_19) [i386-java]
        prompt $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/"
        -e:1: too short multibyte code string: /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/ (SyntaxError)

        JT Zemp added a comment -

        I just tried it in jruby 1.4 as well

        Using jruby 1.4.0dev (ruby 1.8.7p174) (2009-08-24 be4901c) (Java HotSpot(TM) Client VM 1.5.0_19) [i386-java]
        prompt $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/"
        -e:1: too short multibyte code string: /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/ (SyntaxError)

        Jens-Christian Fischer added a comment -

        We ran into this bug too... Some experimenting shows:

        original problem:
        /cygdrive/c/Programme/jruby-1.3.1/lib/ruby/gems/1.8/cache
        $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/"
        -e:1: too short multibyte code string: /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/ (SyntaxError)

        Replacing the / / literal with Regexp.new (as done a few lines below the definition of the multibyte string in jcode.rb):

        /cygdrive/c/Programme/jruby-1.3.1/lib/ruby/gems/1.8/cache
        $ jruby -Ku -e "Regexp.new('[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]', 0, 'n')"

        works!!

        Now, using the same parameters "on" for the regexp generation fails:

        /cygdrive/c/Programme/jruby-1.3.1/lib/ruby/gems/1.8/cache
        $ jruby -Ku -e "Regexp.new('^[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$', 0, 'on')"
        -e:1: too short multibyte code string: /^[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/ (RegexpError)

        Dropping the "o" option from the regexp works:

        /cygdrive/c/Programme/jruby-1.3.1/lib/ruby/gems/1.8/cache
        $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n"

        The temporary fix for us was to patch line 66 of jcode.rb to drop the o from the regexp generation. If I understand the option correctly, it's not needed here anyway.
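
        As background on the o option: it makes an interpolated regexp literal evaluate its interpolation only once. A minimal sketch on a current MRI, with hypothetical method names, shows the difference; it does not reproduce the JRuby error itself:

```ruby
# With /o the interpolation runs the first time the literal is evaluated
# and the resulting regexp is reused on every later call.
def compiled_once(fragment)
  /#{fragment}$/o
end

# Without /o the literal is rebuilt from the argument on every call.
def compiled_each_time(fragment)
  /#{fragment}$/
end

p compiled_once("a").source       # "a$"
p compiled_once("b").source       # still "a$"
p compiled_each_time("b").source  # "b$"
```

        When the interpolated value is a constant string, the two forms build the same regexp, so dropping o changes only how often the literal is rebuilt.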

        lunlumo added a comment -

        The third parameter of Regexp.new is the character set, which gives the match context. So the code below works well.

        $ jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/on"

        But when the program below is executed, only the last statement raises an exception.

        test.rb
        $KCODE='u'
        
        puts 'aａ' =~ /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n #=> 1

        puts 'aａ' =~ /[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/on #=> 1

        pattern = '[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]'
        puts 'aａ' =~ /#{pattern}/on #=> SyntaxError
        
        

        Therefore, I suspect that when both the o and n options are given, JRuby first expands the expression to a string and only then creates the regexp object.
        The original problem in jcode.rb has the same cause as the code above. (I attached an ad-hoc patch for jcode.rb.)
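
        One way to avoid interpolating the pattern into a literal is to build the regexp explicitly from the string. The sketch below assumes a current MRI; the names PATTERN, BYTE_RE and first_multibyte_index are hypothetical, it uses the Regexp::NOENCODING option in place of the 'n' string argument, and it is not the attached jcode.rb patch:

```ruby
# The same escape-based pattern as in test.rb, kept as a plain string.
PATTERN = '[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]'

# Build a byte-oriented regexp from a binary copy of the pattern string.
BYTE_RE = Regexp.new(PATTERN.b, Regexp::NOENCODING)

# Returns the byte index of the first 2- or 3-byte sequence, or nil.
def first_multibyte_index(str)
  BYTE_RE =~ str.b
end

p first_multibyte_index("a\uFF41")
p first_multibyte_index("abc")
```

        For the string "a" followed by U+FF41 this returns 1, the same index the first two statements in test.rb report.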

        lunlumo made changes -
        Attachment jcode.rb.patch [ 44403 ]
        phoenix added a comment -

        jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2009-11-02 69fbfa3) (Java HotSpot(TM) Client VM 1.6.0_14) [x86-java]

        prompt $ jruby -Ku -e "/\265\332 \306\332/"
        -e:1: too short multibyte code string: /\265\332 \306\332/ (SyntaxError)

        Ivo Wever added a comment - - edited

        Seems fixed in 1.5.6.

        jruby -v -Ku -e "/\265\332 \306\332/"
        jruby 1.5.6 (ruby 1.8.7 patchlevel 249) (2010-12-03 9cf97c3) (Java HotSpot(TM) 64-Bit Server VM 1.6.0_20) [amd64-java]
        -e:1 warning: Useless use of a literal in void context.
        

        but no other output.

        Other snippets from this issue, including

         jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/"
         jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$/"
         jruby -Ku -e "Regexp.new('^[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]$', 0, 'on')"
         jruby -Ku -e "/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]|[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/on"
        

        give the same result

        Charles Oliver Nutter added a comment -

        jcode is gone in 1.9.3, so I'm marking this Won't Fix.

        Charles Oliver Nutter made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Assignee Charles Oliver Nutter [ headius ]
        Resolution Won't Fix [ 2 ]

          People

          • Assignee:
            Charles Oliver Nutter
            Reporter:
            khaled al habache
          • Votes:
            4
            Watchers:
            3
