JRuby (please use github issues at http://bugs.jruby.org)
  1. JRuby (please use github issues at http://bugs.jruby.org)
  2. JRUBY-6141

All MatchData objects resulting from an invocation of String#scan are updated with the current match

    Details

    • Testcase included:
      yes
    • Patch Submitted:
      Yes
    • Number of attachments :
      4

      Description

      Each iteration of String#scan should result in a unique MatchData returned by Regexp.last_match, with each MatchData object reflecting the results of its match. Thus if String#scan resulted in two matches, after the first match Regexp.last_match would have different values than Regexp.last_match would have after the second match.

      However, in the current implementation each MatchData resulting from an invocation of String#scan has the values of the most recent match. For example, from the attached unit test:

        def test_scan
          firstmatch = nil
      
          str = "testing"
          re = Regexp.new('(t[^t]*)')
          
          str.scan(re) do |match|
            if firstmatch.nil?
              firstmatch = Regexp.last_match
      
              assert_equal "tes", firstmatch[0]
            else
              secondmatch = Regexp.last_match
              assert_equal "ting", secondmatch[0]
      
              # not the same object
              assert firstmatch.object_id != secondmatch.object_id
              
              # should still be the value of the first match
              assert_equal "tes", firstmatch[0]
            end
          end
        end
      

      although they are different objects (per the object_id assertions) firstmatch and secondmatch have the same results, so the assertion:

              assert_equal "tes", firstmatch[0]
      

      will fail, with firstmatch[0] equaling "ting", the results for the second match. (Note that this test succeeds with Ruby 1.8 and 1.9.)

      The reason for this behavior is that during String#scan, a MatchData is created for each match. The MatchData object has an attribute "regs" (org.joni.Region), which refers to where the pattern matched in the string.

      The issue is that when String#scan creates MatchData objects for each pattern match, each of the MatchData objects refer to the same Region instance. Subsequent matches result in the Region object being updated, and each MatchData object sharing a reference to that Region will have the same value, as used in MatchData#to_s and MatchData#captures.

      The solution is to clone the Region object for each newly-created MatchData, as in the attached patch.

      1. 0001-JRUBY-6141-fixed-so-that-MatchData-are-cloned-so-tha.patch
        1 kB
        Jeff Pace
      2. 0002-JRUBY-6141-added-RubySpec-test.patch
        1 kB
        Jeff Pace
      3. rubyregexp.patch
        0.9 kB
        Jeff Pace
      4. test_string_scan.rb
        0.6 kB
        Jeff Pace

        Activity

        Hide
        Hiro Asari added a comment -

        Jeff,

        I tested the patch, and it looks good. Can you submit a git-formatted patch, though, so that I can sign off and give you credit for it?

        Also, it would be nice if you can turn your test into a RubySpec.

        Thanks.

        Show
        Hiro Asari added a comment - Jeff, I tested the patch, and it looks good. Can you submit a git-formatted patch, though, so that I can sign off and give you credit for it? Also, it would be nice if you can turn your test into a RubySpec. Thanks.
        Hide
        Jeff Pace added a comment -

        Fix for org/jruby/RubyRegexp.java.

        Show
        Jeff Pace added a comment - Fix for org/jruby/RubyRegexp.java.
        Hide
        Jeff Pace added a comment -

        RubySpec test for this issue.

        Show
        Jeff Pace added a comment - RubySpec test for this issue.
        Hide
        Hiro Asari added a comment -

        I merged and pushed these patches to both the master and 1.6 branches.

        Show
        Hiro Asari added a comment - I merged and pushed these patches to both the master and 1.6 branches.

          People

          • Assignee:
            Hiro Asari
            Reporter:
            Jeff Pace
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: