JRuby

JRuby regex support for "\G" is different than MRI

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: JRuby 1.0.0RC2
  • Fix Version/s: JRuby 1.1b1
  • Component/s: Core Classes/Modules
  • Labels:
    None
  • Environment:
    svn rev 3795
  • Testcase included:
    yes
  • Number of attachments :
    0

Description

This code behaves differently in MRI than in JRuby:

regx = /\G(\w+): (\w)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}

Someone else on IRC also noted:

lopex: this doesnt work either: "abc" =~ /\G(.*)/

Activity

Steve Shreeve added a comment -

Actually, that post was not accurate. Here's the source again:

regx = /\G(\w+): (\w)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}

Steve Shreeve added a comment -

Okay, I think this version should parse correctly:

regx = /\G(\w+): +(\w+)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}
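
For reference, a sketch of what this version would be expected to print under MRI, assuming "\G" anchors each #scan iteration at the end of the previous match. The pairs array is only an illustrative name used to collect what the block receives:

```ruby
regx = /\G(\w+): +(\w+)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"

# Collect the first two captures of every match. The third group (the
# trailing line break) is dropped by the |k, v| block parameters.
pairs = []
data.scan(regx) { |k, v| pairs << [k, v] }

pairs.each { |pair| p pair }
# ["Name", "Foobar"]
# ["Age", "35"]
# ["Weight", "180"]
# ["Hair", "brown"]
```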
Thomas E Enebo added a comment -

I can see this in the Ruby C source, but I have no idea what it is for. Omitting \G seems to give me the same output. Is there any documentation on this option?

Steve Shreeve added a comment -

The "\G" anchor matches at the position where the previous match ended, so each match continues from the end of the last one. This is often what you want in methods such as #scan: without "\G", #scan can skip over leading characters on subsequent iterations, which is exactly what "\G" prevents.

Here is a good explanation:

http://www.regular-expressions.info/continue.html

In essence the "\G" anchor "picks up" where the last match ended, so the results can differ slightly from those of the same regex without it.
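
A minimal sketch of that difference, using a made-up input string (the variable names are illustrative only, not from the report):

```ruby
text = "12 34 x 56"

# Unanchored: scan is free to skip over "x " and resume at "56".
unanchored = text.scan(/\d+\s*/)    # => ["12 ", "34 ", "56"]

# With \G: each match must start where the previous one ended, so
# scanning stops at the first position that does not match ("x").
anchored = text.scan(/\G\d+\s*/)    # => ["12 ", "34 "]
```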

Florian Gross added a comment -

I just tried to use \G to write a custom scan() method, but it didn't do what I expected from Steve's comment:

res = []
str = "hello world. a sample string."
re = /\G\w+/
while match = str.match(re)
  res << match.to_s
end

Fills res with an infinite number of "hello"s.
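
The loop never terminates because every str.match(re) call starts again at offset 0. A rough sketch of a terminating variant, assuming Ruby 1.9 or later (where String#match accepts a start offset); the pos variable and the extra \W* in the pattern are additions made for this illustration:

```ruby
res = []
str = "hello world. a sample string."
re  = /\G\w+\W*/   # also consume trailing non-word characters so \G can continue
pos = 0

# \G anchors the match at the offset passed to String#match.
while (match = str.match(re, pos))
  res << match.to_s
  pos = match.end(0)   # resume exactly where this match ended
end

res # => ["hello ", "world. ", "a ", "sample ", "string."]
```

With the original /\G\w+/ pattern the same loop would stop after "hello", since the space at offset 5 cannot start a \w+ match.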

http://use.perl.org/comments.pl?sid=22426&op=&threshold=0&commentsort=0&mode=thread&pid=34282#34303 says that \G in Ruby matches at the start of the current match attempt, which is the same as \A (start of string) if you are matching from the start of the string:

"foobar".match(/\Abar/).to_s # => ""
"foobar".match(/\Gbar/).to_s # => ""

# note: match() only supports the start argument in 1.9...
"foobar".match(/\Abar/, 3).to_s # => ""
"foobar".match(/\Gbar/, 3).to_s # => "bar"

It also seems to work that way for scan():

"aa1 aa2 ba3 ".scan(/a+\d\s*/) # => ["aa1 ", "aa2 ", "a3 "]
"aa1 aa2 ba3 ".scan(/\Aa+\d\s*/) # => ["aa1 "]
"aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]

Took me a while to find all this out. Always great to have such things undocumented.
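
For comparison, a small sketch using StringScanner from the standard library, whose #scan only matches at the current scan pointer and so behaves much like a \G-anchored pattern. The scanner and tokens names are illustrative:

```ruby
require 'strscan'

scanner = StringScanner.new("aa1 aa2 ba3 ")
tokens = []

# StringScanner#scan returns nil unless the pattern matches at the
# current scan pointer, so the loop stops when it reaches "ba3 ".
while (tok = scanner.scan(/a+\d\s*/))
  tokens << tok
end

tokens # => ["aa1 ", "aa2 "]
```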

Steve Shreeve added a comment -

Here's the output of Florian's example in irb and jirb:

IRB:

irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> ["aa1 ", "aa2 "]

JIRB:

irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> []
Ola Bini added a comment -

The support was there; it only handled the first-match case badly, which meant scanning never got started. It's fixed in trunk now.

