
Key: JRUBY-1046
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Ola Bini
Reporter: Steve Shreeve
Votes: 2
Watchers: 2
JRuby

JRuby regex support for "\G" is different than MRI

Created: 31/May/07 06:39 PM   Updated: 23/Apr/08 10:04 AM   Resolved: 08/Jul/07 07:39 AM
Component/s: Core Classes/Modules
Affects Version/s: JRuby 1.0.0RC2
Fix Version/s: JRuby 1.1b1

Time Tracking:
Not Specified

Environment: svn rev 3795

Testcase included: yes


Description

This code performs differently in MRI than JRuby:

regx = /\G(\w+): (\w)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}

Another person on irc also noted:

lopex: this doesnt work either: "abc" =~ /\G(.*)/



Steve Shreeve added a comment - 31/May/07 06:43 PM

Actually, that post was not accurate. Here's the source again:

regx = /\G(\w+): (\w)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}


Steve Shreeve added a comment - 31/May/07 06:47 PM

Okay, I think this version should parse correctly:

regx = /\G(\w+): +(\w+)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}

Thomas E Enebo added a comment - 31/May/07 08:46 PM

I can see this in Ruby C source, but I have no idea what this is for...omitting \G seems to give me the same output. Is there any documentation on this option?


Steve Shreeve added a comment - 31/May/07 10:42 PM

The anchor "\G" matches at the position where the previous match ended, so each new match must begin exactly where the last one left off. This is often what you want in methods such as #scan: without "\G", #scan may skip over leading characters in subsequent iterations, which is exactly what "\G" prevents.

Here is a good explanation:

http://www.regular-expressions.info/continue.html

In essence the "\G" anchor "picks up" where the last match ended, which is slightly different than the regex without it.
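As a rough illustration of that difference, here is a small sketch. The input with a junk line in the middle is made up for this example, the group for the line endings is written as non-capturing so that #scan yields only key and value, and the expected results assume MRI semantics:

```ruby
# Hypothetical input: two well-formed lines, one junk line, one more good line.
data = "Name: Foobar\r\nAge: 35\r\n??? junk\r\nHair: brown\r\n"

# With \G, every match must start where the previous one ended,
# so scanning stops as soon as the junk line is reached.
with_g = data.scan(/\G(\w+): +(\w+)(?:\r\n)+/)

# Without \G, the regex engine is free to skip over the junk line
# and pick up the next well-formed line after it.
without_g = data.scan(/(\w+): +(\w+)(?:\r\n)+/)

p with_g     # => [["Name", "Foobar"], ["Age", "35"]]
p without_g  # => [["Name", "Foobar"], ["Age", "35"], ["Hair", "brown"]]
```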


Florian Gross added a comment - 01/Jul/07 08:35 AM

I just tried to use \G to write a custom scan() method, but it didn't do what I expected it to do from Steve's comment:

res = []
str = "hello world. a sample string."
re = /\G\w+/
while match = str.match(re)
  res << match.to_s
end

Fills res with an infinite number of "hello"s.

http://use.perl.org/comments.pl?sid=22426&op=&threshold=0&commentsort=0&mode=thread&pid=34282#34303 says \G in Ruby matches the start of the current match, which is the same as \A (start of string) if you are matching from the start of the string:

"foobar".match(/\Abar/).to_s # => ""
"foobar".match(/\Gbar/).to_s # => ""

# note: match() only supports the start argument in 1.9...
"foobar".match(/\Abar/, 3).to_s # => ""
"foobar".match(/\Gbar/, 3).to_s # => "bar"

It also seems to work that way for scan():

"aa1 aa2 ba3 ".scan(/a+\d\s*/) # => ["aa1 ", "aa2 ", "a3 "]
"aa1 aa2 ba3 ".scan(/\Aa+\d\s*/) # => ["aa1 "]
"aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
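Given that behaviour, a custom scan loop can be made to terminate by passing the end of the previous match as the start position of the next one. The following is only a sketch: the method name scan_from_last_match is hypothetical, and it relies on the start argument to match(), so it assumes Ruby 1.9 or later:

```ruby
# Hypothetical helper: collects consecutive matches of re in str,
# starting each attempt where the previous match ended.
def scan_from_last_match(str, re)
  res = []
  pos = 0
  # With a start position, \G anchors at pos instead of at offset 0.
  while (m = re.match(str, pos))
    break if m.end(0) == pos # guard against looping forever on an empty match
    res << m[0]
    pos = m.end(0)
  end
  res
end

p scan_from_last_match("hello world. a sample string.", /\G\w+\W*/)
# => ["hello ", "world. ", "a ", "sample ", "string."]

p scan_from_last_match("aa1 aa2 ba3 ", /\Ga+\d\s*/)
# => ["aa1 ", "aa2 "]
```

The pattern in the first call also consumes the separators (\W*), because with \G alone the second attempt would start at the space after "hello" and fail there.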

Took me a while to find all this out. Always great to have such things undocumented.


Steve Shreeve added a comment - 02/Jul/07 01:45 PM

Here's the output of Florian's example in irb and jirb:

IRB:

irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> ["aa1 ", "aa2 "]

JIRB:

irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> []

Ola Bini added a comment - 08/Jul/07 07:39 AM

The support was there; it only handled the first-match case badly, which meant it never got started. It's fixed in trunk now.