Okay, I think this version should parse correctly:
regx = /\G(\w+): +(\w+)(\r\n)+/
data = "Name: Foobar\r\nAge: 35\r\nWeight: 180\r\nHair: brown\r\n\r\n"
data.scan(regx) {|k,v| p [k,v]}
I can see \G in the Ruby C source, but I have no idea what it is for; omitting \G seems to give me the same output. Is there any documentation on this option?
The anchor "\G" matches at the position where the previous match ended, so each new match must start exactly there. This is useful in methods such as #scan: without "\G", #scan may skip over characters that don't match and pick up a later match, which is exactly what "\G" prevents. With "\G", scanning stops at the first position that doesn't match. In your example every entry starts right where the previous one ended, so the output is the same either way.
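A minimal sketch of that difference, using the question's record format with a hypothetical junk line inserted between entries (the junk line is made up purely for illustration). The trailing group is made non-capturing, so each result is just [key, value]:

```ruby
# Question's format, plus a made-up line that is not "key: value".
data = "Name: Foobar\r\nAge: 35\r\n--junk--\r\nHair: brown\r\n"

with_g    = data.scan(/\G(\w+): +(\w+)(?:\r\n)+/)
without_g = data.scan(/(\w+): +(\w+)(?:\r\n)+/)

# With \G, scanning stops at the first position that does not match.
p with_g    # => [["Name", "Foobar"], ["Age", "35"]]
# Without \G, the junk line is skipped and scanning continues after it.
p without_g # => [["Name", "Foobar"], ["Age", "35"], ["Hair", "brown"]]
```

So \G is a way to say "the input must consist entirely of consecutive matches"; anything unexpected ends the scan instead of being silently skipped.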
Here's a good explanation: http://www.regular-expressions.info/continue.html
In essence, the "\G" anchor "picks up" where the last match ended, which is slightly different from the regex without it.
I just tried to use \G to write a custom scan() method, but it didn't do what I expected from Steve's comment:
res = []
str = "hello world. a sample string."
re = /\G\w+/
while match = str.match(re)
  res << match.to_s
end
This fills res with an infinite number of "hello"s.
http://use.perl.org/comments.pl?sid=22426&op=&threshold=0&commentsort=0&mode=thread&pid=34282#34303
"foobar".match(/\Abar/).to_s # => ""
"foobar".match(/\Gbar/).to_s # => ""
# note: match() only supports the start argument in 1.9...
"foobar".match(/\Abar/, 3).to_s # => ""
"foobar".match(/\Gbar/, 3).to_s # => "bar"
It also seems to work that way for scan():
"aa1 aa2 ba3 ".scan(/a+\d\s*/) # => ["aa1 ", "aa2 ", "a3 "]
"aa1 aa2 ba3 ".scan(/\Aa+\d\s*/) # => ["aa1 "]
"aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
Took me a while to find all this out. Always great to have such things undocumented.
Here's the output of Florian's example in irb and jirb:
IRB:
irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> ["aa1 ", "aa2 "]
JIRB:
irb(main):001:0> "aa1 aa2 ba3 ".scan(/\Ga+\d\s*/) # => ["aa1 ", "aa2 "]
=> []
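The infinite loop in the custom scan() attempt happens because str.match(re) always searches from position 0, where \G matches every time. A minimal sketch of a working version, assuming Ruby 1.9+ (where String#match takes a start position, as noted in the thread) and using a hypothetical helper name g_scan, feeds the end of each match back in as the next start position:

```ruby
# Hypothetical helper (not part of Ruby's API): collects consecutive
# matches of +re+ by passing an explicit start position to String#match.
# Requires Ruby 1.9+, where String#match accepts a position argument.
def g_scan(str, re)
  results = []
  pos = 0
  while (m = str.match(re, pos))
    results << m[0]
    # Stop on an empty match; otherwise pos would never advance.
    break if m.end(0) == pos
    # Resume where this match ended, which is where \G anchors next.
    pos = m.end(0)
  end
  results
end

str = "hello world. a sample string."
p g_scan(str, /\G\w+[\s.]*/) # => ["hello ", "world. ", "a ", "sample ", "string."]
p g_scan(str, /\G\w+/)       # => ["hello"]
```

For these inputs the result agrees with str.scan(re). The sketch simply stops on an empty match rather than trying to reproduce whatever #scan does in that case.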