groovy
  1. groovy
  2. GROOVY-3443

Patch to enhance String class with a find method that takes regular expressions

    Details

    • Type: New Feature New Feature
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.6
    • Fix Version/s: 1.6.1, 1.7-beta-1
    • Component/s: regular expressions
    • Labels:
      None
    • Testcase included:
      yes
    • Patch Submitted:
      Yes
    • Number of attachments :
      2

      Description

      Groovy makes working with regular expressions much easier than Java, but there are still a couple of holes in the 1.6 implementation.

      I've attached a patch that adds a couple of simple methods that make regular expressions much easier to use.

      One of the most common use cases is to search a string for a regular expression pattern. If a match is found, then do something with the matched value.

      Currently in groovy, the recommended way to do this is to create a matcher and then use indexes to work with any matches that might be found:

      assert "10292" == ("New York, NY 10292" =~ /\d

      {5}/)[0]

      If you try to do that on a string that doesn't actually match the regular expression, you'll get an IndexOutOfBoundException. To be safe, you need to check matcher.find() (not matches as that requires the entire string to match!) to see if the string is actually in there:

      def m = ("New York, NY" =~ /\d{5}

      /)[0]
      def zip
      if (m.find())

      { zip = m[0] }

      It also has inconsistent behavior if the regular expression happens to have capture groups in it. Then it returns an array containing the match and the capture groups, forcing you to index into that array to actually get the match you want:

      assert "c" == ("foo car baz" =~ /(.)ar/)[0][1]

      Groovy has already added closure aware replace method to the String class. The patch adds a complimentary find method to string that will return the string matched by the closure without needing to worry about matcher objects and array indexes.

      You can either call it without a closure to get the full found match back (even if it has groups in it):

      assert "10292" == "New York, NY 10292".find(/\d

      {5}/)

      It safely returns a null if the match isn't found, which can clean up boilerplate safety checks quite a bit. The user can check for null using groovy truth if they want to:

      def zip = "New York, NY".find(/\d{5}

      /) // returns null

      if (zip)

      { ... }

      If you want to work with capture groups, or manipulate the value, you can pass a closure to the find method that will be passed the full match as well as any capture groups (just as the collection based regular expression methods work):

      // no capture groups, only the match is passed to the closure
      assert "bar" == "foo bar baz".find(/.ar/)

      { match -> return match }

      // one capture group
      assert "b" == "foo bar baz".find(/(.)ar/)

      { match, firstLetter -> return firstLetter }

      // many capture groups, all passed to the closure after the full match
      assert "2339999" == "adsf 233-9999 adsf".find(/(\d

      {3})?-?(\d{3}

      )-(\d

      {4}

      )/)

      { match, areaCode, exchange, stationNumber -> assert "233-9999" == match assert null == areaCode assert "233" == exchange assert "9999" == stationNumber return "$exchange$stationNumber" }

      The patch also includes a number of unit tests to exercise and demonstrate the functionality.

      I get the most traffic on my blog to a post that I did explaning regular expressions in groovy, and I think that people struggle a bit with the current implementation. I think this patch makes working with regular expressions much easier and more intuitive.

      1. string_regex_find_v2.diff
        18 kB
        Ted Naleid
      2. string_regex_find.diff
        6 kB
        Ted Naleid

        Activity

        Hide
        blackdrag blackdrag added a comment - - edited

        the implementation uses a plain string, which is then compiled into a pattern. I suggest we think about using "~" because that creates a pattern automatically. And the user or even the compiler could decide to cache that value. That would for example mean: assert "10292" == "New York, NY 10292".find(~/\d

        {5}

        /)

        Show
        blackdrag blackdrag added a comment - - edited the implementation uses a plain string, which is then compiled into a pattern. I suggest we think about using "~" because that creates a pattern automatically. And the user or even the compiler could decide to cache that value. That would for example mean: assert "10292" == "New York, NY 10292".find(~/\d {5} /)
        Hide
        Ted Naleid added a comment -

        The main issue that I have with taking a pattern rather than a string is that it's inconsistent with the API for the replaceAll method that are already attached to String. It expectes a string regular expression, rather than a compiled Pattern object :

        http://groovy.codehaus.org/api/org/codehaus/groovy/runtime/DefaultGroovyMethods.html#replaceAll(java.lang.String,%20java.lang.String,%20groovy.lang.Closure)

        I could see adding another method in addition to the string version that also takes a precompiled pattern, but I think there's some dissonance with only having a pattern version of find.

        I'd be happy to modify the patch to do this if desired.

        Show
        Ted Naleid added a comment - The main issue that I have with taking a pattern rather than a string is that it's inconsistent with the API for the replaceAll method that are already attached to String. It expectes a string regular expression, rather than a compiled Pattern object : http://groovy.codehaus.org/api/org/codehaus/groovy/runtime/DefaultGroovyMethods.html#replaceAll(java.lang.String,%20java.lang.String,%20groovy.lang.Closure ) I could see adding another method in addition to the string version that also takes a precompiled pattern, but I think there's some dissonance with only having a pattern version of find. I'd be happy to modify the patch to do this if desired.
        Hide
        Paul King added a comment - - edited

        Re: I suggest we think about using "~"

        A Pattern variant might be useful, though I would lean towards providing both as we already have a String variant for String.eachMatch(). And then we should provide a Pattern variant for eachMatch too.

        I am also thinking that a findAll variant of the current proposal would be in order:

        assert ['cat', 'sat', 'hat'] == 'The cat sat on the hat'.findAll('.at')
        
        Show
        Paul King added a comment - - edited Re: I suggest we think about using "~" A Pattern variant might be useful, though I would lean towards providing both as we already have a String variant for String.eachMatch() . And then we should provide a Pattern variant for eachMatch too. I am also thinking that a findAll variant of the current proposal would be in order: assert ['cat', 'sat', 'hat'] == 'The cat sat on the hat'.findAll('.at')
        Hide
        Ted Naleid added a comment -

        Based on feedback, I've updated the patch which originally had these methods:

        public static String find(String self, String regex)
        public static String find(String self, String regex, Closure closure)

        To add the following methods:

        public static String find(String self, Pattern pattern)
        public static String find(String self, Pattern pattern, Closure closure)

        public static List findAll(String self, String regex)
        public static List findAll(String self, Pattern pattern)

        public static List findAll(String self, String regex, Closure closure)
        public static List findAll(String self, Pattern pattern, Closure closure)

        public static String eachMatch(String self, Pattern pattern, Closure closure)

        All methods have associated unit tests as well that demonstrate their functionality. Let me know if any further changes are desired.

        Show
        Ted Naleid added a comment - Based on feedback, I've updated the patch which originally had these methods: public static String find(String self, String regex) public static String find(String self, String regex, Closure closure) To add the following methods: public static String find(String self, Pattern pattern) public static String find(String self, Pattern pattern, Closure closure) public static List findAll(String self, String regex) public static List findAll(String self, Pattern pattern) public static List findAll(String self, String regex, Closure closure) public static List findAll(String self, Pattern pattern, Closure closure) public static String eachMatch(String self, Pattern pattern, Closure closure) All methods have associated unit tests as well that demonstrate their functionality. Let me know if any further changes are desired.
        Hide
        Ted Naleid added a comment -

        Updated patch with additional methods.

        Show
        Ted Naleid added a comment - Updated patch with additional methods.
        Hide
        Paul King added a comment -

        Patch looks good to me.

        Show
        Paul King added a comment - Patch looks good to me.
        Hide
        Paul King added a comment -

        Added

        Show
        Paul King added a comment - Added
        Hide
        Paul King added a comment -

        The changes should now be available in the recently released 1.6.1. Thanks for the patch.

        Show
        Paul King added a comment - The changes should now be available in the recently released 1.6.1. Thanks for the patch.

          People

          • Assignee:
            Paul King
            Reporter:
            Ted Naleid
          • Votes:
            10 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: