History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: GROOVY-1637
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Major Major
Assignee: Unassigned
Reporter: Andreas Sahlbach
Votes: 0
Watchers: 5
Operations

If you were logged in you would be able to see more operations.
groovy

String minus() fails in case of regex chars in operand [minor breaking change]

Created: 04/Jan/07 02:10 PM   Updated: 25/Jan/08 11:34 AM
Component/s: groovy-jdk
Affects Version/s: 1.0
Fix Version/s: 1.5.2

Time Tracking:
Issue & Sub-Tasks
Issue Only
Not Specified

Environment: all

Sub-Tasks  All   Open   

 Description  « Hide
The current implementation of String.minus fails if the operand contains regular expression special characters. The current implementation looks like this:
/**
  • Remove a part of a String
    *
  • @param left a String
  • @param value a String part to remove
  • @return a String minus the part to be removed
    */
    public static String minus(String left, Object value) { String text = toString(value); return left.replaceFirst(text, ""); }

I would suggest to escape all regex special characters in the "text" String or to choose a different implementation.
Examples of wrong or strange behaviour:
groovy> println "asd" - "a
c"
groovy> go
Caught: java.util.regex.PatternSyntaxException: Illegal control escape sequence near index 2
a\c
^
at gjdk.java.lang.String_GroovyReflector.invoke(Unknown Source)
at CommandLine.run(CommandLine.groovy:1)

groovy> println "should_not_match_but_matches"-"should_...match"
groovy> go
but_matches

groovy> println "asd"-"a
ud"
groovy> go
Caught: java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 4
a\ud
^
at gjdk.java.lang.String_GroovyReflector.invoke(Unknown Source)
at CommandLine.run(CommandLine.groovy:1)



 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Andreas Sahlbach - 04/Jan/07 02:19 PM
hmm...some of my examples above are messed up, because Jira replaces my double-backslashes with newlines. So here they come again:
groovy> println "asd" - "a\\c"
groovy> go
Caught: java.util.regex.PatternSyntaxException: Illegal control escape sequence near index 2
a\c
^
at gjdk.java.lang.String_GroovyReflector.invoke(Unknown Source)
at CommandLine.run(CommandLine.groovy:1)

groovy> println "should_not_match_but_matches"-"should_...match"
groovy> go
but_matches

groovy> println "asd"-"a\\ud"
groovy> go
Caught: java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 4
a\ud
^
at gjdk.java.lang.String_GroovyReflector.invoke(Unknown Source)
at CommandLine.run(CommandLine.groovy:1)

Paul King - 29/Dec/07 08:18 PM
Now escapes regex meta-characters.

Jim White - 13/Jan/08 10:43 AM
I just found this issue because of a report by a user discovering the unexpected regex behavior of minus(String, Object).

I object to making a breaking change to the behavior of this (or any) public API at a bug fix release level (like 1.5.2). At a minimum this type of change must wait until a minor release like 1.6.

But I'm not sure I'd agree with this change at all. I do agree that this behavior is at first a surprise, but I suspect the original implementor intended this and it is consistent with Groovy's support of regex. Certainly it has been this way since 1.0 and is documented (although the "see also" makes the assumption that the reader is a Java expert or will click the link).

I do believe the documentation should be updated to make the current behavior clearer. And if it were to change in the future, then the release notes are going to need to make a big notice about this breaking change.

And lastly, if you're gonna yank the regex behavior, don't use String.replaceFirst at all since it is the worst possible performance (quote a regex pattern, compile the pattern, then do regex replacement)! If you want to string literal replacement you use String.indexOf(String).


Paul King - 14/Jan/08 06:56 AM - edited
In Groovy 1.0, the Javadoc made no mention of the regex-style handling under the covers for the DGM#minus(String, Object) method (nor did GinA as far as I can find). In 1.5.0 and 1.5.1, the Javadoc is not very clear but mentions this regex behavior. We should clarify this doco in 1.5.2 if we can. The proposed behavior for 1.6.0 is that if the remove target is a pattern, regex-like behavior will occur. If target is not a pattern, the text will simply be literally removed from the String. This results in the following behavior:
assert 'abcda.ce' - /a.c/ == 'abcde'
assert 'abcda.ce' - ~/a.c/ == 'da.ce'

Paul King - 14/Jan/08 06:58 AM
Reopened until we decide what to do for 1.5.2/1.6. For 1.5.2 we need at least some doco/clarification and not merge any 1.6 only behavior on to that branch.

Paul King - 17/Jan/08 06:57 AM - edited
Leaving this issue to cover just the 1.5.2 case, the 1.6 case is now covered in a subtask.
Just to summarize options at this stage:
  1. leave code on branch as is but remove misleading '@see String#replaceFirst(String,String)' from javadoc as this is unintentionally leaking implementation (update this issue as won't fix for 1.5.2 - leaves desired behavior ambiguous for 1.5.x)
  2. include r10049 and r10349 on branch (close this issue - minor breaking change will be noted in release notes - another minor breaking change will be noted in 1.6 release notes)
  3. include r10350 on branch (builds on previous two which will also be added) (close this issue - minor breaking change will be noted in release notes and 1.5.2 behavior will be aligned with 1.6+ behavior). This is my preferred solution.

Olivier Armand - 21/Jan/08 12:33 PM
Browsing those patches, I found that RegexUtils.quote seems to have an incorrect behavior in some cases.
System.out.println(RegexUtils.quote("\\")); // ouputs \Q\\E
System.out.println(RegexUtils.quote("\\\\E")); // outputs \Q\\E\\E\Q\E

Jim White - 21/Jan/08 02:01 PM
Those are the same results as Pattern.quote:

monospaced
import java.util.regex.Pattern

assert Pattern.quote('
') == /\Q
E/
assert Pattern.quote('\\\\E') == /\Q\\E\\E\Q\E/

println Pattern.quote('
')
println Pattern.quote("\\\\E")
monospaced

And speaking of RegexUtils it also needs to implement quoteReplacement for Matcher.quoteReplacement. But I guess that's for another JIRA...


Olivier Armand - 21/Jan/08 03:00 PM
You're right, forget my comment.

Guillaume Laforge - 25/Jan/08 11:34 AM
Should be fixed after the merge of trunk into the branch.