Details
Description
Charles Souillard reports a problem while trying to evaluate the following expression : 2+1-1+1. It should be 3.0 but is 1.0:
XPath xpath = new DOMXPath("2+1-1+1");
List results = (List) xpath.selectNodes(null);
for (int i=0;i<results.size();i++) {
Object result = results.get
;
System.out.println("result = "+result);
}
A test is now in CVS.
Attachments
-
- assoc2.patch
- 13/Jun/05 7:28 PM
- 5 kB
- Brian Ewins
Activity
Its not quite that the operators (individually) arent left/right associative. The way the evaluation order works in saxpath, its as if +, - are treated as the /same/ right associative operator. ie the expression above evaluates as "2 + (1 - (1 + 1))".
The problem is the way the parser recurses. If you take a look at org.jaxen.saxpath.base.XPathReader.additiveExpr() you'll see that it parses that as:
AdditiveExpr:: MultiplicativeExpr
| MultiplicativeExpr '+' AdditiveExpr |
| MultiplicativeExpr '-' AdditiveExpr |
when the grammar said this:
AdditiveExpr:: MultiplicativeExpr
| AdditiveExpr '+' MultiplicativeExpr |
| AdditiveExpr '-' MultiplicativeExpr |
(its slightly obscured because one level of recursion has been unrolled, correcting the problem on shorter expressions). The usual solution, in any compiler text (see eg http://www.google.co.uk/search?q=eliminating+left+recursion), is to introduce a new nonterminal to transform direct left recursion into right recursion.
AdditiveExpr:: MultiplicativeExpr FooExpr?
FooExpr:: '+' MultiplicativeExpr FooExpr? | '-' MultiplicativeExpr FooExpr?
you can eliminate the extra nonterminal again, using javacc syntax:
AdditiveExpr:: MultiplicativeExpr ('+' MultiplicativeExpr | '-' MultiplicativeExpr)*
Applying that to the saxpath code gives, roughly:
void additiveExpr() throws org.jaxen.saxpath.SAXPathException
{
XPathHandler handler = getXPathHandler();
handler.startAdditiveExpr();
multiplicativeExpr();
Stack stack = new Stack();
int la = LA(1);
while (la == PLUS || la == MINUS) {
switch ( LA(1) ) {
// MINUS case ommitted for clarity
case PLUS:
}
la = LA(1);
}
while (!stack.isEmpty()) {
handler.endAdditiveExpr(stack.pop());
}
// the last expr is always just a multiplicative expr.
handler.endAdditiveExpr(Operator.NO_OP);
}
The same fix needs applied to all left-recursive nonterminals (OrExpr, AndExpr, etc).
A second solution is to abandon saxpath. These problems are far easier to fix with a parser generator.
| MultiplicativeExpr '+' AdditiveExpr |
| MultiplicativeExpr '-' AdditiveExpr |
| AdditiveExpr '+' MultiplicativeExpr |
| AdditiveExpr '-' MultiplicativeExpr |
whoops slight error there... this is what I get for just typing code into the comments. Looping on 2+3 you should see events like:
startAdditiveExpr
numberExpr "2"
startAdditiveExpr
numberExpr "3"
endAdditiveExpr(NO_OP)
endAdditiveExpr(PLUS)
the code above does it wrong because I put the NO_OP in the wrong place. Code should read:
// the last expr is always just a multiplicative expr.
handler.endAdditiveExpr(Operator.NO_OP);
while (!stack.isEmpty()) {
handler.endAdditiveExpr(stack.pop());
}
So it should be:
void additiveExpr() throws org.jaxen.saxpath.SAXPathException
{
getXPathHandler().startAdditiveExpr();
multiplicativeExpr();
Stack stack = new Stack();
int la = LA(1);
while (la == PLUS || la == MINUS) {
switch ( LA(1) ) {
// MINUS case ommitted for clarity
case PLUS:
case MINUS:
{ match( MINUS ); stack.push(new Integer(Operator.SUBTRACT)); getXPathHandler().startAdditiveExpr(); multiplicativeExpr(); break; } }
la = LA(1);
}
// the last expr is always just a multiplicative expr.
getXPathHandler().endAdditiveExpr(Operator.NO_OP);
while (!stack.isEmpty()) {
getXPathHandler().endAdditiveExpr(((Integer) stack.pop()).intValue());
}
}
This breaks more tests that the current code does. What am I getting wrong?
I would like to eventually move to a parser-generator based solution. However, if we can fix saxpath first for 1.1 that woudl be nice. It is a public API after all. I'm vaguely thinking we coudl move to a parser-generator in 2.0 when we break the API. However, if we can't get 1.1 to work with SAXPath then we may need to break the API earlier than planned. We could always deprecate saxpath and leave it in for now but stop using it I suppose.
I'm not sure what's wrong if thats failing for you. I just typed in the code that matches the grammar, since you asked for suggestions, I didn't even compile it. If you want I can take a run at this and try to produce a tested patch.
I'd suspect that any test cases that test saxpath directly may fail, since this changes the tree structure even for simple expressions. Were there any failures in the DocumentNavigatorTest? I also noticed in my own experiments (not looking at this bug specifically) that some of what simplify() does is odd - it seemed to alter the parse tree so it was no longer like the one generated by the handler, eg null predicate lists instead of empty predicate lists. I'd doubt that wasnt the problem here, but I ended up commenting out the call to simplify() in DefaultXPathExpr when I was working on the internals, it muddies the waters.
I agree that getting saxpath fixed is a laudable goal, but as I've said before, I don't like that code, and I've already written the replacement. The parser patch is independent of the other stuff I was working on, I'll post it up-to-date with your changes this evening. It involves a small change (5 lines-ish) in BaseXPath, and the javacc-generated code. I think that last time I looked at it, this passed all of the tests; but you've added a lot since then. At least it would leave you with options.
The DocumentNavigator tests all failed as did all the tests I'd added in BaseXPath specifically to check for asssociativity. Previously some of these were passing.
Please do take a run at this. If you can fix it, great!
This fixes the eval-order problem for +, -, *, mod, div. Still several operators to do. Some new tests fail (as they test exactly what saxpath returns). The problem with this patch is, it generates start events out of order - only the end events are generated in the correct order. I have to rely on the behaviour of JaxenHandler, which maintains a stack. Looking at what I did, saxpath needs major surgery to generate the correct events in the correct order - the stack in jaxenhandler needs pulled back into saxpath, essentially making saxpath build a tree, then send events to jaxenhandler, only for it to build a tree again. There's a bad smell here somewhere... I thought I'd post the patch in its current state for you to review.
The patch seems to work. I only see one test failing with it that wasn't failing before; and as you point out, that test is probably wrong. The tests that test for associativity now pass so I'm going to go ahead and commit this. Then I'll work on fixing (or removing) the broken test. Next I'll probably write a few tests for the other operators you suspect are broken. Smelly it may be, but if it correctly handles XPath then it's a clear improvement on what we had before. Thanks!
I'm not sure this needs to be fixed for OrExpr and AndExpr. left recursive though they may be. Try as I might I haven't been able to concoct an expression where associativity makes a difference. I think the binary nature of boolean operators saves us here.
Ok I wrote a long comment there and lost it to the jira re-indexer... yes you're right about and/or, I misspoke. They have different precedence and are associative (not just left-associative) so the bug can't affect them. The other operators affected are the equality and relational operators. See this diff:
http://cvs.codehaus.org/viewrep/jaxen/jaxen/xml/test/tests.xml?r1=1.70&r2=1.69
... where I added tests for those ops (its section 3.4 of the spec)
and this diff:
http://cvs.codehaus.org/viewrep/jaxen/jaxen/src/java/main/org/jaxen/saxpath/base/XPathReader.java?r1=1.5&r2=1.4
where I applied the same incorrect solution that had previously been used for +/. Its mindbendingly difficult to think of cases where this could fail now, as you need at least 4 terms in an expression to get past the corrective action of my previous patch. I'd suggest undoing that 1.4>1.5 patch above so you can get the original equality/relational tests to fail, before applying the same fix used on the arithmetic operators.
It's difficult but I managed to do it with your hints that more than four terms were required. 5 > 4 > 3 > 2 > 1 fails as do several other tests I've now checked in. these are all in BaseXPathTest.
I've now added tests that demonstrate how some equality and inequality expressions fail as well.
I've used a similar fix for equality and inequality expressions. However for relational expressions it doesn't seem to work. The problem may involve a use of additive expressions where multiplicative expression are needed or vice versa or some such, based on the excpeiton message from the test cases. The new code looks like this:
private void relationalExpr() throws SAXPathException
{
additiveExpr();
int la = LA(1);
while (la == LESS_THAN || la == GREATER_THAN || la == LESS_THAN_EQUALS || la == Operator.GREATER_THAN_EQUALS)
{
switch ( la )
{
case LESS_THAN:
case GREATER_THAN:
{ match( GREATER_THAN ); getXPathHandler().startRelationalExpr(); additiveExpr(); getXPathHandler().endRelationalExpr( GREATER_THAN ); break; }case GREATER_THAN_EQUALS:
{ match( GREATER_THAN_EQUALS ); getXPathHandler().startRelationalExpr(); additiveExpr(); getXPathHandler().endRelationalExpr( GREATER_THAN_EQUALS ); break; }case LESS_THAN_EQUALS:
{ match( LESS_THAN_EQUALS ); getXPathHandler().startRelationalExpr(); additiveExpr(); getXPathHandler().endRelationalExpr( LESS_THAN_EQUALS ); break; } }
la = LA(1);
}
}
Anyone see anything obviously wrong with it?
OK. The problem is that I was effectively using TokenTypes.LESS_THAN instead of Operator.LESS_THAN and so forth. These are qurie confusingly named. That's fixed now. For instance,
case LESS_THAN_EQUALS:
{ match( LESS_THAN_EQUALS ); getXPathHandler().startRelationalExpr(); additiveExpr(); getXPathHandler().endRelationalExpr( Operator.LESS_THAN_EQUALS ); break; }There's still something going on. >= doesn't work though the other three do.
I've got it fixed now. It was the same problem. I was using Operator.GREATER_THAN_EQUALS where I needed TokenTypes.GREATER_THAN_EQUALS. The confusion between these two is a big problem. I'm going to see if I can do something about this.
Still to resolve: we are not pushing the event callbacks through XPathHandler in the same order we used to. This is only an issue for people implementing XPathHandler themselves. It does not affect people doing more normal XPath operations. How important is this? I'm inclined to just let this one go. I think all we're doing is skipping some callbacks that would otherwise have Operator.NO_OP, whcih maybe clients don't want to see anyway.
I don't think the event problem is a huge issue, as basically its the design of saxpath that's broken. I was thinking about this today. Only the end events are important... need a stack... doh! What saxpath is really doing is converting expressions to postfix form:
1 + 2 => 1 2 endAdditiveExpr
.
I don't think this helps much, but it does look even more like we'll never be able to generate start events in the right order - we'd need to insert them at random places deep in the stack.
As for users of these events: there is a handler that tries to turn paths into xml, supposedly to allow optimising transformations. The xml thats produced is junk now.
Is that handler that turns XPaths into XML hiding in Jaxen somewhere or if not, where can I find it? Sounds like it might be useful; and if I can play with it maybe I can figure out how to fix the event order.
I thought it was in cvs somewhere, I've just looked and I can't see it, possibly its been removed, possibly I was imagining things. The closest thing I can find on the list is a thread on "squashing prefixes" from Alex Cruise, where someone suggested to him that the he should go write a saxpath handler.
I think this is fixed as far as it needs to be for now. If there is a bug, then it doesn't get expressed when doing XPath evaluation.
If the incorrect order of start expressions bothers anyone, then we'll need a clear test case; and we can reopen this bug or open a new bug to look at fixing that in the post-1.1 time frame.
This is not a precedence problem as I originally thought. There is no precedence issue since + and - have the same precedence. Rather the problem is associativity. These operators seems to be right associative instead of left associative. Does that help anyone see how to fix this?