jira.codehaus.org

  • Log In Access more options
    • Online Help
    • Keyboard Shortcuts
    • About JIRA
    • JIRA Credits
    • What?s New
  • Dashboards Access more options (Alt+d)
  • Projects Access more options (Alt+p)
  • Issues Access more options (Alt+i)
  • jaxen
  • JAXEN-193

child::* in non-final position causes XPathSyntaxException

  • Log In
  • Views
    • XML
    • Word
    • Printable

Details

  • Type: Bug Bug
  • Status: Closed Closed
  • Priority: Critical Critical
  • Resolution: Fixed
  • Affects Version/s: 1.1
  • Fix Version/s: 1.1
  • Component/s: saxpath
  • Labels:
    None

Description

The valid XPath expression

/*[ * or processing-instruction() ]

causes Jaxen to report an XPathSyntaxException with message "Expected: ]" (at char 6).

Re-arrange it to this:

/*[ processing-instruction() or * ]

and the problem goes away.

From some probing of the XPathLexer class, it looks as though '' in the sequence ' or...' is being handled incorrectly here as a multiply operator rather than a NameTest wildcard.

A (so far insufficiently tested) fix is to apply the rules relating to '*' in section 3.7 of XPath 1.0 as soon as star() is called in XPathLexer.java and return disambiguated token types for each of multiply operator and NameTest wildcard.

  • Options
    • Sort By Name
    • Sort By Date
    • Ascending
    • Descending
    • Download All

Attachments

  1. Text File
    jaxen.patch
    07/Jul/08 12:37 PM
    7 kB
    Brian Ewins
  2. Text File
    jaxen2.patch
    07/Jul/08 12:40 PM
    7 kB
    Brian Ewins

Activity

Ascending order - Click to sort in descending order
  • All
  • Comments
  • Work Log
  • History
  • Activity
Hide
Permalink
Elliotte Rusty Harold added a comment - 26/Apr/08 7:16 PM

I can confirm this one. I've added a test case for it.

Show
Elliotte Rusty Harold added a comment - 26/Apr/08 7:16 PM I can confirm this one. I've added a test case for it.
Hide
Permalink
Brian Ewins added a comment - 16/Jun/08 5:00 AM

As you say this is to do with 3.7. Short answer is that we had this in the javacc version of the parser ages ago, the implementation is at the bottom of this comment. Saxpath should have been binned years ago, etc etc.

Long answer: that tokenizer was written to be stateful, using the previous token to determine what the current token should be emitted as. The same fix would probably work here. A direct translation doesn't correspond to a change in 'star()', but a second switch statement in 'nextToken()'.

An implementation would be: instead of storing 'previousToken' in XPathLexer, store 'boolean expectOperator = false;'. Remove the switch statement from identifierOrOperatorName(), and place it at the end of nextToken(), use it to set 'expectOperator'. In the first switch statement in nextToken, the entry for '*' should read:

case '*':
{
if (expectOperator) { token = star(); } else { // or something less ambiguous; I havent checked the knock-on effects of this token = identifier(); }
break;

and the entry for default:
default:
{
if ( isIdentifierStartChar( LA(1) ) )
{
if (expectOperator) { token = operatorName(); } else { token = identifier(); }
}
}

this boils down to much the same change as in the first comment, but there should be a bit less duplication of the previousToken logic.

For completeness, here's how it looked in javacc:

/*
If there is a preceding token and the preceding token is not one of @, ::,
(, [, , or an Operator, then a * must be recognized as a MultiplyOperator
and an NCName must be recognized as an OperatorName.
*/
<OP> TOKEN: {
<MultiplyOperator: "*"> : DEFAULT

<AND:"and"> : DEFAULT
<OR:"or"> : DEFAULT
<MOD:"mod"> : DEFAULT
<DIV:"div"> : DEFAULT
}

<*> TOKEN: {
<LEFT_PAREN:"("> : DEFAULT

<RIGHT_PAREN:")"> : OP
<LEFT_SQUARE:"["> : DEFAULT
<RIGHT_SQUARE:"]"> : OP
<PERIODS:".."> : OP
<Number: <Digits> ("." (<Digits>)?)? "." <Digits>>
<PERIOD:"."> : OP
<AT:"@"> : DEFAULT
<COMMA:","> : DEFAULT
<COLONS:"::"> : DEFAULT
<SLASHES: "//"> :DEFAULT
<SLASH: "/"> :DEFAULT
<PIPE: " "> :DEFAULT
<PLUS: "+"> :DEFAULT
<MINUS: "-"> :DEFAULT
<EQ: "="> :DEFAULT
<NE: "!="> :DEFAULT
<LTE: "<="> :DEFAULT
<LT: "<"> :DEFAULT
<GTE: ">="> :DEFAULT
<GT: ">"> :DEFAULT
<Literal: "\"" (~["\""])* "\"" "'" (~["'"])* "'"> : OP
<VariableReference: "$" <QName>> : OP
}
Show
Brian Ewins added a comment - 16/Jun/08 5:00 AM As you say this is to do with 3.7. Short answer is that we had this in the javacc version of the parser ages ago, the implementation is at the bottom of this comment. Saxpath should have been binned years ago, etc etc. Long answer: that tokenizer was written to be stateful, using the previous token to determine what the current token should be emitted as. The same fix would probably work here. A direct translation doesn't correspond to a change in 'star()', but a second switch statement in 'nextToken()'. An implementation would be: instead of storing 'previousToken' in XPathLexer, store 'boolean expectOperator = false;'. Remove the switch statement from identifierOrOperatorName(), and place it at the end of nextToken(), use it to set 'expectOperator'. In the first switch statement in nextToken, the entry for '*' should read: case '*': { if (expectOperator) { token = star(); } else { // or something less ambiguous; I havent checked the knock-on effects of this token = identifier(); } break; and the entry for default: default: { if ( isIdentifierStartChar( LA(1) ) ) { if (expectOperator) { token = operatorName(); } else { token = identifier(); } } } this boils down to much the same change as in the first comment, but there should be a bit less duplication of the previousToken logic. For completeness, here's how it looked in javacc: /* If there is a preceding token and the preceding token is not one of @, ::, (, [, , or an Operator, then a * must be recognized as a MultiplyOperator and an NCName must be recognized as an OperatorName. */ <OP> TOKEN: { <MultiplyOperator: "*"> : DEFAULT
<AND:"and"> : DEFAULT
<OR:"or"> : DEFAULT
<MOD:"mod"> : DEFAULT
<DIV:"div"> : DEFAULT }
<*> TOKEN: { <LEFT_PAREN:"("> : DEFAULT
<RIGHT_PAREN:")"> : OP
<LEFT_SQUARE:"["> : DEFAULT
<RIGHT_SQUARE:"]"> : OP
<PERIODS:".."> : OP
<Number: <Digits> ("." (<Digits>)?)? "." <Digits>>
<PERIOD:"."> : OP
<AT:"@"> : DEFAULT
<COMMA:","> : DEFAULT
<COLONS:"::"> : DEFAULT
<SLASHES: "//"> :DEFAULT
<SLASH: "/"> :DEFAULT
<PIPE: " "> :DEFAULT
<PLUS: "+"> :DEFAULT
<MINUS: "-"> :DEFAULT
<EQ: "="> :DEFAULT
<NE: "!="> :DEFAULT
<LTE: "<="> :DEFAULT
<LT: "<"> :DEFAULT
<GTE: ">="> :DEFAULT
<GT: ">"> :DEFAULT
<Literal: "\"" (~["\""])* "\"" "'" (~["'"])* "'"> : OP
<VariableReference: "$" <QName>> : OP }
Hide
Permalink
Brian Ewins added a comment - 16/Jun/08 5:02 AM

ugh. jira completely messed up the formatting there, sorry. Unfortunately the preview button shows a completely different (also wrong) formatting, so I've no idea how to fix that up. You get the gist.

Show
Brian Ewins added a comment - 16/Jun/08 5:02 AM ugh. jira completely messed up the formatting there, sorry. Unfortunately the preview button shows a completely different (also wrong) formatting, so I've no idea how to fix that up. You get the gist.
Hide
Permalink
Brian Ewins added a comment - 07/Jul/08 12:37 PM

Sorry meant to get back to you on this Elliotte. I'll attach 2 variants of the patch, the first one leaves the getter and setter for PreviousToken in place - in case you want to preserve the API. Both variants add a second token type (STAR_OPERATOR) for * as a multiply op. I can get away without doing that but the changes started to look pretty ugly; you end up using Token.IDENTIFIER for '' instead, and having to test everywhere that's used to see if it matches ''. Bleh. Tests ran mostly ok for me except for a jdom issue which I believe is my environment - its picking up an older/newer jdom somewhere.

Show
Brian Ewins added a comment - 07/Jul/08 12:37 PM Sorry meant to get back to you on this Elliotte. I'll attach 2 variants of the patch, the first one leaves the getter and setter for PreviousToken in place - in case you want to preserve the API. Both variants add a second token type (STAR_OPERATOR) for * as a multiply op. I can get away without doing that but the changes started to look pretty ugly; you end up using Token.IDENTIFIER for '' instead, and having to test everywhere that's used to see if it matches ''. Bleh. Tests ran mostly ok for me except for a jdom issue which I believe is my environment - its picking up an older/newer jdom somewhere.
Hide
Permalink
Brian Ewins added a comment - 07/Jul/08 12:40 PM

Second variant with the unused previousToken methods removed. BTW another issue you run into if you try to do this with identifier() is that it won't collect the token text for '*', you get lots of parse errors because the token text collected becomes ''. Or something. I didn't delve too far into that because this solution was neater.

Show
Brian Ewins added a comment - 07/Jul/08 12:40 PM Second variant with the unused previousToken methods removed. BTW another issue you run into if you try to do this with identifier() is that it won't collect the token text for '*', you get lots of parse errors because the token text collected becomes ''. Or something. I didn't delve too far into that because this solution was neater.
Hide
Permalink
Elliotte Rusty Harold added a comment - 29/Nov/08 11:59 AM

Brian's patch works. Committed

Show
Elliotte Rusty Harold added a comment - 29/Nov/08 11:59 AM Brian's patch works. Committed

People

  • Assignee:
    Brian Ewins
    Reporter:
    Andrew Sales
Vote (0)
Watch (0)

Dates

  • Created:
    19/Oct/07 9:53 AM
    Updated:
    29/Nov/08 12:00 PM
    Resolved:
    29/Nov/08 11:59 AM
  • Atlassian JIRA (v5.0.4#731-sha1:3aa7374)
  • Report a problem
  • Powered by a free Atlassian JIRA open source license for Codehaus. Try JIRA - bug tracking software for your team.