Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Minor
-
Resolution: Fixed
-
Affects Version/s: 1.1
-
Fix Version/s: 1.1.3
-
Component/s: None
-
Labels:None
-
Number of attachments :
Description
If I use Jaxen to evaluate an XPath predicate and that predicate requires a text node to be implicitly cast to a numeric value, the results of the predicate are not what I expect. If, however, I do the cast explicitly, then I do get the expected results.
As an example, assume I have the following input data:
<geo_data xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<loc id="1">
<geo:long>-122.424761</geo:long>
<geo:lat>37.776694</geo:lat>
</loc>
<loc id="2">
<geo:long>-122.539001465</geo:long>
<geo:lat>37.9743995667</geo:lat>
</loc>
<loc id="3">
<geo:long>-122.42099762</geo:long>
<geo:lat>37.7573013306</geo:lat>
</loc>
<loc id="4">
<geo:long>-122.427001953</geo:long>
<geo:lat>37.7430992126</geo:lat>
</loc>
<loc id="5">
<geo:long>-122.419998169</geo:long>
<geo:lat>37.7448005676</geo:lat>
</loc>
</geo_data>
If I try to run the following XPath expression, I get zero results:
//geo:lat[(text() >= 37.75)]
But I expect to get 3 results. And indeed, if I specify an explicit cast of the text() node to numeric, that's exactly what I get:
//geo:lat[(number(text()) >= 37.75)]
I skimmed through the XPath spec and tried out a bunch of different expressions, and based on the results, my guess (and it's only a guess) is that Jaxen has a bug where implicit casting of a Text node to a numeric value incorrectly uses the "toString()" value from the Text node (ex. "#text: 37.7448005676"), which leads to a result that is NaN.
I (perhaps erroneously) came to the above guess based on the following observations.
=================================
Sample Expressions & Observations
=================================
Observation 1:
--------------
//geo:lat[(text() >= '37.75')] ==> 0 results
//geo:lat[(text() >= 37.75)] ==> 0 results
From this it is clear that specifying the literal without quotes doesn't change the results; in either case Jaxen returns 0 results.
Observation 2:
--------------
//geo:lat[(text() >= '37.7430992126')] ==> 0 results
//geo:lat[(text() >= 37.7430992126)] ==> 0 results
Using a precise number with the same number of decimal digits as the text() node does not change the results. So this does not appear to be an issue of string padding.
Observation 3:
--------------
//geo:lat[(text() = '37.7430992126')] ==> 1 result
//geo:lat[(text() = 37.7430992126)] ==> 0 results
//geo:lat[(text() != '37.7430992126')] ==> 4 results
//geo:lat[(text() != 37.7430992126)] ==> 5 results
Doing EQUALS or NOT EQUALS instead of GREATER-THAN-OR-EQUALS does make a difference. Jaxen appears to do different things when we have "=, Unable to render embedded object: File (=" verses when we have "<, <=, >, >=". When using "=" or ") not found.=" we get the expected results if the literal is a string, but we get unexpected results when the literal is a number.
Observation 4:
--------------
//geo:lat[number(text()) >= '37.75'] ==> 3 results
//geo:lat[number(text()) >= 37.75] ==> 3 results
//geo:lat[number(text()) = '37.7430992126'] ==> 1 result
//geo:lat[number(text()) = 37.7430992126] ==> 1 result
Using the "number()" function on the text node gives us the expected results. This is true regardless of whether the literal is a string or a number.
Observation 5:
--------------
In section 3.4 of the XPath 1.0 specification (http://www.w3.org/TR/xpath) an explanation of comparison operator behavior is given. The behavior depends on two primary factors:
a) Whether or not the operands are "node-sets".
b) Whether the operator is "=, !=", or whether it is "<, <=, >, >=".
Factor "b" has a direct correlation with observation #3: i.e. the behavior of Jaxen changes depending on whether we are using "=, !=" verses "<, <=, >, >=".
Factor "a" has a direct correlation with observation #4: i.e. the behavior of Jaxen changes depending on whether we have a "node-set" (that is, "text()") verses an atomic value ("number(text())").
==============
XPath Snippets
==============
R1: "If one object to be compared is a node-set and the other is a number, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the number to be compared and on the result of converting the string-value of that node to a number using the number function is true. If one object to be compared is a node-set and the other is a string, then the comparison will be true if and only if there is a node in the node-set such that the result of performing the comparison on the string-value of the node and the other string is true."
R2: "When neither object to be compared is a node-set and the operator is = or !=, then the objects are compared by converting them to a common type as follows and then comparing them. [...] if at least one object to be compared is a number, then each object to be compared is converted to a number as if by applying the number function. Otherwise, both objects to be compared are converted to strings as if by applying the string function."
R3: "When neither object to be compared is a node-set and the operator is <=, <, >= or >, then the objects are compared by converting both objects to numbers and comparing the numbers according to IEEE 754."
============================
Assumptions and Explanations
============================
IF we *ASSUME* that:
a) Jaxen takes a Text node, which constitutes a node-set, and
turns it into a string or a numeric value, which is no longer
a node-set, as part of R1 processing. Then it passes that
value down to R2 or R3, along with the literal on the right
side of the comparison; AND
b) Jaxen correctly converts string literals into numeric
values; AND
c) Jaxen fails to correctly convert the Text node's string
value into a number and instead produces something along
the lines of an NaN,
THEN all of the results for the expressions that I try become (mostly) explainable. Here are the details.
Q1. //geo:lat[(text() >= '37.75')] ==> 0 results
Q2. //geo:lat[(text() >= 37.75)] ==> 0 results
Q3. //geo:lat[(text() >= '37.7430992126')] ==> 0 results
Q4. //geo:lat[(text() >= 37.7430992126)] ==> 0 results
For Q1 thru Q4, R1 dicates that we get a string value from the text node; then R3 indicates that both operands should be converted to a number. If attempts to convert the text node's string value into a numeric value returns NaN, the comparison will never return true, hence no results.
Q5. //geo:lat[(text() = 37.7430992126)] ==> 0 results
Q6. //geo:lat[(text() != 37.7430992126)] ==> 5 results
For Q5 and Q6, R1 dictates that we get a numeric value from the text node. If retrieval of the numeric value from the Text node returns NaN, then subsequent equality comparison with the numeric literal will never return true; but IN-equality comparison could perhaps return true because NaN is clearly not the same as a valid number...? (just guessing).
Q7. //geo:lat[(text() = '37.7430992126')] ==> 1 result
Q8. //geo:lat[(text() != '37.7430992126')] ==> 4 results
For Q7 and Q8, R1 dicates that we get a string value from the text node; then R2 indicates that both operands should be converted to a string--which they both already are. So the comparison runs as a string comparison and returns the expected results, because the text node string value was never converted to a numeric.
Q9. //geo:lat[number(text()) >= '37.75'] ==> 3 results
Q10. //geo:lat[number(text()) >= 37.75] ==> 3 results
For Q9 and Q10, the number() function correctly converts the text node's string value to a numeric value; then since neither operand is a node-set R1 does NOT apply. R3 dictates that all operands are converted to numeric, which is a no-op on the value returned from the number() function and which works correctly for the right-hand literals. Thus the comparison returns correct results.
So it all kind of makes a bit of sense if we assume that Jaxen is having problems with implicit conversion of the string value of a Text node returned by "text()" into a numeric value...
But all of that said, I could simply be missing something key. So if this is working as designed, then please excuse the noise...
Attaching a simple, standalone (not JUnit) repro for the reported issue. I tried it against Jaxen 1.1.1 and Jaxen 1.1.2, but am seeing the same behavior in both releases.
Extract the three files into a directory, then from within that directory...
To compile:
javac -cp C:/Progs/jaxen-1.1.2/jaxen-1.1.2.jar JaxenXPCheck.java
To run:
java -cp "C:/Progs/jaxen-1.1.2/jaxen-1.1.2.jar;." JaxenXPCheck [XPath expressions]
XPath expressions can be specified as arguments using quoted values. If you do not specify any expressions, the repro will simply run two default expressions, where the only difference between the two is implicit vs. explicit casting. Ex.
> java -cp "C:/Progs/jaxen-1.1.2/jaxen-1.1.2.jar;." JaxenXPCheck
Expr: //geo:lat[(text() >= 37.75)]
=> Found list w/ 0 matching nodes.Expr: //geo:lat[(number(text()) >= 37.75)]
=> Found list w/ 3 matching nodes.=> Found list w/ 0 matching nodes. Expr: //geo:lat[(number(text()) >= 37.75)]=> Found list w/ 3 matching nodes.