Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Critical
-
Resolution: Fixed
-
Affects Version/s: None
-
Fix Version/s: 1.1
-
Component/s: None
-
Labels:None
-
Number of attachments :0
Description
According the documentation of BaseXPath.selectNodes,
"nodes will be returned in document-order" unless I'm
"using XPath expressions involving the union operator".
But it seems that //somthing expressions return the result
not in document order. E.g for this XML:
<a>
<x>1</x>
<b>
<x>2</x>
<x>3</x>
</b>
<x>4</x>
</a>
XPath expression //x will return the nodes in order 1, 4,
2, 3.
I have tried it with both 1.0 FCS and the 1.1 beta
Activity
It was a bad choice in implementation, but easily corrected.
A path involving // is a full walk of the tree, and unfortunately, we picked breadth-first instead of depth-first.
I think it's one change of how we push nodes onto the walk stack in one method. Correction of it is left as an exercise to the reader.
It's certainly not obvious to me how to fix this. If anyone sees, please do. After some single stepping with a debugger, I have determined the following:
// returns the nodes in the correct order. That is /descendant-or-self::node() returns all nodes in the document in document order as expected. It appears that the DescendantAxisIterator and DescendantOrSelfAxisIterator work. (Not 100% certain here since I was using my own custom iterators rather than the standard ones from org.jaxen but pretty sure) I don't think the bug is here (or if there is a bug, it's not only here).
The problem doesn't arise until you have something like //x or, unabbreviated/ /descendant-or-self::node()/child::x
Specifically it seems to be the DefaultNameStep.evaluate method that gets the order wrong; specifically this for loop:
for (int i = 0; i < contextSize; ++i) {
eachContextNode = contextNodeSet.get
;
Iterator axisNodeIter = axisIterator(eachContextNode, support);
if (axisNodeIter == null || axisNodeIter.hasNext() == false)
// ensure only unique matching nodes in the result
while (axisNodeIter.hasNext()) {
eachAxisNode = axisNodeIter.next();
if (matches(eachAxisNode, support)) {
if (unique.put(eachAxisNode, PRESENT) == null)
}
}
// evaluate the predicates
newNodeSet.addAll(getPredicateSet().evaluatePredicates(interimSet, support));
interimSet.clear();
}
However, the fix is not yet obvious to me so if anyone else sees how to fix this, please do so.
FYI, I suspect this is the same problem as Jaxen-15, which says:
Please see
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23933
Which I think describes the issue and saves me retyping essentially the same data.
If we fix this, I think we can close that too.
FYI Elliotte: unfortunately an apple jdk upgrade has hosed my IDE, I'm only able to browse the code at the moment. However I went over the code for the two descendant iterators and they seem overly complex, but correct. They both would work with this logic (merged for brevity):
Iterator children;
// constructor for Descendant axis
Descendant() {
children = EMPTY_ITERATOR;
stack.push(children(node));
}
// constructor for DescendantOrSelf axis
DescendantOrSelf() {
children = EMPTY_ITERATOR;
stack.push(new SingletonList(node).iterator());
}
boolean hasNext() {
while (!children.hasNext()) {
if (stack.isEmpty())
children = (Iterator) stack.pop();
}
return true;
}
Object next() {
if (hasNext())
throw new NoSuchElementException();
}
This doesnt change the logic much, so won't fix this bug (it removes an unnecessary check in internalCreateIterator, and avoids using nulls as sentinels). Just posting this in case it helps, I'll apply this tidy up once I've got my IDE back.
Upgrading this bug to critical because working around it is proving to be a major performance hit in XOM (something like 50% of the execution time on some queries)
After quite a bit of work with the debugger I have determined that the bug is not where I thought it was at all. The DescendantAxisIterator and DescendantOrSelfAxisIterator classes are correct. In fact, /descendant::x and /descendant-or-self::x behave as expected. The following block of code in DefaultNameStep.evaluate is what shuffles the nodes out of document order:
for (int i = 0; i < contextSize; ++i) {
eachContextNode = contextNodeSet.get
;
Iterator axisNodeIter = axisIterator(eachContextNode, support);
if (axisNodeIter == null || axisNodeIter.hasNext() == false)
while (axisNodeIter.hasNext()) {
eachAxisNode = axisNodeIter.next();
if (matches(eachAxisNode, support)) {
if (unique.put(eachAxisNode, PRESENT) == null)
}
}
// evaluate the predicates
newNodeSet.addAll(getPredicateSet().evaluatePredicates(interimSet, support));
interimSet.clear();
}
}
I think what is going on is that because //x is really /descendant-or-self::node()/child::x Jaxen does the following:
1. Grab all the nodes that match /descendant-or-self::node(). jaxen returns these in the correct order.
2. Iterate through that list, and for each node in that list, in order, evaluate child::x. Because it goes in order it gets the children of <a>, including <x>1</x> and <x>4</x>, before it evaluates the children of b.
Jaxen simply appends these to the list, and finally returns <x1>1</x>, <x1>4</x>, <x>2></x>, <x>3</x>.
Jaxen is assuming that because it gets the order of each step right, it will get the eventual order right, but this is not true.
For the moment I am not sure how to fix this problem. I suspect it extends way beyond this simple case. Possibly there's a way we can fix the order within the block of code I've indicated here. Possibly, however, Jaxen needs some sort of resort step, either at the end of evaluation, or at the end of evaluation of each step to make sure the nodes come out in document order as promised.
Possibly Jaxen needs to attach some sort of sequence number to each node as its read in the first time, so that the nodes can be efficiently sorted when they come out. I'm not quite sure how that would work. Possibly there's another solution. Suggestions are appreciated.
you are absolutely correct as to what's going on. trying to crunch it through my brain now to see if there's a way around this w/o re-sorting it all.
We can fix this specific problem by injecting the following into DefaultAbsoluteLocationPath:
public Expr simplify()
{
final DefaultAbsoluteLocationPath expr = (DefaultAbsoluteLocationPath)super.simplify();
if( 2 == getSteps().size() )
{
final Step second = (Step)getSteps().get( 1 );
if( Axis.DESCENDANT_OR_SELF == ( (Step)getSteps().get( 0 ) ).getAxis()
&& second instanceof DefaultNameStep
&& Axis.CHILD == second.getAxis() )
}
return expr;
}
But, this just fixes this specific problem. Would like to find a general solution.
I honestly think this is just a simple change.
Switching from a bredth-first walk to a depth-first should be almost trivial, no?
bob, i saw your earlier comment about that, but i didn't see where it was an issue.
the descendant-or-self iterator does to depth first, as it should be needed.
that step returns a node-set of all document nodes, in order.
next step is a child iterator for 'x'.
since the first node in the list is 'a', the children of it are x/1 and x/4. this will happen regardless of how the first step is evaluated (be it depth or breadth)
unless in depth-vs-breadth you are talking about how the various steps play together? (ie, only evaluate first node of first step, pass to 2nd step, etc).. but i think that might be a huge change to the entire way jaxen operates, and i can't work it out in my head if that wouldn't totally break things.
This is now fixed. All node-sets are returned in document order.
test to illustrate this:
<document url="xml/moreover.xml">
<context select="/">
<valueOf select="(//*)[5]">StockAccess</valueOf>
</context>
</document>