After quite a bit of work with the debugger, I have determined that the bug is not where I thought it was at all. The DescendantAxisIterator and DescendantOrSelfAxisIterator classes are correct; in fact, /descendant::x and /descendant-or-self::x behave as expected. The following block of code in DefaultNameStep.evaluate is what shuffles the nodes out of document order:
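    // Excerpt: contextNodeSet, contextSize, interimSet, newNodeSet, unique,
    // PRESENT, eachContextNode and eachAxisNode are declared outside it.
    for (int i = 0; i < contextSize; ++i) {
        eachContextNode = contextNodeSet.get(i);
        Iterator axisNodeIter = axisIterator(eachContextNode, support);
        if (axisNodeIter == null || axisNodeIter.hasNext() == false) {
            continue;
        }
        while (axisNodeIter.hasNext()) {
            eachAxisNode = axisNodeIter.next();
            if (matches(eachAxisNode, support)) {
                // skip nodes already reached from an earlier context node
                if (unique.put(eachAxisNode, PRESENT) == null) {
                    interimSet.add(eachAxisNode);
                }
            }
        }
        // evaluate the predicates, then append this context node's results;
        // nothing re-sorts across context nodes
        newNodeSet.addAll(getPredicateSet().evaluatePredicates(interimSet, support));
        interimSet.clear();
    }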
I think what is going on is this. Because //x is really /descendant-or-self::node()/child::x, Jaxen does the following:
1. Grab all the nodes that match /descendant-or-self::node(). Jaxen returns these in the correct order.
2. Iterate through that list and, for each node in turn, evaluate child::x. Because it goes in order, it gets the children of <a>, including <x>1</x> and <x>4</x>, before it evaluates the children of <b>.
Jaxen simply appends each batch of results to the list, and finally returns <x>1</x>, <x>4</x>, <x>2</x>, <x>3</x>.
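To make the two steps concrete, here is a toy model of the evaluation. It is not Jaxen code: the Node class and method names are hypothetical, text is stored as a field rather than as child text nodes, and the sample document <a><x>1</x><b><x>2</x><x>3</x></b><x>4</x></a> is my reconstruction of the one described above. It runs descendant-or-self::node() as a pre-order walk and then, for each context node in turn, appends its x children, the same way the loop above does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model for illustration only; Node and these methods are hypothetical, not Jaxen API.
public class ChildStepOrderDemo {

    // Minimal element: name, optional text, ordered children.
    // Node does not override equals/hashCode, so sets compare by identity.
    static final class Node {
        final String name;
        final String text;
        final List<Node> children = new ArrayList<>();

        Node(String name, String text, Node... kids) {
            this.name = name;
            this.text = text;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // descendant-or-self::node(): a pre-order walk, which is document order.
    static void descendantOrSelf(Node n, List<Node> out) {
        out.add(n);
        for (Node child : n.children) {
            descendantOrSelf(child, out);
        }
    }

    // //x evaluated like the loop in DefaultNameStep.evaluate: for each context
    // node in turn, append its matching children. Duplicates are skipped, but
    // nothing re-sorts the combined result.
    public static List<String> evaluateSlashSlashX(Node root) {
        List<Node> contextNodes = new ArrayList<>();
        descendantOrSelf(root, contextNodes);
        Set<Node> unique = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (Node context : contextNodes) {
            for (Node child : context.children) {
                if (child.name.equals("x") && unique.add(child)) {
                    result.add(child.text);
                }
            }
        }
        return result;
    }

    // Assumed sample: <a><x>1</x><b><x>2</x><x>3</x></b><x>4</x></a>
    public static Node sampleDocument() {
        return new Node("#document", null,
                new Node("a", null,
                        new Node("x", "1"),
                        new Node("b", null, new Node("x", "2"), new Node("x", "3")),
                        new Node("x", "4")));
    }

    public static void main(String[] args) {
        System.out.println(evaluateSlashSlashX(sampleDocument())); // [1, 4, 2, 3]
    }
}
```

Each context node's batch is in order on its own; the problem only appears once the batches for <a> and <b> are concatenated.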
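Jaxen is assuming that because it gets the order of each step right, it will get the eventual order right, but this is not true: each context node's results are in document order, but concatenating them does not keep the combined list in document order.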
For the moment I am not sure how to fix this problem, and I suspect it extends well beyond this simple case. Possibly there's a way to fix the order within the block of code I've indicated here. Possibly, however, Jaxen needs some sort of re-sort step, either at the end of evaluation or at the end of each step, to make sure the nodes come out in document order as promised.
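One shape the re-sort step could take is a comparator that orders nodes by their root-to-node paths of child indexes. This is only a sketch over a hypothetical toy Node class with parent pointers, not Jaxen's navigator API. It ignores attribute and namespace nodes and assumes all nodes come from a single document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Sketch only: Node is a hypothetical toy class, not Jaxen's navigator API.
// Assumes element nodes from a single document (no attributes or namespaces).
public class ResortByPathDemo {

    static final class Node {
        final String label;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node parent) {
            this.label = label;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // appended, so siblings keep creation order
            }
        }
    }

    // Child indexes on the path from the root down to n.
    static List<Integer> pathFromRoot(Node n) {
        List<Integer> path = new ArrayList<>();
        for (Node cur = n; cur.parent != null; cur = cur.parent) {
            path.add(cur.parent.children.indexOf(cur));
        }
        Collections.reverse(path);
        return path;
    }

    // Document order is lexicographic order of those paths; when one path is a
    // prefix of the other, the shorter one (the ancestor) comes first.
    static final Comparator<Node> DOCUMENT_ORDER = (a, b) -> {
        List<Integer> pa = pathFromRoot(a);
        List<Integer> pb = pathFromRoot(b);
        int common = Math.min(pa.size(), pb.size());
        for (int i = 0; i < common; i++) {
            int c = Integer.compare(pa.get(i), pb.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(pa.size(), pb.size());
    };

    // The re-sort step, applied to one step's result or to the final result.
    static List<Node> resort(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(DOCUMENT_ORDER);
        return sorted;
    }

    public static List<String> demo() {
        Node doc = new Node("#document", null);
        Node a = new Node("a", doc);
        Node x1 = new Node("x1", a);
        Node b = new Node("b", a);
        Node x2 = new Node("x2", b);
        Node x3 = new Node("x3", b);
        Node x4 = new Node("x4", a);
        // The order the per-context-node loop produces for //x.
        List<Node> shuffled = Arrays.asList(x1, x4, x2, x3);
        List<String> labels = new ArrayList<>();
        for (Node n : resort(shuffled)) {
            labels.add(n.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [x1, x2, x3, x4]
    }
}
```

Each comparison walks up to the root and scans sibling lists, so this gets expensive on deep or wide documents if it runs after every step.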
Possibly Jaxen needs to attach some sort of sequence number to each node as it's read in the first time, so that the nodes can be efficiently sorted when they come out. I'm not quite sure how that would work. Possibly there's another solution. Suggestions are appreciated.
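The sequence-number idea could look something like the following: number every node once in a pre-order walk, keep the numbers in an IdentityHashMap, and sort results by number. DocumentIndex and Node are hypothetical names used only for illustration. Where such an index would live in Jaxen, and how it would be invalidated if the document changes, is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Node and DocumentIndex are hypothetical, not part of Jaxen.
public class SequenceNumberDemo {

    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Numbers every node once, in a pre-order walk (document order),
    // so later comparisons are a map lookup instead of a tree walk.
    static final class DocumentIndex {
        private final Map<Node, Integer> sequence = new IdentityHashMap<>();

        DocumentIndex(Node root) {
            number(root);
        }

        private void number(Node n) {
            sequence.put(n, sequence.size()); // root gets 0, the next node 1, and so on
            for (Node child : n.children) {
                number(child);
            }
        }

        int sequenceOf(Node n) {
            Integer s = sequence.get(n);
            if (s == null) {
                throw new IllegalArgumentException("node is not in the indexed document: " + n.label);
            }
            return s;
        }

        List<Node> inDocumentOrder(List<Node> nodes) {
            List<Node> sorted = new ArrayList<>(nodes);
            sorted.sort(Comparator.comparingInt(this::sequenceOf));
            return sorted;
        }
    }

    public static List<String> demo() {
        Node x1 = new Node("x1");
        Node x2 = new Node("x2");
        Node x3 = new Node("x3");
        Node x4 = new Node("x4");
        Node doc = new Node("#document", new Node("a", x1, new Node("b", x2, x3), x4));
        DocumentIndex index = new DocumentIndex(doc);
        List<String> labels = new ArrayList<>();
        for (Node n : index.inDocumentOrder(Arrays.asList(x1, x4, x2, x3))) {
            labels.add(n.label);
        }
        return labels;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [x1, x2, x3, x4]
    }
}
```

Compared with walking paths on every comparison, this trades one full traversal and a map entry per node for constant-time comparisons during the sort.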
A test to illustrate this:
<document url="xml/moreover.xml">
<context select="/">
<valueOf select="(//*)[5]">StockAccess</valueOf>
</context>
</document>