JAXEN-215

NodeComparator does not return correct results for sibling Attributes/Namespaces

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.3
    • Fix Version/s: 1.1.4
    • Component/s: core
    • Labels:
      None
    • Environment:
      all
    • Testcase included:
      yes
    • Number of attachments :
      2

      Description

      The NodeComparator.compare() method has two faults that show up when a predicate function returns multiple sibling Attribute (or Namespace) nodes.
      The first issue is that compare(Object o1,Object o2) should return 0 when o1 == o2. This should be default behaviour as during the sorting process it is technically legal for the comparator to be called that way.

      The bigger issue is that if o1 and o2 are both siblings, and both either Attributes, or Namespaces, then the code 'falls through', and uses the compare() result of the parent (Element) to test for the document order.

      Unfortunately, they have the same parent (and the same depth), and, as a result, they always compare() as value 1. This always puts the o1 sibling after the o2 sibling.

      Further, this breaks the Comparator 'contract', because:
      1 == compare(SiblingA, SiblingB)
      also:
      1 == compare(SiblingB, SiblingA)
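
The contract violation described above can be checked mechanically. The following is a minimal sketch, not Jaxen code: `ALWAYS_ONE` is a hypothetical stand-in for the faulty sibling comparison, and `isAntisymmetric` is an illustrative helper that tests the `sgn(compare(a, b)) == -sgn(compare(b, a))` rule from the `java.util.Comparator` documentation.

```java
import java.util.Comparator;

final class ComparatorContractCheck {
    /** Hypothetical stand-in for the reported fault: every pair of siblings compares as 1. */
    static final Comparator<String> ALWAYS_ONE = (a, b) -> 1;

    /** True when sgn(compare(a, b)) == -sgn(compare(b, a)), as the Comparator contract requires. */
    static <T> boolean isAntisymmetric(Comparator<T> c, T a, T b) {
        return Integer.signum(c.compare(a, b)) == -Integer.signum(c.compare(b, a));
    }

    public static void main(String[] args) {
        // The always-1 comparator claims A > B and B > A at the same time.
        System.out.println(isAntisymmetric(ALWAYS_ONE, "siblingA", "siblingB")); // false
        // A well-behaved comparator passes the same check.
        System.out.println(isAntisymmetric(Comparator.<String>naturalOrder(), "siblingA", "siblingB")); // true
    }
}
```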

      Attached are two files, the first is a test case showing the problem.

      The second is a NodeComparator.java class with two changes to the current version in subversion:

      1. it immediately returns 0 for o1 == o2
      2. it does a special sibling-compare for sibling Attributes/Namespaces

      The motivation for '2' is that the XPath specification does allow a preceding/following axis for namespaces/attributes (likely because DOM has no Attribute order assumption), but, almost all implementations have a 'natural' order for Attributes and Namespaces, and the axis-order must be consistent for any evaluation anyways.

      This change provides consistency and reliability in the NodeComparator class.
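
As a rough illustration of the two changes, here is a sketch of a comparator for nodes that share a parent. It is not the attached NodeComparator.java: the name `bySiblingPosition` and the list argument are assumptions, with the list standing in for whatever sequence the underlying model exposes for one element's attributes or namespaces.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SiblingOrderSketch {
    /**
     * Hypothetical comparator for nodes with a common parent. The list represents
     * the order in which the model hands out that parent's attributes (or namespaces).
     */
    static <T> Comparator<T> bySiblingPosition(List<T> siblingsInIterationOrder) {
        return (o1, o2) -> {
            if (o1 == o2) {
                return 0; // change 1: the same node always compares as equal
            }
            // change 2: the sibling that appears earlier in iteration order sorts first
            return Integer.compare(indexOfIdentity(siblingsInIterationOrder, o1),
                                   indexOfIdentity(siblingsInIterationOrder, o2));
        };
    }

    /** Position of the node by reference identity, or -1 if it is not in the list. */
    static <T> int indexOfIdentity(List<T> list, T node) {
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) == node) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> attrs = Arrays.asList("one", "two", "three", "four");
        Comparator<String> c = bySiblingPosition(attrs);
        System.out.println(c.compare(attrs.get(0), attrs.get(0))); // 0
        System.out.println(c.compare(attrs.get(0), attrs.get(2))); // -1
        System.out.println(c.compare(attrs.get(2), attrs.get(0))); // 1
    }
}
```

With a comparator of this shape, compare(A, B) and compare(B, A) have opposite signs, so the sort result no longer depends on which sibling happens to be passed first.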

        Activity

        Rolf JDom added a comment -

        error in description...: The text
        "The motivation for '2' is that the XPath specification does allow a ..."
        should in fact read:
        "The motivation for '2' is that the XPath specification does NOT allow a ..."

        Elliotte Rusty Harold added a comment -

        Attribute (and namespace) nodes do not have any order. Thus there's no guarantee what order they come out in when selected by XPath. There's no guaranteed order on either the namespace or attribute axis. Thus the test you propose is going beyond the XPath 1.0 specification.

        There may be an issue here, but I'm not yet convinced. (A different test case that demonstrated a clear violation of the XPath 1.0 spec would convince me.) NodeComparator itself is an internal, implementation detail. I wouldn't rely on it.

        Rolf JDom added a comment -

        I originally described the issue as two-fold, the first issue being a 'broken' compareTo operation. But I think this first issue is secondary to the second, larger issue.

        Thus, it's pointless discussing the first issue unless there is some 'agreement' on the second issue.

        I acknowledge XML does not specify/dictate a specific order for Attribute and Namespace nodes. Specifically, the XML specification says 'Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.'

        The heart of this issue, though, is that while there is no 'specified' order to the attributes in the Element, there is still a 'list' of attributes. When accessing the attributes from an XML representation (whether SAX, DOM, JDOM, whatever), you always have to access these attributes somehow, and in each of these mechanisms there is a sequential nature to the attributes: you access them by index, or you access them using an iterator, or as a list, or something. The mechanism is not important. What is important is that, for some XML implementation 'X', you access the Attributes using some scheme 'Y', and that scheme implies some sequence to the Attributes. The actual sequence is not specified by the XML specification, but by the underlying implementation 'X'.

        XPath has its own specification, and it dictates that nodes should be returned in 'Document Order'.

        Again, while there is no formal specification of what the sequence should be, the underlying implementation 'X' exposes the Attributes in some sequence. It is this sequence that determines what the 'Document Order' of the Attributes is.

        The 'heart' of this issue is that the XPath expression for the attributes of an Element Node should return the Attribute nodes in the same order as the underlying implementation would.

        But the Jaxen implementation deliberately re-orders the attribute nodes into some sequence that is completely 'arbitrary'. Not only does it not enforce document order, it does even worse: it re-orders the nodes into some format that is guaranteed to not be 'document order'.

        The test case I attached is 'simple', it happens to use DOM as an 'implementation', but it proves that the first attribute returned by the 'get all attributes for an element' XPath expression is not the same as 'Attribute 0' from the DOM element (i.e. XPath result is not in 'Document Order')

        So, while there is no specification for the order of attributes, Jaxen denies the underlying 'Document Order' and replaces it with 'something'.

        If this issue is recognized then it becomes a case of determining how Jaxen is 'broken' in this respect, and the 'right' way to fix it. To that end I have identified a specific bug in the NodeComparator, and added an extension for ensuring the 'Document Order' of Attribute and Namespace declarations. These actual implementations are secondary to the desired result of 'correcting' the document-order 'shuffle'.

        Rolf JDom added a comment -

        As an additional/alternate way of looking at this problem:

        it should always be true for a particular XPath expression "@*" that the XPath.selectSingleNode(element) should return (assuming 'element' has some attributes):

        • in DOM-based Jaxen - element.getAttributes().item(0)
        • in JDOM-based Jaxen - element.getAttribute(0)
        • in Dom4J-based Jaxen - element.attribute(0)
        • in XOM-based Jaxen - element.getAttribute(0)
          ....

        etc.

        Rolf

        Elliotte Rusty Harold added a comment -

        The XPath 1.0 spec is explicit on this point: "The namespace nodes are defined to occur before the attribute nodes. The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent."

        I'm not willing to commit to more than that, and I'd be very hesitant to write code that depends on a specific order of attribute nodes. I certainly don't want to check in a test case that says, "This is the order of the attribute nodes, and any other order is a failure."

        XPath.selectSingleNode(element) is somewhat flaky and ill-defined here for precisely this reason. In general I'd try to avoid it.

        Rolf JDom added a comment -

        I'm trying to think of the most persuasive arguments I can, and to summarize them....
        I see four types of arguments. The first is based on 'specifications', and the second on 'common sense'.
        The third is to challenge you to defend your position that it's doing 'the right thing' currently. Finally, there are some statements/observations that need to be clarified/reconsidered:

        Specification Argument
        ======================
        I see the most compelling 'specifications' argument as:

        1. XML does not specify a particular relative order for Attributes or Namespaces.
        2. XPath determines that nodes should be returned in 'document order', and that Namespaces should come before Attributes. XPath does not mandate a particular relative Attribute or Namespace order.
        3. Jaxen 'specifies' that the 'Navigator' interface is used to 'navigate' the document... implying that (and this is the 'gray area') the 'Navigator' determines 'Document Order'
        4. The 'Navigator' interface specifies two methods to return a Namespace and Attribute 'axis' respectively, which 'represent' the 'Document Order' of the Namespace or Attribute axis respectively.

        On a fundamental level, I believe the bug is that Jaxen does not return the attribute:: or namespace:: axes in the same order as the respective Navigator.getAttributeAxisIterator() or getNamespaceAxisIterator() iterators.

        Common Sense Arguments
        ======================
        As for the 'common sense' arguments, I can list 'disadvantages' and 'advantages' of each alternative:

        1. jaxen 'unilaterally' removes any (potentially) implied order
           • jaxen does this without any 'specified' or 'documented' authority.
             There is nothing saying it 'must' do this.
           • The Jaxen results are not in the same order as the Navigator.
           • there is no advantage to doing this, i.e. there is no reason why
             Jaxen's current mechanism is better than the alternative, other
             than 'it has always done that'. It is not (significantly) 'faster',
             'better', 'more logical', 'predictable'.
        2. new imposed order is non-deterministic, even if the implied Navigator
           order was deterministic
           • XPath has relative 'predicate' based expressions. These are
             impossible to be 'deterministic' with Jaxen. e.g. The XPath
             specification allows the expression "@*[5]", but in the current
             XPath that is non-deterministic.
        3. The attribute and namespace axes are 'different' to the other axes.
        4. Jaxen is 'advertised' as being able to be applied to architectures
           other than the plain XML model (Java ByteCode, Beans, etc.), but
           that 'extension' may be impossible.
        5. Finally, 'fixing' this issue carries no 'cost'. There is no
           'downside' other than the actual effort of making the change, and
           releasing the revised code. Specifically:
           • no-one can predict/rely on any particular order of the data
             anyway, so it's not like anyone can notice a change
           • fixing the order will not 'break' any specification.
        The "Current State" argument
        ============================

        Challenging the current implementation is relatively easy:
        1. Why does Jaxen ignore the Navigator's iteration order
        2. Why does Jaxen use the current scheme to impose some 'arbitrary' and 'random' order

        Clarifications
        ==============
        (playing devil's advocate)
        You say: Attribute (and namespace) nodes do not have any order. Thus there's no guarantee what order they come out in when selected by XPath. There's no guaranteed order on either the namespace or attribute axis.

        I say: they do have an order, just not a 'specified' order in the XML specification. The XPath specification does not impose any order on them either. There is no documentation in Jaxen indicating that it changes the Navigator's implied order, so, thus, the implication is (it makes sense) that the Navigator's axis order is the order to expect results.
        I say: that it is a big leap to go from "the specification does not require any particular order" to "we can re-order the data at our whim to anything we want"...

        You say: I'm not willing to commit to more than that, and I'd be very hesitant to write code that depends on a specific order of attribute nodes. I certainly don't want to check in a test case that says, "This is the order of the attribute nodes, and any other order is a failure."

        I suggest the correct response to that would be: That's missing the core issue. The issue is not to make Jaxen 'impose' the order for the attribute nodes, but for jaxen to 'use' the order given to it by the Navigator. For example, if the Navigator says that the fifth Attribute is 'X', then "@*[5]" == 'X'

        You say: XPath.selectSingleNode(element) is somewhat flaky and ill-defined here for precisely this reason. In general I'd try to avoid it.

        I suggest: Using the Navigator's implied order removes the flakiness from selectSingleNode()! That's a good thing!
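
To make the "@*[5]" point concrete, here is a small sketch of what "use the order given by the Navigator" would mean for a positional predicate. Nothing here is Jaxen API: `nth` is a hypothetical helper, and the plain `Iterator` stands in for the one a Navigator's attribute axis would supply.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PositionalPredicateSketch {
    /**
     * Returns the item at 1-based position n of an axis iterator,
     * or null when the axis has fewer than n items (or n is less than 1).
     */
    static <T> T nth(Iterator<T> axis, int n) {
        T current = null;
        for (int i = 0; i < n; i++) {
            if (!axis.hasNext()) {
                return null;
            }
            current = axis.next();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the attribute axis of one element, in the model's own order.
        List<String> attributeAxis = Arrays.asList("a", "b", "c", "d", "X");
        // If the axis says the fifth attribute is 'X', a positional predicate [5] should pick 'X'.
        System.out.println(nth(attributeAxis.iterator(), 5)); // X
    }
}
```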

        Rolf JDom added a comment -

        Another issue/symptom of this problem has come up: Java6 and Java7 produce different results. As a result I have done some more detailed analysis.

        First though, the analysis. In Jaxen expressions are evaluated and (unique) matching nodes are stored (in navigator iterator order) on the 'interim' nodelist. Once the full expression is completed (on all context nodes) then the interim nodelist is sometimes sorted using the NodeComparator.

        In Java6 the sort mechanism used is a 'Merge' sort. The actual implementation compares a node against the previous node, and if the previous node scores higher, it swaps them.

        The current NodeComparator happens to 'compare' all sibling attributes as '1', which means that whenever the Merge sort in Java6 compares sibling attributes it swaps them. I have not fully analyzed the process, but I think the attribute that was returned first on the Navigator's attribute axis will always end up last in the sequence of sibling Attributes in the results. I have not fully analyzed what happens with the other nodes either, but the sort seems to 'reverse' the order.

        In Java7 though, the sort is a completely different mechanism (described as a TimSort - http://en.wikipedia.org/wiki/Timsort ). There is no easy way to analyze the results of this sort with the current NodeComparator because it does different things with different sizes of data.

        The bottom line is that selectNodes() for Java6 returns different results when compared to the same code running on Java7.

        The following example should illustrate the point (note that the second XPath only selects nodes from one context, so the sort never happens). Note that if the NodeComparator were 'fixed' to sort the sibling nodes in sibling order, this would not be a problem.

        Here's a JDOM example:

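        // Imports assume JDOM 1.x package names (org.jdom); JDOM 2 uses org.jdom2.
        import java.util.List;

        import org.jdom.Document;
        import org.jdom.Element;
        import org.jdom.JDOMException;
        import org.jdom.xpath.XPath;

        // The class name is illustrative only.
        public class AttributeOrderDemo {
            public static void main(String[] args) throws JDOMException {
                Element root = new Element("root");
                root.setAttribute("one", "one");
                root.setAttribute("two", "two");
                root.setAttribute("three", "three");
                root.setAttribute("four", "four");
                Document doc = new Document(root);
                List<?> nodes;
                XPath xp;

                // Evaluated from the document: the result list goes through the NodeComparator sort.
                System.out.println("All Attributes");
                xp = XPath.newInstance("//attribute::*");
                nodes = xp.selectNodes(doc);
                for (Object o : nodes) {
                    System.out.println(o);
                }

                // Evaluated from a single context node: the sort never happens.
                System.out.println("Root Attributes");
                xp = XPath.newInstance("attribute::*");
                nodes = xp.selectNodes(root);
                for (Object o : nodes) {
                    System.out.println(o);
                }
            }
        }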

        The results of this are (on Java6):
        All Attributes
        [Attribute: four="four"]
        [Attribute: three="three"]
        [Attribute: two="two"]
        [Attribute: one="one"]
        Root Attributes
        [Attribute: one="one"]
        [Attribute: two="two"]
        [Attribute: three="three"]
        [Attribute: four="four"]

        The results of this are (on Java7):
        All Attributes
        [Attribute: one="one"]
        [Attribute: two="two"]
        [Attribute: three="three"]
        [Attribute: four="four"]
        Root Attributes
        [Attribute: one="one"]
        [Attribute: two="two"]
        [Attribute: three="three"]
        [Attribute: four="four"]
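
The reversal seen on Java6 can be reproduced without Jaxen or JDOM. The sketch below is an assumption-laden model rather than the JDK source: `insertionSort` is a hand-written insertion sort of the kind the older merge sort applies to short ranges, and `ALWAYS_ONE` is a hypothetical stand-in for the NodeComparator behaviour on sibling attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class LegacySortDemo {
    /** Hypothetical stand-in: every pair of sibling attributes compares as 1. */
    static final Comparator<String> ALWAYS_ONE = (a, b) -> 1;

    /** Insertion sort: move each element backwards while its predecessor compares greater. */
    static <T> List<T> insertionSort(List<T> input, Comparator<? super T> c) {
        List<T> out = new ArrayList<>(input);
        for (int i = 1; i < out.size(); i++) {
            for (int j = i; j > 0 && c.compare(out.get(j - 1), out.get(j)) > 0; j--) {
                T tmp = out.get(j);
                out.set(j, out.get(j - 1));
                out.set(j - 1, tmp);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> axisOrder = Arrays.asList("one", "two", "three", "four");
        // Every comparison reports "predecessor is greater", so each element
        // is swapped all the way to the front and the list comes out reversed.
        System.out.println(insertionSort(axisOrder, ALWAYS_ONE)); // [four, three, two, one]
    }
}
```

A sort that asks different questions of the same comparator can produce a different order, which is consistent with the Java7 output above.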

        Rolf JDom added a comment -

        Getting back to this issue, I have found the relevant parts of the XPath spec which confirm this is in fact a Jaxen bug.

        http://www.w3.org/TR/xpath/#data-model

        The above item in the XPath specification deals with the XPath Document Model. In particular, it says:

        > This section describes how XPath models an XML document as a tree. This model is conceptual only and does not mandate any particular implementation.

        From a Jaxen/JDOM perspective it is fair to say that JDOM is the 'implementation' of the XML model.

        In the same Data Model section of the XPath spec it then describes what 'document order' is. http://www.w3.org/TR/xpath/#dt-document-order
        In particular, it says:

        > The namespace nodes are defined to occur before the attribute nodes. The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent.

        JDOM interfaces with Jaxen using a Navigator, which returns the Namespace and Attribute nodes in the JDOM model's 'Document-Order'.

        This particular JAXEN-215 bug is because the Jaxen code ignores this implementation-dependent order and imposes some other, arbitrary order.

        Elliotte Rusty Harold added a comment -

        The implementation dependent order is jaxen's order. If JDOM wants some other order, then it's free to reorder nodes after it gets them back from Jaxen.

        Rolf JDom added a comment -

        > The implementation dependent order is jaxen's order.

        Respectfully, I believe you are wrong on this point. Jaxen is not the 'owner' of the 'Data Model'. Jaxen (by design) 'delegates' ownership of the XPath 'Data Model' to the 'Navigator'. The 'Navigator' is the 'implementation' of the Data Model, and thus the Navigator is, from a Jaxen perspective, the authority on what the document model is, and by extension on what the 'document order' is.

        To illustrate the 'bug' using 'Proof by Contradiction' logic (using the attribute axis):

        Assume that Jaxen 'owns' the 'document order' of Attributes.

        From the Navigator documentation: getAttributeAxisIterator() returns "an Iterator capable of traversing the axis, not null"

        To 'traverse' the axis it first has to know not only what the axis contains but also what the order of the axis is. Jaxen determines the order. But Jaxen cannot know the order until it has all the Attributes and also only after it has run the XPath expression (because the Jaxen 'order' is currently dependent on the XPath expression).

        This is 'absurd', therefore the assumption must be wrong. Jaxen cannot 'own' the document order and expect the Navigator to return Attributes in document-order.

        Conclusion

        The navigator 'owns' document order (of everything, including Attributes), and it exposes the document order in the sequence returned by the axis iterators. Jaxen ignores this implied order for both the namespace and attribute axes. Hence Jaxen has a bug.
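
As a minimal sketch of what "the axis iterator defines the order" could look like in practice, the comparator below ranks sibling nodes by the position at which an axis iterator returned them. `AxisOrderComparator` and `sortInAxisOrder` are hypothetical names for illustration, not part of Jaxen, and plain strings stand in for attribute nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Jaxen API): orders siblings by their position on an axis. */
final class AxisOrderComparator implements Comparator<Object> {
    // Identity-based lookup, because model nodes usually do not define value equality.
    private final Map<Object, Integer> position = new IdentityHashMap<>();

    /** Records the order in which the axis iterator hands out its nodes. */
    AxisOrderComparator(Iterator<?> axis) {
        int index = 0;
        while (axis.hasNext()) {
            position.put(axis.next(), index++);
        }
    }

    @Override
    public int compare(Object o1, Object o2) {
        if (o1 == o2) {
            return 0; // same node: required by the Comparator contract
        }
        Integer p1 = position.get(o1);
        Integer p2 = position.get(o2);
        if (p1 == null || p2 == null) {
            throw new IllegalArgumentException("node is not on the recorded axis");
        }
        return Integer.compare(p1, p2);
    }

    /** Returns a copy of nodes sorted into the order given by axisOrder. */
    static <T> List<T> sortInAxisOrder(List<T> axisOrder, List<T> nodes) {
        List<T> result = new ArrayList<>(nodes);
        Collections.sort(result, new AxisOrderComparator(axisOrder.iterator()));
        return result;
    }
}
```

With this approach the sorted result of a union simply reproduces whatever order the model exposed, and `compare(a, b)` and `compare(b, a)` always have opposite signs.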

        Rolf

        Rolf JDom added a comment -

        I am trying to figure out the motivation for the way you implemented this fix...

        Just so I understand it correctly: there is no XML-based specification for determining the order of Attribute and Namespace declarations, and similarly there is no XPath specification, yet there is a Jaxen specification that says the Namespace/Attribute axes are provided by the Navigator.

        Yet, from what I can tell, you arbitrarily (against the Jaxen documentation) re-sort all the attributes by the QName, and the namespaces by prefix? Why are those orders better than the actual axis order?

        Before this fix, the Attribute/Namespace orders would only be 'wrong' (different to the attribute/namespace axes) when there was a 'union' XPath operator, but now the attribute/namespace results are often wrong because they are always re-sorted alphabetically!

        I am happy to help/work on making a solution work that makes sense... and if my 'fix' for #221 was not complete (because DOM navigator returns different values each time you access its Namespace axis....) then I can work to fix that. But this issue is not fixed, it is more broken than before!

        Elliotte Rusty Harold added a comment -

        I had to pick something that would give us a predictable sort order to make Java 7 work, so I picked the simplest thing that could possibly work. (Well, not quite the simplest. Three other things that seemed simpler failed. This was my fourth attempt to fix Jaxen-221.) Beyond that, this approach satisfies the XPath 1.0 spec to my satisfaction. If you can contrive an XPath expression for which this generates demonstrably incorrect output, as Abhay Kumar Yadav did in Jaxen-221, then I'll try to fix it.

        But remember that the XPath specification does not define any order for attribute and namespace nodes (though I notice now it does require "The namespace nodes are defined to occur before the attribute nodes." I should check to see that we've got that right.)

        The deep issue here is that XML parsers are not required to, and in practice don't, generate attribute and namespace nodes in any particular order. Indeed you can parse the same document several times and get the attributes in different orders each time. Indeed in some APIs, you can parse a document once, read the attributes several times, and get them in different orders each time. Attributes and namespaces are fundamentally unordered. Any ordering jaxen provides is a convenience for navigational purposes. It may change from one release to the next; indeed from one run to the next; and client code should not rely on it.
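
For context on why Java 7 matters here: since Java 7 the JDK's object sort may throw an IllegalArgumentException when a comparator violates its contract, and the issue description reports that compare(SiblingA, SiblingB) and compare(SiblingB, SiblingA) both returned 1. The following is a small sketch of how such a violation could be detected. `ComparatorContractCheck` and `ALWAYS_AFTER` are hypothetical names for illustration and are not Jaxen code; `ALWAYS_AFTER` only mimics the reported fault.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical check (not Jaxen API) for the sign-symmetry part of the Comparator contract. */
final class ComparatorContractCheck {

    /**
     * Returns true if sgn(compare(a, b)) == -sgn(compare(b, a)) for every pair of items.
     * The case a == b is included, so this also verifies that compare(a, a) is 0.
     */
    static <T> boolean isAntisymmetric(Comparator<? super T> comparator, List<T> items) {
        for (T a : items) {
            for (T b : items) {
                int ab = Integer.signum(comparator.compare(a, b));
                int ba = Integer.signum(comparator.compare(b, a));
                if (ab != -ba) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Mimics the reported fault: every pair of siblings compares as 1. */
    static final Comparator<String> ALWAYS_AFTER = new Comparator<String>() {
        @Override
        public int compare(String o1, String o2) {
            return 1;
        }
    };
}
```

Any deterministic total order passes this check, whether it is alphabetical or taken from the axis iterator; the check says nothing about which of those orders is the right one.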

        Rolf JDom added a comment -

        Hi Elliotte.

        It would be great to have a 'clean' discussion about this... the comments on this issue are cumbersome, and the mailing list for Jaxen appears to have been unused for half-a-decade...

        But, in my head, the issue is 'clear', yet I don't seem to be able to communicate it effectively.... and I think you have the same 'clarity' in your head... which is fine, but hear me out...

        ... and I think I will start by asking you to consider three different specific points, in order:

        1. I agree, XML spec does not define a particular order for Attributes, in fact, it says: "Note that the order of attribute specifications in a start-tag or empty-element tag is not significant." The XML InfoSet specification also says the order of Attributes is not defined to be part of the InfoSet.

        2. XPath specification does define that a document order exists for Attributes, and that the order is 'implementation specific'. Section 5 defines the 'Document Order' including: "The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent." One very important question as it relates to this bug (and I will expand on this later) is "what is the 'implementation'?" Further, the order of these values, the Namespaces and Attributes, is 'exposed' in the XPath specification as the attribute:: and namespace:: axes. The XPath spec defines all axes to be either in document-order, or reverse-document-order. There is no explicit exception for attribute:: or namespace::. Further, the document order is not allowed to change during the evaluation of a query. The implication is that: a) there is a document order; b) it is implementation-dependent for attribute:: and namespace::; and c) it is not allowed to change during the evaluation of an expression.

        3. The Jaxen 'specification' defines an abstraction layer between the XPath 'engine' and the document 'model'. The engine can work on any model that is defined using the Jaxen 'Navigator'. The documentation for 'Navigator' says: "There is a method to obtain a java.util.Iterator, for each axis specified by XPath."

        The only way to make Jaxen work in all cases is for the Navigator implementation to be consistent with all three specifications... XML, XPath, and Jaxen.

        Now, back to the question of 'implementation-dependent'. Who 'owns' the implementation? Jaxen's documentation is not as 'rigorous' or 'formal' as a W3C specification, but there is a clear implication that "Jaxen is an engine", that there is a document model (with implementations for DOM, XOM, etc.), and that the Navigator defines the 'interface' between the engine and the model.

        The issue at hand (JAXEN 215) is really about who owns the 'document order'. In my head it is clear that the document order is 'owned' by the document model, and is 'exposed' by the Navigator.

        In your head it seems that the document order is owned by the document model for everything except the attribute and the namespace axes, where Jaxen unilaterally redefines the order to be something else.

        There is no specification that makes Jaxen reorder those axes. On the contrary, Jaxen 'clearly' delegates the order of the axes to the Navigator.

        So, the symptom of this contradiction between the Navigator concept and the Jaxen reordering is that things like the following, which should 'obviously' hold, break (using an XPath 'pseudocode'):

        (using DOM):
        element.getAttributes().item(0) == XPath.evaluate("@*[1]", element)

        (using JDOM):
        element.getAttributes().get(0) ....
        element.getNamespacesInScope().get(0) == XPath.evaluate("namespace::*[1]", element);

        and so on....

        Other issues are things like new artificial differences between:
        XPath.selectSingleNode("@*") and XPath.selectNodes("@*[1]")
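
The difference described above can be sketched with a toy model. It assumes, as this thread describes, that a positional predicate such as [1] is applied in axis order while the final result is re-sorted by name. The code is not Jaxen code: `ResortSymptom` and its methods are hypothetical, and plain strings stand in for attribute nodes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Jaxen API) of axis order versus a name-based re-sort. */
final class ResortSymptom {

    /** Stand-in for "@*[1]": the positional predicate picks the first node in axis order. */
    static String firstOnAxis(List<String> attributeAxis) {
        return attributeAxis.get(0);
    }

    /** Stand-in for taking the first node of "@*" after the result was re-sorted by name. */
    static String firstAfterNameSort(List<String> attributeAxis) {
        List<String> resorted = new ArrayList<>(attributeAxis);
        Collections.sort(resorted); // alphabetical order replaces the axis order
        return resorted.get(0);
    }
}
```

For an axis that yields its attributes as z, m, a, the first method answers z and the second answers a, which is the kind of mismatch between the model's own first attribute and the engine's first attribute that the pseudocode comparisons point at.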

        I want to point out a special 'marketing' angle for Jaxen that the engine can be applied to non-XML models too - as long as they can expose the implementation in a way that conforms to the Navigator (this is something that I have done with Jaxen a couple of times already.... -apart from JDOM - by the way).

        So, apart from pointing out the incongruence you have introduced by 'stealing' ownership of 'document order', I guess I can put together a demonstration of it.... which I have already done a number of times in this discussion, but perhaps I will do it with something other than JDOM....

        Oh, by the way, it was me who put together the test case for #221.....

        Rolf

        Rolf JDom added a comment -

        I have attached a different version of the NodeComparator to JAXEN-223 to resolve the content-before-attribute problem. That NodeComparator also changes the sort-order of Attributes/Namespaces to match the Navigator's order. This would also fully 'resolve' this particular issue.


          People

          • Assignee:
            Elliotte Rusty Harold
            Reporter:
            Rolf JDom
          • Votes:
            0
            Watchers:
            1
