jaxen
  1. jaxen
  2. JAXEN-215

NodeComparator does not return correct results for sibling Attributes/Namespaces

    Details

    • Type: Bug Bug
    • Status: Resolved Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.3
    • Fix Version/s: 1.1.4
    • Component/s: core
    • Labels:
      None
    • Environment:
      all
    • Testcase included:
      yes
    • Number of attachments :
      2

      Description

      The NodeComparator.compare() method has two faults when taken from the perspective where a predicate function returns multiple Attribute ( or Namespace) siblings.
      The first issue is that compare(Object o1,Object o2) should return 0 when o1 == o2. This should be default behaviour as during the sorting process it is technically legal for the comparator to be called that way.

      The bigger issue is that if o1 and o2 are both siblings, and both either Attributes, or Namespaces, then the code 'falls through', and uses the compare() result of the parent (Element) to test for the document order.

      Unfortunately, they have the same parent (and the same depth), and, as a result, they always compare() as value 1. This always puts the o1 sibling after the o2 sibling.

      Further, this breaks the Comparator 'contract', because:
      1 == compare(SiblingA, SiblingB)
      also:
      1 == compare(SiblingB, SiblingA)

      Attached are two files, the first is a test case showing the problem.

      The second is a NodeComparator .java class with two changes to the current version in subversion:

      1. it immediately returns 0 for o1 == o2
      2. it does a special sibling-compare for sibling Attributes/Namespaces

      The motivation for '2' is that the XPath specification does allow a preceding/following axis for namespaces/attributes (likely because DOM has no Attribute order assumption), but, almost all implementations have a 'natural' order for Attributes and Namespaces, and the axis-order must be consistent for any evaluation anyways.

      This change provides consistency and reliability in the NodeComparator class.

        Activity

        Hide
        Rolf JDom added a comment -

        > The implementation dependent order is jaxen's order.

        Respectfully, I believe you are wrong on this point. Jaxen is not the 'owner' of the 'Data Model'. Jaxen (by design) 'delegates' ownership of the XPath 'Data Model' to the 'Navigator'. The 'Navigator' is the 'implementation' of the Data Model, and thus the Navigator is the authority on (from a Jaxen perspective) of what the document model is, and thus by extension what the 'document order' is.

        To illustrate the 'bug' using 'Proof by Contradiction' logic (using the attributes axis).....

        Assume that Jaxen 'owns' the 'document order' of Attributes.

        From the Navigator documentation: getAttributeAxisIterator() returns "an Iterator capable of traversing the axis, not null"

        To 'traverse' the axis it first has to know not only what the axis contains but also what the order of the axis is. Jaxen determines the order. But Jaxen cannot know the order until it has all the Attributes and also only after it has run the XPath expression (because the Jaxen 'order' is currently dependent on the XPath expression).

        This is 'absurd', therefore the assumption must be wrong. Jaxen cannot 'own' the document order and expect the Navigator to return Attributes in document-order.

        Conclusion

        The navigator 'owns' document order (of everything, including Attributes), and it exposes the document order in the sequence returned by the axis iterators. Jaxen ignores this implied order for both the namespace and attribute axes. Hence Jaxen has a bug.

        Rolf

        Show
        Rolf JDom added a comment - > The implementation dependent order is jaxen's order. Respectfully, I believe you are wrong on this point. Jaxen is not the 'owner' of the 'Data Model'. Jaxen (by design) 'delegates' ownership of the XPath 'Data Model' to the 'Navigator'. The 'Navigator' is the 'implementation' of the Data Model, and thus the Navigator is the authority on (from a Jaxen perspective) of what the document model is, and thus by extension what the 'document order' is. To illustrate the 'bug' using 'Proof by Contradiction' logic (using the attributes axis)..... Assume that Jaxen 'owns' the 'document order' of Attributes. From the Navigator documentation: getAttributeAxisIterator() returns "an Iterator capable of traversing the axis, not null" To 'traverse' the axis it first has to know not only what the axis contains but also what the order of the axis is. Jaxen determines the order. But Jaxen cannot know the order until it has all the Attributes and also only after it has run the XPath expression (because the Jaxen 'order' is currently dependent on the XPath expression). This is 'absurd', therefore the assumption must be wrong. Jaxen cannot 'own' the document order and expect the Navigator to return Attributes in document-order. Conclusion The navigator 'owns' document order (of everything, including Attributes), and it exposes the document order in the sequence returned by the axis iterators. Jaxen ignores this implied order for both the namespace and attribute axes. Hence Jaxen has a bug. Rolf
        Hide
        Rolf JDom added a comment -

        I am trying to figure out the motivation for the way you implemented this fix...

        Just so I understand it correctly, there is no XML-based specification for determining the order of Attribute and Namespace declarations, similarly there is no XPath specification, yet, there is a Jaxen specification that says the Namespaces/Attribute axes are provided by the Navigator....

        Yet, from what I can tell, you arbitrarilily (against the Jaxen documentation) re-sort all the attributes by the QName, and the namespaces by prefix? Why are those orders better than the actual axis order?

        Before this fix, the Attribute/Namespace orders would only be 'wrong' (different to the attribute/namespace axes) when there was a 'union' XPath operator, but now the attribute/namespace results are often wrong because they are always re-sorted alphabetically!

        I am happy to help/work on making a solution work that makes sense... and if my 'fix' for #221 was not complete (because DOM navigator returns different values each time you access it's Namespace axis....) then I can work to fix that. But this issue is not fixed, it is more broken than before!

        Show
        Rolf JDom added a comment - I am trying to figure out the motivation for the way you implemented this fix... Just so I understand it correctly, there is no XML-based specification for determining the order of Attribute and Namespace declarations, similarly there is no XPath specification, yet, there is a Jaxen specification that says the Namespaces/Attribute axes are provided by the Navigator.... Yet, from what I can tell, you arbitrarilily (against the Jaxen documentation) re-sort all the attributes by the QName, and the namespaces by prefix? Why are those orders better than the actual axis order? Before this fix, the Attribute/Namespace orders would only be 'wrong' (different to the attribute/namespace axes) when there was a 'union' XPath operator, but now the attribute/namespace results are often wrong because they are always re-sorted alphabetically! I am happy to help/work on making a solution work that makes sense... and if my 'fix' for #221 was not complete (because DOM navigator returns different values each time you access it's Namespace axis....) then I can work to fix that. But this issue is not fixed, it is more broken than before!
        Hide
        Elliotte Rusty Harold added a comment -

        I had to pick something that would give us a predictable sort order to make Java 7 work, so I picked the simplest thing that could possibly work. (Well, not quite the simplest. Three other things that seemed simpler failed. This was my fourth attempt to fix Jaxen-221.) Beyond that, this approach satisfies the XPath 1.0 spec to my satisfaction. If you can contrive an XPath expression for which this generates demonstrably incorrect output, as Abhay Kumar Yadav did in Jaxen-221, then I'll try to fix it. But remember that the XPath specification does not define any order for attribute and namespace nodes (though I notice now it does require "The namespace nodes are defined to occur before the attribute nodes." I should check to see that we've got that right.) The deep issue here is that XML parsers are not required to, and in practice don't, generate attribute and namespace nodes in any particular order. Indeed you can parse the same document several times and get the attributes in different orders each time. Indeed in some APIs, you can parse a document once, read the attributes several times, and get them in different orders each time. Attributes and namespace are fundamentally unordered. Any ordering jaxen provides is a convenience for navigational purposes. It may change from one release to the next; indeed from one run to the next; and client code should not rely on it.

        Show
        Elliotte Rusty Harold added a comment - I had to pick something that would give us a predictable sort order to make Java 7 work, so I picked the simplest thing that could possibly work. (Well, not quite the simplest. Three other things that seemed simpler failed. This was my fourth attempt to fix Jaxen-221.) Beyond that, this approach satisfies the XPath 1.0 spec to my satisfaction. If you can contrive an XPath expression for which this generates demonstrably incorrect output, as Abhay Kumar Yadav did in Jaxen-221, then I'll try to fix it. But remember that the XPath specification does not define any order for attribute and namespace nodes (though I notice now it does require "The namespace nodes are defined to occur before the attribute nodes." I should check to see that we've got that right.) The deep issue here is that XML parsers are not required to, and in practice don't, generate attribute and namespace nodes in any particular order. Indeed you can parse the same document several times and get the attributes in different orders each time. Indeed in some APIs, you can parse a document once , read the attributes several times, and get them in different orders each time. Attributes and namespace are fundamentally unordered. Any ordering jaxen provides is a convenience for navigational purposes. It may change from one release to the next; indeed from one run to the next; and client code should not rely on it.
        Hide
        Rolf JDom added a comment -

        Hi Elliotte.

        It would be great to have a 'clean' discussion about this... the comments on this issue is cumbersome, and the mailing list for Jaxen appears to have been unused for half-a-decade...

        But, in my head, the issue is 'clearl', yet I don't seem to be able to communicate it effectively.... and I think you have the same 'clarity' in your head... which is fine, but hear me out...

        ... and I think I will start by asking you to consider three different specific points, in order:

        1. I agree, XML spec does not define a particular order for Attributes, in fact, it says: Note that the order of attribute specifications in a start-tag or empty-element tag is not significant. The XML InfoSet specification also says the order of Attributes is not defined to be part of the InfoSet.

        2. XPath specification does define that a document order exists for Attributes, and that the order is 'implementation specific'. - Section 5 defines the 'Document Order' including: "The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent." One very important question as it relates to this bug, (and I will expand on this later) is "what is the 'implementation'"? Further, the order of these values, the Namespaces and Attributes, are 'exposed' in the XPath specification as the attribute:: and namespace:: axes. The XPath spec defines all axes to be either in document-order, or reverse-document-order. There is no explicit exception for attribute:: or namespace::. Further, the document order is not allowed to change during the evaluation of a query. The implication is that: a) there is a document order; b) it is implementation-dependent for attribute:: and namespace:: and c) it is not allowed to change during the evaluation of an expression.

        3. The Jaxen 'specification' defines an abstraction layer between the XPath 'engine' and the document 'model'. The engine can work on any model that is defined using the Jaxen 'Navigator'. The documentation for 'Navigator' says: "There is a method to obtain a java.util.Iterator, for each axis specified by XPath.".

        The only way to make Jaxen work in all cases is for the Navigator implementation to be consistent with all three specifications... XML, XPath, and Jaxen.

        Now, back to the question of 'implementation-dependant'. Who 'owns' the implementation. Jaxen's documentation is not as 'rigourous' or 'formal' as a w3c specification, but, there is a clear implication that "Jaxen is an engine", There is a document model (with implementations for DOM, XOM, etc.), and the Navigator defines the 'interface' between the engine and the model.

        The issue at hand (JAXEN 215) is really about who owns the 'document order'. In my head it is clear that the document order is 'owned' by the document model, and is 'exposed' by the Navigator.

        In your head it seems that the document order is owned by the document model for everything except the attribute and the namespace axes, where Jaxen unilaterally redefines the order to be something else.

        There is no specification that makes Jaxen reorder those axes. On the contrary, Jaxen 'clearly' delegates the order of the axes to the Navigator.

        So, the symptoms of this contradiction between the Navigator concept and the Jaxen reordering is things like the following should be 'obvious', but they break (using an XPath 'pseudocode'):

        (using DOM):
        element.getAttributes().item(0) == XPath.evaluate("@*[1]", element)

        (using JDOM):
        element.getAttributes().get(0) ....
        element.getNamespacesInScope().get(0) == XPath.evaluate("namespace::*[1]", element);

        and so on....

        Other issues are things like new artificial differences between the :
        XPath.selectSingleNode("@") and XPath.selectNodes("@[1]")

        I want to point out a special 'marketing' angle for Jaxen that the engine can be applied to non-XML models too - as long as they can expose the implementation in a way that conforms to the Navigator (this is something that I have done with Jaxen a couple of times already.... -apart from JDOM - by the way).

        So, apart from pointing out the incongruence you have introduced by 'stealing' ownership of 'document order', I guess I can put together a demonstration of it.... which I have already done a number of times in this discussion, but perhaps I will do it with something other than JDOM....

        Oh, by the way, it was me who put together the test case for #221.....

        Rolf

        Show
        Rolf JDom added a comment - Hi Elliotte. It would be great to have a 'clean' discussion about this... the comments on this issue is cumbersome, and the mailing list for Jaxen appears to have been unused for half-a-decade... But, in my head, the issue is 'clearl', yet I don't seem to be able to communicate it effectively.... and I think you have the same 'clarity' in your head... which is fine, but hear me out... ... and I think I will start by asking you to consider three different specific points, in order: 1. I agree, XML spec does not define a particular order for Attributes, in fact, it says: Note that the order of attribute specifications in a start-tag or empty-element tag is not significant. The XML InfoSet specification also says the order of Attributes is not defined to be part of the InfoSet. 2. XPath specification does define that a document order exists for Attributes, and that the order is 'implementation specific'. - Section 5 defines the 'Document Order' including: "The relative order of namespace nodes is implementation-dependent. The relative order of attribute nodes is implementation-dependent." One very important question as it relates to this bug, (and I will expand on this later) is "what is the 'implementation'"? Further, the order of these values, the Namespaces and Attributes, are 'exposed' in the XPath specification as the attribute:: and namespace:: axes. The XPath spec defines all axes to be either in document-order, or reverse-document-order. There is no explicit exception for attribute:: or namespace::. Further, the document order is not allowed to change during the evaluation of a query. The implication is that: a) there is a document order; b) it is implementation-dependent for attribute:: and namespace:: and c) it is not allowed to change during the evaluation of an expression. 3. The Jaxen 'specification' defines an abstraction layer between the XPath 'engine' and the document 'model'. The engine can work on any model that is defined using the Jaxen 'Navigator'. The documentation for 'Navigator' says: "There is a method to obtain a java.util.Iterator, for each axis specified by XPath.". The only way to make Jaxen work in all cases is for the Navigator implementation to be consistent with all three specifications... XML, XPath, and Jaxen. Now, back to the question of 'implementation-dependant'. Who 'owns' the implementation. Jaxen's documentation is not as 'rigourous' or 'formal' as a w3c specification, but, there is a clear implication that "Jaxen is an engine", There is a document model (with implementations for DOM, XOM, etc.), and the Navigator defines the 'interface' between the engine and the model. The issue at hand (JAXEN 215) is really about who owns the 'document order'. In my head it is clear that the document order is 'owned' by the document model, and is 'exposed' by the Navigator. In your head it seems that the document order is owned by the document model for everything except the attribute and the namespace axes, where Jaxen unilaterally redefines the order to be something else. There is no specification that makes Jaxen reorder those axes. On the contrary, Jaxen 'clearly' delegates the order of the axes to the Navigator. So, the symptoms of this contradiction between the Navigator concept and the Jaxen reordering is things like the following should be 'obvious', but they break (using an XPath 'pseudocode'): (using DOM): element.getAttributes().item(0) == XPath.evaluate("@* [1] ", element) (using JDOM): element.getAttributes().get(0) .... element.getNamespacesInScope().get(0) == XPath.evaluate("namespace::* [1] ", element); and so on.... Other issues are things like new artificial differences between the : XPath.selectSingleNode("@ ") and XPath.selectNodes("@ [1] ") I want to point out a special 'marketing' angle for Jaxen that the engine can be applied to non-XML models too - as long as they can expose the implementation in a way that conforms to the Navigator (this is something that I have done with Jaxen a couple of times already.... -apart from JDOM - by the way). So, apart from pointing out the incongruence you have introduced by 'stealing' ownership of 'document order', I guess I can put together a demonstration of it.... which I have already done a number of times in this discussion, but perhaps I will do it with something other than JDOM.... Oh, by the way, it was me who put together the test case for #221..... Rolf
        Hide
        Rolf JDom added a comment -

        I have attached a different version of the NodeComparator to JAXEN-223 to resolve the content-before-attribute problem. That NodeComparator also changes the sort-order of Attributes/Namespaces to match the Navigator's order. This would also fully 'resolve' this particular issue.

        Show
        Rolf JDom added a comment - I have attached a different version of the NodeComparator to JAXEN-223 to resolve the content-before-attribute problem. That NodeComparator also changes the sort-order of Attributes/Namespaces to match the Navigator's order. This would also fully 'resolve' this particular issue.

          People

          • Assignee:
            Elliotte Rusty Harold
            Reporter:
            Rolf JDom
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: