jaxen

XPath support for Jericho HTML parser

Details

  • Type: Improvement
  • Status: Closed
  • Priority: Major
  • Resolution: Won't Fix
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Environment:
    Any
  • Testcase included:
    yes
  • Patch Submitted:
    Yes
  • Number of attachments:
    3

Description

I needed XPath support for Jericho locally, so I wrote custom implementations of Navigator and XPath. I would like to contribute the code to the Jaxen project. I have attached three Java files: DocumentNavigator.java, JerichoXPath.java, and a test, JerichoNavigatorTest.java.

The first two would go into the main directory as siblings of the jdom and dom4j packages, and the last one would go into the test directory.

Additional dependencies are jericho-html-2.5.jar, commons-logging.jar and log4j.jar (although the last two can be removed if you remove/replace the log.warn messages in the catch blocks in DocumentNavigator).

  1. DocumentNavigator.java
    21/Apr/09 2:27 PM
    15 kB
    Sujit Pal
  2. JerichoNavigatorTest.java
    21/Apr/09 2:27 PM
    2 kB
    Sujit Pal
  3. JerichoXPath.java
    21/Apr/09 2:27 PM
    2 kB
    Sujit Pal

Activity

Elliotte Rusty Harold added a comment -

Adding all the navigators directly into Jaxen was a mistake, and one we are hoping to cure in the near future. It causes significant problems for other projects that depend on Jaxen, because they then pull in dependencies on XOM, JDOM, dom4j, and so on that they usually don't need. This is especially true for Maven-based projects. Your proposal would, I'm afraid, make the situation worse by adding further dependencies for everyone who builds Jaxen.

I suspect we'll keep the DOM navigator in the core since it only really depends on the JDK these days, but we'll move the others into new projects.

Once this move is complete, perhaps we could consider hosting your navigator as a subproject of jaxen on Codehaus or some such, but it really shouldn't be bundled with the core.

Sujit Pal added a comment -

Sure, that makes sense. If you want to go that route, would it make sense to have some documentation on how to build an adapter to work with a different object model?

What helped me while building the Jericho adapter was the examples, plus single-stepping through the Jaxen code to see what I needed to override. The extension points are not very well defined, so it's hard to figure out what to override and what to leave alone. A programmatic way to handle this would be to make some of the overridable methods abstract, rather than having them return null or throw UnsupportedOperationException. Alternatively, a one-page writeup explaining which methods need to be overridden would be a great help.

Please let me know if you would like me to open a separate bug for this.
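
To illustrate the trade-off raised in this comment, here is a minimal, self-contained sketch. It does not use the real Jaxen classes: LenientNavigator, StrictNavigator, ToyNode, and the two subclasses are hypothetical names invented for this example, and the method names are simplified stand-ins for navigator methods. It only shows the general difference between a base class whose defaults fail at run time and one whose extension points are abstract.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical base class in the "default that fails late" style:
// a subclass that forgets to override a method still compiles.
class LenientNavigator {
    Iterator<Object> childAxis(Object node) {
        throw new UnsupportedOperationException("childAxis not implemented");
    }

    String elementName(Object node) {
        return null; // silently wrong until someone notices
    }
}

// Hypothetical base class in the "abstract" style:
// the compiler lists every method an adapter must supply.
abstract class StrictNavigator {
    abstract Iterator<Object> childAxis(Object node);

    abstract String elementName(Object node);
}

// Toy object model standing in for a real document tree.
final class ToyNode {
    final String name;
    final List<ToyNode> children;

    ToyNode(String name, List<ToyNode> children) {
        this.name = name;
        this.children = children;
    }
}

// Complete adapter: it would not compile if either method were missing.
class ToyStrictNavigator extends StrictNavigator {
    @Override
    Iterator<Object> childAxis(Object node) {
        return new ArrayList<Object>(((ToyNode) node).children).iterator();
    }

    @Override
    String elementName(Object node) {
        return ((ToyNode) node).name;
    }
}

// Incomplete adapter: compiles, but childAxis fails only when it is called.
class IncompleteLenientNavigator extends LenientNavigator {
    @Override
    String elementName(Object node) {
        return ((ToyNode) node).name;
    }
}
```

With the lenient base class, the gap in IncompleteLenientNavigator surfaces only when childAxis is first called; with the strict one, the equivalent omission is a compile error. The cost is that any existing subclass relying on an inherited default would stop compiling once that method became abstract.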

Elliotte Rusty Harold added a comment -

We could certainly consider that. We'd have to think about whether that's likely to break any existing navigators. That is, are there any navigators that depend on these methods being implemented in the superclass? And of course more documentation is always appreciated.

