jaxen

Pluggable entity resolution support for document()

Details

  • Type: New Feature
  • Status: Open
  • Priority: Major
  • Resolution: Unresolved
  • Affects Version/s: 1.0
  • Fix Version/s: 2.0
  • Component/s: None
  • Labels: None
  • Number of attachments: 0

Description

The behavior of the document() function is currently navigator-dependent and hard-wired. This should be pluggable to allow people to, e.g., retrieve documents from the web, from JNDI, from a database, from a JAR, etc.

Personally, I came across this issue when I wanted to bundle up Jaxen together with a set of unit tests to validate integrity of the packages but couldn't get the document() function to look in the JAR instead of the filesystem.
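To make the JAR use case concrete, here is a minimal sketch of the kind of resolver a pluggable document() could accept. Nothing in it is Jaxen API: the class name ClasspathEntityResolver and the "classpath:" URI prefix are assumptions made up for illustration. Only org.xml.sax.EntityResolver, org.xml.sax.InputSource and ClassLoader.getResourceAsStream() are real.

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver (not part of Jaxen) that serves documents from the classpath, e.g. a JAR. */
class ClasspathEntityResolver implements EntityResolver {
    /** Made-up URI prefix used only by this sketch. */
    static final String PREFIX = "classpath:";

    private final ClassLoader loader;

    ClasspathEntityResolver(ClassLoader loader) {
        this.loader = loader;
    }

    /** Returns a source for "classpath:" URIs that exist, or null to request default handling. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null || !systemId.startsWith(PREFIX)) {
            return null; // not ours
        }
        String path = systemId.substring(PREFIX.length());
        if (path.startsWith("/")) {
            path = path.substring(1); // ClassLoader resource names have no leading slash
        }
        InputStream in = loader.getResourceAsStream(path);
        if (in == null) {
            return null; // resource not found on the classpath
        }
        InputSource source = new InputSource(in);
        source.setSystemId(systemId); // keep the original URI for error messages
        return source;
    }
}
```

With something like this registered, a unit test bundled in the same JAR could ask for document('classpath:/tests/input.xml') without touching the filesystem (the path is illustrative).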

Activity

Paul R. Brown added a comment -

Here is the first email I sent to the list on the subject:

Right now, the getDocument() function looks like:

public Object getDocument(String uri) throws FunctionCallException
{
    try
    {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse( uri );
    }
    catch (Exception e)
    {
        throw new FunctionCallException("Failed to parse doucment for URI: " + uri, e);
    }
}

I would like to suggest the following enhancements:

1) Cache the DocumentBuilderFactory and/or DocumentBuilder once created. (What is the status of a Navigator with respect to threads?)
2) Supply a method to register an EntityResolver instance with the Navigator so that custom resolution is possible.
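As a rough sketch of how the two suggestions could combine for the DOM case: the class below caches the factory and lets a caller register an EntityResolver. The class name ResolvingDocumentLoader and its methods are hypothetical, not existing Jaxen code, and it returns a DOM Document directly rather than wrapping failures in FunctionCallException. It assumes the factory should be guarded by a lock, since the thread-safety question is still open.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical helper (not Jaxen API): cached factory plus a pluggable resolver. */
class ResolvingDocumentLoader {
    // Created once and reused. DocumentBuilderFactory is not documented as
    // thread-safe, so access to it is synchronized below.
    private final DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

    // null means "use the parser's default resolution".
    private volatile EntityResolver resolver;

    ResolvingDocumentLoader() {
        factory.setNamespaceAware(true);
    }

    void setEntityResolver(EntityResolver resolver) {
        this.resolver = resolver;
    }

    Document getDocument(String uri) throws Exception {
        DocumentBuilder builder;
        synchronized (factory) {
            builder = factory.newDocumentBuilder();
        }
        EntityResolver current = resolver;
        // Also used by the parser for DTDs and other external entities.
        builder.setEntityResolver(current);

        // The parser does not consult the resolver for the top-level URI,
        // so ask it explicitly and fall back to the plain URI.
        InputSource source = (current != null) ? current.resolveEntity(null, uri) : null;
        if (source == null) {
            source = new InputSource(uri);
        }
        return builder.parse(source);
    }
}
```

A fresh DocumentBuilder is created per call because a single builder instance cannot be shared safely between concurrent parses; caching builders per thread would be a further optimization.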

I'll implement them and submit some code if there is agreement on what to do across the different navigators.

Thanks in advance.

Paul R. Brown added a comment -

And here is the most recent mail I sent to the list:

A few months back, I talked with James Strachan about enhancing Jaxen so that invocations of the document() function would use custom URI resolvers.

I reached the point of scoping out the required changes, and I'd like to solicit feedback before making and committing the changes. Here's a list of what might be done:

The document function currently invokes the getDocument() method on the relevant Navigator; getDocument() accepts a URI and returns an Object. The current implementations are:

  • DOM: create new DocumentBuilderFactory, create new DocumentBuilder, call parse on String URI.
  • DOM4J: invoke the read() method on a SAXReader
  • Electric XML: pass a File to the Document() constructor
  • JDOM: pass String URI to the build() method on a SAXBuilder.

In the cases of DOM, JDOM, and DOM4J, the relevant methods all accept org.xml.sax.InputSource, so a switch to using an EntityResolver is straightforward in each case. For Electric XML, the InputStream or Reader would have to be unbundled from the InputSource. That raises a question: is a dependency of Jaxen on SAX a desirable thing? EXML doesn't include SAX, but use of InputSource would require it. (I would say that a SAX dependency is not a bad thing.)

The same question arises in the context of using an EntityResolver, i.e., EXML doesn't presuppose the presence of SAX classes. (In this case, Jaxen would also need to supply a default entity resolution behavior, i.e., use of java.net.URL.)
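For the Electric XML case, the unbundling step and the java.net.URL fallback might look like the sketch below. The class and method names are hypothetical, and the EXML constructor call is deliberately left out because its exact signature is not given here. Only the org.xml.sax.InputSource accessors and the java.net classes are real.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import org.xml.sax.InputSource;

/** Hypothetical helper (not Jaxen API) for parsers that cannot accept an InputSource. */
final class InputSourceUnbundler {
    private InputSourceUnbundler() {
    }

    /**
     * Returns a byte stream for the source: the stream it already carries, or
     * one opened from its system ID through java.net.URL as the default
     * resolution behavior. Returns null if the source has neither. Callers
     * should check source.getCharacterStream() first and prefer that Reader
     * when it is set.
     */
    static InputStream byteStreamOf(InputSource source) throws IOException {
        InputStream in = source.getByteStream();
        if (in != null) {
            return in;
        }
        String systemId = source.getSystemId();
        if (systemId == null) {
            return null;
        }
        // Assumes an absolute URI; a relative one fails with IllegalArgumentException.
        return URI.create(systemId).toURL().openStream();
    }
}
```

Handing the parser a byte stream rather than a Reader lets it honor the encoding named in the XML declaration.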

So, any input from the primary architects?



Dates

  • Created:
    Updated:

Time Tracking

Estimated: 3d
Remaining: 3d
Logged: Not Specified