Maven / MNG-2255

Use an external XML pull parser instead of the Plexus one

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0.4
    • Fix Version/s: 3.0-alpha-1
    • Component/s: POM
    • Labels: None
    • Complexity: Intermediate
    • Number of attachments: 0

      Description

      To avoid maintaining the Plexus XMLPullParser, we should move to a standard implementation, such as a StAX parser.

      As mentioned in DOXIA-60:

      There's a performance white paper: http://java.sun.com/performance/reference/whitepapers/StAX-1_0.pdf

      The best implementation seems to be http://woodstox.codehaus.org/

      There is another nice set of utilities at http://wiki.java.net/bin/view/Javawsxml/StaxUtilsProject

      Its indenter could be used instead of the PrettyPrintXMLWriter.

          Activity

          Carlos Sanchez added a comment -

          Judging by the related issues, people are having a lot of trouble with encodings.

          Stefan Hübner added a comment -

          Hm, just a thought, but:

          To me it seems that the parser used to read Maven's model isn't really the problem. More likely it's the way the POM is handed to the parser.

          Have a look at the class org.apache.maven.model.io.xpp3.MavenXpp3Reader, which is generated by maven-model. This MavenXpp3Reader is invoked via its read method with one parameter. This parameter is a Reader, and that, to me, is problematic: once you have a Reader on some input source (i.e. a stream), there's no way to change how that Reader decodes characters from its underlying stream. So, as long as the model is read by handing a Reader to some parser, the parser won't be able to adapt to the POM's actual encoding.

          In conclusion, I would recommend that the model parser be handed a stream rather than a Reader instance working on the underlying POM.

          I don't know if other parsers can cope with that situation, but XMLPullParser can't. It could, though, if it were given a stream on the POM, but I don't know what wider implications this would have.

          What do you think?
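The point about Readers can be illustrated with a minimal sketch that uses only the JDK. It does not use MavenXpp3Reader or any Maven code; the class ReaderEncodingDemo and its helper methods are hypothetical names chosen for this illustration. It assumes a POM stored as ISO-8859-1 and shows that once the caller wraps the bytes in a Reader with the wrong charset, the characters are already damaged before any parser could look at the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderEncodingDemo {

    /** Drains a Reader into a String. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Decodes the bytes with a charset fixed by the caller, as any Reader-based API forces. */
    static String decode(byte[] pomBytes, Charset charset) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(pomBytes), charset)) {
            return readAll(reader);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical POM fragment; \u00fc is the letter u with umlaut.
        String pom = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<project><name>H\u00fcbner</name></project>";
        byte[] bytes = pom.getBytes(StandardCharsets.ISO_8859_1);

        // The Reader was built as UTF-8 although the declaration says ISO-8859-1:
        // the byte 0xFC is not valid UTF-8 and is replaced during decoding.
        String wrong = decode(bytes, StandardCharsets.UTF_8);
        // Only a caller that already knows the encoding gets the right text.
        String right = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(wrong.contains("H\u00fcbner")); // false
        System.out.println(right.contains("H\u00fcbner")); // true
    }
}
```

The XML declaration is present in both cases, but a parser that receives the Reader only sees characters that have already been decoded, so it cannot correct the caller's choice.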

          Carlos Sanchez added a comment -

          Yes, that's part of the solution too.

          John Casey added a comment -

          This will require implementing a Woodstox (or other StAX parser) plugin for Modello, so that we can generate an appropriate model parser.

          Jason van Zyl added a comment -

          From looking at the related issues I think the crux of the problem is using a Reader instead of an InputStream.

          Stefan Hübner added a comment -

          IMHO: yes, you've hit the nail on the head, Jason.

          Brett Porter added a comment -

          This is now in place in the branch for MNG-3397.

          Jason van Zyl added a comment -

          We are now using the StAX parser from the Woodstox project.
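For comparison, here is a minimal sketch of reading a POM-like document through the standard StAX API from an InputStream, so that the parser itself can detect the encoding from the XML declaration. It uses the javax.xml.stream implementation bundled with the JDK rather than Woodstox, and it is not the model reader that Modello generates for Maven; the class StaxStreamDemo and the method readName are hypothetical names for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxStreamDemo {

    /** Returns the text of the first <name> element, or null if there is none. */
    static String readName(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Handing over raw bytes lets the parser pick the charset itself.
        XMLStreamReader xml = factory.createXMLStreamReader(in);
        try {
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT
                        && "name".equals(xml.getLocalName())) {
                    return xml.getElementText();
                }
            }
            return null;
        } finally {
            xml.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Hypothetical POM fragment encoded as ISO-8859-1, as its declaration states.
        String pom = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<project><name>H\u00fcbner</name></project>";
        byte[] bytes = pom.getBytes(StandardCharsets.ISO_8859_1);

        String name = readName(new ByteArrayInputStream(bytes));
        // Checks that the umlaut survived without the caller naming a charset.
        System.out.println("H\u00fcbner".equals(name));
    }
}
```

Because the caller never constructs a Reader, there is no place for a wrong charset guess; the same method would also accept a UTF-8 encoded file without any change.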


            People

            • Assignee: Unassigned
            • Reporter: Carlos Sanchez
            • Votes: 4
            • Watchers: 4
