Plexus Utils / PLXUTILS-11

MXParser can't handle the encoding declaration in the XML declaration

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.4
    • Labels: None
    • Testcase included: yes
    • Number of attachments: 4

      Description

      The XML pull parser in plexus-utils (MXParser.java) can't handle the encoding declaration in the XML declaration,
      so it is impossible to use any encoding other than the system default encoding. This is critical in Japan, because
      two encodings are commonly used in Japanese environments (SJIS and EUC-JP).

      I think MXParser should handle the encoding declaration in XML as described in the W3C specification:
      http://www.w3.org/TR/REC-xml/#sec-guessing
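      The byte-pattern table in that section of the specification (Appendix F) can be expressed as a small lookup. The sketch below is a hypothetical helper (`XmlEncodingGuesser.guessFamily` is not part of plexus-utils); it only identifies the encoding family from the first four bytes. For the ASCII-compatible case, which covers Shift_JIS and EUC-JP, the actual encoding name still has to be read from the encoding declaration.

```java
import java.util.Arrays;

// Hypothetical helper sketching the XML 1.0 Appendix F autodetection table.
// It reports an encoding *family*; ASCII-compatible and EBCDIC input still
// needs the encoding declaration to get the exact name.
final class XmlEncodingGuesser {

    static String guessFamily(byte[] head) {
        // Copy up to four unsigned bytes; pad with -1 so short input never
        // accidentally matches a 0x00 in a pattern.
        int[] b = new int[4];
        Arrays.fill(b, -1);
        for (int i = 0; i < Math.min(4, head.length); i++) {
            b[i] = head[i] & 0xFF;
        }

        // With a byte order mark. UCS-4 patterns are tested first because
        // FF FE 00 00 would otherwise be taken for a UTF-16LE BOM.
        if (matches(b, 0x00, 0x00, 0xFE, 0xFF)) return "UCS-4BE";
        if (matches(b, 0xFF, 0xFE, 0x00, 0x00)) return "UCS-4LE";
        if (matches(b, 0x00, 0x00, 0xFF, 0xFE) || matches(b, 0xFE, 0xFF, 0x00, 0x00)) {
            return "UCS-4 (unusual byte order)";
        }
        if (b[0] == 0xFE && b[1] == 0xFF) return "UTF-16BE";
        if (b[0] == 0xFF && b[1] == 0xFE) return "UTF-16LE";
        if (b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) return "UTF-8";

        // Without a byte order mark: look for "<" or "<?xm" in each encoding.
        if (matches(b, 0x00, 0x00, 0x00, 0x3C)) return "UCS-4BE";
        if (matches(b, 0x3C, 0x00, 0x00, 0x00)) return "UCS-4LE";
        if (matches(b, 0x00, 0x00, 0x3C, 0x00) || matches(b, 0x00, 0x3C, 0x00, 0x00)) {
            return "UCS-4 (unusual byte order)";
        }
        if (matches(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (matches(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        if (matches(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible (read encoding declaration)";
        if (matches(b, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC (read encoding declaration)";

        // Nothing recognizable: the specification's default is UTF-8.
        return "UTF-8";
    }

    private static boolean matches(int[] b, int b0, int b1, int b2, int b3) {
        return b[0] == b0 && b[1] == b1 && b[2] == b2 && b[3] == b3;
    }
}
```

      For example, a Shift_JIS or EUC-JP POM starting with `<?xml` yields the bytes `3C 3F 78 6D` and lands in the ASCII-compatible branch, which is exactly the case where the declaration has to be parsed.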

      I tried to fix this problem (see attachment).
      I changed the setInput(InputStream) method to detect the encoding from the XML declaration.
      In writing this code, I referred to the source code of Apache Xerces.
      UCS-4 and UCS-2 are not supported in this implementation, because
      these encodings are not supported by the Sun JDK.
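      The idea of peeking at the XML declaration before decoding the rest of the stream can be sketched with mark/reset on a buffered stream. This is not the code from the attached patch: `XmlDeclarationSniffer`, `encodingIn`, `declaredEncoding`, and `openReader` are hypothetical names, and the sketch assumes an ASCII-compatible encoding without a byte order mark (the `3C 3F 78 6D` case of Appendix F).

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the attached patch.
// Assumes an ASCII-compatible encoding with no byte order mark.
final class XmlDeclarationSniffer {
    // The XML declaration must start at byte 0, so a short prefix suffices.
    private static final int PREFIX_LIMIT = 1024;

    // encoding="..." or encoding='...' inside a leading "<?xml ... ?>".
    // The name follows the EncName production: [A-Za-z] ([A-Za-z0-9._] | '-')*
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    /** Returns the encoding named in the declaration at the start of prefix, or null. */
    static String encodingIn(String prefix) {
        Matcher m = ENCODING.matcher(prefix);
        return m.find() ? m.group(2) : null;
    }

    /** Peeks at the declaration without consuming bytes; in must support mark/reset. */
    static String declaredEncoding(InputStream in) throws IOException {
        in.mark(PREFIX_LIMIT);
        byte[] buf = new byte[PREFIX_LIMIT];
        int n = 0;
        while (n < buf.length) { // read() may return fewer bytes than requested
            int r = in.read(buf, n, buf.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset(); // rewind so the parser still sees the declaration
        // ISO-8859-1 maps each byte to one char, so the ASCII markup of the
        // declaration reads correctly for UTF-8, Shift_JIS, EUC-JP, etc.
        return encodingIn(new String(buf, 0, n, StandardCharsets.ISO_8859_1));
    }

    /** Returns a Reader that decodes with the declared encoding, else UTF-8. */
    static Reader openReader(InputStream raw) throws IOException {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        String name = declaredEncoding(in);
        Charset cs = (name != null && Charset.isSupported(name))
                ? Charset.forName(name)
                : StandardCharsets.UTF_8;
        return new InputStreamReader(in, cs);
    }
}
```

      The resulting Reader can be handed to a parser's setInput(Reader), so the parser itself never has to deal with raw bytes.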

      Xerces solves this problem by providing its own readers for these encodings. I think Xerces's solution is
      too complex for plexus-utils.

      Changing plexus-utils alone is not sufficient to solve this issue, because DefaultMavenProjectBuilder
      reads the POM through a FileReader without specifying an encoding, so the platform default encoding is always used.
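      The FileReader problem can be reproduced without Maven: a FileReader created without a charset decodes with the platform default, so a POM is only read correctly when its encoding happens to match that default. A minimal sketch follows; the class name `DefaultCharsetPitfall` is hypothetical, and ISO-8859-1 stands in for a platform default that differs from the file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of decoding the same file bytes with two charsets.
final class DefaultCharsetPitfall {

    // A FileReader without a charset behaves like this call with
    // cs = Charset.defaultCharset().
    static String decodeAs(byte[] fileBytes, Charset cs) {
        return new String(fileBytes, cs);
    }

    public static void main(String[] args) {
        // The bytes a UTF-8 encoded POM fragment would contain on disk.
        byte[] pom = "<name>Caf\u00e9</name>".getBytes(StandardCharsets.UTF_8);

        // Decoding with the file's real encoding restores the text.
        System.out.println(decodeAs(pom, StandardCharsets.UTF_8));

        // Decoding with a different default garbles it: the two-byte UTF-8
        // sequence for U+00E9 becomes two separate ISO-8859-1 characters.
        System.out.println(decodeAs(pom, StandardCharsets.ISO_8859_1));
    }
}
```

      This is why the POM has to reach the parser as bytes (for example a FileInputStream) rather than as an already decoded Reader. Java 11 added a FileReader(File, Charset) constructor, but that still requires knowing the encoding before the declaration has been read.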


          Activity

          Naoki Nose added a comment -

          Patch file for plexus-utils.

          Naoki Nose added a comment -

          The test resource files for MXParserTest.

          Naoki Nose added a comment -

          I have attached two additional files, because my first attachment isn't in diff format.

          The file plexus-utils.diff is the patch for plexus-utils; it contains the XML encoding detection code.
          The file plexus-utils-test-resources.tar.gz contains the test resource files for MXParserTest.java.
          These resource files are binary, so plexus-utils.diff couldn't include them.

          Carlos Sanchez added a comment -

          We'd better switch to an external XML parser; see MNG-2255.

          Stefan Hübner added a comment -

          As I just stated in MNG-2255, I think that "...the parser used to read in maven's model isn't the problem really. It's more likely the way the POM is handed to the parser."

          See MNG-2255 for full discussion of the issue.

          Herve Boutemy added a comment -

          Here is the XmlReader class, taken from Rome:
          see https://rome.dev.java.net/apidocs/0_9/com/sun/syndication/io/XmlReader.html

          With this class available, encoding detection no longer has to be done by the XML parser; it can be handled by the XmlReader that feeds it, so the parser does not need any complex encoding detection code.


            People

            • Assignee: Herve Boutemy
            • Reporter: Naoki Nose
            • Votes: 8
            • Watchers: 7
