jira.codehaus.org

  • Plexus Utils
  • PLXUTILS-11

MXParser can't handle the encoding declaration in XML declaration

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.4.4
  • Labels: None
  • Testcase included: yes

Description

The XML pull parser in plexus-utils (MXParser.java) can't handle the encoding declaration in the XML declaration.
So it's impossible to use an encoding different from the system default encoding. This is critical in Japan, because
two encodings are commonly used in Japanese environments (SJIS and EUC-JP).

I think MXParser should handle the encoding declaration in XML as described in the W3C specification:
http://www.w3.org/TR/REC-xml/#sec-guessing

I tried to fix this problem (see attachment).
I changed the setInput(InputStream) method to detect the encoding from the XML declaration.
To write this code, I referred to the source code of Apache Xerces.
UCS-4 and UCS-2 aren't supported in this implementation, because
these encodings aren't supported by the Sun JDK.
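
For readers without the attachment, the kind of detection described above can be sketched roughly as follows. This is not the attached patch and not Xerces code; it is a minimal illustration of the approach in the W3C autodetection appendix linked above, and XmlEncodingSniffer and detect are hypothetical names. Like the patch, it leaves UCS-4 out, and it falls back to UTF-8 when nothing is declared.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of plexus-utils): guesses an XML document's encoding from its first bytes. */
final class XmlEncodingSniffer {
    // Matches the encoding pseudo-attribute of an XML declaration at the very start of the input.
    private static final Pattern DECL = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*([\"'])([A-Za-z][A-Za-z0-9._-]*)\\1");

    private XmlEncodingSniffer() {}

    /** Returns the encoding name suggested by the leading bytes, or "UTF-8" if nothing is found. */
    static String detect(byte[] head) {
        // 1. A byte order mark identifies the encoding family directly.
        //    UCS-4 marks are deliberately not checked in this sketch.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";

        // 2. No mark, but "<?" stored as two-byte units: UTF-16 without a byte order mark.
        if (startsWith(head, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(head, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";

        // 3. ASCII-compatible encodings (EUC-JP, Shift_JIS, ISO-8859-x, UTF-8): the declaration
        //    itself is plain ASCII, so it can be read byte-for-byte as ISO-8859-1.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(text);
        if (m.find()) return m.group(2);

        // 4. No declaration: the XML specification's default.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would pass the first few hundred bytes of the stream to detect and then construct a Reader with the returned name; whether the JDK in use actually supports that name still has to be checked separately.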

Xerces solves this problem by providing its own readers for these encodings. I think Xerces's solution is
too complex for plexus-utils.

Changing plexus-utils alone is not sufficient to solve this issue, because
DefaultMavenProjectBuilder reads the POM with a FileReader without specifying an encoding.
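
To illustrate why the Reader matters: a FileReader created from just a file decodes with the JVM's default charset, so by the time the parser sees characters, any mismatch with the file's real encoding has already corrupted them. The sketch below is not Maven code; ExplicitCharsetDemo and decode are hypothetical names, and it uses an in-memory byte array in place of a POM file so that it is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical demo: the same bytes decoded through a Reader with an explicitly chosen charset. */
final class ExplicitCharsetDemo {
    private ExplicitCharsetDemo() {}

    /**
     * Decodes the bytes through an InputStreamReader bound to the given charset.
     * A Reader built without a charset argument would use the JVM default instead.
     */
    static String decode(byte[] bytes, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Decoding UTF-8 bytes of a Japanese element value with StandardCharsets.UTF_8 returns the original text, while decoding the same bytes with StandardCharsets.ISO_8859_1 (standing in for a mismatched default) yields one garbage character per byte. No parser-side fix can undo that, which is the point made above.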

Attachments

  1. File
    plexus-utils.diff
    23/Apr/06 2:44 AM
    58 kB
    Naoki Nose
  2. GZip Archive
    plexus-utils-test-resource.tar.gz
    23/Apr/06 2:48 AM
    0.6 kB
    Naoki Nose
  3. File
    PLXUTILS-11.diff
    26/Jun/07 3:23 PM
    34 kB
    Herve Boutemy
  4. Java Archive File
    src.jar
    12/Mar/06 10:28 AM
    48 kB
    Naoki Nose
    1. File
      META-INF/MANIFEST.MF 0.0 kB
    2. Java Source File
      src/main/java/.../xml/pull/EncodingMap.java 38 kB
    3. Java Source File
      src/main/java/org/.../xml/pull/MXParser.java 137 kB
    4. Java Source File
      src/main/.../pull/RewindableInputStream.java 3 kB
    5. Java Source File
      src/main/java/.../pull/XmlPullParser.java 44 kB
    6. Java Source File
      src/main/.../XmlPullParserException.java 2 kB
    7. Java Source File
      src/test/java/.../xml/pull/MXParserTest.java 4 kB
    8. XML File
      src/test/java/org/.../xml/pull/utf16be.xml 0.1 kB
    9. XML File
      src/test/java/.../pull/utf16be_nodecl.xml 0.1 kB
    10. Java Source File
      src/test/.../RewindableInputStreamTest.java 8 kB
    11. XML File
      src/test/java/org/.../xml/pull/euc-jp.xml 0.1 kB
    12. XML File
      src/test/.../pull/euc-jp-invaliddecl.xml 0.1 kB
    13. XML File
      src/test/java/org/.../util/xml/pull/sjis.xml 0.1 kB
    14. XML File
      src/test/java/org/.../xml/pull/utf-8.xml 0.1 kB
    15. XML File
      src/test/java/.../xml/pull/euc-jp-nodecl.xml 0.0 kB

Issue Links

is depended upon by

  • MANTTASKS-14 Ant Tasks do not work on the ZOS (Bug, Critical, Open)
  • MNG-2025 POM is still not read using the right encoding (Bug, Critical, Closed)
  • DOXIA-133 default XML encoding (UTF-8) or XML encoding set in XML files is ignored: inputEncoding is used instead (Bug, Major, Closed)
  • MECLIPSE-56 Generated .project-file misses encoding declaration (Bug, Major, Closed)
  • MNG-2254 the encoding parameter in xml declaration of POM is ignored (Bug, Major, Closed)
  • MSITE-239 encoding declaration in site.xml is ignored (Bug, Major, Closed)

is related to

  • MEVENIDE-449 Mevenide plugin stumbels upon non ASCII characters in the path to the .m2 directory (Bug, Major, Resolved)
  • MSITE-19 Various encoding problems with InputStream and XML (Bug, Major, Closed)

is superceded by

  • MNG-2255 Use a external XML Pull parser instead of plexus one (Improvement, Critical, Closed)

Activity

Naoki Nose added a comment - 23/Apr/06 2:44 AM

patch file for plexus-utils.

Naoki Nose added a comment - 23/Apr/06 2:48 AM

The test resource files for MXParserTest.

Naoki Nose added a comment - 23/Apr/06 3:23 AM

I have attached two additional files, because my first attachment isn't in diff format.

The file plexus-utils.diff is the patch for plexus-utils. It contains the XML encoding detection code.
The file plexus-utils-test-resource.tar.gz contains the test resource files for MXParserTest.java.
These resource files are binary, so plexus-utils.diff couldn't contain them.

Carlos Sanchez added a comment - 03/May/06 11:32 AM

We'd better switch to an external XML parser; see MNG-2255.

Stefan Hübner added a comment - 09/May/06 5:32 PM

As I just stated in MNG-2255, I think that "...the parser used to read in maven's model isn't the problem really. It's more likely the way the POM is handed to the parser."

See MNG-2255 for full discussion of the issue.

Herve Boutemy added a comment - 26/Jun/07 3:23 PM

Here is the XmlReader class, taken from Rome:
see https://rome.dev.java.net/apidocs/0_9/com/sun/syndication/io/XmlReader.html

With this class available, encoding detection does not necessarily have to be done by the XML parser; it can be handled by the XmlReader instead, so the parser does not need any complex encoding detection code.
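
The division of labour described here can be sketched as follows. This is not Rome's XmlReader and makes no claim about its API; SniffingXmlReaders, open and readAll are hypothetical names, and the sketch only handles ASCII-compatible encodings with an explicit declaration (no byte order marks, no UTF-16). The idea is that the stream is peeked at, rewound, and handed to the parser as an already correctly decoding Reader.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical factory: picks the charset before the parser ever sees the input. */
final class SniffingXmlReaders {
    private static final int PEEK = 256; // how many leading bytes are inspected
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml\\s[^>]*?encoding\\s*=\\s*[\"']([^\"']+)[\"']");

    private SniffingXmlReaders() {}

    /** Wraps the stream in a Reader that uses the declared encoding, or UTF-8 if none is declared. */
    static Reader open(InputStream in) {
        try {
            BufferedInputStream buffered = new BufferedInputStream(in, PEEK);
            buffered.mark(PEEK);
            byte[] head = new byte[PEEK];
            int n = 0;
            while (n < PEEK) {
                int r = buffered.read(head, n, PEEK - n);
                if (r < 0) break;
                n += r;
            }
            buffered.reset(); // rewind so the Reader starts at the first byte again

            // The declaration is plain ASCII in the encodings handled here.
            String text = new String(head, 0, n, StandardCharsets.ISO_8859_1);
            Matcher m = ENCODING.matcher(text);
            // Charset.forName throws an unchecked exception for names the JDK does not know.
            Charset charset = m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
            return new InputStreamReader(buffered, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience for trying the factory out: reads the whole document into a String. */
    static String readAll(InputStream in) {
        StringBuilder out = new StringBuilder();
        try (Reader reader = open(in)) {
            char[] buf = new char[512];
            int n;
            while ((n = reader.read(buf)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

With a factory of this shape, a parser that accepts a Reader needs no encoding logic of its own, which is the simplification the comment argues for.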

People

  • Assignee: Herve Boutemy
  • Reporter: Naoki Nose
Vote (8)
Watch (7)

Dates

  • Created: 12/Mar/06 10:28 AM
  • Updated: 02/Feb/08 3:34 PM
  • Resolved: 08/Jul/07 1:55 PM