Maven Help Plugin / MPH-87

help:effective-pom uses the platform encoding, garbles non-ASCII characters and emits invalid XML

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.1.1
    • Fix Version/s: None
    • Component/s: effective-pom
    • Labels:
      None
    • Environment:
      Windows, Mac OS X, Linux; Maven 3.0.4
    • Number of attachments:
      1

      Description

      As stated in http://www.w3.org/TR/REC-xml/#sec-guessing-no-ext-info, an XML file without a BOM and without an XML encoding declaration should be read as UTF-8.
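      The fallback rule can be sketched in Java roughly as follows. This is a simplified illustration, assuming only three cases matter: a BOM, an encoding declaration in an ASCII-compatible prefix, and otherwise UTF-8. guessEncoding is a hypothetical helper; real parsers also handle the UCS-4 and EBCDIC cases and external encoding information described in the spec.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingGuess {

    // Matches encoding="..." or encoding='...' inside a leading <?xml ... ?> declaration.
    private static final Pattern DECL = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Hypothetical, simplified version of the XML 1.0 Appendix F autodetection. */
    public static String guessEncoding(byte[] head) {
        // 1. Byte order marks (UTF-8, UTF-16BE, UTF-16LE only; UCS-4 is ignored here).
        if (head.length >= 3 && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        // 2. An encoding declaration, read through a byte-transparent charset.
        String prefix = new String(head, 0, Math.min(head.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(prefix);
        if (m.find()) {
            return m.group(1);
        }
        // 3. No BOM and no declaration: the document has to be UTF-8.
        return "UTF-8";
    }

    public static void main(String[] args) {
        byte[] noDecl = "<project><name>J\u00f6rg</name></project>".getBytes(StandardCharsets.UTF_8);
        byte[] withDecl = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><project/>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessEncoding(noDecl));   // UTF-8
        System.out.println(guessEncoding(withDecl)); // ISO-8859-1
    }
}
```

      Under this rule, a file such as child-invalid/pom.xml (no BOM, no declaration) is decoded as UTF-8 no matter which encoding actually produced its bytes.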

      help:effective-pom, however, writes the effective POM in the platform encoding without emitting a matching XML encoding declaration in the resulting file.
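      A minimal Java sketch of the mismatch and one way to avoid it. It assumes a Western-European platform encoding; ISO-8859-1 stands in for something like windows-1252, since both encode "ö" as the single byte 0xF6, which is not valid UTF-8. writeXml, garbledRoundTrip and declaredRoundTrip are hypothetical helpers, not the plugin's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EffectivePomEncodingSketch {

    /** Writes a declaration whose encoding label matches the charset used for the bytes. */
    public static void writeXml(String xmlBody, Charset charset, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, charset);
        w.write("<?xml version=\"1.0\" encoding=\"" + charset.name() + "\"?>\n");
        w.write(xmlBody);
        w.flush(); // flush only; closing the stream is left to the caller
    }

    /** The reported failure mode: legacy-charset bytes, no declaration, read back as UTF-8. */
    public static String garbledRoundTrip(String xmlBody) {
        byte[] platformBytes = xmlBody.getBytes(StandardCharsets.ISO_8859_1);
        return new String(platformBytes, StandardCharsets.UTF_8);
    }

    /** The safer variant: explicit UTF-8 plus a matching declaration. */
    public static String declaredRoundTrip(String xmlBody) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try {
            writeXml(xmlBody, StandardCharsets.UTF_8, buf);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "<project><developers><developer><name>J\u00f6rg</name></developer></developers></project>";
        System.out.println(garbledRoundTrip(body));  // the "ö" comes back as the replacement character U+FFFD
        System.out.println(declaredRoundTrip(body)); // declaration followed by an intact "Jörg"
    }
}
```

      Emitting a declaration alone would also fix the file, as long as it names the charset that was actually used; writing UTF-8 explicitly additionally makes the output independent of the build machine.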

      I have created a small sample project (available at https://github.com/mfriedenhagen/invalidpom and attached as a ZIP) that reproduces the issue.

      While the parent POM (https://raw.github.com/mfriedenhagen/invalidpom/master/pom.xml) has an XML encoding declaration, https://raw.github.com/mfriedenhagen/invalidpom/master/child-invalid/pom.xml has none.

      Now running:

      mvn -s settings.xml -gs settings.xml clean validate
      

      produces an invalid character in place of the "ö" in the developer name "Jörg" in child-invalid.

      Three workarounds are:

      • to include an XML encoding declaration, as done in child-valid.
      • to set JAVA_TOOL_OPTIONS on Windows, as described in http://stackoverflow.com/a/623036/49132
      • to run MAVEN_OPTS=-Dfile.encoding=utf-8 mvn -s settings.xml -gs settings.xml clean validate
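      For the second and third workarounds, it can help to confirm which charset a JVM actually picks up. ShowDefaultEncoding is a hypothetical class, not part of Maven. Run it with the same options the build would get, e.g. java -Dfile.encoding=UTF-8 ShowDefaultEncoding, or with JAVA_TOOL_OPTIONS set in the environment.

```java
import java.nio.charset.Charset;

public class ShowDefaultEncoding {

    /** Summarizes the encoding settings this JVM was started with. */
    public static String describe() {
        // Value of -Dfile.encoding=... if given (e.g. via MAVEN_OPTS or JAVA_TOOL_OPTIONS), else the JVM default.
        String prop = System.getProperty("file.encoding");
        // Charset used by String.getBytes() and FileWriter when no charset is passed explicitly.
        Charset def = Charset.defaultCharset();
        return "file.encoding=" + prop + ", defaultCharset=" + def.name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

      Since JDK 18 (JEP 400) the default charset is UTF-8 on every platform unless file.encoding is overridden, so this check is mostly relevant on older JVMs.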

      Nonetheless, I consider this a Major bug, as it clearly violates the W3C recommendation.

        Activity

        Mirko Friedenhagen made changes -
        Field Original Value New Value
        Attachment mfriedenhagen-invalidpom-MPH-87-0-g42a5c31.zip [ 59306 ]
        Robert Scholte made changes -
        Description
        Component/s effective-pom [ 15784 ]

          People

          • Assignee:
            Unassigned
            Reporter:
            Mirko Friedenhagen
          • Votes:
            2
            Watchers:
            2

            Dates

            • Created:
              Updated: