Maven 2 & 3
  • MNG-2932

Encoding chaos


Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 2.0.4, 2.0.5, 2.0.6
  • Fix Version/s: 2.0.8
  • Component/s: POM::Encoding
  • Labels:
    None
  • Environment:
    windows, linux
  • Complexity:
    Intermediate
  • Number of attachments :
    0

Description

I tried Maven on a project whose Javadocs, xdocs, and POM comments are written in a native language with many non-ASCII characters.
This revealed that Maven does not handle different encodings cleanly.

The xdocs, for instance, are XML, and XML lets me use different encodings as long as they are properly declared in the XML header. However, this only works if I encode the XML as UTF-8. If I use ISO-8859-1, the generated HTML contains UTF-8 characters from the localized site messages (resource bundles of the Maven plugins), and Maven dumps the ISO-8859-1 encoded characters into the same page, so a single HTML page ends up with mixed encodings.
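The mixed-encoding effect described above can be reproduced in isolation. The following is a minimal sketch, not Maven's actual code path: the class name, method names, and sample strings are hypothetical. It concatenates a UTF-8 fragment and an ISO-8859-1 fragment at the byte level and then decodes the result as UTF-8, the way a browser would when the page claims to be UTF-8.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class MixedEncodingDemo {

    // Stands in for a localized label coming from a plugin resource bundle.
    private static final String LABEL = "\u00dcbersicht: ";
    // Stands in for text coming from an ISO-8859-1 encoded xdoc.
    private static final String BODY = "Abh\u00e4ngigkeiten";

    /** Joins the two fragments as raw bytes in different encodings, then decodes all of it as UTF-8. */
    static String decodeMixedPage() {
        byte[] label = LABEL.getBytes(StandardCharsets.UTF_8);
        byte[] body = BODY.getBytes(StandardCharsets.ISO_8859_1);
        ByteArrayOutputStream page = new ByteArrayOutputStream();
        page.write(label, 0, label.length);
        page.write(body, 0, body.length);
        // The lone 0xE4 byte is not valid UTF-8, so the decoder substitutes U+FFFD.
        return new String(page.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Decodes each fragment with its own charset first and joins characters, not bytes. */
    static String decodeThenJoin() {
        byte[] label = LABEL.getBytes(StandardCharsets.UTF_8);
        byte[] body = BODY.getBytes(StandardCharsets.ISO_8859_1);
        return new String(label, StandardCharsets.UTF_8)
                + new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeMixedPage());
        System.out.println(decodeThenJoin());
    }
}
```

The second method shows the usual remedy: convert every input to characters using its declared encoding, and apply a single output encoding only when the page is written.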

The Java files also cause trouble when I use an encoding other than UTF-8. I set the "encoding" parameter of the javadoc plugin to ISO-8859-1 and used Java files in that encoding. The resulting Javadoc HTML was written in ISO-8859-1, but the browser displayed it as UTF-8, and I had to switch explicitly to ISO-8859-1 in Firefox to have the special characters displayed properly.

I also run into trouble when I use special characters in pom.xml files that end up on the generated website. In the end I could NOT find a way to get a site without problems, even when I encode everything as UTF-8.

Maybe too few of the developers involved come from non-English-speaking countries and are used to thinking beyond US-ASCII.

Unfortunately I cannot tell where the problems come from: it may be XPP, Doxia, the site plugin, individual reports, or all of them together.
You need to distinguish properly between input and output encoding, be extremely careful with things like byte[],
and never parse XML from strings.
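The advice about never parsing XML from strings can be illustrated with the standard JAXP API. This is a minimal sketch under stated assumptions: the class name, helper names, and sample document are hypothetical, and it does not show what XPP or Doxia actually do. When the parser is handed the raw bytes, it reads the encoding declaration itself. When the bytes are first decoded into a String with a guessed charset, the damage is done before the parser sees anything.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class XmlFromBytesDemo {

    /** A small document that declares ISO-8859-1 and is really encoded that way. */
    static byte[] sampleLatin1Document() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<title>Abh\u00e4ngigkeiten</title>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    /** Parses from bytes so the parser honors the declared input encoding. */
    static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xmlBytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    /** Anti-pattern: decoding with a guessed charset before parsing. */
    static String decodedWithGuessedCharset(byte[] xmlBytes) {
        return new String(xmlBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleLatin1Document();
        System.out.println(rootText(bytes));
        // The guessed decoding has already replaced the umlaut with U+FFFD.
        System.out.println(decodedWithGuessedCharset(bytes).contains("\uFFFD"));
    }
}
```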

Can you reproduce the problem or do you need dummy projects as test-cases?

Issue Links

depends upon

  • DOXIA-133 (Bug, Major, Closed): default XML encoding (UTF-8) or XML encoding set in XML files is ignored: inputEncoding is used instead

duplicates

  • MNG-2254 (Bug, Major, Closed): the encoding parameter in xml declaration of POM is ignored

relates to

  • MNG-2216 (Improvement, Major, Open): Add default encodings section to POM

Activity

Jörg Hohwiller added a comment - 24/Apr/07 9:41 AM

Some further details on this issue:

1. The maven-javadoc-plugin is okay. It only calls javadoc and is not doing any character transformations. You need to use the parameter "charset" rather than "encoding" or "docencoding".

2. The bug in maven-site-plugin seems to be mainly related to:
http://codehaus01a.managed.contegix.com/browse/MNG-722

Anyway, there are some other problems that seem to be unrelated to MNG-722.

Timo Stülten added a comment - 02/May/07 4:24 PM

Working on OS X and applying

<plugin>
  <artifactId>maven-site-plugin</artifactId>
  <configuration>
    <outputEncoding>utf-8</outputEncoding>
  </configuration>
</plugin>

from the mentioned MNG-722 does not fix the broken encoding.

An 'ü' (u umlaut) from the pom.xml (using UTF-8) is written to team-list.html as '?º' with <outputEncoding> and as '?º' without it.

Avoiding non-ASCII is no problem for sources, but ugly for team member names.

Jason van Zyl made changes - 14/Jun/07 11:42 PM
Field Original Value New Value
Fix Version/s Reviewed [ 13555 ]
Jason van Zyl made changes - 16/Jun/07 6:05 PM
Component/s POM::Encoding [ 12860 ]
Component/s Sites & Reporting [ 12030 ]
Jason van Zyl made changes - 21/Jun/07 4:37 PM
Fix Version/s 2.0.8 [ 13139 ]
Fix Version/s Reviewed Pending Version Assignment [ 13555 ]
Carlos Sanchez made changes - 21/Jun/07 4:47 PM
Link This issue duplicates MNG-2254 [ MNG-2254 ]
Carlos Sanchez made changes - 21/Jun/07 4:48 PM
Link This issue relates to MNG-2216 [ MNG-2216 ]
Herve Boutemy made changes - 24/Jul/07 5:06 PM
Link This issue depends upon DOXIA-133 [ DOXIA-133 ]
Herve Boutemy added a comment - 17/Oct/07 4:13 PM

POM reading is fixed in MNG-2254: everything should work fine now

Herve Boutemy made changes - 17/Oct/07 4:13 PM
Resolution Fixed [ 1 ]
Assignee Herve Boutemy [ hboutemy ]
Status Open [ 1 ] Closed [ 6 ]
Jörg Hohwiller added a comment - 04/Dec/07 9:09 AM

Congratulations on reading the texts from the POMs. This feature seems to work properly now.

However, this is not all of the "encoding chaos" I was talking about.
There is still the localization of the texts for the default reports. It seems Maven uses resource bundles encoded as UTF-8.
My site (the HTML) is configured to be encoded as UTF-8. What I get is that the special characters from the resource bundles are
UTF-8 encoded twice, so the German translation of "dependencies", which is "Abhängigkeiten", appears as "AbhÃ¤ngigkeiten".
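One way such double encoding can arise in Java is sketched below. This is an assumption for illustration only: the issue does not state the actual cause in the report plugin, and the class name, method names, and property key are hypothetical. Properties.load(InputStream) always decodes its input as ISO-8859-1, so a bundle stored as UTF-8 yields two characters for every umlaut, and writing those characters out as UTF-8 encodes them a second time.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class BundleEncodingDemo {

    // Hypothetical key; real bundle keys may differ.
    private static final String KEY = "report.dependencies.name";

    /** Bundle content stored on disk as UTF-8. */
    static byte[] utf8Bundle() {
        return (KEY + "=Abh\u00e4ngigkeiten\n").getBytes(StandardCharsets.UTF_8);
    }

    /** Stream-based load: the bytes are interpreted as ISO-8859-1. */
    static String loadAsStream(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    /** Reader-based load: the caller states the input encoding explicitly. */
    static String loadAsUtf8Reader(byte[] data) {
        Properties props = new Properties();
        try {
            props.load(new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    /** Reverses the misreading by re-encoding as ISO-8859-1 and decoding as UTF-8. */
    static String repair(String misread) {
        return new String(misread.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bundle = utf8Bundle();
        System.out.println(loadAsStream(bundle));
        System.out.println(loadAsUtf8Reader(bundle));
        System.out.println(repair(loadAsStream(bundle)));
    }
}
```

The repair step only works while no bytes have been lost, so it is a diagnostic aid rather than a fix; the clean solution is to read the bundle with the encoding it was actually written in.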

Herve Boutemy added a comment - 04/Dec/07 1:25 PM

There was real chaos, with many problems.
I tried the other problem you just reported: German texts in project-info-report.
I was able to reproduce the problem, which is in MPIR, not in Maven itself (the German umlaut in pom.xml was a Maven problem),
and I found that it has been fixed in svn since August 15 2006, in the 2.1-SNAPSHOT version: I checked, and it works perfectly.
Work is in progress to release version 2.1 in the next weeks.

If there are other problems, please open a dedicated Jira issue for each one, since the encoding chaos is now globally fixed: we now need focused reports on precise bugs that could still be present.


People

  • Assignee:
    Herve Boutemy
    Reporter:
    Jörg Hohwiller
Vote (3)
Watch (2)

Dates

  • Created:
    05/Apr/07 3:47 AM
    Updated:
    04/Dec/07 1:25 PM
    Resolved:
    17/Oct/07 4:13 PM