Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version/s: 1.9.2
- Fix Version/s: None
- Labels: None
- Environment: Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP workstation with IBM Java SDK V1.4.2
- Testcase included: yes
- Number of attachments: 2
Description
When I attempt to build UTF-8 encoded HTML from UTF-8 XML source files, every special character is scrambled.
We are using the xDoc plugin to generate the HTML for our on-line user guide. We sent the English source files and the default properties file to 9 translation centers. The translators returned valid UTF-8 source, but xDoc will not generate valid UTF-8 HTML.
I have attached a very small subset of our product that demonstrates this problem. See the README.txt file within the ZIP archive for information about how to use the supplied scripts to build the output for German, English, French, and Traditional Chinese and view the result.
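For illustration only: one common way this kind of scrambling arises is that UTF-8 bytes get decoded with a single-byte charset somewhere between reading and writing. Nothing in this report confirms that this is what the xDoc plugin does. The class and method names below are hypothetical, and the sketch uses ISO-8859-1 as the stand-in single-byte charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Maven, xDoc, or the attached test case.
public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with another charset,
    // which is the mismatch that typically produces scrambled special characters.
    static String decodeWith(String text, Charset wrongCharset) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, wrongCharset);
    }

    public static void main(String[] args) {
        String original = "\u00fc"; // LATIN SMALL LETTER U WITH DIAERESIS
        String scrambled = decodeWith(original, StandardCharsets.ISO_8859_1);
        // One input character becomes two: U+00C3 followed by U+00BC.
        System.out.println(original.length() + " char in, " + scrambled.length() + " chars out");
    }
}
```

If the generated HTML shows two-character sequences of this kind in place of each accented letter, a decoding mismatch of this sort is a plausible suspect.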
Attachment: UTF8EncodingProblem.zip (30/Apr/06 5:06 PM, 188 kB, Lance Bader)
- UTF8EncodingProblem/project.properties 1 kB
- UTF8EncodingProblem/project.xml 3 kB
- UTF8EncodingProblem/README.txt 3 kB
- UTF8EncodingProblem/src/.../wasce.properties 46 kB
- UTF8EncodingProblem/.../wasce_de.properties 42 kB
- UTF8EncodingProblem/.../wasce_fr.properties 45 kB
- UTF8EncodingProblem/.../wasce_zh_TW.properties 54 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../de/index.xml 8 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
- UTF8EncodingProblem/src/.../project.css 0.1 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../en/index.xml 8 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
- UTF8EncodingProblem/src/.../project.css 0.1 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../fr/index.xml 7 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
Attachment: UTF8EncodingProblem.zip (28/Apr/06 9:11 PM, 147 kB, Lance Bader)
- UTF8EncodingProblem/project.properties 1 kB
- UTF8EncodingProblem/project.xml 3 kB
- UTF8EncodingProblem/README.txt 3 kB
- UTF8EncodingProblem/src/.../wasce.properties 46 kB
- UTF8EncodingProblem/.../wasce_de.properties 41 kB
- UTF8EncodingProblem/.../wasce_fr.properties 44 kB
- UTF8EncodingProblem/.../wasce_zh_TW.properties 41 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../de/index.xml 8 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
- UTF8EncodingProblem/src/.../project.css 0.1 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../en/index.xml 8 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
- UTF8EncodingProblem/src/.../project.css 0.1 kB
- UTF8EncodingProblem/src/.../banner_left.gif 3 kB
- UTF8EncodingProblem/src/.../banner_right.gif 12 kB
- UTF8EncodingProblem/src/.../nglinux.gif 0.3 kB
- UTF8EncodingProblem/src/.../images/ngwin.gif 0.3 kB
- UTF8EncodingProblem/.../powered_by_100x30.gif 1 kB
- UTF8EncodingProblem/src/.../fr/index.xml 7 kB
- UTF8EncodingProblem/src/.../navigation.xml 19 kB
Activity
NOTE: Although it has no effect on this problem, I have discovered a defect in the test case I supplied. The properties files in src\i18nBundles have not been converted to the required ASCII encoding. I expected the translators to return ASCII-encoded files, but they used some native format instead. As a result, the navigation items, the section headers, and the subsection headers will appear wrong, even if the rest of the page is generated correctly.
I will attach an updated test case when I have converted the properties files correctly. I first have to find out what encoding the translators used (it is obviously not UTF-8) and then convert the files with native2ascii.
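As a rough sketch of what the native2ascii conversion amounts to: decode the file with the translators' actual encoding, then replace every character outside 7-bit ASCII with a backslash-u escape. The class name, method names, and the ISO-8859-1 source encoding below are assumptions for illustration; the real encoding of the returned files is still unknown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the real conversion would be done with the JDK's native2ascii tool.
public class PropertiesEscaper {

    // Copies ASCII characters unchanged and writes every other char as a
    // backslash, the letter u, and four lowercase hex digits.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Decodes raw bytes with the (assumed) source encoding before escaping.
    static String convert(byte[] raw, Charset sourceEncoding) {
        return escapeNonAscii(new String(raw, sourceEncoding));
    }

    public static void main(String[] args) {
        // Sample property line, encoded as ISO-8859-1 purely as an example.
        byte[] raw = "title=Einf\u00fchrung".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(convert(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Run as written, this prints `title=Einf\u00fchrung`, which is the ASCII form a properties file is expected to contain. Decoding with the wrong source encoding would produce valid-looking escapes for the wrong characters, which is why the translators' encoding has to be identified first.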
Lance
I found an old Red Hat Linux system where I could run the supplied test case. Specifically, it is Red Hat Enterprise Linux V4 Update 3 for i386. I installed Maven 1.1-beta-2 with maven-xdoc-plugin-1.9.2 and the attached test case. I created a script that performs the same actions as build_de.bat, build_en.bat, build_fr.bat, and build_zh_TW.bat.
Except for the unrelated problem caused by poison properties files in src\i18nBundles (see the problem report in a previous comment), the HTML was generated CORRECTLY.
NOTE: I did NOT have to modify the LANG or LC_CTYPE environment variables, as suggested in the xDoc plugin FAQ or in http://jira.codehaus.org/browse/MPXDOC-184 . By default, LANG was already set to LANG="en_US.UTF-8". I dumped the Java system properties and observed that file.encoding=UTF-8 by default.
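A check along these lines can be repeated on any machine with a few lines of Java. This is a minimal sketch with a hypothetical class name; it reads only the standard `file.encoding` system property and `Charset.defaultCharset()` (available from Java 5 onward), and the values it reports depend on the JVM version and platform.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class for comparing the Linux and Windows environments.
public class ShowEncoding {

    // Builds a one-line summary of the encoding settings the JVM is running with.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once normally and once with `-Dfile.encoding=UTF-8` on the Windows XP workstation would show whether the override actually reaches the JVM that performs the generation.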
So, that raises the question, "Why doesn't this work on a Windows XP system when you use -Dfile.encoding=UTF-8 to override the default file encoding?" It's a mystery.
This is an updated test case where the poison properties files have been converted to ASCII using the native2ascii utility. When you run the supplied batch files on a Windows system, the navigation pane, the section names, and the subsection names will appear correctly. However, special characters in the other content will still be scrambled.
Lance
Since I opened the issue, I copied the same archive to a workstation using the Sun JDK V1.5 Update 6 and reproduced the same problem. I think this makes it unlikely that the IBM JDK is causing the problem.
I have also modified the velocity-1.4.jar file in the .maven\repository\velocity\jars directory. I replaced org\apache\velocity\runtime\defaults\velocity.properties after making the following changes.
#----------------------------------------------------------------------------
#----------------------------------------------------------------------------
input.encoding=UTF-8
output.encoding=UTF-8
I reproduced the problem after making this change. I would like to think that the problem is in Velocity, but this change did not affect the outcome.
Next I will move to Red Hat Linux and try the suggested workaround there.
Lance