UTF8EncodingProblem.zip
ldbader@gmail.com

This zip file contains a small subset of our project that can be used to demonstrate the reported UTF-8 encoding problem. Simply extract the tree and make UTF8EncodingProblem the current directory.

I am using Maven 1.1-beta-2 and maven-xdoc-plugin-1.9.2 on a Windows XP workstation with IBM Java SDK V1.4.2. When I attempt to build UTF-8 encoded HTML from UTF-8 XML source files, every special character is scrambled.

To validate the input files, use your browser to examine

    UTF8EncodingProblem\src\xdocs\de\index.html
    UTF8EncodingProblem\src\xdocs\en\index.html
    UTF8EncodingProblem\src\xdocs\fr\index.html
    UTF8EncodingProblem\src\xdocs\zh_TW\index.html

The browser should automatically use the UTF-8 encoding, and the special characters should display correctly.

I have included some Windows batch scripts to demonstrate the initial problem and a few alternatives I have tried in order to work around it.

The following scripts demonstrate the build process that should work. They generate HTML from German, English, French, and Traditional Chinese sources respectively.

    build_de.bat
    build_en.bat
    build_fr.bat
    build_zh_TW.bat

Run all 4 scripts and then examine

    UTF8EncodingProblem\target\docs\V1.0.1\de\index.html
    UTF8EncodingProblem\target\docs\V1.0.1\en\index.html
    UTF8EncodingProblem\target\docs\V1.0.1\fr\index.html
    UTF8EncodingProblem\target\docs\V1.0.1\zh_TW\index.html

Compare the output with the source files listed above. Note that the German umlauts, the French grave accents, and all the Chinese characters are scrambled.

The http://jira.codehaus.org/browse/MPXDOC-184 page recommends a technique that sets the system language and the system encoding using environment variables. There is no reason to believe that this will work on a Windows system, but just for fun, I created the following script to use the recommended values. It builds only the German page, and it fails in exactly the same way as build_de.bat.

    zTrial.bat

Now, if you postulate that the recommended technique works on a Linux system because some component is using the default Java file.encoding system property and the default locale, then I claim that I can achieve the same result by overriding the appropriate system properties using -D switches. The following scripts demonstrate this technique. They all fail in exactly the same way.

    build_de_lcl.bat
    build_en_lcl.bat
    build_fr_lcl.bat
    build_zh_TW_lcl.bat

- I have not tried to recreate this problem on a Windows workstation with the Sun SDK.
- I have not tried the http://jira.codehaus.org/browse/MPXDOC-184 recommendations on a Linux platform.
- I have not tried to replace the velocity.properties file in velocity-1.4.jar with corrected values for these settings:

    #----------------------------------------------------------------------------
    # T E M P L A T E  E N C O D I N G
    #----------------------------------------------------------------------------
    input.encoding=ISO-8859-1
    output.encoding=ISO-8859-1
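
One possible explanation for the scrambling, not verified against this project, is that UTF-8 bytes are being decoded somewhere as ISO-8859-1, which is what the velocity.properties values quoted above would do if they apply to this build. The following is a minimal Java sketch of that mismatch. It assumes a modern JDK (StandardCharsets needs Java 7 or later, not the 1.4.2 SDK used here), and the class and method names are hypothetical; none of it is taken from Maven, the xdoc plugin, or Velocity.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical demo class; not part of Maven, the xdoc plugin, or Velocity.
class EncodingMismatchDemo {

    // Encode the text as UTF-8, then decode those bytes as ISO-8859-1.
    // Each non-ASCII character becomes two or more unrelated characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset; the text survives unchanged.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Render each char as U+XXXX so the result does not depend on the console encoding.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Show the JVM defaults that the -D switches are meant to override.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("default locale = " + Locale.getDefault());

        String umlaut = "\u00fc"; // German u-umlaut, written as an escape on purpose
        System.out.println("original   : " + codePoints(umlaut));
        System.out.println("round trip : " + codePoints(roundTrip(umlaut)));
        System.out.println("mis-decoded: " + codePoints(misdecode(umlaut)));
    }
}
```

Because misdecode names its charsets explicitly, its result does not change when the program is run with a -Dfile.encoding=UTF-8 switch. If the template engine's encoding is fixed in a similar way by its own configuration, that would be consistent with the -D scripts above failing exactly like the originals, but this remains an assumption until the velocity.properties replacement is actually tried.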