I have tried Maven on a project where the javadocs, xdocs, and pom comments are written in a native language with many non-ASCII characters.
This seems to reveal that Maven does not handle different encodings cleanly.
For instance, the xdocs are XML, and XML lets me use different encodings as long as they are properly declared in the XML header. In practice, however, this only works if I encode the XML as UTF-8. If I use ISO-8859-1, the localized site messages (from the resource bundles of the Maven plugins) are written into the generated HTML as UTF-8, while Maven dumps my ISO-8859-1 encoded characters into the same file, so I end up with mixed encodings in one HTML page.
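A minimal xdoc of the kind I mean, with the encoding declared in the header (the titles and text are just placeholders):

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<document>
  <properties>
    <title>Übersicht</title>
  </properties>
  <body>
    <section name="Einführung">
      <p>Text with ISO-8859-1 characters such as ä, ö, ü, ß.</p>
    </section>
  </body>
</document>
```

This file is valid XML; any encoding-aware parser should read it correctly, yet the characters come out wrong in the generated site.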
Additionally, the Java files also cause trouble when I use an encoding other than UTF-8. I configured the "encoding" of the javadoc plugin to ISO-8859-1 and used Java files in that encoding. The resulting javadoc HTML was written in ISO-8859-1, but the browser interpreted it as UTF-8, and I had to switch Firefox explicitly to ISO-8859-1 to have the special characters displayed properly.
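For illustration, assuming the Maven 2 maven-javadoc-plugin, I believe three separate javadoc options have to agree for this to work; the browser problem looks like the output encoding was set without the matching charset for the HTML meta tag. A sketch, not my exact configuration:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <!-- javadoc -encoding: how the .java sources are read -->
    <encoding>ISO-8859-1</encoding>
    <!-- javadoc -docencoding: how the generated HTML files are written -->
    <docencoding>ISO-8859-1</docencoding>
    <!-- javadoc -charset: the charset advertised in the HTML meta tag,
         which is what the browser uses to pick a decoder -->
    <charset>ISO-8859-1</charset>
  </configuration>
</plugin>
```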
Further, I run into trouble when I use special characters in pom.xml files that end up on the generated web site. In the end I could NOT find a way to build a site without problems, even when I encode everything as UTF-8.
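What would help is one project-wide encoding setting that all plugins honor. A sketch of what I mean, assuming a property along the lines of the `project.build.sourceEncoding` convention (which may not be honored by the plugin versions I tested):

```xml
<properties>
  <!-- one declared encoding that every plugin touching sources,
       resources, and site documents is expected to respect -->
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
```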
Maybe there are too few developers involved from non-English-speaking countries who are used to thinking beyond US-ASCII.
Unfortunately I cannot tell where the problems come from; it may be XPP, Doxia, the site plugin, individual reports, or all of them together.
You need to distinguish properly between input and output encodings, be extremely careful with byte-to-String conversions,
and never parse XML from strings — the parser must see the raw bytes to honor the declared encoding.
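A minimal Java sketch of what I mean (the class name and sample text are made up for illustration): decoding bytes with the wrong charset silently corrupts the text, and an XML parser only honors the encoding declaration when it is fed bytes, not a pre-decoded String:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingDemo {

    // Always decode with an explicit charset -- never the platform default.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    // Parse XML from a *byte* stream so the parser reads the encoding
    // declaration itself; parsing a pre-decoded String bypasses it.
    static String parseRootText(byte[] xmlBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = "G\u00fcltig"; // "Gültig", contains a non-ASCII character
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding ISO-8859-1 bytes as UTF-8 corrupts the text silently.
        System.out.println(text.equals(decode(latin1, StandardCharsets.UTF_8)));      // false
        System.out.println(text.equals(decode(latin1, StandardCharsets.ISO_8859_1))); // true

        // The parser recovers the text because the declaration matches the bytes.
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><t>" + text + "</t>")
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(text.equals(parseRootText(xml)));                          // true
    }
}
```

This is exactly the class of bug that produces the mixed-encoding HTML pages described above.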
Can you reproduce the problem, or do you need dummy projects as test cases?