jira.codehaus.org

  • Maven 2.x and 3.x Site Plugin
  • MSITE-19

Various encoding problems with InputStream and XML

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 2.0-beta-6
  • Component/s: encoding
  • Labels:
    None

Description

There are various encoding problems with InputStream and XML in different components.

  • Property resource files are encoded in UTF-8, but Java reads bundles as ISO-8859-1.
  • In several components, a Reader is constructed with the default system encoding.
  • MXParser ignores the encoding attribute in the XML declaration.

Attachments

  1. plexus-i18n.diff (Text File, 2 kB), Vincent Siveton, 03/Nov/05 11:37 AM
  2. plexus-site-renderer.diff (Text File, 0.9 kB), Vincent Siveton, 03/Nov/05 11:37 AM
  3. plexus-utils_2.diff (Text File, 0.6 kB), Vincent Siveton, 03/Nov/05 11:37 AM
  4. plexus-utils.diff (Text File, 1 kB), Vincent Siveton, 03/Nov/05 11:37 AM
  5. project-info-report_ja.properties (File, 18 kB), Naoki Nose, 22/Jan/06 8:42 AM
  6. project-info-report_zh_CN.properties (File, 26 kB), Yue Ni, 11/Feb/06 3:54 AM
  7. project-info-report_zh_CN.properties (File, 26 kB), Yue Ni, 11/Feb/06 3:19 AM
  8. site-plugin_ja.properties (File, 2 kB), Naoki Nose, 22/Jan/06 8:40 AM
  9. site-plugin_zh_CN.properties (File, 3 kB), Yue Ni, 11/Feb/06 3:17 AM

Issue Links

depends upon

  • DOXIA-60 Use a external XML Pull parser instead of plexus one (Improvement, Critical, Open)

is related to

  • MSITE-123 Output encoding is UTF-8 despite outputEncoding is set to ISO-8859-1 (Bug, Minor, Closed)
  • DOXIA-119 How to deal with encoding and documentation (Task, Major, Closed)

relates to

  • MSITE-239 encoding declaration in site.xml is ignored (Bug, Major, Closed)
  • PLXUTILS-11 MXParser can't handle the encoding declaration in XML declaration (Bug, Major, Closed)

Activity

Vincent Siveton added a comment - 03/Nov/05 11:37 AM

This issue currently appears with the Japanese translation and probably affects other East Asian languages (CJK charsets) too.

  • Passing the VM parameter -Dfile.encoding=UTF-8 (added to MAVEN_OPTS) could be a good starting point.
  • Java reads bundle streams with the ISO-8859-1 charset.
    The PropertyResourceBundle class uses Properties internally, and Properties loads with the ISO 8859-1 character encoding.
    Have a look at the API:
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/PropertyResourceBundle.html
    http://java.sun.com/j2se/1.4.2/docs/api/java/util/Properties.html
    So I propose fixing plexus-i18n and using it instead of ResourceBundle.getBundle() calls (specifically, I think, in the maven-project-info-reports-plugin subproject). See plexus-i18n.diff.
    Another solution would be to run native2ascii on each bundle, but IMHO the result is not really human readable.
  • Xpp3DomBuilder in plexus-utils does not seem to handle the encoding parameter in the XML header correctly, so the plexus-site-renderer component doesn't generate a site descriptor with special characters.
    Have a look at plexus-utils.diff and plexus-site-renderer.diff.
    Another issue could be in the toString() method of the Xpp3Dom class: we need to add a default encoding. See plexus-utils_2.diff.
  • Finally, IMHO the StringInputStream class in the plexus-utils component is not well implemented, because it defines no encoding. Maybe we could migrate to the StringInputStream class from the Ant project.
    http://svn.apache.org/repos/asf/ant/core/trunk/src/main/org/apache/tools/ant/filters/StringInputStream.java

Charset problems are hard to debug and depend on several factors.
Other ideas are welcome.
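
The ISO-8859-1 behaviour described above can be reproduced with a small, self-contained sketch. The class name, method names, and the sample key and value are made up for illustration; only java.util.Properties and its two load() overloads are real API. Note that Properties.load(Reader) was added in Java 6, so it was not available to the Java 1.4 code discussed in this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of Maven or Plexus.
class PropertiesEncodingDemo {

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1,
    // whatever encoding the file was actually saved in.
    static String loadAsLatin1(byte[] data, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Loading through a Reader lets the caller choose the charset explicitly.
    static String loadAsUtf8(byte[] data, String key) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // A one-line properties file saved as UTF-8; the value starts with u-umlaut.
        byte[] utf8File = "report.title=\u00fcber\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(loadAsLatin1(utf8File, "report.title")); // two garbage chars, then "ber"
        System.out.println(loadAsUtf8(utf8File, "report.title"));   // the intended text
    }
}
```

The first call turns the two UTF-8 bytes of the umlaut into two separate Latin-1 characters, which is the kind of garbage reported for the translated bundles.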

Vincent Siveton added a comment - 03/Nov/05 11:37 AM

plexus-i18n.diff

Vincent Siveton added a comment - 03/Nov/05 11:37 AM

plexus-site-renderer.diff

Vincent Siveton added a comment - 03/Nov/05 11:37 AM

plexus-utils.diff

Vincent Siveton added a comment - 03/Nov/05 11:37 AM

plexus-utils_2.diff

Lukas Theussl added a comment - 03/Nov/05 12:56 PM

The German translation also has some problems. The properties files are UTF-8 encoded, but the HTML output is unreadable (even with -Dfile.encoding=UTF-8 and LC_ALL=en_US.UTF-8, checked with test9 of the site plugin). Strangely, the French properties files are not UTF-8 encoded (contrary to our own standards), but the HTML result is correct in UTF-8. This definitely has to be sorted out before more translations come in...

Michael Schnake added a comment - 09/Nov/05 9:27 AM

While trying to get meaningful results building a default site (that is, without my own apt files etc.) in German, I found that with the current state (Maven 2 built from SVN) it seems to be impossible right now. My default file.encoding is UTF-8. I have maven-site-plugin configured with <outputEncoding>UTF-8</outputEncoding>.

The situation "out of the box" with regard to the German umlauts in the generated site is:
=> Result: site content has garbage, site navigator is correct, organization name (from pom.xml) in copyright statement is correct.

1. Despite the statement at http://maven.apache.org/plugins/maven-site-plugin/i18n.html, all Java .properties files must be encoded as "ISO-8859-1 with Unicode escapes as needed" (as defined by the Java API and already stated above). So I converted site-plugin_de.properties from UTF-8 to ISO-8859-1.
=> Result: Site content is correct, site navigator has garbage, organization name is correct.

2. Well, the component building the site navigator seems to read site-plugin_de.properties (incorrectly, or at least not in conformance with the Properties API docs) using my platform default encoding (= UTF-8). So I called "mvn site" with MAVEN_OPTS="-Dfile.encoding=ISO-8859-1".
=> Result: Site content is correct, site navigator is correct, organization name has garbage.

So now the organization name has garbage, although it comes from my pom.xml, which explicitly states <?xml version="1.0" encoding="UTF-8"?>. But the parser reading the organization name from there seems to ignore that and uses the platform encoding (= ISO-8859-1 in the step above) instead.

The net result is that you currently have to sacrifice one of site [content | navigator | copyright]. But, hey, two out of three is not that bad. Note that the previous comments on this bug already seem to explain (and probably fix) that behaviour. But perhaps this comment helps those struggling with site i18n until this is fixed.

Naoki Nose added a comment - 13/Dec/05 8:52 AM

I've looked into the source code for the cause of the encoding problems.

Problem 1.
The encoding detection for input files relies heavily on the
default system encoding.
Problem 2.
During site generation, String to byte array conversions occur many times.
This leads to problems that are difficult to solve.

For problem 1, I have some ideas about solutions.

There are several types of input files, for example

  • property resource file
  • XML file
  • apt file

and there should be a way
of specifying the encoding according to the file type.

For property resource files, I would like to use native2ascii.
Certainly the result is not human readable, but it rarely causes encoding problems.
And the readability problem can be avoided by automating
native2ascii processing. The build lifecycle phase
"process-resources" would be
a good place for such a step.
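
As a rough illustration of what native2ascii does to a bundle, here is a minimal sketch. The class and method names are hypothetical and this is not the JDK tool itself; it only mimics the escaping step and then checks that java.util.Properties reads the escaped text back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch of the native2ascii idea, not the real tool.
class Native2AsciiSketch {

    // Replaces every non-ASCII UTF-16 code unit with a backslash-u escape
    // followed by four hex digits; ASCII characters pass through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Loads escaped properties text the way Properties.load(InputStream) does
    // (ISO-8859-1 bytes), which is safe because the text is now pure ASCII.
    static String loadValue(String propertiesText, String key) {
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(
                    propertiesText.getBytes(StandardCharsets.ISO_8859_1)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c\u8a9e"; // three kanji, written as escapes
        String line = "report.title=" + escapeNonAscii(japanese);
        System.out.println(line);
        System.out.println(japanese.equals(loadValue(line, "report.title")));
    }
}
```

Because the escaped file contains only ASCII, it decodes identically under ISO-8859-1, UTF-8, EUC-JP and Shift_JIS, which is why this approach rarely causes encoding problems.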

For XML files, I think the encoding detection should
follow the W3C XML specification.
So MXParser should be changed to support automatic
encoding detection.
http://www.w3.org/TR/REC-xml/#sec-guessing
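
The detection described in that appendix can be sketched roughly as follows. This is a deliberately simplified illustration, not MXParser's code: the class name and the 200-byte prolog limit are assumptions, and it only handles a byte order mark plus an ASCII-compatible XML declaration (no UTF-32, EBCDIC, or BOM-less UTF-16).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sniffer in the spirit of the W3C "guessing" appendix.
class XmlEncodingSniffer {

    // Matches the encoding pseudo-attribute inside a leading XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset detect(byte[] head) {
        // Step 1: a byte order mark decides the encoding family.
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return StandardCharsets.UTF_8;
        if (startsWith(head, 0xFE, 0xFF)) return StandardCharsets.UTF_16BE;
        if (startsWith(head, 0xFF, 0xFE)) return StandardCharsets.UTF_16LE;

        // Step 2: without a BOM, read the prolog as ISO-8859-1 (every byte maps
        // to one char) and look for the declared encoding name.
        int limit = Math.min(head.length, 200); // assumed prolog size limit
        String prolog = new String(head, 0, limit, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(prolog);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }

        // Step 3: XML without BOM or declaration is UTF-8 by default.
        return StandardCharsets.UTF_8;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><project/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(doc));
    }
}
```

The key point is that the platform default encoding never enters the decision: the charset comes from the document's own bytes.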

For apt files, I think the encoding detection should follow
the POM configuration. The configuration would look like the following:

<configuration>
  <inputEncoding>Shift_JIS</inputEncoding>
  <outputEncoding>UTF-8</outputEncoding>
  <locales>en,ja</locales>
</configuration>

For problem 2, I have no good solution yet.
String to byte array conversions occur many times
while getting the site descriptor. In that process,
the characters seem to be converted wrongly.
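
Why repeated String to byte array conversions are risky can be shown with a small sketch. The class and method names are made up for illustration; the point is only that each conversion is safe when both directions use the same charset and that charset can represent the text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of one String -> bytes -> String hop.
class RoundTripSketch {

    // Encodes the text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String japanese = "\u65e5\u672c"; // two kanji, written as escapes

        // Same charset both ways and it can represent the text: lossless.
        System.out.println(japanese.equals(
                roundTrip(japanese, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));

        // The charset cannot represent the characters: each becomes '?'.
        System.out.println(
                roundTrip(japanese, StandardCharsets.ISO_8859_1, StandardCharsets.ISO_8859_1));

        // Mismatched charsets: the bytes survive but decode to garbage.
        System.out.println(
                roundTrip("\u00fc", StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

When the charset is left implicit, both arguments silently become the platform default, so the result of each hop depends on the machine running the build.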

Brett Porter added a comment - 06/Jan/06 12:27 AM

The plexus-site-renderer patch is no longer required, as the code has moved to parsing with the Modello-generated model, which accounts for encoding.

Brett Porter added a comment - 06/Jan/06 1:02 AM

I have not applied the i18n patch. I like the idea of doing native2ascii in process-resources better.

Do you know if there will be any negative side effects of the change to XmlWriter? What was that attempting to address?

Is there anything else necessary to get this issue resolved other than the above patches and the native2ascii'ing?

Vincent Siveton added a comment - 06/Jan/06 4:10 PM

Brett,

I tried to generate a dummy site in Japanese and in the other available languages.
To do so, I used the plexus-utils trunk version and converted all bundles with native2ascii.
It works a treat for me with outputEncoding=UTF-8.
Naoki, could you confirm too?
Also, some Japanese translations are missing (e.g. in the dependencies page).

From my point of view, I don't see any negative side effects.

I think we could close this issue after native2ascii'ing all bundles (whether or not native2ascii is automated in the process-resources phase).

Naoki Nose added a comment - 09/Jan/06 11:36 AM

I also tried to generate a dummy site including Japanese.
My environment is Debian GNU/Linux and the default encoding is EUC-JP.
I used the trunk versions of maven-site-plugin, doxia, modello and plexus, and Japanese rendered correctly.
Thanks! Many Japanese developers will appreciate this improvement.

> Moreover some translation in japanese are missing (eg in the dependencies page).
Some property items have been added to the original property file since I first sent the Japanese translation.
I will update the Japanese translation later.

>I think we could close this issue after native2ascii'ing all bundles (automating native2ascii with process-resources phase or not)

There are some desirable improvements regarding these problems.
1. The XML parser in plexus-utils should handle the encoding parameter in the XML declaration correctly.
2. Doxia constructs readers with a default encoding. The encoding of the site documents should be declared explicitly.

May I create new issues for these?

Brett Porter added a comment - 10/Jan/06 11:15 PM

Naoki,
yes, please create new issues for your 2 points and for the updated Japanese translation. Thanks!

Naoki Nose added a comment - 22/Jan/06 8:40 AM

maven-site-plugin Japanese translation update.

Naoki Nose added a comment - 22/Jan/06 8:42 AM

maven-project-info-reports-plugin Japanese translation update.

Vincent Siveton added a comment - 01/Feb/06 7:52 PM

Applied in SVN. Thanks for the translation!

Vincent Siveton added a comment - 01/Feb/06 7:53 PM

Brett,

Any news about potential side effects? Could we close this issue?

Brett Porter added a comment - 07/Feb/06 7:01 AM

We still need to set up the native2ascii'ing.

Yue Ni added a comment - 11/Feb/06 3:14 AM

I translated the site and project-info-report resource bundles into Simplified Chinese and attached them here. Could anyone help commit them to the SVN repository?

Yue Ni added a comment - 11/Feb/06 3:54 AM

Corrected a term in the translation.

Brett Porter added a comment - 15/Feb/06 3:14 AM

applied Chinese simplified translation - thanks. Please attach new translations to a new issue!

Jesse McConnell added a comment - 04/Apr/06 9:33 AM

Should we make another mojo that can be bound to process-resources and implements the native2ascii behavior we are looking for here?

I see that some people have already gotten this functionality working by using the Ant native2ascii task...

If this is the case, then we can probably make a mojo for this pretty quickly: create an issue over there for it, link it to this issue, and then close this issue if that is all that remains.

Brett Porter added a comment - 06/Apr/06 7:11 AM

I think so

Carlos Sanchez added a comment - 03/May/06 11:29 AM

What is the status of all these patches?

It would be better to use the external XmlPullParser from the standard jsr173 API than to patch the one from plexus.

Marian Flor added a comment - 21/Aug/06 4:55 AM

This is a follow-up to Michael Schnake's explanation.

Here's how I got most German umlauts working (this may apply to other languages in general):
Running Maven 2.0.4 (bin), with maven-project-info-reports-plugin-2.0.1.jar and maven-site-plugin-2.0-beta-5.jar.

  • Default Eclipse IDE/OS encoding is UTF-8.
  • Edited the _de.properties files according to Michael's instructions (this is the ugly part, since it will break the umlauts whenever a new version of the plugin is released) and repacked the jars in the local repository with the ISO-8859-1 encoded properties files.
  • Plugin-Configuration (pom.xml):
    ...
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-site-plugin</artifactId>
    <configuration>
    <locales>de</locales>
    <outputEncoding>UTF-8</outputEncoding>
    </configuration>
    ...
  • set MAVEN_OPTS=-Dfile.encoding=UTF-8
  • pom.xml, site.xml are UTF-8 encoded.
  • *.apt-Files are ISO-8859-1 encoded.
  • faq.fml (UTF-8 or ISO-8859-1 does not matter)

Results:
Navigation (site.xml): ok, copyright and team list (pom.xml): ok, content (.apt): ok, FAQ (.fml): broken umlauts. Thus I get 3 out of 4.

The FAQ umlauts are broken with either encoding. My workaround is to use ASCII in *.fml documents. :-/
The FAQ umlauts do not bother me that much, but if someone has an explanation or fix for this, it would be greatly appreciated.

regards,
Marian

Darius added a comment - 19/Sep/06 4:57 AM

When using site:run, DoxiaFilter does not set any output encoding.
I think it should call servletResponse.setCharacterEncoding() or servletResponse.setContentType()
with the output encoding specified in pom.xml before calling servletResponse.getWriter().

Maven 2.0.4 (bin) with maven-site-plugin-2.0-beta-5.jar.
My apt files are utf-8 and I get correct html files, but in preview mode (site:run) I see "?"s.

Darius

Herve Boutemy added a comment - 01/Dec/07 9:12 AM

XML encoding detection support is fixed in 2.0-beta-6 for site.xml, xdoc files, *.fml, and so on.
All problems from this issue should be fixed now.
If some area still needs rework, please open another issue focused on it.

Herve Boutemy added a comment - 01/Dec/07 9:15 AM

see http://docs.codehaus.org/display/MAVENUSER/XML+encoding for information on XML encoding detection support in Maven and plugins


People

  • Assignee: Herve Boutemy
  • Reporter: Naoki Nose
Vote (27)
Watch (16)

Dates

  • Created: 03/Nov/05 7:58 AM
  • Updated: 01/Dec/07 9:15 AM
  • Resolved: 01/Dec/07 9:12 AM