Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Labels: None
Description
It is common to declare XML entities for non-English characters so that those characters can be used in XML files across parsers and character sets. Such entities make the Plexus XML pull parser fail, a bug which makes many XML files unparsable.
Consider the following fragment from our Maven project.xml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE project [
  <!ENTITY OSlash "ø">
  <!ENTITY CapitalOSlash "Ø">
]>
<project>
  <pomVersion>3</pomVersion>
  .....
  <developers>
    <!-- 'timezone' used as phone number field -->
    <developer>
      <name>Marit Finne J&OSlash;rgensen</name>
      <id>mfj</id>
      <email>marit ... </email>
    </developer>
  </developers>
</project>
This works with Maven 1.0. To use the same project.xml file with Maven 1.1, every &OSlash; reference must be replaced with the literal letter 'ø'; otherwise the Plexus pull parser throws an exception on the entity.
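Parsers that choke on a literal 'ø' could arguably be considered buggy themselves, but the entity workaround is legal XML and should be parsable by every conforming parser: even a non-validating parser is required to process the entity declarations in the internal DTD subset.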
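One way to check that the workaround is legal XML is to feed the fragment to a different conforming parser. The sketch below uses the JAXP DOM parser bundled with the JDK, not the Plexus parser, on a trimmed and well-formed copy of the fragment; the class and method names are hypothetical. If the internal-subset declarations are honoured, the `<name>` text comes back with `&OSlash;` expanded to 'ø'.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

final class EntityCheck {
    // Trimmed, well-formed copy of the project.xml fragment (hypothetical test input).
    static final String POM =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        + "<!DOCTYPE project [\n"
        + "  <!ENTITY OSlash \"\u00F8\">\n"
        + "  <!ENTITY CapitalOSlash \"\u00D8\">\n"
        + "]>\n"
        + "<project>\n"
        + "  <developers>\n"
        + "    <developer>\n"
        + "      <name>Marit Finne J&OSlash;rgensen</name>\n"
        + "      <id>mfj</id>\n"
        + "    </developer>\n"
        + "  </developers>\n"
        + "</project>\n";

    /** Parses the document with the JDK's DOM parser and returns the text of the first name element. */
    static String developerName(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Encode with the charset named in the XML declaration, as the file on disk would be.
            byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
            Document doc = builder.parse(new ByteArrayInputStream(bytes));
            // References to entities declared in the internal DTD subset are expanded by the parser.
            return doc.getElementsByTagName("name").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("project.xml fragment could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String name = developerName(POM);
        System.out.println(name.equals("Marit Finne J\u00F8rgensen") ? "entity expanded" : "unexpected: " + name);
    }
}
```

Running the same input through the Plexus pull parser instead is the failing case described above.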
Why bother with entity declarations if you can insert the character literally? The XML declaration lets you choose your encoding; with UTF-8 you can happily write almost any character in use around the world. Every JVM must support UTF-8, and so does any proper text editor.
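The literal-character approach needs no DOCTYPE at all. A minimal sketch, again using the JDK's built-in parser rather than Plexus and with hypothetical names: the document declares UTF-8, is written out as UTF-8 bytes, and the parser hands back the 'ø' unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class Utf8Literal {
    // The same name element, written with a literal 'ø' and no entity declarations.
    static final String POM =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<project>\n"
        + "  <developers>\n"
        + "    <developer>\n"
        + "      <name>Marit Finne J\u00F8rgensen</name>\n"
        + "    </developer>\n"
        + "  </developers>\n"
        + "</project>\n";

    /** Parses raw document bytes and returns the text of the first name element. */
    static String developerName(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xmlBytes))
                .getElementsByTagName("name").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        // Encode as UTF-8 to match the XML declaration.
        byte[] utf8 = POM.getBytes(StandardCharsets.UTF_8);
        System.out.println(developerName(utf8).equals("Marit Finne J\u00F8rgensen") ? "literal character preserved" : "mismatch");
    }
}
```

In UTF-8 the letter 'ø' (U+00F8) occupies two bytes, 0xC3 0xB8; the encoding declaration is what tells the parser to decode them as a single character.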
If you really want to write ASCII only, you can still use numeric character references such as "&#xuuuu;" (for example "&#xF8;" for 'ø'), which every XML parser understands without any declarations.
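For the ASCII-only route, non-ASCII characters can be rewritten mechanically. The helper below is a hypothetical sketch: it assumes its input is text content that is already XML-escaped, leaves ASCII untouched, and turns every other code point into a hexadecimal numeric character reference. The result is then parsed back with the JDK's DOM parser to show that no DOCTYPE is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

final class NumericRefs {
    /** Replaces each non-ASCII code point with a hex character reference; ASCII passes through unchanged. */
    static String toAsciiXml(String escapedText) {
        StringBuilder out = new StringBuilder(escapedText.length());
        // Iterate by code point so characters outside the BMP become a single reference.
        escapedText.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase(Locale.ROOT)).append(';');
            }
        });
        return out.toString();
    }

    /** Parses a document with the JDK's DOM parser and returns the root element's text. */
    static String rootText(String asciiXml) {
        try {
            byte[] bytes = asciiXml.getBytes(StandardCharsets.US_ASCII);
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes))
                .getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String name = toAsciiXml("Marit Finne J\u00F8rgensen");
        System.out.println(name); // Marit Finne J&#xF8;rgensen
        // No DOCTYPE and no encoding declaration: pure ASCII is also valid UTF-8, the default.
        System.out.println(rootText("<?xml version=\"1.0\"?><name>" + name + "</name>").equals("Marit Finne J\u00F8rgensen")); // true
    }
}
```

Hexadecimal references were chosen to match the "&#xuuuu;" form mentioned above; decimal references such as "&#248;" would work equally well.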