Key: XSTR-473
Type: Bug
Status: Closed
Resolution: Fixed
Assignee: Joerg Schaible
Reporter: Roman Wolf
Votes: 0
Watchers: 1

XStream

CLONE - String "\0" serialized as invalid XML

Created: 05/Feb/08 07:35 AM   Updated: 25/Feb/08 05:02 PM
Component/s: None
Affects Version/s: None
Fix Version/s: 1.3


Description
Hello,

according to the XML 1.0 spec, the valid XML characters are

#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

and any other characters are not allowed to appear in an XML document, even as character references.

I tried to serialize the String "\0" and got an illegal NUL character in the XML file when using the default serializer, and an invalid character reference &#x0 when using StaxWriter.
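
The character ranges quoted above can be checked directly in code. The following is a minimal sketch, not part of XStream; the class name Xml10Chars and its method names are hypothetical and chosen only for illustration. It tests each code point of a string against the XML 1.0 ranges listed in the description.

```java
// Hypothetical helper (not an XStream class): checks code points against
// the XML 1.0 character ranges quoted in the description.
final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point falls inside the XML 1.0 ranges. */
    static boolean isValid(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the char index of the first invalid code point, or -1 if none. */
    static int firstInvalidIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isValid(cp)) {
                return i;
            }
            // Advance by one or two chars depending on the code point width.
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Calling Xml10Chars.firstInvalidIndex("\0") reports index 0, which matches the problem described here: NUL is outside every listed range, so it cannot be represented in an XML 1.0 document either literally or as a character reference.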

Comments
Roman Wolf - 05/Feb/08 07:39 AM
I have the same issue with a serialized class.
I've tried all Driver classes provided by the XStream package, but in every case I get the error: Character reference "&#x0" is an invalid XML character
when I try to parse the serialized XML as a DOM4J document.

There are several serialized characters that are non-printable and also not allowed in XML.
Why does XStream create them instead of escaping them or providing an entity declaration?

Any hint on how to deal with such objects? From my point of view, it should be XStream's job to create a well-formed XML structure. But maybe it's a general problem of binary data and XML and how to bring them together.


Thanks,
Roman.

Joerg Schaible - 12/Feb/08 04:26 PM
I've always misinterpreted the spec, since I thought it was possible to write those characters as entities. However, they may not appear in XML at all, which makes this really difficult. Escaping does not work, since the XML parser would also have to "unescape" those values. Funnily enough, even JDK XML serialization writes 0 as an entity value and then fails when reading the generated XML. Maybe we have to introduce some kind of quirks mode for XStream's PrettyPrintWriter. As long as it is not operating in XML 1.0 or 1.1 mode, it should write those entities, if only for backward compatibility. XStream's default parser (Xpp3) will happily ignore the spec and turn those entities back into real values ...

Joerg Schaible - 20/Feb/08 06:12 PM
I've now committed a version of the PrettyPrintWriter that works in different modes:
- QUIRKS: Current behaviour; writes any non-printable character as a character entity (default)
- XML_1_0: Throws StreamException if a character is to be written that is not allowed by the XML 1.0 spec
- XML_1_1: Throws StreamException if a character is to be written that is not allowed by the XML 1.1 spec
QUIRKS is the default for compatibility and because the Xpp3 parser ignores the spec here and happily reads any kind of character entity. You may give the head revision a try.
-Jörg
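
The three modes listed above can be illustrated with a small standalone sketch. This is not the committed PrettyPrintWriter code: the class WriterModeSketch, its Mode enum, and the escape method are hypothetical names, IllegalArgumentException stands in for XStream's StreamException, and treating "non-printable" as "ISO control character other than tab and line feed" is an assumption made for the example.

```java
// Hypothetical sketch of mode-dependent text escaping; not XStream's code.
final class WriterModeSketch {
    enum Mode { QUIRKS, XML_1_0, XML_1_1 }

    private WriterModeSketch() {}

    /** Returns true if the code point may appear in a document for this mode. */
    static boolean isAllowed(int cp, Mode mode) {
        // Ranges shared by the XML 1.0 and XML 1.1 character definitions.
        boolean common = (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
        switch (mode) {
            case XML_1_0:
                return cp == 0x9 || cp == 0xA || cp == 0xD || common;
            case XML_1_1:
                // XML 1.1 additionally admits U+0001 to U+001F, but never NUL.
                return (cp >= 0x1 && cp < 0x20) || common;
            default:
                return true; // QUIRKS accepts everything
        }
    }

    /** Escapes text for element content, rejecting characters the mode forbids. */
    static String escape(String text, Mode mode) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isAllowed(cp, mode)) {
                // Stand-in for the StreamException mentioned in the comment above.
                throw new IllegalArgumentException(
                    String.format("U+%04X is not allowed in mode %s", cp, mode));
            }
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (Character.isISOControl(cp) && cp != '\t' && cp != '\n') {
                // Assumption: control characters are written as character references.
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }
}
```

With this sketch, escape("a\0b", Mode.QUIRKS) yields a&#x0;b, which a lenient parser may read back but a conforming one rejects, while the same call in XML_1_0 or XML_1_1 mode throws. The character U+0001 is rejected in XML_1_0 mode but written as &#x1; in XML_1_1 mode, since XML 1.1 permits it as a character reference.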

Joerg Schaible - 25/Feb/08 05:02 PM
Closing issues before next release.