Details
- Type: Bug
- Status: Open
- Priority: Critical
- Resolution: Unresolved
- Affects Version/s: 4.0.8
- Fix Version/s: None
- Labels: None
- Testcase included: yes
- Number of attachments:
Description
The Javadoc for javax.xml.stream.Location.getCharacterOffset() states: "If the input source is a file or a byte stream then this is the byte offset into that stream". However, when the source is an input stream, the offset reported on encountering a Unicode control character is wrong, i.e. it does not match the byte offset into the stream.
This matters to us because the XML may contain embedded binaries, and we need the start and end byte offsets of those binaries so that we can seek directly to those locations.
I have attached a unit test that demonstrates the error. Although the test uses a ByteArrayInputStream, the same bug occurs with a FileInputStream. (I note that getCharacterOffset() behaves differently for character media, but we are using byte streams only.)
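For readers without access to the attachment, the kind of check involved can be sketched roughly as follows. This is not the attached unit test: the class name `OffsetCheck`, both helper methods, the sample documents, and the choice of U+0085 as the control character are all assumptions made for illustration. The sketch parses two small documents from a ByteArrayInputStream, one with an ASCII filler character and one with a control character that takes two bytes in UTF-8, and prints the byte offset of the `<bin` tag next to the offset the parser reports.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; not the unit test attached to this issue.
class OffsetCheck {

    // Number of UTF-8 bytes occupied by the first charCount chars of text.
    static int utf8ByteOffset(String text, int charCount) {
        return text.substring(0, charCount).getBytes(StandardCharsets.UTF_8).length;
    }

    // Offset the parser reports at the START_ELEMENT event for localName.
    // Returns -1 if the element is never seen (the parser itself may also
    // report -1 when it does not know the offset).
    static int reportedOffsetAt(byte[] xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())) {
                        return reader.getLocation().getCharacterOffset();
                    }
                }
                return -1;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: U+0085 (two bytes in UTF-8) stands in for the
        // control character mentioned in the report.
        String ascii = "<r>x<bin/></r>";
        String control = "<r>\u0085<bin/></r>";
        for (String doc : new String[] {ascii, control}) {
            int byteOffset = utf8ByteOffset(doc, doc.indexOf("<bin"));
            int reported = reportedOffsetAt(doc.getBytes(StandardCharsets.UTF_8), "bin");
            System.out.println("byte offset of <bin: " + byteOffset
                    + ", reported by parser: " + reported);
        }
    }
}
```

Where the reported location points relative to the element (its start, its end, or somewhere in between) can vary by StAX implementation, so the absolute values are less informative than the difference between the two runs. If the offsets are byte-based, the second run should report a value exactly one higher than the first.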