Details
-
Type:
Bug
-
Status:
Resolved
-
Priority:
Major
-
Resolution: Fixed
-
Labels: None
-
Environment: Windows XP, Java JDK 5.0, StAX 1.1.1
Description
I have an XML document sent to me from .NET whose first 3 bytes are EF BB BF. Apparently this is valid XML, but StAX fails on it. Those 3 bytes are the UTF-8 byte order mark (BOM). See
http://www.unicode.org/faq/utf_bom.html#22
Although it would be nice to support all of the encoding signatures listed there, I'm only interested in UTF-8 at the moment, so the ability to ignore the first 3 bytes if they are EF BB BF would be great.
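Until the parser handles this itself, a caller-side workaround along the following lines might do. This is only a minimal sketch: `BomSkipper` and `skipUtf8Bom` are hypothetical names invented for illustration, not part of StAX, and it handles only the UTF-8 signature EF BB BF, using nothing beyond `java.io`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper (not part of StAX): drops a leading UTF-8 BOM, if any.
public class BomSkipper {

    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        // A pushback buffer of 3 bytes lets us return the bytes we peeked at.
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until
        // 3 bytes are available or the stream ends.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No BOM: push back whatever was read so the caller sees the full stream.
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n);
        }
        return pin;
    }
}
```

The returned stream could then be passed to the factory in place of the raw one, for example `factory.createXMLEventReader(BomSkipper.skipUtf8Bom(in))`, so that the parser sees `<?xml` as the first bytes.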
Here's the stack trace:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,7]
Message: processing instruction can not have PITarget with reserveld xml name
at com.bea.xml.stream.MXParser.parsePI(MXParser.java:2742)
at com.bea.xml.stream.MXParser.parseProlog(MXParser.java:1783)
at com.bea.xml.stream.MXParser.nextImpl(MXParser.java:1725)
at com.bea.xml.stream.MXParser.next(MXParser.java:1180)
at com.bea.xml.stream.XMLEventReaderBase.<init>(XMLEventReaderBase.java:72)
at com.bea.xml.stream.XMLEventReaderBase.<init>(XMLEventReaderBase.java:48)
at com.bea.xml.stream.MXParserFactory.createXMLEventReader(MXParserFactory.java:104)
at com.bea.xml.stream.MXParserFactory.createXMLEventReader(MXParserFactory.java:125)
Here's a hex output of the first few bytes:
00000000: efbb bf3c 3f78 6d6c 2076 6572 7369 6f6e ...<?xml version
00000010: 3d22 312e 3022 2065 6e63 6f64 696e 673d ="1.0" encoding=
00000020: 2275 7466 2d38 223f 3e3c 212d 2d61 7574 "utf-8"?><!--aut
...
Thanks.
Activity
| Field | Original Value | New Value |
|---|---|---|
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Assignee | Tatu Saloranta [ cowtowncoder ] | |
| Resolution | | Fixed [ 1 ] |