jira.codehaus.org

  • Woodstox
  • WSTX-211

Incorrect parse error with really long CDATA section

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 4.0.5
  • Fix Version/s: 4.0.6
  • Labels:
    None

Description

This error occurs when reading the attached XML file:

Exception at reader location: [row,col,system-id]: [4,1,"message.xml"]
org.codehaus.stax2.typed.TypedXMLStreamException: ParseError at [row,col]:[4,1]
Message: Element content can not contain child START_ELEMENT when using Typed Access methods
at com.ctc.wstx.sr.BasicStreamReader._constructTypeException(BasicStreamReader.java:5475)
at com.ctc.wstx.sr.BasicStreamReader._constructUnexpectedInTyped(BasicStreamReader.java:5468)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:705)
at Reader.main(Reader.java:45)

There appears to be a limit of 500 characters in a CDATA section (including LFs), after which Woodstox behaves as if the CDATA section had been closed. An embedded tag then produces the above error. By changing the contents of message.xml (as noted in the file itself), I can also produce this error:

Exception at reader location: [row,col,system-id]: [2,7,"message.xml"]
com.ctc.wstx.exc.WstxParsingException: String ']]>' not allowed in textual content, except as the end marker of CDATA section
at [row,col,system-id]: [7,48,"message.xml"]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:580)
at com.ctc.wstx.sr.StreamScanner.throwWfcException(StreamScanner.java:429)
at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4682)
at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4140)
at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:672)
at Reader.main(Reader.java:45)
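
The attached Reader.java is not reproduced on this page. As a rough sketch only of the kind of program described above, the following builds a document whose CDATA section is well over 500 characters, uses CRLF line ends, contains an embedded tag, and reads it with getElementText() in non-coalescing mode. The class name, document content, and helper names are assumptions made for illustration. Whether it actually triggers the exception depends on having an affected Woodstox version on the classpath and on where its buffer boundaries fall.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for the attached Reader.java; not the reporter's code.
class CdataRepro {
    /** Builds a document with a long CDATA section, CRLF line ends and an embedded tag. */
    static String buildDocument(int lines) {
        StringBuilder sb = new StringBuilder("<message><![CDATA[");
        for (int i = 0; i < lines; i++) {
            sb.append("line ").append(i).append(" with <embedded>markup</embedded>\r\n");
        }
        return sb.append("]]></message>").toString();
    }

    /** Reads the root element's text with coalescing switched off. */
    static String readText(String xml) throws XMLStreamException {
        // Uses whichever StAX implementation is on the classpath (Woodstox, if present).
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            // Skip ahead to the root START_ELEMENT.
            while (reader.next() != XMLStreamConstants.START_ELEMENT) { }
            return reader.getElementText();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String text = readText(buildDocument(40));
        System.out.println("read " + text.length() + " characters");
    }
}
```

On an unaffected parser the call returns the full CDATA content, embedded tag included, since markup inside CDATA is plain text.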

Attachments

  1. XML File
    message.xml
    23/Jun/09 2:40 PM
    0.9 kB
    Kevin Braun
  2. Java Source File
    Reader.java
    23/Jun/09 2:40 PM
    2 kB
    Kevin Braun

Activity

Kevin Braun added a comment - 23/Jun/09 2:46 PM

FYI: similar behavior occurs under version 3.9.2

Tatu Saloranta added a comment - 23/Jun/09 3:11 PM

Ok, I really hope to nail down this thing. There have been previous reports, but no reproducible test case. A few unit tests have been added to check out buffer boundary conditions, but none failed. :-/
So let's hope we finally have the photo of Nessie here!

500 char limit sounds like the initial text buffer; but there must be some specific interaction with something like linefeed handling or such to trigger it, because there are a few test cases that check for significantly longer chunks of CDATA (with or without coalescing).

Franck Valentin added a comment - 02/Jul/09 8:32 AM

Hi,

I've bumped into this problem. I don't know if it's of any help, but it only occurred on my data when the StAX property 'isCoalescing' was set to 'false'.

Tatu Saloranta added a comment - 02/Jul/09 1:34 PM

Yes, that actually does help. I haven't had a chance to look more into this, but when I do I'll focus on handling in non-coalescing mode (there are a few differences in the low-level code).

Kevin Braun added a comment - 02/Jul/09 2:21 PM

Indeed, if you turn coalescing on in the test program I gave you, it no longer gets an exception. I just tried it, FYI.
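
For readers unfamiliar with the setting discussed here: coalescing is controlled through the standard StAX property XMLInputFactory.IS_COALESCING. The sketch below is an illustration with made-up class and method names, not code from the attached test program. It reads the same small document in both modes and collects the text events, so the difference is only in how many events the text arrives in.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrative only; names are assumptions, not part of the issue's attachments.
class CoalescingToggle {
    /** Concatenates every text event in the document; eventCount[0] receives the number of events seen. */
    static String collectText(String xml, boolean coalescing, int[] eventCount)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalescing));
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        StringBuilder sb = new StringBuilder();
        try {
            while (reader.hasNext()) {
                int type = reader.next();
                // CDATA may be reported as CDATA or as CHARACTERS depending on the parser.
                if (type == XMLStreamConstants.CHARACTERS || type == XMLStreamConstants.CDATA) {
                    sb.append(reader.getText());
                    eventCount[0]++;
                }
            }
        } finally {
            reader.close();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<m>before <![CDATA[<raw> & unescaped]]> after</m>";
        int[] split = new int[1];
        int[] merged = new int[1];
        String a = collectText(xml, false, split);
        String b = collectText(xml, true, merged);
        System.out.println("non-coalescing: " + split[0] + " text event(s)");
        System.out.println("coalescing:     " + merged[0] + " text event(s)");
        System.out.println("same text: " + a.equals(b));
    }
}
```

With coalescing on, adjacent text and CDATA are merged before being reported, which goes through a different code path in the parser. That is consistent with the observation in this thread that only the non-coalescing path misbehaved.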

Tatu Saloranta added a comment - 27/Jul/09 2:11 PM

Quick comment: I haven't had time to look into this, but will have once I get Jackson 1.2 out. In the worst case, that'll be towards late August (finally having my summer vacation). But this is the first thing I'll look into with Woodstox, since it obviously is a major problem (and as a user I may bump into it myself). Just thought I'd mention timings, so it's clear this is not a forgotten issue.

Tatu Saloranta added a comment - 04/Sep/09 7:20 PM

Ah-ha! Problem here only occurs with two-character (== Windows) linefeeds, within CDATA section; and given description, almost certainly only when that linefeed crosses input buffer boundary.

Good news is two-part:

(a) I managed to add a unit test that reproduces the problem (stax2.stream.TestXMLStreamReader2), and
(b) Test actually passes against trunk (== 5.0)

but I'll still need to fix that for 4.0 and 4.1.
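
The unit test mentioned above (stax2.stream.TestXMLStreamReader2) is not shown on this page. One generic way to probe this kind of boundary condition, sketched here with hypothetical names, is to wrap the input in a Reader that hands out only a few characters per read call and to try many chunk sizes, so that some of them split a CRLF pair. This assumes the parser treats each short read as the end of what is currently in its input buffer; that is an assumption about parser internals, not something stated in this issue.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical probe; not the test that was added to Woodstox.
class BoundaryProbe {
    /** Reader that returns at most chunkSize characters per read call. */
    static final class ChunkedReader extends Reader {
        private final Reader in;
        private final int chunkSize;

        ChunkedReader(Reader in, int chunkSize) {
            this.in = in;
            this.chunkSize = chunkSize;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, Math.min(len, chunkSize));
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Document with a long CDATA section and Windows (CRLF) line ends. */
    static String cdataDocument(int lines) {
        StringBuilder sb = new StringBuilder("<message><![CDATA[");
        for (int i = 0; i < lines; i++) {
            sb.append("line ").append(i).append(" with <embedded>markup</embedded>\r\n");
        }
        return sb.append("]]></message>").toString();
    }

    static String readWithChunks(String xml, int chunkSize) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(
                new ChunkedReader(new StringReader(xml), chunkSize));
        try {
            while (reader.next() != XMLStreamConstants.START_ELEMENT) { }
            return reader.getElementText();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        String xml = cdataDocument(40);
        // Varying the chunk size moves the split point, so some runs cut a CRLF in half.
        for (int chunk = 1; chunk <= 64; chunk++) {
            try {
                String text = readWithChunks(xml, chunk);
                System.out.println("chunk " + chunk + ": ok, " + text.length() + " chars");
            } catch (XMLStreamException e) {
                System.out.println("chunk " + chunk + ": FAILED, " + e.getMessage());
            }
        }
    }
}
```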

Tatu Saloranta added a comment - 05/Sep/09 1:53 AM

Holy mackerel... for once, this bug might have been findable using FindBugs or even Eclipse's warnings (unused parameter).
Basically, length checks were using a class member variable instead of the passed-in value (which needs to override the member).
The result was that the CDATA section was only partially handled, if (and only if) enough of it had been parsed. But the parser thought it had gotten it all (i.e. the segment was marked as completed).
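
The class of bug described here can be shown with a toy example. The code below is not Woodstox code and all names are invented. It only shows how a method that accepts a length parameter but checks a member variable instead can silently process part of its input while still reporting success, and why an "unused parameter" warning would flag it.

```java
// Toy illustration of the bug pattern; every name here is hypothetical.
class SegmentCopier {
    // Member holding a length that was valid at some earlier point.
    private final int knownLength;

    SegmentCopier(int knownLength) {
        this.knownLength = knownLength;
    }

    /** Buggy: 'len' is never read, so the stale member decides how much is copied. */
    String copyBuggy(char[] src, int len) {
        int n = Math.min(src.length, knownLength);
        return new String(src, 0, n); // caller assumes the whole segment was handled
    }

    /** Fixed: the passed-in length overrides the member. */
    String copyFixed(char[] src, int len) {
        int n = Math.min(src.length, len);
        return new String(src, 0, n);
    }

    public static void main(String[] args) {
        char[] data = "0123456789".toCharArray();
        SegmentCopier copier = new SegmentCopier(5);
        System.out.println(copier.copyBuggy(data, data.length)); // truncated at the member's value
        System.out.println(copier.copyFixed(data, data.length)); // full segment
    }
}
```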

Tatu Saloranta added a comment - 05/Sep/09 2:01 AM

Fixed; trunk (5.0) was not affected, 3.2.9, 4.0.5, 4.1 were.

People

  • Assignee:
    Tatu Saloranta
    Reporter:
    Kevin Braun

Dates

  • Created:
    23/Jun/09 2:40 PM
    Updated:
    05/Sep/09 2:01 AM
    Resolved:
    05/Sep/09 2:01 AM