Woodstox / WSTX-211

Incorrect parse error with really long CDATA section

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 4.0.5
    • Fix Version/s: 4.0.6
    • Labels: None
    • Number of attachments: 2

      Description

      This error occurs when reading the attached XML file:

      Exception at reader location: [row,col,system-id]: [4,1,"message.xml"]
      org.codehaus.stax2.typed.TypedXMLStreamException: ParseError at [row,col]:[4,1]
      Message: Element content can not contain child START_ELEMENT when using Typed Access methods
      at com.ctc.wstx.sr.BasicStreamReader._constructTypeException(BasicStreamReader.java:5475)
      at com.ctc.wstx.sr.BasicStreamReader._constructUnexpectedInTyped(BasicStreamReader.java:5468)
      at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:705)
      at Reader.main(Reader.java:45)

      There appears to be a limit of 500 characters in a CDATA section (including LFs), after which Woodstox behaves as if the CDATA section had been closed. An embedded tag then produces the above error. By changing the contents of message.xml (as noted in the file itself), I can also produce this error:

      Exception at reader location: [row,col,system-id]: [2,7,"message.xml"]
      com.ctc.wstx.exc.WstxParsingException: String ']]>' not allowed in textual content, except as the end marker of CDATA section
      at [row,col,system-id]: [7,48,"message.xml"]
      at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:580)
      at com.ctc.wstx.sr.StreamScanner.throwWfcException(StreamScanner.java:429)
      at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4682)
      at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4140)
      at com.ctc.wstx.sr.BasicStreamReader.getElementText(BasicStreamReader.java:672)
      at Reader.main(Reader.java:45)
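
Reader.java is attached rather than quoted, so the following is only a minimal sketch of the kind of program that exercises this code path. It assumes the standard javax.xml.stream API, coalescing turned off, and a CDATA section longer than 500 characters with CRLF line ends and an embedded tag. The class name CdataRepro, the element names, and the generated content are hypothetical; the real message.xml differs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CdataRepro {

    /** Builds a document whose CDATA section holds `lines` CRLF-terminated lines plus an embedded tag. */
    static String buildXml(int lines) {
        StringBuilder sb = new StringBuilder("<root><msg><![CDATA[");
        for (int i = 0; i < lines; i++) {
            sb.append("line ").append(i).append(" of filler text\r\n");
        }
        sb.append("<embedded/>END]]></msg></root>");
        return sb.toString();
    }

    /** Reads the text of the first <msg> element with getElementText(). */
    static String readMsgText(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "msg".equals(reader.getLocalName())) {
                        // On an affected version this is the call that throws.
                        return reader.getElementText();
                    }
                }
                throw new IllegalStateException("no <msg> element found");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // 40 lines is well past the 500-character mark mentioned in the report.
        String text = readMsgText(buildXml(40), false);
        System.out.println("read " + text.length() + " characters");
    }
}
```

The sketch runs against whichever StAX implementation is on the classpath; it would only show the failure when that implementation is an affected Woodstox release.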

      1. message.xml
        0.9 kB
        Kevin Braun
      2. Reader.java
        2 kB
        Kevin Braun

        Activity

        Kevin Braun added a comment -

        FYI: similar behavior occurs under version 3.9.2

        Tatu Saloranta added a comment -

        Ok, I really hope to nail down this thing. There have been previous reports, but no reproducible test case. A few unit tests have been added to check out buffer boundary conditions, but none failed. :-/
        So let's hope we finally have the photo of Nessie here!

        500 char limit sounds like the initial text buffer; but there must be some specific interaction with something like linefeed handling or such to trigger it, because there are a few test cases that check for significantly longer chunks of CDATA (with or without coalescing).

        Franck Valentin added a comment -

        Hi,

        I've bumped into this problem. I don't know if it's of any help, but this only occurred on my data when the StAX property 'isCoalescing' is set to 'false'.

        Tatu Saloranta added a comment -

        Yes, that actually does help. I haven't had a chance to look more into this, but when I do I'll focus on handling in non-coalescing mode (there are a few differences in the low-level code).

        Kevin Braun added a comment -

        Indeed, if you turn coalescing on in the test program I gave you, it no longer gets an exception. I just tried it, FYI.
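
For reference, the coalescing switch discussed above is the standard XMLInputFactory.IS_COALESCING property. The following is a rough sketch, with a hypothetical class name and sample document, that collects the text events a reader reports under either setting. In both modes the concatenated text should be identical; only the way it is split into events may differ.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CoalescingToggle {

    // Hypothetical sample: plain text on both sides of a CDATA section.
    static final String SAMPLE = "<msg>before <![CDATA[<b>raw</b> & more]]> after</msg>";

    /** Collects each CHARACTERS, CDATA, or SPACE event as a separate chunk. */
    static List<String> textChunks(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> chunks = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA
                            || event == XMLStreamConstants.SPACE) {
                        chunks.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (boolean coalescing : new boolean[] {false, true}) {
            List<String> chunks = textChunks(SAMPLE, coalescing);
            System.out.println("coalescing=" + coalescing + ": " + chunks.size()
                    + " text event(s), joined=\"" + String.join("", chunks) + "\"");
        }
    }
}
```

Turning coalescing on only sidesteps the problem, as reported in this thread; it does not fix the underlying non-coalescing code path.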

        Tatu Saloranta added a comment -

        Quick comment: I haven't had time to look into this, but will have once I get Jackson 1.2 out. In the worst case, that'll be towards late August (finally having my summer vacation). But this is the first thing I'll look into with Woodstox, since it obviously is a major problem (and as a user I may bump into it myself). Just thought I'd mention timings, so it's clear this is not a forgotten issue.

        Tatu Saloranta added a comment -

        Ah-ha! The problem here only occurs with two-character (== Windows) linefeeds within a CDATA section; and given the description, almost certainly only when that linefeed crosses an input buffer boundary.

        Good news is two-part:

        (a) I managed to add a unit test that reproduces the problem (stax2.stream.TestXMLStreamReader2), and
        (b) Test actually passes against trunk (== 5.0)

        but I'll still need to fix that for 4.0 and 4.1.
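
The actual unit test lives in stax2.stream.TestXMLStreamReader2 and is not reproduced in this thread. As a rough sketch of the idea only, a test for this kind of boundary condition can slide a single CRLF through a range of offsets inside a CDATA section and check that the section is still returned whole. The class name, the tail marker, and the 450 to 550 range (chosen around the 500-character figure from the report) are assumptions, not values taken from the Woodstox test.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class CrlfBoundarySweep {

    // Hypothetical marker; the embedded tag mirrors the one in the report.
    private static final String TAIL = "<child/>tail";

    /** Wraps `padding` filler chars, one CRLF, and the tail in a single CDATA section. */
    static String buildXml(int padding) {
        StringBuilder sb = new StringBuilder("<doc><![CDATA[");
        for (int i = 0; i < padding; i++) {
            sb.append('x');
        }
        return sb.append("\r\n").append(TAIL).append("]]></doc>").toString();
    }

    /** True if the text came back complete, up to and including the tail. */
    static boolean readsWholeSection(int padding) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(buildXml(padding)));
            try {
                reader.nextTag(); // advance to <doc>
                String text = reader.getElementText();
                return text.length() > padding && text.endsWith(TAIL);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // A parse error here is the failure mode described in this issue.
            return false;
        }
    }

    /** Returns the first padding in [from, to] that fails, or -1 if every offset passes. */
    static int firstFailure(int from, int to) {
        for (int padding = from; padding <= to; padding++) {
            if (!readsWholeSection(padding)) {
                return padding;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("first failing offset: " + firstFailure(450, 550));
    }
}
```

Sweeping a range rather than testing one offset avoids having to know the parser's exact buffer size in advance.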

        Tatu Saloranta added a comment -

        Holy mackerel... for once, this bug might have been findable using FindBugs or even Eclipse's warnings (unused parameter).
        Basically, length checks were using a class member variable instead of the passed-in value (which needs to override the member).
        The result was that the CDATA section was only partially handled, if (and only if) enough of it had been parsed. But the parser thought it had gotten it all (i.e. the segment was marked as completed).
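
To illustrate the class of mistake described above (this is a toy, not the Woodstox source), here is a hypothetical reader in which one method ignores its limit parameter and bounds its loop with a field instead. All names are invented for the example.

```java
final class LengthCheckDemo {

    private final char[] input;
    private final int memberLimit; // limit stored on the instance, possibly smaller than the caller's

    LengthCheckDemo(String text, int memberLimit) {
        this.input = text.toCharArray();
        this.memberLimit = memberLimit;
    }

    /** Buggy: `limit` is never read, so the loop stops at the field's value. */
    String readBuggy(int start, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < memberLimit; i++) {
            sb.append(input[i]);
        }
        return sb.toString(); // caller assumes this covers [start, limit)
    }

    /** Fixed: the passed-in limit overrides the field. */
    String readFixed(int start, int limit) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i < limit; i++) {
            sb.append(input[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LengthCheckDemo demo = new LengthCheckDemo("0123456789", 4);
        System.out.println(demo.readBuggy(0, 10)); // prints "0123"
        System.out.println(demo.readFixed(0, 10)); // prints "0123456789"
    }
}
```

The buggy variant returns normally with truncated data, which matches the symptom of a segment being treated as complete when only part of it was handled. An unused-parameter warning on `limit` would flag it.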

        Tatu Saloranta added a comment -

        Fixed; trunk (5.0) was not affected, 3.2.9, 4.0.5, 4.1 were.


          People

          • Assignee: Tatu Saloranta
          • Reporter: Kevin Braun
          • Votes: 0
          • Watchers: 1
