Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: None
- Fix Version/s: None
- Labels: None
- Environment: windows, tomcat 6
Description
Hello,
First of all, thank you all for the great job and for quickly resolving the Maven issue.
I am currently encountering a problem when validating a file containing empty markup (empty elements).
I have set up the validation similarly to this example (but for XSD, not DTD): http://docs.codehaus.org/download/attachments/59446/ValidateWithDtd.java?version=1. See my code example at the end.
The logger I attached (via factory.setXMLReporter) reports "XSD Valition : Unknown reason on line:x col: y" for every line containing empty markup, including markup that contains only an empty CDATA section.
e.g. <markup />, <markup></markup>, or even <markup><![CDATA[]]></markup>
I'm working with woodstox-4.0.1.
I traced the problem in the source to this point:
BasicStreamReader.java:2919:
    int vld = mElementStack.resolveAndValidateElement();
    if (empty) {
        // ... then validateEndElement() is called:
        vld = mElementStack.validateEndElement();
    }
GenericMsvValidator.java:362:
    Acceptor acc = (Acceptor)mAcceptors.remove(mAcceptors.size()-1);
    if (acc != null) { // may be null during error recovery? or not?
        if (!acc.isAcceptState(mErrorRef) || mErrorRef.str != null) {
            // ... (rest of the branch omitted in this excerpt)
        }
    }
Surely this is not the intended behavior?
The rest of the validation works perfectly (errors are reported when they should be, and none when they shouldn't).
Could I safely comment out the "vld = mElementStack.validateEndElement();" line in order to avoid these errors?
My code looks like this, for example:
    XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
    InputStream xsdStream = XMLCatalogSerializer.class.getResourceAsStream("file.xsd");
    XMLValidationSchema xsd = null;
    try {
        xsd = sf.createSchema(xsdStream, "UTF-8");
    } catch (XMLStreamException xe) {
        // error handling omitted in this excerpt
    }
    // "factory" is the input factory on which setXMLReporter was called
    XMLStreamReader2 reader = factory.createXMLStreamReader(xmlFile);
    reader.validateAgainst(xsd);
    // Pull every event so that the whole document gets validated
    while (reader.hasNext()) {
        reader.next();
    }
Thank you,
Best,
Yves Ménard
I need to read through your investigation more closely to understand what is going on (it has been a year since I last looked at that part of the code), but you are probably right that the code is not working the way it should. I thought I had tested empty elements properly (with a unit test), but perhaps this is not the case. A unit test that demonstrates the problem would be useful.
Btw, one minor comment on empty elements: I think that an empty CDATA section is NOT allowed within an empty content specification, as per the XML spec; at least it is not for DTDs (nor are processing instructions or comments). This assumes that XML Schema has not changed this definition. But the other two forms should of course be valid.
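As an independent cross-check of the expectation that the two plain empty forms are valid, here is a minimal sketch that validates them with the JDK's built-in javax.xml.validation API rather than with Woodstox/MSV. The class name EmptyElementCheck and the one-element schema (markup declared as xs:string) are assumptions made for illustration; the reporter's actual file.xsd is not shown in this issue. The empty-CDATA form is deliberately left out, since its status depends on the declared content type.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class EmptyElementCheck {
    // Hypothetical schema (not the reporter's file.xsd): a single
    // <markup> element of type xs:string, for which empty content is legal.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='markup' type='xs:string'/>"
      + "</xs:schema>";

    // Returns true if the document validates against XSD, false otherwise.
    public static boolean isValid(String xml) {
        try {
            SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("<markup/>         valid: " + isValid("<markup/>"));
        System.out.println("<markup></markup> valid: " + isValid("<markup></markup>"));
    }
}
```

If a Woodstox-based unit test over the same inputs reports a validation problem where this check does not, that would point at the Woodstox/MSV integration rather than at the documents themselves.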
As to the call to validateEndElement: omitting it is not completely safe, since the call updates the state of legal textual content (any text allowed, specific textual content, or none), and omitting it would make these values go out of sync. This may work in some cases, but in general it could cause problems in other places. It is possible that it would work in your case, so for now maybe that is the work-around.
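To illustrate why skipping the end-element call can leave the text-content state out of sync, here is a toy model. It is not Woodstox code: the class ContentStateSketch, the TextMode enum and all method names are hypothetical, and the real element stack tracks considerably more than this.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ContentStateSketch {
    // Hypothetical stand-ins for "any text fine, specific textual content, none".
    public enum TextMode { ANY_TEXT, VALIDATABLE_TEXT, NO_TEXT }

    private final Deque<TextMode> parents = new ArrayDeque<>();
    private TextMode current = TextMode.ANY_TEXT;

    // Entering an element: remember the enclosing mode, switch to the child's.
    public void startElement(TextMode childMode) {
        parents.push(current);
        current = childMode;
    }

    // Leaving an element: restore the enclosing element's mode.
    public void endElement() {
        current = parents.pop();
    }

    public TextMode current() {
        return current;
    }

    // Opens a parent that allows text, then an empty child that allows none,
    // and optionally skips the child's end-element call.
    public static TextMode modeAfterEmptyChild(boolean callEndElement) {
        ContentStateSketch state = new ContentStateSketch();
        state.startElement(TextMode.ANY_TEXT); // parent element
        state.startElement(TextMode.NO_TEXT);  // empty child element
        if (callEndElement) {
            state.endElement();
        }
        return state.current();
    }

    public static void main(String[] args) {
        System.out.println(modeAfterEmptyChild(true));  // parent's mode restored
        System.out.println(modeAfterEmptyChild(false)); // still the child's mode
    }
}
```

In this model, text that follows the empty child would be judged against the child's NO_TEXT mode when the end call is skipped, which is the kind of knock-on problem described above.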
Finally, thank you for troubleshooting this: the schema validation part has not been used extensively so far, I think, and thus there are more potential problems to uncover. But MSV (the validator being used) should not have too many bugs, so we should be able to straighten out these problems.