See DOXIA-189 and the ensuing discussion on the dev list: http://tinyurl.com/2f48m8.
I agree with you that AbstractXmlSink should not do any pretty printing by itself: XML is too low-level, and pretty printing requires knowledge of the exact semantics of the output format. For this reason, AbstractXmlSink.writeStartTag() should not emit an EOL for "simple" tags. An inline element cannot, in general, know whether its surrounding block element allows ignorable whitespace, so the current heuristic is bound to cause errors sooner or later.
As a user, I consider human-friendly sink output a nice-to-have feature and therefore have no objection to abstract sinks that give their subclasses the means to add ignorable whitespace to the output. As a matter of design, however, I suggest decoupling pretty printing from the actual content output in the sink methods. That is, instead of offering both writeEndTag() and writeEndTagWithoutEOL(), discard the former and rename the latter to writeEndTag(), so that the method does exactly what its name suggests: write the end tag and nothing more. Writing the EOL (and maybe some indentation?) would then be the job of a dedicated method, say writeEOL(). Subclasses that want to pretty print would invoke writeEndTag() and writeEOL() explicitly, which again makes the code easier to understand and maintain, IMHO.
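A minimal Java sketch of the proposed split, assuming a simplified string-based sink. The class names SketchXmlSink and SketchXhtmlSink and all method signatures here are hypothetical; they do not reproduce Doxia's actual AbstractXmlSink API. writeEndTag() writes only the tag, and writeEOL() is the only method that emits ignorable whitespace, so a subclass opts into pretty printing explicitly and only after block elements.

```java
/** Hypothetical simplified base sink; NOT Doxia's real AbstractXmlSink. */
abstract class SketchXmlSink {
    private final StringBuilder out = new StringBuilder();

    /** Writes the start tag and nothing else. */
    protected void writeStartTag(String name) {
        out.append('<').append(name).append('>');
    }

    /** Writes the end tag and nothing else (the former writeEndTagWithoutEOL()). */
    protected void writeEndTag(String name) {
        out.append("</").append(name).append('>');
    }

    /** Emits a line break; subclasses call this explicitly where whitespace is ignorable. */
    protected void writeEOL() {
        out.append('\n');
    }

    /** Writes character data, escaping the XML special characters &, < and >. */
    public void text(String s) {
        out.append(s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
    }

    public String getOutput() {
        return out.toString();
    }
}

/** Hypothetical subclass: pretty-prints after block elements only, never around inline ones. */
class SketchXhtmlSink extends SketchXmlSink {
    public void paragraph() {
        writeStartTag("p");
    }

    public void paragraph_() {
        writeEndTag("p");
        writeEOL(); // assumed safe: whitespace between paragraphs is ignorable
    }

    public void bold() {
        writeStartTag("b");
    }

    public void bold_() {
        writeEndTag("b"); // no EOL: inside mixed content it would become significant text
    }
}
```

Feeding the events paragraph(), text("Hello "), bold(), text("world"), bold_(), paragraph_() produces `<p>Hello <b>world</b></p>` followed by a single newline. The only added whitespace sits after the block element, where it cannot change the rendered content.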
As for the parsing side: for robustness, I believe it is a desirable property of AbstractXmlParser and all its subclasses that their output is the same regardless of whether the underlying XML parser has a DTD/XSD available while parsing (i.e. whether ignorable whitespace can be reported as such). This could be achieved by:
- having all Doxia XML parsers assume that their underlying XML parser cannot detect ignorable whitespace and reports it as PCDATA instead
- adding a boolean flag to AbstractXmlParser that subclasses like XdocParser can use to indicate whether the element currently being parsed has mixed content
- moving handleText() from AbstractXhtmlParser up to AbstractXmlParser and having it trim the text appropriately if the ignorableWhitespace flag is set
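The three steps above could look roughly like the following Java sketch. It assumes the JDK's StAX API (javax.xml.stream) as the underlying XML parser; SketchXmlParser, SketchXdocParser, hasMixedContent() and the set of mixed-content elements are hypothetical and are not Doxia's actual classes. The trimming policy (trim text in element-only content and drop it if nothing remains) is also an assumption. Whitespace is handled the same way whether the parser reports it as CHARACTERS or, with a DTD available, as SPACE, so the emitted text events should not depend on validation.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical stand-in for AbstractXmlParser; NOT Doxia's real API. */
abstract class SketchXmlParser {
    /** True while whitespace is ignorable (element-only content, or outside the root). */
    protected boolean ignorableWhitespace = true;

    /** Text events that would be forwarded to the sink. */
    final List<String> textEvents = new ArrayList<>();

    /** Subclasses decide, per element, whether its content model is mixed. */
    protected abstract boolean hasMixedContent(String elementName);

    void parse(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each run of character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        Deque<Boolean> saved = new ArrayDeque<>();
        try {
            XMLStreamReader r = factory.createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                switch (r.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        saved.push(ignorableWhitespace);
                        ignorableWhitespace = !hasMixedContent(r.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        ignorableWhitespace = saved.pop();
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.SPACE:
                        // Treat both alike: without a DTD/XSD, whitespace arrives as CHARACTERS.
                        handleText(r.getText());
                        break;
                    default:
                        break;
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("malformed XML", e);
        }
    }

    /** Assumed policy: in element-only content, trim and drop whitespace-only text. */
    protected void handleText(String text) {
        String t = ignorableWhitespace ? text.trim() : text;
        if (!t.isEmpty()) {
            textEvents.add(t);
        }
    }
}

/** Hypothetical subclass standing in for XdocParser. */
class SketchXdocParser extends SketchXmlParser {
    // Assumed content models, for illustration only.
    private static final Set<String> MIXED = Set.of("p", "li", "b");

    @Override
    protected boolean hasMixedContent(String elementName) {
        return MIXED.contains(elementName);
    }
}
```

For `<document>\n  <body>\n    <p>Hello <b>world</b></p>\n  </body>\n</document>` this yields only the two text events `"Hello "` and `"world"`: the indentation inside document and body is dropped, while the trailing space inside the mixed-content p is kept.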
This patch breaks the identity tests in the xdoc and xhtml modules because the parser now emits many more text events. We still need to clarify whether this should be fixed in the tests or in the corresponding sinks. More generally, we should add an XhtmlBaseParser/SinkTest.