Details
Description
Some undefined time ago, it was possible to parse the following source
with qdox and retrieve a sensible value for the "foo" attribute of the
"bar.baz" tag -> "this is multilined"
/**
- @bar.baz foo="this is
- multilined"
*/
with the latest snapshot, this unfortunately doesn't work anymore.
I haven't found an open related jira issue, but before creating one, I
wanted to make sure this wasn't on purpose ?
I think allowing this makes sense, for instance for xdoclet because
some attributes might have longish values, like the "description"
elements for servlets and such.
-
- MultineLineAttributeValuesWithQDoxTestCase.java
- 07/Feb/05 5:08 PM
- 1 kB
- Grégory Joseph (old account)
-
- qdox82-test.patch
- 07/Feb/05 5:31 PM
- 2 kB
- Grégory Joseph (old account)
Activity
I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request (QDOX-78) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD.
Gregory, perhaps you switched from a qdox-1.6-SHAPSHOT.jar to a previous release? Or to a previous snapshot?
I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.
Just spent most of the day tracking this down and getting things organized. Here's the regression:
- Check out xdoclet-plugins project
- Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1
- Build xdoclet-plugin-web project
You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix.
Thanks for looking at this.
-b
- Check out xdoclet-plugins project
- Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1
- Build xdoclet-plugin-web project
Mike, the output i'm expecting from this is indeed with white lines collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentionned now gets only the first line. (Well, until I aligned my test source to use non-multilined). I'll try from cvs head again.
Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please.
Thought to all... given the example above...
/**
- @bar.baz foo="this is
- multilined"
*/
... what would you expect the value that qdox parses?
"this is multilined"
or
"this is\n multilined"
or
"this is\n multilined"
The first one is more convenient, but does it lose potentially useful data (as was the case with QDOX-78)?
- @bar.baz foo="this is
- multilined" */
In answer to Joe's question, I'd expect:
"this is\nmultilined"
with the leading space trimmed.
More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.
"this is multilined" would be good enough as far as i'm concerning, but got "this is" when i opened this issue. As I said earlier, I'll try again ![]()
Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of cvs on my mac.
I'll let you guys make the path to the test source portable ![]()
or maybe as a patch if you prefer.
(same remark wrt the path to the source)
Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).
This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline:
/**
- @bar.baz foo="this is\
- multilined"
*/
I made the change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing (QDOX-45, QDOX-50), and it was released in QDox-1.5.
I'm not sure if the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined.
Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc?
Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.
- @bar.baz foo="this is\
- multilined" */
Trying to resolve old issues.... Gregory, was this problem ever solved for you?
thanks
-Joe
I was going to run the tests (see attachments) against the latest sources to check this but err,
http://qdox.codehaus.org/cvs-usage.html
(I'm guessing i'll find qdox on codehaus's svn though
)
I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdate : http://qdox.codehaus.org/cvs-usage.html
So I got the trunk from svn and my test sill doesn't pass.
<pre/> tags are mentioned above. Does this bug also include <pre/> tag support which should include all whitespace (not just EOL characters) or should I create a new bug?
It looks like Bug #/QDOX-78 covers EOL characters, but not normal whitespaces used for indention?
A short study on that point shown that the StreamTokenizer used in TagParser does not work as expected.
It handles newline as token separator, according to Java <a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629">Bug 4501629</a> it won't be fixed.
Suggestion, the TagParser should be reimplemented.
Dimitri, are you interested in try some regex or extended/alternate StreamTokenizer hacking to overcome this ?
it might be OK to substitute newlines for some other char, might it not ?
- Paul
- Paul
Hi,
I had the same issue parsing something else with StreamTokenizer and I found your issue when I was searching for solutions. I could not find one so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes, StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that double quote is not special and when I get to a one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format call fvar). Here are the methods:
private void setUpTokenizerForFvar(StreamTokenizer tokenizer) { // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment tokenizer.resetSyntax(); tokenizer.wordChars((int)'a', (int)'z'); tokenizer.wordChars((int)'A', (int)'Z'); tokenizer.wordChars(128 + 32, 255); tokenizer.whitespaceChars(0, (int)' '); tokenizer.commentChar((int)'/'); tokenizer.parseNumbers(); // Attribute names in fvar can include underscores, and spaces! tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE); tokenizer.wordChars(SPACE, SPACE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }
private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer) { // Reset the tokenizer to treat everything as a word except the double quote char and the escape char tokenizer.resetSyntax(); tokenizer.wordChars(0, 127); tokenizer.ordinaryChar(ESCAPE); tokenizer.ordinaryChar(DOUBLE_QUOTE); }
// Because StreamTokenizer does not parse quoted strings that contain newlines properly
// we have to do it ourselves. Reads everything up until a matching closing quote
// ignoring any that are preceded by an escape char '\'
private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) {
StringBuilder value = new StringBuilder();
setUpTokenizerForQuotedValue(tokenizer);
def nextToken = tokenizer.nextToken();
boolean escapedQuote = false;
while (escapedQuote || nextToken != openQuote) {
escapedQuote = false;
if (nextToken == StreamTokenizer.TT_WORD) {
value.append(tokenizer.sval);
} else if (nextToken == ESCAPE) {
escapedQuote = true;
value.append((char)nextToken);
} else if (nextToken == openQuote) {
value.append((char)nextToken);
}
nextToken = tokenizer.nextToken();
}
setUpTokenizerForFvar(tokenizer);
return value.toString();
}
used in some code like this:
[...]
nextToken = tokenizer.nextToken();
if (nextToken == DOUBLE_QUOTE) {
String value = parseQuotedString(nextToken, tokenizer);
[...]
Hope that is some value to you.
Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.