jira.codehaus.org

  • QDox
  • QDOX-82

multiline'd tag attribute values not working anymore

Details

  • Type: Bug
  • Status: Resolved
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: 1.10
  • Component/s: Parser
  • Labels:
    None

Description

Some undefined time ago, it was possible to parse the following source
with qdox and retrieve a sensible value for the "foo" attribute of the
"bar.baz" tag -> "this is multilined"

/**
 * @bar.baz foo="this is
 *       multilined"
 */

With the latest snapshot, this unfortunately doesn't work anymore.

I haven't found a related open JIRA issue, but before creating one, I
wanted to make sure this wasn't on purpose.

I think allowing this makes sense, for instance for xdoclet because
some attributes might have longish values, like the "description"
elements for servlets and such.


Attachments

  1. Java Source File
    MultineLineAttributeValuesWithQDoxTestCase.java
    07/Feb/05 5:08 PM
    1 kB
    Grégory Joseph (old account)
  2. Text File
    qdox82-test.patch
    07/Feb/05 5:31 PM
    2 kB
    Grégory Joseph (old account)

Activity

Joe Walnes added a comment - 04/Feb/05 8:55 AM

Assigned to 1.6 release as this is a critical regression that has recently been introduced. Whoever fixes this, please add a unit-test to ensure the problem doesn't return unnoticed.

Mike Williams added a comment - 06/Feb/05 5:19 AM

I don't think this is a regression. Up until fairly recently, QDox collapsed all whitespace inside comments, effectively discarding newlines. It has always been thus, until about a month ago, when Joe implemented feature-request (QDOX-78) ... now, newlines are retained inside comments, and tag-values. I just checked, and this still appears to be working in CVS HEAD.

Gregory, perhaps you switched from a qdox-1.6-SNAPSHOT.jar to a previous release? Or to a previous snapshot?

Mike Williams added a comment - 06/Feb/05 5:47 AM

I couldn't reproduce the problem in CVS HEAD. I've added a unit-test (testJiraQdox82), to demonstrate newlines in tag values.

Brian Topping added a comment - 06/Feb/05 6:53 AM

This was happening in CVS HEAD. More data to come.

Brian Topping added a comment - 06/Feb/05 4:40 PM

Just spent most of the day tracking this down and getting things organized. Here's the regression:

  • Check out xdoclet-plugins project
  • Revert xdoclet-plugins/testapp-web/src/main/java/org/xdoclet/testapp/web/TimerFilter.java to version 1.1
  • Build xdoclet-plugin-web project

You'll get an error in the org.xdoclet.plugin.web.WebPluginTestCase test there. That's what we are looking to fix.

Thanks for looking at this.

-b

Grégory Joseph (old account) added a comment - 06/Feb/05 7:29 PM

Mike, the output I'm expecting from this is indeed with whitespace collapsed, which is fine. But the testcase I had on xdoclet-plugins that Brian mentioned now gets only the first line. (Well, until I changed my test source to avoid multi-line values.) I'll try from CVS HEAD again.

Joe Walnes added a comment - 07/Feb/05 2:19 AM

Brian, to make our lives easier, could you attach a standalone testcase to demonstrate this please.

Thought to all... given the example above...

/**
 * @bar.baz foo="this is
 * multilined"
 */

... what value would you expect qdox to parse?

"this is multilined"
or
"this is\n multilined"
or
"this is\n multilined"

The first one is more convenient, but does it lose potentially useful data (as was the case with QDOX-78)?

Mike Williams added a comment - 07/Feb/05 5:18 AM

In answer to Joe's question, I'd expect:

"this is\nmultilined"

with the leading space trimmed.

More generally, I think QDox should retain ALL whitespace within comments, as in some cases it may be significant ... e.g. <pre> sections. Trouble is, this would be an incompatible change.
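
To make the candidate results in this exchange concrete, here is a minimal Java sketch of the two normalisations being discussed: collapsing all whitespace into single spaces, versus keeping the newline and trimming the indentation that follows it. The class and method names are hypothetical and not part of QDox, and the sketch assumes the leading " * " of each Javadoc line has already been stripped from the value.

```java
final class TagValueWhitespace {

    /** Collapses every run of whitespace, newlines included, into one space. */
    static String collapse(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    /** Keeps each line break but drops the indentation that follows it. */
    static String trimContinuationIndent(String raw) {
        return raw.replaceAll("\\r?\\n[ \\t]*", "\n");
    }

    public static void main(String[] args) {
        String raw = "this is\n      multilined";
        System.out.println(collapse(raw));               // this is multilined
        System.out.println(trimContinuationIndent(raw)); // two lines, no indentation
    }
}
```

The first form matches the value the reporter asked for; the second matches the value proposed in this comment and keeps the line structure available to callers that need it.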

Grégory Joseph (old account) added a comment - 07/Feb/05 10:49 AM

"this is multilined" would be good enough as far as I'm concerned, but I got "this is" when I opened this issue. As I said earlier, I'll try again.

Grégory Joseph (old account) added a comment - 07/Feb/05 5:08 PM

Here's a sample testcase that doesn't pass with qdox-1.6-snapshot built out of CVS on my Mac.
I'll let you guys make the path to the test source portable.

Grégory Joseph (old account) added a comment - 07/Feb/05 5:31 PM

or maybe as a patch if you prefer.
(same remark wrt the path to the source)

Mike Williams added a comment - 12/Feb/05 4:52 AM

Ah, okay, looks like the bug is in named-parameter parsing (in TagParser).

Mike Williams added a comment - 12/Feb/05 6:17 AM

This appears to be due to the use of java.io.StreamTokenizer for parsing of named parameters. StreamTokenizer does not appear to support newlines within quoted strings unless they are backslash-escaped. Which is to say, multi-line named parameter values are currently supported if you escape the newline:

/**
 * @bar.baz foo="this is\
 * multilined"
 */

The change to StreamTokenizer was made about 9 months ago, to improve named-parameter parsing (QDOX-45, QDOX-50), and it was released in QDox-1.5.

I'm not sure of the correct resolution for this issue. Having to escape the newline could be an irritation. On the other hand, there's a workaround, and the previous behaviour (for a multi-lined named parameter) was undefined.

Compatibility with XJavadoc is also an issue, especially since named-parameter support was added by and primarily for the XDoclet crowd. At this point, I'd prefer to hand this over to someone with an interest in XDoclet. Aslak, perhaps the simplest solution would be to contribute the relevant code from XJavadoc?

Since the behaviour has not changed since QDox-1.5, perhaps we shouldn't allow this to hold up the QDox-1.6 release.
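
The StreamTokenizer behaviour described in this comment can be reproduced with the JDK alone. The sketch below uses a StreamTokenizer with its default syntax table, which is an assumption: how TagParser actually configures its tokenizer is not shown in this issue. QuotedNewlineDemo and firstQuotedValue are illustrative names only.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class QuotedNewlineDemo {

    /** Reads one token with the default syntax table and returns its string value. */
    static String firstQuotedValue(String input) {
        StreamTokenizer tokenizer = new StreamTokenizer(new StringReader(input));
        try {
            tokenizer.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokenizer.sval;
    }

    public static void main(String[] args) {
        // Raw newline inside the quotes: the quoted token stops at the line break.
        String raw = "\"this is\n multilined\"";
        // Backslash immediately before the newline: the newline stays in the value.
        String escaped = "\"this is\\\n multilined\"";
        System.out.println("[" + firstQuotedValue(raw) + "]");
        System.out.println("[" + firstQuotedValue(escaped) + "]");
    }
}
```

With the raw newline only "this is" comes back, which is the symptom reported in this issue; with the escaped newline the whole value is returned and the line break is kept.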

Joe Walnes added a comment - 23/Sep/06 3:39 AM

Trying to resolve old issues.... Gregory, was this problem ever solved for you?

thanks
-Joe

Grégory Joseph added a comment - 24/Sep/06 11:46 AM

I was going to run the tests (see attachments) against the latest sources to check this but err,
http://qdox.codehaus.org/cvs-usage.html

(I'm guessing I'll find qdox on codehaus's svn though)

Grégory Joseph added a comment - 24/Sep/06 11:56 AM

I was going to run the tests (see attachments) against the latest sources to check this but err, just for the record, the site is outdated: http://qdox.codehaus.org/cvs-usage.html

So I got the trunk from svn and my test still doesn't pass.

Christian Hargraves added a comment - 21/Jan/07 11:40 AM

<pre/> tags are mentioned above. Does this bug also cover <pre/> tag support, which should preserve all whitespace (not just EOL characters), or should I create a new bug?

It looks like QDOX-78 covers EOL characters, but not the normal whitespace used for indentation?

Mauro Talevi added a comment - 27/May/07 10:48 AM

Changed fix version to 1.7

Dimitri BAELI added a comment - 11/Jan/08 8:31 AM

A short study on this point showed that the StreamTokenizer used in TagParser does not work as expected.
It treats a newline as a token separator; according to Java Bug 4501629 (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4501629), it won't be fixed.
Suggestion: the TagParser should be reimplemented.
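
As a rough illustration of what a reimplementation without StreamTokenizer might involve, here is a minimal hand-written scanner for name="value" pairs in which a line break inside quotes is treated as an ordinary character. It is only a sketch under stated assumptions: NamedParameterSketch and parse are hypothetical names, the escape handling (a backslash keeps the next character literally) is a simplification, and none of this is taken from QDox's TagParser or from XJavadoc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamedParameterSketch {

    /** Parses name="value" pairs; a quoted value may contain raw newlines. */
    static Map<String, String> parse(String text) {
        Map<String, String> params = new LinkedHashMap<>();
        int i = skipWhitespace(text, 0);
        while (i < text.length()) {
            // Read the parameter name up to '=' or whitespace.
            int start = i;
            while (i < text.length() && text.charAt(i) != '='
                    && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String name = text.substring(start, i);
            i = skipWhitespace(text, i);
            if (i >= text.length() || text.charAt(i) != '=') {
                continue; // bare word without a value: ignored in this sketch
            }
            i = skipWhitespace(text, i + 1);
            StringBuilder value = new StringBuilder();
            char quote = i < text.length() ? text.charAt(i) : 0;
            if (quote == '"' || quote == '\'') {
                i++; // opening quote
                while (i < text.length() && text.charAt(i) != quote) {
                    char c = text.charAt(i++);
                    if (c == '\\' && i < text.length()) {
                        c = text.charAt(i++); // keep the escaped character literally
                    }
                    value.append(c); // newlines are ordinary characters here
                }
                i++; // closing quote, if present
            } else {
                // Unquoted value: runs until the next whitespace.
                while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                    value.append(text.charAt(i++));
                }
            }
            params.put(name, value.toString());
            i = skipWhitespace(text, i);
        }
        return params;
    }

    private static int skipWhitespace(String text, int i) {
        while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
            i++;
        }
        return i;
    }
}
```

Calling parse on the text foo="this is<newline> multilined" yields a single entry for foo whose value still contains the line break; whether that value is then collapsed or trimmed is a separate policy decision.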

Paul Hammant added a comment - 21/Feb/08 1:46 AM

Dimitri, are you interested in trying some regex or extended/alternate StreamTokenizer hacking to overcome this?

It might be OK to substitute some other char for newlines, might it not?

- Paul
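
The substitution idea raised in this comment could look roughly like the following: before the text reaches StreamTokenizer, each line break inside double quotes (together with the indentation after it) is replaced by a single space. This is a sketch only; the class and method names are hypothetical, the tokenizer uses its default syntax table rather than whatever TagParser configures, and replacing with a space is just one possible choice of substitute character.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class NewlineSubstitutionSketch {

    /** Replaces each line break inside double quotes, plus following indentation, with one space. */
    static String flattenQuotedNewlines(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 1 < text.length()) {
                out.append(c).append(text.charAt(++i)); // copy escape pairs untouched
            } else if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Swallow the rest of the line break and the next line's indentation.
                while (i + 1 < text.length() && Character.isWhitespace(text.charAt(i + 1))) {
                    i++;
                }
                out.append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Tokenizes the flattened text and returns the first double-quoted value, or null. */
    static String firstQuotedValue(String text) {
        StreamTokenizer tokenizer =
                new StreamTokenizer(new StringReader(flattenQuotedNewlines(text)));
        try {
            int token;
            while ((token = tokenizer.nextToken()) != StreamTokenizer.TT_EOF) {
                if (token == '"') {
                    return tokenizer.sval;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return null;
    }
}
```

The trade-off is that the original line structure is lost before parsing, so this only suits callers that are happy with the collapsed form "this is multilined".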
Mark Jenner added a comment - 01/Jun/09 8:58 AM

Hi,

I had the same issue parsing something else with StreamTokenizer and found this issue while searching for solutions. I could not find one, so I cooked up my own and thought you might be interested in applying it to your problem as well. Basically I needed to parse strings contained in double quotes. StreamTokenizer does this for you but fails if there is a newline in the string. So instead of letting StreamTokenizer do the string parsing, I tell it that the double quote is not special, and when I get to one, I reconfigure the tokenizer into my own "string mode" where the only special chars are double quote and backslash. When I get to the end of the string I switch back to my normal tokenizer config (for a format called fvar). Here are the methods:

private void setUpTokenizerForFvar(StreamTokenizer tokenizer) {
    // Setup the tokenizer just like a new one as per the StreamTokenizer constructor comment
    tokenizer.resetSyntax();
    tokenizer.wordChars((int)'a', (int)'z');
    tokenizer.wordChars((int)'A', (int)'Z');
    tokenizer.wordChars(128 + 32, 255);
    tokenizer.whitespaceChars(0, (int)' ');
    tokenizer.commentChar((int)'/');
    tokenizer.parseNumbers();
    // Attribute names in fvar can include underscores, and spaces!
    tokenizer.wordChars(UNDER_SCORE, UNDER_SCORE);
    tokenizer.wordChars(SPACE, SPACE);
    tokenizer.ordinaryChar(DOUBLE_QUOTE);
}

private void setUpTokenizerForQuotedValue(StreamTokenizer tokenizer) {
    // Reset the tokenizer to treat everything as a word except the double quote char and the escape char
    tokenizer.resetSyntax();
    tokenizer.wordChars(0, 127);
    tokenizer.ordinaryChar(ESCAPE);
    tokenizer.ordinaryChar(DOUBLE_QUOTE);
}

// Because StreamTokenizer does not parse quoted strings that contain newlines properly
// we have to do it ourselves. Reads everything up until a matching closing quote
// ignoring any that are preceded by an escape char '\'
private String parseQuotedString(int openQuote, StreamTokenizer tokenizer) {
    StringBuilder value = new StringBuilder();
    setUpTokenizerForQuotedValue(tokenizer);
    def nextToken = tokenizer.nextToken();
    boolean escapedQuote = false;
    while (escapedQuote || nextToken != openQuote) {
        escapedQuote = false;
        if (nextToken == StreamTokenizer.TT_WORD) {
            value.append(tokenizer.sval);
        } else if (nextToken == ESCAPE) {
            escapedQuote = true;
            value.append((char)nextToken);
        } else if (nextToken == openQuote) {
            value.append((char)nextToken);
        }
        nextToken = tokenizer.nextToken();
    }
    setUpTokenizerForFvar(tokenizer);
    return value.toString();
}

used in some code like this:

[...]
nextToken = tokenizer.nextToken();
if (nextToken == DOUBLE_QUOTE) {
    String value = parseQuotedString(nextToken, tokenizer);
[...]

Hope that is of some value to you.

Robert Scholte added a comment - 03/Sep/09 3:46 PM

This one should be resolved by implementing QDOX-168

People

  • Assignee: Robert Scholte
  • Reporter: Grégory Joseph (old account)

Dates

  • Created: 03/Feb/05 12:53 PM
  • Updated: 03/Sep/09 3:46 PM
  • Resolved: 03/Sep/09 3:46 PM