QDox

java.lang.Character$UnicodeBlock fields are not correctly parsed

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.4
  • Fix Version/s: 1.4
  • Component/s: Parser
  • Labels:
    None
  • Environment:
    Latest CVS version
    J2SDK 1.4.2_03
    WinXP Pro
  • Number of attachments:
    0

Description

JavaClassBuilder fails to parse UnicodeBlock, the inner class of java.lang.Character, correctly.

In this class, the fields are not declared one per statement as usual. Instead, a single declaration lists many fields separated by commas:

public static final UnicodeBlock
BASIC_LATIN
= new UnicodeBlock("BASIC_LATIN"),
LATIN_1_SUPPLEMENT
= new UnicodeBlock("LATIN_1_SUPPLEMENT"),
/* cut */
HALFWIDTH_AND_FULLWIDTH_FORMS
= new UnicodeBlock("HALFWIDTH_AND_FULLWIDTH_FORMS"),
SPECIALS
= new UnicodeBlock("SPECIALS");

From this code block, QDox finds only the BASIC_LATIN field, and it finds only 4 fields in the entire class (BASIC_LATIN, SYRIAC, blockStarts and blocks).

Activity

Mike Williams added a comment -

Tricky.

This happens because the lexer discards everything between the first "=" and the ";".

Discarding "un-interesting" tokens during lexical analysis is what makes QDox so speedy. In other words, it will be difficult to fix this bug without a big performance hit.

Aslak Hellesoy added a comment -

This is a matter of whether we want to support ugly C-style declarations like:

int i,j,k;

It's in the Java spec, so I don't think we should "won't fix" this.
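
For illustration, here is a minimal sketch of what such a declaration means to the compiler: one declaration with several declarators produces several distinct fields. The class name MultiDeclaratorDemo and the helper countDeclaredFields are hypothetical and not part of QDox.

```java
import java.lang.reflect.Field;

// Hypothetical demo class, not part of QDox.
final class MultiDeclaratorDemo {
    // One declaration, three declarators: the compiler creates three separate fields.
    static int i, j = 2, k;

    // Counts the fields declared in the source, ignoring compiler-generated ones.
    static int countDeclaredFields() {
        int count = 0;
        for (Field f : MultiDeclaratorDemo.class.getDeclaredFields()) {
            if (!f.isSynthetic()) {
                count++;
            }
        }
        return count;
    }
}
```

A parser that stops after the first declarator would report one field here, where reflection reports three.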

Mike Williams added a comment -

After a bit more thought, I think there actually is a way to handle this without a performance hit.

If we keep track of paren/brace nesting within the ASSIGNMENT, we can get the lexer to recognise and return the COMMA, while still discarding the uninteresting tokens.
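
The idea can be sketched roughly as follows. This is not the actual QDox lexer, only a small character-level illustration of the technique: everything inside an initializer is thrown away, but a nesting counter lets the scanner recognise the commas and semicolons that sit at the top level. The class DeclaratorScanner and its methods are hypothetical names. The sketch assumes the input is a single field declaration, and it does not handle comments or generic type arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the QDox lexer.
final class DeclaratorScanner {

    /** Returns the field names declared by a single field declaration. */
    static List<String> fieldNames(String declaration) {
        List<String> names = new ArrayList<>();
        String lastIdentifier = null;  // most recent identifier seen outside an initializer
        boolean inAssignment = false;  // true between "=" and the next top-level "," or ";"
        int depth = 0;                 // nesting of (), {} and [] inside the initializer
        int i = 0;
        while (i < declaration.length()) {
            char c = declaration.charAt(i);
            if (inAssignment) {
                if (c == '"' || c == '\'') {
                    i = skipLiteral(declaration, i);  // quoted text may contain , ; ( )
                    continue;
                }
                if (c == '(' || c == '{' || c == '[') {
                    depth++;
                } else if (c == ')' || c == '}' || c == ']') {
                    depth--;
                } else if (depth == 0 && (c == ',' || c == ';')) {
                    inAssignment = false;  // top-level separator ends the initializer
                }
                i++;  // every other character is discarded
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < declaration.length()
                        && Character.isJavaIdentifierPart(declaration.charAt(i))) {
                    i++;
                }
                lastIdentifier = declaration.substring(start, i);
            } else {
                if (c == '=' || c == ',' || c == ';') {
                    // The identifier just before "=", "," or ";" is a field name.
                    if (lastIdentifier != null) {
                        names.add(lastIdentifier);
                    }
                    lastIdentifier = null;
                    inAssignment = (c == '=');
                }
                i++;
            }
        }
        return names;
    }

    /** Returns the index just past the string or char literal opening at {@code open}. */
    private static int skipLiteral(String s, int open) {
        char quote = s.charAt(open);
        int i = open + 1;
        while (i < s.length() && s.charAt(i) != quote) {
            if (s.charAt(i) == '\\') {
                i++;  // skip the escaped character
            }
            i++;
        }
        return i + 1;
    }
}
```

Given the declaration from the description, shortened to BASIC_LATIN, LATIN_1_SUPPLEMENT and SPECIALS, fieldNames returns all three names instead of only the first. The scan remains a single pass with one integer of extra state, which is why the approach need not cost much.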

Aslak Hellesoy added a comment -

Agreed. That's the way to do it.

Eric Dechaux added a comment -

If implementing this causes too much of a slowdown, it may be possible to have two different lexers, one "fast" and one "slow", that could be chosen at runtime or at compile time.

Mike Williams added a comment -

I was wrong: we managed to fix this without a performance hit.

Aslak Hellesoy added a comment -

Awesome work Mike!

Eric Dechaux added a comment -

I agree, great work. It works great.

