Maven Doxia

Handle non-ASCII characters in anchors and ids

Description

From DOXIA-236:

The javadoc for the method HtmlTools.encodeId() mentions the pattern [A-Za-z][A-Za-z0-9:_.-]* for its output. To me, it looks like the term "letter" is meant to refer to ASCII letters in this context. However, the method actually used, Character.isLetter(), classifies characters according to the Unicode data file. For instance, the characters "ä" and "ß" are letters in the Unicode sense, so encodeId() passes them through to its output, violating the ASCII-only pattern stated in its javadoc.
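The mismatch can be reproduced with a few lines of plain Java. This is only an illustrative sketch: the class LetterCheckDemo and the helper isAsciiLetter() are hypothetical names introduced here and are not part of Doxia. It compares Character.isLetter() with a check restricted to the ranges A-Z and a-z.

```java
class LetterCheckDemo {

    // Hypothetical helper (not part of Doxia): true only for the ASCII letters A-Z and a-z.
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // 'a', 'Z', a-umlaut (U+00E4) and sharp s (U+00DF)
        char[] samples = {'a', 'Z', '\u00e4', '\u00df'};
        for (char c : samples) {
            // Print the code point rather than the character to avoid console encoding issues.
            System.out.println(String.format("U+%04X", (int) c)
                    + " isLetter=" + Character.isLetter(c)
                    + " isAsciiLetter=" + isAsciiLetter(c));
        }
    }
}
```

For U+00E4 and U+00DF, Character.isLetter() returns true while the ASCII-only check returns false, which is exactly the gap between the implementation and the documented pattern.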

Activity

Lukas Theussl added a comment -

Some links:

http://www.w3.org/TR/html4/struct/links.html#h-12.2.1
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars

I think encodeId() should replace non-ASCII characters according to the recommendation of the latter link above.
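As a rough illustration of that recommendation (represent each non-ASCII character as UTF-8 bytes and escape each byte as %HH), here is a minimal sketch. The class NonAsciiEscapeSketch and the method escapeNonAscii() are hypothetical names, and this is not necessarily how the fix was implemented in Doxia.

```java
import java.nio.charset.StandardCharsets;

class NonAsciiEscapeSketch {

    // Hypothetical helper (not Doxia's implementation): leaves ASCII bytes untouched
    // and writes every byte of a non-ASCII character's UTF-8 encoding as %HH.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF; // treat the byte as unsigned
            if (value < 0x80) {
                out.append((char) value);
            } else {
                out.append('%').append(String.format("%02X", value));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Strasse" spelled with sharp s (U+00DF), whose UTF-8 encoding is C3 9F
        System.out.println(escapeNonAscii("Stra\u00dfe")); // prints Stra%C3%9Fe
    }
}
```

Note that "%" is itself outside the pattern [A-Za-z][A-Za-z0-9:_.-]*, so an id encoder following this approach would presumably still need a further step to map the escape character to something the pattern allows.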

Lukas Theussl added a comment -

Fixed in r764589.

