
Key: DOXIA-239
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Major
Assignee: Lukas Theussl
Reporter: Lukas Theussl
Votes: 0
Watchers: 0
Maven Doxia

Handle non-ASCII characters in anchors and id's

Created: 01/May/08 06:57 AM   Updated: 13/Apr/09 02:55 PM   Resolved: 13/Apr/09 02:55 PM
Component/s: Core, Documentation, Modules, Sink API
Affects Version/s: None
Fix Version/s: 1.1.1

Time Tracking:
Not Specified


Description

From DOXIA-236:

The javadoc for the method HtmlTools.encodeId() states that its output matches the pattern [A-Za-z][A-Za-z0-9:_.-]*. To me, this suggests that the term "letter" is meant to refer to ASCII letters in this context. However, the method it uses, Character.isLetter(), classifies characters according to the Unicode data file. For instance, the characters "ä" and "ß" are letters in the Unicode sense, so encodeId() passes them through to its output, violating the ASCII-only pattern stated in its javadoc.
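The mismatch can be seen with a small stand-alone check. This is only an illustration, not Doxia code: `isAsciiLetter` is a hypothetical helper written to match the `[A-Za-z]` part of the documented pattern, and it is compared against the JDK's `Character.isLetter()`.

```java
public class LetterCheckDemo {
    // Hypothetical helper: true only for the ASCII letters allowed by [A-Za-z]
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        // 'a', 'Z', U+00E4 (a-umlaut), U+00DF (sharp s), '1'
        char[] samples = {'a', 'Z', '\u00e4', '\u00df', '1'};
        for (char c : samples) {
            // Print the code point rather than the raw char to avoid console encoding issues
            System.out.println(String.format("U+%04X", (int) c)
                + " Character.isLetter=" + Character.isLetter(c)
                + " isAsciiLetter=" + isAsciiLetter(c));
        }
    }
}
```

For U+00E4 and U+00DF, `Character.isLetter()` reports `true` while the ASCII check reports `false`, which is exactly the gap described above.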



Lukas Theussl added a comment - 01/May/08 07:00 AM

Some links:

http://www.w3.org/TR/html4/struct/links.html#h-12.2.1
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars

I think encodeId() should replace non-ASCII characters according to the recommendation of the latter link above.
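The HTML 4 note linked above recommends representing each non-ASCII character in UTF-8 and escaping each resulting byte as `%HH`. The following is a minimal sketch of that transformation only, assuming a hypothetical helper named `escapeNonAscii`. It does not reproduce what r764589 actually changed in `encodeId()`. Note also that `%` is not part of the `[A-Za-z][A-Za-z0-9:_.-]*` pattern quoted in the description.

```java
import java.nio.charset.StandardCharsets;

public class NonAsciiEscaper {
    // Hypothetical helper: applies the HTML 4 appendix B.2.1 recommendation by
    // escaping every non-ASCII character as the %HH form of its UTF-8 bytes.
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) {
                out.append((char) cp); // ASCII passes through unchanged
            } else {
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    // Mask to get the unsigned byte value before formatting as two hex digits
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Grüße" written with Unicode escapes
        System.out.println(escapeNonAscii("Gr\u00fc\u00dfe")); // prints Gr%C3%BC%C3%9Fe
    }
}
```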


Lukas Theussl added a comment - 13/Apr/09 02:55 PM

Fixed in r764589.