Maven Doxia
  1. Maven Doxia
  2. DOXIA-131

HtmlTools.encodeId makes id lower case

    Details

    • Type: Bug Bug
    • Status: Closed Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.0-alpha-8
    • Fix Version/s: 1.0-alpha-9
    • Component/s: Core
    • Labels:
      None
    • Number of attachments :
      0

      Description

      HtmlTools.encodeId uses Character.toLowerCase to convert its argument to lower case. I don't see the reason for that since upper case characters are legal in id's (see http://www.w3.org/TR/html4/types.html#type-name ). In particular, it's a problem with anchors/links in the xhtml sink (see DOXIA-47 ), especially if we want to create anchors from section names, to maintain backward compatibility with m1. Is there a special reason for the toLowerCase or can we remove it?

        Issue Links

          Activity

          Hide
          Vincent Siveton added a comment -

          Seems to be a typo: the javadoc is clear! Feel free to correct it with a test case!

          Also, the actual test case [1] should be moved to o.a.m.d.util in the src/test dir

          [1]
          https://svn.apache.org/repos/asf/maven/doxia/doxia/trunk/doxia-core/src/test/java/org/apache/maven/doxia/module/HtmlToolsTest.java

          Show
          Vincent Siveton added a comment - Seems to be a typo: the javadoc is clear! Feel free to correct it with a test case! Also, the actual test case [1] should be moved to o.a.m.d.util in the src/test dir [1] https://svn.apache.org/repos/asf/maven/doxia/doxia/trunk/doxia-core/src/test/java/org/apache/maven/doxia/module/HtmlToolsTest.java
          Hide
          Lukas Theussl added a comment -

          Hmm, I have a doubt: the apt user guide cites the following example for anchor/link usage:

           
          {Anchor}. Link to {{anchor}}. Link to {{{anchor}showing alternate text}}.
           

          This gets converted by aptconvert into the following html:

          <a id="anchor" name="anchor">Anchor</a>. Link to <a href="#anchor">Anchor</a>. Link to <a href="#anchor">showing alternate text</a>.
          

          Note the anchor name and id have become lower case. Now I don't see that documented anywhere in the aptconvert guide (it only states "The name of an anchor/link is its text with all non alphanumeric characters stripped."), and it doesn't seem consistent since id's are case sensitive.

          So I'd say we stick to not making id's lower case, but we have to adjust the documentation...

          Show
          Lukas Theussl added a comment - Hmm, I have a doubt: the apt user guide cites the following example for anchor/link usage: {Anchor}. Link to {{anchor}}. Link to {{{anchor}showing alternate text}}. This gets converted by aptconvert into the following html: <a id= "anchor" name= "anchor" > Anchor </a> . Link to <a href= "#anchor" > Anchor </a> . Link to <a href= "#anchor" > showing alternate text </a> . Note the anchor name and id have become lower case. Now I don't see that documented anywhere in the aptconvert guide (it only states "The name of an anchor/link is its text with all non alphanumeric characters stripped."), and it doesn't seem consistent since id's are case sensitive. So I'd say we stick to not making id's lower case, but we have to adjust the documentation...
          Hide
          Dennis Lundberg added a comment -

          I have checked the different standards and have not found any evidence that ids need to be lower case. So I added a test case and changed encodeId() to not change case anymore. I also added tons of JavaDoc.

          Is that enough or did you have any other documentation in mind, Lukas?

          Show
          Dennis Lundberg added a comment - I have checked the different standards and have not found any evidence that ids need to be lower case. So I added a test case and changed encodeId() to not change case anymore. I also added tons of JavaDoc. Is that enough or did you have any other documentation in mind, Lukas?
          Hide
          Dennis Lundberg added a comment -

          I had a look at the original docs for aptconvert and found this interesting quote:

          The name of an anchor/link is its text with all non alphanumeric characters stripped.

          That suggests that we should be removing these characters ":_.-" from the ids as well.

          I'll give aptconvert a spin and see if I can shed some light on these issues.

          Show
          Dennis Lundberg added a comment - I had a look at the original docs for aptconvert and found this interesting quote: The name of an anchor/link is its text with all non alphanumeric characters stripped. That suggests that we should be removing these characters ":_.-" from the ids as well. I'll give aptconvert a spin and see if I can shed some light on these issues.
          Hide
          Dennis Lundberg added a comment -

          Here are the results of a little test I did with aptconvert

          APT

            {Anchor}. Link to {{Anchor}}. Link to {{http://www.pixware.fr}}. 
            Link to {{{Anchor}showing alternate text}}.
            Link to {{{http://www.pixware.fr}Pixware home page}}.
            {-.:_myAnchor-_.:}
          

          HTML

          <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
          <html lang="en">
          <head>
          <meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
          <title></title>
          </head>
          <body>
          <p><a id="anchor" name="anchor">Anchor</a>. Link to
          <a href="#anchor">Anchor</a>. Link to
          <a href="http://www.pixware.fr">http://www.pixware.fr</a>. Link to
          <a href="#anchor">showing alternate text</a>. Link to
          <a href="http://www.pixware.fr">Pixware home page</a>.
          <a id="myanchor" name="myanchor">-.:_myAnchor-_.:</a></p></body>
          </html>
          

          As you can see characters are converted to lower case and "-_.:" are being stripped away.

          I think that preserving case, like we do now, is the right thing to do. If we don't it might lead to duplicate ids which is illegal.

          We should probably remove the "-_.:" characters though, as they don't have any value in an id attribute.

          Show
          Dennis Lundberg added a comment - Here are the results of a little test I did with aptconvert APT {Anchor}. Link to {{Anchor}}. Link to {{http: //www.pixware.fr}}. Link to {{{Anchor}showing alternate text}}. Link to {{{http: //www.pixware.fr}Pixware home page}}. {-.:_myAnchor-_.:} HTML <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" > <html lang= "en" > <head> <meta http-equiv= "Content-Type" content= "text/html; charset=Windows-1252" > <title> </title> </head> <body> <p> <a id= "anchor" name= "anchor" > Anchor </a> . Link to <a href= "#anchor" > Anchor </a> . Link to <a href= "http://www.pixware.fr" > http://www.pixware.fr </a> . Link to <a href= "#anchor" > showing alternate text </a> . Link to <a href= "http://www.pixware.fr" > Pixware home page </a> . <a id= "myanchor" name= "myanchor" > -.:_myAnchor-_.: </a> </p> </body> </html> As you can see characters are converted to lower case and "-_.:" are being stripped away. I think that preserving case, like we do now, is the right thing to do. If we don't it might lead to duplicate ids which is illegal. We should probably remove the "-_.:" characters though, as they don't have any value in an id attribute.
          Hide
          Dennis Lundberg added a comment -

          To be able to truly solve DOXIA-47 we should disallow the use of '.' in ids. That way we can use HtmlTools.isId(String) to determine if a text is an internal or external link. A non-internal link usually has either a '.' or a '/' in them.

          Show
          Dennis Lundberg added a comment - To be able to truly solve DOXIA-47 we should disallow the use of '.' in ids. That way we can use HtmlTools.isId(String) to determine if a text is an internal or external link. A non-internal link usually has either a '.' or a '/' in them.
          Show
          Lukas Theussl added a comment - See http://mail-archives.apache.org/mod_mbox/maven-doxia-dev/200708.mbox/%3c46C17AAA.6070604@apache.org%3e

            People

            • Assignee:
              Dennis Lundberg
              Reporter:
              Lukas Theussl
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: