Maven Doxia: DOXIA-431

Doxia creates illegal URLs from local paths

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2
    • Fix Version/s: None
    • Component/s: Core
    • Labels: None
    • Number of attachments: 0

      Description

      If the path of a local resource contains characters which are illegal in a URL, Doxia either generates invalid URLs or crashes.

        Activity

        Aaron Digulla added a comment -

        In an external project, there are image files whose names contain spaces. Instead of replacing the spaces with %20 or calling java.net.URLEncoder.encode(), Doxia tries to call new java.net.URL("images/The ExTeX Project.png"), which fails.

        I tried a fix but couldn't get it to work in a couple of hours. The problem is that you use a lot of Strings when you should be using URLs (or at least a URL-like type). Without such a type, it's impossible to know when a URL must be encoded/decoded.

        Example stacktrace:

        Caused by: java.lang.IllegalArgumentException
                at java.net.URI.create(URI.java:842)
                at org.apache.maven.doxia.site.decoration.inheritance.URIPathDescriptor.<init>(URIPathDescriptor.java:69)
                at org.apache.maven.doxia.site.decoration.inheritance.DefaultDecorationModelInheritanceAssembler.rebaseLink(DefaultDecorationModelInheritanceAssembler.java:361)
                at org.apache.maven.doxia.site.decoration.inheritance.DefaultDecorationModelInheritanceAssembler.rebaseBannerPaths(DefaultDecorationModelInheritanceAssembler.java:162)
                at org.apache.maven.doxia.site.decoration.inheritance.DefaultDecorationModelInheritanceAssembler.assembleModelInheritance(DefaultDecorationModelInheritanceAssembler.java:61)
                at org.apache.maven.doxia.tools.DefaultSiteTool.getDecorationModel(DefaultSiteTool.java:1221)
                at org.apache.maven.doxia.tools.DefaultSiteTool.getDecorationModel(DefaultSiteTool.java:458)
                at org.apache.maven.plugins.site.AbstractSiteRenderingMojo.createSiteRenderingContext(AbstractSiteRenderingMojo.java:285)
                at org.apache.maven.plugins.site.SiteMojo.renderLocale(SiteMojo.java:140)
                at org.apache.maven.plugins.site.SiteMojo.execute(SiteMojo.java:124)
                at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:107)
                ... 20 more
        Caused by: java.net.URISyntaxException: Illegal character in path at index 10: images/The ExTeX Project.png
                at java.net.URI$Parser.fail(URI.java:2809)
                at java.net.URI$Parser.checkChars(URI.java:2982)
                at java.net.URI$Parser.parseHierarchical(URI.java:3066)
                at java.net.URI$Parser.parse(URI.java:3024)
                at java.net.URI.<init>(URI.java:578)
                at java.net.URI.create(URI.java:840)
                ... 30 more
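The failure in the trace above can be reproduced with the JDK alone, and the same class offers a way around it. The following is a minimal sketch, not Doxia code: the class name PathEncoding and its methods are hypothetical. It relies on the fact that the multi-argument java.net.URI constructors percent-encode illegal characters, whereas URI.create(String) expects input that is already encoded. Note that java.net.URLEncoder.encode() implements HTML form encoding and turns a space into "+", so it is not a good fit for path segments.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Doxia.
final class PathEncoding {

    /** Percent-encodes characters that are illegal in a URI path, such as spaces. */
    static String encodePath(String rawPath) {
        try {
            // Scheme, host and fragment are null, so the result is a relative reference.
            // This constructor quotes illegal characters instead of rejecting them.
            return new URI(null, null, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode path: " + rawPath, e);
        }
    }

    /** Returns true if the string already parses as a URI reference. */
    static boolean isValidUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

With this sketch, isValidUri rejects the raw file name from the stack trace, while the output of encodePath for the same name parses cleanly.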
        
        Lukas Theussl added a comment -

        Can you be more specific: where is this image referenced from, an apt/xdoc source file or site.xml? Doxia 1.2 is not used yet in any site release, so I assume you are using a snapshot? Is this a regression, then? Is it documented somewhere that image/file references may contain spaces? (Just for my education; I don't think, e.g., that the apt reference is sufficiently precise in many respects.)

        Aaron Digulla added a comment -

        > Doxia 1.2 is not used...

        I had problems with the site plugin 3.0-beta-3, so I tried beta-4-SNAPSHOT.

        > Is this a regression?

        Probably. My guess is that recent code changes unveiled a whole set of errors.

        > where does this image come from...

        The image comes from the folder src/site/resources/images/

        I'm not the maintainer of the project, so I have no idea how Doxia includes the image. All I have is the error message and the filename. I can't see any reference to the image in site.xml, so it must be included from somewhere else, probably the skin.

        > Is it documented somewhere that image/file references...?

        A URL can contain only some characters. See http://www.blooberry.com/indexdot/html/topics/urlencoding.htm for a pretty good explanation.

        File names on Unix can contain any byte except "/" (slash) and NUL.

        So if you accept Unix file names anywhere in Doxia, you must escape them as soon as they are converted to URLs and you must unescape them when they are converted back to file names.

        My suggestion is a new type which can represent both and which has accessor methods to get an OS-specific path or an RFC-compliant URL. Then get rid of the type String as early as you can, to make sure there are no gaps in the chain.
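A minimal sketch of what such a type could look like, under stated assumptions: the name ResourceRef and all of its methods are hypothetical and do not exist in Doxia. The idea is to store one canonical, decoded form and to encode or decode only at the two boundaries. A first path segment containing ":" would be misread as a URI scheme; the sketch does not handle that case.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical value type for illustration only; not part of Doxia.
final class ResourceRef {

    // Always stored decoded and '/'-separated, so there is exactly one internal form.
    private final String decodedPath;

    private ResourceRef(String decodedPath) {
        this.decodedPath = decodedPath;
    }

    /** Builds a reference from a relative OS-specific file path (unescaped). */
    static ResourceRef fromFilePath(String osPath) {
        return new ResourceRef(osPath.replace(File.separatorChar, '/'));
    }

    /** Builds a reference from an already-encoded relative URI string. */
    static ResourceRef fromUri(String encoded) {
        // URI.create rejects unencoded input; getPath() returns the decoded path.
        return new ResourceRef(URI.create(encoded).getPath());
    }

    /** Returns the unescaped path using the platform's separator. */
    String toFilePath() {
        return decodedPath.replace('/', File.separatorChar);
    }

    /** Returns a relative URI with illegal characters percent-encoded. */
    URI toUri() {
        try {
            return new URI(null, null, decodedPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Cannot encode path: " + decodedPath, e);
        }
    }
}
```

Because callers never see the raw String, they have to state at each boundary which representation they want, which is the gap-free chain described above.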

        Lukas Theussl added a comment -

        Yeah, I'm somewhat familiar with URL encoding.

        I meant whether it is documented somewhere within Maven/Doxia whether file references have to be URLs, e.g. whether you are allowed to use in an apt source file

        [The ExTeX Project.png] Figure caption
        

        or in a site.xml

        <logo name=.. href=.. img="The ExTeX Project.png" />
        

        I guess the answer is yes, but I'm just wondering if there is anything in the docs.

        Aaron Digulla added a comment -

        I have no idea. But from my experience, I'd say that those URLs should already be encoded. I mean "Image[1].png" is a valid Unix filename. If you want to use that as a caption, you need escaping.

        So maybe the solution is to reject strings which contain invalid characters close to the input side.

        But I saw that you have sanitize methods in some URL helper class in Doxia. That led me to think that you want to do it there, and I don't believe this will work. Data must be sanitized and validated at the outer interface, not deep in the code.
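Rejecting bad references close to the input side could be sketched as follows. This is an assumption-laden illustration, not Doxia's API: ReferenceValidator, requireValidReference and the sourceDescription parameter are hypothetical names. The point is that the parser fails early with a message naming the source, instead of a bare IllegalArgumentException surfacing many frames deeper.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical input-side check for illustration only; not part of Doxia.
final class ReferenceValidator {

    /**
     * Parses the reference as a URI or fails with a message that says
     * where the reference came from and what is wrong with it.
     */
    static URI requireValidReference(String reference, String sourceDescription) {
        try {
            return new URI(reference);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "Invalid reference '" + reference + "' in " + sourceDescription
                    + ": " + e.getReason() + " at index " + e.getIndex(), e);
        }
    }
}
```

A parser would call this once for every link or image reference it reads, for example with "site.xml" as the source description, so that later stages can assume every reference is a valid URI.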

        Lukas Theussl added a comment -

        I generally agree with your comments. The sanitize methods in URIPathDescriptor (I guess that's what you are referring to) were necessary because of some backward-compatibility issues I encountered when rewriting the deprecated PathDescriptor class. Note also the comment in the relativizeLink method of DefaultDecorationModelInheritanceAssembler.

        One thing I can point out is the javadoc in the Sink API for figureGraphics, which states that the image src parameter has to be a valid URL before being emitted into the Sink. So that's consistent with your remark that data should be validated at the input side, i.e. by the Parser.

        Otherwise, I think a concrete test example would help me to work on this, as I still don't know where your figure is referenced from.


          People

          • Assignee: Unassigned
          • Reporter: Aaron Digulla
          • Votes: 0
          • Watchers: 0
