Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0-alpha-2
    • Fix Version/s: 1.1
    • Component/s: Sink API
    • Labels:
      None
    • Number of attachments :
      0

      Description

      If the idea of extensibility and interchangeable input/output formats is to be more than a nice dream, the Sink API needs a thorough specification (e.g. by means of more javadoc on Sink) because that's where everything meets. It should define

      1. what rules parsers must obey when generating events and
      2. what events a sink needs to be prepared to handle

      Currently, all of this is left to assumptions. Some example issues that need to be clarified:

      • What characters may constitute an anchor reported by anchor()? Arbitrary, ASCII-only, ...?
      • What format applies to the name parameter of link()? How are internal and external links to be distinguished (DOXIA-208)?
      • What character chunks are reported by text()? Longest consecutive sequence, line-by-line, arbitrary, ... (DOXIA-222)?
      • What exactly is a figure's source as reported by figureGraphics()? Relative/absolute path, relative to which directory? What about file extensions (DOXIA-99)?
      • What order of events is "reasonable" (DOXIA-132)? May parsers report table body and caption in a specific or arbitrary order? Must the document head always be reported before body or may it be postponed?
      • Is closing a sink twice acceptable or an error?
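
To make the last question concrete, here is a minimal sketch of one possible answer, namely that close() is idempotent. The class IdempotentCloseSink and its methods are hypothetical and only serve as illustration; they are not the Doxia Sink interface, and the issue itself does not settle which behaviour is right.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical sink, for illustration only; not the Doxia Sink interface.
class IdempotentCloseSink {
    private final Writer out;
    private boolean closed;
    private int closeCount; // how often the underlying writer was really closed

    IdempotentCloseSink(Writer out) {
        this.out = out;
    }

    // Events after close() are treated as a contract violation in this sketch.
    void text(String chunk) {
        if (closed) {
            throw new IllegalStateException("sink already closed");
        }
        try {
            out.write(chunk);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The second and later calls are no-ops instead of errors.
    void close() {
        if (closed) {
            return;
        }
        closed = true;
        closeCount++;
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int closeCount() {
        return closeCount;
    }
}
```

Under this assumed contract a parser may call close() defensively, for example in a finally block, without knowing whether someone else has already closed the sink.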

          Activity

          Lukas Theussl added a comment -

          Thanks for the graphic illustration

          However, I most definitely disagree with your conclusion. Curiously, I had to defend my point several times already, so let me just direct you to some issues: DOXIA-152, DOXIA-138 (lower part of the discussion). In short: a parser doesn't know yet where its output will go; some feature that might be acceptable for one Sink may lead to errors in others. Only a Sink knows what output is legal for its format, a Parser should therefore never insert anything that was not explicitly there in the original input format. Otherwise you would not be able to produce e.g. a pdf and a html from the same set of source documents.

          restricting the parsers is equivalent to restricting the input format

          I consider it a fundamental design flaw if an input format defines implicit anchors for section titles. We have modified the original apt format (as documented in the doxia-apt.apt document on the doxia site) for these reasons.

          Benjamin Bentmann added a comment -

          Thanks for the graphic illustration

          Sometimes I just can't resist my brain dumps, sorry

          a parser doesn't know yet where its output will go,

          Yep, exactly my motivation for this issue: Since a parser can't and shouldn't know the various sinks, it must at least know the contract of their common interface that every sink obeys. If you can't set up such a common denominator among the sinks, interchangeable output formats are a lost cause.

          some feature that might be acceptable for one Sink may lead to errors in others

          Of course the output formats created by sinks will have different requirements/restrictions, but every sink should
          a) either fully support an event that is defined as part of the Sink API
          b) or at least gracefully ignore an event it can't handle
          such that users get a (best-effort) output regardless of the selected sink. It is the responsibility of the sink implementor to shield parsers from the details of its realized output format. IMHO, a sink should never ever fail with an exception if the input event is valid according to the Sink API.
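
The "fully support or gracefully ignore" rule can be sketched as follows. MiniSink, HtmlLikeSink, PlainTextSink and AnchoredTitle are hypothetical names made up for this illustration; the real Doxia Sink defines many more events, and this is not its implementation.

```java
// Hypothetical, cut-down event interface for illustration only.
// The method names are assumptions, not the Doxia Sink API.
interface MiniSink {
    void anchor(String name); // start of an anchor
    void anchorEnd();         // end of the anchor
    void text(String text);   // character data
}

// Output format can represent anchors: the event is fully supported.
class HtmlLikeSink implements MiniSink {
    private final StringBuilder out = new StringBuilder();

    @Override public void anchor(String name) {
        // No escaping of the name, for brevity.
        out.append("<a name=\"").append(name).append("\">");
    }
    @Override public void anchorEnd() { out.append("</a>"); }
    @Override public void text(String text) { out.append(text); }

    String result() { return out.toString(); }
}

// Output format has no notion of anchors: the event is ignored, not rejected.
class PlainTextSink implements MiniSink {
    private final StringBuilder out = new StringBuilder();

    @Override public void anchor(String name) { /* gracefully ignored */ }
    @Override public void anchorEnd() { /* gracefully ignored */ }
    @Override public void text(String text) { out.append(text); }

    String result() { return out.toString(); }
}

// Parser side: emits the same event sequence without knowing the sink type.
class AnchoredTitle {
    static void emit(MiniSink sink, String anchorName, String title) {
        sink.anchor(anchorName);
        sink.text(title);
        sink.anchorEnd();
    }
}
```

Both sinks accept the identical event sequence; the plain-text one simply produces the title without the anchor, which is the best-effort output described above.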

          Only a Sink knows what output is legal for its format, a Parser should therefore never insert anything that was not explicitly there in the original input format.

          Anchor events are part of the Sink API, so a parser has to my understanding always the right to push this event into a sink, regardless whether the event is driven by explicit user input or by implicit convention. It is the sink's responsibility to handle this defined event, whether it supports anchors or not.

          Regarding the issue of unique anchor names: This is merely another aspect that needs to be added to the javadoc of the Sink API. If you define that anchor names must be unique within a document then

          1. a conforming parser is responsible for providing this uniqueness
          2. a sink has every right to fail if a non-conforming parser outputs two anchor events with the same name
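
If the specification did require unique anchor names, the sink-side check from point 2 could look roughly like this. UniqueAnchorChecker is a hypothetical helper invented for this sketch, and the choice of IllegalArgumentException is an assumption; the issue does not say how a sink should signal the violation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper a sink could delegate to; not part of Doxia.
class UniqueAnchorChecker {
    private final Set<String> seen = new HashSet<>();

    // Records the anchor name and fails if the same name was reported before.
    void anchor(String name) {
        if (!seen.add(name)) {
            throw new IllegalArgumentException("duplicate anchor name: " + name);
        }
    }
}
```

A conforming parser never triggers the exception, so the check only ever penalizes input that already violates the assumed contract.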

          I consider it a fundamental design flaw if an input format defines implicit anchors for section titles.

          I am fine with your arguments against implicit anchors. However, then I still don't understand why sinks are allowed to output implicit anchors for sections. If we consider such anchors as problematic, nobody should be allowed to create them. An implicit anchor is an implicit anchor, regardless of whether the parser or the sink created it, isn't it?

          For example, if we consider the SiteRenderingSink to be one of those specialized sinks that may output implicit anchors to the XHTML pages, people could start using these auto-links to cross-reference those sections from external documents (of the same site). Now this is dangerous because as soon as a user wants to output his nicely linked HTML website into a PDF book, he will find all the auto-links not working anymore because the PdfSink doesn't create implicit anchors like the SiteRenderingSink.

          We have modified the original apt format

          From SVN logs I see this was created after the last deployment of the doxia site (2007-11-06). If it doesn't cause any harm to the overall site, it would be cool to have this doc online. For example, the APT Reference still reads "Section titles are implicitly defined anchors." which does not apply to the version of Doxia used by the Site Plugin, IIRC.

          Lukas Theussl added a comment -

          Anchor events are part of the Sink API, so a parser has to my understanding always the right to push this event into a sink

          Not if there is no anchor in the parsed source document. Just because anchors are valid sink events doesn't mean a parser can emit one wherever it deems convenient.

          regardless whether the event is driven by explicit user input or by implicit convention.

          I disagree on the latter. A doxia parser is a translator, not an interpreter; if you want anchors for your section titles, provide them explicitly.

          I still don't understand why sinks are allowed to output implicit anchors

          Because there is no hard reason why they shouldn't. While there is such a reason to forbid it for parsers (because they don't know the output format), I don't see why it should in principle be forbidden for sinks. My personal opinion is that implicit anchors should never be generated, by either parser or sink, and I think I made that clear in the javadocs, but after all, automatically generated anchors are still a useful and widely used feature for one single output format (html).

          If we consider such anchors as problematic, nobody should be allowed to create them

          The problem is not the existence of the implicit anchor, but its translation into different output formats. If you are only interested in a html web site for your project, I see no reason why you shouldn't be allowed to write a sink that automatically generates those anchors for you. Of course you will be in trouble the day you want to create a pdf from your docs. You will either have to adjust your input documents, or use an adapted pdf sink as well. So you could have adapted your input docs in the first place...

          it would be cool to have this doc online

          The docs are for doxia-beta-1 which is not released yet, so we can't publish them.

          Benjamin Bentmann added a comment -

          Just because anchors are valid sink events doesn't mean a parser can emit one wherever it deems convenient.

          Yes of course, a parser should not emit events at random. What I did not clearly express is my understanding that a parser adopts a certain input format for usage with Doxia, just like a sink realizes some output format. Now if the format (which is in general external and unrelated to Doxia) specifies that a single syntactical construct like a section title is to be interpreted as a title with an implicit anchor, a parser which wants to feed this format into Doxia simply can't follow the format specification because sending the anchor event is prohibited, i.e. information from the input document is lost. That's the only thing that puzzled me a little, wondering if it's really necessary/desired. I'm fine if Doxia says "you ugly input format, don't use implicit anchors", it's just some kind of pushing best practices, I can fairly well understand that.

          I don't see why it should in principle be forbidden for sinks.

          Alright, as long as the implicit anchors generated by such a sink do not interfere with the explicit anchors defined by the user (e.g. name clash).
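
One way a sink could avoid such name clashes is sketched below. It assumes the set of explicit anchor names is known before implicit names are generated, which a streaming sink would not have without a first pass over the events; ImplicitAnchorNamer and its naming scheme (non-alphanumeric runs become "_", a numeric suffix on collision) are hypothetical and not taken from Doxia.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: derives anchor names from section titles while
// staying clear of names the user defined explicitly. Not part of Doxia.
class ImplicitAnchorNamer {
    private final Set<String> taken;

    // explicitNames: anchors defined by the user, assumed to be known up front.
    ImplicitAnchorNamer(Set<String> explicitNames) {
        this.taken = new HashSet<>(explicitNames);
    }

    String nameFor(String sectionTitle) {
        // Assumed scheme: runs of non-alphanumeric characters become "_".
        String base = sectionTitle.trim().replaceAll("[^A-Za-z0-9]+", "_");
        String candidate = base;
        int n = 2;
        // Append a counter until the name is free; add() also reserves it.
        while (!taken.add(candidate)) {
            candidate = base + "_" + n++;
        }
        return candidate;
    }
}
```

With the explicit anchor "Usage" already defined, a section titled "Usage" would get the implicit name "Usage_2" under this scheme, so the user's own anchor stays untouched.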

          If you are only interested in a html web site for your project, I see no reason why you shouldn't be allowed to write a sink that automatically generates those anchors for you.

          If you are only interested in APT sources for your project, I see no reason why you shouldn't be allowed to write a parser that automatically generates those anchors for you.

          Just for the fun of the words, it wasn't meant seriously

          so we can't publish them.

          I see, at least I know where to look for them.

          To come to an end, I might not fully understand all your arguments but that's mostly because I'm not familiar enough with Doxia's architecture. If I look back to where this issue started, I can only repeat you did a good job and feel this issue is ready to be closed, thanks Lukas!

          Lukas Theussl added a comment -

          Thank you for the thread! I hope this can serve as a reference for future doubts...

          I have opened DOXIA-238 and DOXIA-239 for some of your intermediary comments, please feel free to file anything else I might have overlooked.


            People

            • Assignee:
              Lukas Theussl
              Reporter:
              Benjamin Bentmann
            • Votes:
              0
              Watchers:
              0
