Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.0-alpha-10
    • Fix Version/s: None
    • Component/s: Sink API
    • Labels: None
    • Number of attachments: 1

        Activity

        Vincent Siveton added a comment - - edited

        I mean in the Sink API

        Lukas Theussl added a comment -

        The encoding of source/target files is set in the Reader/Writer of a Parser/Sink, i.e. outside of Doxia. So I don't understand why this should be necessary or useful. Or am I missing something?

        Vincent Siveton added a comment -

        Actually, yes it is, but do we want to declare the character encoding directly in Doxia files, like HTML does [1]?

        [1] http://www.w3.org/TR/html4/types.html

        Lukas Theussl added a comment -

        For XML files this is already done: DOXIA-133. For text files one would have to specify how to indicate an encoding on a per-file basis, e.g. via some meta-information. But in any case, the file encoding would have to be detected by some peek-ahead, i.e. outside Doxia, in order to construct the Reader, as ReaderFactory.newXmlReader does in the XML case. So I don't think it's necessary to modify the Sink API.
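The peek-ahead idea above can be sketched roughly as follows. This is only an illustration, not the ReaderFactory implementation: the class and method names (`XmlEncodingPeek`, `detect`, `newReader`) and the 256-byte limit are assumptions made for the example. It ignores byte-order marks and only handles ASCII-compatible encodings.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlEncodingPeek {
    // Hypothetical limit: how many bytes to inspect before giving up.
    private static final int PEEK_LIMIT = 256;

    // Matches the encoding pseudo-attribute of an XML declaration.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Reads the first bytes, looks for an encoding declaration, then rewinds.
    // The stream must support mark/reset. Falls back to UTF-8 when nothing is declared.
    static Charset detect(InputStream in) {
        try {
            in.mark(PEEK_LIMIT);
            byte[] head = new byte[PEEK_LIMIT];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            in.reset();
            // ISO-8859-1 maps every byte to a char, so the ASCII prolog survives decoding.
            String prolog = new String(head, 0, n, StandardCharsets.ISO_8859_1);
            Matcher m = ENCODING.matcher(prolog);
            return m.find() ? Charset.forName(m.group(1)) : StandardCharsets.UTF_8;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the Reader outside the parser, using the detected charset.
    static Reader newReader(InputStream raw) {
        InputStream in = raw.markSupported() ? raw : new BufferedInputStream(raw);
        return new InputStreamReader(in, detect(in));
    }
}
```

The parser then receives a ready-made Reader and never needs to know which encoding the file used, which is the point being made about leaving the Sink API untouched.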

        Vincent Siveton added a comment -

        Lukas, the only need I see is for XML files, especially for XhtmlSink. We need to find a way to add:

        <?xml version="1.0" encoding="UTF-8"?>
        

        and

        <meta http-equiv=content-type content="text/html; charset=UTF-8">
        
        Vincent Siveton added a comment -

        Trivial patch; it needs to be reviewed.

        Herve Boutemy added a comment -

        Adding the meta tag for pure XML, with an XML prolog

        <?xml version="1.0" ?>

        isn't really useful: the prolog defines the encoding, either implicitly (UTF-8 by default) or explicitly.

        Adding the meta tag is needed when there is no XML prolog: in this case, without the meta tag, you can't know which encoding was used when writing the content. (XHTML without an XML prolog but with a meta tag isn't really XML, since it does not conform to the XML encoding convention, but that's another story.)

        So you should add either the XML prolog or the meta tag; adding both isn't really useful.

        From an API point of view, this requires the Sink API to know what encoding was used to create the Writer.

        About the proposed patch: the Sink API is about output.
        Having an inputEncoding attribute doesn't make sense there: the Parser API could possibly have this attribute, set along with the Reader that uses the input encoding.

        To pass the encoding to the Sink when rendering output, RenderingContext should be used, either by adding an outputEncoding attribute or by using the attributes Map.
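A minimal sketch of the "prolog or meta tag, not both" rule, assuming the sink has been told the output encoding by some means. The class `EncodingDeclaration` and its methods are hypothetical names for illustration only; they are not part of the Sink API or RenderingContext.

```java
public class EncodingDeclaration {
    // XML prolog that declares the encoding explicitly.
    static String xmlProlog(String encoding) {
        return "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>";
    }

    // Meta tag for output that carries no XML prolog.
    static String metaTag(String encoding) {
        return "<meta http-equiv=\"Content-Type\" content=\"text/html; charset="
                + encoding + "\"/>";
    }

    // Emits exactly one of the two declarations, never both.
    static String declarationFor(String encoding, boolean writeProlog) {
        return writeProlog ? xmlProlog(encoding) : metaTag(encoding);
    }
}
```

Whatever the final mechanism, the encoding passed here has to be the same one used to construct the Writer, otherwise the declaration would misdescribe the bytes actually written.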

        Vincent Siveton added a comment -

        After some discussions with Hervé, here is a summary:

        • apt sink produces ASCII document
        • xml sinks produce UTF-8 documents

        What should we do with other text sinks, like TWiki or Confluence? I propose escaping all characters, similar to APT.
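One way the escaping proposal might look, assuming a `\uXXXX` escape form for every character outside ASCII so that the output stays pure ASCII whatever Writer encoding is used. The class and method names are hypothetical, and whether TWiki or Confluence markup would accept such escapes is not established in this discussion.

```java
public class AsciiEscapeSketch {
    // Replaces each non-ASCII UTF-16 code unit with a backslash, the letter u,
    // and four uppercase hex digits. ASCII characters pass through unchanged.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

Because the loop works on UTF-16 code units, a character outside the Basic Multilingual Plane would come out as two escapes, one per surrogate.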


          People

          • Assignee:
            Unassigned
            Reporter:
            Vincent Siveton
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated: