Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0-beta3c
    • Fix Version/s: 1.0-RC0
    • Component/s: core
    • Labels:
      None
    • Environment:
      Linux + Windows, JDK 1.4.2_02
    • Number of attachments :
      1

      Description

      Encoding and decoding French accented characters fails when using UTF-8 encoding.
      There is also another bug with ISO-8859-1 which generates lots of escape characters sometimes eg:<CT_Adresse>rte Saint Antoine de Ginesti√ƒ∆’√†@™√ƒ@ @DĘ√ƒ∆’@Ň°√ƒ@š√ȁK;¬®re</CT_Adresse>
      for rte st Antoine de Ginestière
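Runs of stray characters like the one quoted above often arise when bytes written in one encoding are read back in another, and the damage grows each time the mistake is repeated. Whether that is what happened in this report is not established; the sketch below only illustrates the general effect using the JDK's own charsets. The class and method names are hypothetical and have nothing to do with the library under discussion.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: shows how decoding UTF-8 bytes as ISO-8859-1
// turns one accented character into several unrelated ones.
class MojibakeDemo {
    // Encode as UTF-8, then (wrongly) decode the bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset: the text survives unchanged.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Ginesti\u00e8re";   // "Ginestière"
        String once = misdecode(original);     // è becomes two characters
        String twice = misdecode(once);        // each of those becomes two more
        // prints "10 -> 11 -> 13"
        System.out.println(original.length() + " -> " + once.length() + " -> " + twice.length());
    }
}
```

Each mismatched pass lengthens the string, which is why a single accented letter can end up as a long run of symbols after several encode/decode cycles.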

        Activity

        Henri Dupre added a comment -

        This is a JUnit test case that reproduces the UTF-8 encoding bug.
        The class com.actualis.sage.Client can be replaced by any class with a String field.

        Dennis Sosnoski added a comment -

        In my testing I haven't been able to recreate any problems when using ISO-8859-1 encoding. There does appear to be an issue with UTF-8 input when the encoding is not specified by the user, but instead determined by the parser. At present this looks like a parser issue, but I'm investigating further.
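When the caller does not name an encoding, it has to be inferred from the first bytes of the document. A minimal sketch of that idea follows, assuming detection by byte order mark and by the `encoding` pseudo-attribute of the XML declaration. It is not the parser's or the library's actual code; `EncodingSniffer`, `wrap`, `detect`, and the 64-byte peek window are all hypothetical choices made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses the encoding of an XML byte stream
// without consuming any of it.
class EncodingSniffer {
    static final int PEEK = 64; // assumed window; longer declarations are missed
    private static final Pattern ENCODING =
        Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // The pushback buffer must be able to hold everything detect() peeks at.
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, PEEK);
    }

    static String detect(PushbackInputStream in) throws IOException {
        byte[] head = new byte[PEEK];
        int n = 0;
        while (n < PEEK) {
            int r = in.read(head, n, PEEK - n);
            if (r < 0) break;
            n += r;
        }
        // Push everything back (BOM included); the caller still has to skip a BOM.
        if (n > 0) in.unread(head, 0, n);

        int b0 = n > 0 ? head[0] & 0xFF : -1;
        int b1 = n > 1 ? head[1] & 0xFF : -1;
        int b2 = n > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";

        // No BOM: the declaration itself is ASCII in ASCII-compatible encodings.
        String prefix = new String(head, 0, n, StandardCharsets.ISO_8859_1);
        if (prefix.startsWith("<?xml")) {
            int end = prefix.indexOf("?>");
            String decl = end >= 0 ? prefix.substring(0, end) : prefix;
            Matcher m = ENCODING.matcher(decl);
            if (m.find()) return m.group(1);
        }
        return "UTF-8"; // XML default when nothing else is declared
    }
}
```

A caller would wrap the raw stream once with `EncodingSniffer.wrap(...)`, call `detect`, and then build an `InputStreamReader` over the same wrapped stream with the returned name.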

        Dennis Sosnoski added a comment -

        I added an input stream wrapper that detects the character encoding and processes the input accordingly. This should work for everything except some of the weird 16-bit character encodings (UTF-16 should be okay). I also found an error in the UTF-8 encoding code, which only affected characters in the 0x800-0x3FFF range. Both sets of problems should now be fixed in the CVS code.
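The range mentioned in the comment above falls in the part of UTF-8 that uses three bytes per character. The comment does not say what the error was, so the sketch below does not try to reproduce it. It only shows the standard one-, two-, and three-byte layouts written by hand and compares them with the JDK's encoder at the boundaries, which is the kind of check that would expose a mistake confined to one range. `Utf8Check` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative hand-written UTF-8 encoder for a single BMP character,
// checked against the JDK's own encoder.
class Utf8Check {
    // Assumes c is not a surrogate (0xD800-0xDFFF).
    static byte[] encode(char c) {
        if (c < 0x80) {                 // 1 byte: 0xxxxxxx
            return new byte[] { (byte) c };
        } else if (c < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (c >> 6)),
                (byte) (0x80 | (c & 0x3F)) };
        } else {                        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (c >> 12)),
                (byte) (0x80 | ((c >> 6) & 0x3F)),
                (byte) (0x80 | (c & 0x3F)) };
        }
    }

    // True when the hand-written encoder agrees with the JDK.
    static boolean matchesJdk(char c) {
        byte[] expected = String.valueOf(c).getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(expected, encode(c));
    }

    public static void main(String[] args) {
        // Probe both sides of each length boundary and of the reported range.
        char[] probes = { 0x7F, 0x80, 0xE8, 0x7FF, 0x800, 0x3FFF, 0x4000, 0xFFFD };
        for (char c : probes) {
            System.out.println("U+" + Integer.toHexString(c).toUpperCase()
                + " " + (matchesJdk(c) ? "ok" : "MISMATCH"));
        }
    }
}
```

Note that French accented letters such as è (U+00E8) sit in the two-byte range, so a fault limited to 0x800 and above would not by itself explain problems with them.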


          People

          • Assignee:
            Dennis Sosnoski
            Reporter:
            Henri Dupre
          • Votes:
            0
            Watchers:
            1

            Dates

            • Created:
              Updated:
              Resolved: