jira.codehaus.org

JiBX / JIBX-31

Encoding issues

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.0-beta3c
  • Fix Version/s: 1.0-RC0
  • Component/s: core
  • Labels: None
  • Environment: Linux + Windows, JDK 1.4.2_02
  • Number of attachments: 1

Description

Encoding and decoding French accented characters fails with UTF-8 encoding.
There is also another bug with ISO-8859-1, which sometimes generates lots of spurious characters, e.g.:
<CT_Adresse>rte Saint Antoine de GinestiÃÆ@™Ãƒ@ @D¢ÃƒÆ’@šÃ@šÃȁK;¨re</CT_Adresse>
instead of "rte st Antoine de Ginestière".
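
The garbled value appears consistent with UTF-8 bytes being decoded as ISO-8859-1, possibly more than once. The minimal sketch below illustrates only that symptom; it is not JiBX code, the class name MojibakeDemo is hypothetical, and it assumes a modern JDK (7+) rather than the 1.4.2 listed in the environment. The two UTF-8 bytes of "è" (0xC3 0xA8) read as ISO-8859-1 give "Ã¨", and repeating the mistake makes the text grow further.

```java
import java.nio.charset.StandardCharsets;

/**
 * Sketch (hypothetical class, not JiBX code): reproduces the symptom of
 * UTF-8 bytes being decoded as ISO-8859-1.
 */
public class MojibakeDemo {

    /** Encodes text as UTF-8, then wrongly decodes those bytes as ISO-8859-1. */
    static String misdecodeOnce(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "Ginestiere" with U+00E8, written as a Unicode escape to avoid source-encoding issues
        String original = "Ginesti\u00E8re";
        String once = misdecodeOnce(original);   // U+00E8 becomes U+00C3 U+00A8
        String twice = misdecodeOnce(once);      // each of those two chars expands again
        for (String s : new String[] {original, once, twice}) {
            System.out.println(s + " (" + s.length() + " chars)");
        }
    }
}
```

Running main prints the three strings with lengths of 10, 11 and 13 characters: every wrong decoding pass turns each non-ASCII character into two.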

Attachments

  1. XMLEncodingTest.java (Java source file, 2 kB), attached by Henri Dupre on 07/Jul/04 11:01 AM

Activity

Henri Dupre added a comment - 07/Jul/04 11:01 AM

This is a JUnit test case that reproduces the UTF-8 encoding bug.
The class com.actualis.sage.Client can be replaced by any class with a String field.
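
The attached test is not reproduced here. As a rough, JDK-only stand-in, the sketch below checks the round-trip property such a test relies on: text written in a given charset must read back unchanged. The class EncodingRoundTrip is hypothetical, and a real test would instead marshal and unmarshal a bound object (such as the Client class) through JiBX rather than a plain Writer and Reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;

/**
 * JDK-only sketch (hypothetical class) of an encoding round-trip check:
 * write text in a charset, read it back in the same charset, compare.
 */
public class EncodingRoundTrip {

    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (Writer out = new OutputStreamWriter(bytes, cs)) {
                out.write(text);
            }
            StringBuilder result = new StringBuilder();
            try (Reader in = new InputStreamReader(
                    new ByteArrayInputStream(bytes.toByteArray()), cs)) {
                char[] buf = new char[256];
                int n;
                while ((n = in.read(buf)) != -1) {
                    result.append(buf, 0, n);
                }
            }
            return result.toString();
        } catch (IOException e) {
            // In-memory streams should not fail; rethrow unchecked to keep callers simple
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // French accents from the report, written as Unicode escapes
        String sample = "rte Saint Antoine de Ginesti\u00E8re \u00E9\u00E0\u00E7";
        for (String cs : new String[] {"UTF-8", "ISO-8859-1"}) {
            boolean ok = sample.equals(roundTrip(sample, cs));
            System.out.println(cs + ": " + (ok ? "ok" : "MISMATCH"));
        }
    }
}
```

Both charsets should report "ok" for this sample, since all of its characters exist in ISO-8859-1; a character outside that charset (for example the euro sign) would come back as "?" with ISO-8859-1.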

Dennis Sosnoski added a comment - 07/Jul/04 3:06 PM

In my testing I haven't been able to recreate any problems when using ISO-8859-1 encoding. There does appear to be an issue with UTF-8 input when the encoding is not specified by the user, but instead determined by the parser. At present this looks like a parser issue, but I'm investigating further.
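
When no encoding is supplied, an XML processor has to infer it from the first bytes of the document, as described in Appendix F of the XML 1.0 specification: check for a byte order mark, then for an encoding declaration, and otherwise default to UTF-8. The simplified sketch below follows that order; the class name XmlEncodingSniffer and the exact set of cases are assumptions for illustration, not the parser's or JiBX's code, and UTF-32 and EBCDIC inputs are deliberately not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Simplified sketch (hypothetical class) of XML encoding autodetection along
 * the lines of XML 1.0 Appendix F. UTF-32 and EBCDIC inputs are not handled.
 */
public class XmlEncodingSniffer {

    // encoding="..." or encoding='...' inside a leading XML declaration
    private static final Pattern ENCODING_DECL = Pattern.compile(
        "^<\\?xml[^>]*?\\sencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns a charset name guessed from the first bytes of a document. */
    static String detect(byte[] head) {
        int b0 = byteAt(head, 0), b1 = byteAt(head, 1);
        int b2 = byteAt(head, 2), b3 = byteAt(head, 3);
        // 1. Byte order marks
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE";
        // 2. "<?" in UTF-16 without a BOM
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        // 3. "<?xm" in an ASCII-compatible encoding: read the declaration
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) {
            String decl = new String(head, StandardCharsets.ISO_8859_1);
            Matcher m = ENCODING_DECL.matcher(decl);
            if (m.find()) return m.group(1);
        }
        // 4. No BOM and no encoding declaration: XML defaults to UTF-8
        return "UTF-8";
    }

    private static int byteAt(byte[] data, int i) {
        return i < data.length ? data[i] & 0xFF : -1;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><CT_Adresse/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(doc)); // ISO-8859-1
    }
}
```

A wrapper built on this idea would buffer the first few bytes, call detect, and then hand the stream to a Reader for the detected charset.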

Dennis Sosnoski added a comment - 12/Jul/04 5:33 PM

I added an input stream wrapper that detects the character encoding and processes the input accordingly. This should work for everything except some of the weird 16-bit character encodings (for UTF-16 it should be okay). I also found an error in the UTF-8 encoding code, which only affected characters in the 0x800-0x3FFF range. Both sets of problems should now be fixed in the CVS code.
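
The faulty encoding code is not shown in the comment. For reference, the sketch below is a minimal per-code-point UTF-8 encoder (hypothetical class Utf8Sketch, not the JiBX implementation). Characters from U+0800 to U+FFFF, which includes the 0x800-0x3FFF range mentioned above, need the three-byte form 1110xxxx 10xxxxxx 10xxxxxx.

```java
import java.io.ByteArrayOutputStream;

/**
 * Sketch (hypothetical class, not the JiBX implementation) of a per-code-point
 * UTF-8 encoder, showing the byte layout for each range.
 */
public class Utf8Sketch {

    static byte[] encodeCodePoint(int cp) {
        if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + cp);
        }
        if (cp < 0x80) {            // 0xxxxxxx
            return new byte[] {(byte) cp};
        }
        if (cp < 0x800) {           // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        if (cp < 0x10000) {         // 1110xxxx 10xxxxxx 10xxxxxx, covers U+0800..U+FFFF
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx, supplementary characters
        return new byte[] {
            (byte) (0xF0 | (cp >> 18)),
            (byte) (0x80 | ((cp >> 12) & 0x3F)),
            (byte) (0x80 | ((cp >> 6) & 0x3F)),
            (byte) (0x80 | (cp & 0x3F))};
    }

    /** Encodes a well-formed Java string; surrogate pairs are combined first. */
    static byte[] encode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            byte[] b = encodeCodePoint(cp);
            out.write(b, 0, b.length);
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // U+0800 is the first code point that needs three bytes: E0 A0 80
        for (byte b : encode("\u0800")) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```

Comparing encode against String.getBytes(StandardCharsets.UTF_8) on sample text that spans every range is a quick way to catch range-specific mistakes of the kind described above.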

People

  • Assignee: Dennis Sosnoski
  • Reporter: Henri Dupre

Dates

  • Created: 07/Jul/04 10:59 AM
  • Updated: 13/Apr/05 6:33 PM
  • Resolved: 25/Mar/05 2:16 PM