jira.codehaus.org

  • Groovy Modules
  • GMOD-203

Groovy XML-RPC uses ISO-8859-1 for the XML-RPC response even if it is declared as UTF-8

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: XML-RPC
  • Labels: None

Description

The XMLRPCMessageProcessor class currently hard-codes ISO-8859-1 as the character set for its InputStreamReader and for the contents of <base64> elements. It should be possible either to specify the expected response encoding when constructing an XMLRPCServerProxy, or for the class to determine the appropriate character set from the HTTP headers, the XML declaration, and possibly the response content itself. The latter is obviously preferable, but being able to specify the expected response encoding would be a step in the right direction.

Activity

Mike Dillon added a comment - 10/Jan/08 11:14 PM

Any word on this? I've been able to work around it on a service that I know is UTF-8 by using the following code on all string values, but this is obviously not ideal:

string = new String(string.getBytes("ISO-8859-1"), "UTF-8")
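
To illustrate why that one-liner works, here is a minimal Java sketch. It assumes the service really sends UTF-8 and that the client decoded the bytes as ISO-8859-1; the class and method names are made up for the example and are not part of Groovy XML-RPC.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Groovy XML-RPC module.
class MojibakeRepair {
    // ISO-8859-1 maps every byte to exactly one char, so re-encoding the
    // wrongly decoded string recovers the original bytes, which can then
    // be decoded as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String sent = "caf\u00e9";                       // what the server meant
        byte[] wire = sent.getBytes(StandardCharsets.UTF_8);
        // What a client hard-coded to ISO-8859-1 ends up with:
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.equals(sent));         // false
        System.out.println(repair(misdecoded).equals(sent)); // true
    }
}
```

The repair is only safe when the response is known to be UTF-8; applied to text that really was ISO-8859-1, it would corrupt any non-ASCII characters.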
John Wilson added a comment - 11/Jan/08 3:57 AM

There is no problem with the Base64 code. Base64 is an encoding of a byte stream. The Base64 decoder uses a StringBuffer to accumulate bytes held in characters. You end up with a String with all characters having a value between 0 and 255. Calling this.buffer.toString().getBytes("ISO-8859-1") just turns this into a byte array with each byte being the lower eight bits of each character (because the Unicode code points 0 through 255 are identical to the ISO-8859-1 character set).
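
The pass-through described above can be checked with a short Java sketch. The class below is a stand-in written for this illustration, not the module's actual Base64 decoder; it only mimics the "bytes held in characters" idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative stand-in; the real decoder's internals may differ.
class Latin1PassThrough {
    // Store each byte in one char with a value between 0 and 255.
    static String bytesToChars(byte[] bytes) {
        StringBuilder buffer = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            buffer.append((char) (b & 0xFF));
        }
        return buffer.toString();
    }

    // Encoding as ISO-8859-1 keeps the lower eight bits of each char.
    static byte[] charsToBytes(String chars) {
        return chars.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = charsToBytes(bytesToChars(all));
        System.out.println(Arrays.equals(all, back)); // true
    }
}
```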

The hard coding of the InputStreamReader to ISO-8859-1 is a problem. Strictly, there is some detection magic that the system should perform to guess the encoding of the input stream. However, experience with many XML-RPC implementations leads me to believe that the "right" way of getting the encoding often leads to getting it wrong. There are a large number of PHP and C implementations which send ISO-8859-1 without an XML declaration. The standard XML encoding guessing algorithm misidentifies this as UTF-8.
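
A small Java sketch of that failure mode, assuming a sender that emits ISO-8859-1 bytes with no XML declaration (the payload and class name are invented for the example). Falling back to UTF-8, as the XML default implies, turns the non-ASCII byte into a replacement character:

```java
import java.nio.charset.StandardCharsets;

// Illustration only; the payload is made up.
class Utf8DefaultPitfall {
    static String decodeAsUtf8(byte[] body) {
        // Malformed UTF-8 sequences are replaced with U+FFFD.
        return new String(body, StandardCharsets.UTF_8);
    }

    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String payload = "<string>caf\u00e9</string>";
        // A "broken" sender: ISO-8859-1 bytes, no XML declaration.
        byte[] wire = payload.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeAsLatin1(wire).equals(payload));   // true
        System.out.println(decodeAsUtf8(wire).contains("\uFFFD")); // true
    }
}
```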

However, you should reasonably expect our implementation to work properly when the other end of the protocol is not broken! How about having a parameter to the XMLRPCServer and XMLRPCServerProxy which makes the code use auto-detection rather than the hard-coded ISO-8859-1 encoding?

Mike Dillon added a comment - 11/Jan/08 9:49 AM

Thanks for the response. I'll admit that I didn't look too closely at the Base64 part, so thanks for the explanation. I see now that ISO-8859-1 is being used as a pass-through for the underlying bytes, which is obviously fine.

As for having a "detectEncoding" flag, that sounds good. It may also make sense to have a parallel option for forcing the encoding to a known value (e.g. Windows-1252, as opposed to ISO-8859-1, for a broken service without an XML declaration). For proper services, you'd use detectEncoding; for broken ISO-8859-1 services you'd do nothing; and for anything else broken you'd have to force the encoding.
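
The three cases above could be sketched roughly as follows in Java. Everything here is hypothetical: the class, the `choose` method, and its parameters are invented to illustrate the proposal and do not reflect the module's API. The detection step is also deliberately simplified: it only reads an `encoding` pseudo-attribute from an ASCII-compatible XML declaration and otherwise assumes UTF-8, whereas full XML auto-detection also considers byte order marks and UTF-16/UTF-32.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical policy sketch; names and signature are assumptions.
class EncodingPolicy {
    private static final Pattern DECLARED = Pattern.compile(
        "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset choose(byte[] body, boolean detectEncoding, Charset forcedEncoding) {
        if (forcedEncoding != null) {
            return forcedEncoding;              // broken service, known encoding
        }
        if (!detectEncoding) {
            return StandardCharsets.ISO_8859_1; // legacy default: do nothing
        }
        // Decode a short prefix as ISO-8859-1 just to read the declaration.
        int prefixLength = Math.min(body.length, 200);
        String head = new String(body, 0, prefixLength, StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        return StandardCharsets.UTF_8;          // XML default without a declaration
    }

    public static void main(String[] args) {
        byte[] body = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodResponse/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(choose(body, true, null));
        System.out.println(choose(body, false, null));
    }
}
```

Giving the forced encoding precedence over detection is a design choice made for this sketch; the thread does not say how the two options should interact.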

John Wilson added a comment - 12/Jan/08 5:46 AM

I have committed a patch which provides new constructors for XMLRPCServer and XMLRPCServerProxy. They take an extra boolean parameter. Supplying true causes the code to auto detect the document encoding as per the XML spec (I hope!).

Can you give it a try and close this issue if it works for you?

Russel Winder added a comment - 12/Jan/08 6:13 AM

I have bumped the version number so these changes apply to 0.5.0-SNAPSHOT not 0.4.

Mike Dillon added a comment - 14/Jan/08 11:11 PM

I gave the new flag a try on a UTF-8 XML-RPC service at work and it worked as expected.

Thanks for the fix.


People

  • Assignee: John Wilson
  • Reporter: Mike Dillon

Dates

  • Created: 25/Dec/07 1:35 PM
  • Updated: 23/Mar/11 8:33 AM
  • Resolved: 12/Jan/08 5:46 AM