Key: GROOVY-2446
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: John Wilson
Reporter: Mike Dillon
Votes: 0
Watchers: 0

Groovy XML-RPC uses ISO-8859-1 for the XML-RPC response even if it is declared as UTF-8

Created: 25/Dec/07 01:35 PM   Updated: 15/Jan/08 01:32 AM
Component/s: XML-RPC
Affects Version/s: None
Fix Version/s: 1.5.2


Description
The XMLRPCMessageProcessor class currently hard-codes ISO-8859-1 as the character set for its InputStreamReader and for the contents of <base64> elements. It should be possible either to specify the expected response encoding when constructing an XMLRPCServerProxy, or for the class to determine the appropriate character set from the HTTP headers, the XML declaration, and possibly the response content itself. The latter is obviously preferable, but being able to specify the expected response encoding would be a step in the right direction.

Comments
Mike Dillon - 10/Jan/08 11:14 PM
Any word on this? I've been able to work around it on a service that I know is UTF-8 by using the following code on all string values, but this is obviously not ideal:
string = new String(string.getBytes("ISO-8859-1"), "UTF-8")
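
For illustration, here is a minimal Java sketch of why that one-liner works. It assumes the service really sends UTF-8 and that the client decoded those bytes as ISO-8859-1; the class and method names are hypothetical and not part of Groovy XML-RPC.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Groovy XML-RPC.
class Utf8Repair {
    // Recover the original bytes from the mis-decoded string, then decode them as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                          // "café"
        byte[] wire = original.getBytes(StandardCharsets.UTF_8); // bytes as sent by the service
        // Simulate the bug: the UTF-8 bytes are read as ISO-8859-1.
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());                 // one char per byte
        System.out.println(repair(misdecoded).equals(original));
    }
}
```

The round trip is lossless only because ISO-8859-1 maps every byte value to exactly one character. The same trick is not safe in general with other single-byte charsets that leave some byte values unmapped.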

John Wilson - 11/Jan/08 03:57 AM
There is no problem with the Base64 code. Base64 is an encoding of a byte stream. The Base64 decoder uses a StringBuffer to accumulate bytes held in characters. You end up with a String with all characters having a value between 0 and 255. Calling this.buffer.toString().getBytes("ISO-8859-1") just turns this into a byte array with each byte being the lower eight bits of each character (because the Unicode code points 0 through 255 are identical to the ISO-8859-1 character set).
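
The pass-through described above can be checked with a small Java sketch. This is not the decoder's actual code: the class name, the method name, and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demonstration, not the Groovy XML-RPC Base64 decoder.
class Latin1PassThrough {
    // Turn a buffer of chars in the range 0..255 back into the bytes they stand for.
    static byte[] toBytes(StringBuffer buffer) {
        return buffer.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] decoded = {0x00, 0x7f, (byte) 0x80, (byte) 0xe9, (byte) 0xff};
        StringBuffer buffer = new StringBuffer();
        for (byte b : decoded) {
            buffer.append((char) (b & 0xff)); // store each byte as a char with the same value
        }
        // Every byte value survives the trip through the String unchanged.
        System.out.println(Arrays.equals(decoded, toBytes(buffer)));
    }
}
```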

The hard coding of the InputStreamReader to ISO-8859-1 is a problem. Strictly, there is some detection magic that the system should perform to guess the encoding of the input stream. However, experience with many XML-RPC implementations leads me to believe that the "right" way of getting the encoding often leads to getting it wrong. There are a large number of PHP and C implementations which send ISO-8859-1 without an XML declaration. The standard XML encoding guessing algorithm misidentifies this as UTF-8.
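
To make the failure mode concrete, here is a deliberately simplified Java sketch of declaration-based detection. It is an assumption-laden illustration, not the code in XMLRPCMessageProcessor: it checks only for a UTF-8 byte order mark and an encoding pseudo-attribute in the XML declaration, ignores UTF-16/UTF-32 and HTTP headers, and falls back to UTF-8 as the XML specification prescribes. All names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified sniffer; real detection also handles UTF-16/UTF-32 and HTTP headers.
class XmlEncodingSniffer {
    private static final Pattern ENCODING = Pattern.compile(
        "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    static Charset detect(byte[] doc) {
        // A UTF-8 byte order mark settles the question.
        if (doc.length >= 3 && (doc[0] & 0xff) == 0xEF
                && (doc[1] & 0xff) == 0xBB && (doc[2] & 0xff) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // The declaration is ASCII-compatible, so reading the head as ISO-8859-1 is safe.
        String head = new String(doc, 0, Math.min(doc.length, 200), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            return Charset.forName(m.group(1)); // throws if the named charset is unsupported
        }
        return StandardCharsets.UTF_8; // default when there is no BOM and no declaration
    }

    public static void main(String[] args) {
        // An ISO-8859-1 response with no XML declaration: 0xE9 is "é" in that charset.
        byte[] broken = "<value>caf\u00e9</value>".getBytes(StandardCharsets.ISO_8859_1);
        Charset guessed = detect(broken);
        System.out.println(guessed);
        // Decoding with the guess mangles the accented character into U+FFFD.
        System.out.println(new String(broken, guessed).contains("\uFFFD"));
    }
}
```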

However you should reasonably expect our implementation to work properly when the other end of the protocol is not broken! How about having a parameter to the XMLRPCServer and XMLRPCServerProxy which makes the code use auto detect rather than hard coded ISO-8859-1 encoding?


Mike Dillon - 11/Jan/08 09:49 AM
Thanks for the response. I'll admit that I didn't look too closely at the Base64 part, so thanks for the explanation. I see now that ISO-8859-1 is being used as a pass-through for the underlying bytes, which is obviously fine.

As for having a "detectEncoding" flag, that sounds good. It may also make sense to have a parallel option for forcing the encoding to a known value (e.g. Windows-1252 for a broken service without an XML declaration, as opposed to ISO-8859-1). For proper services, you'd use detectEncoding; for broken ISO-8859-1 services, you'd do nothing; and for anything else broken, you'd have to force the encoding.
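
The three cases could be expressed as a small selection rule. The Java sketch below is only one possible reading of the proposal: the class, the method, and the precedence (a forced encoding wins over detection, which wins over the ISO-8859-1 default) are assumptions, not an existing Groovy XML-RPC API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical policy helper; none of these names exist in Groovy XML-RPC.
class ResponseEncodingPolicy {
    // forced: explicit override, or null if none was given
    // detectEncoding: whether auto-detection was requested
    // detected: result of auto-detection, or null if it was not run
    static Charset choose(Charset forced, boolean detectEncoding, Charset detected) {
        if (forced != null) {
            return forced;                    // broken service with a known encoding
        }
        if (detectEncoding && detected != null) {
            return detected;                  // well-behaved service
        }
        return StandardCharsets.ISO_8859_1;   // legacy default for broken ISO-8859-1 services
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(choose(cp1252, true, StandardCharsets.UTF_8));
        System.out.println(choose(null, true, StandardCharsets.UTF_8));
        System.out.println(choose(null, false, StandardCharsets.UTF_8));
    }
}
```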


John Wilson - 12/Jan/08 05:46 AM
I have committed a patch which provides new constructors for XMLRPCServer and XMLRPCServerProxy. They take an extra boolean parameter. Supplying true causes the code to auto detect the document encoding as per the XML spec (I hope!).

Can you give it a try and close this issue if it works for you?


Russel Winder - 12/Jan/08 06:13 AM
I have bumped the version number so these changes apply to 0.5.0-SNAPSHOT not 0.4.

Mike Dillon - 14/Jan/08 11:11 PM
I gave the new flag a try on a UTF-8 XML-RPC service at work and it worked as expected.

Thanks for the fix.