Here's an example using the double-dagger character, codepoint 8225 (U+2021), which is %E2%80%A1 in UTF-8. (Preview tells me that Jira is mucking up the character itself, so I'll refer to it by codepoint.)
If the client generates this (say, for use with jQuery) but uses escape() instead of encodeURIComponent(), the resultant string is the non-standard %u2021 rather than %E2%80%A1.
Java's URLDecoder handles this correctly with UTF-8: it rejects the input, throwing an IllegalArgumentException ("Illegal hex characters in escape (%) pattern"), since 'u' is not a hex digit.
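To make the contrast concrete, here is a minimal sketch of the two cases using only java.net.URLDecoder (the class name EscapeDemo is mine; the inputs are the two encodings of U+2021 described above):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class EscapeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // encodeURIComponent()-style escaping: each UTF-8 byte percent-encoded.
        // Decodes cleanly back to the double-dagger character.
        System.out.println(URLDecoder.decode("%E2%80%A1", "UTF-8"));

        // escape()-style escaping: a non-standard %uXXXX sequence.
        // 'u' is not a hex digit, so URLDecoder rejects it.
        try {
            URLDecoder.decode("%u2021", "UTF-8");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```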
org.mortbay.util.UrlEncoded.decodeTo(...,"UTF-8") appears to catch the invalid escape, and throw a NumberFormatException at line 653.
However, decodeUtf8To() does not catch this error and instead silently produces a garbage string.
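I haven't traced the exact arithmetic in TypeUtil.convertHexDigit(), but the failure mode looks like the following sketch: a converter that assumes its input is already a valid hex digit and does pure bit arithmetic, so an invalid character maps to an arbitrary nibble instead of failing (the method here is my own illustration, not Jetty's code):

```java
public class HexDemo {
    // Hypothetical unvalidated converter, in the spirit of what
    // TypeUtil.convertHexDigit() appears to do: assumes the input is
    // [0-9a-fA-F] and performs arithmetic with no range check.
    static int convertHexDigitUnchecked(int c) {
        return (c > '9' ? (c & 0x1f) + 9 : c - '0') & 0xf;
    }

    public static void main(String[] args) {
        // Valid digits work as expected...
        System.out.println(convertHexDigitUnchecked('a')); // 10
        // ...but the 'u' in "%u2021" silently maps to a nonsense nibble
        // (14 here) instead of failing, so the decoded string is garbage.
        System.out.println(convertHexDigitUnchecked('u')); // 14
    }
}
```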
It appears that this can be corrected by one of:
- switching HttpURI.decodeQueryTo() to use only the safer version of the decoder
- having the internal UTF-8 decoder check that each escape character is actually a hex digit ([0-9a-fA-F])
- having TypeUtil.convertHexDigit() throw an IllegalArgumentException (or similar) for invalid hex characters
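The third option could be sketched as follows (a validating converter of my own; not a patch against Jetty's actual TypeUtil source):

```java
public class StrictHexDemo {
    // Sketch of the third option: a hex-digit converter that rejects
    // anything outside [0-9a-fA-F] instead of returning garbage.
    static int convertHexDigitStrict(int c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw new IllegalArgumentException("Invalid hex digit: " + (char) c);
    }

    public static void main(String[] args) {
        System.out.println(convertHexDigitStrict('F')); // 15
        try {
            convertHexDigitStrict('u'); // the 'u' in escape()'s %u2021
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This would turn the silent corruption in decodeUtf8To() into an immediate, diagnosable failure, matching what URLDecoder already does.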
For our use, we'll probably go with the first option and run a locally built patched copy for now.
My remaining question concerns the existence of the internal UTF-8 decoder itself: is it there for performance reasons, for legacy reasons, or to get around the UnsupportedCharsetException?