Jetty

Character encoding is ignored for decoding URL parameters

Details

  • Type: Bug
  • Status: Closed
  • Priority: Critical
  • Resolution: Fixed
  • Affects Version/s: 6.0.0
  • Fix Version/s: None
  • Component/s: HTTP
  • Labels:
    None
  • Number of attachments :
    0

Description

A Servlet can call ServletRequest.setCharacterEncoding() to define the character set to be used for decoding request parameters. However, org.mortbay.jetty.HttpURI.decodeQueryTo() ignores the encoding parameter and always calls decodeUtf8To(). As a result, encoded URL query parameters are always decoded as UTF-8.

Activity

Greg Wilkins added a comment -

Strictly speaking, you cannot have encoded characters in query string parameters because they are not legal in a URI.

Furthermore, the setCharacterEncoding method is only required to affect the body of a request,
specifically POST requests.

So Jetty is correct in its implementation.

However, we will investigate if it is safe to extend the specification and use the character encoding
for query parameters. If you have any references to how other servers handle this, please attach them to this issue.

Greg Wilkins added a comment -

I have committed to svn trunk a patch that implements this.
However, I am not 100% sure that I will keep this in, as I need to check whether it is correct behaviour or not.
Please try to convince me one way or the other.

Christian d'Heureuse added a comment -

The characters are %-encoded.

Example:
http://localhost/query?title=Kn%E4uel

%E4 is the character "ä" in ISO 8859-1.
Is this an illegal URI?

I understand that the documentation of ServletRequest.setCharacterEncoding() states that it affects the body of the request. It says nothing about the query parameters of a GET request. But the current web browsers (Firefox 1.5, IE6) seem to use the encoding of the HTML page to encode form field values in the URL of a GET request.

The Jetty 5 FAQ article "How do I work with international characters"
( http://jetty.mortbay.org/jetty5/faq/faq_s_900-Content_t_International.html ) says:
"... the Servlet API request.setCharacterEncoding() can be called (before calling any of the .getParameter() methods) to tell the web server which encoding to assume when decoding the query."
(This is for the query parameters of a GET request)

I had a look at the Tomcat source:

Christian d'Heureuse added a comment -

I had a look at the Tomcat source code:

org.apache.tomcat.util.http.Parameters:
http://svn.apache.org/repos/asf/tomcat/connectors/trunk/util/java/org/apache/tomcat/util/http/Parameters.java
Uses the encoding set by Parameters.setQueryStringEncoding(). Default is null.
If queryStringEncoding is null, urlDecode() copies the byte values from the URL 1-to-1 into Unicode characters.

org.apache.catalina.connector.Request:
http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Request.java
If connector.getUseBodyEncodingForURI() and getCharacterEncoding() are set, Parameters.setQueryStringEncoding() is called with getCharacterEncoding().

org.apache.catalina.connector.Connector:
http://svn.apache.org/repos/asf/tomcat/container/tc5.5.x/catalina/src/share/org/apache/catalina/connector/Connector.java
Default for useBodyEncodingForURI is false.
setUseBodyEncodingForURI() is never called from anywhere within the Tomcat source tree.

Conclusion: Tomcat 5.5 treats %-encoded URL query parameter characters as Unicode codepoints and ServletRequest.setCharacterEncoding() has no effect.

Greg Wilkins added a comment -

Well that all looks rather complex!

I will patch jetty now to have similar behaviour. By default, query strings will be interpreted as UTF-8
and ignore the setCharacterEncoding. I will add an option to allow setCharacterEncoding to be
respected.

cheers

Greg Wilkins added a comment -

Added setQueryEncoding to Request.

While this is difficult to access in a Servlet, it is not standard behaviour anyway... so a simple Handler or connector extension will be able to set this on all requests (or just some requests etc.)

Christian d'Heureuse added a comment -

Why do you want to interpret query strings as UTF-8 by default?
Tomcat does not use UTF-8.

Example:
The character 'ä' URL-encoded in UTF-8 is: %C3%A4
Tomcat uses direct encoding: %E4

0xE4 is the Unicode code point for 'ä'.
0xC3 0xA4 is the UTF-8 encoding for the Unicode character 0xE4.

Tomcat does the following in org.apache.tomcat.util.http.Parameters.urlDecode():
...
for (int i = 0; i < bc.getLength(); i++) {
    cbuf[i] = (char) (bbuf[i + start] & 0xff);
}

The byte value bbuf[i] is directly copied into the character value cbuf[i], without UTF-8 decoding.
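
The difference between the two interpretations can be reproduced with a small sketch. It is not Tomcat or Jetty code: the class name PercentDecodingDemo and its helper methods are made up for illustration, and only the standard java.net.URLDecoder is used. The copy loop mirrors the one quoted above, which amounts to decoding the bytes as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Tomcat or Jetty.
final class PercentDecodingDemo {

    // Mirrors the quoted loop: each byte becomes the char with the same value.
    static String copyBytesToChars(byte[] bytes) {
        char[] chars = new char[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            chars[i] = (char) (bytes[i] & 0xff);
        }
        return new String(chars);
    }

    // Percent-decodes a query value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE4};
        String aUmlaut = "\u00E4"; // the character 'ä'

        // The 1-to-1 copy gives the same result as ISO-8859-1 decoding.
        System.out.println(copyBytesToChars(latin1)
                .equals(new String(latin1, StandardCharsets.ISO_8859_1)));

        // %E4 is 'ä' only under ISO-8859-1; %C3%A4 is 'ä' only under UTF-8.
        System.out.println(decode("%E4", "ISO-8859-1").equals(aUmlaut));
        System.out.println(decode("%C3%A4", "UTF-8").equals(aUmlaut));

        // Decoding with the wrong charset does not fail, it silently yields other text.
        System.out.println(decode("%C3%A4", "ISO-8859-1").equals("\u00C3\u00A4"));
        System.out.println(decode("%E4", "UTF-8").equals(aUmlaut)); // false
    }
}
```

The last two lines show why the server-side default matters: neither mismatch raises an error, so the application only sees corrupted parameter values.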

Greg Wilkins added a comment -

Christian,

Because modern browsers are tending towards UTF-8 as the default encoding.

For example if you put http://jetty.mortbay.org/test/dump?a=ä
into firefox, it translates this for you to http://jetty.mortbay.org/test/dump?a=%C3%A4

Some more stupid clients may make the same utf-8 assumption, but not % encode it.

So it is a reasonable default to use utf-8 now.

Greg Wilkins added a comment -

interesting that JIRA correctly decides that ä is not part of the URL in the link from my last entry

Christian d'Heureuse added a comment -

Greg,

Thanks for your comments and for your work.

My Firefox versions 1.5.0.6 and 2.0b1 on Windows XP translate 'ä' to %E4 when I directly type in the above URL.
The same link (...dump?a=ä) with IE6 gives a "HTTP 400 - Bad Request".

But when I click on the link within Thunderbird, 'ä' is converted to %C3%A4 (UTF-8) in Firefox (1.5 and 2.0b1).
When I start Firefox from the command line with the URL as a command line argument, Firefox 1.5 gives %E4 and Firefox 2.0b1 gives %C3%A4.

I agree that UTF-8 is the best method for encoding query parameters in the URL. Only UTF-8 supports all Unicode characters.

When I add accept-charset="UTF-8" to the HTML FORM element, Firefox 1.5 encodes the GET query parameters as UTF-8 (even if the HTML page is ISO 8859-1 encoded). But IE6 ignores the accept-charset setting and uses the HTML page encoding.

I will probably convert the HTML pages of my application to UTF-8 to solve this problem. Google also uses UTF-8 for the HTML pages and for the query string URL, so all modern browsers on all platforms should support this.

I have filed an RFE at http://bugs.sun.com that suggests adding a setQueryCharacterEncoding() method to the ServletRequest class. I will post the link here when the RFE is visible on bugs.sun.com.

Artem Kozarezov added a comment -

Actually, when I last checked, the Jetty implementation correctly used ISO-8859-1 (that is, "no encoding") to decipher the URL. That allows the application to properly decode the URL later on its own.

In the modern world, you can't choose one predefined query decoding, because different browsers with different options work differently. IE6 has an option to use UTF-8, which is on in some versions and off in others, and the user can always change it too. The correct way to handle encoding is to have some known parameter in the query and choose the encoding from that. And ISO-8859-1 is the safest way not to make things worse.

Here is an example how we tackle query decoding:
http://webcrm.ru/dev/JETTY-113/decoding.txt
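
The approach described in this comment could look roughly like the following sketch. It is not the content of the linked decoding.txt, which is not reproduced here. The class Latin1Redecode, the idea of a marker parameter whose value is always "ä", and both method names are assumptions made for illustration. The sketch relies on the fact that ISO-8859-1 maps every byte to exactly one character, so the original bytes can be recovered from a string that the container decoded as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative, not from Jetty or the linked file.
final class Latin1Redecode {

    // Recovers the raw bytes from an ISO-8859-1 decoded string and decodes
    // them again with the charset the application has settled on.
    static String redecode(String latin1Decoded, Charset actual) {
        byte[] raw = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, actual);
    }

    // Assumes the form always sends a marker parameter with the value "ä".
    // After ISO-8859-1 decoding, UTF-8 input shows up as the two characters
    // U+00C3 U+00A4, while ISO-8859-1 input shows up as the single U+00E4.
    static Charset charsetFromMarker(String markerDecodedAsLatin1) {
        if ("\u00C3\u00A4".equals(markerDecodedAsLatin1)) {
            return StandardCharsets.UTF_8;
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // Simulated parameters as the container would deliver them when it
        // decodes the query as ISO-8859-1 but the browser sent UTF-8.
        String marker = "\u00C3\u00A4";
        String title = "Kn\u00C3\u00A4uel";
        Charset detected = charsetFromMarker(marker);
        System.out.println(detected.name());
        System.out.println(redecode(title, detected).equals("Kn\u00E4uel"));
    }
}
```

A marker parameter only distinguishes the encodings it was designed for (here UTF-8 versus ISO-8859-1); other charsets would need additional marker values.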

Artem Kozarezov added a comment -

Jetty always had an option (system property) to specify default URL decoding charset (it was a jetty-util option). Does Jetty have it still? I would like to set charset decoding back to ISO-8859-1 globally, and without writing "a simple Handler or connector extension" for that!

Greg Wilkins added a comment -

Jetty 6 still has the org.mortbay.util.URI.charset system property that can be used to effect the global default.
The HttpServletRequest.setCharacterEncoding(String) method can be used to control the body encoding,
and the Request.setQueryEncoding(String) method can be used to control the encoding of the URL query.

I think UTF-8 is a reasonable default now, as that is what Firefox uses and they have been much more
forward thinking than IE. Does anybody know what IE7 uses as a default?

Artem Kozarezov added a comment -

"Jetty 6 still has the org.mortbay.util.URI.charset system property that can be used to effect the global default."

Thanks.

Artem Kozarezov added a comment -

The javadoc comment in
org.mortbay.util.URIUtil says:
"ISO_8859_1 encoding is used by default for % encoded characters."
though the implementation, a few lines later, is:
"public static final String __CHARSET=System.getProperty("org.mortbay.util.URI.charset",StringUtil.__UTF8);"

You might want to fix the docu.

Greg Wilkins added a comment -

thanks: javadoc fixed.

Christian d'Heureuse added a comment -

Greg,

I don't understand the above discussion.

In Request.java, the default value for _queryEncoding is null. Request.setQueryEncoding() is never called within the Jetty source tree, so _queryEncoding stays null.

In HttpURI.decodeQueryTo(), StringUtil.isUTF8(encoding) returns false because encoding is null. So the optimized decoding routine for UTF-8, UrlEncoded.decodeUtf8To(), is never called.

In UrlEncoded.decodeTo(String,MultiMap,String), charset is set to UTF-8 if it is null.
So the default value for decoding % values within the URL query string is UTF-8 and the system property org.mortbay.util.URI.charset is not used.

Greg Wilkins added a comment -

Oops, somehow this

if (encoding==null)
    encoding=URIUtil.__CHARSET;

did not get checked in.

It is in svn now for 6.1-SNAPSHOT.
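
The intended precedence can be sketched as follows. This is not the Jetty source: QueryCharsetDefault and resolve are hypothetical names, and unlike the static __CHARSET field quoted earlier in this issue, the sketch reads the system property on every call so that it is easy to experiment with.

```java
// Hypothetical sketch of the fallback order; not the actual Jetty code.
final class QueryCharsetDefault {

    // Property name as quoted in this issue.
    static final String PROPERTY = "org.mortbay.util.URI.charset";

    // 1. an explicit per-request query encoding wins,
    // 2. otherwise the system property is used,
    // 3. otherwise UTF-8.
    static String resolve(String requestQueryEncoding) {
        if (requestQueryEncoding != null) {
            return requestQueryEncoding;
        }
        return System.getProperty(PROPERTY, "UTF-8");
    }

    public static void main(String[] args) {
        System.out.println(resolve("ISO-8859-1")); // explicit request setting
        System.out.println(resolve(null));         // property value, or UTF-8 if unset
    }
}
```

Started with -Dorg.mortbay.util.URI.charset=ISO-8859-1, the second call would return ISO-8859-1, which is the global override asked for earlier in the thread.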

Greg Wilkins added a comment -

I found this W3C recommendation that definitely points to UTF-8 as being the way forward!

http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars

Christian d'Heureuse added a comment -

This contradicts http://www.w3.org/TR/html40/interact/forms.html#adef-accept-charset:

"The default value for this attribute is the reserved string "UNKNOWN". User agents may interpret this value as the character encoding that was used to transmit the document containing this FORM element."

This is what the current browsers do for FORM GET parameters. They use the encoding of the HTML page to encode the query parameters in the URI. If the HTML page that contains the FORM element is UTF-8, the query parameters are encoded in UTF-8. But if the HTML page is in ISO-8859-1, the query parameters are also encoded in ISO 8859-1.

RFC 3986 (http://tools.ietf.org/html/rfc3986#page-16) states that new URI schemes should use UTF-8. I think there is currently no standard that explicitly defines the character set encoding of %-encoded HTTP URI query parameters.

Christian d'Heureuse added a comment -

Correct link for the previous comment:
http://www.w3.org/TR/html40/interact/forms.html#adef-accept-charset
(The link was wrong because it included the ":" character at the end)

Greg Wilkins added a comment -

Christian,

I think the accept-charset refers to the content of the document and not the URI, so I believe these two sections are not in conflict.

Artem Kozarezov added a comment -

> I found this W3C recommendation that definitely points to UTF-8 as being the way forward!
> http://www.w3.org/TR/html40/appendix/notes.html#non-ascii-chars

Please note that this is a recommendation for User-Agents. I would have made such a recommendation myself. However, on the servlet container side, one must have the ability to cope with different URI charsets.

For example, current Internet Explorer and Firefox browsers will use the character set of the page when encoding an "a href" URI, but will use their default character sets when the URI is entered by the user explicitly, instead of being followed from a page. Thus, with modern browsers, the server can't really predict what the charset is.

Christian d'Heureuse added a comment -

Greg,

Here is an example of how the "accept-charset" attribute of the HTML FORM element influences the encoding of the %-encoded characters in the URI of the FORM GET request:

<form action="http://www.google.com/search" accept-charset="UTF-8">
<input type="text" name="q" value="ä">
<input type="submit" value="Send as UTF-8">
</form>
<form action="http://www.google.com/search" accept-charset="ISO-8859-1">
<input type="text" name="q" value="ä">
<input type="submit" value="Send as ISO-8859-1">
</form>

This works with Firefox 1.5.0.7 on Windows XP.
The generated URIs are:

Christian d'Heureuse added a comment -

The generated URIs are:

FORM 1: http://www.google.com/search?q=%C3%A4
FORM 2: http://www.google.com/search?q=%E4

Google interprets both query values as an "ä", even though Firefox does not send the encoding type (UTF-8 or ISO-8859-1) in the header fields of the GET request. I assume that Google tries to guess the encoding.

IE6 ignores the "accept-charset" attribute.

Christian d'Heureuse added a comment -

Interesting:

Christian d'Heureuse added a comment -

Interesting:

When I click on one of the two URLs above, Google displays an "ä" in the "search" field in both cases.

If I move the mouse pointer over the URLs above, Firefox displays "...?q=ä" on the status line in both cases. So Firefox also tries to automatically detect the charset encoding of the %-encoded characters.

There is currently no easy, standardized method to implement this behaviour in a Java servlet.
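
One common heuristic for this kind of guessing is to try a strict UTF-8 decode of the raw bytes and fall back to ISO-8859-1 when the bytes are not valid UTF-8. The sketch below illustrates that heuristic only; whether Google or Firefox work this way is not known from this issue, and the class and method names are made up. It would operate on the raw string from getQueryString(), split into names and values by the application.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating a "UTF-8 first, ISO-8859-1 fallback" guess.
final class QueryValueGuess {

    // Percent-decodes to raw bytes without assuming a charset.
    // Assumes the raw query itself is plain ASCII, as a URI should be.
    static byte[] percentDecodeToBytes(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(c == '+' ? ' ' : c); // '+' means space in form encoding
        }
        return out.toByteArray();
    }

    // Tries strict UTF-8 first; malformed input triggers the ISO-8859-1 fallback.
    static String decodeGuessing(String rawValue) {
        byte[] bytes = percentDecodeToBytes(rawValue);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // Both forms of the example query value decode to the same text.
        System.out.println(decodeGuessing("%C3%A4").equals(decodeGuessing("%E4")));
    }
}
```

The guess is not foolproof: an ISO-8859-1 value that happens to form a valid UTF-8 byte sequence (for example the two characters "Ã¤") would be read as UTF-8.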

Artem Kozarezov added a comment -

Jetty SVN revision 1042.
Calling HttpServletRequest.getQueryString()...

Stack trace:

Request.getQueryString() line: 1014
HttpURI.getQuery(String) line: 423
StringUtil.toString(byte[], int, int, String) line: 325

executed code:

if (charset==null || StringUtil.isUTF8(charset))
return toUTF8String(b,offset,length);

charset == Request._queryEncoding
and by default is null

FYI, HttpServletRequest.getQueryString() is the only reliable way to detect the actual character set (an example is given in http://webcrm.ru/dev/JETTY-113/decoding.txt), and it no longer works with Jetty.

Artem Kozarezov added a comment -

"is the only reliable way to detect ..." - sorry, I meant that getQueryString() is the only reliable way to detect the character set if both the name and the value of the parameter are localized.

Artem Kozarezov added a comment -

Following is a fix.
Please, don't forget to apply it. It is an important bug.

Index: Request.java
===================================================================
--- Request.java (revision 1067)
+++ Request.java (working copy)
@@ -1012,7 +1012,7 @@
 public String getQueryString()
 {
     if (_queryString==null && _uri!=null)
-        _queryString=_uri.getQuery(_queryEncoding);
+        _queryString=_queryEncoding==null?_uri.getQuery():_uri.getQuery(_queryEncoding);
     return _queryString;
 }

Greg Wilkins added a comment -

Artem,

thanks for chasing this down.
I have applied a variation of the patch. Basically I handle a null encoding passed to
getQuery(encoding) the same as getQuery().

Note also, to make it easier to call setQueryEncoding, I have implemented
Request.setAttribute so that attribute "org.mortbay.jetty.Request.queryEncoding"
may be set instead. This avoids the need of a cast and should go through request wrappers.

Note also that you can now call setQueryEncoding multiple times and it will affect the
return value of getQueryString, but once a getParameter method has been called, the parameters
will not change.
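
A minimal sketch of the attribute route, assuming the attribute value is the charset name as a String (the comment above does not state the expected type). To stay self-contained, the sketch takes the request's setAttribute method as a BiConsumer instead of importing the servlet API; the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper; only the attribute name comes from this issue.
final class QueryEncodingAttribute {

    static final String QUERY_ENCODING_ATTRIBUTE =
            "org.mortbay.jetty.Request.queryEncoding";

    // Must run before the first getParameter call, because the parameters
    // are not re-decoded once they have been extracted.
    static void selectQueryEncoding(BiConsumer<String, Object> setAttribute,
                                    String charsetName) {
        setAttribute.accept(QUERY_ENCODING_ATTRIBUTE, charsetName);
    }

    public static void main(String[] args) {
        // Stand-in for a request: a plain map collecting the attributes.
        Map<String, Object> attributes = new HashMap<>();
        selectQueryEncoding(attributes::put, "ISO-8859-1");
        System.out.println(attributes);
    }
}
```

In a servlet or filter the call would presumably be selectQueryEncoding(request::setAttribute, "ISO-8859-1"), which works through request wrappers without a cast to the Jetty Request class.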

Artem Kozarezov added a comment -

Thanks!
