Hi
The default encoding of POST requests has changed several times, most recently back to UTF-8.
Every time it changes, it breaks things for some users. For a high-volume messaging application, where submissions are outside one's control and come from thousands of different users who don't use HTML forms at all, this is a serious problem. (In our case, we had to downgrade Jetty to resolve it.)
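To illustrate the kind of breakage I mean, here is a minimal sketch (plain JDK code, not Jetty's own; the class and method names are made up for this example). It decodes the same urlencoded value under the two defaults that have been in use:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: one urlencoded value, decoded under two candidate defaults.
class FormDecodingDemo {

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String urlEncoded, String charsetName) {
        try {
            return URLDecoder.decode(urlEncoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" as sent by a client that percent-encodes UTF-8 bytes.
        String body = "caf%C3%A9";

        // With a UTF-8 default the value round-trips correctly.
        System.out.println(decode(body, "UTF-8"));

        // With an ISO-8859-1 default each UTF-8 byte becomes its own character,
        // so the single accented letter turns into two unrelated characters.
        System.out.println(decode(body, "ISO-8859-1"));
    }
}
```

A client that sends no charset in its Content-Type gets whichever of these two results the server's current default produces, so a change of default silently changes the stored text.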
One could debate which encoding should actually be the default, and I include below a history of the changes as well as such an argument. However, my only request in this ticket is that the default encoding be made controllable via a variable, similar to how you made it possible to specify org.mortbay.util.URI.charset for GET requests. Unless it is configurable, it's always going to break for someone.
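As a rough sketch of what I am asking for (not a patch against Jetty): use the charset from the Content-Type when one is present, and otherwise fall back to a system property. The property name `example.post.default.charset` and the class below are hypothetical, chosen only for illustration, and the UTF-8 fallback is just the current default mentioned above.

```java
// Sketch only: resolves the charset for a POST body.
// PROPERTY is a hypothetical name, not an existing Jetty setting.
class PostCharsetResolver {

    static final String PROPERTY = "example.post.default.charset";

    // Returns the charset named in the Content-Type header, or the
    // configured default when the header carries no charset parameter.
    static String resolve(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String param = part.trim();
                if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                    String value = param.substring(8).trim();
                    // Strip optional surrounding quotes: charset="UTF-8"
                    if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                        value = value.substring(1, value.length() - 1);
                    }
                    if (!value.isEmpty()) {
                        return value; // validation of the name is omitted here
                    }
                }
            }
        }
        // No charset supplied by the client: use the configurable default.
        return System.getProperty(PROPERTY, "UTF-8");
    }

    public static void main(String[] args) {
        System.out.println(resolve("application/x-www-form-urlencoded"));
        System.out.println(resolve("application/x-www-form-urlencoded; charset=ISO-8859-1"));
    }
}
```

With something along these lines, a deployment that depends on ISO-8859-1 could start the JVM with `-Dexample.post.default.charset=ISO-8859-1` and would not be affected by future changes to the built-in default.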
The history of this issue:
In JETTY-633 (Jetty-6.1.11), the default encoding for POST requests (where no charset is specified in the Content-Type) was changed to ISO-8859-1.
In JETTY-853, this was explained as being "in accordance with the HTTP spec: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1", which it does seem to be.
In JETTY-1098 (Jetty-6.1.21), it was commented that "I shall have to ask Greg why we changed from a default of UTF8 to ISO 88591".
Greg was asked here, and said "I don't think this was done on purpose...":
http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg00264.html
Jan said here that the RFC "only stipulates ISO-8859-1 for Content-Type of "text", so not sure why I thought that was relevant":
http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg00265.html
RFC 2616 section 3.7.1 does indeed define the ISO-8859-1 default only for 'media subtypes of the "text" type', so it doesn't settle what the charset for application/x-www-form-urlencoded should be. See also RFC 2616 section 3.4.1, "Missing Charset" (not entirely clear to me, but it seems to imply ISO-8859-1 as the default).
The HTML 4.01 specification (published 6 months later, back in 1999), however, says:
'The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter'.
( http://www.w3.org/TR/html4/charset.html )
Take that as you will.
Anyway, what the default should be is less important than making it configurable. Could it please be made so?
Thanks
Paolo