Jetty / JETTY-1153

Default charset/encoding of HTTP POST requests

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.1.21
    • Fix Version/s: 6.1.27, 7.4.4
    • Component/s: None
    • Labels: None
    • Number of attachments: 1

      Description

      Hi

      The default encoding of POST requests has changed several times, most recently back to UTF-8.

      Every time it changes, it breaks things for some users. For high volume messaging applications, with submissions not in one's control, but coming from thousands of different users who don't use HTML forms at all, one can see how this is a serious problem. (In our case, we had to downgrade Jetty to resolve it).

      One could debate which encoding should actually be the default, and I include a history of the changes below, as well as such an argument. However, my request for this ticket is only that it be made possible to control this default encoding via a variable, similar to how you made it possible to specify org.mortbay.util.URI.charset for GET requests. Unless this is configurable, it's always going to break for someone.
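
To illustrate why the default matters, here is a minimal sketch that decodes the same form value under both candidate defaults. It uses only the JDK's java.net.URLDecoder, not Jetty's own decoder, and the class name FormCharsetDemo and the sample value are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class FormCharsetDemo {
    // Decode one percent-encoded form value with an explicit charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String expected = "caf\u00e9";   // "café"
        String latin1Body = "caf%E9";    // as sent by a client encoding in ISO-8859-1
        String utf8Body = "caf%C3%A9";   // as sent by a client encoding in UTF-8

        // Each body round-trips only when the server guesses the client's charset.
        System.out.println(decode(latin1Body, "ISO-8859-1").equals(expected)); // true
        System.out.println(decode(latin1Body, "UTF-8").equals(expected));      // false
        System.out.println(decode(utf8Body, "UTF-8").equals(expected));        // true
        System.out.println(decode(utf8Body, "ISO-8859-1").equals(expected));   // false
    }
}
```

Neither default is right for every client that omits the charset, which is why a switch in the default silently corrupts non-ASCII characters for whichever group of clients was relying on the old one.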

      The history of this issue:
      In JETTY-633 (Jetty-6.1.11), the default encoding for POST requests (where no charset is specified in the Content-Type) was changed to ISO-8859-1.

      In JETTY-853, this was explained as being "in accordance with the HTTP spec: http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1", which it does seem to be.

      In JETTY-1098 (Jetty-6.1.21), it was commented that "I shall have to ask Greg why we changed from a default of UTF8 to ISO 88591".

      Greg was asked here, and said "I don't think this was done on purpose...":
      http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg00264.html

      Jan said here that the RFC "only stipulates ISO-8859-1 for Content-Type of "text", so not sure why I thought that was relevant":
      http://dev.eclipse.org/mhonarc/lists/jetty-dev/msg00265.html

      Although RFC 2616 does specify 'media subtypes of the "text" type', that doesn't decide the issue of what the charset for application/x-www-form-urlencoded should be. See also RFC 2616 section 3.4.1: Missing Charset (not entirely clear to me, but seems to imply ISO-8859-1 as the default).

      The HTML 4.01 specification (published 6 months later, back in 1999), however, says:
      'The HTTP protocol ([RFC2616], section 3.7.1) mentions ISO-8859-1 as a default character encoding when the "charset" parameter is absent from the "Content-Type" header field. In practice, this recommendation has proved useless because some servers don't allow a "charset" parameter to be sent, and others may not be configured to send the parameter. Therefore, user agents must not assume any default value for the "charset" parameter'.

      ( http://www.w3.org/TR/html4/charset.html )

      Take that as you will.

      Anyway, what the default should be is less important than making it configurable. Could it please be made so?

      Thanks

      Paolo

        Activity

        Greg Wilkins added a comment -

        as much as I dislike system properties,
        I have added org.mortbay.util.UrlEncoded.charset
        and org.eclipse.jetty.util.UrlEncoded.charset
        to allow the default to be set.
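
A rough sketch of how such a property-driven default could behave is shown here. Only the property name comes from the comment above; the class DefaultFormCharset, the helper resolveCharset, and the UTF-8 fallback are assumptions for illustration, and this is not Jetty's actual code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DefaultFormCharset {
    // Property name as given in the comment above (Jetty 7 package naming).
    static final String PROPERTY = "org.eclipse.jetty.util.UrlEncoded.charset";

    // Hypothetical helper: prefer the charset declared by the request,
    // then the system property, then UTF-8 (the fallback is an assumption).
    static String resolveCharset(String requestCharset) {
        if (requestCharset != null) {
            return requestCharset;
        }
        return System.getProperty(PROPERTY, "UTF-8");
    }

    // Decode a percent-encoded form value using the resolved charset.
    static String decode(String encoded, String requestCharset)
            throws UnsupportedEncodingException {
        return URLDecoder.decode(encoded, resolveCharset(requestCharset));
    }

    public static void main(String[] args) throws Exception {
        // No charset in the Content-Type, so the configured default applies.
        System.setProperty(PROPERTY, "ISO-8859-1");
        System.out.println(decode("caf%E9", null).equals("caf\u00e9")); // true
    }
}
```

With a real Jetty server it is probably safest to pass the property as a JVM argument, for example -Dorg.mortbay.util.UrlEncoded.charset=ISO-8859-1 on Jetty 6, so that it is already set whenever Jetty reads it. When exactly Jetty reads the property is not stated in this ticket.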

        Paolo Campanella added a comment -

        Thanks Greg, much appreciated!

        Paolo Campanella added a comment -

        JUnit tests, including a test for JETTY-1153. Requires Apache Commons HTTP client.

        Paolo Campanella added a comment -

        Hi

        Sorry that it's taken so long to report, but this issue appears to not have been fixed by the patches.

        I've attached a file with two JUnit tests, one of which shows up the POST problem. (Note: the source file is encoded in ISO-8859-1, and will not work if you save it as UTF-8).

        If run against Jetty 6.1.18, both tests succeed. If run against Jetty 6.1.26 (or an intervening version), testPOST_UTF_8 fails.

        I have not run any tests against Jetty 7.x.

        Regards

        Paolo

        Sergi Pérez Contreras added a comment -

        In Jetty 6.1.26, the method org.mortbay.util.UrlEncoded.decodeTo(InputStream in, MultiMap map, String charset, int maxLength)
        still uses UTF-8 as the default charset, ignoring the org.mortbay.util.UrlEncoded.charset system property.

        Paolo Campanella added a comment -

        Improved unit test, which doesn't depend on the encoding of the source file, should be more future-proof, and requires org.apache.http.client 4.1.

        Jan Bartel added a comment -

        Looks like there was one case where we didn't check the system property. Fixed, and a test case added for it.

        Fixed for jetty-7 and jetty-6.


          People

          • Assignee: Jan Bartel
          • Reporter: Paolo Campanella
          • Votes: 0
          • Watchers: 2
