Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 7.0.2
    • Fix Version/s: None
    • Component/s: HTTP
    • Labels: None
    • Number of attachments: 0

      Description

      I have an encoding issue with the URL query string.

      I entered the link "http://localhost/search?q=中文" in IE6, and the servlet threw an exception:
      Caused by: java.lang.IllegalStateException: !utf8
      at org.eclipse.jetty.util.Utf8StringBuilder.toString(Utf8StringBuilder.java:154)
      at org.eclipse.jetty.util.UrlEncoded.decodeUtf8To(UrlEncoded.java:308)
      at org.eclipse.jetty.http.HttpURI.decodeQueryTo(HttpURI.java:561)
      at org.eclipse.jetty.server.Request.extractParameters(Request.java:204)
      at org.eclipse.jetty.server.Request.getParameter(Request.java:648)
      at org.apache.tapestry5.internal.services.RequestImpl.getParameter(RequestImpl.java:72)

      I'm sure request.setCharacterEncoding() has been called before getting the parameters.

      I have also seen many comments from Jetty developers on this issue, but no one has really fixed it. You can't stop clients from sending requests with non-UTF-8 characters, so please do something.

          Activity

          Jan Bartel added a comment -

          Hi,

          How are you sure that request.setCharacterEncoding() has been called before any of the content is retrieved?

          I would recommend you try setting the System property org.eclipse.jetty.util.UrlEncoding.charset to whatever charset you want the url to be interpreted as.

          If that doesn't work, then it would be helpful if you could try a newer version of jetty, say jetty-7.1.4 and report back.

          thanks
          Jan
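A minimal sketch of the suggestion above, under the assumption (not verified here) that Jetty reads this property once when its URL-decoding classes are first loaded, so it has to be set before the server starts, for example with a -D flag on the command line or at the very top of main(). The class and method names are hypothetical; only the property name comes from the comment above.

```java
import java.nio.charset.Charset;

class UrlCharsetSetup {
    // Property name as quoted in the comment above.
    static final String URL_CHARSET_PROPERTY = "org.eclipse.jetty.util.UrlEncoding.charset";

    /**
     * Sets the property unless it was already supplied (e.g. via -D) and
     * returns the value that is in effect afterwards.
     */
    static String configureUrlCharset(String charsetName) {
        // Fail fast if the JVM does not know this charset.
        Charset.forName(charsetName);
        if (System.getProperty(URL_CHARSET_PROPERTY) == null) {
            System.setProperty(URL_CHARSET_PROPERTY, charsetName);
        }
        return System.getProperty(URL_CHARSET_PROPERTY);
    }

    public static void main(String[] args) {
        // Assumption: this runs before any Jetty class is loaded.
        System.out.println("URL charset: " + configureUrlCharset("UTF-8"));
        // Starting the embedded server would follow here (omitted).
    }
}
```

Setting the property inside a servlet's service() method is probably too late if the assumption above holds, because the request has already been parsed by then.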

          Greg Wilkins added a comment -

          I believe this was fixed by JETTY-1151.
          With the current release, I can run the test webapp and hit http://localhost:8080/dump/info?q=%E4%B8%AD%E6%96%87 without a problem.
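For reference, the %E4%B8%AD%E6%96%87 in that URL is what a client produces when it percent-encodes the UTF-8 bytes of 中文. A small sketch with the standard library (the class and method names are hypothetical):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class QueryEncodingDemo {
    /** Percent-encodes a query value using the named charset. */
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String q = "\u4e2d\u6587"; // the two characters from the original report
        // The same text yields different bytes, and so a different URL, per charset.
        System.out.println("UTF-8: q=" + encode(q, "UTF-8"));
        System.out.println("GBK:   q=" + encode(q, "GBK")); // assumes the JVM ships GBK
    }
}
```

A server that assumes UTF-8 can only decode the first form correctly; the second form carries the same characters as different bytes.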

          Greg Wilkins added a comment -

          already fixed

          donf.yang added a comment (edited) -

          Sure, you can run the test webapp and hit http://localhost:8080/dump/info?q=%E4%B8%AD%E6%96%87 without a problem.

          But you can't request http://localhost:8080/dump/info?q=测试 without an error!
          Copy the URL string into IE6 and try it.
          Firefox automatically percent-encodes the URL to ASCII, but IE6 doesn't.
          Chinese characters are encoded as bytes with negative values, so if the URL is not encoded to ASCII, the following code in Utf8StringBuilder.append(byte b) is always reached:

          (line 114 in Utf8StringBuilder.java, 7.0.2.v20100331, and not fixed in the latest version)

          if ((b&0xc0)==0xc0) // <-- here b < 0, and this condition is always true
          {
              // 11??????
              _buffer.append('?');
              _more=0;
              _bits=0;
              throw new IllegalArgumentException("!utf8");
          }

          That's why I said you can't stop clients from submitting non-UTF-8 requests.

          donf.yang added a comment (edited) -

          I have tried 7.1.4.v20100610 and see the same problem.
          Here is my servlet:

          protected void service(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException
          {
              request.setAttribute("org.eclipse.jetty.server.Request.queryEncoding", "UTF-8");
              System.setProperty("org.eclipse.jetty.util.UrlEncoding.charset", "UTF-8");
              request.setCharacterEncoding("UTF-8");
              System.out.println(request.getParameter("q"));
          }

          Exception:

          java.lang.IllegalArgumentException: !utf8
          at org.eclipse.jetty.util.Utf8StringBuilder.append(Utf8StringBuilder.java:119)
          at org.eclipse.jetty.util.Utf8StringBuilder.append(Utf8StringBuilder.java:49)
          at org.eclipse.jetty.http.HttpURI.toUtf8String(HttpURI.java:488)
          at org.eclipse.jetty.http.HttpURI.decodeQueryTo(HttpURI.java:647)
          at org.eclipse.jetty.server.Request.extractParameters(Request.java:210)
          at org.eclipse.jetty.server.Request.getParameter(Request.java:649)
          at com.weiqipu.dapu.servlet.JSApi.service(JSApi.java:46)
          at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

          I have also tried the encoding "GBK".

          Greg Wilkins added a comment -

          I'm a bit confused... is your browser sending UTF-8 or not?
          I just did a telnet to 8080 and typed in an HTTP GET directly, using the UTF-8 characters (not % encoded):

          [332] telnet localhost 8080
          Trying 127.0.0.1...
          Connected to localhost.
          Escape character is '^]'.
          GET /dump/info?q=测试 HTTP/1.1
          Host: localhost:8080

          the response came back fine with

          <th align="right">getQueryString: </th><td>q=测试</td></tr><tr>

          and

          <th align="left" colspan="2"><big><br/>Request Parameters:</big></th></tr><tr>
          <th align="right">q: </th><td>测试</td></tr><tr>

          I expect your browser is in some non-utf-8 mode and is not sending utf-8 characters (% encoded or otherwise).

          Can you do a tcpdump or wireshark capture so I can see the exact bytes being sent over the wire?

          regards

          Greg Wilkins added a comment -

          I tried IE6. It doesn't display the characters at all, just boxes and ?.
          But the server does not throw an exception.

          donf.yang added a comment (edited) -

          Here is the data sent by IE6:

          GET http://www.weiqipu.com/search?q=\300\356\312\300\312\257 HTTP/1.1\r\n

          0000 47 45 54 20 68 74 74 70 3a 2f 2f 77 77 77 2e 77  GET http://www.w
          0010 65 69 71 69 70 75 2e 63 6f 6d 2f 73 65 61 72 63  eiqipu.com/searc
          0020 68 3f 71 3d 5c 33 30 30 5c 33 35 36 5c 33 31 32  h?q=\300\356\312
          0030 5c 33 30 30 5c 33 31 32 5c 32 35 37 20 48 54 54  \300\312\257 HTT
          0040 50 2f 31 2e 31 0d 0a                             P/1.1..
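The dump above contains literal backslashes and digits (5c 33 30 30), so it is unclear whether those characters were really on the wire or whether a capture tool's text rendering was dumped. The sketch below assumes the second case, i.e. that each \ooo stands for one raw byte written in octal, and checks whether the resulting bytes are legal UTF-8. All names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class OctalEscapeDemo {
    private static boolean isOctalTriple(String s, int from) {
        for (int i = from; i < from + 3; i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '7') return false;
        }
        return true;
    }

    /** Turns backslash + three octal digits into one byte; other chars are copied as ASCII. */
    static byte[] unescapeOctal(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 3 < text.length() && isOctalTriple(text, i + 1)) {
                out.write(Integer.parseInt(text.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                out.write((byte) c);
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Strict check: malformed input is reported instead of being replaced. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] raw = unescapeOctal("\\300\\356\\312\\300\\312\\257");
        System.out.println(toHex(raw));       // C0 EE CA C0 CA AF
        System.out.println(isValidUtf8(raw)); // false
    }
}
```

Under that assumption the query value is six bytes, starting with 0xC0, which can never begin a legal UTF-8 sequence. That would be consistent with the browser sending a legacy double-byte encoding rather than UTF-8.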

          Greg Wilkins added a comment -

          This is a really strange encoding. Firstly, it is in octal with \ escapes, when URLs are meant to be either in UTF-8 or in %-encoded hex. Then the numbers themselves are not in hex.

          Can you try a different IE6 installation?

          donf.yang added a comment -

          OK, I found the reason this happens.

          Assume I'm requesting "/search?q=测试" and have set the encoding to "utf-8".

          The JVM call stack is:

          org.eclipse.jetty.util.Utf8StringBuilder.append(Utf8StringBuilder.java:119)
          org.eclipse.jetty.util.Utf8StringBuilder.append(Utf8StringBuilder.java:49)
          org.eclipse.jetty.http.HttpURI.toUtf8String(HttpURI.java:414)
          org.eclipse.jetty.http.HttpURI.decodeQueryTo(HttpURI.java:573)

          When I debugged toUtf8String, I found that the value of _raw is:
          [71, 69, 84, 32, 47, 115, 101, 97, 114, 99, 104, 63, 113, 61, -78, -30, -54, -44, 32, 72, 84, 84, 80, 47 .... more

          As I said, the bytes of Chinese characters are negative, so I made a simple test:

          String str = "测试";
          byte[] bs = str.getBytes("GBK");
          for (byte b : bs)
          {
              System.out.print(b + " ");
          }

          The output is:
          -78 -30 -54 -44

          In Utf8StringBuilder.append(Utf8StringBuilder.java:119), the check ((b&0xc0)==0xc0) is false only for -78, so an IllegalArgumentException is always thrown, whatever the encoding or the characters are.

          I'm not very experienced with the HTTP protocol, and I'm not sure why (b&0xc0)==0xc0 leads to throw new IllegalArgumentException...
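The mask in question follows from how UTF-8 lays out its bytes: 0xxxxxxx is ASCII, 11xxxxxx starts a multi-byte character, and 10xxxxxx continues one. The sketch below classifies the four bytes from the test above by those general rules. It is not Jetty's code, and the names are hypothetical.

```java
class Utf8BytePatterns {
    /** Classifies one byte by its top bits, following the general UTF-8 layout. */
    static String classify(byte b) {
        int v = b & 0xff;                               // view the signed byte as 0..255
        if ((v & 0x80) == 0x00) return "ascii";         // 0xxxxxxx
        if ((v & 0xc0) == 0x80) return "continuation";  // 10xxxxxx
        return "lead";                                  // 11xxxxxx
    }

    /** Number of continuation bytes a lead byte announces, or -1 if b is not a lead byte. */
    static int continuationsExpected(byte b) {
        int v = b & 0xff;
        if ((v & 0xe0) == 0xc0) return 1;               // 110xxxxx
        if ((v & 0xf0) == 0xe0) return 2;               // 1110xxxx
        if ((v & 0xf8) == 0xf0) return 3;               // 11110xxx
        return -1;
    }

    public static void main(String[] args) {
        byte[] gbk = {-78, -30, -54, -44};              // bytes printed by the test above
        for (byte b : gbk) {
            System.out.println(String.format("%4d 0x%02X %s", b, b & 0xff, classify(b)));
        }
    }
}
```

Read as UTF-8, -78 (0xB2) is a continuation byte with nothing to continue, -30 (0xE2) announces two continuation bytes, and -54 (0xCA) then arrives looking like another lead byte. A lead pattern where a continuation is required is malformed, which is the kind of condition a strict decoder rejects.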

          donf.yang added a comment -

          Hi Wilkins, I found the problem, could you take a look, can this be fixed?

          donf.yang added a comment -

          As I said, this issue has never been fixed!
          How disappointing.

          Greg Wilkins added a comment -

          donf.yang, we are dealing with this issue as fast as we can, so please DO NOT COMPLAIN about the support that we are giving you for free. If you want faster support you are welcome to take out a support contract.

          I still do not see what there is to fix here. As your test shows, the raw bytes being sent by your browser are in GBK encoding, not in UTF-8. So if you set the encoding on the server to UTF-8, then it will have an error because GBK is not legal UTF-8.

          You need to try to make your browser send UTF-8. Sometimes forcing the content type of the page served to the browser prior to this request can help... however, many browsers will just erroneously send their own encoding.

          I don't understand what you want us to fix? We cannot make our UTF-8 handling accept GBK.
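The point that GBK bytes are not legal UTF-8 can be checked with the standard library alone. The sketch below is an application-level illustration, not something Jetty does: it decodes strictly as UTF-8 and, only if that fails, falls back to a second charset. It assumes the application can get at the raw query bytes, which the servlet API does not hand out directly, and that the JVM ships a GBK charset. All names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class QueryFallbackDecoder {
    /** Decodes without substituting replacement characters; throws on bad input. */
    static String decodeStrict(byte[] raw, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(raw))
                .toString();
    }

    /** Tries UTF-8 first and uses the fallback charset only if the bytes are not legal UTF-8. */
    static String decodeWithFallback(byte[] raw, Charset fallback) {
        try {
            return decodeStrict(raw, StandardCharsets.UTF_8);
        } catch (CharacterCodingException notUtf8) {
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // The four bytes from the reporter's GBK test earlier in the thread.
        byte[] fromBrowser = {(byte) 0xB2, (byte) 0xE2, (byte) 0xCA, (byte) 0xD4};
        System.out.println(decodeWithFallback(fromBrowser, gbk));
    }
}
```

This is a heuristic. Some byte sequences are legal in both encodings, so a fallback like this can guess wrong, and getting the client to send UTF-8 remains the more reliable fix.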


            People

            • Assignee: Jan Bartel
            • Reporter: donf.yang