Jetty / JETTY-1328

org.mortbay.jetty.HttpConnection.printWriter("UTF-8") returns a broken UTF-8 writer wrt. Strings containing surrogate characters

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.1.14
    • Fix Version/s: 6.1.27
    • Component/s: None
    • Labels:
      None
    • Number of attachments :
      2

      Description

      I've attached Surrogates.java which demonstrates the problem.

      pw = HttpConnection.printWriter("UTF-8");
      // \uDBB8\uDF30 is a high/low surrogate pair (one supplementary character)
      String a = "<?xml version=\"1.0\" ?><root>\uDBB8\uDF30</root>";
      pw.write(a);

      The high and low surrogate characters are each encoded as UTF-8 on their own. Per the UTF-8 spec, they need to be combined into a single code point first and then encoded as one four-byte sequence.
      A: Surrogates are code points from two special ranges of Unicode values, reserved for use as the leading and trailing values of paired code units in UTF-16. Leading, also called high, surrogates are from U+D800 to U+DBFF, and trailing, or low, surrogates are from U+DC00 to U+DFFF. They are called surrogates, since they do not represent characters directly, but only as a pair.
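
To make the expected behaviour concrete, here is a minimal sketch of surrogate-aware UTF-8 encoding. It is not the attached AbstractGenerator.java fix and not Jetty code; the class name SurrogateUtf8Sketch and its methods are hypothetical, and replacing an unpaired surrogate with '?' is an assumption made for the illustration.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of Jetty.
class SurrogateUtf8Sketch {

    // Encodes a String as UTF-8, joining each surrogate pair into one code point first.
    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            // codePointAt combines a high surrogate with the low surrogate that follows it.
            int cp = s.codePointAt(i);
            i += Character.charCount(cp);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = '?'; // assumption: substitute unpaired surrogates
            }
            if (cp < 0x80) {                       // 1 byte
                out.write(cp);
            } else if (cp < 0x800) {               // 2 bytes
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             // 3 bytes
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {                               // 4 bytes (supplementary planes)
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as lowercase hex without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The pair \uDBB8\uDF30 is code point U+FE330.
        System.out.println(hex(encode("\uDBB8\uDF30"))); // prints f3be8cb0
    }
}
```

For comparison, encoding each of the two surrogates on its own as a three-byte sequence would give ed ae b8 ed bc b0, which is not valid UTF-8.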

      I've also attached an AbstractGenerator.java with a working fix, though you guys might want to refactor it. I didn't see a fix in subsequent versions of Jetty.

      1. AbstractGenerator.java
        31 kB
        Alex Makarewycz
      2. Surrogates.java
        3 kB
        Alex Makarewycz

          Activity

          Greg Wilkins made changes -
          Link: This issue is related to JETTY-1324
          Greg Wilkins added a comment -

          I think these may be related?

          We have fixed surrogate handling in jetty-7 - see https://bugs.eclipse.org/bugs/show_bug.cgi?id=333481

          If you can confirm that is the fix, then we can look at backporting it to jetty-6.

          Greg Wilkins added a comment -

          I just ran your Surrogates class against the latest jetty-7 trunk, and it produces the correct output of f3 be 8c b0.

          So I think it is mostly fixed in 7.3.0 and will be totally fixed in 7.3.1.

          We will backport to jetty-6 at the next release, but that is not yet scheduled.

          Greg Wilkins made changes -
          Assignee: Greg Wilkins (gregw)
          Greg Wilkins made changes -
          Link: This issue is duplicated by JETTY-1340
          Greg Wilkins added a comment -

          Backported the jetty-7 UTF-8 handling.

          Greg Wilkins made changes -
          Status: Open -> Resolved
          Fix Version/s: 6.1.27
          Resolution: Fixed

            People

            • Assignee: Greg Wilkins
            • Reporter: Alex Makarewycz
            • Votes: 0
            • Watchers: 0