Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 6.1.14
Fix Version/s: 6.1.27
Component/s: None
Labels: None
Number of attachments :
Description
I've attached Surrogates.java which demonstrates the problem.
pw = HttpConnection.printWriter("UTF-8");
String a = new String("<?xml version=\"1.0\" ?><root>\uDBB8\uDF30</root>");
pw.write(a);
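The high and low surrogates are each encoded as a separate three-byte UTF-8 sequence. Per the UTF-8 spec, the pair needs to be combined into a single supplementary code point and encoded as one four-byte sequence.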
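To make the difference concrete, here is a minimal sketch that prints both byte sequences for the surrogate pair in the snippet above. The class `SurrogateBytes` and the method `encodePerCodeUnit` are hypothetical names made up for illustration; the naive encoder only imitates the behaviour described in this report and is not Jetty's actual code. The reference output comes from the JDK's own UTF-8 encoder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SurrogateBytes {
    // Hypothetical naive encoder: treats every UTF-16 code unit as its own
    // code point, so a surrogate pair becomes two 3-byte sequences.
    public static byte[] encodePerCodeUnit(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            int c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\uDBB8\uDF30";
        // Per-code-unit encoding (the reported behaviour): ED AE B8 ED BC B0
        System.out.println(hex(encodePerCodeUnit(s)));
        // JDK UTF-8 encoder (expected behaviour): F3 BE 8C B0
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));
    }
}
```

A strict UTF-8 decoder on the receiving side will reject the six-byte form, since the bytes ED A0..BF are not valid in UTF-8.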
A: Surrogates are code points from two special ranges of Unicode values, reserved for use as the leading and trailing values of paired code units in UTF-16. Leading, also called high, surrogates are from 0xD800 to 0xDBFF, and trailing, or low, surrogates are from 0xDC00 to 0xDFFF. They are called surrogates, since they do not represent characters directly, but only as a pair.
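As a rough illustration of how a pair maps to a single code point, the sketch below applies the standard UTF-16 formula to the two code units from the example and checks the result against `Character.toCodePoint` from the JDK. The class `SurrogatePair` and the method `combine` are hypothetical names used only here; this is not the code from the attached fix.

```java
public class SurrogatePair {
    // Combines a high (leading) and low (trailing) surrogate into the
    // supplementary code point they represent.
    public static int combine(char high, char low) {
        if (!Character.isHighSurrogate(high) || !Character.isLowSurrogate(low)) {
            throw new IllegalArgumentException("not a valid surrogate pair");
        }
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        int cp = combine('\uDBB8', '\uDF30');
        // prints "U+FE330"
        System.out.println(String.format("U+%X", cp));
        // prints "true": the JDK helper performs the same computation
        System.out.println(cp == Character.toCodePoint('\uDBB8', '\uDF30'));
    }
}
```

An encoder that sees a high surrogate would buffer it, wait for the following low surrogate, and then emit the four-byte UTF-8 form of the combined code point.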
I've also attached an AbstractGenerator.java with a working fix, though you may want to refactor it. I didn't see a fix in subsequent versions of Jetty.
Issue Links
- is duplicated by JETTY-1340: Utf8StringBuffer incorrectly handles characters outside of the basic multilingual plane
- is related to JETTY-1324: Jetty does not adhere to UTF-8 spec and uses CESU-8 for supplementary characters
I think these may be related?
We have fixed surrogate handling in jetty-7 - see https://bugs.eclipse.org/bugs/show_bug.cgi?id=333481
If you can confirm that this is the fix, then we can look at backporting it to jetty-6.