Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 6.1.3
- Fix Version/s: 6.1.27
- Component/s: None
- Labels: None
- Environment: Linux with Solr
- Testcase included: yes
- Number of attachments:
Description
Jetty does not parse URIs correctly: it interprets them as CESU-8 instead of UTF-8. This is an issue when using Unicode supplementary characters (code points beyond U+FFFF). Jetty accepts these supplementary characters as 6 bytes (3 bytes per surrogate code unit). To conform to UTF-8, it needs to accept the 4-byte form.
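To make the byte-level difference concrete, here is a minimal Python sketch of the two encodings. It is only an illustration, not Jetty code: the helper names `utf8_encode` and `cesu8_encode` are hypothetical, U+10308 is an arbitrarily chosen supplementary character, and CESU-8 is emulated by encoding each UTF-16 code unit separately with Python's `surrogatepass` error handler.

```python
from urllib.parse import quote_from_bytes


def utf8_encode(text: str) -> bytes:
    """Standard UTF-8: a supplementary character becomes one 4-byte sequence."""
    return text.encode("utf-8")


def cesu8_encode(text: str) -> bytes:
    """Emulated CESU-8 (hypothetical helper): split the text into UTF-16 code
    units, then encode every unit, including each surrogate, on its own."""
    utf16 = text.encode("utf-16-be")
    units = (int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2))
    # "surrogatepass" lets the UTF-8 codec emit the 3-byte form of a lone surrogate.
    return b"".join(chr(unit).encode("utf-8", "surrogatepass") for unit in units)


ch = "\U00010308"  # arbitrary code point beyond U+FFFF, chosen for illustration

utf8 = utf8_encode(ch)
cesu8 = cesu8_encode(ch)

# Percent-encoded forms, as they would appear in a URI.
print(len(utf8), quote_from_bytes(utf8))    # 4-byte UTF-8 form
print(len(cesu8), quote_from_bytes(cesu8))  # 6-byte CESU-8 form

# A strict UTF-8 decoder rejects the 6-byte form.
try:
    cesu8.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decoding rejects the CESU-8 bytes")
```

For characters inside the Basic Multilingual Plane the two helpers produce identical bytes, which is why the problem only shows up with supplementary characters.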
Here is some example code that replicates the issue.
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/exampledocs/test_utf8.sh
Here is an example report of the issue:
http://www.mail-archive.com/solr-user@lucene.apache.org/msg43399.html
Parts of Solr have worked around this issue, but it would be better if Jetty conformed to the UTF-8 spec. Many web browsers and other software do not support CESU-8.
Issue Links
- relates to
  - JETTY-1328: org.mortbay.jetty.HttpConnection.printWriter("UTF-8") returns a broken UTF-8 writer wrt. Strings containing surrogate characters
Activity
| Field | Original Value | New Value |
|---|---|---|
| Assignee | | Greg Wilkins [ gregw ] |
| Link | | This issue relates to |
| Status | Open [ 1 ] | Resolved [ 5 ] |
| Fix Version/s | | 6.1.27 [ 17238 ] |
| Resolution | | Fixed [ 1 ] |