Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
- Affects Version/s: JRuby 1.6.7
- Fix Version/s: JRuby 1.7.0.pre1
- Component/s: None
- Labels: None
- Testcase included: yes
- Number of attachments:
Description
In Moped (my MongoDB driver), I have some code attempting to safely convert input text into UTF-8 (which all strings must be).
For a simplified test case of the implementation, see: https://gist.github.com/2351047
Problem: calling `encode('utf-8')` on a binary string produces garbage UTF-8 text instead of raising a conversion error, as Ruby 1.9 does.
Reason the code exists: We're tracking referrers in our app, but the header value in the Rack environment is encoded as ASCII-8BIT. The code first follows the happy path of calling `encode('utf-8')`, and only falls back to forcing the encoding when that raises an error.
Tested on: jruby-1.6.7 and jruby-1.7.0-dev
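To make the expected behaviour concrete, here is a minimal sketch of what Ruby 1.9 does with binary input. The helper name `try_utf8_encode` and the sample strings are assumptions for illustration only; they are not taken from Moped or from the linked gist.

```ruby
# Hypothetical helper: returns :ok when encode('utf-8') succeeds,
# otherwise the class of the EncodingError that was raised.
def try_utf8_encode(string)
  string.encode('utf-8')
  :ok
rescue EncodingError => e
  e.class
end

# A binary (ASCII-8BIT) string containing only 7-bit ASCII bytes.
ascii_binary = "abc".dup.force_encoding('ASCII-8BIT')

# A binary string containing a byte above 0x7F, which has no
# defined conversion from ASCII-8BIT to UTF-8.
high_binary = "\xE9".dup.force_encoding('ASCII-8BIT')

puts try_utf8_encode(ascii_binary).inspect
puts try_utf8_encode(high_binary).inspect
```

On Ruby 1.9 the second call should report an `Encoding::UndefinedConversionError`; the report above is that the affected JRuby versions return a garbage string instead.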
Activity
Charles Oliver Nutter made changes:
| Field | Original Value | New Value |
|---|---|---|
| Resolution | | Fixed [ 1 ] |
| Fix Version/s | | JRuby 1.7 [ 17049 ] |
| Assignee | Thomas E Enebo [ enebo ] | Charles Oliver Nutter [ headius ] |
| Status | Open [ 1 ] | Resolved [ 5 ] |
As a workaround, replacing:

    begin
      data = string.encode('utf-8')
    rescue EncodingError
      data = string.dup
      data.force_encoding 'utf-8'
      raise unless data.valid_encoding?
    end

with:
seems to accomplish the same result. Unfortunately, this requires checking the encoding of every string, whereas the original approach lets strings that are already UTF-8, or compatible with it, be processed without the encoding check.
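The trade-off described above, validating every string instead of only those that fail to encode, could be sketched as follows. This is not the reporter's replacement snippet, which is not shown here; the helper name `to_utf8_unconditionally` is hypothetical, and the sketch assumes the input bytes are meant to be UTF-8.

```ruby
# Hypothetical helper: relabel the bytes as UTF-8 and validate them,
# without ever calling encode('utf-8').
def to_utf8_unconditionally(string)
  data = string.dup                 # leave the caller's string untouched
  data.force_encoding('utf-8')      # reinterpret the bytes, no transcoding
  # valid_encoding? scans the whole string, so every input pays this cost.
  raise EncodingError, 'invalid UTF-8 byte sequence' unless data.valid_encoding?
  data
end

# Binary input whose bytes happen to be valid UTF-8 ("café").
referrer = "caf\xC3\xA9".dup.force_encoding('ASCII-8BIT')
puts to_utf8_unconditionally(referrer).encoding
```

Because the bytes are only relabelled, a string in some other encoding (for example ISO-8859-1) would be reinterpreted rather than transcoded, so a sketch like this only suits input that is expected to be UTF-8 already.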