DisplayTag

Excel and Text exports use Windows Latin-1 encoding

Details

  • Type: Bug
  • Status: Closed
  • Priority: Minor
  • Resolution: Fixed
  • Affects Version/s: 1.0 RC2
  • Fix Version/s: 1.1
  • Component/s: Export
  • Labels:
    None
  • Application server:
    JBoss 3.2.5

Description

Data coming out of my UTF-8 Postgres database appears correctly in displaytag tables on web pages, but when I request an export in Excel or text format, all non-ASCII characters are garbled. I am using the Mac OS X platform.

After some exploration, I determined that this is due to the file encoding being set to Windows Latin-1. Excel documents which were created on Windows and that I open on my Mac do not have garbled non-ASCII text, so I suspect that all that is missing in the exported file is some way to indicate to Excel what encoding has been used. Some of my recent web searches suggest that at least the more recent versions of Office store characters in little-endian UCS2 encoding.
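The garbling described above can be reproduced without displaytag or Excel. This is a minimal sketch, assuming Java 7+ and a hypothetical class named Mojibake: it encodes text as UTF-8 and then decodes the bytes as windows-1252, which is what a reader assuming Windows Latin-1 would do. It also shows the little-endian UTF-16 bytes (identical to UCS-2 LE for characters like these) mentioned in the report.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encode as UTF-8 (as the database and page do), then decode the same
    // bytes as windows-1252, as a Latin-1-assuming reader would.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    // Little-endian UTF-16 bytes, without a byte order mark.
    static byte[] utf16le(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // "caf\u00e9" ("café") becomes "caf\u00c3\u00a9" ("cafÃ©"): one char turns into two.
        System.out.println(garble("caf\u00e9").length()); // prints 5
        byte[] b = utf16le("\u00e9");
        System.out.printf("%02X %02X%n", b[0], b[1]);      // prints E9 00
    }
}
```

The two-character "Ã©" pattern is a quick way to recognize UTF-8 bytes that have been read as Latin-1.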

For text files, it seems to me that UTF-8 would be the most platform- and language-agnostic encoding to use.

Activity

fabrizio giustina added a comment -
Test case added: in RC2, encoding is always preserved during export.

If you set UTF-8 encoding in the page with
<jsp:directive.page contentType="text/html; charset=UTF-8" />
or
response.setContentType("text/html; charset=utf-8");
displaytag correctly preserves the encoding, and CSV/Excel output is written using UTF-8.

Are you sure you are setting the output encoding correctly? Are you facing this problem with the displaytag filter enabled or not? Which version of displaytag are you using?

Please reopen the bug if you are trying with the latest snapshot and you are sure everything has been done correctly (i.e., the charset is set to UTF-8 when not exporting).

You can look in displaytag test code for DISPL-107.jsp and Displ107.java for an example of a working utf8 export.
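The comment above turns on one point: the bytes of an export depend on the charset the output is written with. The following is a sketch using only java.io. ExportEncoding is a hypothetical stand-in for an export writer, not displaytag code. The same row written as UTF-8 keeps every character. Written as ISO-8859-1 (the servlet default when no charset is set), it cannot represent "€", so the encoder substitutes '?'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExportEncoding {
    // Writes one CSV line through a Writer bound to the given charset and
    // returns the bytes. In a servlet the charset would come from the
    // response's content type; here it is passed in explicitly.
    static byte[] writeCsvLine(String line, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // OutputStreamWriter(OutputStream, Charset) replaces unmappable chars.
        try (Writer w = new OutputStreamWriter(out, charset)) {
            w.write(line);
            w.write("\r\n");
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String row = "\u00c5ngstr\u00f6m,\u20ac"; // "Ångström,€"
        byte[] utf8 = writeCsvLine(row, StandardCharsets.UTF_8);
        byte[] latin1 = writeCsvLine(row, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " " + latin1.length); // prints 16 12
        System.out.println((char) latin1[9]);                  // prints ? (the lost "€")
    }
}
```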
J. Patterson Waltz III added a comment -
In my Tiles master layout, I have the following directive, which is the non-XML equivalent to what you suggested:

<%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %>

When I look at Firefox's Page Info window, it tells me the page is UTF-8 encoded. Same thing with curl:

jpw$ curl -i 'http://cerenit.local:8080/SAGE/client.do?retrieve=&d-16544-p=2&status=active'
HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=42665940856D4110CCB07BD62FD7F523; Path=/SAGE
Content-Type: text/html;charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 16 Nov 2004 02:04:45 GMT
Server: Apache-Coyote/1.1

But when I export a CSV, it's Windows Latin-1 encoded... :-(

All the above is on my OS X box running JBoss 3.2.5.

I have the same app running on a Linux box and it sends back pages which have both ISO-8859 *and* UTF-8 listed in the headers:

$ curl -I 'http://xxxxx.dyndns.org:8080/SAGE/client.do?retrieve=&d-16544-p=2&status=active'
HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=27E58087A1A139543FD72477379D826F; Path=/SAGE
Content-Type: text/html; charset=ISO-8859-1;charset=UTF-8
Content-Length: 0
Date: Tue, 16 Nov 2004 02:12:39 GMT
Server: Apache-Coyote/1.1

Exporting from that machine in CSV also gives Windows Latin-1 encoding, perhaps less surprisingly, in light of the weird double encoding header.
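The Linux server's header carries two charset parameters, so it is ambiguous. The sketch below uses a hypothetical ContentTypeCharsets helper with deliberately simplified parsing: it ignores quoted parameter values and assumes ';' only separates parameters. It lists the charset values in order. A client that honours the first would decode as ISO-8859-1, and one that honours the last would use UTF-8. Which one a given client picks is not something this issue establishes.

```java
import java.util.ArrayList;
import java.util.List;

public class ContentTypeCharsets {
    // Returns every charset parameter value in a Content-Type header, in order.
    // Simplified: no support for quoted strings or escaped ';'.
    static List<String> charsets(String contentType) {
        List<String> result = new ArrayList<>();
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                result.add(param.substring(eq + 1).trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(charsets("text/html;charset=UTF-8"));
        // prints [UTF-8]
        System.out.println(charsets("text/html; charset=ISO-8859-1;charset=UTF-8"));
        // prints [ISO-8859-1, UTF-8]
    }
}
```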

What a pain!
fabrizio giustina added a comment -
Sorry, but I am still unable to reproduce it. I will need additional details:
- which version of displaytag are you using: the latest RC2 snapshot or a previous release?
- do you use the export filter?

And a couple of additional tests:
- try putting the encoding declaration directly in the page containing the display:table tag (not only in the master tile layout)
- can you look at the content-type header during an export? Does it contain the double declaration?
Anonymous added a comment -
I am using a slightly modified version of an RC2 pulled from CVS shortly before the 10th of November. The mods implement the Collator-based sorting described in DISPL-77, in the checkNullsAndCompare method of RowSorter, thus should have no effect on export file formats. If you think it matters, I could retry everything using an unmodified CVS snapshot.

Otherwise, I *am* using the export filter. Not sure if it would make any difference, but my web application uses a servlet 2.4/JSP 2.0-style web.xml file.

I added an additional <%@ page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8" %> declaration to the .jsp that contains the actual tile with the displaytag table. After redeploying the application, I got the same results (web page charset=UTF-8 and CSV export in ISO-8859-1):

$ curl -i 'http://cerenit.local:8080/SAGE/client.do?6578706f7274=1&d-16544-e=1&status=active&retrieve='
HTTP/1.1 200 OK
Set-Cookie: JSESSIONID=E08718BE8363353F1BC71B0A2C522229; Path=/SAGE
Content-Type: text/csv;charset=ISO-8859-1
Content-Length: 2567
Date: Tue, 16 Nov 2004 20:13:58 GMT
Server: Apache-Coyote/1.1

I have not yet tried uploading an updated .war to the server with the "double charset" declarations, as it is a semi-production server, and I'm not getting the results I want even on my development server which doesn't have the double charset problem.
Anonymous added a comment -
I have the same problem, and I believe the reason is that Excel just uses the system default encoding, and I haven't found a way to change that. Perhaps MS Excel does not support UTF-8.
fabrizio giustina added a comment -
Sorry, I'm postponing a fix since I'm still unable to reproduce the problem. Unit tests pass, and sniffing the content type on a live web application correctly gives me UTF-8. :/
I'll try to run some tests in a similar environment, but at the moment I can't do anything to fix it.

If you need a quick solution for the Excel export, you could use a customized export view using POI, which generates binary Excel files.
Anonymous added a comment -
?
Ray Tsang added a comment -
I am experiencing the same issue. The exported file is indeed UTF-8 encoded, but Excel does not open it correctly when it is opened directly (through the browser or by double-clicking). If you start Excel first and then open the file through the "Open File" operation, it detects the encoding correctly. Moreover, if you resave the file from Excel as a tab-delimited file, it opens correctly from then on.

It turns out that when Excel saves a tab-delimited file, it does not save the surrounding quotes (").
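This comment suggests that Excel fails to detect UTF-8 only when the file is opened directly. One commonly used workaround is to start the file with the UTF-8 byte order mark (EF BB BF). This issue does not confirm that workaround, and it is not the fix displaytag 1.1 adopted. The sketch below uses a hypothetical BomTsvExport class and assumes Java 9+ for List.of. It writes unquoted tab-delimited rows, like the file Excel itself saves, so it assumes cells contain no tabs or line breaks.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BomTsvExport {
    // Writes rows as tab-delimited UTF-8, preceded by a byte order mark.
    static byte[] export(List<List<String>> rows) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write('\uFEFF'); // U+FEFF encodes to EF BB BF in UTF-8
            for (List<String> row : rows) {
                w.write(String.join("\t", row)); // no quoting or escaping
                w.write("\r\n");
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = export(List.of(
                List.of("Name", "City"),
                List.of("Jos\u00e9", "M\u00fcnchen")));
        System.out.printf("%02X %02X %02X%n", data[0], data[1], data[2]); // prints EF BB BF
    }
}
```

Whether a given Excel version honours the BOM should be checked on the target platform before relying on it.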
fabrizio giustina added a comment -
Since the problem is related to Excel's handling of UTF-8 encoded files, the fix here is to use the new binary Excel format, which will be included in displaytag 1.1.
Tests for the CSV/XML formats also show that encoding is preserved during export, and I can't see any other possible fix in displaytag...
Jan added a comment -
Hello,
this bug is closed, but I'm also having this issue in 1.2 when exporting to a CSV file. The JSP page has UTF-8 encoding and characters with accents are all displayed correctly. When I export to CSV, the response has "Content-Type: text/csv;charset=ISO-8859-1". The resulting file is then not displayed correctly on Linux.
