Description
There is a problem in the way multibyte UTF-8 characters are handled at the end of a chunk in the method org.mortbay.jetty.AbstractGenerator.OutputWriter.write(char[] s, int offset, int length).
When a multibyte UTF-8 character (for example á - \u00E1) is the last character that fits into the "bytes" buffer, it is printed twice to the output. It is written once at the end of the buffer, but then this code
if (chunk-i>buffer.length-bytes)
chunk=buffer.length-bytes+i;
cuts the chunk. (This is right in the other places: we have spent two or more bytes from the "bytes" buffer, so we must reduce the number of chars that can fit in the buffer.) But when this cut occurs at the end of the "for (int i = 0; i < chunk; i++)" loop, shortening the chunk makes it look as if we had not written the last char into the buffer. So it is written again in the next cycle of the OutputWriter.write() call.
I think the condition
if (chunk-i>buffer.length-bytes)
chunk=buffer.length-bytes+i;
should properly be
if (chunk-i>buffer.length-bytes && buffer.length-bytes>0)
chunk=buffer.length-bytes+i;
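To compare the two conditions outside of Jetty, here is a minimal sketch of the chunking loop. It is not the Jetty source: the class ChunkModel, the method encode, the "guarded" flag and the flushing into a ByteArrayOutputStream are my assumptions for illustration, the initial chunk size is simplified, and only 1-byte and 2-byte UTF-8 sequences are modelled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical model of the chunked UTF-8 writer, NOT the real Jetty class.
class ChunkModel {
    // Encodes s through a fixed-size byte buffer that is flushed after each chunk.
    // guarded=false applies the original condition, guarded=true the proposed one.
    static String encode(char[] s, int bufferLength, boolean guarded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferLength];
        int offset = 0;
        int length = s.length;
        while (length > 0) {
            int bytes = 0;
            int chunk = Math.min(length, buffer.length);
            for (int i = 0; i < chunk; i++) {
                int code = s[offset + i];
                if ((code & 0xffffff80) == 0) {
                    // 1-byte sequence (bounds check added for the model only)
                    if (bytes + 1 > buffer.length) { chunk = i; break; }
                    buffer[bytes++] = (byte) code;
                } else {
                    // 2-byte sequence (the model assumes code < 0x800)
                    if (bytes + 2 > buffer.length) { chunk = i; break; }
                    buffer[bytes++] = (byte) (0xc0 | (code >> 6));
                    buffer[bytes++] = (byte) (0x80 | (code & 0x3f));
                    boolean cut = chunk - i > buffer.length - bytes;
                    if (guarded) {
                        cut = cut && buffer.length - bytes > 0;
                    }
                    if (cut) {
                        chunk = buffer.length - bytes + i;
                    }
                }
            }
            out.write(buffer, 0, bytes); // flush what was encoded
            offset += chunk;             // chars the loop claims to have consumed
            length -= chunk;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] in = "\u00E1\u00ED\u00E1\u00ED".toCharArray();
        System.out.println(encode(in, 4, false).length()); // original condition
        System.out.println(encode(in, 4, true).length());  // proposed condition
    }
}
```

In this model, with a 4-byte buffer and four 2-byte chars, the original condition returns a string longer than the input with doubled characters, while the guarded condition returns the input unchanged. Use a buffer length of at least 4 here: with a 2-byte buffer the original condition never advances in this model.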
Test case servlet. It should write 21 lines of "áíáíáí..." text. All lines should have the same length, and the chars "á" and "í" should alternate regularly.
With the default configuration (buffer length set to 2048), it currently (before applying the patch) prints lines 7 and 20 one character longer: there is a double "íí" on line 7 and a double "áá" at the beginning of line 20. (You can find them with the RegExp "(.)\1".)
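The RegExp check can also be automated. The following is only a sketch: the class DoubledCharCheck and the method firstDoubled are hypothetical names and are not part of the test case servlet. It applies the same "(.)\1" pattern to one output line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking one line of the servlet output.
class DoubledCharCheck {
    // Matches any character immediately followed by itself.
    private static final Pattern DOUBLED = Pattern.compile("(.)\\1");

    // Returns the index of the first doubled character in line, or -1 if none.
    static int firstDoubled(String line) {
        Matcher m = DOUBLED.matcher(line);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubled("\u00E1\u00ED\u00E1\u00ED")); // regular alternation
        System.out.println(firstDoubled("\u00E1\u00ED\u00ED\u00E1")); // doubled character
    }
}
```

A line where "á" and "í" alternate regularly gives -1. Any other value is the position of the duplicated character.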