
Key: CASTOR-2228
Type: Bug
Status: Open
Priority: Blocker
Assignee: Edward Kuns
Reporter: Raffaele Fabbri
Votes: 0
Watchers: 0
castor

wrong charset bad regular expression

Created: 14/Dec/07 03:54 AM   Updated: 07/Jan/08 04:24 AM
Component/s: XML code generator
Affects Version/s: 1.1.2.1
Fix Version/s: None

Time Tracking:
Original Estimate: 3 days
Remaining Estimate: 3 days
Time Spent: Not Specified

File Attachments: 1. Zip Archive exsample2.zip (1 kb)
2. Zip Archive issueCastor.zip (37 kb)

Environment:
Eclipse 3.2.2
Castor 1.1.2.1
Java 1.4.2_14

Testcase included: yes


Description
I have an XSD in which some field restrictions are based on regular expressions that use special characters, such as "([À-ža-zA-Z\- '])*".

In this case, the object descriptors generated in the descriptors package contain the following line:

typeValidator.addPattern("([À-ža-zA-Z\\- '])*");

but the pattern there is not the same as the one in the XSD.
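One way the pattern can change between the XSD and the generated descriptor is a charset mismatch when the schema is read: UTF-8 bytes decoded with a single-byte charset turn each non-ASCII character into two characters. A minimal sketch, assuming for illustration that the reader decodes as ISO-8859-1 (the actual default charset on the reporter's machine is not stated in this issue); the class name is illustrative:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // simulating a reader that assumes the wrong charset.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The pattern from the XSD; U+00C0 is A-grave, U+017E is z-caron.
        String pattern = "([\u00C0-\u017Ea-zA-Z\\- '])*";
        String seen = misdecode(pattern);

        // A-grave becomes U+00C3 U+0080 and z-caron becomes U+00C5 U+00BE,
        // so the character range in the regular expression is no longer the same.
        System.out.println(seen.equals(pattern));              // false
        System.out.println(seen.length() - pattern.length());  // 2
    }
}
```

With a different wrong charset (for example the Windows default) the garbled characters would differ, but the effect on the regular expression is the same.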

What am I doing wrong?

Thanks in advance.

The attached file contains two samples.
The Ant build file castor_xml_saia_build.1.1.xml contains two targets: GeneratorTestCastor and GeneratorRichiesta.
The first is a small sample.
The second is a complex sample that has another problem: with version 0.9.5.4 the generated classes compiled, but with the new version they do not.
With the old version there were problems with name collisions, but the source code compiled.
Thanks in advance.



Comments
Werner Guttmann - 14/Dec/07 04:04 AM
Raffaele, can you please create a separate issue for the second problem? We'd like to keep issues as separate and self-contained as possible. Alternatively, let's first discuss this problem on the mailing lists.

Werner Guttmann - 14/Dec/07 04:07 AM
In addition, when posting an issue, please have a look at http://castor.org/how-to-submit-an-xml-bug.html first. Also, please avoid attaching complete projects when I have asked you (on the mailing lists) to attach a minimal XML schema that highlights the problem at hand and nothing else. Can you please delete the ZIP archive from this issue and attach a small XML schema that exposes the issue? Thanks in advance.

Raffaele Fabbri - 14/Dec/07 04:40 AM
Excuse me.

I hope this file is what you want.

Thanks.


Raffaele Fabbri - 14/Dec/07 06:11 AM
I downloaded the Castor source code and found that the problem is in the class
org.exolab.castor.builder.SourceGenerator,

in the method invoked by the main method of SourceGeneratorMain:
    sgen.generateSource(schemaFilename, options.getProperty(ARGUMENT_PACKAGE));

    public final void generateSource(
            final String filename, final String packageName) throws IOException {
        final File schemaFile;
        if (filename.startsWith("./")) {
            schemaFile = new File(filename.substring(2));
        } else {
            schemaFile = new File(filename);
        }

        FileReader reader = new FileReader(schemaFile);

        try {
            InputSource source = new InputSource(reader);
            source.setEncoding("UTF-8"); // added by Raffaele Fabbri
            source.setSystemId(toURIRepresentation(schemaFile.getAbsolutePath()));
            generateSource(source, packageName);
        } finally {
            try {
                reader.close();
            } catch (java.io.IOException iox) {
                // ignore
            }
        }
    } //-- generateSource
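The method above reads the schema through a FileReader, which decodes the bytes with the platform default charset before any parser sees them. The SAX InputSource documentation states that when a character stream is supplied the parser reads it directly, and that setEncoding has no effect in that case; the encoding hint applies only to byte streams. If Castor passes this InputSource to the parser unchanged, the added setEncoding call would be ignored under that contract; how it appeared to help in this case is not established in this thread. A minimal sketch of the difference, using the JDK's built-in SAX parser and a tiny in-memory document in place of a real schema (the class name and document are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InputSourceEncodingDemo {

    // Parses the document and returns the "value" attribute of the first <pattern> element.
    static String readPattern(InputSource source) throws Exception {
        final String[] result = new String[1];
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result[0] == null && "pattern".equals(qName)) {
                    result[0] = atts.getValue("value");
                }
            }
        });
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        // UTF-8 bytes of a pattern containing z-caron (U+017E).
        byte[] utf8 = "<pattern value=\"[a-z\u017E]*\"/>".getBytes(StandardCharsets.UTF_8);

        // Like FileReader: the bytes are decoded before the parser sees them (here with
        // ISO-8859-1). setEncoding has no effect because a character stream is supplied.
        InputSource viaReader = new InputSource(
                new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1));
        viaReader.setEncoding("UTF-8");

        // The parser receives raw bytes and determines the encoding itself.
        InputSource viaBytes = new InputSource(new ByteArrayInputStream(utf8));

        System.out.println(readPattern(viaReader).equals("[a-z\u017E]*"));  // false
        System.out.println(readPattern(viaBytes).equals("[a-z\u017E]*"));   // true
    }
}
```

Passing a byte stream (for example a FileInputStream) instead of a Reader lets the parser honour either the InputSource encoding or the schema's own XML declaration.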

This is a specific solution that solves our situation; a general solution would be better.
Could you please include a fix in Castor?
If you have any doubts, please call or email me.

Thanks very much.

Raffaele Fabbri


Raffaele Fabbri - 14/Dec/07 06:41 AM
The previous solution works only if the pattern is in the root XSD.

If the pattern is defined in an included file, it doesn't work.

I tried to find the problem, but I need some help.


Edward Kuns - 15/Dec/07 10:51 AM
I expect there is more than one location in the source generator that needs to set a charset. The general solution would be to allow providing a charset in the configuration file, and in the absence of one being provided, either not set a charset or set a default charset.
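The general solution described above could look roughly like this: an optional, configured charset name, plus a byte stream instead of a Reader, so that the parser either uses that charset or falls back to the schema's own BOM or XML declaration (UTF-8 if neither is present). This is a sketch under assumptions: the class and method names are hypothetical and not part of Castor, and where the charset name comes from (for example a builder configuration property) is left open, since no such setting is named in this issue.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Castor.
public class SchemaInputSources {

    /**
     * Opens a schema file as a byte-stream InputSource.
     *
     * @param schemaFile  the XSD to read
     * @param charsetName configured charset, or null to let the parser detect it
     *                    from the BOM / XML declaration
     */
    static InputSource open(File schemaFile, String charsetName) throws IOException {
        InputSource source = new InputSource(new FileInputStream(schemaFile));
        if (charsetName != null) {
            // For byte streams, SAX parsers use this encoding instead of autodetecting it.
            source.setEncoding(charsetName);
        }
        // An absolute system ID lets relative include/import locations be resolved.
        source.setSystemId(schemaFile.toURI().toString());
        return source;
    }
}
```

The caller stays responsible for closing source.getByteStream() after generation, as the existing finally block does for the reader. Every other place that builds an InputSource for a schema (such as included files) would need the same treatment.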

Edward Kuns - 15/Dec/07 01:10 PM
Hmm, are you sure that the xsd is a valid UTF-8 file? It looks like it may be ISO-8859-1 or something else.
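Whether a file is valid UTF-8 can be checked mechanically, because a strict UTF-8 decoder rejects byte sequences that UTF-8 does not allow. A minimal sketch using java.nio.charset (the class name is illustrative). A pure-ASCII file also passes, so a positive result does not prove the author intended UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed sequences.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8Check prova.xsd
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (isValidUtf8(data) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```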

Edward Kuns - 15/Dec/07 01:21 PM
Raffaele, try marking the XSD with

<?xml version="1.0" encoding="ISO-8859-1"?>

or some other appropriate character set. When I load your XSD in a text editor, I don't see the proper characters. If I cut-and-paste from a browser into the xsd file and re-save it, and then run everything, it works for me. When the input XSD is read in, the encoding specified in the XML declaration (<?xml ... ?>) is used to determine how its bytes are decoded.
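A minimal sketch of the behaviour described above, using the JDK's DOM parser on in-memory bytes (the document and class name are illustrative): when the parser reads raw bytes, the encoding declared in the XML declaration decides how they are decoded, while the same ISO-8859-1 bytes without a declaration are read as UTF-8 and rejected.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DeclarationDemo {

    // Parses raw bytes (no Reader), so the parser picks the charset from the XML declaration.
    static String patternValue(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getDocumentElement().getAttribute("value");
    }

    public static void main(String[] args) throws Exception {
        // Range from A-grave (U+00C0) to y-diaeresis (U+00FF), all inside ISO-8859-1.
        String body = "<pattern value=\"[\u00C0-\u00FF]*\"/>";
        byte[] declared = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(patternValue(declared).equals("[\u00C0-\u00FF]*"));  // true

        // The same Latin-1 bytes without a declaration are decoded as UTF-8 and fail.
        try {
            patternValue(body.getBytes(StandardCharsets.ISO_8859_1));
            System.out.println("parsed");
        } catch (Exception e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```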


Raffaele Fabbri - 16/Dec/07 02:12 AM
I can try, but the XSD I use is not mine. It is a specification delivered by the Italian Ministry of the Interior (Ministero Interni), and I can't change its encoding.
The default Eclipse settings don't use the correct encoding to open the XSD file.
I added a file type under Window->Preferences->General->Content Types->Text->XML (file association *.xsd, default encoding UTF-8),
and with these settings both the Amateras editor and the default editor load the file correctly.
Question: if the encoding is wrong, why does everything work fine with a minimal XSD schema when I set the encoding as I explained?

Sorry for my bad English; I'm Italian.


Raffaele Fabbri - 16/Dec/07 02:23 AM
If I change the character set to ISO-8859-1, my editor tells me: "Some characters cannot be mapped using "ISO-8859-1" character encoding. Either change the encoding or remove the characters which are not supported by the "ISO-8859-1" character encoding."
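The editor's warning is consistent with the pattern itself: z-caron (U+017E) lies outside ISO-8859-1, which covers only U+0000 to U+00FF, while A-grave (U+00C0) is inside it. A short sketch using CharsetEncoder.canEncode to list the characters of a string that a given charset cannot represent (the class name is illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1Check {

    // Returns every character of text that the given charset cannot encode.
    static List<Character> unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        List<Character> missing = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!encoder.canEncode(c)) {
                missing.add(c);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // The pattern from the XSD; U+00C0 is A-grave, U+017E is z-caron.
        String pattern = "([\u00C0-\u017Ea-zA-Z\\- '])*";
        for (char c : unencodable(pattern, StandardCharsets.ISO_8859_1)) {
            // Prints: U+017E cannot be encoded in ISO-8859-1
            System.out.printf("U+%04X cannot be encoded in ISO-8859-1%n", (int) c);
        }
    }
}
```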

Raffaele Fabbri - 16/Dec/07 02:32 AM
I am sending another sample, like prova.xsd but with the type definitions in a separate file named tipi.xsd.

Edward Kuns - 17/Dec/07 12:13 PM
I am still looking into this. You're right – I looked further and this file is not ISO-8859-1.

Werner Guttmann - 27/Dec/07 04:55 PM
So what is the current understanding of this issue? Is it a genuine bug?

Raffaele Fabbri - 29/Dec/07 10:54 AM
I think it is a genuine bug, but so far I have not been able to find the right fix. Can somebody help me?

Thanks in advance.


Edward Kuns - 31/Dec/07 11:02 AM
I still need to do some research to understand more fully the interaction of locales and input streams and parsers, but I am still looking into this.

Edward Kuns - 06/Jan/08 01:16 PM
If you add the following line

source.setEncoding("UTF-8");

to both places in the source file that create a "new InputSource()", does that help with the import problem? When I do this, the CTF master test suite runs successfully.


Raffaele Fabbri - 07/Jan/08 04:24 AM
I added the line in all places in the source file "SourceGenerator.java", but when I run the utility with the sample that includes tipi.xsd, the generated regular expression is still wrong.
Could you please send me your modified source?
Thanks in advance.