groovy

GroovyClassLoader does not parse files saved in utf-8

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.1-rc-1
  • Fix Version/s: 1.5.5
  • Labels:
    None
  • Environment:
    Windows Vista, Maven project built with <encoding>utf-8</encoding>
  • Number of attachments:
    0

Description

The following code

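import groovy.lang.GroovyClassLoader;
import org.codehaus.groovy.control.CompilerConfiguration;

CompilerConfiguration config = new CompilerConfiguration();
config.setSourceEncoding("UTF-8");
GroovyClassLoader gcl = new GroovyClassLoader(Thread.currentThread().getContextClassLoader(), config);
Class clazz = gcl.parseClass(afile); // afile: the java.io.File containing the UTF-8 script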

causes the following error:
1: Invalid variable name. Must start with a letter but was: ?survivantManager.
Node: org.codehaus.groovy.ast.expr.VariableExpression. At [1:1] @ line 1, column 1.
?survivantManager...

The file has been saved as UTF-8 and checked with various programs to confirm that it displays correctly and that there is no hidden character (the one appearing as a ?) at the beginning of the file.

Any suggestions? Or is it indeed a bug?

Thanks for the help in advance!

Activity

Guillaume Laforge added a comment -

Which editor or IDE did you use to edit your file?
Does your file have a BOM (Byte Order Mark) at the beginning? (Check with a hex viewer of some sort.)
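
A hex viewer works, but the check is also easy to script. The following is a minimal Java sketch (the class and method names are hypothetical, not part of Groovy) that reports whether a file starts with the three bytes EF BB BF, which is how a BOM is written in UTF-8:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BomCheck {
    // The UTF-8 encoding of U+FEFF (the byte order mark) is EF BB BF.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns true if the given bytes begin with the UTF-8 BOM.
    static boolean hasUtf8Bom(byte[] head) {
        if (head.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (head[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads only the first three bytes of the file and checks them.
    static boolean fileHasUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = in.readNBytes(head, 0, 3);
            return n == 3 && hasUtf8Bom(head);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical): java BomCheck MyScript.groovy
        for (String arg : args) {
            boolean bom = fileHasUtf8Bom(Paths.get(arg));
            System.out.println(arg + ": " + (bom ? "UTF-8 BOM present" : "no UTF-8 BOM"));
        }
    }
}
```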

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

Notepad and Notepad++. Unfortunately, there was indeed a BOM (we saw it with HHD Free Hex Editor), and that was the problem.
Wouldn't it be nice if Groovy knew to ignore the BOM?
Can you suggest a particular editor that avoids this problem?

Guillaume Laforge added a comment -

There are a couple of dedicated classes in Groovy that could (and should) be used for this.
They were used in the past, but it seems that is no longer the case.
Regarding your last question, if you use an IDE like IntelliJ IDEA or Eclipse (with their respective Groovy plugins), you should be safe.
I think only basic Windows text editors add the BOM automatically.
Anyway, we'll have to investigate this issue; I think we could use the CharsetToolkit class in Groovy to properly skip those BOM bytes.
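
The idea of skipping the BOM before parsing can be sketched with plain JDK classes. This is not Groovy's CharsetToolkit (its API is not shown in this thread). The class below is a hypothetical helper that assumes the script is UTF-8. Java's UTF-8 decoder keeps a leading BOM as the character U+FEFF instead of dropping it, which is most likely the ? shown before survivantManager in the original error.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomStrippingReader {
    // A UTF-8 BOM decodes to U+FEFF; Java's UTF-8 decoder does not remove it.
    private static final char BOM = '\uFEFF';

    // Decodes UTF-8 bytes and removes a single leading BOM, if present.
    static String decodeUtf8SkippingBom(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        return (!text.isEmpty() && text.charAt(0) == BOM) ? text.substring(1) : text;
    }

    // Reads a script file as UTF-8 text with any leading BOM removed.
    static String readScript(Path file) throws IOException {
        return decodeUtf8SkippingBom(Files.readAllBytes(file));
    }
}
```

The returned text could then be passed to gcl.parseClass(...) in place of the File, so the BOM never reaches the parser.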

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

The problem has a workaround (using an IDE or tool that does not add BOM characters), but it could also be properly resolved.

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

While we managed to avoid the BOM problem, we still have problems with Unicode characters.
When doing the following (slightly simplified):

private GroovyClassLoader gcl;

public void init() {
    CompilerConfiguration config = new CompilerConfiguration();
    config.setSourceEncoding("UTF-8");
    gcl = new GroovyClassLoader(Thread.currentThread().getContextClassLoader(), config);
}

public Class parse(final String codeScript) {
    Class clazz = gcl.parseClass(codeScript);
    return clazz;
}
Greek and Chinese characters in codeScript appear correctly in the debugger, but the messages defined in the Groovy scripts show up on the page as '?'s. Everything else on the page is displayed correctly (including UTF-8 text coming from the database, from the page itself, or from files), so it is not a problem in the presentation layer but one specific to Groovy parsing.
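
One common way for non-Latin text to turn into '?' is an encoding step that targets a charset unable to represent it. Java's String.getBytes(Charset) silently substitutes the charset's replacement byte, which is '?' for ISO-8859-1. The sketch below only illustrates that mechanism (the class name is hypothetical). It is an assumption, not a confirmed diagnosis of what GroovyClassLoader did internally.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encoding to a charset that cannot represent a character replaces it
    // with the charset's replacement byte, which is '?' for ISO-8859-1.
    static String roundTripThroughLatin1(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sample = "\u00E9\u4E2D\u03B5"; // é中ε, the sample used in this report
        // é exists in ISO-8859-1 and survives; 中 and ε are replaced by '?'
        System.out.println(roundTripThroughLatin1(sample));
    }
}
```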

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

We tested it under 1.5.1 too and the problem persists. This is seriously blocking, as we are trying to build a multilingual version of the application...

I have the impression that GroovyClassLoader does not take the sourceEncoding parameter of the config into account, but I might be mistaken (the source code of GroovyClassLoader doesn't explicitly use the parameter, but some object it calls might).

blackdrag blackdrag added a comment -

I added code that handles the BOM, and I changed the classloader to use the encoding given in the config when parsing strings.
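
Using the configured encoding matters because a String handed to the classloader may be converted to bytes and decoded again before the parser sees it. The comment above does not show the actual change. The following hypothetical sketch only demonstrates the principle: the text survives when both steps use the same charset and is corrupted when they differ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Simulates a loader that turns script text into bytes and later decodes
    // them again. The text is preserved only if both steps agree on the charset.
    static String encodeThenDecode(String script, Charset encodeWith, Charset decodeWith) {
        return new String(script.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String script = "return '\u00E9\u4E2D\u03B5'";
        Charset configured = StandardCharsets.UTF_8; // e.g. the value passed to setSourceEncoding
        System.out.println(encodeThenDecode(script, configured, configured).equals(script)); // true
        System.out.println(encodeThenDecode(script, configured, StandardCharsets.ISO_8859_1).equals(script)); // false
    }
}
```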

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

Thank you for fixing the BOM issue.
Unfortunately, the classloader Unicode issue is not fixed.
We had a temporary workaround in place (converting UTF-8 strings to ISO before passing them to Groovy), but it didn't work in all environments or for all characters, so I switched to Groovy 1.5.4 to use the corrected loader. Unfortunately, the problem persists (with a twist):
If I call config.setSourceEncoding("UTF-16"), the script (correctly) fails to compile, so the source encoding is now clearly taken into account.
However, the problems with non-ASCII characters persist (you can test with the following: é中ε, copied from Wikipedia).

I tried to see whether it's a problem in string creation by Groovy (rather than compilation), but the following correctly shows the Chinese character 文:

byte[] e = [-26, -106, -121]   // UTF-8 bytes E6 96 87
tata = new String(e, "utf-8")
return messageFactory.createInformationMessage(tata);

So it's still a parsing problem.
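
The byte values in that snippet can be checked independently of Groovy. Read as unsigned values they are E6 96 87, which UTF-8 decodes to U+6587 (文); for comparison, 中 (U+4E2D) encodes as -28, -72, -83. A small Java sketch (hypothetical class name) confirming both:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeReportBytes {
    // Decodes raw bytes as UTF-8, as new String(e, "utf-8") does in the Groovy snippet.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] e = {-26, -106, -121}; // the bytes from the report: 0xE6 0x96 0x87
        String decoded = decodeUtf8(e);
        System.out.printf("U+%04X%n", (int) decoded.charAt(0)); // prints U+6587

        byte[] zhong = "\u4E2D".getBytes(StandardCharsets.UTF_8); // 中
        System.out.println(Arrays.toString(zhong)); // prints [-28, -72, -83]
    }
}
```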

Thanks for the effort; this is extremely important for us.

PS: I ran very detailed tests to verify that the correct bytes are passed to Groovy and that there is no problem elsewhere.

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

I don't understand at all. We tried again and now it works. If it stays like this for this week, I'll close the issue.

blackdrag blackdrag added a comment -

I updated the fix version to 1.5.5. Can we close this bug?

Nicolas Kyriazopoulos-Panagiotopoulos added a comment -

Yes, it's resolved. No problems have been encountered.
Sorry for the delay: I changed companies, so I am no longer on the project and forgot about the issue!

blackdrag blackdrag added a comment -

no problem...

I am closing this bug now
