GROOVY-3828

Groovy compiler chokes on MBCS in the script

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 1.6.5
    • Fix Version/s: None
    • Component/s: parser
    • Labels:
      None
    • Environment:
      Linux Fedora(64 bit) with JDK 1.6.0_06
    • Testcase included:
      yes
    • Number of attachments :
      0

      Description

      If a Groovy script contains any multi-byte character set (MBCS) characters, the Groovy compiler fails and throws an exception. How can I make this program work?

      package project1;
      
      import groovy.lang.Binding;
      import groovy.lang.GroovyShell;
      
      import java.nio.ByteBuffer;
      import java.nio.CharBuffer;
      import java.nio.charset.Charset;
      
      import org.codehaus.groovy.control.CompilationUnit;
      import org.codehaus.groovy.control.CompilerConfiguration;
      import org.codehaus.groovy.control.ErrorCollector;
      import org.codehaus.groovy.control.MultipleCompilationErrorsException;
      import org.codehaus.groovy.control.SourceUnit;
      
      public class TestMBCSWithGroovy {
          public TestMBCSWithGroovy() {
              super();
          }
      
          public static void main(String[] args) {
              try {
                  byte[] b = { '' };
                  String encodingName = "UTF-8";
                  Charset cs = Charset.forName(encodingName);
                  ByteBuffer bb = ByteBuffer.wrap(b);
                  CharBuffer cb = cs.decode(bb);
                  String s = cb.toString();
                  //        s = new String(someBytes, encodingName);
                  //        s = "цец";
                  CompilerConfiguration debugConfiguration =
                      new CompilerConfiguration(CompilerConfiguration.DEFAULT);
                  debugConfiguration.setSourceEncoding(encodingName);
                  GroovyShell mGS =
                      new GroovyShell(Thread.currentThread().getContextClassLoader(),
                                      new Binding(), debugConfiguration);
                  CompilerConfiguration conf =
                      new CompilerConfiguration(CompilerConfiguration.DEFAULT);
                  ErrorCollector err = new ErrorCollector(conf);
                  SourceUnit src =
                      new SourceUnit("test", s, conf, mGS.getClassLoader(), err);
                  CompilationUnit compilationUnit = new CompilationUnit(conf);
                  compilationUnit.addSource(src);
                  compilationUnit.compile();
      
              } catch (MultipleCompilationErrorsException me) {
                  me.printStackTrace();
              }
      
          }
      }
      

      Exception:
      project1.TestMBCSWithGroovy
      org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed, test: 1: unexpected char: 0x14 @ line 1, column 1.

      ^

      1 error

      at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:296)
      at org.codehaus.groovy.control.ErrorCollector.addFatalError(ErrorCollector.java:143)
      at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:113)
      at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:125)
      at org.codehaus.groovy.control.SourceUnit.addError(SourceUnit.java:353)
      at org.codehaus.groovy.antlr.AntlrParserPlugin.transformCSTIntoAST(AntlrParserPlugin.java:89)
      at org.codehaus.groovy.antlr.AntlrParserPlugin.parseCST(AntlrParserPlugin.java:61)
      at org.codehaus.groovy.control.SourceUnit.parse(SourceUnit.java:249)
      at org.codehaus.groovy.control.CompilationUnit$1.call(CompilationUnit.java:160)
      at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:798)
      at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:464)
      at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:443)
      at project1.TestMBCSWithGroovy.main(TestMBCSWithGroovy.java:44)
      Process exited with exit code 0.

        Activity

        Guillaume Laforge added a comment -

        And where is the Groovy script coming from?
        Is it put as is in the java class? Or is it coming from a file / database or elsewhere?

        Yes, there's the CompilerConfiguration#setSourceEncoding method that you can use to specify the encoding used.

        Also, could you make the same test you're doing, but with Groovy 1.7-beta-2, please?
        I made some changes somewhat related to that, and this may be fixed in 1.7-beta-2 already.

        vijay redla added a comment -

        The Groovy script comes from an XML file. It is read in as a Java string, and the string is then parsed with the Groovy parser as in the program above. The names of some binding variables in the script are the same as column names in an MBCS-enabled database, hence the requirement.

        I verified with Groovy 1.7-beta-2 and got the same error.

        /opt/jdk16/bin/java -classpath /home/vredla/sample/groovy/Application2/Project1/classes:/home/vredla/groovy-1.7-beta-2/lib/ant-1.7.1.jar:/home/vredla/groovy-1.7-beta-2/lib/ant-junit-1.7.1.jar:/home/vredla/groovy-1.7-beta-2/lib/ant-launcher-1.7.1.jar:/home/vredla/groovy-1.7-beta-2/lib/antlr-2.7.7.jar:/home/vredla/groovy-1.7-beta-2/lib/asm-3.2.jar:/home/vredla/groovy-1.7-beta-2/lib/asm-analysis-3.2.jar:/home/vredla/groovy-1.7-beta-2/lib/asm-commons-3.2.jar:/home/vredla/groovy-1.7-beta-2/lib/asm-tree-3.2.jar:/home/vredla/groovy-1.7-beta-2/lib/asm-util-3.2.jar:/home/vredla/groovy-1.7-beta-2/lib/bsf-2.4.0.jar:/home/vredla/groovy-1.7-beta-2/lib/commons-cli-1.2.jar:/home/vredla/groovy-1.7-beta-2/lib/commons-logging-1.1.jar:/home/vredla/groovy-1.7-beta-2/lib/groovy-1.7-beta-2.jar:/home/vredla/groovy-1.7-beta-2/lib/ivy-2.1.0-rc2.jar:/home/vredla/groovy-1.7-beta-2/lib/jline-0.9.94.jar:/home/vredla/groovy-1.7-beta-2/lib/jsp-api-2.0.jar:/home/vredla/groovy-1.7-beta-2/lib/junit-4.7.jar:/home/vredla/groovy-1.7-beta-2/lib/servlet-api-2.4.jar:/home/vredla/groovy-1.7-beta-2/lib/xstream-1.3.1.jar project1.TestMBCSWithGroovy
        org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed:
        test: 1: unexpected char: 0x14 @ line 1, column 1.

        ^

        1 error

        at org.codehaus.groovy.control.ErrorCollector.failIfErrors(ErrorCollector.java:296)
        at org.codehaus.groovy.control.ErrorCollector.addFatalError(ErrorCollector.java:143)
        at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:113)
        at org.codehaus.groovy.control.ErrorCollector.addError(ErrorCollector.java:125)
        at org.codehaus.groovy.control.SourceUnit.addError(SourceUnit.java:353)
        at org.codehaus.groovy.antlr.AntlrParserPlugin.transformCSTIntoAST(AntlrParserPlugin.java:97)
        at org.codehaus.groovy.antlr.AntlrParserPlugin.parseCST(AntlrParserPlugin.java:69)
        at org.codehaus.groovy.control.SourceUnit.parse(SourceUnit.java:249)
        at org.codehaus.groovy.control.CompilationUnit$1.call(CompilationUnit.java:155)
        at org.codehaus.groovy.control.CompilationUnit.applyToSourceUnits(CompilationUnit.java:810)
        at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:470)
        at org.codehaus.groovy.control.CompilationUnit.compile(CompilationUnit.java:449)
        at project1.TestMBCSWithGroovy.main(TestMBCSWithGroovy.java:44)

        Guillaume Laforge added a comment -

        And the XML is encoded in UTF-8 and properly parsed? (i.e., not parsed as some other 8-bit encoding)
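
The question above can be illustrated with a minimal sketch, using only the JDK. It assumes the Cyrillic sample string from the commented-out line in the report; the class name `DecodeCheck` and its `decode` helper are hypothetical names introduced for this illustration. Decoding UTF-8 bytes with a different 8-bit charset does not fail, it silently yields different characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeCheck {
    // Decode raw bytes with the given charset (hypothetical helper for this sketch).
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // The three Cyrillic letters from the report's commented-out sample, written as escapes.
        String original = "\u0446\u0435\u0446";
        byte[] raw = original.getBytes(StandardCharsets.UTF_8);

        String right = decode(raw, StandardCharsets.UTF_8);      // same charset as the bytes
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // some other 8-bit encoding

        System.out.println("UTF-8 round trip intact: " + right.equals(original));
        System.out.println("ISO-8859-1 result length: " + wrong.length());
    }
}
```

With the wrong charset each UTF-8 byte becomes its own character, so the three-letter string comes back as six unrelated characters and the parser would see those instead.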

        vijay redla added a comment -

        The XML is encoded in UTF-8. I will verify that it was parsed correctly and get back to you.
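
One possible way to do that verification, sketched with the JDK's DOM parser: parse the XML from bytes so the parser applies the encoding named in the XML declaration, then print each character of the extracted script as hex. The `<script>` element, the sample content, and the names `ScriptDump`, `scriptText`, and `hex` are assumptions made for this illustration, since the actual XML layout is not shown in the report.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ScriptDump {
    // Parse from bytes so the XML parser itself applies the declared encoding.
    static String scriptText(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Render each UTF-16 code unit as four hex digits so a stray control
    // character such as 0014 stands out.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical XML document holding a one-line script with a Cyrillic variable name.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<script>\u0446\u0435\u0446 = 1</script>";
        String script = scriptText(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(hex(script)); // 0446 0435 0446 0020 003D 0020 0031
    }
}
```

If the dump shows the expected code points, the string handed to the Groovy parser is intact; if it shows values such as 0014, the damage happened before compilation.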

        blackdrag blackdrag added a comment -

        I am wondering about the 0x14. Even if the source was in UTF-8 and is now treated as ASCII, 0x14 would stay 0x14, and it would be invalid. I thought it might be the byte order mark, but that uses a different sequence. In 1.6 we transcoded the string, which could have led to such problems, although I still have no idea where the 0x14 might be coming from. But in 1.7 we use the String as it is. If any encoding problem happens, then it is already in the string. In your first example you convert ", but that is ASCII 0x22; converted into UTF-8 it is still 0x22 and not 0x14. Even UCS-2 uses the ASCII code for this.
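
The byte values discussed above can be checked with a small JDK-only sketch. It also shows one way a 0x14 could arise, which is purely an assumption and not confirmed anywhere in this issue: narrowing a non-ASCII `char` to a `byte` keeps only its low 8 bits. The character U+2014 is a hypothetical example chosen because its low byte is 0x14; the class and method names are likewise invented for this illustration.

```java
import java.nio.charset.StandardCharsets;

public class NarrowingDemo {
    // Narrowing a char to a byte keeps only the low 8 bits of the code unit.
    static byte lowByte(char c) {
        return (byte) c;
    }

    // Encoding through a charset yields the character's real byte sequence.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The double quote is ASCII, so its UTF-8 encoding is the single byte 0x22.
        System.out.printf("quote in UTF-8: 0x%02X%n", utf8("\"")[0] & 0xFF);

        // Hypothetical example character (an assumption, not taken from the report).
        char sample = '\u2014';
        System.out.printf("narrowed to byte: 0x%02X%n", lowByte(sample) & 0xFF);

        // Decoding that single byte as UTF-8 gives the control character U+0014.
        String decoded = new String(new byte[] { lowByte(sample) }, StandardCharsets.UTF_8);
        System.out.printf("decoded char: 0x%02X%n", (int) decoded.charAt(0));

        // Going through the charset instead gives a three-byte sequence.
        System.out.println("UTF-8 length: " + utf8(String.valueOf(sample)).length);
    }
}
```

Under that assumption, the string would already contain the control character before it reaches the compiler, which is consistent with the remark that any encoding problem is already in the string.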


          People

          • Assignee:
            Guillaume Laforge
            Reporter:
            vijay redla
          • Votes:
            0
            Watchers:
            0
