GROOVY-2382

64K limit on strings -- Caught: java.lang.ClassFormatError: Unknown constant tag 84 in class file test

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 1.6-rc-1, 1.5.8, 1.7-beta-1
    • Component/s: bytecode
    • Labels:
      None
    • Environment:
      Fedora Core 7 64-bit running 32-bit jdk 1.5.10
    • Testcase included:
      yes
    • Number of attachments:
      1

      Description

      Assigning a string of 65535 characters or longer results in this error: Caught: java.lang.ClassFormatError: Unknown constant tag 84 in class file test.

      Attached testcase's string is exactly 65535 characters. Removing a single character from the string eliminates this error.

      1. test.groovy
        64 kB
        Jonathan Meeks

        Activity

        blackdrag added a comment -

        Yes, I guess we should check this and several other things (like method length).

        Jesse Watson added a comment -

        I ran into this issue as well. It might not seem like that big a deal, but here is something to consider...

        While it is unfortunate that there is this limit on String literals in groovy, it is much worse to get this kind of error message as opposed to something like "String literal too long". The ClassFormatError is not confidence-building in a new language like Groovy, and a better error message would at least lend the appearance of some robustness. I love Groovy, but if we want it to gain acceptance, focusing on robustness is key – error messages like this one really reveal the immaturity of a language.

        blackdrag added a comment -

        The 64k limit is not a limit of the Groovy language; it is a limit of the JVM. See http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html#88659

        The length of field and method names, field and method descriptors, and other constant string values is limited to 65535 characters by the 16-bit unsigned length item of the CONSTANT_Utf8_info structure (4.4.7). Note that the limit is on the number of bytes in the encoding and not on the number of encoded characters. UTF-8 encodes some characters using two or three bytes. Thus, strings incorporating multibyte characters are further constrained.
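To make the byte-versus-character distinction in the quoted passage concrete, here is a minimal Java sketch that counts how many bytes a string would need in a CONSTANT_Utf8_info entry. The names `Utf8Budget`, `encodedLength`, `fits` and `repeat` are made up for this illustration and are not part of Groovy or the JDK. The byte counts assume the class-file (modified UTF-8) encoding rules, and the comparison only mirrors the 65535 figure from the quoted sentence.

```java
public class Utf8Budget {
    // Largest value of the 16-bit unsigned length item in CONSTANT_Utf8_info.
    public static final int MAX_BYTES = 65535;

    // Bytes needed to encode s the way class files do (modified UTF-8):
    // 1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF,
    // 3 bytes for every other UTF-16 code unit.
    public static int encodedLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;
            } else if (c <= 0x07FF) {
                bytes += 2;
            } else {
                bytes += 3;
            }
        }
        return bytes;
    }

    // True if the encoded form fits the limit named in the quoted passage.
    public static boolean fits(String s) {
        return encodedLength(s) <= MAX_BYTES;
    }

    // Hypothetical helper: a string made of n copies of c.
    public static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same character count gives very different byte counts.
        System.out.println(encodedLength(repeat('a', 30000)));      // 30000
        System.out.println(encodedLength(repeat('\u00e9', 30000))); // 60000
        System.out.println(encodedLength(repeat('\u4e2d', 30000))); // 90000
        System.out.println(fits(repeat('\u4e2d', 30000)));          // false
    }
}
```

A 30000-character string of three-byte characters already exceeds the limit, which is what "further constrained" means in the passage above.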


        The obvious solution is for the compiler to split the string into chunks smaller than 64k and concatenate them to get the real string. That is the target for resolving this issue. I guess adding checks for the number of constant pool entries, descriptor lengths and other limits is also something we should do. The only reason I consider the current situation bad is that ClassFormatErrors and VerifyErrors are errors, not exceptions. As for a security leak or something like that resulting from such an error... I think there is nothing to be afraid of. The JVM is responsible for this, not the generated bytecode.
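The split-and-concatenate idea from this comment can be sketched in Java as follows. This is only an illustration of the approach, not the code that went into the Groovy compiler. `LiteralSplitter`, `split`, `join` and `unitLength` are hypothetical names, and the per-character byte sizes assume the class-file (modified UTF-8) encoding.

```java
import java.util.ArrayList;
import java.util.List;

public class LiteralSplitter {
    // Upper bound for one constant, taken from the quoted JVM limit.
    public static final int MAX_BYTES = 65535;

    // Encoded size of a single UTF-16 code unit in class-file UTF-8.
    static int unitLength(char c) {
        if (c >= 0x0001 && c <= 0x007F) {
            return 1;
        }
        return (c <= 0x07FF) ? 2 : 3;
    }

    // Split s into pieces whose encoded size each stays within maxBytes.
    public static List<String> split(String s, int maxBytes) {
        if (maxBytes < 3) {
            throw new IllegalArgumentException("maxBytes must be at least 3");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            int n = unitLength(s.charAt(i));
            if (bytes + n > maxBytes) {
                // The next code unit would overflow: close the current chunk.
                chunks.add(s.substring(start, i));
                start = i;
                bytes = 0;
            }
            bytes += n;
        }
        chunks.add(s.substring(start));
        return chunks;
    }

    // What the generated code would do at run time: concatenate the chunks.
    public static String join(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 70000; i++) {
            big.append('x');
        }
        String original = big.toString();
        List<String> chunks = split(original, MAX_BYTES);
        System.out.println(chunks.size());                 // 2
        System.out.println(join(chunks).equals(original)); // true
    }
}
```

The sketch budgets by encoded bytes rather than by character count, so chunks of multibyte text come out shorter in characters. A chunk boundary may fall between the two halves of a surrogate pair; concatenation restores the original sequence of code units.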

        Jesse Watson added a comment -

        Jochen – thanks for the clarification. I didn't realize this limit was in the JVM – good to know.

        I guess my larger point was more perception-oriented than anything. ClassFormatError smells like a bad internal error, and someone like myself seeing that (without knowing as much as yourself about what is really going on), I think: UH-OH, SOMETHING IS DEEPLY WRONG HERE... GROOVY IS UNSTABLE!

        For no other reason than because in Java, the only time you would see a ClassFormatError is when your compiler has a bug in it, or if you are trying to run some non-class file through a JVM.

        For example, a too-long string literal run through javac produces the following compile-time error: "constant string too long". That feels like the error message from a stable, mature compiler / language. And maybe this is your point as well when you say: "ClassFormatErrors or VerifyErrors are errors and no exceptions".

        My only point is that I think the perception of stability comes mainly from making sure that errors are caught up front and reasonable error messages are given, rather than producing an invalid class file and falling back on the JVM to throw a ClassFormatError. And (like many others) I want to see Groovy become perceived as stable in the market place so I can justify its use in my organization, etc.

        I agree with you there is nothing to truly be afraid of from a technical standpoint (security, etc). The only thing to fear is fear itself
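As an illustration of catching the problem up front, the following Java sketch checks a literal before any class file would be written and returns a readable message instead of leaving the failure to the JVM. It is not how the Groovy fix is implemented; this thread does not say how it was. `LiteralCheck` and `check` are hypothetical names, and the message text is modelled on the "String literal too long" wording suggested earlier in the thread. The check relies on `DataOutputStream.writeUTF`, which writes the same 16-bit length prefix followed by modified UTF-8 and throws `UTFDataFormatException` when the encoded form is longer than 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

public class LiteralCheck {
    // Returns null if the literal can be stored as one constant,
    // otherwise a human-readable error message.
    public static String check(String literal) {
        try (DataOutputStream out =
                 new DataOutputStream(new ByteArrayOutputStream())) {
            out.writeUTF(literal); // fails if the encoding exceeds 65535 bytes
            return null;
        } catch (UTFDataFormatException e) {
            return "String literal too long: its encoding exceeds 65535 bytes";
        } catch (IOException e) {
            // Not expected for an in-memory stream.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 65536; i++) {
            sb.append('a');
        }
        System.out.println(check("short literal")); // null
        System.out.println(check(sb.toString()));   // String literal too long: ...
    }
}
```

The boundary here comes from the JDK call and the quoted limit. The reporter saw the failure with a string of exactly 65535 characters, so a real compiler check might need a tighter bound than this sketch uses.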

        blackdrag added a comment -

        The compiler will now complain if the string is too big.


          People

          • Assignee: blackdrag
          • Reporter: Jonathan Meeks
          • Votes: 2
          • Watchers: 0
