groovy

Lexer fails on forward slash used in division

Details

  • Type: Bug
  • Status: Closed
  • Priority: Major
  • Resolution: Fixed
  • Affects Version/s: 1.1-rc-2
  • Fix Version/s: None
  • Component/s: lexer
  • Labels:
    None
  • Environment:
    OS X Tiger 10.4.11
  • Testcase included:
    yes
  • Number of attachments :
    0

Description

This simple test is failing (I have groovy-all on classpath):

import groovyjarjarantlr.*;
import java.io.*;
import org.codehaus.groovy.antlr.parser.GroovyLexer;
public class Main {
    public static void main(String[] args) throws TokenStreamException {
        String exp = "println 4 / 2 + 3";
        InputStream inputStream = new ByteArrayInputStream(exp.getBytes());
        GroovyLexer lexer = new GroovyLexer(inputStream);
        while (true) {
            Token token = lexer.nextToken();
            if (token.getType() == Token.EOF_TYPE) return;
            System.out.println("token = " + token);
        }
    }
}

with error:

token = ["println",<84>,line=1,col=1]
token = ["4",<194>,line=1,col=9]
Exception in thread "main" line 1:18: unexpected char: 0xFFFF
        at org.codehaus.groovy.antlr.parser.GroovyLexer.nextToken(GroovyLexer.java:687)
        at Main.main(Main.java:14)
Java Result: 1

If I replace the '/' character with '*' or something else, it works fine.

Activity

Martin Adamek added a comment -

I guess this will become invalid, as I found at http://groovy.codehaus.org/Migration+From+Classic+to+JSR+syntax that '\' was used in the old syntax for division and that one should now use the intdiv() function. Is that correct? If so, why does it compile when one uses '/' for int division?

Paul King added a comment -

Add code tags

Roshan Dawrani added a comment -

Just wanted to add that it fails because, in "println 4 / 2 + 3", the "/" is seen as the start of a regular expression; since the lexer does not find the matching closing "/" for the regular expression, it fails.

So "println 4 / 2 + 3 /" yields these tokens:

token = ["println",<84>,line=1,col=1,lineLast=1,colLast=8]
token = ["4",<194>,line=1,col=9,lineLast=1,colLast=10]
token = [" 2 + 3 ",<85>,line=1,col=11,lineLast=1,colLast=20] // "2 + 3" seen as the regex value
Roshan Dawrani added a comment - - edited

I think I now know why the lexer fails on "/" in your code when you tokenize "println 4 / 2 + 3".

It fails to see "/" as the division operator, and instead treats it as the "/" that starts a regex, because it needs to know the type of the last token it processed, and your code does not set that.

I have made a small change to your code so that it sets the token type correctly:

package org.codehaus.groovy.antlr.parser;

import groovyjarjarantlr.*;
import java.io.*;
import org.codehaus.groovy.antlr.parser.GroovyLexer;
public class Main {
    public static void main(String[] args) throws TokenStreamException {
        String exp = "println 4 / 2 + 3";
        InputStream inputStream = new ByteArrayInputStream(exp.getBytes());
        GroovyLexer lexer = new GroovyLexer(inputStream);
        while (true) {
            Token token = lexer.nextToken();
            if (token.getType() == Token.EOF_TYPE) return;
            // Roshan: make a note that last token processed is "4", an int, so "/" is to be taken as division operator
            lexer.lastSigTokenType = token.getType();
            System.out.println("token = " + token);
        }
    }
}

and now it yields the tokens correctly as:

token = ["println",<84>,line=1,col=1,lineLast=1,colLast=8]
token = ["4",<194>,line=1,col=9,lineLast=1,colLast=10]
token = ["/",<186>,line=1,col=11,lineLast=1,colLast=12]
token = ["2",<194>,line=1,col=13,lineLast=1,colLast=14]
token = ["+",<144>,line=1,col=15,lineLast=1,colLast=16]
token = ["3",<194>,line=1,col=17,lineLast=1,colLast=18]

Hope it helps.
Roshan
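
For readers who want to see the ambiguity in isolation, here is a minimal sketch of a toy lexer that decides between "division" and "start of regex" from the kind of the previous token. It is not GroovyLexer's code: the class name SlashDemo, the Kind enum, the trackLast flag, and the rule "a '/' after a number or identifier is division" are all assumptions made for illustration, and the real lexer's rules are more detailed.

```java
import java.util.ArrayList;
import java.util.List;

class SlashDemo {
    // Hypothetical token kinds for this toy lexer; not GroovyLexer's numeric types.
    enum Kind { NONE, IDENT, NUMBER, OPERATOR, DIV, REGEX }

    // Tokenizes src into "KIND:text" strings. When trackLast is false, the lexer
    // never learns what the previous token was, mimicking a caller that does not
    // update that state between calls.
    static List<String> tokenize(String src, boolean trackLast) {
        List<String> out = new ArrayList<>();
        Kind last = Kind.NONE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            Kind kind;
            if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                kind = Kind.NUMBER;
            } else if (Character.isLetter(c)) {
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                kind = Kind.IDENT;
            } else if (c == '/') {
                // Simplified assumption: '/' after an operand means division;
                // anywhere else it opens a regex literal.
                boolean afterOperand = last == Kind.NUMBER || last == Kind.IDENT;
                if (afterOperand) {
                    i++;
                    kind = Kind.DIV;
                } else {
                    int close = src.indexOf('/', i + 1);
                    if (close < 0) {
                        throw new IllegalStateException(
                                "unterminated regex starting at index " + i);
                    }
                    i = close + 1;
                    kind = Kind.REGEX;
                }
            } else {
                i++;
                kind = Kind.OPERATOR;
            }
            out.add(kind + ":" + src.substring(start, i));
            if (trackLast) last = kind;
        }
        return out;
    }

    public static void main(String[] args) {
        // With the previous token kind tracked, '/' is lexed as division.
        System.out.println(tokenize("println 4 / 2 + 3", true));
        // Without it, '/' is taken as a regex opener and no closing '/' is found.
        try {
            tokenize("println 4 / 2 + 3", false);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

With trackLast set to false, appending a trailing "/" to the input makes the toy lexer swallow "/ 2 + 3 /" as a single regex token, which is the same shape of result as the token dump earlier in this thread.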

