Janino

UnparseVisitor does not handle precedence of operators

Details

  • Type: Bug Bug
  • Status: Resolved Resolved
  • Priority: Major Major
  • Resolution: Fixed
  • Affects Version/s: None
  • Fix Version/s: None
  • Component/s: None
  • Labels:
    None
  • Testcase included:
    yes
  • Number of attachments :
    3

Description

Consider the following hand created AST and UnparseVisitor:

ExpressionStatement es = new Java.ExpressionStatement(
new Java.Assignment( null,
new Java.AmbiguousName( null, new String[] { "x" }),
"=",
new Java.BinaryOperation(null,
new Java.Literal(null, Integer.valueOf(1)),
"*",
new Java.BinaryOperation(null,
new Java.Literal(null, Integer.valueOf(2)),
"+",
new Java.Literal(null, Integer.valueOf(3)),
)
)
)
);

StringWriter sw = new StringWriter();
UnparseVisitor uv = new UnparseVisitor(sw);
uv.visitExpressionStatement(es);

sw will contain "x = 1 * 2 + 3;" it should contain something more like "x = 1 * (2 + 3);" to handle the precedence difference in '+' and '*'.

This is even worse for Unary '!' and "instanceof", because the generated code won't compile without the parentheses.

  1. unparse.patch
    22/May/08 12:57 PM
    11 kB
    Matt Fowles
  2. unparse.patch
    07/May/08 1:30 PM
    41 kB
    Matt Fowles
  3. unparse-fix.patch
    23/May/08 10:38 AM
    53 kB
    Matt Fowles

Activity

Hide
Matt Fowles added a comment -

I am part way through an implementation of this. But I wanted to get the bug written down in case I get pulled of the problem.

Show
Matt Fowles added a comment - I am part way through an implementation of this. But I wanted to get the bug written down in case I get pulled of the problem.
Hide
Matt Fowles added a comment -

The attached parse implements minimal parenthesizing of expressions based on precedence (i.e. parentheses will only be included in those places where they are needed for correctness and no more). It also add a bunch of test cases for this (many of which fail without the patch).

Show
Matt Fowles added a comment - The attached parse implements minimal parenthesizing of expressions based on precedence (i.e. parentheses will only be included in those places where they are needed for correctness and no more). It also add a bunch of test cases for this (many of which fail without the patch).
Hide
Matt Fowles added a comment -

additionally, it would be nice if UnparseVisitors members were either public or protected. I have implemented this in my version, but extracting that diff from my other changes is a fair amount of effort.

Show
Matt Fowles added a comment - additionally, it would be nice if UnparseVisitors members were either public or protected. I have implemented this in my version, but extracting that diff from my other changes is a fair amount of effort.
Hide
Arno Unkrig added a comment -

Hi Matt,

here are my thoughts on the issue:

The AST generated by the Java parser is a representation of the syntactical elements of a JAVA program. E.g. "1+2*3" is parsed in to "sum of 1 and (product of 2 and 3)", while "(1+2)*3" is parsed into "product of (parenthesized expression of (sum of 1 and 2)) and 3". Due to the precedence rules of Java, certain combinations of the syntactical elements are impossible, e.g. "product of (sum of 1 and 2) and 3". These can only be constructed programmatically, as you did. (Let's call these "unnatural syntax trees".)

All natural syntax trees (as those created by the Java parsers) are correctly converted to text by the UnparseVisitor (UV). This behavior could be documented as a limitation of the UV, and we're done.

Alternatively, a new type hierarchy derived from "Rvalue" could be introduced with members like "MultiplicativeOperation" and "AdditiveOperation", where the AO takes two MOs as its operands, but not vice versa. This way, unnatural syntax trees can no longer be constructed, and the output of the UV would be correct in ALL cases.

Third, as you proposed, the UV could be enhanced to generate correct output even for unnatural syntax trees. While producing correct results is nice, I don't really like the option, because this is not an exact inversion of the parsing process: If you re-parse the output of the UV, you get a different AST. (In previous versions of JANINO, I had no "ParenthesizedExpression" AST element, and later introduced it for exactly that reason.)

All in all, I am quite ambivalent about the issue, which is why I'd like to ask "why the heck would you unparse a programmatically created AST"?

Show
Arno Unkrig added a comment - Hi Matt, here are my thoughts on the issue: The AST generated by the Java parser is a representation of the syntactical elements of a JAVA program. E.g. "1+2*3" is parsed in to "sum of 1 and (product of 2 and 3)", while "(1+2)*3" is parsed into "product of (parenthesized expression of (sum of 1 and 2)) and 3". Due to the precedence rules of Java, certain combinations of the syntactical elements are impossible, e.g. "product of (sum of 1 and 2) and 3". These can only be constructed programmatically, as you did. (Let's call these "unnatural syntax trees".) All natural syntax trees (as those created by the Java parsers) are correctly converted to text by the UnparseVisitor (UV). This behavior could be documented as a limitation of the UV, and we're done. Alternatively, a new type hierarchy derived from "Rvalue" could be introduced with members like "MultiplicativeOperation" and "AdditiveOperation", where the AO takes two MOs as its operands, but not vice versa. This way, unnatural syntax trees can no longer be constructed, and the output of the UV would be correct in ALL cases. Third, as you proposed, the UV could be enhanced to generate correct output even for unnatural syntax trees. While producing correct results is nice, I don't really like the option, because this is not an exact inversion of the parsing process: If you re-parse the output of the UV, you get a different AST. (In previous versions of JANINO, I had no "ParenthesizedExpression" AST element, and later introduced it for exactly that reason.) All in all, I am quite ambivalent about the issue, which is why I'd like to ask "why the heck would you unparse a programmatically created AST"?
Hide
Matt Fowles added a comment -

Arno~

We (the developers at my company) unparse programmatically created ASTs all the time as a debugging aid. The ability to do this is entirely vital for my project.

I did note that all natural trees were handled correctly, and in the case of natural trees my changes to UV will have no change on the output (other than the whitespace changes that are already made).

Thus, my change simply make UV work for ALL cases, which seems like a better state of the world, especially as we heavily depend on this feature for our debugging. As a side note, you claim specifically on the website that Janino is good for this use:

http://www.janino.net/use.html#code_manipulator

and I can easily envision such manipulations producing unnatural ASTs. In fact my test includes a visitor that implements a manipulation that removes paren expression in order to produce a minimally parenthesized output).

Given that I have already implemented and provided a patch for the third proposal, I am in favor of option 3

Matt

Show
Matt Fowles added a comment - Arno~ We (the developers at my company) unparse programmatically created ASTs all the time as a debugging aid. The ability to do this is entirely vital for my project. I did note that all natural trees were handled correctly, and in the case of natural trees my changes to UV will have no change on the output (other than the whitespace changes that are already made). Thus, my change simply make UV work for ALL cases, which seems like a better state of the world, especially as we heavily depend on this feature for our debugging. As a side note, you claim specifically on the website that Janino is good for this use: http://www.janino.net/use.html#code_manipulator and I can easily envision such manipulations producing unnatural ASTs. In fact my test includes a visitor that implements a manipulation that removes paren expression in order to produce a minimally parenthesized output). Given that I have already implemented and provided a patch for the third proposal, I am in favor of option 3 Matt
Hide
Arno Unkrig added a comment -

Yes, correct unparsing is important for debugging, accepted.

Show
Arno Unkrig added a comment - Yes, correct unparsing is important for debugging, accepted.
Hide
Arno Unkrig added a comment -

Implemented the correct unparsing of "unnatural" syntaxes, with the following changes:

Instead of inserting artificial "(" and ")", insert "((( " and " )))". This makes unnatural syntaxes clearly visible in the output, while the output is still correct Java syntax. (And alternative would have been "(/UNNATURAL/ " and " /UNNATURAL/".) To make your unit tests work, I changed them to replace the "((( "s and " )))"s back on-the-fly.

I didn't use your code, because the functionality can be implemented statelessly. However, thanks a million for the unit tests, that made the (re-)implementation of the feature so much easier!

Please review the code.

Will go into the next version of JANINO.

Show
Arno Unkrig added a comment - Implemented the correct unparsing of "unnatural" syntaxes, with the following changes: Instead of inserting artificial "(" and ")", insert "((( " and " )))". This makes unnatural syntaxes clearly visible in the output, while the output is still correct Java syntax. (And alternative would have been "(/UNNATURAL/ " and " /UNNATURAL/".) To make your unit tests work, I changed them to replace the "((( "s and " )))"s back on-the-fly. I didn't use your code, because the functionality can be implemented statelessly. However, thanks a million for the unit tests, that made the (re-)implementation of the feature so much easier! Please review the code. Will go into the next version of JANINO.
Hide
Matt Fowles added a comment -

I found a few bugs in your implementation. The attached patch adds more test cases and fixes the bugs.

Show
Matt Fowles added a comment - I found a few bugs in your implementation. The attached patch adds more test cases and fixes the bugs.
Hide
Matt Fowles added a comment -

Also, can we change the member variables of UnparseVisitor to be protected?

My projects needs access to them so that it can intersperse extra debug information in the output as comments.

Show
Matt Fowles added a comment - Also, can we change the member variables of UnparseVisitor to be protected? My projects needs access to them so that it can intersperse extra debug information in the output as comments.
Hide
Arno Unkrig added a comment -

Hi Matt,

I applied your bug fixes, and fixed another bunch of bugs. UnparseTests now parses, unparses and parses all JANINO sources, and compares the two ASTs.

Please review my changes to "Java", "Scanner", "UnparseVisitor" and "UnparseTests".

Show
Arno Unkrig added a comment - Hi Matt, I applied your bug fixes, and fixed another bunch of bugs. UnparseTests now parses, unparses and parses all JANINO sources, and compares the two ASTs. Please review my changes to "Java", "Scanner", "UnparseVisitor" and "UnparseTests".
Hide
Matt Fowles added a comment -

Looks good at the moment, let me run it through our nightly tests and see if anything turns up.

Show
Matt Fowles added a comment - Looks good at the moment, let me run it through our nightly tests and see if anything turns up.
Hide
Matt Fowles added a comment -

Scratch that, this fails horribly after the change to unparsing negative numbers as hex.

Show
Matt Fowles added a comment - Scratch that, this fails horribly after the change to unparsing negative numbers as hex.
Hide
Matt Fowles added a comment -
public void testDirectAst() throws Exception {
        Java.Literal lit = new Java.Literal(null, Integer.valueOf(-1));
        
        
        StringWriter sw = new StringWriter();
        UnparseVisitor uv = new UnparseVisitor(sw);
        lit.accept(uv);
        String s = sw.toString();
        Assert.assertEquals("-1", s);
}

Will fail with

Expected: "-1"
Actual: "ffffffff"

but the latter isn't even valid java.

Show
Matt Fowles added a comment -
public void testDirectAst() throws Exception {
        Java.Literal lit = new Java.Literal(null, Integer.valueOf(-1));
        
        
        StringWriter sw = new StringWriter();
        UnparseVisitor uv = new UnparseVisitor(sw);
        lit.accept(uv);
        String s = sw.toString();
        Assert.assertEquals("-1", s);
}
Will fail with Expected: "-1" Actual: "ffffffff" but the latter isn't even valid java.
Hide
Matt Fowles added a comment -

The attached patch fixes this by allowing negative umbers to unparse normally and making testParseUnparseParseJanino() collect literals follow by negations into a single literal.

Show
Matt Fowles added a comment - The attached patch fixes this by allowing negative umbers to unparse normally and making testParseUnparseParseJanino() collect literals follow by negations into a single literal.
Hide
Arno Unkrig added a comment -

Hi Matt,

nope, your fix is wrong (bummer). The negative sign is never part of a Java literal! From file:///C:/jls3/lexical.html#3.10.1:

A decimal numeral is either the single ASCII character 0, representing the integer zero, or consists of an ASCII digit from 1 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a positive integer:
...
A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits and can represent a positive, zero, or negative integer.

Also, my code is buggy: 2^32-1 should not be unparsed to "ffffffff", but to "0xffffffff".

Generally, Java literals allow different ways to represent the same values, e.g.:

Integer: 12 0xc 0xC 0x00000000000000000c
Long: 47l 47L
Double: 1.3 1.3E0 1.3D 1.300000
Character: 'A' '\100'
String: "ABC" "\100BC"

The (precisely: my personal) definition of unparsing is: Creation of source code that is syntactically identical with the original code. This implies:

  • There are no negative numeric literals: "-3" is the negation of a literal
  • (Non-doc) Comments and whitespace are modified while unparsing, because they are syntactically irrelevant
  • Literals may also be modified while unparsing, because that also is syntactically irrelevant

In other words: It is wrong to write a JUNIT test that checks that "-1" parses/unparses to "-1". (Just like it is wrong to check that "a + b" parses/unparses to "a+b" or "a + b".)

Please review my changes to JLS2Tests, UnparseTests and Scanner. (The commit comment to Scanner/345 is wrong, please excuse.)

Show
Arno Unkrig added a comment - Hi Matt, nope, your fix is wrong (bummer). The negative sign is never part of a Java literal! From file:///C:/jls3/lexical.html#3.10.1:
A decimal numeral is either the single ASCII character 0, representing the integer zero, or consists of an ASCII digit from 1 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a positive integer: ... A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits and can represent a positive, zero, or negative integer.
Also, my code is buggy: 2^32-1 should not be unparsed to "ffffffff", but to "0xffffffff". Generally, Java literals allow different ways to represent the same values, e.g.: Integer: 12 0xc 0xC 0x00000000000000000c Long: 47l 47L Double: 1.3 1.3E0 1.3D 1.300000 Character: 'A' '\100' String: "ABC" "\100BC" The (precisely: my personal) definition of unparsing is: Creation of source code that is syntactically identical with the original code. This implies:
  • There are no negative numeric literals: "-3" is the negation of a literal
  • (Non-doc) Comments and whitespace are modified while unparsing, because they are syntactically irrelevant
  • Literals may also be modified while unparsing, because that also is syntactically irrelevant
In other words: It is wrong to write a JUNIT test that checks that "-1" parses/unparses to "-1". (Just like it is wrong to check that "a + b" parses/unparses to "a+b" or "a + b".) Please review my changes to JLS2Tests, UnparseTests and Scanner. (The commit comment to Scanner/345 is wrong, please excuse.)
Hide
Matt Fowles added a comment -

I don't particularly care whether negative literals unparse as hex or decimal, as long as they unparse to legal java code.

Show
Matt Fowles added a comment - I don't particularly care whether negative literals unparse as hex or decimal, as long as they unparse to legal java code.
Hide
Matt Fowles added a comment -

Arno, I do not have the ability to close this issue. I think my login to Jira does not have the power. You can consider this closed.

Show
Matt Fowles added a comment - Arno, I do not have the ability to close this issue. I think my login to Jira does not have the power. You can consider this closed.

People

Vote (0)
Watch (0)

Dates

  • Created:
    Updated:
    Resolved: