History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: JANINO-111
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Arno Unkrig
Reporter: Matt Fowles
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Janino

UnparseVisitor does not handle precedence of operators

Created: 06/May/08 04:24 PM   Updated: 16/Jun/08 09:29 AM
Component/s: None
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments: 1. Text File unparse-fix.patch (53 kb)
2. Text File unparse.patch (11 kb)
3. Text File unparse.patch (41 kb)


Testcase included: yes


 Description  « Hide
Consider the following hand created AST and UnparseVisitor:

ExpressionStatement es = new Java.ExpressionStatement(
new Java.Assignment( null,
new Java.AmbiguousName( null, new String[] { "x" }),
"=",
new Java.BinaryOperation(null,
new Java.Literal(null, Integer.valueOf(1)),
"*",
new Java.BinaryOperation(null,
new Java.Literal(null, Integer.valueOf(2)),
"+",
new Java.Literal(null, Integer.valueOf(3)),
)
)
)
);

StringWriter sw = new StringWriter();
UnparseVisitor uv = new UnparseVisitor(sw);
uv.visitExpressionStatement(es);

sw will contain "x = 1 * 2 + 3;" it should contain something more like "x = 1 * (2 + 3);" to handle the precedence difference in '+' and '*'.

This is even worse for Unary '!' and "instanceof", because the generated code won't compile without the parentheses.



 All   Comments   Work Log   Change History      Sort Order: Ascending order - Click to sort in descending order
Matt Fowles - 07/May/08 09:30 AM
I am part way through an implementation of this. But I wanted to get the bug written down in case I get pulled of the problem.

Matt Fowles - 07/May/08 01:30 PM
The attached parse implements minimal parenthesizing of expressions based on precedence (i.e. parentheses will only be included in those places where they are needed for correctness and no more). It also add a bunch of test cases for this (many of which fail without the patch).

Matt Fowles - 09/May/08 09:30 PM
additionally, it would be nice if UnparseVisitors members were either public or protected. I have implemented this in my version, but extracting that diff from my other changes is a fair amount of effort.

Arno Unkrig - 15/May/08 02:37 PM
Hi Matt,

here are my thoughts on the issue:

The AST generated by the Java parser is a representation of the syntactical elements of a JAVA program. E.g. "1+2*3" is parsed in to "sum of 1 and (product of 2 and 3)", while "(1+2)*3" is parsed into "product of (parenthesized expression of (sum of 1 and 2)) and 3". Due to the precedence rules of Java, certain combinations of the syntactical elements are impossible, e.g. "product of (sum of 1 and 2) and 3". These can only be constructed programmatically, as you did. (Let's call these "unnatural syntax trees".)

All natural syntax trees (as those created by the Java parsers) are correctly converted to text by the UnparseVisitor (UV). This behavior could be documented as a limitation of the UV, and we're done.

Alternatively, a new type hierarchy derived from "Rvalue" could be introduced with members like "MultiplicativeOperation" and "AdditiveOperation", where the AO takes two MOs as its operands, but not vice versa. This way, unnatural syntax trees can no longer be constructed, and the output of the UV would be correct in ALL cases.

Third, as you proposed, the UV could be enhanced to generate correct output even for unnatural syntax trees. While producing correct results is nice, I don't really like the option, because this is not an exact inversion of the parsing process: If you re-parse the output of the UV, you get a different AST. (In previous versions of JANINO, I had no "ParenthesizedExpression" AST element, and later introduced it for exactly that reason.)

All in all, I am quite ambivalent about the issue, which is why I'd like to ask "why the heck would you unparse a programmatically created AST"?


Matt Fowles - 15/May/08 02:49 PM
Arno~

We (the developers at my company) unparse programmatically created ASTs all the time as a debugging aid. The ability to do this is entirely vital for my project.

I did note that all natural trees were handled correctly, and in the case of natural trees my changes to UV will have no change on the output (other than the whitespace changes that are already made).

Thus, my change simply make UV work for ALL cases, which seems like a better state of the world, especially as we heavily depend on this feature for our debugging. As a side note, you claim specifically on the website that Janino is good for this use:

http://www.janino.net/use.html#code_manipulator

and I can easily envision such manipulations producing unnatural ASTs. In fact my test includes a visitor that implements a manipulation that removes paren expression in order to produce a minimally parenthesized output).

Given that I have already implemented and provided a patch for the third proposal, I am in favor of option 3

Matt


Arno Unkrig - 21/May/08 06:58 PM
Yes, correct unparsing is important for debugging, accepted.

Arno Unkrig - 21/May/08 07:03 PM
Implemented the correct unparsing of "unnatural" syntaxes, with the following changes:

Instead of inserting artificial "(" and ")", insert "((( " and " )))". This makes unnatural syntaxes clearly visible in the output, while the output is still correct Java syntax. (And alternative would have been "(/UNNATURAL/ " and " /UNNATURAL/".) To make your unit tests work, I changed them to replace the "((( "s and " )))"s back on-the-fly.

I didn't use your code, because the functionality can be implemented statelessly. However, thanks a million for the unit tests, that made the (re-)implementation of the feature so much easier!

Please review the code.

Will go into the next version of JANINO.


Matt Fowles - 22/May/08 12:57 PM
I found a few bugs in your implementation. The attached patch adds more test cases and fixes the bugs.

Matt Fowles - 22/May/08 01:21 PM
Also, can we change the member variables of UnparseVisitor to be protected?

My projects needs access to them so that it can intersperse extra debug information in the output as comments.


Arno Unkrig - 22/May/08 06:13 PM
Hi Matt,

I applied your bug fixes, and fixed another bunch of bugs. UnparseTests now parses, unparses and parses all JANINO sources, and compares the two ASTs.

Please review my changes to "Java", "Scanner", "UnparseVisitor" and "UnparseTests".


Matt Fowles - 23/May/08 09:30 AM
Looks good at the moment, let me run it through our nightly tests and see if anything turns up.

Matt Fowles - 23/May/08 09:38 AM
Scratch that, this fails horribly after the change to unparsing negative numbers as hex.

Matt Fowles - 23/May/08 09:44 AM
public void testDirectAst() throws Exception {
        Java.Literal lit = new Java.Literal(null, Integer.valueOf(-1));
        
        
        StringWriter sw = new StringWriter();
        UnparseVisitor uv = new UnparseVisitor(sw);
        lit.accept(uv);
        String s = sw.toString();
        Assert.assertEquals("-1", s);
}

Will fail with

Expected: "-1"
Actual: "ffffffff"

but the latter isn't even valid java.


Matt Fowles - 23/May/08 10:38 AM
The attached patch fixes this by allowing negative umbers to unparse normally and making testParseUnparseParseJanino() collect literals follow by negations into a single literal.

Arno Unkrig - 25/May/08 02:44 PM
Hi Matt,

nope, your fix is wrong (bummer). The negative sign is never part of a Java literal! From file:///C:/jls3/lexical.html#3.10.1:

A decimal numeral is either the single ASCII character 0, representing the integer zero, or consists of an ASCII digit from 1 to 9, optionally followed by one or more ASCII digits from 0 to 9, representing a positive integer:
...
A hexadecimal numeral consists of the leading ASCII characters 0x or 0X followed by one or more ASCII hexadecimal digits and can represent a positive, zero, or negative integer.

Also, my code is buggy: 2^32-1 should not be unparsed to "ffffffff", but to "0xffffffff".

Generally, Java literals allow different ways to represent the same values, e.g.:

Integer: 12 0xc 0xC 0x00000000000000000c
Long: 47l 47L
Double: 1.3 1.3E0 1.3D 1.300000
Character: 'A' '\100'
String: "ABC" "\100BC"

The (precisely: my personal) definition of unparsing is: Creation of source code that is syntactically identical with the original code. This implies:

  • There are no negative numeric literals: "-3" is the negation of a literal
  • (Non-doc) Comments and whitespace are modified while unparsing, because they are syntactically irrelevant
  • Literals may also be modified while unparsing, because that also is syntactically irrelevant

In other words: It is wrong to write a JUNIT test that checks that "-1" parses/unparses to "-1". (Just like it is wrong to check that "a + b" parses/unparses to "a+b" or "a + b".)

Please review my changes to JLS2Tests, UnparseTests and Scanner. (The commit comment to Scanner/345 is wrong, please excuse.)


Matt Fowles - 25/May/08 10:43 PM
I don't particularly care whether negative literals unparse as hex or decimal, as long as they unparse to legal java code.

Matt Fowles - 16/Jun/08 09:29 AM
Arno, I do not have the ability to close this issue. I think my login to Jira does not have the power. You can consider this closed.