|
|
|
[
Permlink
| « Hide
]
Matt Fowles - 07/May/08 09:30 AM
I am part way through an implementation of this. But I wanted to get the bug written down in case I get pulled of the problem.
The attached parse implements minimal parenthesizing of expressions based on precedence (i.e. parentheses will only be included in those places where they are needed for correctness and no more). It also add a bunch of test cases for this (many of which fail without the patch).
additionally, it would be nice if UnparseVisitors members were either public or protected. I have implemented this in my version, but extracting that diff from my other changes is a fair amount of effort.
Hi Matt,
here are my thoughts on the issue: The AST generated by the Java parser is a representation of the syntactical elements of a JAVA program. E.g. "1+2*3" is parsed in to "sum of 1 and (product of 2 and 3)", while "(1+2)*3" is parsed into "product of (parenthesized expression of (sum of 1 and 2)) and 3". Due to the precedence rules of Java, certain combinations of the syntactical elements are impossible, e.g. "product of (sum of 1 and 2) and 3". These can only be constructed programmatically, as you did. (Let's call these "unnatural syntax trees".) All natural syntax trees (as those created by the Java parsers) are correctly converted to text by the UnparseVisitor (UV). This behavior could be documented as a limitation of the UV, and we're done. Alternatively, a new type hierarchy derived from "Rvalue" could be introduced with members like "MultiplicativeOperation" and "AdditiveOperation", where the AO takes two MOs as its operands, but not vice versa. This way, unnatural syntax trees can no longer be constructed, and the output of the UV would be correct in ALL cases. Third, as you proposed, the UV could be enhanced to generate correct output even for unnatural syntax trees. While producing correct results is nice, I don't really like the option, because this is not an exact inversion of the parsing process: If you re-parse the output of the UV, you get a different AST. (In previous versions of JANINO, I had no "ParenthesizedExpression" AST element, and later introduced it for exactly that reason.) All in all, I am quite ambivalent about the issue, which is why I'd like to ask "why the heck would you unparse a programmatically created AST"? Arno~
We (the developers at my company) unparse programmatically created ASTs all the time as a debugging aid. The ability to do this is entirely vital for my project. I did note that all natural trees were handled correctly, and in the case of natural trees my changes to UV will have no change on the output (other than the whitespace changes that are already made). Thus, my change simply make UV work for ALL cases, which seems like a better state of the world, especially as we heavily depend on this feature for our debugging. As a side note, you claim specifically on the website that Janino is good for this use: http://www.janino.net/use.html#code_manipulator and I can easily envision such manipulations producing unnatural ASTs. In fact my test includes a visitor that implements a manipulation that removes paren expression in order to produce a minimally parenthesized output). Given that I have already implemented and provided a patch for the third proposal, I am in favor of option 3 Matt Yes, correct unparsing is important for debugging, accepted.
Implemented the correct unparsing of "unnatural" syntaxes, with the following changes:
Instead of inserting artificial "(" and ")", insert "((( " and " )))". This makes unnatural syntaxes clearly visible in the output, while the output is still correct Java syntax. (And alternative would have been "(/UNNATURAL/ " and " /UNNATURAL/".) To make your unit tests work, I changed them to replace the "((( "s and " )))"s back on-the-fly. I didn't use your code, because the functionality can be implemented statelessly. However, thanks a million for the unit tests, that made the (re-)implementation of the feature so much easier! Please review the code. Will go into the next version of JANINO. I found a few bugs in your implementation. The attached patch adds more test cases and fixes the bugs.
Also, can we change the member variables of UnparseVisitor to be protected?
My projects needs access to them so that it can intersperse extra debug information in the output as comments. Hi Matt,
I applied your bug fixes, and fixed another bunch of bugs. UnparseTests now parses, unparses and parses all JANINO sources, and compares the two ASTs. Please review my changes to "Java", "Scanner", "UnparseVisitor" and "UnparseTests". Looks good at the moment, let me run it through our nightly tests and see if anything turns up.
Scratch that, this fails horribly after the change to unparsing negative numbers as hex.
public void testDirectAst() throws Exception { Java.Literal lit = new Java.Literal(null, Integer.valueOf(-1)); StringWriter sw = new StringWriter(); UnparseVisitor uv = new UnparseVisitor(sw); lit.accept(uv); String s = sw.toString(); Assert.assertEquals("-1", s); } Will fail with Expected: "-1" but the latter isn't even valid java. The attached patch fixes this by allowing negative umbers to unparse normally and making testParseUnparseParseJanino() collect literals follow by negations into a single literal.
Hi Matt,
nope, your fix is wrong (bummer). The negative sign is never part of a Java literal! From file:///C:/jls3/lexical.html#3.10.1:
Also, my code is buggy: 2^32-1 should not be unparsed to "ffffffff", but to "0xffffffff". Generally, Java literals allow different ways to represent the same values, e.g.: Integer: 12 0xc 0xC 0x00000000000000000c The (precisely: my personal) definition of unparsing is: Creation of source code that is syntactically identical with the original code. This implies:
In other words: It is wrong to write a JUNIT test that checks that "-1" parses/unparses to "-1". (Just like it is wrong to check that "a + b" parses/unparses to "a+b" or "a + b".) Please review my changes to JLS2Tests, UnparseTests and Scanner. (The commit comment to Scanner/345 is wrong, please excuse.) I don't particularly care whether negative literals unparse as hex or decimal, as long as they unparse to legal java code.
Arno, I do not have the ability to close this issue. I think my login to Jira does not have the power. You can consider this closed.
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||