Skip to content

[java] Make new java expression grammar stricter#1855

Merged
adangel merged 23 commits into
pmd:java-grammarfrom
oowekyala:grammar-opt-expressions
Jun 16, 2019
Merged

[java] Make new java expression grammar stricter#1855
adangel merged 23 commits into
pmd:java-grammarfrom
oowekyala:grammar-opt-expressions

Conversation

@oowekyala

Copy link
Copy Markdown
Member

This makes the grammar of expressions much stricter than what it is currently on this branch. Currently eg foo()() "parses", but the jjtClose of ASTMethodCall fails, so we get a very undescriptive error message. Other absurdities like a::b[0]::c are parsed and end up in the final tree... Now we get regular syntax errors.

It also optimises the lookaheads in primary expression, although from what I measured:

  • the lookahead that we already removed in 96ea140 was by far the most costly method of the parser anyway.
  • time spent in IO in a typical run makes these optimisations pretty insignificant anyway. That's good though, that means that the parser is about as fast as could be :) This parser is also just as fast as the 6.0.x parser.

I tested the parser on the spring framework and the JDK 12 sources, and added some test cases when I found a few bugs (actually #1843 comes from JDK 12). It can parse those with no problem now.

I didn't add any negative tests, maybe we should do that


Details:

The change doesn't touch many things apart from the parser. I made it so though, that we rely less on jjtClose hooks and more on the parser directly. Some more nodes are LeftRecursiveNodes, but that just means that they get injected with some additional children to their left (the interface name has become not really descriptive).

We can also most of the time avoid using jjtree.extendLeft() and only use regular jjtree nodes with a definite number of children (but the the node needs to implement LeftRecursiveNode to fix the tokens).

The change to the parser boils down to making PrimarySuffix much dumber. It can actually only be a field access, method call, array access or method reference now. When the first method reference is parsed, we stop the iteration, to make eg. expr::foo.bar impossible.

PrimaryPrefix is split into two steps for readability. The production takes care of any start token that is not an ambiguous name. If the expression starts with an ambiguous name though (and only then), we follow to the next step, as still more options are possible than in primary suffix. Then we start iterating PrimarySuffix.

@oowekyala oowekyala added an:enhancement An improvement on existing features / rules in:ast About the AST structure or API, the parsing step labels Jun 1, 2019
@oowekyala oowekyala added this to the 7.0.0 milestone Jun 1, 2019
@oowekyala oowekyala changed the title Grammar opt expressions [java] Make new java expression grammar stricter Jun 2, 2019

@adangel adangel left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! 👍

I've tried to update https://github.com/pmd/pmd/wiki/PMD-7.0.0-Java

I tested the parser on the spring framework and the JDK 12 sources

Sounds like we should add openjd12 sources some time to pmd-regression-tester...

@adangel adangel merged commit 4159675 into pmd:java-grammar Jun 16, 2019
@oowekyala oowekyala deleted the grammar-opt-expressions branch June 17, 2019 00:48
@adangel adangel mentioned this pull request Jan 23, 2023
55 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

an:enhancement An improvement on existing features / rules in:ast About the AST structure or API, the parsing step

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants