[modelica] Normalize invalid node ranges#2195
Conversation
Some files were parsed into ASTs with nodes having negative ranges.
For example:
package TestPackage
package EmptyPackage
end EmptyPackage;
end TestPackage;
has subtree:
SimpleLongClassSpecifier "EmptyPackage"
SimpleName "EmptyPackage"
Composition
ElementList <-- start = 3:3, end = 2:22
SimpleName "EmptyPackage"
Generated by 🚫 Danger |
| endLine = parser.token.endLine; | ||
| endColumn = parser.token.endColumn; | ||
|
|
||
| if (endLine < beginLine) { |
There was a problem hiding this comment.
You could avoid doing any of that (in jjtOpen/Close) if instead of using the fields of AbstractNode, you delegated the method calls to the tokens, eg:
There was a problem hiding this comment.
Thanks, this definitely looks significantly less hackish. Unfortunately, some nodes end up not containing any tokens (like in the example above), leading to invalid ranges. Are such grammars invalid or maybe it is a minor issue in common code?
There was a problem hiding this comment.
The range [3:3, 2:22] for ElemementList suggests that [FirstToken, LastToken] takes the form [t, t-1]
There was a problem hiding this comment.
I'd first create a unit test, that verifies the locations of the nodes in the given code example. Then you can be sure, that a) the bug is fixed, b) it is not reintroduced later on.
The grammar creates the node ElementList, even though, there are no Element children. You can avoid creating the ElementList node by using a conditional node in the grammar:
-void ElementList(Visibility v): {}
+void ElementList(Visibility v) #ElementList(>1): {}(see line 353 in Modelica.jjt)
Whether this makes sense here, I can't say. You seem to set the visibility on the node, although it is always "UNSPEC". The same would need to be applied for the node Composition and possibly other nodes.
There was a problem hiding this comment.
I thought on making this node conditional. On one hand, it cleans up the AST and in this particular case it may be the best solution. On the other hand, it is interesting, whether either some generic solution exists or such token-free subtrees are explicitly considered an issue.
Related to previous, on adding a test: it do exists, but in SCM repo. I agree, it is worth moving it here in slightly reduced form. It is still interesting, is this condition expected to hold for every language module?
There was a problem hiding this comment.
You seem to set the visibility on the node, although it is always "UNSPEC".
Visibility is public, private, etc. modifiers set in source file:
package TestPackage
public
package EmptyPackage
end EmptyPackage;
end TestPackage;This creates two ElementLists, one of them is PUBLIC. But, frankly speaking, I have never used such modifiers and not yet handle them in type resolution.
There was a problem hiding this comment.
Token free subtrees are not a bug, they're useful to make the parsed AST more regular. This is used for example in the Scala AST, and I'd like to introduce that in the Java AST in 7.0. It's probably just an oversight of JJTree not to handle this case, but we shouldn't limit PMD to it.
I've pushed a solution, that uses an implicit zero-length token. I expect this is how we'll handle it in PMD 7 for the Java tree, though at that point we'll share the solution between language modules, and not do this ad-hoc in jjtClose (eg fixing JJTree).
There was a problem hiding this comment.
FYI - we are using Javacc 5.0, which is a old version. It might be different with a newer javacc version.
|
Thank you! |
Some files were parsed into ASTs with nodes having negative ranges.
For example:
has subtree: