Current problems of the plsql grammar:
- Not really complying to oracle's plsql definition. The grammar is able to parse most plsql source files, but it might parse it wrongly (e.g. choosing wrong productions)
- There is often a mixture of using specific AST nodes or image-strings to transport information. This image creation (via StringBuilders) makes the grammar hard to maintain and hard to read. E.g.
|
ASTVariableOrConstantDeclarator VariableOrConstantDeclarator() : |
|
{ |
|
PLSQLNode simpleNode = null ; |
|
StringBuilder sb = new StringBuilder(); |
|
} |
|
{ |
|
( |
|
simpleNode = VariableOrConstantDeclaratorId() { sb.append(simpleNode.getImage());} |
|
[LOOKAHEAD(2) <CONSTANT> {sb.append(" " + token.image);} ] simpleNode = Datatype() { sb.append(" " + simpleNode.getImage());} |
|
[[<NOT> {sb.append(" " + token.image);} ] <NULL> {sb.append(" " + token.image);} ] |
|
[ ( ":" "=" {sb.append(" :=");}| <_DEFAULT> {sb.append(" " + token.image);}) |
|
simpleNode = VariableOrConstantInitializer() { sb.append(" " + simpleNode.getImage());} |
|
] |
|
) |
|
{ jjtThis.setImage(sb.toString()) ; return jjtThis ; } |
|
} |
- Since changing this will definitively change the resulting AST, most rules will need to be adjusted.
Doesn't this production try to do too much? I mean dumping the whole operator to a string makes it harder to analyse later on, whereas reifying the structure of the operator into an AST node would avoid losing this information.
E.g.
RelationalExpression ::=
AdditiveExpression RelationalOperator AdditiveExpression EscapeClause?
RelationalOperator ::= // a #void interface, to keep the AST flat
ComparisonOperator
| NegatedOperator
| NegatableOperator
NegatedOperator ::=
"NOT" NegatableOperator
NegatableOperator ::= // a #void interface
BetweenOp // These nodes would each correspond to their respective keyword
| InOp
| LikeOp
| SetInclusionOp
| MultisetOp
ComparisonOperator ::=
"<>" | "<=" | ...
SetInclusionOp ::=
(MemberOp | SubmultisetOp) "OF"?
MultisetOp ::=
"MULTISET" ("EXCEPT | "INTERSECT" | "UNION") [ "DISTINCT" | "ALL"]
EscapeClause ::=
"ESCAPE" (CharLiteral | StringLiteral)
Having one node for each operator like
may seem verbose at first, but JJTree allows to write that inline as e.g.
<IN> #InOp
| <LIKE> #LikeOp
instead of splitting this in very many productions. (I mention this because I think I've never come across this syntax anywhere in our grammars so maybe you're unaware of it.) This makes the AST structure very informative, but has the downside to increase the number of node classes significantly. If this is a problem, we may want to use a single common node type, e.g. ASTPlsqlKeyword, to represent any one-keyword AST node, with the image selecting the actual keyword used.
This is obviously out of the scope of this PR, and maybe to be left for 7.0.0, but something to think about anyway.
I can open an issue to track it separately if you think this change is worth it.
Current problems of the plsql grammar:
pmd/pmd-plsql/etc/grammar/PldocAST.jjt
Lines 636 to 651 in 178c9a8
Comment from #1493 (comment) copied here:
Doesn't this production try to do too much? I mean dumping the whole operator to a string makes it harder to analyse later on, whereas reifying the structure of the operator into an AST node would avoid losing this information.
E.g.
Having one node for each operator like
may seem verbose at first, but JJTree allows to write that inline as e.g.
instead of splitting this in very many productions. (I mention this because I think I've never come across this syntax anywhere in our grammars so maybe you're unaware of it.) This makes the AST structure very informative, but has the downside to increase the number of node classes significantly. If this is a problem, we may want to use a single common node type, e.g.
ASTPlsqlKeyword, to represent any one-keyword AST node, with the image selecting the actual keyword used.This is obviously out of the scope of this PR, and maybe to be left for 7.0.0, but something to think about anyway.
I can open an issue to track it separately if you think this change is worth it.