Un-inline the enums in the CST #698
| - VariableDeclarationType (Rule): # 157..166 "\t\tuint256" | ||
| - TypeName (Rule): # 157..166 "\t\tuint256" | ||
| - ElementaryType (Rule): # 157..166 "\t\tuint256" | ||
| - UintKeyword (Token): "uint256" # 159..166 |
One idea we discussed before when we talked about this is flattening the tree structure of the tests a little bit to make it more readable/usable. So, for the example below:
- Instead of putting each node on a separate line, we can group rules that have only one child (that is also a rule) into the same line, as they will have the exact same range and preview comment.
- Removing the `(Rule)` and `(Token)` suffixes, since we already make the distinction through the YAML value/LHS on the same line.
Just putting the idea out here. Definitely not blocking for this PR, as we should probably do it in a subsequent PR to make it easier to review.
| - VariableDeclarationType (Rule): # 157..166 "\t\tuint256" | |
| - TypeName (Rule): # 157..166 "\t\tuint256" | |
| - ElementaryType (Rule): # 157..166 "\t\tuint256" | |
| - UintKeyword (Token): "uint256" # 159..166 | |
| - VariableDeclarationType > TypeName > ElementaryType: # 157..166 "\t\tuint256" | |
| - UintKeyword: "uint256" # 159..166 |
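To make the grouping rule concrete, here is a minimal sketch of how a snapshot printer could merge chains of single-child rules into one line. The `Node` type and `render` function are hypothetical stand-ins for illustration only, not the actual slang types, and the range/preview comments are left out for brevity.

```rust
// Hypothetical, simplified CST node used only for this sketch.
enum Node {
    Rule { kind: &'static str, children: Vec<Node> },
    Token { kind: &'static str, text: &'static str },
}

// Renders one line per node, merging chains of rules whose only child is
// itself a rule into a single "A > B > C:" line.
fn render(node: &Node, depth: usize, out: &mut Vec<String>) {
    let indent = "  ".repeat(depth);
    match node {
        Node::Token { kind, text } => {
            out.push(format!("{}- {}: {:?}", indent, kind, text));
        }
        Node::Rule { kind, children } => {
            let mut names = vec![*kind];
            let mut current = children;
            // Follow the chain while the sole child is another rule.
            while let [Node::Rule { kind, children }] = current.as_slice() {
                names.push(*kind);
                current = children;
            }
            out.push(format!("{}- {}:", indent, names.join(" > ")));
            for child in current {
                render(child, depth + 1, out);
            }
        }
    }
}
```

A rule with several children, or with a single token child, ends the chain, so only nodes that would share the same range end up on the same line.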
I'll add a TODO for myself.
| context: lex_ctx, | ||
| // Enums have a single reference per variant, so they should be inlined. | ||
| is_inline: matches!(elem.as_ref(), Item::Enum { .. }), | ||
| is_inline: false, |
I would like to get @AntonyBlakey's eyes on this one. Some of these nodes look very useful, for example ContractMember and ConstructorAttribute, as it makes it easier to find/match on these elements.
However, some of them do look extraneous, and are only there for the purpose of authoring the grammar/versioning. For example:
- `ElementaryType` and `YulLiteral`, which will (almost) always have a unique parent that can be matched against. Maybe we can refactor the grammar a bit to make this more accurate?
- `TypedTupleMember` and `UntypedTupleMember`, which only exist to make parsing/backtracking correct, but provide no additional meaning. Not sure how to avoid it.
I’m trying to avoid adding optional inlining to the DSL unless we absolutely need it. Given all of our new design decisions and the AST structure, it would make it less obvious how to go from grammar to CST/AST, and add another layer of complexity that users have to deal with.
Without inlining, people can easily depend on the fact that any NonTerminal node is convertible to its matching AST type, and vice versa. If we start to have some inlined enums, it won’t be obvious which CST nodes can be converted to AST types directly. And vice versa, it will be confusing if some AST types start returning a root node that has a different NonTerminalKind.
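A rough sketch of the property I mean, using made-up types (`Cst`, `NonTerminalKind`, and the two AST types below are hypothetical and much simpler than the real ones). When every enum gets its own node, each AST conversion only has to check for its own kind:

```rust
use std::convert::TryFrom;

// Hypothetical, simplified types for illustration; not the real slang API.
#[derive(Debug, PartialEq)]
enum NonTerminalKind {
    TypeName,
    ElementaryType,
}

#[derive(Debug, PartialEq)]
enum Cst {
    NonTerminal { kind: NonTerminalKind, children: Vec<Cst> },
    Terminal(String),
}

#[derive(Debug, PartialEq)]
struct ElementaryType(String);

#[derive(Debug, PartialEq)]
enum TypeName {
    Elementary(ElementaryType),
}

impl TryFrom<&Cst> for ElementaryType {
    type Error = String;

    // Succeeds only for a node whose kind matches this AST type.
    fn try_from(node: &Cst) -> Result<Self, Self::Error> {
        match node {
            Cst::NonTerminal { kind: NonTerminalKind::ElementaryType, children } => {
                match children.as_slice() {
                    [Cst::Terminal(text)] => Ok(ElementaryType(text.clone())),
                    _ => Err("expected a single terminal child".to_string()),
                }
            }
            other => Err(format!("not an ElementaryType node: {:?}", other)),
        }
    }
}

impl TryFrom<&Cst> for TypeName {
    type Error = String;

    // The enum delegates to its variant, which is always a node of its own.
    fn try_from(node: &Cst) -> Result<Self, Self::Error> {
        match node {
            Cst::NonTerminal { kind: NonTerminalKind::TypeName, children } => {
                match children.as_slice() {
                    [child] => Ok(TypeName::Elementary(ElementaryType::try_from(child)?)),
                    _ => Err("expected a single child".to_string()),
                }
            }
            other => Err(format!("not a TypeName node: {:?}", other)),
        }
    }
}
```

In this sketch, if `ElementaryType` were inlined, the terminal would sit directly under `TypeName`, and the conversion would need a special case per inlined variant instead of relying on the kind alone.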
So, if we are happy with these changes, I’m fine with merging the PR as-is for now, and I can manually go over the enums to see if any of them can be better structured. For example, inlining something like ElementaryType variants into the types it references, since it is almost always used inside another Enum. It will probably be more nuanced than that though.
What do you think?
Enums that are artefacts of the grammar machinations are not of interest to the CST or the AST. I think we should find a better way to achieve whatever they accomplish. I am concerned about the 'almost' comment, though, because it seems to imply that the constraint is not a logical necessity, and so it must be surfaced as a parent + child.
Could we make the non-interesting enums into something else in the grammar, i.e. effectively add a non-outlined characteristic by using a different name for the concept? It sure sounds like it has a different purpose.
f2f: we think the PR is good enough for now, and let's review the added kinds later to see if anything can be pruned/removed.
988c2b1 to 5329d30
Rebased and force-pushed; this is still just c4c2f4a + re-generated files and adjusted test.
Context: NomicFoundation/slang#698 (comment)
- combined parents with a single child on the same line
- using the `꞉` unicode character instead of colon `:` to separate node name and kind, in order not to break YAML parsing/formatting.
- surround entire nodes with parentheses instead of just the kind, to make it easier to read.
- include whitespace in the snapshots, since they now take less visual space, and it will make it easier to spot changes to trivia during development.
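As an illustration of the separator and parenthesis changes, a label could be built roughly as below. This is only a sketch: `node_label` and the `variant` name are hypothetical, and it assumes the separator is U+A789 MODIFIER LETTER COLON, which looks like `:` but is not the ASCII colon that YAML treats as a key/value indicator.

```rust
// Assumed separator: U+A789 MODIFIER LETTER COLON, visually close to ':'.
const SEPARATOR: char = '\u{A789}';

// Hypothetical helper: wraps the whole node (name and kind) in parentheses,
// so the only ASCII colon left on a snapshot line is the YAML one.
fn node_label(name: &str, kind: &str) -> String {
    format!("({}{} {})", name, SEPARATOR, kind)
}
```

With a plain `:` followed by a space inside the label, a YAML parser or formatter would likely read the node name as a mapping key, which is the breakage the commit message refers to.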
Part of #638
As we discussed, inlining some node types causes us to lose information that is otherwise hard/inconvenient to reconstruct, and this hopefully increases the usability of the CST alone.