refactor(formatter): core revamp#11511
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Instrumentation Performance ReportMerging #11511 will degrade performances by 7.85%Comparing Summary
Benchmarks breakdown
|
|
@Boshen The PR is ready to review, there are a few parts in the |
f5179d9 to
3ffb1ef
Compare
3ffb1ef to
5555509
Compare
Merge activity
|
close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.) This PR introduces `AstNodes` and `AstNode` to replace the `ParentStack`, and changes that implement `Format` and `FormatWrite` traits from `AST`s to `AstNode` so that now we can access parent by `self.parent` during formatting. All other changes stem from the main one. ```rs pub enum AstNodes<'a> { Dummy(), Program(&'a AstNode<'a, Program<'a>>), IdentifierName(&'a AstNode<'a, IdentifierName<'a>>), ... } pub struct AstNode<'a, T> { inner: &'a T, pub parent: &'a AstNodes<'a>, allocator: &'a Allocator, } ``` ### What is the purpose of `AstNode`? The `AstNode` aims to get the parent node easily. Before we use `ParentStack` to get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition. For example, in the `CallExpression`, we have special logic to print `Arguments` if the first argument is an `ArrowFunctionExpression`; if so, we will print `ArrowFunctionExpression` directly rather than printing a `Argument`, which causes us to lose a parent node `Argument`. The above issue is exactly what we aim to address with `AstNode`. ### How does `AstNode` solve the above issue? As you can see from the `AstNode` struct, which is shown above, every AstNode has `inner` and `parent` fields. The `inner` field stores the original AST, which would be used to print the child of itself, and the `parent`, which is `AstNodes` and points to the parent of the current AST, so that we can know what the parent is. Let's take a look at how we print the children of the `Program`. ```rs impl<'a> AstNode<'a, Program<'a>> { ... #[inline] pub fn hashbang(&self) -> Option<&AstNode<'a, Hashbang<'a>>> { self.allocator .alloc(self.inner.hashbang.as_ref().map(|inner| AstNode { inner, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), })) .as_ref() } #[inline] pub fn directives(&self) -> &AstNode<'a, Vec<'a, Directive<'a>>> { self.allocator.alloc(AstNode { inner: &self.inner.directives, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), }) } #[inline] pub fn body(&self) -> &AstNode<'a, Vec<'a, Statement<'a>>> { self.allocator.alloc(AstNode { inner: &self.inner.body, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), }) } } // Printing impl<'a> FormatWrite<'a> for AstNode<'a, Program<'a>> { fn write(&self, f: &mut Formatter<'_, 'a>) -> FormatResult<()> { write!( f, [ self.hashbang(), self.directives(), self.body(), hard_line_break() ] ) } } ``` As you can see, now we implement `FormatWrite` trait for the `AstNode<'a, Prgoram<'a>>` rather than `Program<'a>`, and we need to call the `AstNode`'s methods to get the child `AstNode` to print, which converts `inner`'s child nodes to `AstNode` and binds the parent. That is to say that we handle `AstNode` all along. NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER]. ### Worth noting things #### As little as possible to get child nodes by those methods Since getting a child node from those methods has some overhead, as we need to wrap an `AstNode` for the original AST. It would be better to only use those methods to get the child nodes when they need to be printed. #### Efficiently check certain fields Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the `body` of the `Program` if it's an `EmptyStatement`, you can check by `self.as_ref().body.first()` rather than `self.body().first()`. ```rs impl<'a, T> AsRef<T> for AstNode<'a, T> { fn as_ref(&self) -> &T { self.inner } } ``` `as_ref()` is to get the original AST. #### `Deref` trait for the `AstNode` ```rs impl<'a, T> Deref for AstNode<'a, T> { type Target = T; fn deref(&self) -> &Self::Target { self.inner } } ``` Since we implemented the `Deref` trait for the `AstNode`, now we can directly call the original AST methods by `self.method()` rather than `self.as_ref() method()` which can reduce a lot of `as_ref()` calls. But there is a footgun, after `Deref` was implemented, both `self.body` and `self.body()` are work, but only `self.body()` can be printed because only `AstNode` implements `Format` trait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed. #### `as_ast_nodes` method for enum AST, such as `Statement`, `Expression`, and so on. In general, we don't need to call `as_ast_nodes` because in most cases we only need to call `fmt` for the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST. For example, if the `Expression` is a `ClassExpression`, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to use `as_ast_nodes` to get the inner node which is converted to the `AstNode`, the code lis ike: ```rs impl<'a> AstNode<'a, Expression<'a>> { #[inline] pub fn as_ast_nodes(&self) -> &AstNodes<'a> { let parent = self.parent; let node = match self.inner { Expression::ClassExpression(s) => AstNodes::Class(self.allocator.alloc(AstNode { inner: s.as_ref(), parent, allocator: self.allocator, })) ... }; self.allocator.alloc(node) } } if let AstNodes::Class(class) = expr.as_ast_nodes() { write!(f, class.decorators()) } ``` ### Performance The initial expectation was a performance improvement due to removing a large `Vec` (previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed. The performance regression might stem from the overhead of wrapping original ASTs into `AstNode` instances, which involves frequent `Allocator::alloc` calls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits of `AstNode` may not be fully realized yet. ### Downsides The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive. #### Binary Size * Before: 4.8 MB * After: 6.9 MB #### Compilation Time (Release Profile) * Before: 6.67s * After: 7.41s
5555509 to
cf0b7d5
Compare

close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.)
This PR introduces
AstNodesandAstNodeto replace theParentStack, and changes that implementFormatandFormatWritetraits fromASTs toAstNodeso that now we can access parent byself.parentduring formatting. All other changes stem from the main one.What is the purpose of
AstNode?The
AstNodeaims to get the parent node easily. Before we useParentStackto get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition.For example, in the
CallExpression, we have special logic to printArgumentsif the first argument is anArrowFunctionExpression; if so, we will printArrowFunctionExpressiondirectly rather than printing aArgument, which causes us to lose a parent nodeArgument.The above issue is exactly what we aim to address with
AstNode.How does
AstNodesolve the above issue?As you can see from the
AstNodestruct, which is shown above, every AstNode hasinnerandparentfields. Theinnerfield stores the original AST, which would be used to print the child of itself, and theparent, which isAstNodesand points to the parent of the current AST, so that we can know what the parent is.Let's take a look at how we print the children of the
Program.As you can see, now we implement
FormatWritetrait for theAstNode<'a, Prgoram<'a>>rather thanProgram<'a>, and we need to call theAstNode's methods to get the childAstNodeto print, which convertsinner's child nodes toAstNodeand binds the parent. That is to say that we handleAstNodeall along.NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER].
Worth noting things
As little as possible to get child nodes by those methods
Since getting a child node from those methods has some overhead, as we need to wrap an
AstNodefor the original AST. It would be better to only use those methods to get the child nodes when they need to be printed.Efficiently check certain fields
Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the
bodyof theProgramif it's anEmptyStatement, you can check byself.as_ref().body.first()rather thanself.body().first().as_ref()is to get the original AST.Dereftrait for theAstNodeSince we implemented the
Dereftrait for theAstNode, now we can directly call the original AST methods byself.method()rather thanself.as_ref() method()which can reduce a lot ofas_ref()calls.But there is a footgun, after
Derefwas implemented, bothself.bodyandself.body()are work, but onlyself.body()can be printed because onlyAstNodeimplementsFormattrait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed.as_ast_nodesmethod for enum AST, such asStatement,Expression, and so on.In general, we don't need to call
as_ast_nodesbecause in most cases we only need to callfmtfor the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST.For example, if the
Expressionis aClassExpression, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to useas_ast_nodesto get the inner node which is converted to theAstNode, the code lis ike:Performance
The initial expectation was a performance improvement due to removing a large
Vec(previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed.The performance regression might stem from the overhead of wrapping original ASTs into
AstNodeinstances, which involves frequentAllocator::alloccalls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits ofAstNodemay not be fully realized yet.Downsides
The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive.
Binary Size
Compilation Time (Release Profile)