refactor(formatter): core revamp by Dunqing · Pull Request #11511 · oxc-project/oxc

Dunqing · 2025-06-06T09:35:58Z

close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.)

This PR introduces AstNodes and AstNode to replace the ParentStack, and changes that implement Format and FormatWrite traits from ASTs to AstNode so that now we can access parent by self.parent during formatting. All other changes stem from the main one.

pub enum AstNodes<'a> {
    Dummy(),
    Program(&'a AstNode<'a, Program<'a>>),
    IdentifierName(&'a AstNode<'a, IdentifierName<'a>>),
    ...
}

pub struct AstNode<'a, T> {
    inner: &'a T,
    pub parent: &'a AstNodes<'a>,
    allocator: &'a Allocator,
}

What is the purpose of `AstNode`?

The AstNode aims to get the parent node easily. Before we use ParentStack to get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition.

For example, in the CallExpression, we have special logic to print Arguments if the first argument is an ArrowFunctionExpression; if so, we will print ArrowFunctionExpression directly rather than printing a Argument, which causes us to lose a parent node Argument.

The above issue is exactly what we aim to address with AstNode.

How does `AstNode` solve the above issue?

As you can see from the AstNode struct, which is shown above, every AstNode has inner and parent fields. The inner field stores the original AST, which would be used to print the child of itself, and the parent, which is AstNodes and points to the parent of the current AST, so that we can know what the parent is.

Let's take a look at how we print the children of the Program.

impl<'a> AstNode<'a, Program<'a>> {
   ...
   #[inline]
   pub fn hashbang(&self) -> Option<&AstNode<'a, Hashbang<'a>>> {
       self.allocator
           .alloc(self.inner.hashbang.as_ref().map(|inner| AstNode {
               inner,
               allocator: self.allocator,
               parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
           }))
           .as_ref()
   }

   #[inline]
   pub fn directives(&self) -> &AstNode<'a, Vec<'a, Directive<'a>>> {
       self.allocator.alloc(AstNode {
           inner: &self.inner.directives,
           allocator: self.allocator,
           parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
       })
   }

   #[inline]
   pub fn body(&self) -> &AstNode<'a, Vec<'a, Statement<'a>>> {
       self.allocator.alloc(AstNode {
           inner: &self.inner.body,
           allocator: self.allocator,
           parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
       })
   }
}

// Printing
impl<'a> FormatWrite<'a> for AstNode<'a, Program<'a>> {
   fn write(&self, f: &mut Formatter<'_, 'a>) -> FormatResult<()> {
       write!(
           f,
           [
               self.hashbang(),
               self.directives(),
               self.body(),
               hard_line_break()
           ]
       )
   }
}

As you can see, now we implement FormatWrite trait for the AstNode<'a, Prgoram<'a>> rather than Program<'a>, and we need to call the AstNode's methods to get the child AstNode to print, which converts inner's child nodes to AstNode and binds the parent. That is to say that we handle AstNode all along.

NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER].

Worth noting things

As little as possible to get child nodes by those methods

Since getting a child node from those methods has some overhead, as we need to wrap an AstNode for the original AST. It would be better to only use those methods to get the child nodes when they need to be printed.

Efficiently check certain fields

Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the body of the Program if it's an EmptyStatement, you can check by self.as_ref().body.first() rather than self.body().first().

impl<'a, T> AsRef<T> for AstNode<'a, T> {
    fn as_ref(&self) -> &T {
        self.inner
    }
}

as_ref() is to get the original AST.

`Deref` trait for the `AstNode`

impl<'a, T> Deref for AstNode<'a, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        self.inner
    }
}

Since we implemented the Deref trait for the AstNode, now we can directly call the original AST methods by self.method() rather than self.as_ref() method() which can reduce a lot of as_ref() calls.

But there is a footgun, after Deref was implemented, both self.body and self.body() are work, but only self.body() can be printed because only AstNode implements Format trait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed.

`as_ast_nodes` method for enum AST, such as `Statement`, `Expression`, and so on.

In general, we don't need to call as_ast_nodes because in most cases we only need to call fmt for the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST.

For example, if the Expression is a ClassExpression, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to use as_ast_nodes to get the inner node which is converted to the AstNode, the code lis ike:

impl<'a> AstNode<'a, Expression<'a>> {
    #[inline]
    pub fn as_ast_nodes(&self) -> &AstNodes<'a> {
        let parent = self.parent;
        let node = match self.inner {
            Expression::ClassExpression(s) => AstNodes::Class(self.allocator.alloc(AstNode {
                inner: s.as_ref(),
                parent,
                allocator: self.allocator,
            }))
            ...
        };
        self.allocator.alloc(node)
    }
}

if let AstNodes::Class(class) = expr.as_ast_nodes() {
  write!(f, class.decorators())
}

Performance

The initial expectation was a performance improvement due to removing a large Vec (previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed.

The performance regression might stem from the overhead of wrapping original ASTs into AstNode instances, which involves frequent Allocator::alloc calls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits of AstNode may not be fully realized yet.

Downsides

The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive.

Binary Size

Before: 4.8 MB
After: 6.9 MB

Compilation Time (Release Profile)

Before: 6.67s
After: 7.41s

Dunqing · 2025-06-06T09:36:15Z

refactor(formatter): core revamp #11511 👈 (View in Graphite)
main

How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

0-merge - adds this PR to the back of the merge queue
hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

codspeed-hq · 2025-06-06T09:40:51Z

CodSpeed Instrumentation Performance Report

Merging #11511 will degrade performances by 7.85%

_{Comparing formatter-major-overhaual (cf0b7d5) with main (9136685)}

Summary

❌ 4 regressions
✅ 34 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`BASE`	`HEAD`	Change
❌	`formatter[RadixUIAdoptionSection.jsx]`	302.3 µs	328 µs	-7.85%
❌	`formatter[binder.ts]`	19.4 ms	20.6 ms	-5.81%
❌	`formatter[cal.com.tsx]`	150.8 ms	161 ms	-6.38%
❌	`formatter[react.development.js]`	9.5 ms	10.1 ms	-5.4%

Dunqing · 2025-06-10T04:13:05Z

@Boshen The PR is ready to review, there are a few parts in the ast_nodes generator file I want to polish later, so keep it draft. After all work is done, I will rebase the branch and solve conflicts.

Boshen · 2025-06-11T02:33:55Z

Merge activity

Jun 11, 2:33 AM UTC: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
Jun 11, 2:34 AM UTC: Boshen added this pull request to the Graphite merge queue.
Jun 11, 2:40 AM UTC: Merged by the Graphite merge queue.

close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.) This PR introduces `AstNodes` and `AstNode` to replace the `ParentStack`, and changes that implement `Format` and `FormatWrite` traits from `AST`s to `AstNode` so that now we can access parent by `self.parent` during formatting. All other changes stem from the main one. ```rs pub enum AstNodes<'a> { Dummy(), Program(&'a AstNode<'a, Program<'a>>), IdentifierName(&'a AstNode<'a, IdentifierName<'a>>), ... } pub struct AstNode<'a, T> { inner: &'a T, pub parent: &'a AstNodes<'a>, allocator: &'a Allocator, } ``` ### What is the purpose of `AstNode`? The `AstNode` aims to get the parent node easily. Before we use `ParentStack` to get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition. For example, in the `CallExpression`, we have special logic to print `Arguments` if the first argument is an `ArrowFunctionExpression`; if so, we will print `ArrowFunctionExpression` directly rather than printing a `Argument`, which causes us to lose a parent node `Argument`. The above issue is exactly what we aim to address with `AstNode`. ### How does `AstNode` solve the above issue? As you can see from the `AstNode` struct, which is shown above, every AstNode has `inner` and `parent` fields. The `inner` field stores the original AST, which would be used to print the child of itself, and the `parent`, which is `AstNodes` and points to the parent of the current AST, so that we can know what the parent is. Let's take a look at how we print the children of the `Program`. ```rs impl<'a> AstNode<'a, Program<'a>> { ... #[inline] pub fn hashbang(&self) -> Option<&AstNode<'a, Hashbang<'a>>> { self.allocator .alloc(self.inner.hashbang.as_ref().map(|inner| AstNode { inner, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), })) .as_ref() } #[inline] pub fn directives(&self) -> &AstNode<'a, Vec<'a, Directive<'a>>> { self.allocator.alloc(AstNode { inner: &self.inner.directives, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), }) } #[inline] pub fn body(&self) -> &AstNode<'a, Vec<'a, Statement<'a>>> { self.allocator.alloc(AstNode { inner: &self.inner.body, allocator: self.allocator, parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))), }) } } // Printing impl<'a> FormatWrite<'a> for AstNode<'a, Program<'a>> { fn write(&self, f: &mut Formatter<'_, 'a>) -> FormatResult<()> { write!( f, [ self.hashbang(), self.directives(), self.body(), hard_line_break() ] ) } } ``` As you can see, now we implement `FormatWrite` trait for the `AstNode<'a, Prgoram<'a>>` rather than `Program<'a>`, and we need to call the `AstNode`'s methods to get the child `AstNode` to print, which converts `inner`'s child nodes to `AstNode` and binds the parent. That is to say that we handle `AstNode` all along. NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER]. ### Worth noting things #### As little as possible to get child nodes by those methods Since getting a child node from those methods has some overhead, as we need to wrap an `AstNode` for the original AST. It would be better to only use those methods to get the child nodes when they need to be printed. #### Efficiently check certain fields Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the `body` of the `Program` if it's an `EmptyStatement`, you can check by `self.as_ref().body.first()` rather than `self.body().first()`. ```rs impl<'a, T> AsRef<T> for AstNode<'a, T> { fn as_ref(&self) -> &T { self.inner } } ``` `as_ref()` is to get the original AST. #### `Deref` trait for the `AstNode` ```rs impl<'a, T> Deref for AstNode<'a, T> { type Target = T; fn deref(&self) -> &Self::Target { self.inner } } ``` Since we implemented the `Deref` trait for the `AstNode`, now we can directly call the original AST methods by `self.method()` rather than `self.as_ref() method()` which can reduce a lot of `as_ref()` calls. But there is a footgun, after `Deref` was implemented, both `self.body` and `self.body()` are work, but only `self.body()` can be printed because only `AstNode` implements `Format` trait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed. #### `as_ast_nodes` method for enum AST, such as `Statement`, `Expression`, and so on. In general, we don't need to call `as_ast_nodes` because in most cases we only need to call `fmt` for the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST. For example, if the `Expression` is a `ClassExpression`, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to use `as_ast_nodes` to get the inner node which is converted to the `AstNode`, the code lis ike: ```rs impl<'a> AstNode<'a, Expression<'a>> { #[inline] pub fn as_ast_nodes(&self) -> &AstNodes<'a> { let parent = self.parent; let node = match self.inner { Expression::ClassExpression(s) => AstNodes::Class(self.allocator.alloc(AstNode { inner: s.as_ref(), parent, allocator: self.allocator, })) ... }; self.allocator.alloc(node) } } if let AstNodes::Class(class) = expr.as_ast_nodes() { write!(f, class.decorators()) } ``` ### Performance The initial expectation was a performance improvement due to removing a large `Vec` (previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed. The performance regression might stem from the overhead of wrapping original ASTs into `AstNode` instances, which involves frequent `Allocator::alloc` calls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits of `AstNode` may not be fully realized yet. ### Downsides The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive. #### Binary Size * Before: 4.8 MB * After: 6.9 MB #### Compilation Time (Release Profile) * Before: 6.67s * After: 7.41s

github-actions bot added A-ast Area - AST A-ast-tools Area - AST tools A-formatter Area - Formatter C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior labels Jun 6, 2025

Dunqing mentioned this pull request Jun 7, 2025

refactor(ast): get jsx types out of AstKind exceptions #11535

Merged

Dunqing requested a review from Boshen June 10, 2025 04:11

Dunqing assigned Boshen Jun 10, 2025

Boshen mentioned this pull request Jun 10, 2025

This week in Oxc #10329

Closed

Dunqing force-pushed the formatter-major-overhaual branch 2 times, most recently from f5179d9 to 3ffb1ef Compare June 11, 2025 01:35

Dunqing marked this pull request as ready for review June 11, 2025 01:37

Dunqing requested a review from overlookmotel as a code owner June 11, 2025 01:37

Dunqing force-pushed the formatter-major-overhaual branch from 3ffb1ef to 5555509 Compare June 11, 2025 01:40

Boshen added the 0-merge Merge with Graphite Merge Queue label Jun 11, 2025

graphite-app bot force-pushed the formatter-major-overhaual branch from 5555509 to cf0b7d5 Compare June 11, 2025 02:34

graphite-app bot merged commit cf0b7d5 into main Jun 11, 2025
24 checks passed

graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Jun 11, 2025

graphite-app bot deleted the formatter-major-overhaual branch June 11, 2025 02:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor(formatter): core revamp#11511

refactor(formatter): core revamp#11511
graphite-app[bot] merged 1 commit intomainfrom
formatter-major-overhaual

Dunqing commented Jun 6, 2025 •

edited

Loading

Uh oh!

Dunqing commented Jun 6, 2025

Uh oh!

codspeed-hq bot commented Jun 6, 2025 •

edited

Loading

Uh oh!

Dunqing commented Jun 10, 2025 •

edited

Loading

Uh oh!

Boshen commented Jun 11, 2025 •

edited by graphite-app bot

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Dunqing commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What is the purpose of AstNode?

How does AstNode solve the above issue?

Worth noting things

As little as possible to get child nodes by those methods

Efficiently check certain fields

Deref trait for the AstNode

as_ast_nodes method for enum AST, such as Statement, Expression, and so on.

Performance

Downsides

Binary Size

Compilation Time (Release Profile)

Uh oh!

Dunqing commented Jun 6, 2025

How to use the Graphite Merge Queue

Uh oh!

codspeed-hq bot commented Jun 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

CodSpeed Instrumentation Performance Report

Merging #11511 will degrade performances by 7.85%

Summary

Benchmarks breakdown

Uh oh!

Dunqing commented Jun 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Boshen commented Jun 11, 2025 • edited by graphite-app bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merge activity

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Dunqing commented Jun 6, 2025 •

edited

Loading

What is the purpose of `AstNode`?

How does `AstNode` solve the above issue?

`Deref` trait for the `AstNode`

`as_ast_nodes` method for enum AST, such as `Statement`, `Expression`, and so on.

codspeed-hq bot commented Jun 6, 2025 •

edited

Loading

Dunqing commented Jun 10, 2025 •

edited

Loading

Boshen commented Jun 11, 2025 •

edited by graphite-app bot

Loading