Skip to content

refactor(formatter): core revamp#11511

Merged
graphite-app[bot] merged 1 commit intomainfrom
formatter-major-overhaual
Jun 11, 2025
Merged

refactor(formatter): core revamp#11511
graphite-app[bot] merged 1 commit intomainfrom
formatter-major-overhaual

Conversation

@Dunqing
Copy link
Copy Markdown
Member

@Dunqing Dunqing commented Jun 6, 2025

close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.)

This PR introduces AstNodes and AstNode to replace the ParentStack, and changes that implement Format and FormatWrite traits from ASTs to AstNode so that now we can access parent by self.parent during formatting. All other changes stem from the main one.

pub enum AstNodes<'a> {
    Dummy(),
    Program(&'a AstNode<'a, Program<'a>>),
    IdentifierName(&'a AstNode<'a, IdentifierName<'a>>),
    ...
}

pub struct AstNode<'a, T> {
    inner: &'a T,
    pub parent: &'a AstNodes<'a>,
    allocator: &'a Allocator,
}

What is the purpose of AstNode?

The AstNode aims to get the parent node easily. Before we use ParentStack to get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition.

For example, in the CallExpression, we have special logic to print Arguments if the first argument is an ArrowFunctionExpression; if so, we will print ArrowFunctionExpression directly rather than printing a Argument, which causes us to lose a parent node Argument.

The above issue is exactly what we aim to address with AstNode.

How does AstNode solve the above issue?

As you can see from the AstNode struct, which is shown above, every AstNode has inner and parent fields. The inner field stores the original AST, which would be used to print the child of itself, and the parent, which is AstNodes and points to the parent of the current AST, so that we can know what the parent is.

Let's take a look at how we print the children of the Program.

impl<'a> AstNode<'a, Program<'a>> {
   ...
   #[inline]
   pub fn hashbang(&self) -> Option<&AstNode<'a, Hashbang<'a>>> {
       self.allocator
           .alloc(self.inner.hashbang.as_ref().map(|inner| AstNode {
               inner,
               allocator: self.allocator,
               parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
           }))
           .as_ref()
   }

   #[inline]
   pub fn directives(&self) -> &AstNode<'a, Vec<'a, Directive<'a>>> {
       self.allocator.alloc(AstNode {
           inner: &self.inner.directives,
           allocator: self.allocator,
           parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
       })
   }

   #[inline]
   pub fn body(&self) -> &AstNode<'a, Vec<'a, Statement<'a>>> {
       self.allocator.alloc(AstNode {
           inner: &self.inner.body,
           allocator: self.allocator,
           parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
       })
   }
}

// Printing
impl<'a> FormatWrite<'a> for AstNode<'a, Program<'a>> {
   fn write(&self, f: &mut Formatter<'_, 'a>) -> FormatResult<()> {
       write!(
           f,
           [
               self.hashbang(),
               self.directives(),
               self.body(),
               hard_line_break()
           ]
       )
   }
}

As you can see, now we implement FormatWrite trait for the AstNode<'a, Prgoram<'a>> rather than Program<'a>, and we need to call the AstNode's methods to get the child AstNode to print, which converts inner's child nodes to AstNode and binds the parent. That is to say that we handle AstNode all along.

NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER].

Worth noting things

As little as possible to get child nodes by those methods

Since getting a child node from those methods has some overhead, as we need to wrap an AstNode for the original AST. It would be better to only use those methods to get the child nodes when they need to be printed.

Efficiently check certain fields

Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the body of the Program if it's an EmptyStatement, you can check by self.as_ref().body.first() rather than self.body().first().

impl<'a, T> AsRef<T> for AstNode<'a, T> {
    fn as_ref(&self) -> &T {
        self.inner
    }
}

as_ref() is to get the original AST.

Deref trait for the AstNode

impl<'a, T> Deref for AstNode<'a, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        self.inner
    }
}

Since we implemented the Deref trait for the AstNode, now we can directly call the original AST methods by self.method() rather than self.as_ref() method() which can reduce a lot of as_ref() calls.

But there is a footgun, after Deref was implemented, both self.body and self.body() are work, but only self.body() can be printed because only AstNode implements Format trait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed.

as_ast_nodes method for enum AST, such as Statement, Expression, and so on.

In general, we don't need to call as_ast_nodes because in most cases we only need to call fmt for the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST.

For example, if the Expression is a ClassExpression, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to use as_ast_nodes to get the inner node which is converted to the AstNode, the code lis ike:

impl<'a> AstNode<'a, Expression<'a>> {
    #[inline]
    pub fn as_ast_nodes(&self) -> &AstNodes<'a> {
        let parent = self.parent;
        let node = match self.inner {
            Expression::ClassExpression(s) => AstNodes::Class(self.allocator.alloc(AstNode {
                inner: s.as_ref(),
                parent,
                allocator: self.allocator,
            }))
            ...
        };
        self.allocator.alloc(node)
    }
}

if let AstNodes::Class(class) = expr.as_ast_nodes() {
  write!(f, class.decorators())
}

Performance

The initial expectation was a performance improvement due to removing a large Vec (previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed.

The performance regression might stem from the overhead of wrapping original ASTs into AstNode instances, which involves frequent Allocator::alloc calls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits of AstNode may not be fully realized yet.

Downsides

The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive.

Binary Size

  • Before: 4.8 MB
  • After: 6.9 MB

Compilation Time (Release Profile)

  • Before: 6.67s
  • After: 7.41s

Copy link
Copy Markdown
Member Author

Dunqing commented Jun 6, 2025


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions bot added A-ast Area - AST A-ast-tools Area - AST tools A-formatter Area - Formatter C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior labels Jun 6, 2025
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Jun 6, 2025

CodSpeed Instrumentation Performance Report

Merging #11511 will degrade performances by 7.85%

Comparing formatter-major-overhaual (cf0b7d5) with main (9136685)

Summary

❌ 4 regressions
✅ 34 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
formatter[RadixUIAdoptionSection.jsx] 302.3 µs 328 µs -7.85%
formatter[binder.ts] 19.4 ms 20.6 ms -5.81%
formatter[cal.com.tsx] 150.8 ms 161 ms -6.38%
formatter[react.development.js] 9.5 ms 10.1 ms -5.4%

@Dunqing
Copy link
Copy Markdown
Member Author

Dunqing commented Jun 10, 2025

@Boshen The PR is ready to review, there are a few parts in the ast_nodes generator file I want to polish later, so keep it draft. After all work is done, I will rebase the branch and solve conflicts.

@Boshen Boshen mentioned this pull request Jun 10, 2025
@Dunqing Dunqing force-pushed the formatter-major-overhaual branch 2 times, most recently from f5179d9 to 3ffb1ef Compare June 11, 2025 01:35
@Dunqing Dunqing marked this pull request as ready for review June 11, 2025 01:37
@Dunqing Dunqing requested a review from overlookmotel as a code owner June 11, 2025 01:37
@Dunqing Dunqing force-pushed the formatter-major-overhaual branch from 3ffb1ef to 5555509 Compare June 11, 2025 01:40
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Jun 11, 2025
Copy link
Copy Markdown
Member

Boshen commented Jun 11, 2025

Merge activity

close: #11339 (Another approach is the memory address-based approach, but it is not ideal and has some problems.)

This PR introduces `AstNodes` and `AstNode` to replace the `ParentStack`, and changes that implement `Format` and `FormatWrite` traits from `AST`s to `AstNode` so that now we can access parent by `self.parent` during formatting. All other changes stem from the main one.

```rs
pub enum AstNodes<'a> {
    Dummy(),
    Program(&'a AstNode<'a, Program<'a>>),
    IdentifierName(&'a AstNode<'a, IdentifierName<'a>>),
    ...
}

pub struct AstNode<'a, T> {
    inner: &'a T,
    pub parent: &'a AstNodes<'a>,
    allocator: &'a Allocator,
}
```

### What is the purpose of `AstNode`?

The `AstNode` aims to get the parent node easily. Before we use `ParentStack` to get the parent node correctly, which has a significant problem, that is, we have to push ancestor nodes sequentially for a node, whereas the implementation of some ASTs can't meet this condition.

For example, in the `CallExpression`, we have special logic to print `Arguments` if the first argument is an `ArrowFunctionExpression`; if so, we will print `ArrowFunctionExpression` directly rather than printing a `Argument`, which causes us to lose a parent node `Argument`.

The above issue is exactly what we aim to address with `AstNode`.

###  How does `AstNode` solve the above issue?

As you can see from the `AstNode` struct, which is shown above, every AstNode has `inner` and `parent` fields. The `inner` field stores the original AST, which would be used to print the child of itself, and the `parent`, which is `AstNodes` and points to the parent of the current AST, so that we can know what the parent is.

Let's take a look at how we print the children of the `Program`.
 ```rs

impl<'a> AstNode<'a, Program<'a>> {
    ...
    #[inline]
    pub fn hashbang(&self) -> Option<&AstNode<'a, Hashbang<'a>>> {
        self.allocator
            .alloc(self.inner.hashbang.as_ref().map(|inner| AstNode {
                inner,
                allocator: self.allocator,
                parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
            }))
            .as_ref()
    }

    #[inline]
    pub fn directives(&self) -> &AstNode<'a, Vec<'a, Directive<'a>>> {
        self.allocator.alloc(AstNode {
            inner: &self.inner.directives,
            allocator: self.allocator,
            parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
        })
    }

    #[inline]
    pub fn body(&self) -> &AstNode<'a, Vec<'a, Statement<'a>>> {
        self.allocator.alloc(AstNode {
            inner: &self.inner.body,
            allocator: self.allocator,
            parent: self.allocator.alloc(AstNodes::Program(transmute_self(self))),
        })
    }
}

// Printing
impl<'a> FormatWrite<'a> for AstNode<'a, Program<'a>> {
    fn write(&self, f: &mut Formatter<'_, 'a>) -> FormatResult<()> {
        write!(
            f,
            [
                self.hashbang(),
                self.directives(),
                self.body(),
                hard_line_break()
            ]
        )
    }
}
```

As you can see, now we implement `FormatWrite` trait for the `AstNode<'a, Prgoram<'a>>` rather than `Program<'a>`, and we need to call the `AstNode`'s methods to get the child `AstNode` to print, which converts `inner`'s child nodes to `AstNode` and binds the parent. That is to say that we handle `AstNode` all along.

NOTE: All of the field-to-AstNode methods are generated by the ast tools and generated in [PLACEHOLDER].

### Worth noting things

#### As little as possible to get child nodes by those methods

Since getting a child node from those methods has some overhead, as we need to wrap an `AstNode` for the original AST. It would be better to only use those methods to get the child nodes when they need to be printed.

#### Efficiently check certain fields
Same as the above, to avoid unnecessary overhead, it would be better to get the original AST to check if you want to check something. For example, if you want to check the first statement of the `body` of the `Program` if it's an `EmptyStatement`, you can check by `self.as_ref().body.first()` rather than `self.body().first()`.

```rs
impl<'a, T> AsRef<T> for AstNode<'a, T> {
    fn as_ref(&self) -> &T {
        self.inner
    }
}
```

`as_ref()` is to get the original AST.

#### `Deref` trait for the `AstNode`

```rs
impl<'a, T> Deref for AstNode<'a, T> {
    type Target = T;
    fn deref(&self) -> &Self::Target {
        self.inner
    }
}
```

Since we implemented the `Deref` trait for the `AstNode`, now we can directly call the original AST methods by `self.method()` rather than `self.as_ref() method()` which can reduce a lot of `as_ref()` calls.

But there is a footgun, after `Deref` was implemented,  both `self.body` and `self.body()` are work, but only `self.body()` can be printed because only `AstNode` implements `Format` trait, so this may make the new contributor who unfamiliar with the underly design to struggle. I will keep an eye on this to see whether it should be removed.

#### `as_ast_nodes` method for enum AST, such as `Statement`, `Expression`, and so on.

In general, we don't need to call `as_ast_nodes` because in most cases we only need to call `fmt` for the inner AST of the enum AST. But in some cases, we need to deal with the specific thing for a certain inner node of the enum AST.

For example, if the `Expression` is a `ClassExpression`, you need to print the decorators of that class first, then handle the class somewhere. In that time, you need to use `as_ast_nodes` to get the inner node which is converted to the `AstNode`, the code lis ike:
```rs

impl<'a> AstNode<'a, Expression<'a>> {
    #[inline]
    pub fn as_ast_nodes(&self) -> &AstNodes<'a> {
        let parent = self.parent;
        let node = match self.inner {
            Expression::ClassExpression(s) => AstNodes::Class(self.allocator.alloc(AstNode {
                inner: s.as_ref(),
                parent,
                allocator: self.allocator,
            }))
            ...
        };
        self.allocator.alloc(node)
    }
}

if let AstNodes::Class(class) = expr.as_ast_nodes() {
  write!(f, class.decorators())
}
```

### Performance

The initial expectation was a performance improvement due to removing a large `Vec` (previously used for parent kinds) and eliminating indexed parent lookups. However, this has not been observed.

The performance regression might stem from the overhead of wrapping original ASTs into `AstNode` instances, which involves frequent `Allocator::alloc` calls. Additionally, the current implementation is incomplete, with limited parent-checking logic, so the benefits of `AstNode` may not be fully realized yet.

### Downsides

The extensive use of generic structs and the generation of a large file (over 8000 lines) have led to an increase in binary size and compilation time, though the increases are not excessive.

#### Binary Size
*   Before: 4.8 MB
*   After: 6.9 MB

#### Compilation Time (Release Profile)
*   Before: 6.67s
*   After: 7.41s
@graphite-app graphite-app bot force-pushed the formatter-major-overhaual branch from 5555509 to cf0b7d5 Compare June 11, 2025 02:34
@graphite-app graphite-app bot merged commit cf0b7d5 into main Jun 11, 2025
24 checks passed
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Jun 11, 2025
@graphite-app graphite-app bot deleted the formatter-major-overhaual branch June 11, 2025 02:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-ast Area - AST A-ast-tools Area - AST tools A-formatter Area - Formatter C-cleanup Category - technical debt or refactoring. Solution not expected to change behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants