
[ty] AST garbage collection#18482

Merged
ibraheemdev merged 13 commits into main from ibraheem/ast-node-gc
Jun 13, 2025

Conversation

@ibraheemdev
Member

@ibraheemdev ibraheemdev commented Jun 5, 2025

Summary

Garbage collect ASTs once we are done checking a given file. Queries with a cross-file dependency on the AST will reparse the file on demand. This reduces ty's peak memory usage by ~20-30%.

The primary change of this PR is adding a node_index field to every AST node, assigned whenever the file is parsed. ParsedModule can use this to create a flat index of AST nodes any time the file is parsed (or reparsed). This allows AstNodeRef to simply index into the current instance of the ParsedModule, instead of storing a pointer directly.

The indices are somewhat hackily (using an atomic integer) assigned by the parsed_module query instead of by the parser directly. Assigning the indices in source-order in the (recursive) parser turns out to be difficult, and collecting the nodes during semantic indexing is impossible as SemanticIndex does not hold onto a specific ParsedModuleRef, which the pointers in the flat AST are tied to. This means that we have to do an extra AST traversal to assign and collect the nodes into a flat index, but the small performance impact (~3% on cold runs) seems worth it for the memory savings.
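The mechanics can be sketched roughly as follows. This is a heavily simplified, hypothetical stand-in (the `Node`, `index_nodes`, and field names here are illustrative, not the real ruff types); the atomic field mirrors how indices can be written through shared references after parsing:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// A minimal, hypothetical stand-in for an AST node: the real nodes gain
// a `node_index` field assigned after parsing rather than at
// construction (hence the atomic, which allows writes through `&Node`).
pub struct Node {
    pub node_index: AtomicU32,
    pub children: Vec<Node>,
}

// The extra post-parse traversal: assign source-order indices and
// collect the nodes into a flat index in a single pass.
pub fn index_nodes<'a>(node: &'a Node, counter: &AtomicU32, flat: &mut Vec<&'a Node>) {
    node.node_index
        .store(counter.fetch_add(1, Ordering::Relaxed), Ordering::Relaxed);
    flat.push(node);
    for child in &node.children {
        index_nodes(child, counter, flat);
    }
}

fn main() {
    let leaf = || Node { node_index: AtomicU32::new(0), children: vec![] };
    let root = Node {
        node_index: AtomicU32::new(0),
        children: vec![leaf(), leaf()],
    };
    let counter = AtomicU32::new(0);
    let mut flat = Vec::new();
    index_nodes(&root, &counter, &mut flat);

    // Each node's index matches its position in the flat list, so a
    // node reference can be a plain index into the current module.
    assert_eq!(flat.len(), 3);
    assert_eq!(flat[2].node_index.load(Ordering::Relaxed), 2);
    println!("ok");
}
```

Because the traversal is deterministic, reparsing the same source rebuilds an identical flat layout, which is what makes indices stable across garbage collection within a revision.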

Part of astral-sh/ty#214.

@github-actions
Contributor

github-actions bot commented Jun 5, 2025

mypy_primer results

No ecosystem changes detected ✅

@MichaReiser
Member

I'll review tomorrow. Can you add a log of how many times we end up reparsing an AST, and do a cold run on some project to see whether any reparses occur, to validate our assumption that this shouldn't happen?

@codspeed-hq

codspeed-hq bot commented Jun 5, 2025

CodSpeed Performance Report

Merging #18482 will degrade performance by 4.43%

Comparing ibraheem/ast-node-gc (347934a) with main (76d9009)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 32 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
linter/default-rules[numpy/globals.py] 204.4 µs 196.2 µs +4.19%
parser[unicode/pypinyin.py] 306.6 µs 320.8 µs -4.43%

@ibraheemdev
Member Author

ibraheemdev commented Jun 5, 2025

The incremental performance regression is expected. I'm not sure whether we want less eager garbage collection for the LSP, or whether the initial hit is fine and we should instead try to keep recently reparsed ASTs around on subsequent runs.

The cold check performance regression is pretty minor (~3% both on codspeed and locally on mypy_primer runs) even with the extra AST traversal.

@carljm
Contributor

carljm commented Jun 5, 2025

Thanks for working on this! I'm going to let @MichaReiser handle review :)

I'm curious about the new NodeIndex bit. Is the entire purpose of that so that we can use it in AstNodeRef, given we can no longer ensure a pointer into the AST will remain valid? Or are there other reasons we need it?

Member

@MichaReiser MichaReiser left a comment


This is great!

I'd prefer to implement the ID assigning in the parser. Not only does this enable other crates to make use of the index, it also allows us to change NodeIndex to a u32, which is Copy.

Do you think it would be possible to collect the nodes inside the parser (as part of the parsing step)? That would remove the need for an extra traversal. We could add an option to only collect them when requested.

Besides these code changes: it would be great to have some end-to-end benchmarks.

  • Can you run ty_benchmark on some project and share the before/after runtime?
  • Can you add some logging for when an AST gets reparsed because it was cleared before, and run ty across some projects? Do we see any reparses within the same revision? If so, why?

For the LSP case: it probably makes sense for us not to purge the AST for any file that is in project.open_files. This should address any LSP concerns, and I think even the perf regression in the benchmark.

@ibraheemdev
Member Author

I'm curious about the new NodeIndex bit. Is the entire purpose of that so that we can use it in AstNodeRef, given we can no longer ensure a pointer into the AST will remain valid? Or are there other reasons we need it?

AstNodeRef could still store a pointer to the AST; the difference is that it would have to store a weak pointer to the module (a strong reference would prevent garbage collection) and validate that reference before access. The problem comes when the module is garbage collected: we have to update the pointer to point into the newly reparsed module. This has two problems:

  • We need a way to mutate the AstNodeRef, which is tricky and probably requires it being an Arc<Mutex<_>> or similar.
  • We have to traverse the entire AST to update every node that is accessed.

Creating a flat index of the AST upfront allows us to avoid both of those, and AstNodeRef becomes a simple index into the flat AST, which remains stable even across garbage collection cycles within the same revision.
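The index-based resolution described here can be sketched like this (hypothetical, simplified stand-in types; `ParsedModule`, `AstNodeRef`, and `node` are illustrative shapes, not the real API):

```rust
// The ref stores only an index; every access goes through the *current*
// module instance, so a reparse (which rebuilds the same flat layout
// within a revision) never invalidates it.
pub struct ParsedModule {
    pub flat: Vec<String>, // stand-in for the flat list of AST nodes
}

pub struct AstNodeRef {
    pub index: usize,
}

impl AstNodeRef {
    // Resolve against whichever module instance is current.
    pub fn node<'a>(&self, module: &'a ParsedModule) -> &'a str {
        &module.flat[self.index]
    }
}

fn main() {
    let before_gc = ParsedModule {
        flat: vec!["x = 1".to_string(), "print(x)".to_string()],
    };
    let r = AstNodeRef { index: 1 };
    assert_eq!(r.node(&before_gc), "print(x)");

    // The AST is collected, then reparsed on demand: the flat layout is
    // identical within the same revision, so the same ref still resolves.
    drop(before_gc);
    let reparsed = ParsedModule {
        flat: vec!["x = 1".to_string(), "print(x)".to_string()],
    };
    assert_eq!(r.node(&reparsed), "print(x)");
    println!("ok");
}
```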

@ibraheemdev
Member Author

The ty-benchmark results match the 3% regression on Codspeed.

black (cold)
Benchmark 1: /home/ibraheem/dev/astral/ruff/target/profiling/ty-new
  Time (mean ± σ):      56.9 ms ±   1.4 ms    [User: 290.4 ms, System: 74.9 ms]
  Range (min … max):    53.1 ms …  59.9 ms    51 runs
 
Benchmark 2: /home/ibraheem/dev/astral/ruff/target/profiling/ty-old
  Time (mean ± σ):      55.8 ms ±   1.4 ms    [User: 281.9 ms, System: 72.1 ms]
  Range (min … max):    52.1 ms …  59.0 ms    52 runs
 
Summary
  /home/ibraheem/dev/astral/ruff/target/profiling/ty-old ran
    1.02 ± 0.04 times faster than /home/ibraheem/dev/astral/ruff/target/profiling/ty-new

jinja (cold)
Benchmark 1: /home/ibraheem/dev/astral/ruff/target/profiling/ty-new
  Time (mean ± σ):      43.6 ms ±   1.6 ms    [User: 266.0 ms, System: 65.8 ms]
  Range (min … max):    40.7 ms …  47.5 ms    65 runs
 
 
Benchmark 2: /home/ibraheem/dev/astral/ruff/target/profiling/ty-old
  Time (mean ± σ):      41.7 ms ±   1.5 ms    [User: 259.7 ms, System: 62.9 ms]
  Range (min … max):    39.0 ms …  45.7 ms    74 runs
 
Summary
  /home/ibraheem/dev/astral/ruff/target/profiling/ty-old ran
    1.05 ± 0.05 times faster than /home/ibraheem/dev/astral/ruff/target/profiling/ty-new

isort (cold)
Benchmark 1: /home/ibraheem/dev/astral/ruff/target/profiling/ty-new
  Time (mean ± σ):      51.1 ms ±   2.5 ms    [User: 201.0 ms, System: 45.3 ms]
  Range (min … max):    46.1 ms …  57.0 ms    56 runs
 
 
Benchmark 2: /home/ibraheem/dev/astral/ruff/target/profiling/ty-old
  Time (mean ± σ):      49.5 ms ±   2.4 ms    [User: 197.4 ms, System: 45.2 ms]
  Range (min … max):    42.9 ms …  56.7 ms    62 runs

Summary
  /home/ibraheem/dev/astral/ruff/target/profiling/ty-old ran
    1.03 ± 0.07 times faster than /home/ibraheem/dev/astral/ruff/target/profiling/ty-new

@carljm carljm removed their request for review June 9, 2025 18:25
@ibraheemdev ibraheemdev force-pushed the ibraheem/ast-node-gc branch from 8cf6695 to f6b7b7c Compare June 9, 2025 18:26
@ibraheemdev
Member Author

Given that the regression is quite minor I'm inclined to get this merged.

I spent some time trying to assign indices in the parser but it turned out to be non-trivial. The parser is recursive-descent and we want to assign indices in source order, so we run into problems where the parent node depends on the result of the recursion, and we don't know whether or not to claim an outer index before parsing the inner nodes. I think we would end up having to do mini-traversals to adjust the children's indices; it's much easier to do this in a separate pass. If the parser also collected the nodes then it could do a simple pass over those and assign the indices then, but that could also be addressed in a future PR.

I also ran on a number of projects and was unable to trigger a reparse within a revision, I added some logs in case that ever happens.

@github-actions
Contributor

github-actions bot commented Jun 9, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@AlexWaygood AlexWaygood removed their request for review June 9, 2025 18:37
@MichaReiser
Member

MichaReiser commented Jun 9, 2025

The parser is recursive-descent and we want to assign indices in source order, so we run into problems where the parent node depends on the result of the recursion, and we don't know whether or not to claim an outer index before parsing the inner nodes

Just trying to understand this better. Can you share a specific instance of this?

The main thing that I find slightly annoying is that NodeId isn't Copy because it's an AtomicU32. This is obviously not the end of the world but it would be nice if it could be avoided.

@ibraheemdev
Member Author

Just trying to understand this better. Can you share a specific instance of this?

In parse_expression_list, for example, we first parse the inner expression (which will increment the index), and then we conditionally wrap it in an ExprTuple node, which should have a lower index. Generally we would claim the parent index before recursing, but the conditional part makes this tricky.
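A miniature of the problem, as a hypothetical sketch (the `Expr`, `claim`, and `parse_list` names are illustrative, not the real parser): if indices are claimed as nodes are constructed, the conditional wrapper ends up with a higher index than the children it encloses, violating source order.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

#[derive(Debug)]
enum Expr {
    Name { index: u32 },
    Tuple { index: u32, elts: Vec<Expr> },
}

fn claim(counter: &AtomicU32) -> u32 {
    counter.fetch_add(1, Ordering::Relaxed)
}

// Mimics parse_expression_list: inner expressions are parsed (and claim
// indices) before we know whether a wrapping tuple will exist at all.
fn parse_list(counter: &AtomicU32, wrap: bool) -> Expr {
    let a = Expr::Name { index: claim(counter) };
    let b = Expr::Name { index: claim(counter) };
    if wrap {
        // The wrapper is created last, so naive assignment gives the
        // parent a *higher* index than its children: not source order.
        Expr::Tuple { index: claim(counter), elts: vec![a, b] }
    } else {
        drop(b);
        a
    }
}

fn main() {
    let counter = AtomicU32::new(0);
    if let Expr::Tuple { index, elts } = parse_list(&counter, true) {
        assert_eq!(index, 2); // parent claimed after children 0 and 1
        assert!(matches!(&elts[0], Expr::Name { index: 0 }));
    }
    println!("ok");
}
```

Fixing this inside the parser would mean either pre-claiming an index that might go unused, or renumbering the subtree after wrapping, which is the mini-traversal mentioned above.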

@carljm
Contributor

carljm commented Jun 9, 2025

the conditional part makes this tricky

Curious - would there be negative consequences to skipping an index in some scenarios?

@ibraheemdev
Member Author

ibraheemdev commented Jun 9, 2025

Curious - would there be negative consequences to skipping an index in some scenarios?

Right now all indices are assigned strictly monotonically, so we can use a simple Box<[Node]> for the flat index, where each node's index matches its position in the list. If assigning indices in the parser means we have to switch to using a HashMap, or insert dummy nodes into the list, I'm not sure it's worth it.
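To make the trade-off concrete, a small sketch (illustrative data only) contrasting the dense slice lookup with the map that index gaps would force:

```rust
use std::collections::HashMap;

fn main() {
    // Dense case (what the PR relies on): strictly monotonic indices
    // mean node i sits at position i in a plain boxed slice.
    let dense: Box<[&str]> = vec!["module", "assign", "name"].into_boxed_slice();
    assert_eq!(dense[1], "assign");

    // If the parser could skip indices (e.g. index 1 claimed but never
    // used), positions no longer line up and lookup would need a map,
    // or dummy entries padding the gaps.
    let sparse: HashMap<u32, &str> =
        HashMap::from([(0, "module"), (2, "assign"), (3, "name")]);
    assert_eq!(sparse.get(&2), Some(&"assign"));
    println!("ok");
}
```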

@MichaReiser
Member

Could we update the index when wrapping the node in a tuple (it may require a traversal of the inner node)?

@ibraheemdev
Member Author

We could, but I guess I don't really see the benefit of doing that. Collecting the nodes as part of parsing would have a similar problem, I think (we'd have to insert parent nodes into the middle of the list before their children). The separate pass is much simpler and doesn't seem to impact performance noticeably.

@dhruvmanila
Member

Could we increase the index non-monotonically during parsing and then remove the gaps after the parsing is done? Or, would that be equivalent to visiting the parsed tree at the end to assign the index?

@sharkdp sharkdp removed their request for review June 10, 2025 06:21
@MichaReiser
Member

I also ran on a number of projects and was unable to trigger a reparse within a revision, I added some logs in case that ever happens.

Just double checking, did you do this check with a release build?

Member

@MichaReiser MichaReiser left a comment


I'm fine with the extra traversal for now. Even though a 3% regression is fairly substantial for something that doesn't add new typing features, the memory reduction is probably a good justification for it.

But no longer doing it in the parser invalidates the main reason why we added the NodeIndex to every AST node. I did a quick search and it seems that the only place where we create AstNodeRefs is in the SemanticIndexBuilder. This makes me wonder if an approach closer to AstNodeIds would be a better fit where:

  • SemanticIndex stores a nodes: IndexVec<NodeIndex, AnyRootNodeRef> and an inverse map from AnyNodeRef (using pointer equality) to NodeIndex.
  • SemanticIndexBuilder has a register_node method that, given a node, returns an AstNodeRef to it.

This has a couple of advantages:

  • No changes to the AST, no size increase of the AST
  • We only register exactly the nodes we need and not all nodes.
  • It doesn't require an extra traversal

The obvious downside is that it requires an inverse hash map, which might be costly. What are your thoughts on building the index outside the AST?
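A rough sketch of that alternative (hypothetical, heavily simplified types; the `Builder`, `register_node`, and field names are stand-ins for the proposed SemanticIndexBuilder shape): only registered nodes get indices, keyed by pointer identity.

```rust
use std::collections::HashMap;

// Stand-in node type: &str instead of AnyRootNodeRef.
pub struct Builder<'ast> {
    pub nodes: Vec<&'ast str>,               // IndexVec<NodeIndex, node> analogue
    pub by_ptr: HashMap<*const str, usize>,  // inverse map: pointer -> index
}

impl<'ast> Builder<'ast> {
    // Register a node on demand, returning its stable index; registering
    // the same node (by pointer identity) twice yields the same index.
    pub fn register_node(&mut self, node: &'ast str) -> usize {
        let ptr = node as *const str;
        if let Some(&index) = self.by_ptr.get(&ptr) {
            return index;
        }
        self.nodes.push(node);
        let index = self.nodes.len() - 1;
        self.by_ptr.insert(ptr, index);
        index
    }
}

fn main() {
    let mut builder = Builder { nodes: Vec::new(), by_ptr: HashMap::new() };
    let def = "def f(): ...";
    let assign = "x = 1";
    let a = builder.register_node(def);
    let b = builder.register_node(assign);
    assert_eq!(builder.register_node(def), a); // same node -> same index
    assert_eq!(builder.nodes[b], "x = 1");
    println!("ok");
}
```

The hash-map lookup on every registration is the cost being weighed here against the flat-AST approach's extra traversal.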

I wonder if it would make sense to split this PR into a few smaller PRs to, at least, get the most annoying changes landed sooner and also have a chance to better focus on some discussions:

  • Add NodeIndex: Decide on which nodes have a NodeIndex, whether it should be an atomic or not
  • Add a SourceOrderMutVisitor (if we decide to not use atomics)
  • The actual AST GC implementation

But I leave this decision up to you.

On the atomic discussion. I'm a bit undecided:

  • I think we should remove the atomic and instead introduce a SourceOrderMutVisitor (or SourceOrderTransformer) if we don't plan on integrating the indexing/numbering into the semantic index step
  • Otherwise, keeping atomic seems fine, but we should follow up with trying to integrate the pass into semantic indexing



# ------------------------------------------------------------------------------
# AnyRootNodeRef
Member


What this type is isn't clear to me from its name. At first I thought it was about types that can be a "root" of what the parser returns (ModModule or ModExpression), but that's not what it is. Instead, it's a very specific selection of nodes?

Reading through the code, I get the impression that it is related to the nodes to which we assign IDs, which brings up a related question: why are we only assigning IDs to some nodes but not all? IMO, it's confusing that only some nodes have a NodeId. I'd prefer to just assign an ID to every node instead.

Member Author


I added some documentation on why we need a new type. The name isn't great, but I'm not sure of a better one. I believe all nodes have a NodeId, unless I missed one.

/// Create a new [`IndexedModule`] from the given AST.
#[allow(clippy::unnecessary_cast)]
pub fn new(parsed: Parsed<ModModule>) -> Arc<Self> {
let mut visitor = Visitor {
Member


Future: It could be interesting to collect some more numbers about the parsed module here. E.g. how many expressions are there? This could help to correctly initialize the HashMaps in the SemanticIndexBuilder that currently suffer from many resizes.

@MichaReiser MichaReiser changed the title AST garbage collection [ty] AST garbage collection Jun 12, 2025
@ibraheemdev
Member Author

I explored a number of options here:

  • Integrating the AST traversal into the semantic index. Unfortunately this does not work because the flat AST index is directly tied (through pointers) to a given ParsedModuleRef, while SemanticIndex does not hold on to a specific ParsedModuleRef instance. The SemanticIndex is agnostic to the current instance of the AST, which is why it now uses indices to reference nodes.
  • Similarly, the node_index field being on the AST nodes directly is important. Storing a lookup map in SemanticIndex would be tricky for the reasons above: we don't want SemanticIndex to rely on any AST pointers, only indices.
  • Removing the atomics from NodeIndex and adding a mutable visitor. This doesn't work because we hold on to immutable references to the nodes while traversing the tree to form the flat index.

I think the current solution, where the flat index is created and managed directly by the ParsedModule and AST node indices are stored directly on the AST instance, makes the most sense and is easiest to reason about. The only future change I can see is if the node indices and collection are initialized even earlier by the parser, but I'm not sure that complexity is worth it (as I previously mentioned).

@ibraheemdev ibraheemdev force-pushed the ibraheem/ast-node-gc branch from 01ec2b7 to 98049c9 Compare June 12, 2025 20:17
@ibraheemdev ibraheemdev force-pushed the ibraheem/ast-node-gc branch from 98049c9 to 10e5edf Compare June 12, 2025 20:23
@ibraheemdev ibraheemdev added the ty Multi-file analysis & type inference label Jun 12, 2025
Member

@dhruvmanila dhruvmanila left a comment


I'll leave this to Micha but a few things I noticed.

---
source: crates/ruff_python_parser/src/string.rs
expression: suite
snapshot_kind: text
Member


I always forget, is this because the snapshots haven't been touched in a while and so cargo-insta has removed this field on a recent version or is it that you've an older version of cargo-insta?

Member Author


I'm on the latest version of cargo-insta so I assume the former.

Comment on lines +8 to +10
node_index: NodeIndex(
"_",
),
Member

@dhruvmanila dhruvmanila Jun 13, 2025


Should we generate a custom Debug implementation for the nodes to avoid including this field in the snapshot? Or, is it required that we include this field in the Debug implementation?

Member Author


I don't really see the benefit of removing this field, we may end up setting it in the parser later. I updated the debug implementation of NodeIndex to avoid the extra whitespace.

Member

@MichaReiser MichaReiser left a comment


Nice! This is great and thank you for exploring all my not very fruitful alternatives.

I think it could be nice to have a non atomic NodeIndex that we could use in AstNodeRef, get_by_index and in NodeKey.

@MichaReiser MichaReiser added the internal An internal refactor or improvement label Jun 13, 2025
@ibraheemdev ibraheemdev force-pushed the ibraheem/ast-node-gc branch from 08ad6c5 to 270768f Compare June 13, 2025 12:11
@ibraheemdev ibraheemdev force-pushed the ibraheem/ast-node-gc branch from 270768f to 347934a Compare June 13, 2025 12:19
@ibraheemdev ibraheemdev merged commit c9dff5c into main Jun 13, 2025
34 of 35 checks passed
@ibraheemdev ibraheemdev deleted the ibraheem/ast-node-gc branch June 13, 2025 12:40
@carljm
Contributor

carljm commented Jun 13, 2025

Just noticed this had been merged after I made the alpha-10 release. I think a 20-30% peak memory reduction is worth noting for users! So in the future, I think we could leave the internal label off such a PR.

This is awesome!


Labels

internal An internal refactor or improvement ty Multi-file analysis & type inference

4 participants