perf(generate): drop ast_table after instantiate_chunks#9554
Closed
Boshen wants to merge 1 commit into
Closed
Conversation
…k heap Per-module bumpalo arenas live in `LinkStageOutput.ast_table` and are the largest single component of rolldown's peak heap (~62% on rome, 1,193 modules). `instantiate_chunks` is the last reader; nothing between it and end-of-bundle needs the per-module ASTs, but they currently survive past `minify_chunks` (re-parses chunk output into fresh arenas) and `finalize_assets` (allocates large output buffers). Releasing the arenas right after `instantiate_chunks` cuts peak heap ~21.6% on rome and ~29.3% on rome --minify, per dhat measurements in issue #9516. Refs #9516.
✅ Deploy Preview for rolldown-rs canceled.
|
Merging this PR will not alter performance
Comparing Footnotes
|
This was referenced May 26, 2026
graphite-app Bot
pushed a commit
that referenced
this pull request
May 26, 2026
## Summary Alternative to #9554. Same memory win on the realistic case, enforced by the type system instead of by a comment. Remove `ast_table` from `LinkStageOutput` and change `LinkStage::link()` to return `(LinkStageOutput, IndexEcmaAst)`. The AST table then flows by value through `bundle_up → GenerateStage::new → generate → finalize_modules → render_chunk_to_assets → instantiate_chunks → create_chunk_to_codegen_ret_map`, where the compiler drops it at scope exit (`create_chunk_to_codegen_ret_map` is the last reader). Adding a new post-codegen `ast_table` reader becomes a compile error — there is no field to read. ## Measured impact (3 runs each, verified locally with dhat + getrusage + `/usr/bin/time -ahl`) | Config | main | This PR | Δ vs main | Δ vs #9554 | |---|---:|---:|---:|---:| | **dhat At t-gmax (peak heap)** | | | | | | rome | 140.51 MiB | 108.07 MiB | **−32.4 MiB (−23.1 %)** | −2.08 MiB | | rome --minify | 159.78 MiB | 113.09 MiB | **−46.7 MiB (−29.2 %)** | +0.01 MiB | | **ru_maxrss (getrusage, 3-run avg)** | | | | | | rome | 172.05 MiB | 170.56 MiB | −1.49 MiB | +1.17 MiB | | rome --minify | 183.56 MiB | 175.71 MiB | −7.85 MiB | −5.70 MiB | | **time -ahl peak RSS (3-run avg)** | | | | | | rome | 180.99 MiB | 179.31 MiB | −1.68 MiB | +1.15 MiB | | rome --minify | 193.37 MiB | 184.69 MiB | −8.68 MiB | −6.42 MiB | The ~2 MiB additional saving on rome no-minify vs #9554 comes from dropping one frame earlier (inside `instantiate_chunks` rather than after it returns) — which matters when peak heap is reached during `instantiate_chunks`'s own `try_join_all`. On rome --minify, peak is firmly in `finalize_assets`, far past both drop points, so the two PRs are identical at the dhat level (the OS-RSS gap there is run-to-run noise — system malloc's page caching is sticky). ## Trade-offs vs #9554 - Pros: Compile-time guarantee that no post-codegen reader can exist. - Cons: 5 files instead of 1; signature changes through the codegen pipeline. Pick one to land — they are mutually exclusive. The minimal version (#9554) is simpler; this one is what you'd reach for if the team values type-system enforcement over diff size. Refs #9516.
IWANABETHATGUY
pushed a commit
that referenced
this pull request
May 26, 2026
## Summary Alternative to #9554. Same memory win on the realistic case, enforced by the type system instead of by a comment. Remove `ast_table` from `LinkStageOutput` and change `LinkStage::link()` to return `(LinkStageOutput, IndexEcmaAst)`. The AST table then flows by value through `bundle_up → GenerateStage::new → generate → finalize_modules → render_chunk_to_assets → instantiate_chunks → create_chunk_to_codegen_ret_map`, where the compiler drops it at scope exit (`create_chunk_to_codegen_ret_map` is the last reader). Adding a new post-codegen `ast_table` reader becomes a compile error — there is no field to read. ## Measured impact (3 runs each, verified locally with dhat + getrusage + `/usr/bin/time -ahl`) | Config | main | This PR | Δ vs main | Δ vs #9554 | |---|---:|---:|---:|---:| | **dhat At t-gmax (peak heap)** | | | | | | rome | 140.51 MiB | 108.07 MiB | **−32.4 MiB (−23.1 %)** | −2.08 MiB | | rome --minify | 159.78 MiB | 113.09 MiB | **−46.7 MiB (−29.2 %)** | +0.01 MiB | | **ru_maxrss (getrusage, 3-run avg)** | | | | | | rome | 172.05 MiB | 170.56 MiB | −1.49 MiB | +1.17 MiB | | rome --minify | 183.56 MiB | 175.71 MiB | −7.85 MiB | −5.70 MiB | | **time -ahl peak RSS (3-run avg)** | | | | | | rome | 180.99 MiB | 179.31 MiB | −1.68 MiB | +1.15 MiB | | rome --minify | 193.37 MiB | 184.69 MiB | −8.68 MiB | −6.42 MiB | The ~2 MiB additional saving on rome no-minify vs #9554 comes from dropping one frame earlier (inside `instantiate_chunks` rather than after it returns) — which matters when peak heap is reached during `instantiate_chunks`'s own `try_join_all`. On rome --minify, peak is firmly in `finalize_assets`, far past both drop points, so the two PRs are identical at the dhat level (the OS-RSS gap there is run-to-run noise — system malloc's page caching is sticky). ## Trade-offs vs #9554 - Pros: Compile-time guarantee that no post-codegen reader can exist. - Cons: 5 files instead of 1; signature changes through the codegen pipeline. Pick one to land — they are mutually exclusive. The minimal version (#9554) is simpler; this one is what you'd reach for if the team values type-system enforcement over diff size. Refs #9516.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Per-module bumpalo arenas live in
LinkStageOutput.ast_tableand dominate rolldown's peak heap during bundling (~62 % on the rome benchmark, 1,193 modules).instantiate_chunksis the last reader of those ASTs — nothing between it and end-of-bundle needs the per-module ASTs, but they currently survive pastminify_chunks(which re-parses chunk output into fresh arenas) andfinalize_assets(which allocates large output buffers).This PR drops
ast_tableimmediately afterinstantiate_chunksreturns, freeing the bumpalo arenas before the post-codegen stages allocate.Measured impact (3 runs each, verified locally with dhat + getrusage +
/usr/bin/time -ahl)Larger synthetic and minify workloads from the issue (#9516) scale similarly: ~−24 % on synth 2000 --minify, ~−26 % on synth 5000 --minify. threejs r108 sees no change because its peak is at link, not post-codegen.
ru_maxrssdeltas are smaller because both the system allocator and mimalloc retain freed pages, but the heap-level reduction is what matters for memory-pressure scheduling and steady-state RSS across multiple builds (watch mode, dev server).Alternative considered
A type-system-enforced version (#9555) threads
IndexEcmaAstby value fromLinkStage::link()throughgenerate → finalize_modules → render_chunk_to_assets → instantiate_chunks → create_chunk_to_codegen_ret_map, removingast_tablefromLinkStageOutputentirely. It touches 5 files and saves ~2 MiB more on rome (no minify); identical results on rome --minify. The compile-time guarantee guards against a future reader being reintroduced after codegen — but the practical risk is essentially zero, so this minimal version is preferred.Refs #9516.