Reduce max RSS during bundling: memory map + opportunities #9516

@Boshen

Summary

Investigation into where rolldown's peak heap lives during bundling, with one small fix already shipped and a prioritized list of larger opportunities. This issue is a reference for future memory work — concrete numbers, real call-stack attribution, and where the next wins are.

Methodology

  • Workload: rome bundling fixture (1,193 modules — the standard rolldown bench preset).
  • Tool: dhat profiler wrapping the system allocator, run via a small mem_profile binary in crates/bench/ that drives BundleFactory against a MemoryFileSystem. Built with --profile=release-debug + strip = false so dhat traces resolve to symbol names.
  • Comparisons done by toggling a patch with git apply / git checkout and rebuilding clean between runs.
  • All numbers are dhat max_bytes (peak heap demand) unless labeled ru_maxrss (OS-level high-water).
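
To make "peak heap demand" concrete, here is a minimal sketch of a peak-tracking allocator built on the standard library only. It illustrates the idea behind a `max_bytes`-style counter; it is not dhat's implementation, and the names `PeakAlloc`, `live_bytes`, and `peak_bytes` are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and records live and peak byte counts.
/// Illustrative only: dhat additionally attributes bytes to call stacks.
pub struct PeakAlloc;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for PeakAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Bytes live after this allocation; keep the highest value seen.
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: PeakAlloc = PeakAlloc;

/// Bytes currently allocated through the global allocator.
pub fn live_bytes() -> usize {
    LIVE.load(Ordering::Relaxed)
}

/// Highest value `live_bytes` has reached so far (heap demand, not RSS).
pub fn peak_bytes() -> usize {
    PEAK.load(Ordering::Relaxed)
}
```

A counter like this measures what the program asked for. It says nothing about pages the allocator keeps after a free, which is why heap peak and `ru_maxrss` can diverge.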

Where peak heap lives (rome, 140.5 MiB peak)

By code component, attributed to the deepest rolldown/oxc frame in each dhat call stack:

| Component | Peak | % | Notes |
| --- | --- | --- | --- |
| oxc bumpalo arena (AST) | 87.4 MiB | 62 % | One arena per module, average ~75 KiB. The initial 16 KiB chunk is triggered by oxc_parser::lexer::trivia_builder; 97 % of modules also need a second 16 KiB chunk, triggered by oxc_semantic::scoping::Scoping::reserve. |
| MemoryFileSystem (bench-only) | 17.9 MiB | 13 % | Not present in real disk-FS runs. |
| oxc_semantic | 13.7 MiB | 10 % | Scoping::reserve (~19.6 MiB of capacity reserves), AstNodes::reserve (just 4 huge blocks of ~2 MiB each), add_binding, create_symbol. Lives on the heap, not in the bumpalo arena. |
| rolldown_common | 5.0 MiB | 3.6 % | EcmaView / NormalModule and per-module structs |
| ArcStr | 3.3 MiB | 2.3 % | Reference-counted source strings |
| link/scan/ast_scanner | ~7 MiB | ~5 % | Linking metadata, scan-time scratch |
| Sourcemap join | 1.8 MiB | 1.3 % | SourceJoiner::join |
| SymbolRefDbForModule | 1.8 MiB | 1.3 % | Per-module symbol DB |
| StmtInfo | 1.3 MiB | 0.9 % | 80 B × statements |

By stage where peak is reached:

| Stage | Peak contribution |
| --- | --- |
| parse (oxc_parser + AST arena chunks) | 41.6 % |
| pre_process (oxc_semantic Scoping build) | 16.2 % |
| ast_scanner | 5.7 % |
| generate (codegen, finalize, sourcemap) | 11 % |
| link | 1.5 % |
| Unattributed (low-level alloc / IndexVec / hashbrown) | 22 % |

Time at gmax: ~99 % of runtime (during finalize_assets / sourcemap join), but the largest contributors are allocated during parse/pre_process and never freed before peak.

For comparison, three.js r108 (370 modules, larger per-module average) hits peak at ~60 % of runtime (during link), with ~760 MiB dominated by AST arenas at ~2 MiB per module. Time-at-gmax for threejs is parse/link-bound, not generate-bound.

Block-size distribution (rome)

| Size class | % of peak | Description |
| --- | --- | --- |
| 8-16 KB | 22 % | bumpalo 16 KiB default initial chunks (one or two per module) |
| 1-2 KB | 18 % | bumpalo growth chunks + small Vec/IndexVec content |
| ≥1 MB | 19 % | ~15 huge buffers: source-code strings, sourcemap output, large IndexVec resizes |
| 16-256 KB | 16 % | Mid bumpalo growth chunks, chunk output |
| 256 KB-1 MB | 11 % | Big buffers |
| <512 B | 11 % | Tens of thousands of small structs: SymbolRef, hashmap nodes |

What's already shipped

A single statement, drop(std::mem::take(&mut self.link_output.ast_table)), placed after instantiate_chunks in render_chunk_to_assets. instantiate_chunks is the last reader of the per-module ASTs. Releasing the bumpalo arenas at that point, before minify_chunks re-parses chunk output into fresh arenas, avoids holding the old and new arenas at the same time.
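
The pattern can be sketched in isolation as follows. All types here (`ModuleAst`, `LinkOutput`, `GenerateStage`) are simplified, hypothetical stand-ins rather than rolldown's real definitions, and a `Vec<u8>` plays the role of a bumpalo arena.

```rust
/// Hypothetical stand-in for a parsed module; the `Vec<u8>` plays the role of its arena.
pub struct ModuleAst {
    pub arena: Vec<u8>,
}

#[derive(Default)]
pub struct LinkOutput {
    pub ast_table: Vec<ModuleAst>,
}

pub struct GenerateStage {
    pub link_output: LinkOutput,
}

impl GenerateStage {
    /// Last reader of the per-module ASTs: produces one rendered string per module.
    pub fn instantiate_chunks(&self) -> Vec<String> {
        self.link_output
            .ast_table
            .iter()
            .map(|m| format!("/* {} bytes */", m.arena.len()))
            .collect()
    }

    pub fn render_chunk_to_assets(&mut self) -> Vec<String> {
        let chunks = self.instantiate_chunks();
        // `mem::take` swaps an empty table into place and hands back the old one;
        // `drop` frees every arena right here instead of at the end of the stage.
        drop(std::mem::take(&mut self.link_output.ast_table));
        // Later stages (minify, sourcemap, finalize) would run here with the arenas gone.
        chunks
    }
}
```

`mem::take` works here because the table type implements `Default`, so the field stays valid (and empty) for any code that still holds `&mut self`.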

Measured impact (rome, dhat max_bytes):

| Config | BEFORE | AFTER | Δ |
| --- | --- | --- | --- |
| rome | 140.51 MiB | 110.11 MiB | −30.4 MiB (−21.6 %) |
| rome --minify | 159.78 MiB | 113.04 MiB | −46.7 MiB (−29.3 %) |
| rome --sourcemap --minify | 171.45 MiB | ~125 MiB | ~ −46 MiB |
| synth 2000 --minify | 121.6 MiB | 92.0 MiB | −29.6 MiB (−24 %) |
| synth 5000 --minify | 309.8 MiB | 229.2 MiB | −80.6 MiB (−26 %) |
| threejs r108 | 760.0 MiB | 760.0 MiB | 0 (see below) |

Tests: 1,714 / 1,714 affected tests pass (same 5 pre-existing failures as on main).

The ru_maxrss change is smaller (~ −2 to −14 MiB depending on workload) because both the system allocator and mimalloc retain freed pages rather than returning them to the OS. The heap-level reduction is what matters for memory-pressure scheduling and for steady-state RSS across multiple builds (watch mode, dev server).

Why threejs sees no improvement

threejs r108 has only 370 modules but each has a much larger bumpalo arena (~2 MiB avg vs rome's ~75 KiB). Peak heap is reached at link/pre-codegen, not after codegen, so dropping the AST table after codegen has nothing to free that was contributing to a post-codegen peak. The drop helps whenever post-codegen stages (minify, sourcemap collapse, finalize_assets) push heap above the link-time peak. Many small chunks + minify = big win; one giant arena = no win.

What was explored but not shipped

Box-pattern shrinking of rare EcmaView fields

Eight `FxHashMap`/`FxHashSet`/`HmrInfo` fields on `EcmaView` that are empty for most modules were rewritten as `Option<Box<...>>` (HMR info, enum_member_value_map, new_url_references, this_expr_replace_map, dummy_record_set, self_referenced_class_decl_symbol_ids, constant_export_map, import_attribute_map).

Result: `NormalModule` shrank 992 → 768 bytes, `EcmaView` shrank 872 → 648 bytes. Locked in by `const_assert!`. For rome's 1,193 modules: 260 KB peak heap saved (~0.18 %). Real but tiny next to the AST-drop win. Touches 15 files. Not worth bundling with the AST-drop fix; could be its own follow-up but the impact is marginal compared to oxc-side opportunities (see below).
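
A minimal sketch of the box pattern, using two hypothetical fields rather than the real `EcmaView` layout. The `const` assertion plays the same role as `const_assert!` but uses only the standard library.

```rust
use std::collections::{HashMap, HashSet};
use std::mem::size_of;

/// Hypothetical module view with rarely used collections stored inline.
#[derive(Default)]
pub struct ViewInline {
    pub id: u32,
    pub enum_member_values: HashMap<u32, u32>,
    pub new_url_references: HashSet<u32>,
}

/// Same fields, but each rarely used collection sits behind `Option<Box<..>>`,
/// so a module that never uses it pays one pointer instead of a whole map header.
#[derive(Default)]
pub struct ViewBoxed {
    pub id: u32,
    pub enum_member_values: Option<Box<HashMap<u32, u32>>>,
    pub new_url_references: Option<Box<HashSet<u32>>>,
}

impl ViewBoxed {
    /// Allocates the map on first write only.
    pub fn enum_member_values_mut(&mut self) -> &mut HashMap<u32, u32> {
        self.enum_member_values.get_or_insert_with(Box::default)
    }
}

// Compile-time guard: the build fails if the boxed layout stops being smaller.
const _: () = assert!(size_of::<ViewBoxed>() < size_of::<ViewInline>());
```

The trade-off is an extra heap allocation and pointer hop for the modules that do populate the field, which is why the pattern only pays off for fields that are empty in most modules.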

Full type-system-enforced AST lifetime refactor

`LinkStage::link()` returning `(LinkStageOutput, IndexEcmaAst)` with the AST table flowing by value through `generate → finalize_modules → render_chunk_to_assets → instantiate_chunks → create_chunk_to_codegen_ret_map`, where the compiler drops it at the consumer's exit.

Result: ~2 MiB additional savings on rome (no minify) vs. the simple `mem::take` drop. Identical results on rome --minify (the realistic case). Touches 5 files with signature changes through the entire codegen pipeline. The type-system enforcement is nice in theory but the practical risk it guards against (someone reading `link_output.ast_table` after codegen) is essentially zero. Skipped in favor of the 6-line fix.

`Allocator::with_capacity(source.len())` for parser

Tried sizing each module's bumpalo arena initial chunk based on source length instead of bumpalo's 16 KiB default.

Result: regressed by ~3 MiB on rome. bumpalo's chunk-doubling growth means starting too small forces several extra growth chunks, and the doubling overshoots more than the default does for typical modules. A smarter heuristic (e.g. `max(source.len() * 4, 16384)`) might work but needs cross-corpus validation — naïve sizing is worse.
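
To see why a small first chunk can reserve more in total, here is a toy model. It assumes each new chunk is exactly twice the previous one, which is a simplification and not bumpalo's exact growth policy; both function names are hypothetical.

```rust
/// Heuristic floated above: four times the source length, floored at 16 KiB.
pub fn initial_chunk_size(source_len: usize) -> usize {
    source_len.saturating_mul(4).max(16 * 1024)
}

/// Simplified growth model (an assumption, not bumpalo's real policy):
/// the first chunk has `initial` bytes and every further chunk doubles the previous one.
/// Returns the total bytes reserved once `demand` bytes fit.
pub fn reserved_with_doubling(initial: usize, demand: usize) -> usize {
    assert!(initial > 0, "initial chunk size must be non-zero");
    let mut chunk = initial;
    let mut reserved = initial;
    while reserved < demand {
        chunk = chunk.saturating_mul(2);
        reserved = reserved.saturating_add(chunk);
    }
    reserved
}
```

Under this model, a 75 KiB demand reserves 112 KiB when starting from 16 KiB and 124 KiB when starting from 4 KiB. These figures come from the model, not from measurement, so any real heuristic still needs the cross-corpus validation mentioned above.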

Opportunities (prioritized)

Tier A — oxc-side (largest potential)

  1. Reduce oxc bumpalo arena per-module footprint (~87 MiB / 62 % of rome peak). Each module's arena averages ~75 KiB; 97 % of modules need a second 16 KiB chunk because `Scoping::reserve` runs early and fills past the lexer's initial chunk. Two angles:
    • Right-size the initial chunk per module — needs cross-corpus tuning (see "explored" above for what doesn't work).
    • Provide an arena-shrink API in oxc that compacts after parse (~10-20 MiB recoverable; many arenas have substantial trailing free space after parse completes).
  2. Cap or shrink `oxc_semantic::Scoping` reservations (~14 MiB / 10 %). Top oxc_semantic allocator is `Scoping::reserve` at 19.6 MiB across 2,365 blocks (~8 KiB pre-reserved per block, about two blocks per module). Just 4 `AstNodes::reserve` blocks contribute 8.6 MiB. Likely over-reserving for worst-case capacity; `shrink_to_fit` after build would help.
  3. Smaller AST node representations — purely an oxc footprint reduction; the bulk of every module's arena is AST nodes + spans.
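
The `shrink_to_fit` idea from item 2 can be sketched with a plain `Vec`. `SymbolTable` is a hypothetical stand-in; whether oxc's `Scoping` can expose an equivalent step is an open question, not something confirmed here.

```rust
/// Hypothetical side table that reserves for a worst-case estimate before the build.
pub struct SymbolTable {
    pub names: Vec<u32>,
}

impl SymbolTable {
    /// Reserves `capacity` slots up front, as a worst-case guess.
    pub fn with_worst_case(capacity: usize) -> Self {
        Self { names: Vec::with_capacity(capacity) }
    }

    /// Call once the build is finished. Returns how many slots of capacity were released.
    pub fn finish(&mut self) -> usize {
        let before = self.names.capacity();
        // May reallocate into a smaller block; contents and length are unchanged.
        self.names.shrink_to_fit();
        before - self.names.capacity()
    }
}
```

The shrink costs one reallocation per table, so it only makes sense for tables that outlive the build by long enough to count toward the peak.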

Tier B — rolldown-side

  1. Stream output instead of buffering. ~33 MiB of peak heap (rome, 19 % of total) is in 15 huge ≥1 MiB single allocations: large source-code strings, sourcemap join output, output-buffer assembly. Writing chunks/sourcemaps as they're produced rather than buffering up to `finalize_assets` would cut into this.
  2. Drop AST per-module during codegen, not after. This issue's fix drops the whole `ast_table` after codegen. A more granular version would take each module out of the table when its chunk is rendered, freeing arenas progressively. Estimated additional ~5-15 MiB on rome. threejs would still see no benefit: its peak is reached at link time, before codegen starts, so no codegen-time drop can lower it.
  3. Shrink per-module struct overhead. Box-pattern on rare `EcmaView` fields (explored above) — about 220 B/module = ~260 KB on rome, scales linearly. Net positive but tiny.
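
A rough sketch of the per-module drop from item 2, assuming the table can hold `Option` slots and that each module is rendered by at most one chunk. `ModuleAst` and `render_chunks` are hypothetical names, and a `Vec<u8>` plays the role of an arena.

```rust
/// Hypothetical stand-in for a module AST; the `Vec<u8>` plays the role of its arena.
pub struct ModuleAst {
    pub arena: Vec<u8>,
}

/// Renders each chunk and releases every module's AST as soon as it has been read.
/// `chunks` lists module indices per chunk; a module is assumed to appear at most once.
pub fn render_chunks(ast_table: &mut [Option<ModuleAst>], chunks: &[Vec<usize>]) -> Vec<String> {
    chunks
        .iter()
        .map(|module_ids| {
            let mut out = String::new();
            for &id in module_ids {
                // `take` moves the AST out of the table and leaves `None` behind;
                // the arena is freed when `ast` goes out of scope below.
                if let Some(ast) = ast_table[id].take() {
                    out.push_str(&format!("/* module {id}: {} bytes */\n", ast.arena.len()));
                }
            }
            out
        })
        .collect()
}
```

Compared with dropping the whole table at once, this frees memory while later chunks are still being rendered, at the cost of making every later read of the table handle `None`.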

Tier C — bench-only / measurement infra

  1. `MemoryFileSystem` in bench preloads every file's bytes (~18 MiB on rome). Not present in real production runs. Worth noting only because it inflates bench numbers — production peak is ~18 MiB lower than what the bench reports.

Measurement scaffolding (not in the PR)

If anyone wants to reproduce or extend these numbers:

  • `crates/bench/src/bin/mem_profile.rs`: small harness that drives `BundleFactory` against a preloaded `MemoryFileSystem` and prints `dhat` `max_bytes` + `getrusage` `ru_maxrss`. Built with `--features dhat` (workspace dep `dhat = "0.3.3"`).
  • Run: `cargo run -p bench --release --features dhat --bin mem_profile -- rome [--minify] [--sourcemap]`.
  • For resolved stack traces in `dhat-heap.json`, build with `--profile=release-debug` and `strip = false` in the profile.

Happy to upstream the harness as a separate PR if useful.

Related PR

The 6-line fix:

```diff
+ // `instantiate_chunks` is the last reader of `ast_table`. Release the
+ // per-module bumpalo arenas now so the heap dip is in place before
+ // `minify_chunks` re-parses chunk output into fresh arenas (~30 MiB peak
+ // heap reduction on rome).
+ drop(std::mem::take(&mut self.link_output.ast_table));
```

In `crates/rolldown/src/stages/generate_stage/render_chunk_to_assets.rs`, immediately after `instantiate_chunks` returns.
