perf(semantic): add Scoping::reset and SemanticBuilder::with_scoping for arena reuse#22571
perf(semantic): add Scoping::reset and SemanticBuilder::with_scoping for arena reuse#22571Dunqing wants to merge 7 commits into
Conversation
Merging this PR will improve performance by 6.35%
Performance Changes
Tip Curious why this is faster? Comment Comparing Footnotes
|
43d856d to
180aa9f
Compare
|
Update: the regression flagged by this comment was real. After investigation, I confirmed that Phase 2 of this PR ( The PR has been rebased to Phase 1 only ( CodSpeed should re-run on the new HEAD shortly. |
…nces The unresolved_references Vec grows from 0 with default doubling, requiring ~13 reallocations to reach the typical ~5k references on a moderately-sized TypeScript file. Stats already knows the count up-front (or computes it in Stats::count); pre-reserving eliminates the growth reallocations. Benchmark (cargo bench --bench semantic): RadixUI.jsx (3KB JSX): 3.155 µs -> 3.051 µs (-3.3%) react.development.js: 136.47 µs -> 134.93 µs (-1.1%) cal.com.tsx (1MB TSX): 2.7414 ms -> 2.7055 ms (-1.3%) binder.ts (193KB TS): 314.90 µs -> 312.15 µs (-0.9%) All intervals non-overlapping on Criterion.
2b6ee67 to
23ba41f
Compare
…nces (#22580) ## Summary The `unresolved_references` `Vec<(Ident, ReferenceId)>` inside `SemanticBuilder` grows from `cap = 0` via the default doubling policy — that's ~13 reallocations to reach the ~5k references of a moderately-sized TypeScript file (193 KB `binder.ts`), copying a cumulative ~16k entries (~250 KB of memory traffic) just to grow the buffer. `Stats` already knows the exact reference count up front (via `Stats::count` or passed in by the caller via `SemanticBuilder::with_stats`). Pre-reserving with `stats.references` eliminates all the growth reallocations. 8-line internal change. No public API impact. ## Benchmark ### `cargo bench --bench semantic` (local Criterion) | Bench | main | this PR | Δ | |---|---|---|---| | `RadixUI.jsx` (3 KB) | 3.155 µs | 3.046 µs | **−3.5%** | | `react.development.js` (72 KB) | 136.47 µs | 133.78 µs | **−2.0%** | | `cal.com.tsx` (1 MB TSX) | 2.7414 ms | 2.7064 ms | **−1.3%** | | `binder.ts` (193 KB TS) | 314.90 µs | 309.46 µs | **−1.7%** | All Criterion intervals non-overlapping. ### CodSpeed (precise, instruction-count based, from #22571 which tested the same commit) | Bench | Δ | |---|---| | `pipeline[binder.ts]` | **−4.34%** | | `semantic[binder.ts]` | **−9.75%** | | `semantic[react.development.js]` | **−5.06%** | CodSpeed's instruction-count measurement is more sensitive to allocator-call elimination than wall-clock Criterion, so the numbers are larger but consistent in direction. ## Why `reserve_exact` (not `reserve`) - `stats.references` is the exact total push count (asserted in debug builds via `Stats::assert_accurate`). No need for amortized growth headroom. - Matches the existing precedent in `Scoping::reserve` which uses `reserve_exact` on arena-allocated structures. - Sub-100% reserves (tested 50%) were empirically *worse* than no reserve — the single resulting realloc is a large-block alloc that misses the size-class fast paths the small doubling reallocs hit. So 100% is the optimum. ## Test Plan - [x] `cargo test -p oxc_semantic --lib --tests` — 71 tests pass - [x] `cargo bench --bench semantic` — wins above - [x] CodSpeed re-run on commit (positive results above) AI disclosure: drafted with Claude Code, reviewed manually.
|
Closing this PR. Superseded by #22580 (merged), which extracted the only commit from this branch that delivered measurable performance — the `unresolved_references` pre-reserve fix. CodSpeed validated #22580 at −9.75% on `semantic[binder.ts]`, −5.06% on `semantic[react.development.js]`, and −4.34% on `pipeline[binder.ts]`. Why the rest of this branch wasn't worth keepingThe branch originally included two larger pieces of work that I dropped after measurement: Phase 1 — Arena-reuse APIs (`Scoping::reset` + `SemanticBuilder::with_scoping` + `Compressor::dead_code_elimination_with_scoping_returning_scoping`): Designed for callers like rolldown that rebuild `Scoping` multiple times per file. A targeted microbench showed the arena-reuse path saves +5.2% on small files but −1.8% on larger ones — the `reset()` work (draining cell-internal containers, pulling the `Allocator` out, rebuilding the cell) cancels the chunk-allocation savings on most real-world workloads. Net zero. The bench wins I had earlier reported on rolldown's bundle bench turned out to be Criterion measurement noise — once I controlled for that, the rolldown-side changes added no measurable value (companion rolldown PR also closed: rolldown/rolldown#9460). Phase 2 — `SemanticBuilder::with_build_nodes(false)` (skip `AstNodes` storage + `ClassTable`): Designed to skip work for downstream consumers that only need `Scoping`. The `parent_kind_stack` push/pop and `if self.build_nodes` branches added to `create_ast_node` / `pop_ast_node` cost ~3-5% on every default-mode build — exactly the path the linter, syntax checker, and most IDE / CI usage of `oxc_semantic` takes. CodSpeed correctly flagged this as a 5.42% regression. The hot-path overhead is fundamental to the design: any mechanism that lets the binder ask "what's my parent's kind?" without reading `AstNodes` has to maintain a parallel structure, and that maintenance roughly cancels the savings from skipping `nodes.push`. The right shape turned out to be a much smaller change — pre-reserving one heap `Vec` using a stat that `Stats::count` already produced. That's #22580. Notes for future workA separate follow-up that might be worth pursuing is `SemanticBuilder::with_class_table(bool)` — opt-out for the `ClassTable` build path, which is only consumed by the linter, mangler, and the syntax checker. The transformer, minifier, formatter, and most of rolldown's preprocessing pipeline never read it. That's a narrower opt-in (one boolean, two gate points) than the with_build_nodes(false) attempt and avoids the per-visit overhead that sank Phase 2. I haven't measured it yet — if anyone wants to pick it up, the gate points are `builder.rs`'s two `self.class_table_builder.declare_class_body` and `add_private_identifier_reference` call sites, and a `debug_assert!` enforcing `check_syntax_error ⇒ build_class_table` (the syntax checker reads ClassTable extensively per `crates/oxc_semantic/src/checker/{javascript,typescript}.rs`). A separately-pushed PR (#22594) applies the same #22580-shape pattern to the formatter's output buffer, pre-sizing the `CodeBuffer` using `source_text.len()` to eliminate ~9-19 buffer-doubling reallocs per file. |
Summary
Two changes to
oxc_semantic:1. Pre-reserve
unresolved_references(NEW — small clear win)The
unresolved_referencesVec<(Ident, ReferenceId)>was growing from 0 with default doubling (~13 reallocations to reach ~5k entries on a typical TS file).Statsalready tracks the reference count, so pre-reserving withstats.referenceseliminates the growth churn.Modest but consistent (Criterion intervals non-overlapping). This change has no public API impact — purely internal pre-sizing.
2. Opt-in APIs for callers that rebuild
Scopingmultiple times against the same programScoping::reset()— clear all internal state while preserving the bumpalo arena's chunk capacity and the heap capacity of flatIndexVec/multi_index_vectables.SemanticBuilder::with_scoping(scoping)— consume a (reset)Scopinginstead of starting fromScoping::default(), reusing both the bumpalo arena and the flat-table capacity.Compressor::dead_code_elimination_with_scoping_returning_scoping— returns the consumedScopingafter DCE so callers can recycle the arena.Honest disclosure: in the bench files measured, the arena-reuse APIs show net-zero impact (−17 µs to +3 µs per call depending on file size —
reset()overhead roughly cancels the saved allocator init). They're additive surface intended for callers like rolldown that may benefit from arena recycling in specific workloads (multiplerecreate_scopingcalls per file). If maintainers prefer to land only the unresolved_references perf fix (commit2b6ee676de) and defer the API surface, I can split.Test Plan
cargo test -p oxc_semantic --lib --tests— 74 tests passcargo test -p oxc_minifier --lib --tests— 502+ tests passcargo bench --bench semantic— measured wins aboveCompanion PR
rolldown PR using the arena-reuse APIs: rolldown/rolldown#9460
AI disclosure: drafted with Claude Code, reviewed manually.