Skip to content

perf(semantic): add Scoping::reset and SemanticBuilder::with_scoping for arena reuse#22571

Closed
Dunqing wants to merge 7 commits into
mainfrom
claude/semantic-faster-rebuild
Closed

perf(semantic): add Scoping::reset and SemanticBuilder::with_scoping for arena reuse#22571
Dunqing wants to merge 7 commits into
mainfrom
claude/semantic-faster-rebuild

Conversation

@Dunqing

@Dunqing Dunqing commented May 19, 2026

Copy link
Copy Markdown
Member

Summary

Two changes to oxc_semantic:

1. Pre-reserve unresolved_references (NEW — small clear win)

The unresolved_references Vec<(Ident, ReferenceId)> was growing from 0 with default doubling (~13 reallocations to reach ~5k entries on a typical TS file). Stats already tracks the reference count, so pre-reserving with stats.references eliminates the growth churn.

Bench (cargo bench --bench semantic) Vanilla This commit Δ
RadixUI.jsx 3.155 µs 3.051 µs −3.3%
react.development.js 136.47 µs 134.93 µs −1.1%
cal.com.tsx 2.7414 ms 2.7055 ms −1.3%
binder.ts 314.90 µs 312.15 µs −0.9%

Modest but consistent (Criterion intervals non-overlapping). This change has no public API impact — purely internal pre-sizing.

2. Opt-in APIs for callers that rebuild Scoping multiple times against the same program

  • Scoping::reset() — clear all internal state while preserving the bumpalo arena's chunk capacity and the heap capacity of flat IndexVec / multi_index_vec tables.
  • SemanticBuilder::with_scoping(scoping) — consume a (reset) Scoping instead of starting from Scoping::default(), reusing both the bumpalo arena and the flat-table capacity.
  • Compressor::dead_code_elimination_with_scoping_returning_scoping — returns the consumed Scoping after DCE so callers can recycle the arena.

Honest disclosure: in the bench files measured, the arena-reuse APIs show net-zero impact (−17 µs to +3 µs per call depending on file size — reset() overhead roughly cancels the saved allocator init). They're additive surface intended for callers like rolldown that may benefit from arena recycling in specific workloads (multiple recreate_scoping calls per file). If maintainers prefer to land only the unresolved_references perf fix (commit 2b6ee676de) and defer the API surface, I can split.

Test Plan

  • cargo test -p oxc_semantic --lib --tests — 74 tests pass
  • cargo test -p oxc_minifier --lib --tests — 502+ tests pass
  • cargo bench --bench semantic — measured wins above

Companion PR

rolldown PR using the arena-reuse APIs: rolldown/rolldown#9460

AI disclosure: drafted with Claude Code, reviewed manually.

@Dunqing Dunqing requested a review from overlookmotel as a code owner May 19, 2026 08:53
@github-actions github-actions Bot added A-parser Area - Parser A-semantic Area - Semantic A-minifier Area - Minifier A-transformer Area - Transformer / Transpiler A-allocator Area - Allocator labels May 19, 2026
@codspeed-hq

codspeed-hq Bot commented May 19, 2026

Copy link
Copy Markdown

Merging this PR will improve performance by 6.35%

⚡ 3 improved benchmarks
✅ 45 untouched benchmarks
⏩ 3 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation pipeline[binder.ts] 13.9 ms 13.4 ms +4.34%
Simulation semantic[binder.ts] 2.9 ms 2.6 ms +9.75%
Simulation semantic[react.development.js] 1.1 ms 1.1 ms +5.06%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/semantic-faster-rebuild (23ba41f) with main (4f33aa7)

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@Dunqing Dunqing force-pushed the claude/semantic-faster-rebuild branch from 43d856d to 180aa9f Compare May 19, 2026 09:53
@Dunqing Dunqing changed the title perf(semantic): add Scoping::reset and with_build_nodes(false) for faster rebuilds perf(semantic): add Scoping::reset and SemanticBuilder::with_scoping for arena reuse May 19, 2026
@Dunqing

Dunqing commented May 19, 2026

Copy link
Copy Markdown
Member Author

Update: the regression flagged by this comment was real. After investigation, I confirmed that Phase 2 of this PR (with_build_nodes(false) + the parent_kind_stack to preserve binder lookups) was net-zero-to-negative on every workload — profiling showed that skipping nodes.push(AstNode) saves ~1 ns/node (~5% of total semantic build time), but any mechanism to preserve parent_kind info for the binder costs roughly the same amount. There's no free lunch.

The PR has been rebased to Phase 1 only (Scoping::reset + SemanticBuilder::with_scoping + Compressor::dead_code_elimination_with_scoping_returning_scoping). Phase 1 is purely additive — the default SemanticBuilder::new().build() path is byte-for-byte unchanged. The rolldown wins (3-4% on threejs, ~3% on rome_ts) come from reusing the bumpalo arena across rolldown's repeated recreate_scoping calls per file.

CodSpeed should re-run on the new HEAD shortly.

…nces

The unresolved_references Vec grows from 0 with default doubling, requiring
~13 reallocations to reach the typical ~5k references on a moderately-sized
TypeScript file. Stats already knows the count up-front (or computes it in
Stats::count); pre-reserving eliminates the growth reallocations.

Benchmark (cargo bench --bench semantic):

  RadixUI.jsx (3KB JSX):     3.155 µs -> 3.051 µs (-3.3%)
  react.development.js:    136.47 µs -> 134.93 µs (-1.1%)
  cal.com.tsx (1MB TSX):    2.7414 ms -> 2.7055 ms (-1.3%)
  binder.ts (193KB TS):    314.90 µs -> 312.15 µs (-0.9%)

All intervals non-overlapping on Criterion.
@Dunqing Dunqing force-pushed the claude/semantic-faster-rebuild branch from 2b6ee67 to 23ba41f Compare May 19, 2026 11:38
graphite-app Bot pushed a commit that referenced this pull request May 19, 2026
…nces (#22580)

## Summary

The `unresolved_references` `Vec<(Ident, ReferenceId)>` inside `SemanticBuilder` grows from `cap = 0` via the default doubling policy — that's ~13 reallocations to reach the ~5k references of a moderately-sized TypeScript file (193 KB `binder.ts`), copying a cumulative ~16k entries (~250 KB of memory traffic) just to grow the buffer.

`Stats` already knows the exact reference count up front (via `Stats::count` or passed in by the caller via `SemanticBuilder::with_stats`). Pre-reserving with `stats.references` eliminates all the growth reallocations.

8-line internal change. No public API impact.

## Benchmark

### `cargo bench --bench semantic` (local Criterion)

| Bench | main | this PR | Δ |
|---|---|---|---|
| `RadixUI.jsx` (3 KB) | 3.155 µs | 3.046 µs | **−3.5%** |
| `react.development.js` (72 KB) | 136.47 µs | 133.78 µs | **−2.0%** |
| `cal.com.tsx` (1 MB TSX) | 2.7414 ms | 2.7064 ms | **−1.3%** |
| `binder.ts` (193 KB TS) | 314.90 µs | 309.46 µs | **−1.7%** |

All Criterion intervals non-overlapping.

### CodSpeed (precise, instruction-count based, from #22571 which tested the same commit)

| Bench | Δ |
|---|---|
| `pipeline[binder.ts]` | **−4.34%** |
| `semantic[binder.ts]` | **−9.75%** |
| `semantic[react.development.js]` | **−5.06%** |

CodSpeed's instruction-count measurement is more sensitive to allocator-call elimination than wall-clock Criterion, so the numbers are larger but consistent in direction.

## Why `reserve_exact` (not `reserve`)

- `stats.references` is the exact total push count (asserted in debug builds via `Stats::assert_accurate`). No need for amortized growth headroom.
- Matches the existing precedent in `Scoping::reserve` which uses `reserve_exact` on arena-allocated structures.
- Sub-100% reserves (tested 50%) were empirically *worse* than no reserve — the single resulting realloc is a large-block alloc that misses the size-class fast paths the small doubling reallocs hit. So 100% is the optimum.

## Test Plan

- [x] `cargo test -p oxc_semantic --lib --tests` — 71 tests pass
- [x] `cargo bench --bench semantic` — wins above
- [x] CodSpeed re-run on commit (positive results above)

AI disclosure: drafted with Claude Code, reviewed manually.
@Dunqing

Dunqing commented May 19, 2026

Copy link
Copy Markdown
Member Author

Closing this PR.

Superseded by #22580 (merged), which extracted the only commit from this branch that delivered measurable performance — the `unresolved_references` pre-reserve fix. CodSpeed validated #22580 at −9.75% on `semantic[binder.ts]`, −5.06% on `semantic[react.development.js]`, and −4.34% on `pipeline[binder.ts]`.

Why the rest of this branch wasn't worth keeping

The branch originally included two larger pieces of work that I dropped after measurement:

Phase 1 — Arena-reuse APIs (`Scoping::reset` + `SemanticBuilder::with_scoping` + `Compressor::dead_code_elimination_with_scoping_returning_scoping`):

Designed for callers like rolldown that rebuild `Scoping` multiple times per file. A targeted microbench showed the arena-reuse path saves +5.2% on small files but −1.8% on larger ones — the `reset()` work (draining cell-internal containers, pulling the `Allocator` out, rebuilding the cell) cancels the chunk-allocation savings on most real-world workloads. Net zero. The bench wins I had earlier reported on rolldown's bundle bench turned out to be Criterion measurement noise — once I controlled for that, the rolldown-side changes added no measurable value (companion rolldown PR also closed: rolldown/rolldown#9460).

Phase 2 — `SemanticBuilder::with_build_nodes(false)` (skip `AstNodes` storage + `ClassTable`):

Designed to skip work for downstream consumers that only need `Scoping`. The `parent_kind_stack` push/pop and `if self.build_nodes` branches added to `create_ast_node` / `pop_ast_node` cost ~3-5% on every default-mode build — exactly the path the linter, syntax checker, and most IDE / CI usage of `oxc_semantic` takes. CodSpeed correctly flagged this as a 5.42% regression. The hot-path overhead is fundamental to the design: any mechanism that lets the binder ask "what's my parent's kind?" without reading `AstNodes` has to maintain a parallel structure, and that maintenance roughly cancels the savings from skipping `nodes.push`.

The right shape turned out to be a much smaller change — pre-reserving one heap `Vec` using a stat that `Stats::count` already produced. That's #22580.

Notes for future work

A separate follow-up that might be worth pursuing is `SemanticBuilder::with_class_table(bool)` — opt-out for the `ClassTable` build path, which is only consumed by the linter, mangler, and the syntax checker. The transformer, minifier, formatter, and most of rolldown's preprocessing pipeline never read it. That's a narrower opt-in (one boolean, two gate points) than the with_build_nodes(false) attempt and avoids the per-visit overhead that sank Phase 2. I haven't measured it yet — if anyone wants to pick it up, the gate points are `builder.rs`'s two `self.class_table_builder.declare_class_body` and `add_private_identifier_reference` call sites, and a `debug_assert!` enforcing `check_syntax_error ⇒ build_class_table` (the syntax checker reads ClassTable extensively per `crates/oxc_semantic/src/checker/{javascript,typescript}.rs`).

A separately-pushed PR (#22594) applies the same #22580-shape pattern to the formatter's output buffer, pre-sizing the `CodeBuffer` using `source_text.len()` to eliminate ~9-19 buffer-doubling reallocs per file.

@Dunqing Dunqing closed this May 19, 2026
@Boshen Boshen deleted the claude/semantic-faster-rebuild branch May 31, 2026 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-allocator Area - Allocator A-minifier Area - Minifier A-parser Area - Parser A-semantic Area - Semantic A-transformer Area - Transformer / Transpiler

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant