Skip to content

perf(semantic): pre-reserve unresolved_references using Stats::references#22580

Merged
graphite-app[bot] merged 1 commit into
mainfrom
claude/perf-reserve-unresolved-refs
May 19, 2026
Merged

perf(semantic): pre-reserve unresolved_references using Stats::references#22580
graphite-app[bot] merged 1 commit into
mainfrom
claude/perf-reserve-unresolved-refs

Conversation

@Dunqing

@Dunqing Dunqing commented May 19, 2026

Copy link
Copy Markdown
Member

Summary

The unresolved_references Vec<(Ident, ReferenceId)> inside SemanticBuilder grows from cap = 0 via the default doubling policy — that's ~13 reallocations to reach the ~5k references of a moderately-sized TypeScript file (193 KB binder.ts), copying a cumulative ~16k entries (~250 KB of memory traffic) just to grow the buffer.

Stats already knows the exact reference count up front (via Stats::count or passed in by the caller via SemanticBuilder::with_stats). Pre-reserving with stats.references eliminates all the growth reallocations.

8-line internal change. No public API impact.

Benchmark

cargo bench --bench semantic (local Criterion)

Bench main this PR Δ
RadixUI.jsx (3 KB) 3.155 µs 3.046 µs −3.5%
react.development.js (72 KB) 136.47 µs 133.78 µs −2.0%
cal.com.tsx (1 MB TSX) 2.7414 ms 2.7064 ms −1.3%
binder.ts (193 KB TS) 314.90 µs 309.46 µs −1.7%

All Criterion intervals non-overlapping.

CodSpeed (precise, instruction-count based, from #22571 which tested the same commit)

Bench Δ
pipeline[binder.ts] −4.34%
semantic[binder.ts] −9.75%
semantic[react.development.js] −5.06%

CodSpeed's instruction-count measurement is more sensitive to allocator-call elimination than wall-clock Criterion, so the numbers are larger but consistent in direction.

Why reserve_exact (not reserve)

  • stats.references is the exact total push count (asserted in debug builds via Stats::assert_accurate). No need for amortized growth headroom.
  • Matches the existing precedent in Scoping::reserve which uses reserve_exact on arena-allocated structures.
  • Sub-100% reserves (tested 50%) were empirically worse than no reserve — the single resulting realloc is a large-block alloc that misses the size-class fast paths the small doubling reallocs hit. So 100% is the optimum.

Test Plan

  • cargo test -p oxc_semantic --lib --tests — 71 tests pass
  • cargo bench --bench semantic — wins above
  • CodSpeed re-run on commit (positive results above)

AI disclosure: drafted with Claude Code, reviewed manually.

@github-actions github-actions Bot added the A-semantic Area - Semantic label May 19, 2026
@codspeed-hq

codspeed-hq Bot commented May 19, 2026

Copy link
Copy Markdown

Merging this PR will improve performance by 6.29%

⚡ 3 improved benchmarks
✅ 45 untouched benchmarks
⏩ 3 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation pipeline[binder.ts] 13.9 ms 13.4 ms +4.08%
Simulation semantic[react.development.js] 1.1 ms 1.1 ms +5.17%
Simulation semantic[binder.ts] 2.9 ms 2.6 ms +9.71%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/perf-reserve-unresolved-refs (2dced86) with main (ae98296)

Open in CodSpeed

Footnotes

  1. 3 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label May 19, 2026

Boshen commented May 19, 2026

Copy link
Copy Markdown
Member

Merge activity

@Boshen Boshen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

…nces (#22580)

## Summary

The `unresolved_references` `Vec<(Ident, ReferenceId)>` inside `SemanticBuilder` grows from `cap = 0` via the default doubling policy — that's ~13 reallocations to reach the ~5k references of a moderately-sized TypeScript file (193 KB `binder.ts`), copying a cumulative ~16k entries (~250 KB of memory traffic) just to grow the buffer.

`Stats` already knows the exact reference count up front (via `Stats::count` or passed in by the caller via `SemanticBuilder::with_stats`). Pre-reserving with `stats.references` eliminates all the growth reallocations.

8-line internal change. No public API impact.

## Benchmark

### `cargo bench --bench semantic` (local Criterion)

| Bench | main | this PR | Δ |
|---|---|---|---|
| `RadixUI.jsx` (3 KB) | 3.155 µs | 3.046 µs | **−3.5%** |
| `react.development.js` (72 KB) | 136.47 µs | 133.78 µs | **−2.0%** |
| `cal.com.tsx` (1 MB TSX) | 2.7414 ms | 2.7064 ms | **−1.3%** |
| `binder.ts` (193 KB TS) | 314.90 µs | 309.46 µs | **−1.7%** |

All Criterion intervals non-overlapping.

### CodSpeed (precise, instruction-count based, from #22571 which tested the same commit)

| Bench | Δ |
|---|---|
| `pipeline[binder.ts]` | **−4.34%** |
| `semantic[binder.ts]` | **−9.75%** |
| `semantic[react.development.js]` | **−5.06%** |

CodSpeed's instruction-count measurement is more sensitive to allocator-call elimination than wall-clock Criterion, so the numbers are larger but consistent in direction.

## Why `reserve_exact` (not `reserve`)

- `stats.references` is the exact total push count (asserted in debug builds via `Stats::assert_accurate`). No need for amortized growth headroom.
- Matches the existing precedent in `Scoping::reserve` which uses `reserve_exact` on arena-allocated structures.
- Sub-100% reserves (tested 50%) were empirically *worse* than no reserve — the single resulting realloc is a large-block alloc that misses the size-class fast paths the small doubling reallocs hit. So 100% is the optimum.

## Test Plan

- [x] `cargo test -p oxc_semantic --lib --tests` — 71 tests pass
- [x] `cargo bench --bench semantic` — wins above
- [x] CodSpeed re-run on commit (positive results above)

AI disclosure: drafted with Claude Code, reviewed manually.
@graphite-app graphite-app Bot force-pushed the claude/perf-reserve-unresolved-refs branch from 2dced86 to 7a4120e Compare May 19, 2026 13:54
@graphite-app graphite-app Bot merged commit 7a4120e into main May 19, 2026
29 checks passed
@graphite-app graphite-app Bot removed the 0-merge Merge with Graphite Merge Queue label May 19, 2026
@graphite-app graphite-app Bot deleted the claude/perf-reserve-unresolved-refs branch May 19, 2026 13:59
graphite-app Bot pushed a commit that referenced this pull request May 20, 2026
## Summary

Adds `kitchen-sink.tsx` — a comprehensive synthetic TypeScript+JSX fixture maintained at [oxc-project/benchmark-files](https://github.com/oxc-project/benchmark-files) — to both `TestFiles::minimal()` (bench input set) and `TestFiles::complicated()` (alloc-tracking input set). The existing files in each set are untouched; this is a strict append.

## Why

The existing bench input set didn't reliably surface general-purpose perf wins above the ~1-2% measurement noise floor:
- #22580 (semantic pre-reserve) — visible because `binder.ts` exercises it
- #22594 (formatter buffer) — visible
- #22596 (minifier `try_fold_concat`) — **not visible** on the old set
- #22599 (semantic resolve-refs no-temp-Vec) — **not visible**
- #22603 (semantic var-hoist SmallVec) — **not visible**

The kitchen-sink fixes that by exercising every AST node, every transformer plugin, every minifier optimization opportunity, and every semantic step in one large file. Verified by re-benching #22596 against this fixture: **minifier mean −1.5%, min −3.7%** — above noise, signal confirmed.

## Fixture stats (cross-checked locally)

| Metric | Value |
|---|---|
| Source size | 21,117 lines / 732.90 kB |
| AST nodes | ~133,000 |
| Scopes | ~4,750 |
| Symbols | ~7,000 |
| Resolved references | ~16,000 |
| Semantic diagnostics | 0 errors / 0 warnings |

## Snap baselines

`tasks/track_memory_allocations/allocs_*.snap` updated with the kitchen-sink row across all 5 pipelines (parser / semantic / transformer / minifier / formatter). Future PRs that change allocation behavior on this fixture will produce a snap diff in CI.

## Bench-cleaner fix

`tasks/benchmark/benches/lexer.rs`'s `SourceCleaner` was missing `visit_ts_template_literal_type` — TypeScript type-level template literals (e.g. `` `${T}-${U}` `` in conditional / mapped types) are syntactically identical to value-level template literals, so the bench-mode lexer (without parser context) cannot distinguish them. Without the cleaner converting them to plain strings, kitchen-sink's type-level templates caused the lexer bench to swallow ~1 KB spans as a single `TemplateHead` and produce spurious "Unterminated string" / "Invalid Unicode escape" errors. One-line fix to mirror the existing `visit_template_literal` handling.

AI disclosure: drafted with Claude Code, reviewed manually.
graphite-app Bot pushed a commit that referenced this pull request May 21, 2026
…ec (#22599)

## Summary

`SemanticBuilder::resolve_references_for_current_scope` is the early-resolution hook fired at function / arrow / catch boundaries — it resolves the references collected since a checkpoint against the outer scope chain so that e.g. `function f(x = outerRef) {}` correctly binds `outerRef` to the outer scope rather than the function body.

The old implementation copied the pending slice into a fresh `Vec` on every call:

```rust
let refs = self.unresolved_references.slice_from(checkpoint).to_vec();
self.unresolved_references.truncate(checkpoint);
for (name, reference_id) in refs { ... }
```

The `to_vec()` is a fresh heap allocation, needed because `walk_up_resolve_reference` takes `&mut self` and would otherwise alias the borrow on the underlying `Vec<(Ident, ReferenceId)>`.

Rewrite in-place with a retain-style write cursor — each `(Ident, ReferenceId)` is read by **value** (both are `Copy`), which detaches it from the borrow on the inner Vec. Unresolved refs are compacted forward; resolved refs are dropped via the final `truncate`. No temporary `Vec`, no allocation, **no `unsafe`** — safe bounds-checked indexing via new `get` / `set` methods on `UnresolvedReferences`.

```rust
let mut write_idx = checkpoint;
for read_idx in checkpoint..end {
    let (name, reference_id) = self.unresolved_references.get(read_idx);
    if !self.walk_up_resolve_reference(name, reference_id) {
        if write_idx != read_idx {
            self.unresolved_references.set(write_idx, name, reference_id);
        }
        write_idx += 1;
    }
}
self.unresolved_references.truncate(write_idx);
```

## Allocation impact

`cargo allocs`, `allocs_semantic.snap` (baseline = parent commit, after var-hoist landed):

| File | Size | Sys allocs before | Sys allocs after | Δ |
|---|---|---|---|---|
| `checker.ts` | 2.92 MB | 2,309 | **53** | **−2,256 (−98%)** |
| `App.tsx` | 415 kB | 174 | **35** | **−139 (−80%)** |
| `binder.ts` | 193 kB | 196 | **19** | **−177 (−90%)** |
| `kitchen-sink.tsx` | 733 kB | 1,320 | **290** | **−1,030 (−78%)** |
| `pdf.mjs` | 567 kB | 1,366 | 1,355 | −11 |
| `antd.js` | 6.69 MB | 1,072 | 1,072 | 0 (ES5 bundled code rarely closes over outer scope from function params) |
| `RadixUI.jsx` | 2.5 kB | 10 | 10 | 0 |

## How I found this

After [#22580](#22580) and [#22590](#22590) the remaining semantic sys allocs were unexplained. Ran a backtrace-capturing `System` allocator wrapper on `checker.ts` and aggregated allocations by call stack — the overwhelming majority of captured allocations converged on a single site:

```
oxc_semantic::builder::SemanticBuilder::resolve_references_for_current_scope
  oxc_semantic::builder::SemanticBuilder as ...::Visit::visit_function
  ...
```

Tracing back to the `slice_from(checkpoint).to_vec()` line and confirming that the entire allocation pattern disappears with the in-place rewrite.

## Why this matters for downstream consumers

`resolve_references_for_current_scope` fires once per function / arrow / catch — for normal TS / TSX code that's hundreds to thousands of calls per build. Each call was a fresh heap allocation; eliminating them removes that allocator pressure entirely.

For rolldown's preprocessing pipeline (which builds semantic 3-4 times per file across hundreds of files per bundle), the per-build savings compound.

## No behavior change

The in-place algorithm preserves the original semantics exactly: `unresolved_references[..checkpoint]` is untouched, and `[checkpoint..]` is the list of refs that didn't resolve, in their original order. Same as the old code.

## Test Plan

- [x] `cargo test -p oxc_semantic --lib --tests` — pass
- [x] `cargo allocs` — semantic snapshot updated to reflect the reductions

AI disclosure: drafted with Claude Code, reviewed manually.
graphite-app Bot pushed a commit that referenced this pull request May 21, 2026
## Summary

When the minifier folds `.concat()` calls into template literals via `try_fold_concat` in `peephole/replace_known_methods.rs`, the original implementation built two short-lived intermediate buffers per call:

- `quasi_strs: Vec<Cow<'a, str>>` — fresh `std::Vec` allocation per call.
- A `String` inside each `Cow::Owned` — created when `Cow::to_mut()` cloned a `Borrowed` into an `Owned`, then grown via doubling as more arg strings were pushed in.

For files with many `.concat()` calls — common in bundled ES5 libraries where template literals weren't available — both structures were freshly allocated thousands of times per minify, with the `String` growing via doubling each time.

This PR replaces those per-call buffers with a single reusable scratch `String` held on `MinifierState`, and constructs `TemplateElement` AST nodes directly into the arena `Vec` as args are drained. The intermediate `Vec<Cow<'a, str>>` is gone entirely. After the first few folds the scratch's capacity stabilizes and subsequent folds are amortized zero std-heap allocs.

The state machine simplifies in the process: the `pushed_quasi: bool` flag is gone. Its invariant ("scratch holds an in-progress quasi") is naturally maintained because every expression flush clears the scratch, so a back-to-back expression's flush produces the required empty separator quasi without a branch.

## Allocation impact

`cargo allocs`, `allocs_minifier.snap` (baseline = parent commit, after resolve-refs and var-hoist landed):

| File | Size | Sys allocs before | Sys allocs after | Sys reallocs before | Sys reallocs after |
|---|---|---|---|---|---|
| `antd.js` | 6.69 MB ES5 | **4,652** | **3,084** (`−1,568`, `−33.7%`) | **1,622** | **53** (`−1,569`) |
| Other tracked files | various | unchanged | unchanged | unchanged | unchanged |

Only `antd.js` shows a change because it's the only tracked file that exercises `.concat()` folding heavily. ES5 bundled code uses `.concat()` instead of template literals; antd.js has thousands of these.

The headline `Sys reallocs` improvement (`1,622` → `53`) comes from eliminating the per-call `String` doubling. The headline `Sys allocs` improvement (`4,652` → `3,084`) comes from eliminating the per-call `Vec<Cow<'a, str>>` allocation.

## Commits

This PR contains two commits, kept separate for review history:

1. **`perf(minifier): pre-size buffers in try_fold_concat to eliminate growth reallocs`** — the original tactical fix that pre-sized the doomed-to-be-removed intermediate buffers. Dropped sys reallocs first but didn't touch sys allocs.
2. **`perf(minifier): drop per-call buffers in try_fold_concat`** — the root-cause refactor described above. Removes the intermediate buffers entirely. Cuts both columns.

Once the second commit lands, the pre-sizing in the first becomes dead-code (the buffers it pre-sized no longer exist). They're kept as separate commits so the diff against `main` is reviewable — feel free to squash on merge.

## How I found this

After #22580, profiling on `antd.js` minifier showed it had unusually high sys reallocs vs other tracked files. I wrote a backtrace-capturing `System` allocator wrapper and ran it on the minify path. The top realloc sites all converged on `try_fold_concat` — the overwhelming majority of captured reallocs came from this one function. After landing the pre-size fix, code review pushed for a root-cause fix that also addressed the `Sys allocs` column, which this refactor delivers.

## Timing

The minifier bench suite (originally) didn't include antd.js or other `.concat()`-heavy fixtures, so bench numbers locally were flat (no regression, no obvious improvement). With kitchen-sink in the bench input set (#22609), CodSpeed shows: **minifier mean −1.5%, min −3.7%** on kitchen-sink.

## Verification

- `cargo test -p oxc_minifier` — pass.
- `cargo minsize` — no size regressions (`minsize.snap` byte-identical).
- `cargo allocs` — only `antd.js` row changed in `allocs_minifier.snap`.
- `cargo fmt -p oxc_minifier` clean.

AI disclosure: drafted with Claude Code, reviewed manually.
Dunqing added a commit that referenced this pull request May 26, 2026
### 🚀 Features

- e857b0c napi/minify: Expose legalComments option and result (#20370)
(Boshen)
- 661132d parser: More friendly error messages for rest assignment
target and rest binding element (#22719) (sapphi-red)
- ee659b6 transformer/legacy-decorator: Add `strictNullChecks` option
for nullable-union design:type (#22266) (Kyle Cannon)

### 🐛 Bug Fixes

- e1d064e transformer/class-properties: Reparent lifted private method
helpers (#22716) (Cameron)
- 4ac0fca minifier: Preserve `0 && (module.exports = { ... })`
cjs-module-lexer hint (#22729) (Dunqing)
- 40ff611 minifier: Mark peephole loop changed when dropping
dead-after-throw statement (#22722) (Dunqing)
- 2f7b210 codegen: Emit pife-arrow/function leading comments inside the
wrap (#22720) (Dunqing)
- e184f74 parser: Improve invalid `import` property access diagnostic
(#22693) (camc314)
- 7baed9c transformer/private-method: Clear inherited strict flags
(#22508) (camc314)
- a9ad27e parser: Keep annotation comments leading without preceding
newline (#22711) (Dunqing)
- 9ea4d64 minifier: Re-evaluate pure/no-side-effects flags after
peephole inlining (#22595) (Dunqing)
- 07afbb6 minifier: Drop empty-body IIFE wrapper when called with
arguments (#22589) (Dunqing)
- fa7c463 semantic: Correct TS enum member symbol spans (#22689)
(camc314)
- 26b9396 semantic: Resolve parameter decorators outside parameter scope
(#22623) (camc314)
- b284045 parser: Switch to module goal eagerly on `export` (#22684)
(Boshen)
- dfa931d semantic: Propagate unresolved auto-increment enum value
instead of defaulting to 0 (#22646) (Dunqing)
- 69a6ba6 transformer/legacy-decorator: Emit Array for ReadonlyArray<T>
in decorator metadata (#22265) (Kyle Cannon)
- e421ef0 transformer/legacy-decorator: Return runtime binding for
design:type (#22640) (Dunqing)
- d61e1d7 codegen: Preserve verbatim text of pure/no-side-effects
comments (#22525) (Dunqing)
- 702b14e minifier: Preserve IIFE structure in DCE-only mode (#22547)
(Dunqing)
- 917da24 parser: Apply PURE comment through member-access chains
(#22566) (Dunqing)
- a069b1c codegen: Preserve quotes for cjs-module-lexer equality strings
(#22551) (Dunqing)

### ⚡ Performance

- 2f623b0 semantic: Skip unresolved checks for re-exports (#22660)
(camc314)
- 0d9553d semantic: Early-exit `check_object_expression` for objects
with <2 properties (#22668) (Dunqing)
- d721ad9 semantic: Use direct grandparent lookup for TS type parameters
(#22658) (camc314)
- 0aff288 semantic: Reorder numeric literal strict mode checks (#22657)
(camc314)
- 4d5ddb1 semantic: Reorder binding identifier checks (#22656) (camc314)
- e32acd8 semantic: Reorder identifier ambient binding check (#22653)
(camc314)
- 09fe178 semantic: Reorder ident reference strict mode check (#22652)
(camc314)
- 4b6add2 semantic: Avoid duplicate ident clone for bindings (#22663)
(camc314)
- 82f9662 parser: Check identifier kind before context flag (#22662)
(camc314)
- d7cd951 parser: Fast path identifier parsing and inline operator
helpers (#22650) (Boshen)
- 7b84314 semantic: Use direct byte access for numeric leading-zero
check (#22642) (camc314)
- 0345a31 semantic: Pre-size class elements hash map (#22618) (camc314)
- 04d3065 minifier: Drop per-call buffers in try_fold_concat (#22596)
(Dunqing)
- 4f289f1 semantic: Resolve_references_for_current_scope without a temp
Vec (#22599) (Dunqing)
- e862c15 semantic: Avoid heap alloc for var hoist scope ids (#22603)
(Dunqing)
- 8ff8674 semantic: Early return if `excess` is `0` in
`Stats::increase_by` (#22616) (camc314)
- 7a4120e semantic: Pre-reserve unresolved_references using
Stats::references (#22580) (Dunqing)

Co-authored-by: Dunqing <29533304+Dunqing@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-semantic Area - Semantic

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants