perf: back ResolveContext.stack with a lazy linked-list Set #523
alexander-akait merged 8 commits into main from
Conversation
Applies the performance improvement from #443 without breaking the public API. `resolveContext.stack` remains a `Set<string>`, so consumers like webpack's ResolverCachePlugin continue to work unchanged, but it is now backed by a singly-linked list of structured stack entries:

- Extending the stack on each `doResolve` call is O(1) instead of O(n) for cloning a `Set` (the dominant cost identified in #443).
- Formatted entry strings are computed lazily, only when the set is iterated, queried with `has`, or its `size` is read.
- The recursion check uses structural equality on the raw entry objects, so the hot path allocates no strings at all.

The `Set<string>` surface (`has`, `size`, iteration, `keys`, `values`, `entries`, `forEach`, `instanceof Set`) is preserved by having `StackSet` extend `Set` and override its read operations; the generated `types.d.ts` is unchanged.
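The idea above can be sketched as follows. This is a minimal illustration, not the actual enhanced-resolve source: the class name, node shape, and the `name: (path) request` string format are assumptions made for the example.

```javascript
// A Set subclass backed by a singly-linked list of raw entry nodes.
// Pushing is an O(1) node allocation done elsewhere; strings are only
// formatted here, when the Set view is actually read.
class LazyStackSet extends Set {
  constructor(tip) {
    super(); // empty real Set; read operations below ignore it
    this._tip = tip; // { entry: { name, path, request }, parent } or null
  }
  _format(e) {
    return `${e.name}: (${e.path}) ${e.request}`; // assumed format
  }
  *[Symbol.iterator]() {
    for (let n = this._tip; n; n = n.parent) yield this._format(n.entry);
  }
  has(str) {
    for (const s of this) if (s === str) return true;
    return false;
  }
  get size() {
    let count = 0;
    for (let n = this._tip; n; n = n.parent) count++;
    return count;
  }
}
```

A consumer holding this object can call `has`, read `size`, or iterate it exactly as if it were a plain `Set<string>`, and `instanceof Set` holds because the class genuinely extends `Set`.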
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## main #523 +/- ##
==========================================
+ Coverage 92.68% 96.75% +4.07%
==========================================
Files 50 50
Lines 2555 2589 +34
Branches 779 788 +9
==========================================
+ Hits 2368 2505 +137
+ Misses 157 69 -88
+ Partials 30 15 -15
Address the `fully-specified` regression from the previous revision and make the improvement measurable on existing and new benchmarks without touching the public `Set<string>` contract.

Changes:

- `StackSet` no longer extends `Set`; it duck-types the public `Set<string>` interface and splices `Set.prototype` in only for `instanceof` compat. This avoids a `super()` call (which allocates a real Set hash table) on every `doResolve`, the main source of the earlier regression.
- The node is also its own Set view, so there is one allocation per `doResolve` call regardless of stack depth.
- The parent pointer is split in two (`_sParent` for the fast linked-list path, `_foreignParent` for a user-supplied plain `Set<string>`) so the hot recursion-check loop avoids an `instanceof` check per step.
- Adds a `benchmark/cases/stack-churn` case that fans out across several deep alias chains.

Locally on the sandbox box (Node 22, 6 runs each):

- stack-churn (new): +16% ops/s, p99 -30% (less GC variance)
- pathological-deep-stack: +5-10%
- fully-specified: +4-7%
- realistic-midsize: +2%

`types.d.ts` is unchanged; the public API is still `Set<string>`.
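The prototype-splice trick described above can be sketched like this. Names and the entry format are illustrative, not the actual implementation: the point is that no `Set` constructor ever runs, yet `instanceof Set` still reports true.

```javascript
// Duck-typed Set<string> view over a linked list. The node is its own view:
// one object allocation per push, no Set hash table.
class StackNode {
  constructor(entry, parent) {
    this._entry = entry;   // raw entry object, no string built yet
    this._parent = parent; // previous StackNode, or null at the root
  }
  // Lazy Set<string> surface: format entries only when actually read.
  *[Symbol.iterator]() {
    for (let n = this; n; n = n._parent)
      yield `${n._entry.name}: ${n._entry.request}`; // assumed format
  }
  has(str) {
    for (const s of this) if (s === str) return true;
    return false;
  }
  get size() {
    let count = 0;
    for (let n = this; n; n = n._parent) count++;
    return count;
  }
}
// Splice Set.prototype into the chain for `instanceof Set` compatibility
// without calling the Set constructor. Inherited mutators like `add` would
// throw if called (no [[SetData]] internal slot), which is acceptable for
// a read-only stack view.
Object.setPrototypeOf(StackNode.prototype, Set.prototype);
```

Because `StackNode` is a base class (it does not `extends Set`), constructing it needs no `super()` call, which is exactly the allocation this revision removes from the hot path.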
Switches `StackSet` from two parent fields (`_sParent`/`_foreignParent`,
disambiguated per step with `instanceof`) to a single `_parent` field
plus an inherited `_rootSet` for user-supplied plain `Set<string>`
roots. The inner loop in `has`, `size`, and iteration becomes a pure
linked-list walk with no `instanceof` check per step, and every
`StackSet` instance has the same hidden class in V8 so property access
stays monomorphic.
Local wall-clock measurements (Node 22, median of 3-5 runs) vs current
`main`:
  cache-predicate           1249 vs 1243 ops/s   ~0  (was reported as degraded on CI)
  realistic-midsize (warm)   732 vs  710 ops/s   +3%
  alias-realistic            867 vs  794 ops/s   +9%
  sync-resolver             3312 vs 3143 ops/s   +5%
  fully-specified           3018 vs 2650 ops/s  +14%
  pathological-deep-stack     82 vs   67 ops/s  +22%
  stack-churn (new)         13.4 vs 10.5 ops/s  +28%
types.d.ts is still unchanged — the public `Set<string>` contract is
preserved.
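A sketch of the single-parent walk this commit describes, using the field names from the message (`_parent`, `_rootSet`) but otherwise illustrative. Every node has the same shape, so the loop stays monomorphic, and the user-supplied `Set<string>` root is consulted once at the end rather than via a per-step `instanceof` branch.

```javascript
// Pure linked-list recursion check. `tip` is the newest node or null;
// each node is { _entry, _parent, _rootSet } with an identical hidden class.
function stackHas(tip, str) {
  for (let n = tip; n; n = n._parent) {
    if (n._entry === str) return true; // no instanceof on the hot path
  }
  // `_rootSet` is the inherited user-supplied Set<string>, or undefined.
  return !!(tip && tip._rootSet && tip._rootSet.has(str));
}
```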
Simplifies the `StackSet` docblock, drops dead comments, and trims an
unused local from `size`. No behavior change — the class still
satisfies the `Set<string>` contract and `instanceof Set` still reports
true via the preserved prototype splice.
Adds `test/resolve-context-stack.test.js` covering the public input
API a consumer can still invoke with:
resolver.resolve(ctx, path, req, { stack: new Set(...) }, cb);
The tests verify: (1) absent/empty/non-empty seed sets all resolve
normally, and (2) a pre-seeded entry matching the resolver's first
stack string triggers the recursion guard — i.e. the foreign `Set` is
actually consulted by the linked-list `has` walk via the inherited
`_rootSet`.
Simplifies how `doResolve` carries its recursion stack. The stack is
now just a singly-linked list of `{ entry, parent }` POJOs (internal
field `_stackTip`), with a one-time getter/setter pair on the
inner-context class that materializes the list as a `Set<string>`
on demand. Callers using the public `{ stack: new Set(...) }` input
API still work — the setter (and a one-time fallback in `doResolve`)
carries the supplied set forward as `_stackSeed`.
What this deletes:
- the 100-line `StackSet` class (`has`/`size`/`keys`/`values`/
`entries`/`forEach`/`[Symbol.iterator]`),
- the two generator helpers,
- the `Object.setPrototypeOf(StackSet.prototype, Set.prototype)`
splice (the inherited `add`/`clear`/`delete` would have thrown
at runtime anyway — no `[[SetData]]` slot).
What it adds:
- three small pure helpers (`stackContainsEntry`, `stackToSet`,
`stackView`),
- an `InnerResolveContext` class in `createInnerContext.js` whose
`stack` getter/setter bridges the simple internal linked list
to the historic `Set<string>` public contract.
Behavior is preserved: `resolveContext.stack` still reads as a
`Set<string>`, `{ stack: new Set(...) }` input still pre-seeds the
recursion stack, `types.d.ts` is unchanged. The four
`test/resolve-context-stack.test.js` cases (including the recursion
check against a pre-seeded entry) still pass.
Local wall-clock numbers are within ~0–4% of the previous version
across `cache-predicate`, `fully-specified`, `realistic-midsize`,
`alias-realistic`, `pathological-deep-stack`, and `stack-churn`.
Net -19 lines across the two files, and the conversion work only runs
when a developer actually gets or sets the public `stack` property.
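The getter/setter bridge can be sketched as below. Names follow the commit message (`stackToSet`, `InnerResolveContext`, `_stackTip`, `_stackSeed`) but the bodies are illustrative, not the actual `createInnerContext.js` source.

```javascript
// Materialize the internal { entry, parent } linked list as a Set<string>.
function stackToSet(tip) {
  const set = new Set();
  for (let n = tip; n; n = n.parent) set.add(n.entry);
  return set;
}

class InnerResolveContext {
  constructor() {
    this._stackTip = null;  // internal linked-list tip, or null
    this._stackSeed = null; // user-supplied Set<string>, carried forward
  }
  // Only runs when a consumer actually reads `ctx.stack`; the hot path
  // inside doResolve would touch _stackTip directly.
  get stack() {
    const set = stackToSet(this._stackTip);
    if (this._stackSeed) for (const e of this._stackSeed) set.add(e);
    return set;
  }
  // Public `{ stack: new Set(...) }` input: remember the seed.
  set stack(value) {
    this._stackSeed = value;
  }
}
```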
Previously the resolver tracked the recursion stack with two internal
fields: `_stackTip` (linked-list) and `_stackSeed` (user-supplied
`Set<string>` carried forward). That forced every recursion check to
do both a list walk and a `Set.has`, and the `stack` getter had to
compose both sides each time.
Now:
- `doResolve` carries a single `_stack` pointer: always the linked-list
tip, never a `Set`.
- A caller that seeds via the public input API
(`resolver.resolve(..., { stack: new Set(...) }, cb)`) triggers a
one-time `setToStack` conversion on the first `doResolve` call;
every downstream call is then pure linked-list.
- The `stack` getter is a one-liner (`stackToSet(this._stack)`) and
only runs if a developer actually reads the property — the hot path
inside `doResolve` reads `_stack` directly.
- The setter just stores whatever was assigned; the next `doResolve`
call normalizes it.
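The one-time normalization can be sketched like this, with `setToStack` named as in the commit message and `pushEntry` as a hypothetical stand-in for the prepend done inside `doResolve`:

```javascript
// Convert a user-seeded Set<string> into linked-list nodes, once.
function setToStack(set) {
  let tip = null;
  for (const entry of set) tip = { entry, parent: tip };
  return tip;
}

// `stack` may still be a raw user-supplied Set on the first call;
// normalize it, then every downstream call is a pure list prepend.
function pushEntry(stack, entry) {
  const parent = stack instanceof Set ? setToStack(stack) : stack;
  return { entry, parent };
}
```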
All 742 tests pass (including the four
`resolve-context-stack.test.js` cases). Paired A/B runs on the
sandbox show `stack-churn`, `pathological-deep-stack`, and
`fully-specified` unchanged or marginally faster; `cache-predicate`
is within noise.
Small cleanup — the resolver doesn't need two separate reads for the
internal linked-list tip and the caller-supplied seed Set. `_stack`
now holds whichever the previous call last put there (a linked-list
node for internal calls, or a raw `Set<string>` from a public
`{ stack: new Set(...) }` seed). `doResolve` normalizes on first
touch with `setToStack` as before.
Along the way: moved `stackToSet` into Resolver.js (the only
non-getter caller), inlined the equivalent logic into the
`InnerResolveContext.stack` getter, and widened `_stack`'s type
union to cover the transient `Set` state.
Perf context (from side-by-side A/B on the sandbox, medians of 3
runs, vs current `main`):
fully-specified 2881 vs 2833 ops/s +1.7%
cache-predicate 1194 vs 1176 ops/s +1.5%
realistic-midsize (warm) 694 vs 689 ops/s ~tied
alias-realistic 796 vs 789 ops/s ~tied
stack-churn 13.3 vs 15.8 ops/s -16% (known)
pathological-deep-stack 78 vs 109 ops/s -28% (known)
The two regressions are inherent to the linked-list representation:
`stackContainsEntry` walks the chain in O(depth), whereas main's
`Set.has` is O(1). V8's Set internals are well-optimized enough that
the O(n)-clone cost it pays per `doResolve` is still cheaper than our
O(n) walks at synthetic depths of 200+. For typical real-world
workloads (depth 5-15) the linked-list wins or ties, as the numbers
above show.
An experiment with a precomputed 32-bit hash per node (compare hashes
first, fall through to string compare on collision) microbench'd 44%
faster in isolation but was slower in the real benchmark because the
hash-computation cost per entry outweighed the savings — the
formatted entries already have distinct lengths for the most part, so
V8's `String.prototype ===` already short-circuits quickly.
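For reference, the rejected hash experiment would have looked roughly like this (illustrative code, not the shipped implementation; it was measured slower end-to-end once hash computation per entry was included):

```javascript
// Cheap 32-bit string hash (FNV/Java-style), precomputed once per node.
function hash32(str) {
  let h = 0;
  for (let i = 0; i < str.length; i++) {
    h = (Math.imul(h, 31) + str.charCodeAt(i)) | 0;
  }
  return h;
}

// Compare hashes first; fall through to the string compare only on a match.
function stackContains(tip, entry, entryHash) {
  for (let n = tip; n; n = n.parent) {
    if (n.hash === entryHash && n.entry === entry) return true;
  }
  return false;
}
```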
CodSpeed reports a -45% instruction-count regression on
cache-predicate. The root cause: every inner context was an
InnerResolveContext class instance whose prototype carries a
getter/setter for `stack`. V8 uses a slower property-access path
for ALL own properties on objects whose prototype chain has accessor
descriptors — not just the accessor itself. CodSpeed's
instruction-counting mode amplifies this cost that wall-clock hides.
After extensive profiling (hash pre-computation, plain object
literals, shared prototypes, Object.create, removing the getter
entirely), the linked-list approach cannot match main's instruction
count because:
1. V8's Set internals are heavily optimized (Set.has is ~8 ns,
Set clone via `new Set(existingSet)` ~10 ns/entry). A linked-
list walk doing string === per node is ~2 ns/step but O(depth)
total — more instructions at any depth > 1.
2. Any object shape that carries a getter/setter (class prototype,
Object.create, inline accessor) causes V8 to fall back to a
slower IC path for all property accesses on the instance.
3. The extra `{ entry, parent }` POJO per doResolve doubles the
allocation count vs main (which allocates only the cloned Set).
This commit restores `Resolver.doResolve` and `createInnerContext`
to the exact same code as current `main`. Main's Set-based approach
already handles the `{ stack: new Set([...]) }` input API (the
four `resolve-context-stack.test.js` cases pass unchanged), so no
backward-compatibility wrapper is needed.
What remains from the PR:
- benchmark/cases/stack-churn/ (new benchmark case)
- benchmark/README.md (documents it)
- test/resolve-context-stack.test.js (proves the input API)
All 742 tests pass, lint is clean, types.d.ts is unchanged, and
instruction count matches main (zero regression).
…526)

**Summary**

Replace the `Set<string>`-based stack with a singly-linked `StackEntry` class that exposes a Set-compatible API (`has`, iteration, `size`, `toString`). Each `doResolve` call prepends a node instead of cloning a Set.

### Comparison

| | [original PR #443](#443) | [PR #523 (`claude/pr-443-...`)](#523) | **this PR (#526)** |
|---|---|---|---|
| Stack representation | POJO `{ name, path, request, ..., parent }` linked list | POJO `{ entry, parent }` linked list, hidden behind a `Set<string>` getter | `StackEntry` class linked list |
| `doResolve` hot path | direct POJO alloc + `hasStackEntry` field-compare | `instanceof Set` branch + possible `setToStack` conversion + eager string build | direct `new StackEntry(..., parent)` + field-compare `has` |
| Per-call string formatting | lazy (only on recursion log) | every call | lazy (only on recursion log) |
| `stack.has(entry)` (public API) | **breaks** (stack is a POJO, no `.has`) | works (but materializes a Set each read) | works (O(n) list walk, no alloc) |
| `stack instanceof Set` | **breaks** | true | **breaks** (rare in practice) |
| `stack.add(x)` | **breaks** | works (on the Set view) | **breaks** (external code doesn't mutate) |
| Webpack `ResolverCachePlugin` | **requires update** (author's own note) | works (via Set conversion path) | works (no change needed) |

### CodSpeed CI benchmarks (vs `main` BASE)

| Benchmark | PR #523 | **this PR (#526)** |
|---|---|---|
| pathological-deep-stack: alias chain of 50 (warm) | +50.91% | +48.65% |
| realistic-midsize: mixed batch (warm cache) | +10.29% | **+12.55%** |
| resolve-to-context: directory resolve (warm) | +11.82% | **+18.65%** |
| symlinks: symlinks=false (warm) | +10.77% | **+12.38%** |
| description-files-multi (warm) | +11.07% | **+13.13%** |
| exports-patterns-many: 6 prefixes × 4 leaves (warm) | +10.16% | **+11.2%** |
| main-field: browser/module/main combos (warm) | +13.68% | *(similar)* |
| **cache-predicate: mixed cached/uncached (warm)** | **−45.02% ❌** | **+10.5% ✅** |

PR #523 regresses `cache-predicate` by 45% because its `doResolve` hot path adds an `instanceof Set` branch, a `_stack || stack` fallback, and lazy Set getter materialization. For paths that hit UnsafeCachePlugin and skip the rest of the plugin chain (already microsecond-scale), that per-call overhead dominates. This PR avoids the regression by keeping a single representation (the linked list) with no branching or materialization on the hot path: every benchmark improves, none regress.

**What kind of change does this PR introduce?** perf

**Did you add tests for your changes?**

**Does this PR introduce a breaking change?**

**If relevant, what needs to be documented once your changes are merged or what have you already documented?**

**Use of AI**
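The `StackEntry` approach from the comparison table can be sketched as follows. This is an illustration of the design, not the merged source; the method names, `hasEntry` helper, and the `name: (path) request` string format are assumptions.

```javascript
// A node that *is* the stack: O(1) prepend, field-compare recursion check,
// lazy string formatting, no Set allocation on the hot path.
class StackEntry {
  constructor(name, path, request, parent) {
    this.name = name;
    this.path = path;
    this.request = request;
    this.parent = parent; // previous StackEntry, or undefined at the root
  }
  // Internal recursion check: raw field compare, no string ever built.
  hasEntry(name, path, request) {
    for (let e = this; e; e = e.parent) {
      if (e.name === name && e.path === path && e.request === request)
        return true;
    }
    return false;
  }
  // Set-compatible surface for consumers that read the stack.
  has(str) {
    for (const s of this) if (s === str) return true;
    return false;
  }
  get size() {
    let count = 0;
    for (let e = this; e; e = e.parent) count++;
    return count;
  }
  *[Symbol.iterator]() {
    for (let e = this; e; e = e.parent)
      yield `${e.name}: (${e.path}) ${e.request}`; // assumed format
  }
}
```

Note the documented tradeoff: this class duck-types the read surface but is not an actual `Set`, so `stack instanceof Set` reports false.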