perf: walk side-effects iteratively without per-module generator allocation#21014
Conversation
🦋 Changeset detectedLatest commit: 18ff7f8 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
This PR is packaged and the instant preview is available (c755747). Install it locally:
npm i -D webpack@https://pkg.pr.new/webpack@c755747
yarn add -D webpack@https://pkg.pr.new/webpack@c755747
pnpm add -D webpack@https://pkg.pr.new/webpack@c755747 |
There was a problem hiding this comment.
Pull request overview
This PR optimizes NormalModule#getSideEffectsConnectionState by replacing the per-module generator allocation (introduced to avoid deep recursion stack overflows) with an explicit iterative traversal using parallel-array stacks. This keeps the walk stack-safe on deep chains while reducing allocations and improving rebuild performance.
Changes:
- Replaced the generator-trampoline side-effects walk with an allocation-light iterative loop using explicit stacks.
- Added a fast-path helper to resolve side-effects state without descending when it can be determined from metadata / re-entry checks.
- Extended unit tests with a branching (“diamond”) dependency case to validate correct aggregation across multiple deps.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
lib/NormalModule.js |
Reworks side-effects traversal to an iterative stack-based algorithm with a metadata-based fast path. |
test/NormalModule.unittest.js |
Adds a branching-deps test to validate correct state aggregation and flag cleanup. |
.changeset/iterative-side-effects-walk.md |
Adds a patch changeset documenting the perf-focused rewrite. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #21014 +/- ##
==========================================
- Coverage 91.63% 91.56% -0.08%
==========================================
Files 573 573
Lines 59277 59460 +183
Branches 16012 16054 +42
==========================================
+ Hits 54321 54443 +122
- Misses 4956 5017 +61
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
2e8d002 to
aa20f65
Compare
aa20f65 to
52746f1
Compare
| if (state !== ModuleGraphConnection.CIRCULAR_CONNECTION) { | ||
| currentStack[top] = ModuleGraphConnection.addConnectionStates( | ||
| currentStack[top], | ||
| state | ||
| ); | ||
| } | ||
| indexStack[top]++; |
| descended = true; | ||
| break; | ||
| } | ||
| } else { |
Three additive optimizations to the cached walker:
- Inline the fast-path checks into both recursive and iterative
walkers, saving one V8 stack frame per descent. The standalone
`sideEffectsFastPath` helper is removed (only ever used by the
walkers, now inlined).
- Hoist `getHarmonyImportSideEffectDependency()` to one resolution at
the public entry; threaded into the recursive call as a parameter.
- Replace the per-frame `_sideEffectsCircularSeen` save/restore with a
single per-walk flag reset only at the public entry. A cycle
anywhere in the walk inhibits caching for the rest of that walk —
strictly conservative vs. the previous subtree-scoped tracking, but
cycle-free builds (the common case) cache identically.
- Add cache lookup and cache store to the iterative form so deep-chain
walks also benefit from memoization, and inline its fast-path
checks.
Latest microbench (best of 7, fresh chain per call where applicable):
First-walk only (cold cache):
chain BEFORE NOW
10 311 ns 293 ns (faster than BEFORE)
100 99 ns 96 ns (faster than BEFORE)
1000 127 ns 136 ns
5000 123 ns 485 ns (iterative tail engaged)
Repeated queries (SideEffectsFlagPlugin access pattern):
chain BEFORE NOW
10 0.03 ms 0.08 ms
100 0.13 ms 0.06 ms 2.2x
1000 14.34 ms 0.42 ms 34x
5000 694.30 ms 1.90 ms 365x
52746f1 to
9a593fa
Compare
A purely linear chain of side-effect-free imports — module A imports B imports C imports ... — is the worst case for stack-frame-per-module walking. It is also the pattern that triggered #20986. walkSideEffectsRecursive now peels off linear-chain heads inside a single frame. Each iteration of the inner loop: - runs the cache check + fast-path checks against the current module, - if the module is a linear-chain head (exactly one HarmonyImportSideEffectDependency to another NormalModule), pushes it onto a per-walk ancestor stack and continues with the referenced module, - if not, breaks out to the existing general for-loop walk. The chain's tail returns a state which propagateLinearResult walks back up the ancestor stack, applying the same aggregation rules the recursive call would have applied (bailout, CIRCULAR filtering, caching). Net effect: walks that were strictly linear cost one V8 frame total no matter how deep, and never hit walkSideEffectsIterative. The depth limit only applies to genuinely branching graphs where each level truly needs its own frame. Microbench (median, fresh chain per call, no cache reuse): chain BEFORE prev (v2) NOW (v3) 10 309 ns 293 ns 292 ns 100 143 ns 96 ns 127 ns 1000 114 ns 136 ns 123 ns 5000 123 ns 485 ns 146 ns The chain=5000 regression vs. BEFORE is gone — the linear-chain fast path keeps it within run-to-run noise of the recursive baseline. Repeated queries (SideEffectsFlagPlugin pattern, chain=5000): BEFORE: 676.64 ms NOW: 1.83 ms (370x faster)
67f2ef5 to
b669589
Compare
| if (state === true) { | ||
| recordSideEffectsBailout(topMod, moduleGraph, dep); | ||
| topMod._isEvaluatingSideEffects = false; | ||
| modStack.pop(); | ||
| depsStack.pop(); | ||
| indexStack.pop(); | ||
| currentStack.pop(); | ||
| pending = true; | ||
| continue; |
| * Memoizes the result of `getSideEffectsConnectionState`. The | ||
| * graph slot keys the cached value to the `ModuleGraph` it was | ||
| * computed against so stale values never leak across compilations | ||
| * — a walk that targets a different graph just overwrites both | ||
| * slots. Populated only for results computed without encountering | ||
| * a circular connection (see `walkSideEffectsRecursive`). | ||
| * @type {ModuleGraph | undefined} | ||
| */ | ||
| this._sideEffectsStateGraph = undefined; | ||
| /** @type {ConnectionState | undefined} */ | ||
| this._sideEffectsStateValue = undefined; | ||
| /** | ||
| * @private |
| // Non-linear walk. Each genuine recursive call costs one V8 frame, so | ||
| // honour the depth limit here. | ||
| if (depth >= SIDE_EFFECTS_RECURSION_LIMIT) { | ||
| return propagateLinearResult( | ||
| linearAncestors, | ||
| walkSideEffectsIterative(mod, moduleGraph), | ||
| moduleGraph | ||
| ); |
| it("propagates a deep-chain bailout all the way back to the root", () => { | ||
| // 5000 modules is far beyond what the recursive walker would walk | ||
| // using V8 stack frames; the linear-chain peeling code path must | ||
| // still propagate a bailout deep in the tail back to module 0. | ||
| const { modules, moduleGraph } = buildChain(5000); |
The previous linear-chain optimization only fired when a module had exactly one entry in `dependencies` — but a real ESM module compiled by webpack has the `HarmonyImportSideEffectDependency` *plus* an `HarmonyExportSpecifierDependency` per `export` and a `ConstDependency` per replaced expression. With the strict `length === 1` check the 1300-module cyclic chain (the canonical #20986 repro) fell through to the general for-loop walk and overflowed V8's stack at ~1300 frames again on Node 22+. Loosen the linear-chain detection to "exactly one `SideEffectDep` among any number of deps whose `getModuleEvaluationSideEffectsState` returns `false`/`CIRCULAR_CONNECTION`". The walker now processes the module's non-recursive deps in their declared order — bailing out correctly with the first dep whose state is `true`, aggregating non-`false` states into `nonRecursiveCurrent` — and only attempts the linear tail-call when: - at most one `SideEffectDep` was seen, - no non-recursive dep triggered a bailout, - `nonRecursiveCurrent` stayed at `false` (otherwise the "ancestor's current = chain state" propagation rule no longer holds and we fall back to the general walk). Verified on the actual #20986 repro (1300-module cyclic chain): AFTER (#20993 generator trampoline) ~12.3 s NOW (this branch) ~6.5 s (~1.9x faster) Larger chains (N >= 3000) hit pre-existing stack overflows in `ModuleConcatenationPlugin._tryToAdd` and `SideEffectsFlagPlugin.optimizeIncomingConnections` that are out of scope here — both branches behave identically there. Adds a regression test that builds a 5000-module cyclic chain whose modules have the same `[SideEffectDep, ConstDependency, ConstDependency]` dependency shape as the real repro.
| if (pending !== undefined) { | ||
| const state = pending; | ||
| pending = undefined; | ||
| const dep = deps[index]; | ||
|
|
||
| if (state === true) { | ||
| recordSideEffectsBailout(topMod, moduleGraph, dep); | ||
| topMod._isEvaluatingSideEffects = false; | ||
| modStack.pop(); | ||
| depsStack.pop(); | ||
| indexStack.pop(); | ||
| currentStack.pop(); | ||
| pending = true; | ||
| continue; |
| ancestor._isEvaluatingSideEffects = false; | ||
| const dep = ancestor.dependencies[0]; | ||
|
|
||
| if (state === true) { | ||
| recordSideEffectsBailout(ancestor, moduleGraph, dep); | ||
| // `true` is monotonic — safe to cache regardless of cycle status. | ||
| ancestor._sideEffectsStateGraph = moduleGraph; | ||
| ancestor._sideEffectsStateValue = true; |
Three correctness fixes + a leak cleanup + a coverage gap, all flagged by Copilot on PR #21014: 1. propagateLinearResult was using `ancestor.dependencies[0]` as the bailout dep, but a module qualifying for the linear-chain fast path can have non-recursive deps (ConstDependency, HarmonyExport*) before its single HarmonyImportSideEffectDependency. The recorded bailout would then point at the wrong dep type / location. Push pairs of `(mod, linearDep)` onto `linearAncestors` and pop both in the propagation walk so the bailout always references the actual SideEffectDep that triggered the descent. 2. The iterative walker's `pending === true` propagation path popped each parent frame without writing `_sideEffectsStateGraph` / `_sideEffectsStateValue`, even though `true` is monotonic and is already memoized in the sibling `state === true` branch. Memoize `true` here too so deep-tree bailouts don't re-walk on subsequent queries. 3. `cleanupForCache` now clears `_sideEffectsStateGraph` and `_sideEffectsStateValue`. Otherwise a long-lived NormalModule strong-references the ModuleGraph it was last walked against, keeping the entire Compilation alive across watch-mode rebuilds for modules that are never re-queried with a fresh ModuleGraph. 4. Adds a regression test that builds a 2500-module *non-linear* graph (each module has two SideEffectDeps so the linear-chain fast path can't apply). The walker recurses one V8 frame per module and must cross `SIDE_EFFECTS_RECURSION_LIMIT` into the iterative fallback — exercising the path that was previously only covered by indirect tests.
Types CoverageCoverage after merging perf/side-effects-iterative-walk into main will be
Coverage Report
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The generator trampoline added in #20993 fixed the deep-chain stack
overflow (#20986) but allocated a fresh
walkSideEffectsgeneratorobject for every NormalModule traversed by
getSideEffectsConnectionState, which CodSpeed flagged as a 22-48%memory regression in three rebuild benchmarks.
Replace the generator with an explicit parallel-array frame stack
(
modStack/depsStack/indexStack/currentStack) so eachdescent only grows fixed-size arrays instead of allocating an iterator
object. A short-circuit fast path (
sideEffectsFastPath) handles thefactoryMeta / buildMeta / re-entry checks without ever touching the
frame stack — the common case (most modules) returns in a couple of
property reads with zero allocation.
The 20000-module chain in
NormalModule.unittest.jsstill completeswithout overflowing the stack; a new diamond test confirms branching
deps still aggregate correctly. A local microbench of the existing
chain fixture shows the walk is ~2-3x faster than the generator form
on chains of 10-20000 modules.