
FileSystemInfo context-queue overflow (RangeError: Invalid array length) on pnpm peer-variant realpaths — residual of #12246, not fixed by #14019 #21084

Description

@joykhera

Bug report

What is the current behavior?

On a workspace whose node_modules is a cyclic symlink graph (pnpm's content-addressed store with peer-dependency back-edges), FileSystemInfo's context-resolution queue re-pushes already-visited targets indefinitely. The queue grows past the maximum JavaScript array length (2³² − 1 elements) and the build dies with:

RangeError: Invalid array length
    at Array.push (<anonymous>)
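For reference, the ceiling here is the maximum JavaScript array length, 2³² − 1 elements, and it is `Array.prototype.push` itself that throws once a push would exceed it. A sparse array reaches that length without allocating its elements, so the same class of error can be triggered in isolation. This is a standalone illustration, not webpack code:

```js
// The largest length a JavaScript array can have: 2^32 - 1.
const MAX_ARRAY_LENGTH = 2 ** 32 - 1;

// Setting `length` on an empty array makes it sparse; no elements are allocated.
const queue = [];
queue.length = MAX_ARRAY_LENGTH;

let caught;
try {
  // The resulting length would be 2^32, one past the limit, so push throws.
  queue.push("one more target");
} catch (err) {
  caught = err;
}
console.log(caught instanceof RangeError, caught && caught.message);
```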

The push site is processAsyncTree (lib/util/processAsyncTree.js), driven by FileSystemInfo._resolveContextTsh and its timestamp-only sibling (contextTimestampQueue / contextHashQueue). processAsyncTree has no visited-set, so a cyclic input graph never terminates.
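The non-termination argument can be seen in a small model. This is a hypothetical, synchronous stand-in that only mirrors processAsyncTree's "pop an item, let the processor push its successors" shape, in isolation from #14019's realpath handling; `drainWithoutVisitedSet` and the demo cap `maxPushes` are illustrative names, not webpack APIs. On a graph with cycles and fan-out above one, the queue grows on every step:

```js
// Hypothetical model, not webpack code: a synchronous LIFO worklist with the
// same pop-and-push shape as processAsyncTree and, like it, no visited set.
function drainWithoutVisitedSet(graph, roots, maxPushes) {
  const queue = Array.from(roots);
  let pushes = 0;
  while (queue.length > 0) {
    const node = queue.pop();
    for (const next of graph.get(node) ?? []) {
      queue.push(next);
      pushes++;
      // Demo-only cap. Without it the loop never ends on a cyclic graph and
      // runs until push throws RangeError or the heap is exhausted.
      if (pushes >= maxPushes) {
        return { terminated: false, pushes, queueLength: queue.length };
      }
    }
  }
  return { terminated: true, pushes, queueLength: 0 };
}

// Every node links to the other two, so each pop removes one entry and
// pushes two: the queue grows by one per step and never drains.
const cyclic = new Map([
  ["a", ["b", "c"]],
  ["b", ["a", "c"]],
  ["c", ["a", "b"]],
]);
const result = drainWithoutVisitedSet(cyclic, ["a"], 1000);
console.log(result); // { terminated: false, pushes: 1000, queueLength: 501 }
```

The same function on an acyclic graph (for example `a -> [b, c]` with no back-edges) drains normally, which is why the problem only shows up once the input graph contains cycles.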

This is the exact failure described in #12246 ("contextTimestampQueue and contextHashQueue in FileSystemInfo will never reach the end"). #12246 was closed by #14019 (track + resolve symlinks when reading context timestamps/hashes), but the fix is incomplete: the residual overflow was reproduced immediately after #14019 merged (see #12246 comment, 2021-09-09, video.js + cnpm) and dismissed as a memory-size issue. It is not a memory-size issue — it is unbounded queue growth from revisiting symlink-equivalent paths.

Environment where we hit it

  • Bundler: webpack vendored in next@16.3.0-canary.39 (dist/compiled/webpack/bundle5.js); also reproduced on the webpack carried by earlier Next 16.x.
  • Install: pnpm (symlinked store), large monorepo (~hundreds of workspace packages with peer back-edges).
  • Surfaces during the next build static-prerender step. Per the Vercel community thread “RangeError: Invalid array length During Vercel Build”, which shows the identical failure at Array.push, it can also be reproduced locally on the dev server.

Reproduction (scale-dependent — disclosed honestly)

This is a scale-dependent overflow, not a structural one. A minimal 2-package
cyclic pnpm workspace (peer/dependency back-edges, filesystem cache on) sets up
the right structure but builds cleanly on current webpack — a 2-node cycle
never revisits enough directories to approach 2^32. The crash needs scale:
enough symlink-equivalent context directories that the un-deduped queue grows
past the array limit.

  • Historical: the #12246 repro (“FileSystemInfo can not createSnapshot with symbol link”;
    killagu/npminstall_webpack, cnpm layout) still triggered the residual once a large
    back-edged package (video.js) was added. This was reported in-thread on 2021-09-09
    and dismissed without a fix.
  • Ours: a real pnpm monorepo (hundreds of workspace packages, peer back-edges)
    building under next build (webpack vendored in next@16.3.0-canary.39)
    overflows reliably. The full reproduction is our private build; a minimal
    structural scaffold + a sketch for scaling it toward the overflow is
    available on request.

I'm aware "reproduces at scale, minimal structural setup attached" is weaker
than a one-command repro — happy to invest in a self-contained large-graph
generator if a maintainer confirms the seam and is willing to look.

Diagnostic — where #14019's dedup gap is (this is the useful part)

I tried to build a synthetic one-command repro and it does NOT reproduce, in a
way that points straight at the gap. Generating a dense cyclic workspace
graph (tried up to 2000 interlinked packages, fanout 8, low heap to fail fast)
builds cleanly every time. Reason: pnpm links workspace packages as a single
real directory each, so #14019 realpath-collapses every revisit and the queue
stays bounded.

The real-world overflow comes from registry packages with peer
dependencies. pnpm's virtual store materializes the same logical package at
many distinct realpaths — pkg@ver_<peerhash1>, pkg@ver_<peerhash2>, … —
one per peer-resolution set. These are genuinely different directories, so
#14019's realpath collapse does NOT dedup them; the un-deduped processAsyncTree
queue revisits each peer-variant's context dependencies, and at a large monorepo's
peer-variant count the push total crosses 2^32 (or the process aborts first with
V8's fatal "invalid array length Allocation failed" error).
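To make the distinction concrete: a set keyed on realpath keeps every peer variant, while a key that drops the peer suffix collapses them. The paths and the `logicalPackageKey` helper below are hypothetical and follow the `pkg@ver_<peerhash>` naming used above; real pnpm virtual-store directory names differ between pnpm versions. This only illustrates the keying, not whether collapsing variants is safe for snapshot correctness.

```js
// Hypothetical peer-variant realpaths of one logical package.
const peerVariantRealpaths = [
  "/repo/node_modules/.pnpm/ui-kit@2.1.0_peerhash1/node_modules/ui-kit",
  "/repo/node_modules/.pnpm/ui-kit@2.1.0_peerhash2/node_modules/ui-kit",
  "/repo/node_modules/.pnpm/ui-kit@2.1.0_peerhash3/node_modules/ui-kit",
];

// A realpath-keyed set (what resolving symlinks gives you) keeps all three.
const byRealpath = new Set(peerVariantRealpaths);

// Hypothetical helper: reduce a virtual-store path to "name@version" by
// dropping the `_<peerhash>` suffix of its `.pnpm/` directory segment.
// Paths outside the virtual store are returned unchanged.
function logicalPackageKey(realpath) {
  const m = /\/\.pnpm\/([^/]+?)@([^/_]+)(?:_[^/]*)?\//.exec(realpath);
  return m ? `${m[1]}@${m[2]}` : realpath;
}
const byLogicalPackage = new Set(peerVariantRealpaths.map(logicalPackageKey));

console.log(byRealpath.size, byLogicalPackage.size); // 3 1
```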

So the missing dedup is not "resolve symlinks to realpath" (#14019 did that) —
it's "treat peer-variant realpaths of the same package as already-seen in the
context-hash walk," or equivalently a visited-set in processAsyncTree itself.
A faithful synthetic repro therefore needs registry packages with conflicting
peers (multi-version), not a workspace cycle — which is why the attached
scaffold is structural-only.

What is the expected behavior?

FileSystemInfo context resolution should visit each real directory at most once. Per #12246's own suggestion: resolve symlink targets to their realpath and skip already-processed targets, so symlink-equivalent paths collapse to one node and the queue terminates.
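A minimal sketch of that expected behavior, assuming Node's `fs.realpathSync` as the canonical key: a directory walk that follows symlinked directories but records each realpath once, so a back-edge symlink terminates instead of looping. `walkRealDirs` is a hypothetical helper, not FileSystemInfo code, and it uses synchronous fs calls only for brevity.

```js
const fs = require("fs");
const os = require("os");
const path = require("path");

// Walk a directory tree, following symlinked directories, but key the visited
// set on the realpath so every symlink-equivalent path collapses to one node.
function walkRealDirs(root) {
  const visited = new Set();
  const stack = [root];
  while (stack.length > 0) {
    const real = fs.realpathSync(stack.pop());
    if (visited.has(real)) continue; // already processed under another path
    visited.add(real);
    for (const entry of fs.readdirSync(real, { withFileTypes: true })) {
      const child = path.join(real, entry.name);
      if (entry.isDirectory()) {
        stack.push(child);
      } else if (entry.isSymbolicLink()) {
        // Follow the link only if it resolves to a directory (skips dangling links).
        const target = fs.statSync(child, { throwIfNoEntry: false });
        if (target && target.isDirectory()) stack.push(child);
      }
    }
  }
  return visited;
}

// Demo layout: tmp/a/b/back -> tmp/a (a cycle) and tmp/alias -> tmp/a.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), "realpath-walk-"));
fs.mkdirSync(path.join(tmp, "a", "b"), { recursive: true });
fs.symlinkSync(path.join(tmp, "a"), path.join(tmp, "a", "b", "back"), "dir");
fs.symlinkSync(path.join(tmp, "a"), path.join(tmp, "alias"), "dir");

const visited = walkRealDirs(tmp);
fs.rmSync(tmp, { recursive: true, force: true });
console.log(visited.size); // 3: tmp, tmp/a, tmp/a/b
```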


Proposed fix (correct seam)

Dedup at the FileSystemInfo context-queue level keyed on realpath, so the generic processAsyncTree semantics are untouched and only the symlink-cycle source is collapsed. This is the minimal, semantically-safe fix and matches the direction of #14019.
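The difference between the two seams can be sketched with a simplified stand-in for processAsyncTree. `processAsyncTreeModel` is modeled on the utility's calling convention `(items, concurrency, processor(item, push, callback), callback)` and, like it, keeps no visited set; `resolveContextTargets`, `linksOf`, and `realpathOf` are hypothetical stand-ins for the FileSystemInfo side, not its actual methods. The dedup lives entirely in the caller, so the utility's semantics stay untouched:

```js
// Simplified stand-in modeled on processAsyncTree's calling convention.
// Not webpack's implementation; like the real utility it keeps no visited set.
function processAsyncTreeModel(items, concurrency, processor, callback) {
  const queue = Array.from(items);
  let running = 0;
  let finished = false;
  const drain = () => {
    while (!finished && running < concurrency && queue.length > 0) {
      running++;
      processor(queue.pop(), push, done);
    }
    if (!finished && running === 0 && queue.length === 0) {
      finished = true;
      callback();
    }
  };
  const schedule = () => process.nextTick(drain);
  const push = item => {
    queue.push(item);
    schedule();
  };
  const done = err => {
    running--;
    if (err && !finished) {
      finished = true;
      callback(err);
      return;
    }
    schedule();
  };
  drain();
}

// Hypothetical caller-side fix: the FileSystemInfo-like caller owns a visited
// set keyed on the resolved path and filters targets before pushing them.
function resolveContextTargets(roots, linksOf, realpathOf, callback) {
  const seen = new Set();
  const order = [];
  // True the first time a resolved path is seen, false for any later alias.
  const isNew = p => {
    const key = realpathOf(p);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  };
  processAsyncTreeModel(
    roots.filter(isNew),
    10,
    (target, push, done) => {
      order.push(realpathOf(target));
      for (const next of linksOf(target)) if (isNew(next)) push(next);
      done();
    },
    err => callback(err || null, order)
  );
}

// /ws/* are symlinks into /store/*; /store/b links back to /ws/a (a cycle).
const realpathOf = p => p.replace(/^\/ws\//, "/store/");
const links = new Map([["/store/a", ["/ws/b"]], ["/store/b", ["/ws/a"]]]);
const linksOf = p => links.get(realpathOf(p)) ?? [];

let result;
resolveContextTargets(["/ws/a"], linksOf, realpathOf, (err, order) => {
  result = { err, order };
  console.log(err, order); // null [ '/store/a', '/store/b' ]
});
```

Without the `isNew` filter, the `/store/b -> /ws/a` back-edge would keep re-pushing `/ws/a` and the model would never call its final callback.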

Workaround we ship today (broader hammer — noting it for completeness, NOT proposing it as the upstream fix)

A per-call visited Set inside processAsyncTree itself:

```js
const processAsyncTree = (items, concurrency, processor, callback) => {
  const queue = Array.from(items);
  if (queue.length === 0) return callback();
  const seen = new Set(queue);          // added: initial items count as visited
  // ...
  const push = item => {
    if (seen.has(item)) return;          // added: skip items already queued once
    seen.add(item);                      // added
    queue.push(item);
    // ...
  };
  // ... rest of processAsyncTree unchanged
};
```

We apply this via a package patch on the vendored bundle. Empirically it took our build from a hard crash to exit 0 and dropped peak compile RSS ~26% (12.7 GB → 9.4 GB). We flag that processAsyncTree is a generic utility — a blanket dedup there changes behavior for every caller, which is why we believe the realpath-dedup at the FileSystemInfo seam is the right upstream landing spot, not this.
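To make the "changes behavior for every caller" point concrete: a blanket visited set in the utility also suppresses legitimate repeat visits on graphs that are not cyclic at all. The hypothetical `drainWorklist` below copies the workaround's `seen` logic into a synchronous worklist; a caller that counts how often a node is reached along a diamond (two paths to `d`) gets a different answer once the dedup is on. Whether any existing webpack caller of processAsyncTree depends on repeat visits is not established here.

```js
// Hypothetical synchronous worklist with processAsyncTree's push-style API and
// an optional blanket visited set, mirroring the workaround's `seen` logic.
function drainWorklist(items, processor, { dedup }) {
  const queue = Array.from(items);
  const seen = new Set(queue);
  const push = item => {
    if (dedup) {
      if (seen.has(item)) return;
      seen.add(item);
    }
    queue.push(item);
  };
  while (queue.length > 0) processor(queue.pop(), push);
}

// A diamond, not a cycle: a -> b, a -> c, b -> d, c -> d.
const edges = new Map([["a", ["b", "c"]], ["b", ["d"]], ["c", ["d"]], ["d", []]]);

// Counts how many times `target` is processed, i.e. how many paths reach it.
function countVisitsOf(target, dedup) {
  let visits = 0;
  drainWorklist(["a"], (node, push) => {
    if (node === target) visits++;
    for (const next of edges.get(node)) push(next);
  }, { dedup });
  return visits;
}

console.log(countVisitsOf("d", false), countVisitsOf("d", true)); // 2 1
```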

Related: #12246, #14019, #12503, #12810 (all symlink-cycle FileSystemInfo reports).

Happy to open a PR against the FileSystemInfo seam with a pnpm-cycle unit test if maintainers agree on the approach.
