Follow-up to #11857 / #11851. The remaining clean-install gap to pnpm CLI is architectural, not a kernel-contention problem. Spinning this out as its own tracking issue so #11857 can stay scoped to the original (now-debunked) hypothesis.
TL;DR
install_with_fresh_lockfile → install_subtree is a recursive per-package async tree walk that doesn't batch the link phase. install_frozen_lockfile → CreateVirtualStore is a phased warm/cold-batch architecture with a single rayon par_iter over snapshots. The two paths solve the same problem on disk but the recursive shape is structurally slower. Routing the fresh-lockfile path through CreateVirtualStore after resolution should close the gap.
Evidence
4-scenario sweep on the alotta-files fixture (3 343 packages, verdaccio mock), 10-core M-series Mac, 5-run hyperfine with 2 warmups:
| scenario | pacquet wall | pacquet sys | pnpm wall | pnpm sys | gap |
|---|---|---|---|---|---|
| frozen-lockfile, warm cache | 7.5 ± 0.2 s | 24.6 s | 8.6 ± 0.7 s | 10.1 s | pacquet +14% |
| frozen-lockfile, cold cache | 21.1 ± 0.5 s | 31.3 s | 21.7 ± 1.0 s | 19.7 s | pacquet +3% |
| clean-install, cold cache | 25.5 ± 1.5 s | 32.5 s | 20.9 ± 1.7 s | 18.4 s | pnpm +22% |
| full-resolution, warm cache | 22.1 ± 1.7 s | 24.6 s | 11.4 ± 0.6 s | 10.0 s | pnpm +94% |
The full-resolution-warm row pins the architectural cost: with a warm store and no lockfile, the resolve phase (~4–5 s) plus the recursive install_subtree link phase (~17 s) totals 22 s. With a lockfile, the same on-disk work goes through CreateVirtualStore and lands in 7.5 s.
Where the cost is
install_with_fresh_lockfile (the no-lockfile entry, crates/package-manager/src/install_with_fresh_lockfile.rs) does:
1. `resolve_importer`: walks the manifest via the resolver chain and produces the peer-resolved graph (~4-5 s)
2. `prefetch_cas_paths`: best-effort warm-cache batch lookup against `index.db` (~0 s on cold; populates for the install pass on warm)
3. `build_fresh_lockfile`: converts the resolved graph into the v9 `snapshots:` / `packages:` shape (~3 ms)
4. `VirtualStoreLayout::new`: precomputes per-snapshot slot directories
5. `install_subtree`: recursive per-package walk that awaits download + import + symlink before recursing into children
Step 5 is the structural problem. Each install_subtree call awaits install_package_from_registry for one package, which itself runs import_indexed_dir synchronously (blocking the tokio worker on a rayon par_iter for one destination directory). The try_join_all across siblings only buys ~10-way parallelism (= tokio worker count) and rayon work-stealing across simultaneous par_iters on different destination dirs is worse than one phased par_iter over all of them.
install_frozen_lockfile (the lockfile entry, crates/package-manager/src/install_frozen_lockfile.rs) does:
1. Lockfile parsed → `packages`, `snapshots`, `importers`, `current_snapshots`, `current_packages`
2. `prefetch_cas_paths` + integrity verify via rayon `par_iter` (~ms when warm)
3. `compute_skipped_snapshots` for platform-mismatched optionals + previously-skipped entries
4. `VirtualStoreLayout::new`
5. `CreateVirtualStore::run`: single phased pass that does:
   - warm batch: rayon `par_iter` over snapshots whose CAS paths the prefetch already verified, calling `CreateVirtualDirBySnapshot::run` (import + symlink layout via `rayon::join`)
   - cold batch: `try_join_all` of downloads, falling into the same shape once the tarball is in CAS
6. `SymlinkDirectDependencies::run`: creates the `node_modules/<alias>` → slot symlinks for every importer's direct deps in one batch
7. `LinkVirtualStoreBins::run`: creates `<slot>/node_modules/.bin/*` per slot using the prefetched manifests
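The warm/cold split inside `CreateVirtualStore::run` can be pictured as a partition over snapshot keys. This is a minimal sketch under stated assumptions, not pacquet's implementation: `SnapshotKey`, `split_warm_cold`, and the shape of the prefetch map are hypothetical names invented for illustration, and only the standard library is used.

```rust
use std::collections::HashMap;

/// Hypothetical snapshot key; real keys would be lockfile dependency paths.
type SnapshotKey = &'static str;

/// Splits snapshots into a warm batch (the prefetch already found verified
/// CAS paths) and a cold batch (the tarball still has to be downloaded).
/// The map value stands in for whatever the prefetch returns per snapshot.
fn split_warm_cold(
    snapshots: &[SnapshotKey],
    prefetched: &HashMap<SnapshotKey, Vec<String>>,
) -> (Vec<SnapshotKey>, Vec<SnapshotKey>) {
    snapshots
        .iter()
        .copied()
        .partition(|key| prefetched.contains_key(key))
}
```

In this picture the warm batch would feed the single parallel pass directly, while each cold entry joins the same shape once its download lands in the CAS.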
The CreateVirtualStore warm batch is a single rayon par_iter over all snapshots, with one work-stealing scope. The fresh-lockfile path's recursive install_subtree is N nested par_iters each scoped to one package's files. Same total CPU, very different rayon scheduling behavior.
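The difference in shape can be sketched with plain `std::thread` as a stand-in for tokio and rayon. Everything here is hypothetical (`Package`, `link_recursive`, `flatten`, `link_batched`), and "linking" is reduced to adding up file counts; the point is only that the recursive walk fans out one scope per package, while the phased version flattens first and lets a fixed pool drain one shared queue.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Hypothetical stand-in for a resolved package node; not pacquet's real type.
struct Package {
    name: &'static str,
    file_count: usize,
    children: Vec<Package>,
}

/// Recursive shape: handle this package, then open a new scope that fans out
/// over its children. Every package gets its own nested scope.
fn link_recursive(pkg: &Package, linked: &AtomicUsize) {
    linked.fetch_add(pkg.file_count, Ordering::Relaxed);
    thread::scope(|s| {
        for child in &pkg.children {
            s.spawn(move || link_recursive(child, linked));
        }
    });
}

/// Flattens the tree (pre-order) into one list, similar in spirit to the
/// flat snapshot map a lockfile provides.
fn flatten<'a>(pkg: &'a Package, out: &mut Vec<&'a Package>) {
    out.push(pkg);
    for child in &pkg.children {
        flatten(child, out);
    }
}

/// Phased shape: one scope, a fixed number of workers, one shared queue.
fn link_batched(all: &[&Package], workers: usize) -> usize {
    let workers = workers.max(1); // at least one worker, so the queue drains
    let next = AtomicUsize::new(0);
    let linked = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // Claim the next unprocessed index; stop once the queue is empty.
                let i = next.fetch_add(1, Ordering::Relaxed);
                let Some(pkg) = all.get(i) else { break };
                linked.fetch_add(pkg.file_count, Ordering::Relaxed);
            });
        }
    });
    linked.load(Ordering::Relaxed)
}
```

Both functions do the same total work on the same tree; only the scheduling shape differs, which is the claim the paragraph above makes about the two install paths.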
What the refactor needs to do
install_with_fresh_lockfile already produces everything install_frozen_lockfile's pipeline needs. After step 4 above (VirtualStoreLayout::new), the diff is:
- Replace step 5 (`install_subtree`) with:
  - `compute_skipped_snapshots` against `built_lockfile.snapshots` + the empty `current_snapshots` / `current_packages` (no prior install)
  - `CreateVirtualStore::run` with `built_lockfile`'s `packages` / `snapshots`, the resolved layout, and the same `PrefetchingResolver`-populated `MemCache` the resolve phase already populated
  - `SymlinkDirectDependencies::run` with `built_lockfile`'s `importers`
  - `LinkVirtualStoreBins::run`
- Plumb the install-pass state in: `tarball_mem_cache`, `verified_files_cache`, `store_index`, `store_index_writer` are already owned in `install_with_fresh_lockfile`; `CreateVirtualStore::run` already takes the same handles in `install_frozen_lockfile`. Just thread them through.
- Direct-dep tracking: today `install_subtree` is invoked once per direct dep via `peers_result.direct_dependencies_by_alias`. The refactored path needs to surface the same `(alias, depPath)` pairs to `SymlinkDirectDependencies`. `direct_dependencies_by_alias` is already built; `built_lockfile.importers` should already carry the importer-keyed direct list.
- Build phase: `install_frozen_lockfile` calls `BuildModules` after the link phase. `install_with_fresh_lockfile` should mirror that, gated by the same `allow_build_policy`.
- Hoisted linker: `install_frozen_lockfile` has a hoisted branch that dispatches through `link_hoisted_modules` instead of `SymlinkDirectDependencies` + `LinkVirtualStoreBins`. The refactored fresh-lockfile path needs to mirror this so `nodeLinker: hoisted` still works.
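The phase ordering described in the list above can be written down as data. This is a sketch only: the phases are recorded as enum values rather than executed, `NodeLinker`, `Phase`, `link_tail`, and `isolated_pipeline` are hypothetical names, and `allow_build_policy` is reduced to a single boolean, which is an assumption (the real policy may well be per-package). The issue does not say which earlier phases run under the hoisted linker, so the sketch models only the swap it names.

```rust
/// Hypothetical linker setting, mirroring the `nodeLinker` option.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeLinker {
    Isolated,
    Hoisted,
}

/// Phase names taken from the issue text; recorded here, not executed.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Phase {
    ComputeSkippedSnapshots,
    CreateVirtualStore,
    SymlinkDirectDependencies,
    LinkVirtualStoreBins,
    LinkHoistedModules,
    BuildModules,
}

/// The part of the link phase that differs by linker: the hoisted branch
/// replaces the two symlink/bin phases with a single hoisted-link phase.
fn link_tail(linker: NodeLinker) -> Vec<Phase> {
    match linker {
        NodeLinker::Isolated => vec![
            Phase::SymlinkDirectDependencies,
            Phase::LinkVirtualStoreBins,
        ],
        NodeLinker::Hoisted => vec![Phase::LinkHoistedModules],
    }
}

/// Order of phases that would replace `install_subtree` for the isolated
/// linker. `run_builds` is a simplified stand-in for `allow_build_policy`.
fn isolated_pipeline(run_builds: bool) -> Vec<Phase> {
    let mut phases = vec![Phase::ComputeSkippedSnapshots, Phase::CreateVirtualStore];
    phases.extend(link_tail(NodeLinker::Isolated));
    if run_builds {
        phases.push(Phase::BuildModules);
    }
    phases
}
```

Keeping the linker-specific tail in one function is one way to let both install entry points share the branch instead of duplicating it.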
Expected impact
- full-resolution warm: 22.1 s → ~11–12 s (resolve 4–5 s + warm-batch link ~7 s, matching the frozen-warm path that lands at 7.5 s on the same fixture)
- clean-install cold: 25.5 s → ~18–20 s (cold path still pays the network + extract cost, but the link phase goes through the warm/cold batch and stops blocking tokio workers on rayon)
- frozen-lockfile scenarios: unchanged (already routed through `CreateVirtualStore`)
Net: closes the headline pnpm-vs-pacquet gap on the two scenarios where pnpm currently leads, without regressing the two where pacquet already leads.
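Out of scope
- The resolve phase (`pick_package`, `pick_package_from_meta`, peer resolution) keeps its current cost. The profile shows 4-5 s on the 3 343-package fixture; that is not the headline gap, and the existing `PrefetchingResolver` from #11856 (perf(pacquet): close the clean-install gap to pnpm CLI) already overlaps tarball downloads with resolution.
- The `clonefileat` contention pattern from #11851 (pacquet: `pacquet install` slower than `pnpm install` due to syscall contention) is real on 16-core+ hosts but is a separate axis. None of the experiments in `pacquet-perf5` moved the wall on the 10-core local box, and the architectural fix above is the one that matters across all hardware.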
Plan
- Lift `install_frozen_lockfile`'s lockfile-driven pipeline (steps 5–7 above) into a shared helper that takes the lockfile shape + the install-scoped handles.
- Wire `install_with_fresh_lockfile` to call that helper after `build_fresh_lockfile` + `VirtualStoreLayout::new`.
- Delete `install_subtree` (and `install_package_from_registry`'s caller side, if no other entry point remains).
- Bench all four scenarios to confirm the wins and the no-regression on frozen.
Will track the PR against this issue.
Written by an agent (Claude Code, claude-opus-4-7).