
pacquet: remaining ~20% wall-clock gap to pnpm CLI on warm-cache GVS resolve #11842

Description

@zkochan

Follow-up to #11832 / #11837.

#11837 closed most of the gap on the alotta-files fixture under warm cache + global virtual store + no lockfile, but a wall-clock gap to the TypeScript pnpm CLI remains and the source isn't obvious from the per-phase trace alone.

Current state

                                               time
pacquet (best of several runs after #11837)    5.03 s
pacquet (typical run)                          5.0 - 5.7 s
pnpm CLI                                       4.16 s
gap (best case)                                ~0.87 s (~21%)

For context, before #11837 pacquet was at 11.83 s on this scenario — so the PR closed ~6.8 s and what remains is the long tail.

Per-phase trace

Output of TRACE=pacquet::install::phase=info pacquet install, run right after a fresh rm -rf pnpm-lock.yaml node_modules:

phase: "resolve_importer"        elapsed_ms: 3125  nodes: 1362
phase: "prefetch_cas_paths"      elapsed_ms: 69    cache_keys: 1362  hits: 1362
phase: "build_fresh_lockfile"    elapsed_ms: 3
phase: "virtual_store_layout_new" elapsed_ms: 11
phase: "install_subtree"         elapsed_ms: 1440

Wall = 5.03 s and the sum of traced phases ≈ 4.65 s; the remaining ~0.4 s is state init / lockfile write / .modules.yaml / bin linking.

The resolve walk dominates (~3.1 s of ~5 s).
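The arithmetic behind those figures can be rechecked with a small sketch. The phase names and millisecond values are copied from the trace above; the constant and function names (PHASES, WALL_MS, traced_ms, untraced_ms, share_of_wall) are made up for illustration and are not part of pacquet.

```rust
/// Traced phases (name, elapsed ms), copied from the trace in this issue.
const PHASES: [(&str, u64); 5] = [
    ("resolve_importer", 3125),
    ("prefetch_cas_paths", 69),
    ("build_fresh_lockfile", 3),
    ("virtual_store_layout_new", 11),
    ("install_subtree", 1440),
];

/// Wall-clock time of the best run, in milliseconds (5.03 s).
const WALL_MS: u64 = 5030;

/// Sum of all traced phases, in milliseconds.
fn traced_ms() -> u64 {
    PHASES.iter().map(|(_, ms)| ms).sum()
}

/// Wall-clock time not covered by any traced phase.
fn untraced_ms() -> u64 {
    WALL_MS - traced_ms()
}

/// Share of wall-clock time spent in the named phase, as a percentage.
fn share_of_wall(phase: &str) -> Option<f64> {
    PHASES
        .iter()
        .find(|(name, _)| *name == phase)
        .map(|(_, ms)| *ms as f64 / WALL_MS as f64 * 100.0)
}
```

With these numbers the traced phases sum to 4648 ms, 382 ms is untraced, and resolve_importer accounts for roughly 62% of the wall clock.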

What #11837 already changed

In order, with the per-step phase impact called out:

  1. perf(pacquet): pipeline tarball fetches with resolution (PrefetchingResolver) — later removed in favour of (2).
  2. perf(pacquet/resolving-npm-resolver): dedup concurrent packument fetches (PackumentFetchLocker, port of upstream's metafileOperationLimits + runLimited).
  3. fix(pacquet/resolving-npm-resolver): forward etag / modified on the abbreviated → full upgrade fetch.
  4. perf(pacquet/resolving-npm-resolver): move mirror disk reads off the tokio worker via spawn_blocking.
  5. fix(pacquet/tarball): emit pnpm:progress exactly once per URL (previously reported reused: 164746 for 1362 packages).
  6. perf(pacquet/package-manager): batched prefetch_cas_paths on the fresh-lockfile install path (port of create_virtual_store's frozen-path shape).
  7. perf(pacquet/resolving-npm-resolver): share Package via Arc in the meta cache (avoid deep-cloning packuments on hits).
  8. perf(pacquet/resolving-resolver-base): share ResolveResult.manifest via Arc.
  9. perf(pacquet/resolving-npm-resolver): dedup picked-manifest serde_json::to_value per (name, version).
  10. perf(pacquet/resolving-deps-resolver): swap tokio::sync::Mutex for std::sync::Mutex on TreeCtx.
  11. perf(pacquet/resolving-deps-resolver): share ResolveResult itself via Arc on ResolvedPackage / DependenciesGraphNode.
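For readers unfamiliar with the idea behind step 2, here is a minimal sketch of deduplicating concurrent fetches ("single flight"). It is not pacquet's PackumentFetchLocker, whose implementation is not shown in this issue: it is a blocking, std-only analogue using threads instead of tokio, and SingleFlight, get_or_fetch and Packument are hypothetical names. It also shares the result via Arc, in the spirit of step 7.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical stand-in for a parsed packument.
#[derive(Debug)]
struct Packument {
    name: String,
}

/// Hypothetical single-flight cache: at most one fetch per package name.
#[derive(Default)]
struct SingleFlight {
    slots: Mutex<HashMap<String, Arc<OnceLock<Arc<Packument>>>>>,
    /// Counts how many times the fetch closure actually ran.
    fetches: AtomicUsize,
}

impl SingleFlight {
    fn get_or_fetch(
        &self,
        name: &str,
        fetch: impl FnOnce(&str) -> Packument,
    ) -> Arc<Packument> {
        // Hold the map lock only long enough to find or create the slot,
        // so a slow fetch for one name does not block other names.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(name.to_owned()).or_default().clone()
        };
        // get_or_init runs the closure at most once per slot; concurrent
        // callers for the same name block until the first one finishes.
        slot.get_or_init(|| {
            self.fetches.fetch_add(1, Ordering::Relaxed);
            Arc::new(fetch(name))
        })
        .clone()
    }
}
```

Eight threads asking for the same name would trigger one fetch and receive the same Arc. An async version would need an async-aware cell or lock in place of OnceLock so that waiting callers yield instead of blocking a worker thread.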

What to investigate next

Best done with a profile attached. Quick path:

cargo install --locked samply
cargo build --profile release-debug -p pacquet-cli
rm -rf pnpm-lock.yaml node_modules
samply record ./target/release-debug/pacquet install

samply record opens a profiler.firefox.com URL with the flamegraph; attach a screenshot or link.

Hypotheses to evaluate against the profile (in roughly decreasing-likelihood order):

  • Package.versions: HashMap<String, PackageVersion> is cloned per pick: pick_matching_version_* returns an owned PackageVersion cloned from Package.versions.get(version). Wrapping each version in Arc (HashMap<String, Arc<PackageVersion>>) collapses the pick to an Arc::clone. Likely ~50-200 ms.
  • async_recursion::async_recursion on resolve_node allocates a Box<dyn Future> per recursive call (1362+). Worth measuring whether removing the macro and using an explicit Pin<Box<dyn Future>> only at the recursion edge (or a manual stack-managed walker) is cheaper. Probably small.
  • Vec<String> next_ancestors is cloned per child edge in resolve_node (one clone per edge in the tree). Switching to an Arc<im::Vector<String>> or a cons-list / persistent structure trades the per-child Vec clone for an Arc::clone. Probably small.
  • futures_util::future::try_join_all polling overhead when the resolver returns synchronously from the meta cache (no actual IO). Consider FuturesUnordered instead, or batching the synchronous-cache-hit branch off the futures path.
  • HTTP-layer overhead per packument (304 responses through reqwest). pnpm uses make-fetch-happen; if the per-request setup cost is meaningfully higher in reqwest, this would show in the profile as hyper / rustls frames.
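The first and third hypotheses share one idea: replace a deep clone with an Arc::clone. A minimal sketch follows, with loud assumptions: the PackageVersion and Package types are trimmed stand-ins rather than pacquet's real definitions, pick_exact is a hypothetical simplification of the pick_matching_version_* family, and Ancestors is one possible cons-list shape for next_ancestors, not an existing type.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical, heavily trimmed stand-in for a packument version entry.
#[derive(Debug, Clone)]
struct PackageVersion {
    version: String,
    dependencies: HashMap<String, String>,
}

/// Hypothetical packument with each version stored behind an Arc.
struct Package {
    versions: HashMap<String, Arc<PackageVersion>>,
}

impl Package {
    /// Returns a shared handle: a hit costs one refcount increment
    /// instead of deep-cloning the version's strings and maps.
    fn pick_exact(&self, version: &str) -> Option<Arc<PackageVersion>> {
        self.versions.get(version).cloned()
    }
}

/// Persistent ancestor chain as a cons list. Extending it for a child
/// allocates one node and shares the entire parent chain.
struct Ancestors {
    name: String,
    parent: Option<Arc<Ancestors>>,
}

impl Ancestors {
    /// Creates a new chain node whose tail is the (shared) parent chain.
    fn push(parent: Option<&Arc<Ancestors>>, name: &str) -> Arc<Ancestors> {
        Arc::new(Ancestors {
            name: name.to_owned(),
            parent: parent.cloned(),
        })
    }

    /// Walks from this node toward the root looking for `name`.
    fn contains(&self, name: &str) -> bool {
        let mut cur = Some(self);
        while let Some(node) = cur {
            if node.name == name {
                return true;
            }
            cur = node.parent.as_deref();
        }
        false
    }
}
```

The trade-off with the cons list is that membership checks chase pointers instead of scanning a contiguous Vec, so it only pays off if the per-edge clone really shows up in the profile. Whether next_ancestors is used for membership checks at all is an assumption here.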

If install_subtree variance is significant (we've seen ~1.4-2.3 s across runs), that's also worth pinning down — the file-system work should be deterministic given the same starting state.

Reproduction

  • macOS, ARM (M-series), warm pnpm store at the default location.
  • The alotta-files benchmark fixture (/Volumes/src/pnpm/pnpm.io/benchmarks/fixtures/alotta-files) — 110 direct deps, 1362 resolved packages.
  • pnpm-workspace.yaml with enableGlobalVirtualStore: true.
  • Default minimumReleaseAge: 1440 (24 h).

Out of scope for this issue

Everything #11837 already shipped, including the per-phase tracing the TRACE=pacquet::install::phase=info invocation above relies on.


Written by an agent (Claude Code, claude-opus-4-7).
