pacquet: remaining ~20% wall-clock gap to pnpm CLI on warm-cache GVS resolve

Follow-up to #11832 / #11837.

#11837 closed most of the gap on the `alotta-files` fixture under warm cache + global virtual store + no lockfile, but a wall-clock gap to the TypeScript pnpm CLI remains and the source isn't obvious from the per-phase trace alone.

## Current state

| | time |
|---|---|
| pacquet (best of several runs after #11837) | **5.03 s** |
| pacquet (typical run) | 5.0 - 5.7 s |
| pnpm CLI | **4.16 s** |
| gap (best case) | **~0.87 s (~21%)** |

For context, before #11837 pacquet was at 11.83 s on this scenario — so the PR closed ~6.8 s and what remains is the long tail.

## Per-phase trace

`TRACE=pacquet::install::phase=info pacquet install` on a fresh `rm -rf pnpm-lock.yaml node_modules`:

```
phase: "resolve_importer"        elapsed_ms: 3125  nodes: 1362
phase: "prefetch_cas_paths"      elapsed_ms: 69    cache_keys: 1362  hits: 1362
phase: "build_fresh_lockfile"    elapsed_ms: 3
phase: "virtual_store_layout_new" elapsed_ms: 11
phase: "install_subtree"         elapsed_ms: 1440
```

Wall = 5.03 s, sum of traced phases ≈ 4.65 s, the remaining ~0.4 s is state init / lockfile write / `.modules.yaml` / bin linking.

The resolve walk dominates (~3.1 s of ~5 s).

## What #11837 already changed

In order, with the per-step phase impact called out:

1. `perf(pacquet)`: pipeline tarball fetches with resolution (`PrefetchingResolver`) — later removed in favour of (2).
2. `perf(pacquet/resolving-npm-resolver)`: dedup concurrent packument fetches (`PackumentFetchLocker`, port of upstream's `metafileOperationLimits` + `runLimited`).
3. `fix(pacquet/resolving-npm-resolver)`: forward `etag` / `modified` on the abbreviated → full upgrade fetch.
4. `perf(pacquet/resolving-npm-resolver)`: move mirror disk reads off the tokio worker via `spawn_blocking`.
5. `fix(pacquet/tarball)`: emit `pnpm:progress` exactly once per URL (fixed `reused: 164746` for 1362 packages).
6. `perf(pacquet/package-manager)`: batched `prefetch_cas_paths` on the fresh-lockfile install path (port of `create_virtual_store`'s frozen-path shape).
7. `perf(pacquet/resolving-npm-resolver)`: share `Package` via `Arc` in the meta cache (avoid deep-cloning packuments on hits).
8. `perf(pacquet/resolving-resolver-base)`: share `ResolveResult.manifest` via `Arc`.
9. `perf(pacquet/resolving-npm-resolver)`: dedup picked-manifest `serde_json::to_value` per `(name, version)`.
10. `perf(pacquet/resolving-deps-resolver)`: swap `tokio::sync::Mutex` for `std::sync::Mutex` on `TreeCtx`.
11. `perf(pacquet/resolving-deps-resolver)`: share `ResolveResult` itself via `Arc` on `ResolvedPackage` / `DependenciesGraphNode`.

## What to investigate next

Best done with a profile attached. Quick path:

```bash
cargo install --locked samply
cargo build --profile release-debug -p pacquet-cli
rm -rf pnpm-lock.yaml node_modules
samply record ./target/release-debug/pacquet install
```

`samply record` opens a `profiler.firefox.com` URL with the flamegraph; attach a screenshot or link.

Hypotheses to evaluate against the profile (in roughly decreasing-likelihood order):

- **`Package.versions: HashMap<String, PackageVersion>` is cloned per pick** — `pick_matching_version_*` returns owned `PackageVersion` cloned from `Package.versions.get(version)`. Wrapping each version in `Arc` (`HashMap<String, Arc<PackageVersion>>`) collapses the pick to an `Arc::clone`. Likely ~50-200 ms.
- **`async_recursion::async_recursion` on `resolve_node` allocates a `Box<dyn Future>` per recursive call** (1362+). Worth measuring whether removing the macro and using an explicit `Pin<Box<Future>>` only at the recursion edge (or a manual stack-managed walker) is cheaper. Probably small.
- **`Vec<String>` `next_ancestors` is cloned per child edge** in `resolve_node` (one clone per edge in the tree). Switching to a `Arc<im::Vector<String>>` or a cons-list / persistent structure trades the per-child Vec clone for an `Arc::clone`. Probably small.
- **`futures_util::try_join_all` polling overhead** when the resolver returns synchronously from the meta cache (no actual IO). Consider `FuturesUnordered` instead, or batching the synchronous-cache-hit branch off the futures path.
- **HTTP-layer overhead per packument** (304 responses through `reqwest`). pnpm uses `make-fetch-happen`; if the per-request setup cost is meaningfully higher in `reqwest`, this would show in the profile as `hyper` / `rustls` frames.

If install_subtree variance is significant (we've seen ~1.4-2.3 s across runs), that's also worth pinning down — the file-system work should be deterministic given the same starting state.

## Reproduction

- macOS, ARM (M-series), warm pnpm store at the default location.
- The `alotta-files` benchmark fixture (`/Volumes/src/pnpm/pnpm.io/benchmarks/fixtures/alotta-files`) — 110 direct deps, 1362 resolved packages.
- `pnpm-workspace.yaml` with `enableGlobalVirtualStore: true`.
- Default `minimumReleaseAge: 1440` (24 h).

## Out of scope for this issue

Everything #11837 already shipped, including the per-phase tracing the `TRACE=pacquet::install::phase=info` invocation above relies on.

---
Written by an agent (Claude Code, claude-opus-4-7).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

pacquet: remaining ~20% wall-clock gap to pnpm CLI on warm-cache GVS resolve #11842

Current state

Per-phase trace

What #11837 already changed

What to investigate next

Reproduction

Out of scope for this issue

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

	time
pacquet (best of several runs after #11837)	5.03 s
pacquet (typical run)	5.0 - 5.7 s
pnpm CLI	4.16 s
gap (best case)	~0.87 s (~21%)

Uh oh!

Uh oh!

pacquet: remaining ~20% wall-clock gap to pnpm CLI on warm-cache GVS resolve #11842

Description

Current state

Per-phase trace

What #11837 already changed

What to investigate next

Reproduction

Out of scope for this issue

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions