perf(pacquet): close the remaining install-phase sys-time gap to pnpm (#11857)

Continuation tracker for the perf work landed in #11856. Below are the current state and the most promising next steps, picked up where the previous session left off; the file:line pointers should make it possible to resume in a fresh session.

## Where we are after #11856

On the `alotta-files` fixture with the verdaccio mock (`just registry-mock launch`), 5 runs / 2 warmups, local machine:

| Scenario        | pacquet@main | pacquet@HEAD (#11856) | pnpm   | Ratio  |
|-----------------|-------------:|----------------------:|-------:|-------:|
| clean-install   | 31.7 s       | **12.3 s**            | 7.9 s  | 1.56× slower than pnpm |
| full-resolution | 40.9 s       | **27.4 s**            | 11.5 s | 2.39× slower than pnpm |

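User CPU is now lower than pnpm's (~6 s vs ~7 s on clean-install). **The remaining gap is almost entirely `sys` time**: file-system syscalls during the install/import phase. (`user` and `sys` are CPU time summed across all threads, which is why they can exceed wall-clock time.)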

| Scenario        | pacquet sys | pnpm sys |
|-----------------|------------:|---------:|
| clean-install   | 42 s        | 20 s     |
| full-resolution | 27 s        | 10 s     |

A `sample(1)` trace during install (release-debug build) shows the hot spots on tokio worker threads, ranked:

1. `store_dir::write_cas_file`: Sha512 + `open(O_CREAT|O_EXCL)` + write per file.
2. `tarball::extract_tarball_entries`: per-entry tar read + buffer + dispatch into `write_cas_file`.
3. `fs::ensure_file::write_atomic`: atomic CAS write path.
4. `fs::ensure_file::cas_write_lock`: per-path `DashMap` entry + `Mutex` acquire.
5. `store_dir::store_index::StoreIndex::get`: SQLite lookups per snapshot.
6. `store_dir::check_pkg_files_integrity::check_file`: per-file `fs::metadata` on warm reinstall.
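To make the per-file syscall cost of hot spot 1 concrete, here is a minimal, dependency-free sketch of the shape of a content-addressed write (hash, then `open(O_CREAT|O_EXCL)`, then write). It is not pacquet's `write_cas_file`: `write_blob` is a hypothetical name, `DefaultHasher` stands in for Sha512 (it is not collision-resistant), and the `write_atomic` / `cas_write_lock` layers are omitted, so a crash mid-write could leave a partial file.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::OpenOptions;
use std::hash::Hasher;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical CAS write. Returns the blob path and whether this call created it.
pub fn write_blob(store: &Path, bytes: &[u8]) -> io::Result<(PathBuf, bool)> {
    // Stand-in for Sha512::digest: NOT suitable for a real content-addressed store.
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    let hex = format!("{:016x}", hasher.finish());

    // Two-level fan-out like a CAFS `files/ab/cdef...` layout.
    let dir = store.join(&hex[..2]);
    // Even when the directory already exists, this still issues syscalls.
    std::fs::create_dir_all(&dir)?;
    let path = dir.join(&hex[2..]);

    // `create_new(true)` maps to O_CREAT|O_EXCL on Unix: it fails if the blob exists.
    match OpenOptions::new().write(true).create_new(true).open(&path) {
        Ok(mut file) => {
            file.write_all(bytes)?; // write(2); close(2) happens when `file` is dropped
            Ok((path, true))
        }
        // Identical content already stored: nothing to do.
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok((path, false)),
        Err(e) => Err(e),
    }
}
```

Every new blob costs at least `mkdir` + `open` + `write` + `close`, which is why a ~136k-file install shows up as `sys` time rather than user CPU.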

## Concrete next steps (ranked by expected impact)

### 1. Sequential per-tarball file write loop in `extract_tarball_entries`
`pacquet/crates/tarball/src/lib.rs:529-678` walks every tar entry in a single loop and calls `StoreDir::write_cas_file` synchronously for each. Within a tarball, files are written serially; parallelism exists only across tarballs. For a 100-file tarball, that's 100 sequential `open`/`write`/`close` syscall triples on one thread.

- Investigate `tar::Entries` parallelism. The crate's iterator is sequential by construction (each `Entry` borrows the underlying reader's position), so we can't trivially `par_iter` it. Two plausible shapes: (a) collect the tar entries into an owned `Vec<(path, mode, Vec<u8>)>` first, then call `write_cas_file` in parallel with rayon; (b) keep the loop sequential but hand `Sha512::digest` and the write off to a rayon channel-fed pipeline so disk I/O and hashing overlap.
- Memory budget matters: large tarballs (e.g. `@babel/standalone`) can exceed 10 MB, so collecting all entries up front would spike RSS. A bounded channel whose messages each own one entry's `Vec<u8>` keeps RSS bounded by the channel capacity plus in-flight entries, regardless of tarball size.
- pnpm's `store/cafs/src/addFilesFromTarball.ts` does the same sequential walk, but the JIT plus the libuv worker pool give it implicit per-file parallelism that we don't get from a single tokio worker calling a sync extract loop. Also check whether `extract_tarball_entries` actually runs inside `spawn_blocking` today (it should; verify in `pacquet/crates/tarball/src/lib.rs` near `run_without_mem_cache`).
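Option (b) can be sketched with only the standard library, assuming each entry has already been copied into an owned buffer. This is a hypothetical illustration, not pacquet code: `OwnedEntry`, `pipeline`, and the `write` callback are invented names, and plain `std::thread` workers fed by `std::sync::mpsc::sync_channel` stand in for the rayon pipeline. What it shows is the memory bound: `send` blocks once `cap` entries are queued, so the sequential tar reader stalls instead of buffering the whole tarball.

```rust
use std::sync::mpsc::sync_channel;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical owned tar entry: bytes are copied out so it no longer borrows the reader.
pub struct OwnedEntry {
    pub path: String,
    pub mode: u32,
    pub bytes: Vec<u8>,
}

/// Drain `entries` on the calling thread (the sequential tar walk) and fan them
/// out to `workers` threads through a channel holding at most `cap` entries.
/// Peak buffered data is about `cap + workers` entries. `write` is where the
/// hashing and the CAS write would go. Results arrive in completion order.
pub fn pipeline<I, F, R>(entries: I, workers: usize, cap: usize, write: F) -> Vec<R>
where
    I: Iterator<Item = OwnedEntry>,
    F: Fn(OwnedEntry) -> R + Send + Sync + 'static,
    R: Send + 'static,
{
    assert!(workers > 0, "need at least one worker");
    let (tx, rx) = sync_channel::<OwnedEntry>(cap);
    // std's Receiver is single-consumer, so the workers share it behind a Mutex.
    let rx = Arc::new(Mutex::new(rx));
    let write = Arc::new(write);
    let results = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let (rx, write, results) = (Arc::clone(&rx), Arc::clone(&write), Arc::clone(&results));
            thread::spawn(move || loop {
                // The receiver guard is dropped at the end of this statement,
                // so other workers can pull entries while this one writes.
                let next = rx.lock().unwrap().recv();
                match next {
                    Ok(entry) => {
                        let out = (*write)(entry);
                        results.lock().unwrap().push(out);
                    }
                    // Sender dropped and queue drained: no more work.
                    Err(_) => break,
                }
            })
        })
        .collect();

    // Producer side: blocks (backpressure) whenever `cap` entries are queued.
    for entry in entries {
        tx.send(entry).expect("a worker thread panicked");
    }
    drop(tx);
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    let results = Arc::try_unwrap(results).ok().expect("all workers joined");
    results.into_inner().unwrap()
}
```

A real version would also have to decide whether per-tarball ordering matters for anything downstream, since the workers finish in arbitrary order.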

### 2. Per-path `cas_write_lock` overhead at `pacquet/crates/fs/src/ensure_file.rs:217`
Every CAS write acquires an `Arc<Mutex<()>>` from a process-wide `DashMap<PathBuf, Arc<Mutex<()>>>`. On a 1362-package install with ~100 files per package, that's ~136k `DashMap::entry` operations plus as many mutex acquisitions, and the map is never pruned.

- The lock exists to coordinate **writers vs. concurrent verifiers** (`check_pkg_files_integrity` may delete the file while a writer is still appending; see the doc comment). For CAFS paths that are written exactly once per install, which is the vast majority, the lock is dead weight.
- Investigate whether the lock can be skipped when we know the path is fresh. Alternatively, a striped `(stripe, hash[0])` lock array (e.g. 256 mutexes keyed by the first byte of the hash) might be cheaper than the per-path `DashMap` and just as correct: the verifier only races writers on the same path, and the same path always maps to the same stripe. The cost is occasional false contention between unrelated paths that share a stripe.
- pnpm's `store/cafs/src/writeFile.ts` uses `locker: Map<string, number>`, a refcount rather than a mutex per path. Worth modelling here.
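A minimal sketch of the 256-stripe lock array, using only `std`. `StripedLocks` is a hypothetical name, and indexing by `digest[0]` assumes the CAFS path is derived from the content digest, so a writer and a verifier on the same path always land on the same stripe. Unlike the per-path `DashMap`, the structure has a fixed size, so nothing needs pruning and there is no per-write map insertion.

```rust
use std::sync::{Mutex, MutexGuard};

/// Hypothetical replacement for the per-path DashMap: a fixed array of 256
/// mutexes indexed by the first byte of the blob's digest. Same path implies
/// same digest implies same stripe, so writer/verifier exclusion is preserved;
/// two unrelated random digests share a stripe with probability 1/256.
pub struct StripedLocks {
    stripes: Vec<Mutex<()>>,
}

impl StripedLocks {
    pub fn new() -> Self {
        Self { stripes: (0..256).map(|_| Mutex::new(())).collect() }
    }

    /// `digest` must be non-empty (a SHA-512 digest is 64 bytes).
    pub fn stripe_index(digest: &[u8]) -> usize {
        digest[0] as usize
    }

    /// Acquire the stripe for `digest`. A poisoned stripe stays usable because
    /// the guarded data is `()`, so there is no state to corrupt.
    pub fn lock(&self, digest: &[u8]) -> MutexGuard<'_, ()> {
        self.stripes[Self::stripe_index(digest)]
            .lock()
            .unwrap_or_else(|poisoned| poisoned.into_inner())
    }
}
```

The trade-off versus pnpm's refcount map is that a stripe can block an unrelated write; whether that matters in practice would show up in the same `sample` profile.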

### 3. `StoreIndex::get` SQLite calls per snapshot
`pacquet/crates/store-dir/src/store_index.rs` exposes `get(key)`, which the install path currently calls once per snapshot in the store-lookup branch of `run_with_mem_cache`. Since `prefetch_cas_paths` runs once at the start of the install (1362 keys batched), the per-snapshot `get` should be a hit-or-miss against the prefetched map and never touch SQLite. But on a cold or partially populated `store-dir`, snapshots that the prefetch missed still serialize on the shared `Arc<Mutex<StoreIndex>>`.

- Audit `pacquet/crates/tarball/src/lib.rs::run_with_mem_cache` and `load_cached_cas_paths` for the cold-miss path. Are we taking the `Arc<Mutex<StoreIndex>>` lock per snapshot when we already have a populated `prefetched_cas_paths` for the rest of the install?
- Consider exposing `store_index.bulk_get(&[key]) -> HashMap` on the `Arc` directly so the install pass can run a single `SELECT ... WHERE key IN (...)` instead of N round-trips behind the mutex.
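The batched lookup can be sketched as below. `bulk_get`, the `package_index` table, and its `key`/`data` columns are hypothetical; the query runner is injected as a closure so the sketch doesn't depend on a particular SQLite binding. One detail any real `IN (...)` query must handle: SQLite caps the number of bound parameters per statement (`SQLITE_MAX_VARIABLE_NUMBER`, 999 by default before SQLite 3.32.0 and 32766 since), so the keys may need to be chunked. With a 999 limit, 1362 keys become two statements, and the `StoreIndex` mutex would be taken once per chunk rather than once per snapshot.

```rust
use std::collections::HashMap;

/// Hypothetical batched lookup: one `SELECT ... WHERE key IN (?, ?, ...)` per
/// chunk of at most `max_params` keys. `run_query` receives the SQL text and
/// the bound keys, and returns the rows that matched (missing keys are absent).
pub fn bulk_get<F>(keys: &[String], max_params: usize, mut run_query: F) -> HashMap<String, Vec<u8>>
where
    F: FnMut(&str, &[String]) -> Vec<(String, Vec<u8>)>,
{
    assert!(max_params > 0, "chunk size must be positive");
    let mut found = HashMap::with_capacity(keys.len());
    for chunk in keys.chunks(max_params) {
        // One `?` placeholder per key in this chunk.
        let placeholders = vec!["?"; chunk.len()].join(", ");
        let sql = format!("SELECT key, data FROM package_index WHERE key IN ({placeholders})");
        for (key, data) in run_query(sql.as_str(), chunk) {
            found.insert(key, data);
        }
    }
    found
}
```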

### 4. `check_file` (warm reinstall): per-file `fs::metadata`
`pacquet/crates/store-dir/src/check_pkg_files_integrity.rs:410` stats every file to compare its `mtime` against `checked_at`. On a warm reinstall (full-resolution scenario), this fires once per file per package, about 130k stat syscalls. The `verified_files_cache` is supposed to dedupe, but only at the `(file_path)` level; multiple snapshots referencing the same CAFS blob still re-stat.

- Confirm that `SharedVerifiedFilesCache` (in `pacquet/crates/store-dir/src/lib.rs`) actually dedupes at the path level on this workload; the sample says `check_file` is still hot, so maybe it doesn't.
- pnpm verifies once per blob per process and caches `{ ino, dev }`, so the second consumer of a popular CAFS path (e.g. `react/index.js`) just compares inode numbers. `pacquet/crates/package-manager/src/import_indexed_dir.rs` could grow a similar cache so the `link_file` fast path skips the stat entirely when the source has already been verified during this install.
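A sketch of a verify-once-per-process cache, assuming the CAFS path uniquely identifies the blob. `VerifiedBlobs` and `is_valid` are hypothetical names, and the `verify` closure stands in for the existing `fs::metadata` + `mtime`/`checked_at` comparison in `check_file`. This version is keyed by path rather than by pnpm's `{ ino, dev }`, which is what lets a cache hit skip the stat entirely. Only successful verifications are cached, so a blob that fails the check (and gets rewritten) is re-checked on its next use.

```rust
use std::collections::HashSet;
use std::path::{Path, PathBuf};
use std::sync::Mutex;

/// Hypothetical per-process set of CAFS paths already verified in this install.
#[derive(Default)]
pub struct VerifiedBlobs {
    verified: Mutex<HashSet<PathBuf>>,
}

impl VerifiedBlobs {
    /// Returns whether `path` is valid, calling `verify` (the stat + mtime
    /// check) only if this path hasn't passed verification yet.
    pub fn is_valid<F>(&self, path: &Path, verify: F) -> bool
    where
        F: FnOnce(&Path) -> bool,
    {
        // Cache hit: no syscall at all. The guard is dropped after the check.
        if self.verified.lock().unwrap().contains(path) {
            return true;
        }
        // Verify without holding the lock so other paths aren't blocked on I/O.
        // Two threads may race to verify the same path; both get the same
        // answer, so the duplicated work is harmless.
        let ok = verify(path);
        if ok {
            self.verified.lock().unwrap().insert(path.to_path_buf());
        }
        ok
    }
}
```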

### 5. `load_meta` still appears in samples (~145 in install-phase, ~110 in resolve-phase)
The packument cache fix in `c5562c8d01` collapsed most of the resolve-side hits, but `load_meta` is still in the top 10. Spot-check what's still calling it after the fix; the most likely candidate is the cold-mirror path on a brand-new registry, where the first install per process has to materialize the mirror from the network. Lower impact than the items above, but worth a profile pass after #1-3 land.

## Reproducing the bench

```sh
just registry-mock launch                            # one-time
cd /Volumes/src/pnpm/pnpm/<worktree>
cargo run --release --bin=integrated-benchmark -- \
  --scenario clean-install \
  --registry-port <port-from-launch> \
  --runs 5 --warmup 2 \
  --with-pnpm \
  pacquet@HEAD pacquet@main
```

For full-resolution, use `--scenario full-resolution` instead. For per-phase timing, set `TRACE=pacquet=info` and run `./bench-work-env/pacquet@HEAD/pacquet/target/release/pacquet install` directly. The per-phase `elapsed_ms` is emitted to stderr via `tracing::info!(target: "pacquet::install::phase", ...)` in `pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs` (search for "phase complete").

For CPU sampling on macOS with symbols:

```sh
cargo build --profile release-debug --bin pacquet
cp target/release-debug/pacquet bench-work-env/pacquet@HEAD/pacquet/target/release/pacquet
cd bench-work-env/pacquet@HEAD && rm -rf node_modules pnpm-lock.yaml store-dir
cp .saved-package.json package.json
./pacquet/target/release/pacquet install &
PID=$!; sleep 4; sample $PID 10 -file /tmp/p.txt; wait $PID
awk '/Thread_.*tokio-rt-worker/{flag=1} /Thread_.*:$/{flag=0} flag && /pacquet_/' /tmp/p.txt \
  | grep -v 'tokio::\|scheduler' \
  | awk -F'pacquet_' '{print $NF}' | awk -F'::' '{print $1, $2, $3, $4}' \
  | sort | uniq -c | sort -rn | head -30
```

## Gotchas the bench harness will trip you on

- **`cache_dir` is global per user, not bench-scoped.** With `XDG_CACHE_HOME=~/.cache` set, pacquet writes mirrors to `~/.cache/pnpm/v11/metadata/localhost+<port>/`. The harness only wipes `bench-work-env/pacquet@HEAD/{node_modules,pnpm-lock.yaml,store-dir}` between iterations, so the metadata mirror stays **warm** across runs. Verify with `ls ~/.cache/pnpm/v11/metadata/localhost+<port> | wc -l` after the first iteration.
- **The verdaccio mock degrades** under sustained load: its RSS grows and request latency creeps up over a long bench session. If pacquet's wall time grows monotonically across runs while user CPU stays flat, restart the mock (`just registry-mock end && just registry-mock launch`). It picks a new port each time, so update `--registry-port` accordingly.
- **The integrated-benchmark must be run from the repo root** so that `canonicalize(".")` resolves to the pacquet git repo. The verifier now accepts a `.git` *file* (linked git worktree) as well as a directory; see the verifier change in #11856.
- **`pacquet@HEAD` builds inside `bench-work-env/`**: it does a fresh `git fetch` and checkout of the SHA, then `cargo build --release`, which takes about two minutes per cold build. Working-tree changes are *not* picked up until you commit them and the bench reruns the build.

## Pointers used in the prior session

- Pipelined fetch: `pacquet/crates/package-manager/src/prefetching_resolver.rs` (new file from #11856).
- Resolve-time meta cache fix: `pacquet/crates/resolving-npm-resolver/src/pick_package.rs:517-555` (version-spec / publishedBy fast paths).
- Prefetch dedup + read-lock: `pacquet/crates/tarball/src/lib.rs:1825-1855` and `pacquet/crates/package-manager/src/prefetching_resolver.rs:140-170`.
- Link-file stat trim: `pacquet/crates/package-manager/src/link_file.rs:119` and the helper `try_import` at `:230`.
- Symlink mkdir trim: `pacquet/crates/package-manager/src/symlink_package.rs:46`.
- Bench harness: `pacquet/tasks/integrated-benchmark/` (CLI args, work-env construction, hyperfine driver).
- Per-phase timing emits: search for `tracing::info!(target: "pacquet::install::phase"` in `pacquet/crates/package-manager/src/install_with_fresh_lockfile.rs`.

## Suggested attack order

Start with **#1 (tarball extract parallelism)**: it has the biggest sample share and is likely worth 5-10 s of wall-clock time on clean-install. Re-bench. Then do **#3 (StoreIndex bulk lookup)** if cold-store install is still slow. **#2 and #4** are smaller wins but cheap to land. Do **#5** last, after re-sampling.

---
Written by an agent (Claude Code, claude-opus-4-7).
