Prevent cold-batch link serialization (and similar pnpr cold-path regressions) from recurring

## Background

#12249 fixed a performance bug where the **cold batch** in `CreateVirtualStore` serialized its per-package slot links: each `InstallPackageBySnapshot` future called the *blocking* `CreateVirtualDirBySnapshot::run` (`rayon::join`) directly inside one cooperative `try_join_all` task, so the ~1300 links ran one-at-a-time instead of in a single parallel pass like the **warm batch** already did.

It only manifested on the **pnpr client path**: pnpr's fast offloaded resolution starts materialization before the prefetcher has finished, so most packages land in the cold batch (measured `warm=312 cold=996` on a fresh ~1300-package install via a real pnpr server, vs `warm=1308 cold=0` for a local fresh install whose slower resolution warms everything first). The serialization cost it ~14.5s in `CreateVirtualStore` vs ~5-7s for the warm path; the fix flipped pnpr from ~4s slower than a direct install to ~3s faster.

Three things let it through — the code regressed, the test suite didn't catch it, and the benchmark didn't flag it. This issue tracks guardrails for all three.

## Follow-ups

### 1. Make the bug class structurally unrepresentable (code)
The warm and cold batches now both link via `rayon` `par_iter`, but the two passes are **duplicated** in `create_virtual_store.rs`. A future edit can re-break one without the other. Extract a single `link_slots_parallel(...)` helper that both batches call, so there is exactly one slot-linking path and it is parallel by construction.

### 2. Deterministic regression test (test)
Assert the structural property, not timing. Add an install-scoped counter that `CreateVirtualDirBySnapshot` bumps on entry / decrements on exit, recording the **max concurrent links**. Add a test that forces an all-cold batch deterministically (prefetch tarballs into the mem cache but leave the store index empty, so the warm/cold partition routes them all cold without any timing race) and asserts `max_concurrent_links >= 2`. Serialized linking pins it at 1; the parallel pass gives ~`num_cpus`. No wall-clock flakiness. The existing `pnpr_install` tests cover cold-batch *correctness* but not *concurrency*.

### 3. Close the integrated-benchmark blind spots (observability)
The benchmark *ran* the cold path (it sets `--registry-latency-ms=50 --registry-bandwidth-mbps=200` and runs cold-store scenarios) but couldn't surface the cost:

- **Emit `warm=N cold=M` as a benchmark metric and assert a fresh pnpr install has a non-trivial cold batch.** Canary: proves the benchmark is actually exercising the cold path; if a change accidentally warms everything, the gate fires instead of silently testing nothing.
- **Watch the absolute `pnpr@HEAD / pacquet@HEAD` ratio with the prior that it should be ≤ 1**, not just HEAD-vs-main deltas. The serialization was a long-standing constant, so delta-tracking was blind to it; a "pnpr should not be slower than a direct install" threshold would have failed.
- **Give the pnpr server a low registry latency** (separate knob, e.g. `--pnpr-server-registry-latency-ms ≈ 0`) distinct from the client's. The workflow currently charges the pnpr server the same `REGISTRY_LATENCY_MS` as a remote client, which slows its resolution and shrinks the modeled cold batch below what a production (co-located) server produces — under-representing exactly the cost this bug scaled with.
- **Report `CreateVirtualStore` time as its own line item** so future materialize-phase regressions localize to a phase instead of hiding in a total averaged over the network.

## Suggested order
#1 + #2 are the hard gates (block it at `cargo test` / review); #3 is observability so subtler future regressions show up in the perf report.

---
Written by an agent (Claude Code, claude-opus-4-8).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Prevent cold-batch link serialization (and similar pnpr cold-path regressions) from recurring #12250

Background

Follow-ups

1. Make the bug class structurally unrepresentable (code)

2. Deterministic regression test (test)

3. Close the integrated-benchmark blind spots (observability)

Suggested order

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Uh oh!

Prevent cold-batch link serialization (and similar pnpr cold-path regressions) from recurring #12250

Description

Background

Follow-ups

1. Make the bug class structurally unrepresentable (code)

2. Deterministic regression test (test)

3. Close the integrated-benchmark blind spots (observability)

Suggested order

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions