Background
#12249 fixed a performance bug where the cold batch in CreateVirtualStore serialized its per-package slot links: each InstallPackageBySnapshot future called the blocking CreateVirtualDirBySnapshot::run (rayon::join) directly inside one cooperative try_join_all task, so the ~1300 links ran one-at-a-time instead of in a single parallel pass like the warm batch already did.
It only manifested on the pnpr client path: pnpr's fast offloaded resolution starts materialization before the prefetcher has finished, so most packages land in the cold batch (measured warm=312 cold=996 on a fresh ~1300-package install via a real pnpr server, vs warm=1308 cold=0 for a local fresh install whose slower resolution warms everything first). The serialization cost it ~14.5s in CreateVirtualStore vs ~5-7s for the warm path; the fix flipped pnpr from ~4s slower than a direct install to ~3s faster.
Three things let it through — the code regressed, the test suite didn't catch it, and the benchmark didn't flag it. This issue tracks guardrails for all three.
Follow-ups
1. Make the bug class structurally unrepresentable (code)
The warm and cold batches now both link via rayon par_iter, but the two passes are duplicated in create_virtual_store.rs. A future edit can re-break one without the other. Extract a single link_slots_parallel(...) helper that both batches call, so there is exactly one slot-linking path and it is parallel by construction.
2. Deterministic regression test (test)
Assert the structural property, not timing. Add an install-scoped counter that CreateVirtualDirBySnapshot bumps on entry / decrements on exit, recording the max concurrent links. Add a test that forces an all-cold batch deterministically (prefetch tarballs into the mem cache but leave the store index empty, so the warm/cold partition routes them all cold without any timing race) and asserts max_concurrent_links >= 2. Serialized linking pins it at 1; the parallel pass gives ~num_cpus. No wall-clock flakiness. The existing pnpr_install tests cover cold-batch correctness but not concurrency.
3. Close the integrated-benchmark blind spots (observability)
The benchmark ran the cold path (it sets --registry-latency-ms=50 --registry-bandwidth-mbps=200 and runs cold-store scenarios) but couldn't surface the cost:
- Emit
warm=N cold=M as a benchmark metric and assert a fresh pnpr install has a non-trivial cold batch. Canary: proves the benchmark is actually exercising the cold path; if a change accidentally warms everything, the gate fires instead of silently testing nothing.
- Watch the absolute
pnpr@HEAD / pacquet@HEAD ratio with the prior that it should be ≤ 1, not just HEAD-vs-main deltas. The serialization was a long-standing constant, so delta-tracking was blind to it; a "pnpr should not be slower than a direct install" threshold would have failed.
- Give the pnpr server a low registry latency (separate knob, e.g.
--pnpr-server-registry-latency-ms ≈ 0) distinct from the client's. The workflow currently charges the pnpr server the same REGISTRY_LATENCY_MS as a remote client, which slows its resolution and shrinks the modeled cold batch below what a production (co-located) server produces — under-representing exactly the cost this bug scaled with.
- Report
CreateVirtualStore time as its own line item so future materialize-phase regressions localize to a phase instead of hiding in a total averaged over the network.
Suggested order
#1 + #2 are the hard gates (block it at cargo test / review); #3 is observability so subtler future regressions show up in the perf report.
Written by an agent (Claude Code, claude-opus-4-8).
Background
#12249 fixed a performance bug where the cold batch in
CreateVirtualStoreserialized its per-package slot links: eachInstallPackageBySnapshotfuture called the blockingCreateVirtualDirBySnapshot::run(rayon::join) directly inside one cooperativetry_join_alltask, so the ~1300 links ran one-at-a-time instead of in a single parallel pass like the warm batch already did.It only manifested on the pnpr client path: pnpr's fast offloaded resolution starts materialization before the prefetcher has finished, so most packages land in the cold batch (measured
warm=312 cold=996on a fresh ~1300-package install via a real pnpr server, vswarm=1308 cold=0for a local fresh install whose slower resolution warms everything first). The serialization cost it ~14.5s inCreateVirtualStorevs ~5-7s for the warm path; the fix flipped pnpr from ~4s slower than a direct install to ~3s faster.Three things let it through — the code regressed, the test suite didn't catch it, and the benchmark didn't flag it. This issue tracks guardrails for all three.
Follow-ups
1. Make the bug class structurally unrepresentable (code)
The warm and cold batches now both link via
rayonpar_iter, but the two passes are duplicated increate_virtual_store.rs. A future edit can re-break one without the other. Extract a singlelink_slots_parallel(...)helper that both batches call, so there is exactly one slot-linking path and it is parallel by construction.2. Deterministic regression test (test)
Assert the structural property, not timing. Add an install-scoped counter that
CreateVirtualDirBySnapshotbumps on entry / decrements on exit, recording the max concurrent links. Add a test that forces an all-cold batch deterministically (prefetch tarballs into the mem cache but leave the store index empty, so the warm/cold partition routes them all cold without any timing race) and assertsmax_concurrent_links >= 2. Serialized linking pins it at 1; the parallel pass gives ~num_cpus. No wall-clock flakiness. The existingpnpr_installtests cover cold-batch correctness but not concurrency.3. Close the integrated-benchmark blind spots (observability)
The benchmark ran the cold path (it sets
--registry-latency-ms=50 --registry-bandwidth-mbps=200and runs cold-store scenarios) but couldn't surface the cost:warm=N cold=Mas a benchmark metric and assert a fresh pnpr install has a non-trivial cold batch. Canary: proves the benchmark is actually exercising the cold path; if a change accidentally warms everything, the gate fires instead of silently testing nothing.pnpr@HEAD / pacquet@HEADratio with the prior that it should be ≤ 1, not just HEAD-vs-main deltas. The serialization was a long-standing constant, so delta-tracking was blind to it; a "pnpr should not be slower than a direct install" threshold would have failed.--pnpr-server-registry-latency-ms ≈ 0) distinct from the client's. The workflow currently charges the pnpr server the sameREGISTRY_LATENCY_MSas a remote client, which slows its resolution and shrinks the modeled cold batch below what a production (co-located) server produces — under-representing exactly the cost this bug scaled with.CreateVirtualStoretime as its own line item so future materialize-phase regressions localize to a phase instead of hiding in a total averaged over the network.Suggested order
#1 + #2 are the hard gates (block it at
cargo test/ review); #3 is observability so subtler future regressions show up in the perf report.Written by an agent (Claude Code, claude-opus-4-8).