pnpr install accelerator: single-stream /v1/install is bandwidth-bound on cold/WAN installs — decouple resolution from parallel file transfer

## Summary

The pnpr install accelerator (`/v1/install`) returns the resolved lockfile **and all missing file contents inline in a single gzipped response over one connection**. A single TCP stream's throughput is capped at roughly window ÷ RTT, so on a **cold client store over a non-trivial-RTT link** the accelerated install is *slower* than a plain registry install, which fans out many parallel connections to a CDN.

**Direction (after measurement, below): move to a two-phase design — server-side resolve in one request, then fetch tarballs in parallel like a normal install.** The file-level dedup that the current single-stream design buys is worth only ~5% on a real graph and is dwarfed by the bandwidth cost of serializing everything through one connection.

## Setup that surfaced it

- Client: `pacquet install` with `pnprServer` set (accelerated path), **fresh empty `store`/`cache`** (cold client store).
- Server: `@pnpm/pnpr` (prebuilt binary) on a Hetzner VPS, **~31 ms RTT**, warm cache.
- Fixture: ~100 direct deps (babel/react/webpack/gulp), **387 MB** installed `node_modules` (38k files).

## Measurement 1 — bandwidth: single stream is the bottleneck, not the box or pnpr

Latency is fine (31 ms). Single-stream tarball throughput, box vs npm CDN:

| package | npm CDN | via pnpr box |
|---|---|---|
| highcharts 11.4.8 (9.48 MB) | 22.3 MB/s | 8.8 MB/s |
| uglify-js 3.19.3 (0.24 MB) | 1.76 MB/s | 1.02 MB/s |

The box's **egress is not the limit; per-connection throughput is.** The same tarball pulled from the box over N parallel streams:

```
streams= 1  agg= 10.5 MB/s
streams= 4  agg= 12.8 MB/s
streams= 8  agg= 18.2 MB/s
streams=16  agg= 75.1 MB/s     <- ~600 Mbit aggregate available
```

`throughput ≈ window ÷ RTT`, so one TCP stream caps at ~10 MB/s over the 31 ms path. A normal install opens many parallel connections and rides the aggregate; `/v1/install` rides one connection. pnpr serves correctly (`application/octet-stream`, real `content-length`, no gzip re-compression, not chunked), so this is the protocol shape, not a serving bug.
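As a rough check on the `window ÷ RTT` relation, the sketch below back-computes the effective window implied by the measured single-stream rate and compares the stream counts above. The numbers are copied from the measurements; the helper names are illustrative, and MB is taken as 10^6 bytes, which is an assumption since the measurements do not say.

```python
RTT_S = 0.031  # ~31 ms round trip, from the setup above

# Aggregate throughput in MB/s by parallel stream count (measured above).
MEASURED_MBPS = {1: 10.5, 4: 12.8, 8: 18.2, 16: 75.1}


def implied_window_bytes(throughput_mbps, rtt_s):
    """Effective in-flight window implied by throughput ~= window / RTT."""
    return throughput_mbps * 1e6 * rtt_s


def speedup_over_single(measured):
    """Aggregate throughput of each run relative to the 1-stream run."""
    base = measured[1]
    return {n: round(rate / base, 2) for n, rate in measured.items()}


window = implied_window_bytes(MEASURED_MBPS[1], RTT_S)
print(f"implied window: {window / 1024:.0f} KiB")
print(speedup_over_single(MEASURED_MBPS))
print(f"16-stream aggregate: {MEASURED_MBPS[16] * 8:.0f} Mbit/s")
```

The implied window comes out in the low hundreds of KiB, which is plausible for a single stream on this path. The 16-stream ratio is about 7.2× against the measured 10.5 MB/s, or about 7.5× against the rounded ~10 MB/s cap.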

## Measurement 2 — overfetch: file-level dedup is only ~5% on a real graph

What you'd give up by fetching whole tarballs instead of deduped files, on the 387 MB fixture:

| | bytes | files |
|---|---|---|
| **T** — every package's files (CDN tarballs carry this; package-level dedup only) | 279.4 MB | 38,044 |
| **U** — unique content in the CAS store (what pnpr's file-dedup carries) | 265.0 MB | 31,870 |

**File-level (cross-package) dedup saves ~5.2% (14.4 MB).** 6,174 of the 38,044 files duplicate content that already appears elsewhere in the graph: 16% of files but only ~5% of bytes (the duplicates are small files such as licenses and shims). The headline pnpm saving ("don't fetch `react@18` twice") is **package-level**, which tarball fetching already preserves.
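One way figures like T and U could be derived is to sum file sizes once per (package, path) entry and once per unique content hash. The sketch below does that on a tiny made-up listing; the `FileEntry` record and the sample data are hypothetical and are not the fixture's actual manifest format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    package: str    # e.g. "a@1.0.0"
    path: str       # path inside the package
    integrity: str  # content hash; equal hashes mean identical bytes
    size: int       # bytes


def tarball_total(entries):
    """T: every package's files, with no file-level dedup."""
    return sum(e.size for e in entries), len(entries)


def unique_total(entries):
    """U: unique content only, as a content-addressed store would hold it."""
    by_hash = {e.integrity: e.size for e in entries}
    return sum(by_hash.values()), len(by_hash)


# Hypothetical listing: two packages ship a byte-identical LICENSE.
entries = [
    FileEntry("a@1.0.0", "index.js", "sha512-aaa", 4000),
    FileEntry("a@1.0.0", "LICENSE", "sha512-lic", 1000),
    FileEntry("b@2.0.0", "index.js", "sha512-bbb", 4000),
    FileEntry("b@2.0.0", "LICENSE", "sha512-lic", 1000),
]

t_bytes, t_files = tarball_total(entries)
u_bytes, u_files = unique_total(entries)
print(f"T={t_bytes} B / {t_files} files, U={u_bytes} B / {u_files} files")
print(f"file-level dedup saves {(t_bytes - u_bytes) / t_bytes:.1%}")
```

Plugging the measured totals into the same ratio gives (279.4 − 265.0) ÷ 279.4 ≈ 5.2%.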

So the overfetch from going CDN-direct is ~5%, against the ~7.5× per-connection bandwidth penalty above (~75 MB/s aggregate vs ~10 MB/s on one stream). As an illustration, for a cold install with ~90 MB on the wire: pnpr single-stream ≈ 9 s vs CDN parallel ≈ 1.3 s, even though the parallel path moves 5% more bytes.
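The illustrative timing can be reproduced with back-of-envelope arithmetic. The sketch assumes ~10 MB/s for the single stream and ~75 MB/s for the parallel case. Both rates come from Measurement 1, where the aggregate was measured against the pnpr box, so using it as a stand-in for CDN throughput is an assumption. Handshakes, slow start, and unpacking are ignored.

```python
def transfer_seconds(megabytes: float, mb_per_s: float) -> float:
    """Idealized transfer time: payload divided by sustained throughput."""
    return megabytes / mb_per_s


WIRE_MB = 90.0             # illustrative cold payload from the text
OVERFETCH = 1.05           # whole tarballs carry ~5% more bytes (rounded)
SINGLE_STREAM_MBPS = 10.0  # approximate cap of one TCP stream at 31 ms RTT
PARALLEL_MBPS = 75.0       # approximate 16-stream aggregate

single = transfer_seconds(WIRE_MB, SINGLE_STREAM_MBPS)
parallel = transfer_seconds(WIRE_MB * OVERFETCH, PARALLEL_MBPS)
print(f"single stream: {single:.1f} s, parallel: {parallel:.2f} s")
```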

## Root cause

`/v1/install` is monolithic: one request in, one gzipped stream out with the lockfile + every missing file. Over any link with real RTT, one TCP stream can't compete with parallel CDN fetches for a large cold transfer. (Secondary: compression is gzip level 6 / `FILES_GZIP_LEVEL`; deflate's 32 KB window can't capture cross-package redundancy.)
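The secondary point about deflate's window can be demonstrated with Python's standard `zlib`. This is a synthetic sketch, not a measurement of pnpr's stream: it compresses a random block followed by an exact copy of itself, once with the copy 16 KiB back and once 40 KiB back.

```python
import random
import zlib


def repeated_block_ratio(block_size: int, level: int = 6) -> float:
    """Compress one random block followed by an exact copy of itself.

    Returns compressed size / raw size. Random bytes are incompressible on
    their own, so any saving comes from deflate matching the second copy
    against the first.
    """
    rng = random.Random(0)  # fixed seed keeps the demo repeatable
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    raw = block + block
    return len(zlib.compress(raw, level)) / len(raw)


near = repeated_block_ratio(16 * 1024)  # copy is 16 KiB back: inside the window
far = repeated_block_ratio(40 * 1024)   # copy is 40 KiB back: outside the window
print(f"16 KiB apart: {near:.2f}   40 KiB apart: {far:.2f}")
```

When the duplicate sits within 32 KiB, the second copy costs almost nothing. Once it is further back, it is compressed as if it had never been seen, which is the situation for identical files that land far apart in one large response.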

## Context: this is git-shaped, and the benchmark hit its worst case

The protocol mirrors `git fetch`: the client sends `store_integrities` (git's **have**); the server resolves and `compute_diff(...)` returns only `missing_files`, deduped across the response (the analogue of git's **want**, here computed by the server). `git` is fast over a *single* connection because it ships few bytes and only what you lack. But the benchmark used a **fresh empty store**, so every file was "missing" and the server shipped the whole graph over one stream. That is the "`git clone` of a huge repo over WAN" case, which is bandwidth-bound regardless. The dedup and the have/want exchange bought nothing because the client had nothing.
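The have/want shape can be sketched as a set difference. The names `compute_diff`, `store_integrities`, and `missing_files` come from the description above, but the signature, data layout, and sample data below are assumptions for illustration, not pnpr's actual implementation.

```python
def compute_diff(resolved_files, store_integrities):
    """Return the integrities the client lacks, each listed once.

    resolved_files: iterable of (package, path, integrity) for the resolved graph.
    store_integrities: set of integrities already in the client store ("have").
    """
    missing_files = []
    seen = set(store_integrities)
    for _package, _path, integrity in resolved_files:
        if integrity not in seen:
            seen.add(integrity)  # dedup across the whole response
            missing_files.append(integrity)
    return missing_files


# Hypothetical resolved graph: two packages share one LICENSE.
resolved = [
    ("a@1.0.0", "index.js", "sha512-aaa"),
    ("a@1.0.0", "LICENSE", "sha512-lic"),
    ("b@2.0.0", "index.js", "sha512-bbb"),
    ("b@2.0.0", "LICENSE", "sha512-lic"),
]

warm = compute_diff(resolved, {"sha512-aaa", "sha512-lic"})
cold = compute_diff(resolved, set())
print("warm store:", warm)
print("cold store:", cold)
```

With a warm store only one file is missing. With an empty store, every unique file is missing and the diff degenerates to "send everything", which is the case the benchmark hit.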

## Proposed architecture: resolve server-side, fetch in parallel

Split the two halves, which have opposite characteristics:

1. **Phase 1: resolve (keep; the real, scenario-independent win).** One request returns the server-resolved, server-verified lockfile. Resolution has round-trip *depth* (`tree-depth × RTT` of serialized packument fetches client-side); offloading it collapses that to a single round trip, which a client-side resolver is unlikely to beat. Policy (`minimumReleaseAge`/`trustPolicy`) is enforced here, shaping the lockfile.
2. **Phase 2: fetch tarballs in parallel, like a normal install.** The lockfile carries standard tarball URLs; the client uses its existing parallel fetcher. This uses the full aggregate bandwidth, is resumable/retryable, and reuses battle-tested code.
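The two phases can be sketched as one resolve call followed by a fan-out over the tarball URLs. This is a minimal illustration with injected stand-ins so it runs without a network. The `install`, `fake_resolve`, and `fake_fetch` names, the lockfile shape, and the `registry.example` URLs are all hypothetical. Integrity checks and retries are omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def install(resolve, fetch_tarball, manifest, max_parallel=16):
    """Two-phase install: one resolve call, then parallel tarball fetches.

    resolve(manifest) -> dict mapping package id -> tarball URL
    fetch_tarball(url) -> bytes
    Both callables are injected so the sketch runs offline.
    """
    lockfile = resolve(manifest)  # phase 1: a single request
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # phase 2: one task per tarball; map() returns results in input order
        blobs = list(pool.map(fetch_tarball, lockfile.values()))
    return dict(zip(lockfile.keys(), blobs))


# Stand-ins for the real resolver and fetcher (hypothetical data).
def fake_resolve(manifest):
    return {
        f"{name}@{ver}": f"https://registry.example/{name}/-/{name}-{ver}.tgz"
        for name, ver in manifest.items()
    }


def fake_fetch(url):
    return f"bytes of {url}".encode()


store = install(fake_resolve, fake_fetch, {"react": "18.2.0", "lodash": "4.17.21"})
print(sorted(store))
```

Because phase 2 only needs URLs, the server never touches tarball bytes in this shape.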

**Key property this gives us: the accelerator becomes "never slower than a normal install."** Phase 2 *is* a normal install's fetch, so the accelerator can only *add* the resolution speedup on top — it can't lose to plain `pacquet install` the way the single-stream design does. The current design violates that property (slower cold/WAN).

**Bonus: pnpr becomes a stateless resolver** — no tarball storage, no large egress, no cache to warm, no per-VPS bandwidth ceiling, trivially scalable and cheap.

### Design space

| approach | dedup | parallel BW | policy / private / byte-gating | pnpr load | never-slower-than-npm |
|---|---|---|---|---|---|
| current (mono stream) | ✓ (~5%) | ✗ single stream | ✓ | high | ✗ |
| **resolve + parallel CDN tarballs** (proposed) | ✗ | ✓ CDN edge | resolution-time policy ✓; byte-gating ✗ | ~none | ✓ |
| resolve + parallel missing-*files* from pnpr | ✓ (~5%) | ✓ | ✓ | high | ✓ |

Trade-off to be explicit about: the proposed design is **faster in cold/WAN and ties in warm**, but it does not dominate *every* topology — a **co-located + warm** pnpr (client and server on the same datacenter LAN) could still win on the ~5% fewer bytes over an internal Gbit link. For distributed/CI/WAN clients (the common case) CDN-direct wins, and it wins on simplicity everywhere.

What's lost going CDN-direct: server-side **byte-gating** for private packages (the client needs registry creds to fetch private tarballs directly). Correctness/policy hold: integrity is in the verified lockfile, and release-age/trust gate which versions resolve (phase 1), not the bytes.

## Next steps

- [ ] Implement the two-phase path: phase 1 returns lockfile (standard tarball URLs) + server-side verification; phase 2 = existing parallel tarball fetcher.
- [ ] Keep **file-level dedup as an optional later layer** (parallel `GET /-/file/<integrity>` from pnpr) for co-located deployments where the ~5% pays off — do not block the simple win on it.
- [ ] Decide private-registry story: client-side creds for private tarballs, vs an opt-in "serve files through pnpr" mode for private deps.
- [ ] Validate across the matrix: (cold/warm) × (LAN/WAN) × (small/large graph), benchmarking co-located (sub-ms RTT) vs WAN explicitly.

---
Written by an agent (Claude Code, claude-opus-4-8).

