Summary
The pnpr install accelerator (/v1/install) returns the resolved lockfile and all missing file contents inline in a single gzipped response over one connection. That single stream is throughput-bound by the per-TCP-connection bandwidth-delay product, so on a cold client store over a non-trivial-RTT link the accelerated install is slower than a plain registry install — which fans out many parallel connections to a CDN.
Direction (after measurement, below): move to a two-phase design — server-side resolve in one request, then fetch tarballs in parallel like a normal install. The file-level dedup that the current single-stream design buys is worth only ~5% on a real graph and is dwarfed by the bandwidth cost of serializing everything through one connection.
Setup that surfaced it
- Client: pacquet install with pnprServer set (accelerated path), fresh empty store/cache (cold client store).
- Server: @pnpm/pnpr (prebuilt binary) on a Hetzner VPS, ~31 ms RTT, warm cache.
- Fixture: ~100 direct deps (babel/react/webpack/gulp), 387 MB installed node_modules (38k files).
Measurement 1 — bandwidth: single stream is the bottleneck, not the box or pnpr
Latency is fine (31 ms). Single-stream tarball throughput, box vs npm CDN:
| package | npm CDN | via pnpr box |
|---|---|---|
| highcharts 11.4.8 (9.48 MB) | 22.3 MB/s | 8.8 MB/s |
| uglify-js 3.19.3 (0.24 MB) | 1.76 MB/s | 1.02 MB/s |
The box's egress is not the limit — per-connection throughput is. Same tarball pulled N ways from the box:
```
streams= 1  agg= 10.5 MB/s
streams= 4  agg= 12.8 MB/s
streams= 8  agg= 18.2 MB/s
streams=16  agg= 75.1 MB/s   <- ~600 Mbit aggregate available
```
Per-connection throughput ≈ window ÷ RTT, so one TCP stream caps out around 10 MB/s over the 31 ms path, which implies an effective window of roughly 10 MB/s × 31 ms ≈ 310 KB. A normal install opens many parallel connections and rides the aggregate; /v1/install rides one connection. pnpr serves correctly (application/octet-stream, real content-length, no gzip re-compression, not chunked), so this is the protocol shape, not a serving bug.
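The window ÷ RTT bound is easiest to see as arithmetic. Below is a minimal sketch with hypothetical helper names and decimal megabytes. Only two inputs come from the measurements: the ~10 MB/s single-stream rate and the 31 ms RTT. The model ignores slow start, loss, and congestion control, so treat it as a ceiling, not a prediction.

```python
def per_stream_cap(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one TCP connection's throughput in bytes/s: window / RTT."""
    return window_bytes / rtt_s


def implied_window(throughput_bps: float, rtt_s: float) -> float:
    """Effective window in bytes implied by an observed single-stream rate."""
    return throughput_bps * rtt_s


def aggregate_cap(streams: int, window_bytes: float, rtt_s: float, link_bps: float) -> float:
    """Idealized N-stream throughput: N per-stream caps, clipped at the link rate."""
    return min(streams * per_stream_cap(window_bytes, rtt_s), link_bps)


RTT = 0.031                         # 31 ms path to the VPS
window = implied_window(10e6, RTT)  # effective window at ~10 MB/s single-stream
print(f"implied window ~ {window / 1e3:.0f} KB")
print(f"1-stream cap   ~ {per_stream_cap(window, RTT) / 1e6:.1f} MB/s")
print(f"16 streams on a 75 MB/s link ~ {aggregate_cap(16, window, RTT, 75e6) / 1e6:.1f} MB/s")
```

The measured scaling (10.5 to 18.2 MB/s from 1 to 8 streams) is well below linear. Read the N × cap line as an upper bound rather than a fit to the data.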
Measurement 2 — overfetch: file-level dedup is only ~5% on a real graph
What you'd give up by fetching whole tarballs instead of deduped files, on the 387 MB fixture:
| | bytes | files |
|---|---|---|
| T: every package's files (CDN tarballs carry this; package-level dedup only) | 279.4 MB | 38,044 |
| U: unique content in the CAS store (what pnpr's file-dedup carries) | 265.0 MB | 31,870 |
File-level (cross-package) dedup saves ~5.2% (14.4 MB). There are 6,174 duplicate file paths: 16% of paths but only ~5% of bytes, because the duplicates are small files (licenses, shims). The headline pnpm saving ("don't fetch react@18 twice") is package-level, which tarball fetching already preserves.
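The two table rows differ only in what gets counted. T sums every file of every package, where packages are already deduped. U counts each distinct content hash once. This is a minimal sketch of that measurement over a hypothetical in-memory map of package to (file integrity, size). The real figures came from the fixture's store, not from this code.

```python
from typing import Dict, List, Tuple

FileEntry = Tuple[str, int]  # (content integrity, size in bytes)


def dedup_stats(packages: Dict[str, List[FileEntry]]) -> dict:
    """Compare bytes carried by whole tarballs (T) with unique content (U)."""
    total_bytes = total_files = 0
    unique: Dict[str, int] = {}
    for files in packages.values():
        for integrity, size in files:
            total_bytes += size
            total_files += 1
            unique.setdefault(integrity, size)  # same hash implies same content and size
    unique_bytes = sum(unique.values())
    return {
        "T_bytes": total_bytes,
        "T_files": total_files,
        "U_bytes": unique_bytes,
        "U_files": len(unique),
        "saving": 1 - unique_bytes / total_bytes if total_bytes else 0.0,
    }


# Toy graph: a shared license and shim are a third of the file entries
# but a tiny fraction of the bytes, mirroring the shape of the real result.
toy = {
    "a@1.0.0": [("sha-license", 1_000), ("sha-shim", 200), ("sha-a-main", 500_000)],
    "b@2.0.0": [("sha-license", 1_000), ("sha-shim", 200), ("sha-b-main", 300_000)],
}
print(dedup_stats(toy))
```

Plugging in the table's totals gives (279.4 − 265.0) / 279.4 ≈ 5.2%.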
Going CDN-direct therefore overfetches ~5%. Set that against the ~7.5× bandwidth penalty of a single stream (~10 MB/s) versus the parallel aggregate (~75 MB/s) measured above. Illustratively, for a cold install (~90 MB on the wire), pnpr single-stream takes ≈ 9 s and CDN parallel takes ≈ 1.3 s, even though the CDN path moves 5% more bytes.
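The illustrative timings follow directly from the measurements. This sketch assumes the 16-stream box aggregate (~75 MB/s) stands in for CDN parallel throughput. It also ignores connection setup, TLS handshakes, and per-request latency.

```python
def transfer_seconds(megabytes: float, mb_per_s: float) -> float:
    """Time to move a payload at a steady throughput."""
    return megabytes / mb_per_s


WIRE_MB = 90.0        # compressed cold transfer, from the estimate above
SINGLE_STREAM = 10.0  # MB/s, one TCP connection over the 31 ms path
PARALLEL = 75.0       # MB/s, 16-stream aggregate measured from the box
OVERFETCH = 1.05      # ~5% more bytes when fetching whole tarballs

single = transfer_seconds(WIRE_MB, SINGLE_STREAM)
parallel = transfer_seconds(WIRE_MB * OVERFETCH, PARALLEL)
print(f"single stream ~ {single:.1f} s, parallel ~ {parallel:.2f} s, ratio ~ {single / parallel:.1f}x")
```

The parallel path is charged for the ~5% overfetch and still comes out roughly 7× faster.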
Root cause
/v1/install is monolithic: one request in, one gzipped stream out with the lockfile + every missing file. Over any link with real RTT, one TCP stream can't compete with parallel CDN fetches for a large cold transfer. (Secondary: compression is gzip level 6 / FILES_GZIP_LEVEL; deflate's 32 KB window can't capture cross-package redundancy.)
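The 32 KB limit is easy to demonstrate. In this sketch, random bytes stand in for an incompressible file, so the only thing deflate can compress is the repeat of that file. The sketch uses Python's zlib module, which applies the same deflate algorithm as gzip at level 6 (only the container framing differs). It shows deflate's behavior in general, not a trace of pnpr's stream.

```python
import os
import zlib


def deflated_size(data: bytes, level: int = 6) -> int:
    """Compressed size with deflate at gzip's default level 6."""
    return len(zlib.compress(data, level))


file_bytes = os.urandom(16 * 1024)  # stand-in for one incompressible file
near_gap = os.urandom(8 * 1024)     # duplicate starts 24 KiB back: inside the window
far_gap = os.urandom(48 * 1024)     # duplicate starts 64 KiB back: outside the window

near = file_bytes + near_gap + file_bytes
far = file_bytes + far_gap + file_bytes

print("near:", len(near), "->", deflated_size(near))
print("far: ", len(far), "->", deflated_size(far))
```

In the near case, the second copy costs only a few hundred bytes of back-references. In the far case, it is paid for in full. Identical files from different packages in a large stream usually sit much farther apart than 32 KB.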
Context: this is git-shaped, and the benchmark hit its worst case
The protocol mirrors git fetch. The client sends store_integrities (git's have); the server resolves, and compute_diff(...) returns only missing_files, deduped across the response (git's want). git is fast over a single connection because it ships few bytes: only what you lack. But the benchmark used a fresh empty store, so every file was "missing" and the whole graph went over one stream. That is the equivalent of a git clone of a huge repo over WAN, which is bandwidth-bound regardless. The dedup and have/want negotiation bought nothing because the client had nothing.
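Here is a sketch of the have/want exchange. Its shape is hypothetical: it reuses the names from the text (store_integrities, compute_diff, missing_files) but is not pnpr's actual implementation or signature. It shows why a cold store degenerates to "send every unique file".

```python
from typing import Dict, List, Set


def compute_diff(resolved: Dict[str, List[str]], store_integrities: Set[str]) -> List[str]:
    """Hypothetical have/want diff: file integrities the client lacks, each listed once.

    resolved maps a package id to the file integrities in that package (the "want" side);
    store_integrities is what the client already holds (git's "have").
    """
    missing_files: List[str] = []
    seen: Set[str] = set(store_integrities)
    for files in resolved.values():
        for integrity in files:
            if integrity not in seen:  # skip files the client has or that are already queued
                seen.add(integrity)
                missing_files.append(integrity)
    return missing_files


graph = {"a@1.0.0": ["lic", "a-main"], "b@2.0.0": ["lic", "b-main"]}
print(compute_diff(graph, set()))              # cold store: every unique file
print(compute_diff(graph, {"lic", "a-main"}))  # warm store: only what is new
```

With an empty have set, the result is the whole deduped graph. The protocol then reduces to one large transfer over one connection.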
Proposed architecture: resolve server-side, fetch in parallel
Split the two halves, which have opposite characteristics:
- Phase 1: resolve (keep; the real, scenario-independent win). One request returns the server-resolved, server-verified lockfile. Client-side resolution has round-trip depth (tree-depth × RTT of serialized packument fetches). Offloading it collapses that to ~0 beyond the single request, which is hard to beat client-side. Policy (minimumReleaseAge/trustPolicy) is enforced here, shaping the lockfile.
- Phase 2: fetch tarballs in parallel, like a normal install. The lockfile carries standard tarball URLs, and the client uses its existing parallel fetcher. This uses the full aggregate bandwidth, is resumable/retryable, and reuses battle-tested code.
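This is a minimal sketch of the two-phase client flow. Several names are hypothetical and exist only to keep the example self-contained and testable: the callables resolve and fetch (standing in for the pnpr resolve request and the client's existing HTTP fetcher), LockEntry, install, and the example URLs. The integrity strings use the standard SRI form ("sha512-" + base64 digest) that npm lockfiles carry. A real client would also stream to disk, retry, and extract into the store.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class LockEntry:
    name: str
    tarball_url: str  # standard tarball URL carried in the lockfile
    integrity: str    # SRI string: "sha512-<base64 digest>"


def sri_sha512(data: bytes) -> str:
    return "sha512-" + base64.b64encode(hashlib.sha512(data).digest()).decode()


def install(resolve: Callable[[], List[LockEntry]],
            fetch: Callable[[str], bytes],
            max_workers: int = 16) -> Dict[str, bytes]:
    # Phase 1: one request returns the server-resolved, verified lockfile.
    lockfile = resolve()

    # Phase 2: fetch every tarball in parallel and check it against the lockfile.
    def fetch_and_verify(entry: LockEntry) -> bytes:
        data = fetch(entry.tarball_url)
        if sri_sha512(data) != entry.integrity:
            raise ValueError(f"integrity mismatch for {entry.name}")
        return data

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        blobs = pool.map(fetch_and_verify, lockfile)
        return {entry.name: blob for entry, blob in zip(lockfile, blobs)}


fake_cdn = {
    "https://cdn.example/a-1.0.0.tgz": b"tarball a",
    "https://cdn.example/b-2.0.0.tgz": b"tarball b",
}
lock = [
    LockEntry("a@1.0.0", "https://cdn.example/a-1.0.0.tgz", sri_sha512(b"tarball a")),
    LockEntry("b@2.0.0", "https://cdn.example/b-2.0.0.tgz", sri_sha512(b"tarball b")),
]
result = install(resolve=lambda: lock, fetch=fake_cdn.__getitem__)
print(sorted(result))
```

Integrity travels in the verified lockfile, so tarballs fetched from any CDN edge are still checked byte-for-byte before use.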
Key property this gives us: the accelerator becomes "never slower than a normal install." Phase 2 is a normal install's fetch, so the accelerator can only add the resolution speedup on top — it can't lose to plain pacquet install the way the single-stream design does. The current design violates that property (slower cold/WAN).
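This model treats resolution and fetching as sequential phases. That is a simplification, since real installers overlap them. Under that assumption, the property reduces to one inequality. Here d is the resolution tree depth; the symbols are introduced only for this sketch.

$$
\begin{aligned}
T_{\text{normal}} &= T^{\text{client}}_{\text{resolve}} + T^{\text{par}}_{\text{fetch}}, \qquad T^{\text{client}}_{\text{resolve}} \approx d \cdot \text{RTT} \\
T_{\text{accel}} &= T^{\text{server}}_{\text{resolve}} + T^{\text{par}}_{\text{fetch}} \\
T_{\text{accel}} \le T_{\text{normal}} &\iff T^{\text{server}}_{\text{resolve}} \le T^{\text{client}}_{\text{resolve}}
\end{aligned}
$$

The fetch term is identical on both sides. The accelerated path can therefore only lose if the resolve request takes longer than d serialized packument round trips. The single-stream design instead replaces the parallel fetch term with a single-stream one, which the measurements above put at several times larger on a cold WAN install.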
Bonus: pnpr becomes a stateless resolver — no tarball storage, no large egress, no cache to warm, no per-VPS bandwidth ceiling, trivially scalable and cheap.
Design space
| approach | dedup | parallel BW | policy / private / byte-gating | pnpr load | never-slower-than-npm |
|---|---|---|---|---|---|
| current (mono stream) | ✓ (~5%) | ✗ single stream | ✓ | high | ✗ |
| resolve + parallel CDN tarballs (proposed) | ✗ | ✓ CDN edge | resolution-time policy ✓; byte-gating ✗ | ~none | ✓ |
| resolve + parallel missing-files from pnpr | ✓ (~5%) | ✓ | ✓ | high | ✓ |
Trade-off to be explicit about: the proposed design is faster in cold/WAN and ties in warm, but it does not dominate every topology — a co-located + warm pnpr (client and server on the same datacenter LAN) could still win on the ~5% fewer bytes over an internal Gbit link. For distributed/CI/WAN clients (the common case) CDN-direct wins, and it wins on simplicity everywhere.
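The co-located exception falls out of the same window ÷ RTT bound. This sketch makes several assumptions:

- The ~310 KB effective window implied by the WAN measurement also applies on the LAN.
- The LAN RTT is 0.5 ms.
- The link is 1 Gbit/s.
- The transfer sizes are the uncompressed T/U totals from the overfetch table; actual wire bytes would be compressed.

```python
def stream_throughput(window_bytes: float, rtt_s: float, link_bps: float) -> float:
    """One TCP stream: limited by window / RTT or by the link, whichever is lower."""
    return min(window_bytes / rtt_s, link_bps)


WINDOW = 310e3  # ~310 KB effective window implied by ~10 MB/s at 31 ms
GBIT = 125e6    # 1 Gbit/s link in bytes/s

wan = stream_throughput(WINDOW, 0.031, GBIT)   # window-bound
lan = stream_throughput(WINDOW, 0.0005, GBIT)  # link-bound (assumed 0.5 ms LAN RTT)

for label, megabytes in (("unique files (U)", 265.0), ("whole tarballs (T)", 279.4)):
    print(f"{label}: {megabytes * 1e6 / lan:.2f} s over one LAN stream")
```

Once one stream saturates the link, parallelism buys nothing, and the ~5% byte difference is all that is left to compete on.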
What's lost going CDN-direct: server-side byte-gating for private packages (the client needs registry creds to fetch private tarballs directly). Correctness/policy hold: integrity is in the verified lockfile, and release-age/trust gate which versions resolve (phase 1), not the bytes.
Next steps
- … GET /-/file/<integrity> from pnpr) for co-located deployments where the ~5% pays off; do not block the simple win on it.
Written by an agent (Claude Code, claude-opus-4-8).