pnpr: redesign the install-accelerator wire protocol for the remote case (cut round trips + bytes) #12165

@zkochan

Summary

pnpr's install-accelerator protocol was first prototyped in TypeScript as pnpm-agent (RFC pnpm/rfcs#9) and benchmarked on Node.js. The wire protocol we ported to Rust still carries the remote-npm-registry framing of that prototype, and that framing leaves a lot on the table for the case pnpr is actually built for: a remote server across a network from the client.

This issue lays out a measured plan to redesign the protocol. Breaking changes are acceptable — pnpr is pre-prod/experimental, and we're free to drop pnpm-CLI compatibility and support only the pacquet client.

The problem

For a remote pnpr, one pacquet install makes three sequential round trips, each paying full RTT:

Step | Uploads | Downloads
GET /-/pnpr (handshake) | | tiny
POST /v1/install | full integrity list + full lockfile (JSON) | lockfile (JSON) + base64'd index entries + missing digests (NDJSON)
POST /v1/files | all missing digests | all file bytes (gzip level 1, buffered into one batch)

Structural issues, all inherited from the Node.js remote-registry framing:

  1. ~3 RTTs before/around the payload. Two of them are avoidable.
  2. No overlap. The server fully resolves → fetches every uncached tarball into its store → computes the whole diff → builds the entire response → then replies. Then /v1/files reads all files → gzips the whole batch in memory → then replies. The client waits idle, decompresses everything, writes everything, then links. Server-disk-read, network, and client-disk-write never overlap.
  3. Wire bloat. gzip level 1 (low ratio) on the file payload; base64 (+33%) on the index entries; the full integrity list re-uploaded every install (hundreds of KB on a big tree); the lockfile sent in both directions.

Relevant code (at 2b788d53fd):

Evidence: a network-injecting benchmark

pnpr's tests run on loopback, where RTT ≈ 0 and bandwidth ≈ ∞ — which hides exactly these costs. So the first deliverable is a benchmark harness (pacquet/tasks/pnpr-benchmark) that puts a latency/bandwidth-injecting TCP proxy between the pnpr client and an in-process pnpr server, counts bytes each way, and sweeps RTTs. The slope of wall-time vs. RTT measures the number of serial round trips the protocol costs.

Baseline of the current /v1 protocol (hermetic fixtures, warm server store, unlimited bandwidth):

  client     rtt     wall(ms)     up(KiB)   down(KiB)   files
    cold     0ms         28.4         1.5         5.0       4
    cold    20ms        101.7         1.5         5.0       4
    cold    50ms        207.4         1.5         5.0       4
    cold   100ms        372.5         1.5         5.0       4
    cold  → wall-time rises ~3.43 ms per ms of RTT (≈ 3.4 serial round trips)

    warm     0ms         25.0         1.8         1.3       0
    warm    20ms         74.4         1.8         1.3       0
    warm    50ms        137.0         1.8         1.3       0
    warm   100ms        247.4         1.8         1.3       0
    warm  → wall-time rises ~2.21 ms per ms of RTT (≈ 2.2 serial round trips)

So at a 100 ms WAN RTT a cold install spends ~340 ms of its ~372 ms purely on round trips. The cold path costs ~3.4 RTTs (handshake + install + files + TCP connect); the warm path ~2.2 (no files to fetch, so /v1/files is skipped). This is the cost the redesign targets.
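The slope figures can be reproduced from the table itself. The sketch below assumes an ordinary least-squares fit of wall time against RTT; the harness may compute its slope differently, and the function name `serial_round_trips` is hypothetical. On the eight rows above, it gives about 3.43 for the cold client and about 2.21 for the warm one.

```rust
/// Ordinary least-squares slope of `wall_ms` against `rtt_ms`.
/// Both inputs are in milliseconds, so the slope is dimensionless and can be
/// read as "serial round trips per install".
/// Returns `None` when the slope is undefined: mismatched lengths, fewer than
/// two samples, or all RTTs identical.
fn serial_round_trips(rtt_ms: &[f64], wall_ms: &[f64]) -> Option<f64> {
    if rtt_ms.len() != wall_ms.len() || rtt_ms.len() < 2 {
        return None;
    }
    let n = rtt_ms.len() as f64;
    let mean_x = rtt_ms.iter().sum::<f64>() / n;
    let mean_y = wall_ms.iter().sum::<f64>() / n;

    let mut sxy = 0.0_f64; // covariance numerator
    let mut sxx = 0.0_f64; // variance numerator
    for (x, y) in rtt_ms.iter().zip(wall_ms) {
        sxy += (x - mean_x) * (y - mean_y);
        sxx += (x - mean_x) * (x - mean_x);
    }
    if sxx == 0.0 {
        None
    } else {
        Some(sxy / sxx)
    }
}
```

Calling it with `&[0.0, 20.0, 50.0, 100.0]` as the RTTs and one column of wall times from the table yields the slope for that client. The intercept, which this sketch does not compute, would be the RTT-independent cost of the install.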

Byte/compression deltas don't show on the tiny fixtures; the harness takes --registry/--deps to drive a real tree for production-scale byte numbers.

Run it with:

cargo run --release --bin pnpr-benchmark -- --rtt-ms 0,20,50,100 --iterations 7

The plan — four experiments, each measured vs. baseline

1. One streaming round trip (headline)

Fold /v1/files into the install response and stream file blobs inline (framed: file bytes, index entries, then lockfile + stats), and negotiate the protocol via a request header instead of the GET /-/pnpr handshake. Removes 2 RTTs and the redundant digest re-upload, and lets the client write files as bytes arrive. Expected: cold path slope ~3.4 → ~1 RTT.
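One way such a framed stream could look is sketched below. The layout (a 1-byte kind tag, a 4-byte big-endian length, then the payload) and the names `FrameKind`, `write_frame`, `read_frame`, and `MAX_FRAME_LEN` are all assumptions made for illustration; none of them comes from pnpr's code.

```rust
use std::io::{self, Read, Write};

/// Hypothetical cap so a corrupt length prefix cannot force a huge allocation.
const MAX_FRAME_LEN: usize = 64 * 1024 * 1024;

/// Hypothetical frame kinds, in the order the proposal streams them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FrameKind {
    File = 1,
    IndexEntry = 2,
    Trailer = 3, // lockfile + stats, sent last
}

impl FrameKind {
    fn from_byte(b: u8) -> io::Result<Self> {
        match b {
            1 => Ok(Self::File),
            2 => Ok(Self::IndexEntry),
            3 => Ok(Self::Trailer),
            other => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("unknown frame kind {other}"),
            )),
        }
    }
}

/// Writes one frame: 1-byte kind, 4-byte big-endian length, then the payload.
fn write_frame<W: Write>(out: &mut W, kind: FrameKind, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    out.write_all(&[kind as u8])?;
    out.write_all(&len.to_be_bytes())?;
    out.write_all(payload)
}

/// Reads the next frame, or returns `Ok(None)` on a clean end of stream.
fn read_frame<R: Read>(input: &mut R) -> io::Result<Option<(FrameKind, Vec<u8>)>> {
    let mut kind = [0u8; 1];
    if input.read(&mut kind)? == 0 {
        return Ok(None); // no more frames
    }
    let kind = FrameKind::from_byte(kind[0])?;
    let mut len = [0u8; 4];
    input.read_exact(&mut len)?;
    let len = u32::from_be_bytes(len) as usize;
    if len > MAX_FRAME_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "frame too large"));
    }
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok(Some((kind, payload)))
}
```

Because each frame is length-prefixed, the client can hand a `File` payload to its store writer as soon as that frame is complete, without waiting for the trailer.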

2. Better compression

Whole-stream compression instead of per-batch gzip level 1. Test higher gzip levels first (no new dep), then zstd (approved for the workspace) and compare ratio/speed. Targets wire bytes (the bandwidth-bound regime).

3. Bloom-filter store-state upload

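Replace the full integrity-list upload (~90 bytes/entry) with a compact probabilistic filter (~1–2 bytes/entry) of the client's store integrities. The server streams every file the filter reports as definitely absent from the client. A Bloom filter has no false negatives, so the server never re-sends a file the client already holds; a rare false positive instead makes the server skip a file the client actually lacks, and the client has to request those misses in a follow-up. Big upload win for warm clients with large stores.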
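A minimal std-only sketch of the filter idea follows. It assumes double hashing over `DefaultHasher` and a bits-per-entry budget of 8 to 16 (the 1–2 bytes/entry range); `BloomFilter` and `digests_to_stream` are hypothetical names, and a real implementation would need a hash that is stable across client and server builds. In this sketch a false positive means the server withholds a file the client lacks, so a fallback fetch for such misses is assumed.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter over integrity strings (illustrative only).
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl BloomFilter {
    fn new(expected_entries: usize, bits_per_entry: usize) -> Self {
        let num_bits = (expected_entries * bits_per_entry).max(64) as u64;
        // The optimal number of probes is about ln(2) * bits_per_entry.
        let num_hashes = (bits_per_entry as f64 * std::f64::consts::LN_2)
            .round()
            .max(1.0) as u32;
        Self {
            bits: vec![0; ((num_bits + 63) / 64) as usize],
            num_bits,
            num_hashes,
        }
    }

    /// Two base hashes; probe i uses h1 + i * h2 (double hashing).
    fn base_hashes(item: &str) -> (u64, u64) {
        let mut a = DefaultHasher::new();
        item.hash(&mut a);
        let mut b = DefaultHasher::new();
        (item, 0x9e37_79b9_u32).hash(&mut b);
        (a.finish(), b.finish() | 1) // force h2 odd so probes differ
    }

    fn insert(&mut self, item: &str) {
        let (h1, h2) = Self::base_hashes(item);
        for i in 0..u64::from(self.num_hashes) {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            self.bits[(bit / 64) as usize] |= 1u64 << (bit % 64);
        }
    }

    /// `false` means definitely absent; `true` means possibly present.
    fn may_contain(&self, item: &str) -> bool {
        let (h1, h2) = Self::base_hashes(item);
        (0..u64::from(self.num_hashes)).all(|i| {
            let bit = h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits;
            self.bits[(bit / 64) as usize] & (1u64 << (bit % 64)) != 0
        })
    }
}

/// Server side: keep only the digests the filter proves absent from the client.
fn digests_to_stream<'a>(needed: &[&'a str], client_store: &BloomFilter) -> Vec<&'a str> {
    needed
        .iter()
        .copied()
        .filter(|d| !client_store.may_contain(d))
        .collect()
}
```

The bits-per-entry budget trades upload size against the false-positive rate, so it is the knob to sweep when measuring this experiment against the baseline.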

4. Pipeline resolve → fetch → stream

Overlap the server's own upstream tarball fetch with streaming to the client, instead of fetching every tarball into the store before responding. Biggest cold-install win (when the server cache is also cold), most invasive.
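The overlap can be sketched with a bounded channel between a fetch stage and a stream stage. This is a std-only illustration using threads, not pnpr's actual server structure; `pipeline`, `fetch`, `stream`, and `in_flight` are hypothetical names, and a real server would more likely use async tasks and fetch several tarballs concurrently.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Runs `fetch` on a worker thread and `stream` on the caller's thread,
/// connected by a bounded channel, so streaming starts as soon as the first
/// tarball is ready instead of after the last one.
/// Returns the number of tarballs streamed.
fn pipeline<F, S>(packages: Vec<String>, in_flight: usize, fetch: F, mut stream: S) -> usize
where
    F: Fn(&str) -> Vec<u8> + Send + 'static,
    S: FnMut(&str, &[u8]),
{
    // The bound applies backpressure: the fetcher blocks once `in_flight`
    // fetched tarballs are waiting to be streamed.
    let (tx, rx) = sync_channel::<(String, Vec<u8>)>(in_flight);

    let fetcher = thread::spawn(move || {
        for name in packages {
            let bytes = fetch(&name);
            // A send error means the receiver hung up, so stop fetching.
            if tx.send((name, bytes)).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receive loop below.
    });

    let mut streamed = 0;
    for (name, bytes) in rx {
        stream(&name, &bytes);
        streamed += 1;
    }
    fetcher.join().expect("fetch thread panicked");
    streamed
}
```

With a single fetch thread the tarballs arrive in request order, which keeps the sketch deterministic; the bounded channel also caps how much fetched data sits in memory while the client connection is the bottleneck.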

Notes

  • The local two-store byte-copy is a test artifact; production is remote, so shared-store/hardlink shortcuts are out of scope.
  • Compatibility with the pnpm (TS) CLI and the /v1 protocol may be dropped; pacquet is the only client we need to keep working.

Written by an agent (Claude Code, claude-opus-4-8).
