Summary
pnpr currently stores, syncs, and serves package metadata as monolithic packument JSON blobs. We could make resolution faster by splitting the packument along its mutability boundary:
- Index doc (mutable, tiny): version strings + publish times + dist-tags + deprecated flags. ~40–80 bytes per version, so even @types/node (~2,000 versions) is ~100KB instead of a multi-MB abbreviated packument.
- Per-version docs (immutable): everything needed after a version is picked, i.e. dependencies, peerDependencies, optionalDependencies, bin, engines, os/cpu/libc, dist. Served with Cache-Control: immutable, cached forever on the client and in pnpr, never revalidated or re-parsed once cached.
Once a version is published, its manifest can never be replaced by a republish, so per-version docs are truly immutable. The only mutable state is the version list, time, dist-tags, and deprecation messages, all of which fit in the small index. Everything resolution needs before picking a version (range matching, minimumReleaseAge times, dist-tags) lives in the index; everything needed after picking is in the one version doc actually selected. Today the client downloads and parses the manifests of every version ever published in order to use one of them.
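To make the "before picking" half concrete, here is a minimal sketch of what an index doc and a pick over it might look like. All names (IndexEntry, IndexDoc, pick_version) are hypothetical and not taken from pnpr. The sketch assumes publish times are Unix seconds, uses a caller-supplied predicate in place of real semver range matching, picks the newest entry by publish order rather than the highest semver, and simply skips deprecated versions, which is stricter than a real resolver would likely be.

```rust
use std::collections::BTreeMap;

/// One entry of the hypothetical index doc: only what is needed *before*
/// a version is picked.
#[derive(Debug, Clone, PartialEq)]
struct IndexEntry {
    version: String,
    /// Publish time as Unix seconds (assumption for this sketch).
    published_at: u64,
    /// Deprecation message, if any. Lives here because it is mutable.
    deprecated: Option<String>,
}

/// The small, mutable half of the split packument.
#[derive(Debug, Clone, Default)]
struct IndexDoc {
    /// Kept in publish order; new versions are only ever appended.
    entries: Vec<IndexEntry>,
    dist_tags: BTreeMap<String, String>,
}

/// Pick the most recently published entry that matches `in_range`, is at
/// least `min_release_age_secs` old, and is not deprecated.
/// `in_range` stands in for semver range matching.
fn pick_version<'a>(
    index: &'a IndexDoc,
    in_range: impl Fn(&str) -> bool,
    now: u64,
    min_release_age_secs: u64,
) -> Option<&'a IndexEntry> {
    index
        .entries
        .iter()
        .rev() // newest first
        .filter(|e| e.deprecated.is_none())
        .filter(|e| now.saturating_sub(e.published_at) >= min_release_age_secs)
        .find(|e| in_range(e.version.as_str()))
}
```

Only after pick_version returns would the client need the single per-version doc for the chosen entry; no other manifest is fetched or parsed.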
Where the speedup lands
- Warm installs with stale TTL: the big win. Today a TTL-expired package means refetching/re-parsing the whole abbreviated doc (the client's two-line NDJSON metadata cache re-reads and parses the full body even on 304). With the split, revalidation touches only the tiny index, and version docs are permanently warm. This is the dominant repeat-install metadata cost, especially for the heavy tail (@types/node, typescript, aws-sdk packages).
- Cold installs: real but smaller, with a trap. Bytes drop a lot (fetch ~1–2KB per picked version instead of all versions), but a naive split doubles RTTs per package (index → pick → version doc), and the resolution frontier is serial in tree depth. On a realistic ~100ms/req link that can eat the win. Mitigations:
  - a combined range-aware response (GET /pkg?spec=^1.2.3 returns the index plus the winning version doc in one round trip), or
  - inline all version docs when the packument is small: most packages have few versions, so the split only needs to pay for the heavy tail.
- Note: on the /v1/resolve server-resolution path the client never fetches packuments at all, so there the split helps pnpr internally rather than on the wire.
- pnpr server-side. The resolver walk parses full packuments per package; per-version storage means parsing only what gets picked, and upstream revalidation becomes "diff the version list, append new version docs" instead of rewriting the whole blob.
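The cold-install trap described above is mostly arithmetic, so a small latency-only model may help. This is a sketch under stated assumptions: each level of the dependency tree must finish before the next starts, packages within a level are fetched fully in parallel, and transfer and parse time are ignored. The names Strategy, round_trips_per_package, and serial_latency_ms are illustrative only.

```rust
/// Metadata fetch strategies compared in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Strategy {
    /// Today: one request returns the whole packument.
    Monolithic,
    /// Naive split: fetch the index, pick, then fetch the version doc.
    NaiveSplit,
    /// Combined range-aware response: index plus winning version doc together.
    CombinedSplit,
}

/// Sequential round trips needed to resolve one package.
fn round_trips_per_package(strategy: Strategy) -> u32 {
    match strategy {
        Strategy::Monolithic | Strategy::CombinedSplit => 1,
        Strategy::NaiveSplit => 2,
    }
}

/// Latency-only lower bound for resolving a tree of the given depth:
/// levels are serial, packages within a level are assumed parallel.
fn serial_latency_ms(strategy: Strategy, tree_depth: u32, rtt_ms: u32) -> u32 {
    tree_depth * round_trips_per_package(strategy) * rtt_ms
}
```

With an assumed tree depth of 8 and a 100ms round trip, the model gives 800ms for the monolithic and combined strategies and 1600ms for the naive split, which is why the combined endpoint matters for keeping the cold path at one RTT per level.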
Cheap wins worth taking first (independent of the protocol change)
- pnpr sets no ETag/Cache-Control on packument responses (pnpr/crates/pnpr/src/server.rs, response assembly around line 1739). The client persists an etag, but pnpr never emits one, so client-side conditional GETs against pnpr can never get a 304.
- pnpr re-parses, re-abbreviates, and re-serializes the packument on every request (server.rs:1722–1737). Caching the serialized abbreviated bytes (keyed by the upstream validator) removes per-request CPU on hot packages with zero protocol change.
- Delta index sync. The version list is append-only, so the index can support ?since=<seq> (or be an append-only NDJSON log fetched with a Range request). Then revalidating @types/node transfers only versions published since the last fetch. This captures most of the split's warm-path benefit between pnpr and upstream too, where upstream only speaks full packuments.
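The first two cheap wins fit together naturally, and a rough sketch may make that clearer. The code below is not pnpr's actual handler: AbbreviatedCache, CachedBody, Reply, and respond are hypothetical names. It assumes the upstream validator is an opaque token without quotes, and that deriving the emitted ETag from it with an "abbr-" prefix is acceptable.

```rust
use std::collections::HashMap;

/// Serialized abbreviated packument plus the upstream validator it was built from.
struct CachedBody {
    upstream_validator: String,
    bytes: Vec<u8>,
}

/// What the handler would send back to the client.
#[derive(Debug, PartialEq)]
enum Reply {
    NotModified { etag: String },
    Full { etag: String, body: Vec<u8> },
}

#[derive(Default)]
struct AbbreviatedCache {
    by_package: HashMap<String, CachedBody>,
    /// Counts how often the expensive parse/abbreviate/serialize path ran.
    rebuilds: u32,
}

impl AbbreviatedCache {
    /// `build` is the expensive step; it runs only when the cached bytes are
    /// missing or were built from a different upstream validator.
    fn respond(
        &mut self,
        package: &str,
        upstream_validator: &str,
        if_none_match: Option<&str>,
        build: impl FnOnce() -> Vec<u8>,
    ) -> Reply {
        // Assumption: the ETag is derived directly from the upstream validator.
        let etag = format!("\"abbr-{upstream_validator}\"");
        if if_none_match == Some(etag.as_str()) {
            return Reply::NotModified { etag };
        }
        let stale = self
            .by_package
            .get(package)
            .map_or(true, |c| c.upstream_validator.as_str() != upstream_validator);
        if stale {
            self.rebuilds += 1;
            let cached = CachedBody {
                upstream_validator: upstream_validator.to_string(),
                bytes: build(),
            };
            self.by_package.insert(package.to_string(), cached);
        }
        let body = self.by_package[package].bytes.clone();
        Reply::Full { etag, body }
    }
}
```

A matching If-None-Match short-circuits before any cache lookup, and an unchanged validator serves the stored bytes without running build again, so hot packages pay the serialization cost once per upstream change.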
Design caveats
- Keep deprecated in the index, not the version doc. It's the one per-version field npm allows mutating, so keeping it in the index preserves true immutability of version docs.
- Keep dist.tarball in the version doc as-is: the lockfile verifier binds the tarball URL to the packument's dist.tarball, so derive-from-name shortcuts would touch that contract.
- This is a pnpr-specific protocol extension: the client needs capability detection and a fallback to standard corgi/full packuments for other registries.
- Benchmark with the integrated-benchmark latency proxy (--registry-latency-ms / --registry-bandwidth-mbps). On loopback the metadata RTT/byte savings are invisible. Cold installs remain tarball-dominated; the honest pitch for this design is repeat/CI installs and heavy-tail packages.
Suggested ordering
- Cheap wins 1 and 2 (ETag emission, cached serialized abbreviated bytes).
- The split itself, with the combined range-aware endpoint so the cold path stays at one RTT.
- Delta index sync as a follow-up.
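For the delta index sync follow-up, the ?since=<seq> idea can be sketched as an append-only log plus a mirror that remembers the last sequence number it saw. IndexLog and IndexMirror are hypothetical names, and the sketch covers only the append-only version list. Dist-tags and deprecation flags change in place, so they would need a separate mechanism that is not shown here.

```rust
/// Append-only version log. An entry's sequence number is its position + 1,
/// so seq 0 means "nothing seen yet".
#[derive(Default)]
struct IndexLog {
    versions: Vec<String>,
}

impl IndexLog {
    /// Append a newly published version and return its sequence number.
    fn append(&mut self, version: &str) -> u64 {
        self.versions.push(version.to_string());
        self.versions.len() as u64
    }

    /// Answer to `?since=<seq>`: the entries after `seq` plus the new head seq.
    fn since(&self, seq: u64) -> (&[String], u64) {
        let head = self.versions.len() as u64;
        let start = seq.min(head) as usize;
        (&self.versions[start..], head)
    }
}

/// Downstream copy (client, or pnpr mirroring an upstream that supports this).
#[derive(Default)]
struct IndexMirror {
    versions: Vec<String>,
    seq: u64,
}

impl IndexMirror {
    /// Pull only what was published since the last sync.
    /// Returns how many entries were transferred.
    fn sync(&mut self, upstream: &IndexLog) -> usize {
        let (delta, head) = upstream.since(self.seq);
        self.versions.extend_from_slice(delta);
        self.seq = head;
        delta.len()
    }
}
```

After the first full sync, each revalidation transfers only the versions published since the mirror's last seq, and an up-to-date mirror transfers nothing.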
Written by an agent (Claude Code, claude-fable-5).