Implement authorization-aware resolution cache for pnpr (RFC #11) #12699

Description

@zkochan

Background

pnpr disables its resolution cache for any request that carries client-provided upstream credentials, making authenticated installs ~2.8× slower than anonymous ones even when every dependency is public (pnpm/pnpm#12604).

The RFC Authorization-aware resolution cache for pnpr (pnpm/rfcs#11) proposes making the cache authorization-aware instead of all-or-nothing: classify each fetch route once before the request, fetch known-public routes anonymously and share them globally, and key private resolutions/metadata by the private access descriptor that produced them. This issue tracks the implementation work.

| Scenario (hot cache/store) | Time | Speedup vs. resolving directly |
| --- | --- | --- |
| Anonymous resolve | 0.523 s | ~2.3× faster |
| Authenticated resolve | 1.484 s | ~1.0× (no speedup) |

Goal: keep the privacy guarantee — one user's private resolution must never be served to another — while restoring the cache for the dominant public path and sharing private resolutions safely among callers who legitimately share access.

Core concepts (from the RFC)

  • Fetch route = registry URL + package name/scope + configured auth/private rules. Privacy is a property of the route, not of whether the request happened to carry a credential.
  • Private footprint = the set of private route identities whose metadata or resolve-time tarball data was actually fetched during a resolve. An empty footprint ⇒ fully public.
  • Private access descriptor = one abstraction with two derivations: proxied route → upstream-alias + generation (gate: caller may still use the alias); pnpr-hosted route → hosted route/access-policy identity (gate: caller still satisfies the package policy). Descriptor key inputs are HMACed with a server secret before use in cache keys / metadata namespaces.
  • Client auth identifies the caller to pnpr; it is never forwarded to third-party registries and is never digested as an upstream cache credential.

Actionable items

1. Route classification (Part 1)

  • Add route classification to pnpr config (pnpr/crates/pnpr/src/config.rs): built-in anonymous-public handling for unscoped packages on registry.npmjs.org (operator-disable/overridable), plus operator rules for public registries/scopes/package patterns.
  • Implement route precedence: (1) a public route wins and suppresses auth → (2) a pnpr-hosted private route uses the shared hosted access descriptor → (3) a proxied private route selects an authorized pnpr-managed upstream alias → (4) a private or unknown route with no auth available resolves anonymously, and only as a non-shareable private miss, if anonymous access is permitted; otherwise it fails closed. Never forward client auth upstream; never write a global public entry for path 4.
  • Implement pnpr-hosted route detection: normalize the request registry URL against pnpr's own public base URL and configured hosted aliases before considering proxied uplinks. Hosted classification wins before proxied uplink selection.
  • Treat unknown routes as private unless explicitly declared public.
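A minimal sketch of the precedence order above, assuming the facts about a route have already been gathered. `RouteFacts`, `RouteDecision`, and `classify` are hypothetical names for illustration, not existing pnpr types.

```rust
/// Outcome of classifying one fetch route (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum RouteDecision {
    /// Path 1: fetched anonymously, shareable globally.
    PublicAnonymous,
    /// Path 2: pnpr-hosted private route, gated by package policy.
    HostedPrivate,
    /// Path 3: proxied private route using a pnpr-managed alias.
    ProxiedPrivate { alias: String },
    /// Path 4a: anonymous fetch whose result must not be shared.
    PrivateMissAnonymous,
    /// Path 4b: nothing usable, refuse the fetch.
    FailClosed,
}

/// Hypothetical inputs computed once, before the request is made.
struct RouteFacts<'a> {
    /// Built-in or operator rule explicitly declares the route public.
    declared_public: bool,
    /// Registry URL normalizes to pnpr's own base URL or a hosted alias.
    pnpr_hosted: bool,
    /// Alias the caller is authorized to use for this route, if any.
    authorized_alias: Option<&'a str>,
    /// Operator permits anonymous access to private/unknown routes.
    anonymous_allowed: bool,
}

fn classify(facts: &RouteFacts<'_>) -> RouteDecision {
    if facts.declared_public {
        return RouteDecision::PublicAnonymous;
    }
    // Hosted classification is checked before proxied uplink selection.
    if facts.pnpr_hosted {
        return RouteDecision::HostedPrivate;
    }
    if let Some(alias) = facts.authorized_alias {
        return RouteDecision::ProxiedPrivate { alias: alias.to_string() };
    }
    // Unknown routes land here: private unless explicitly declared public.
    if facts.anonymous_allowed {
        RouteDecision::PrivateMissAnonymous
    } else {
        RouteDecision::FailClosed
    }
}

/// Only path 1 may ever produce a global public cache entry.
fn may_write_global_public_entry(decision: &RouteDecision) -> bool {
    matches!(decision, RouteDecision::PublicAnonymous)
}
```

Because every field defaults toward "private" when false or `None`, an unconfigured route can only end in a non-shareable miss or a closed failure.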

2. Team-owned upstream credential aliases

  • Add config for upstream credential aliases: each with a registry/scope/package route policy, upstream credential material (env var / secret store), generation/rotation metadata, and an access policy of which pnpr users may use it.
  • Reuse pnpr's existing caller identity (bearer-token-backed users), uplink auth config concepts, and registry package permission policy to authorize alias use. Add an optional named-group/team layer for group reuse.
  • pnpr (not the client) selects the alias when the caller is authorized, sends the alias's upstream credential, and records the alias identity in the footprint.
  • On rotation, bump alias generation so new resolves populate a new private namespace and old entries age out by TTL.
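As a rough illustration of the alias model above, the sketch below shows alias authorization, server-side selection, and a generation bump on rotation. `UpstreamAlias`, its fields, and `select_alias` are hypothetical; route-policy matching and the optional group layer are left out.

```rust
use std::collections::HashSet;

/// Hypothetical shape of one team-owned upstream credential alias.
struct UpstreamAlias {
    name: String,
    /// Bumped on every credential rotation.
    generation: u64,
    /// pnpr users allowed to use this alias (group layer omitted).
    allowed_users: HashSet<String>,
    /// Name of the env var holding the upstream credential (assumption).
    credential_env: String,
}

impl UpstreamAlias {
    /// Authorization gate: may this pnpr caller use the alias?
    fn authorizes(&self, user: &str) -> bool {
        self.allowed_users.contains(user)
    }

    /// Key input for the private namespace; it changes with the generation.
    fn namespace_input(&self) -> String {
        format!("alias:{}:gen:{}", self.name, self.generation)
    }

    /// Rotation swaps the credential source and bumps the generation, so new
    /// resolves populate a new namespace and old entries age out by TTL.
    fn rotate(&mut self, new_credential_env: &str) {
        self.credential_env = new_credential_env.to_string();
        self.generation += 1;
    }
}

/// pnpr, not the client, picks the first alias the caller is authorized for.
fn select_alias<'a>(aliases: &'a [UpstreamAlias], user: &str) -> Option<&'a UpstreamAlias> {
    aliases.iter().find(|alias| alias.authorizes(user))
}
```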

3. Private access descriptor

  • Define descriptor with a key input + authorization gate. Proxied: upstream-alias + generation / alias authorization. Hosted: hosted route/access-policy identity / package policy. Hosted descriptors are not keyed per caller and not keyed by bearer token.
  • HMAC descriptor key inputs with the server secret before use in cache keys or metadata namespaces. The server secret is generated/configured at startup via existing config plumbing.
  • Ensure descriptors are not derived from AuthHeaders::for_url_with_package, request Authorization headers, registry/scope config entries, or inline user:pass@host URL auth.
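One possible shape for the descriptor, sketched with the standard library only. The keyed MAC is passed in as a closure because std has no HMAC; a real implementation would presumably plug in HMAC-SHA-256 keyed with the server secret. `PrivateAccessDescriptor`, `key_input`, and `cache_key` are hypothetical names.

```rust
/// Hypothetical descriptor with the two derivations described above.
enum PrivateAccessDescriptor {
    /// Proxied route: upstream alias + generation.
    Proxied { alias: String, generation: u64 },
    /// pnpr-hosted route: hosted route/access-policy identity.
    Hosted { policy_id: String },
}

impl PrivateAccessDescriptor {
    /// Canonical key input. Nothing here comes from request headers,
    /// bearer tokens, or URL userinfo, so it is never keyed per caller.
    fn key_input(&self) -> Vec<u8> {
        match self {
            // Length-prefix the alias so distinct fields cannot collide.
            Self::Proxied { alias, generation } => {
                format!("proxied:{}:{}:{}", alias.len(), alias, generation).into_bytes()
            }
            Self::Hosted { policy_id } => {
                format!("hosted:{}:{}", policy_id.len(), policy_id).into_bytes()
            }
        }
    }

    /// Hex digest for cache keys and metadata namespaces.
    /// `keyed_mac` stands in for an HMAC keyed with the server secret.
    fn cache_key<M: Fn(&[u8]) -> Vec<u8>>(&self, keyed_mac: M) -> String {
        keyed_mac(&self.key_input())
            .iter()
            .map(|byte| format!("{:02x}", byte))
            .collect()
    }
}
```

Two callers authorized for the same alias and generation derive the same key, which is what lets them share private entries; the authorization gate is checked separately at lookup time.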

4. Thread caller identity & record routes

  • Thread resolved pnpr identity (AuthedCaller / Identity) into serve_resolve, handle_resolve, cache lookup, cache write, route selection, and pnpr-hosted access checks (server-internal API change, no pnpm wire-protocol change).
  • In the resolve path (pnpr/crates/pnpr/src/resolver/resolve.rs) and the pacquet npm/tarball fetchers, surface for every metadata fetch and resolve-time tarball fetch (including direct HTTP tarball deps fetched for manifest/integrity): the route identity, public/private flag, and selected descriptor digest. The hook must sit at the fetch/auth-selection layer (ResolutionObserver alone is insufficient).
  • Ensure fast paths that serve metadata from memory/disk without an HTTP request still use and record the route/cache scope that would have been used.
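The recording requirement could look roughly like the sketch below: one recorder per resolve, called from the fetch/auth-selection layer and from memory/disk fast paths alike. `RouteRecord` and `FootprintRecorder` are hypothetical names, and the string route identities are placeholders.

```rust
use std::collections::BTreeSet;

/// One recorded fetch: route identity, public/private flag, descriptor digest.
struct RouteRecord {
    route_id: String,
    private: bool,
    descriptor_digest: Option<String>,
}

/// Hypothetical per-resolve recorder.
#[derive(Default)]
struct FootprintRecorder {
    records: Vec<RouteRecord>,
}

impl FootprintRecorder {
    /// Called for every metadata fetch and resolve-time tarball fetch,
    /// including fast paths that skip the HTTP request.
    fn record(&mut self, route_id: &str, descriptor_digest: Option<&str>) {
        self.records.push(RouteRecord {
            route_id: route_id.to_string(),
            private: descriptor_digest.is_some(),
            descriptor_digest: descriptor_digest.map(str::to_string),
        });
    }

    /// Private footprint: the set of private route identities actually used.
    fn private_footprint(&self) -> BTreeSet<&str> {
        self.records
            .iter()
            .filter(|record| record.private)
            .map(|record| record.route_id.as_str())
            .collect()
    }

    /// An empty footprint means the resolve was fully public.
    fn is_fully_public(&self) -> bool {
        self.private_footprint().is_empty()
    }
}
```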

5. Reject inline URL auth

  • Reject dependency URLs and resolved tarball URLs whose parsed URL contains user:pass@host credentials, before the HEAD/GET request and before any resolution/metadata cache write. Do not convert them to Basic auth or use them as a credential identity.
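To make the check concrete, here is a std-only sketch that looks for userinfo in the authority part of a URL. The function names are hypothetical, and a real implementation would more likely inspect the already-parsed URL than re-scan the string.

```rust
/// Returns true when the authority component of `url` contains userinfo
/// (`user@host` or `user:pass@host`). An `@` in the path, query, or
/// fragment (for example a scoped package name) does not count.
fn has_inline_credentials(url: &str) -> bool {
    let rest = match url.split_once("://") {
        Some((_scheme, rest)) => rest,
        None => return false,
    };
    // The authority ends at the first '/', '?', or '#'.
    let authority_end = rest
        .find(|c: char| matches!(c, '/' | '?' | '#'))
        .unwrap_or(rest.len());
    rest[..authority_end].contains('@')
}

/// Gate to run before any HEAD/GET and before any cache write.
/// The error deliberately does not echo the URL, to avoid logging secrets.
fn check_fetch_url(url: &str) -> Result<(), &'static str> {
    if has_inline_credentials(url) {
        Err("inline URL credentials are not allowed")
    } else {
        Ok(())
    }
}
```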

6. Footprint + cache keying (Part 2) — pnpr/crates/pnpr/src/resolver.rs

  • Replace the auth_headers.is_empty() && request.lockfile.is_none() bypass. Compute a base candidate key for requests with or without an input lockfile; only a proven-fresh frozen lockfile may short-circuit before lookup (it runs no resolution).
  • Keep the current resolution-input key as the base key, but store one or more candidate entries under it. Each candidate carries its private footprint + HMAC digest of the descriptors that produced it. Public candidate = empty footprint, matches every caller; private candidate matches only when the request reproduces the same descriptor key inputs and passes every descriptor authorization gate.
  • After a resolve, compute the footprint from recorded routes; store an empty-footprint candidate for public resolutions or a descriptor-keyed candidate for private ones. (Footprint discovered during resolution is used only when writing, not for the first candidate lookup.)
  • For lockfile-seeded requests, include a stable digest of the canonical input lockfile plus lockfile behavior flags (frozenLockfile, preferFrozenLockfile, trustLockfile, manifest checks, trust policy) in the base key.
  • Store already-routed tarball URLs in cached resolutions so a private hit replays pnpr gateway URLs, never raw upstream URLs requiring server-owned credentials.
  • Bound the number of candidates per base key; evict expired/LRU private candidates first so generation rotation, route-policy changes, and unusual workspaces don't grow lookup cost without bound.
  • CachedResolution may need to carry its footprint so lookups can apply the revocation-validation path when enabled.
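The candidate model above might be sketched as follows. `Candidate`, `ResolutionCache`, and the `passes_gate` closure are hypothetical; insertion order stands in for LRU, and TTL expiry is omitted.

```rust
use std::collections::{BTreeSet, HashMap};

/// One cached resolution stored under a base key.
struct Candidate {
    /// Descriptor digests that produced this entry; empty means public.
    footprint: BTreeSet<String>,
    /// Stand-in for the cached resolution payload.
    resolution: String,
}

struct ResolutionCache {
    max_candidates: usize,
    entries: HashMap<String, Vec<Candidate>>,
}

impl ResolutionCache {
    /// `passes_gate(digest)` must return true only if the request reproduces
    /// that descriptor and the caller passes its authorization gate.
    /// A public candidate has no digests, so it matches every caller.
    fn lookup(&self, base_key: &str, passes_gate: impl Fn(&str) -> bool) -> Option<&str> {
        self.entries
            .get(base_key)?
            .iter()
            .find(|c| c.footprint.iter().all(|d| passes_gate(d.as_str())))
            .map(|c| c.resolution.as_str())
    }

    /// Stores a candidate and keeps the per-key list bounded.
    fn store(&mut self, base_key: &str, candidate: Candidate) {
        let list = self.entries.entry(base_key.to_string()).or_default();
        // Replace any candidate with an identical footprint.
        list.retain(|c| c.footprint != candidate.footprint);
        list.push(candidate);
        while list.len() > self.max_candidates {
            // Evict the oldest private candidate first, else the oldest entry.
            let victim = list.iter().position(|c| !c.footprint.is_empty()).unwrap_or(0);
            list.remove(victim);
        }
    }
}
```

Note that lookup never consults the footprint discovered during a resolve; it only checks stored candidates against what the request can reproduce, matching the write-only use of the discovered footprint.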

7. Tarball URL routing

  • When emitting streamed package frames and the returned lockfile, apply the same route decision: public/anonymous routes may keep upstream dist.tarball URLs; private proxied + unknown routes use pnpr read-only install gateway URLs; pnpr-hosted packages use pnpr-hosted tarball URLs.
  • Never return upstream Authorization values, upstream tokens, or private upstream URLs in resolver responses, streamed frames, or returned lockfiles.
  • Frozen/lockfile-seeded paths must preserve routing: if an existing lockfile contains registry-shaped entries or tarball URLs that don't match current route policy, return a freshly routed lockfile or force the client through the pnpr gateway. Clients must consume routed URLs, not reconstruct upstream URLs from local config.
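A small sketch of applying the route decision when emitting tarball URLs. The gateway and hosted URL layouts shown here are assumptions made for illustration; `RouteKind`, `TarballBases`, and `routed_tarball_url` are hypothetical names.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RouteKind {
    Public,
    ProxiedPrivate,
    Unknown,
    Hosted,
}

/// Base URLs pnpr would emit (hypothetical layout).
struct TarballBases<'a> {
    gateway: &'a str,
    hosted: &'a str,
}

/// Picks the URL that goes into streamed frames and the returned lockfile.
fn routed_tarball_url(
    kind: RouteKind,
    upstream_url: &str,
    bases: &TarballBases<'_>,
    name: &str,
    version: &str,
) -> String {
    // npm-style file name uses the unscoped part of the package name.
    let unscoped = name.rsplit('/').next().unwrap_or(name);
    let file = format!("{}-{}.tgz", unscoped, version);
    match kind {
        // Public/anonymous routes may keep the upstream dist.tarball URL.
        RouteKind::Public => upstream_url.to_string(),
        // Private proxied and unknown routes never expose the upstream URL.
        RouteKind::ProxiedPrivate | RouteKind::Unknown => {
            format!("{}/{}/-/{}", bases.gateway, name, file)
        }
        RouteKind::Hosted => format!("{}/{}/-/{}", bases.hosted, name, file),
    }
}
```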

8. Private metadata cache (lower mirror)

  • Add a metadata cache scope to the npm resolver fetch context: Public (current global mirror unchanged), Private { access_descriptor } (descriptor-scoped mirror path + matching in-memory/fetch-lock keys, e.g. metadata-private/<descriptor-id>/<registry>/<package>.jsonl), and Bypass (request-local only).
  • Compute scope per (registry, package) fetch and thread it through pick_package, fetch_full_metadata_cached, get_pkg_mirror_path, in-memory cache keys, fetch-lock keys, conditional request headers, prefer-offline reads, maturity/published-by shortcuts, verifier paths, and every disk fallback path. Do not approximate by swapping the whole cache_dir per request.
  • Fail closed on upstream 401/403: do not fall back to a stale global mirror or a different descriptor's mirror. Treat private-route 404 (hidden private packages) the same way. Only transport failures (5xx/timeout/network) may fall back, and only within the same namespace and freshness policy.
  • Apply the same metadata scope to lockfile verification / trust-check packument fetches (they don't add to the footprint but read/populate mirrors), to avoid a private-metadata oracle or auth-blind mirror cross-contamination.
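The three scopes and the fail-closed rule could be sketched like this. The private mirror layout follows the example path above; the public directory name `metadata` is an assumption, as are `MetadataScope`, `FetchFailure`, and both function names. Package-name encoding for scoped packages is not handled.

```rust
use std::path::{Path, PathBuf};

/// Cache scope chosen per (registry, package) fetch.
enum MetadataScope {
    Public,
    Private { descriptor_id: String },
    /// Request-local only: nothing is read from or written to disk.
    Bypass,
}

/// Mirror path for a scope, or None when the scope has no on-disk mirror.
fn mirror_path(cache_dir: &Path, scope: &MetadataScope, registry: &str, package: &str) -> Option<PathBuf> {
    let file = format!("{}.jsonl", package);
    match scope {
        // Assumed directory name for the existing global mirror.
        MetadataScope::Public => Some(cache_dir.join("metadata").join(registry).join(file)),
        MetadataScope::Private { descriptor_id } => Some(
            cache_dir
                .join("metadata-private")
                .join(descriptor_id)
                .join(registry)
                .join(file),
        ),
        MetadataScope::Bypass => None,
    }
}

enum FetchFailure {
    /// Upstream answered with this HTTP status.
    Status(u16),
    /// Timeout or network error.
    Transport,
}

/// Only transport-class failures may fall back, and then only to the mirror
/// of the same scope. 401, 403, and 404 always fail closed.
fn may_fall_back_to_mirror(failure: &FetchFailure) -> bool {
    match failure {
        FetchFailure::Status(status) => *status >= 500,
        FetchFailure::Transport => true,
    }
}
```

Deriving the path from the scope at each fetch, rather than swapping `cache_dir` per request, keeps mixed public/private resolves on the correct mirror for every package.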

9. Optional / later refinements (behind config)

  • Opt-in lightweight validation (single authenticated round trip) before serving an upstream private hit, for zero revocation window.
  • Opt-in lazy per-(registry, scope) probing to auto-detect public custom registries (anonymous first fetch, cache requires-auth verdict with TTL, use only configured aliases on auth-required, never client-forwarded upstream auth).
  • Operator-only metrics: cache hit/miss split by public vs. private footprint and tarball routing decisions. Candidate counts/footprint details must never be returned to clients.

Tests

  • Unscoped npmjs packages fetch without upstream auth and hit the shared cache even with a pnpr caller identity.
  • Private pnpr-hosted packages require caller package access, including when the client registry points at pnpr itself.
  • Private proxied packages require an authorized upstream alias and never use client-forwarded upstream auth.
  • Mixed public/private resolves use the correct metadata namespace per package fetch.
  • Invalid/unauthorized access misses and does not reuse private metadata mirrors.
  • 401/403 and private-route 404 do not fall back to broader mirrors.
  • Inline URL credentials are rejected before fetch.
  • Verifier metadata fetches cannot read or populate the wrong cache scope.
  • Different pnpr users authorized for the same alias share resolution and metadata cache entries.
  • Revoked alias/package access stops matching private hits.
  • Public tarballs keep upstream URLs while private/unknown tarballs are emitted as pnpr gateway URLs.
  • Cached private resolutions do not replay private upstream tarball URLs.
  • Lockfile/frozen paths do not bypass tarball route policy.
  • Custom private default registry is not treated as public.
  • Per-base-key candidate lists stay bounded.

Safety invariant

A private-footprint entry is keyed by the same private access descriptor that produced it; pnpr-managed alias entries are gated by alias access policy and pnpr-hosted entries by package access policy. Private data can never leak beyond the authorization that produced it. Unauthorized callers always observe a miss/fail-closed resolve; candidate counts and footprint details must not be returned to clients.

Risk areas

The footprint must be complete — a missed private fetch would mis-classify a resolution as public. Recording must cover every metadata fetch path, resolve-time tarball fetches (including direct HTTP tarball deps fetched for manifest/integrity), auth-blind metadata mirror fast paths, and uplink fallback ordering.

Written by agents (Claude Code, claude-opus-4-8; Codex, GPT-5).
