rfc(pnpr): auth-aware resolver cache and tarball routing#11
Conversation
Proposes classifying resolutions by a per-registry private footprint (fail-safe to private, known-public allowlist) and keying private entries by an HMAC of the credential that produced them, so public installs share the cache globally while private resolutions stay isolated and an invalid credential cannot unlock them. Addresses the authenticated-resolve slowdown in pnpm/pnpm#12604.
|
Warning Review limit reached
More reviews will be available in 28 minutes and 45 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a new RFC document proposing an authorization-aware resolution cache for pnpr. It defines route classification, descriptor-gated private caching, pnpr-managed uplinks, tarball URL routing, metadata scoping, implementation notes, and open questions. ChangesAuth-Aware Resolution Cache RFC
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Summary by QodoRFC: Authorization-aware resolution cache for pnpr Description
Diagram
High-Level Assessment
Files changed (1)
|
Replace the opaque per-tarball gateway URL scheme with per-uplink registry endpoints (https://<pnpr>/<uplink>) that clients point a scope at. Because the emitted tarball URLs are canonical for the configured registry, lockfile entries stay integrity-only and the host lives in client config, not the lockfile. A project then resolves to the same lockfile whether it goes through /resolve or is resolved client-side directly against the same registry config, so server-side resolution is a pure accelerator and never forks the lockfile. Also fold the separate upstream-credential-alias config into Verdaccio's existing uplink concept (url + credential + access policy + rotation generation), matched by registry origin rather than a per-credential package glob. Rotation generation becomes a server-side cache-namespacing detail and is no longer embedded in any client-visible URL. Add Alternative H documenting why the opaque-gateway-URL-in-lockfile approach was rejected (non-canonical URLs fork the lockfile across modes), and a lockfile-parity test expectation between /resolve and direct client-side resolution. Written by an agent (Claude Code, claude-opus-4-8).
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pnpr/text/0000-auth-aware-resolution-cache.md`:
- Around line 351-354: Clarify the tarball URL behavior for unknown routes in
the auth-aware resolution cache spec: in the “Unknown route” section, explicitly
state what URL is emitted when the route permits anonymous access. Update the
wording around the unknown-route resolution path to reference the tarball URL
emission behavior, and make the outcome unambiguous by saying whether it returns
the raw upstream URL, fails closed at URL emission, or requires an
operator-configured uplink before resolution can proceed.
- Around line 6-7: The speedup figures in the RFC are inconsistent between the
Summary and Motivation table. Update the Summary text and the table in this
document so the "~2.8× slower" and "~2.3× faster" numbers use the same baseline,
or explicitly label the different baselines if they refer to different
comparisons; check the affected prose around the authentication cache motivation
section and keep the figures consistent throughout.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: e9bea03e-620a-4e6f-bd03-689ce0f2d931
📒 Files selected for processing (1)
pnpr/text/0000-auth-aware-resolution-cache.md
📜 Review details
🧰 Additional context used
🪛 LanguageTool
pnpr/text/0000-auth-aware-resolution-cache.md
[grammar] ~372-~372: Ensure spelling is correct
Context: ...ution cache cannot be safe if the lower packument mirror is still auth-blind. Private met...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~391-~391: Ensure spelling is correct
Context: ...The chosen scope has to flow into every packument fast path: persistent mirror path, in-m...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~523-~523: Ensure spelling is correct
Context: ...oped-registry configuration pointing at pnpr uplink endpoints. The implementation do...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~525-~525: Ensure spelling is correct
Context: ...erver-side resolver integration, in the pacquet fetch path that chooses metadata/tarbal...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~574-~574: To elevate your writing, try using an alternative expression here.
Context: .... URL and package normalization still matter for route-policy and descriptor selecti...
(MATTERS_RELEVANT)
🔇 Additional comments (9)
pnpr/text/0000-auth-aware-resolution-cache.md (9)
372-372: Static analysis hints are false positives for domain-specific terminology.LanguageTool flags "packument" (lines 372, 391), "uplink" (line 525), and suggests replacing "matter" with "relevant" (line 574). "Packument" is established npm ecosystem terminology (package manifest document). "Uplink" is correct Verdaccio/proxy terminology used throughout this RFC. "Pacquet" (line 525) is the name of the Rust crate. "still matter" (line 574) is grammatically correct in context. These can be safely ignored.
Also applies to: 391-391, 525-525, 574-574
34-90: LGTM!
91-183: LGTM!
184-242: LGTM!
243-304: LGTM!
370-403: LGTM!
404-446: LGTM!
448-518: LGTM!
519-690: LGTM!
| client-provided upstream credentials, which makes authenticated installs ~2.8× | ||
| slower than anonymous ones even when every dependency is public. This RFC |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Inconsistent speedup figures between Summary and Motivation table.
The Summary states authenticated installs are "~2.8× slower than anonymous ones," but the Motivation table shows anonymous resolve as "~2.3× faster" vs. resolving directly, with authenticated at "~1.0× (no speedup)." The ~2.8× ratio (1.484/0.523 ≈ 2.84) and the ~2.3× figure don't align unless "resolving directly" has a different baseline. Clarify whether ~2.3× and ~2.8× refer to different baselines, or correct one figure for consistency.
Also applies to: 75-76
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pnpr/text/0000-auth-aware-resolution-cache.md` around lines 6 - 7, The
speedup figures in the RFC are inconsistent between the Summary and Motivation
table. Update the Summary text and the table in this document so the "~2.8×
slower" and "~2.3× faster" numbers use the same baseline, or explicitly label
the different baselines if they refer to different comparisons; check the
affected prose around the authentication cache motivation section and keep the
figures consistent throughout.
Handle private registries that serve package documents and tarballs on different domains (GitHub Packages, CodeArtifact, Artifactory/Azure with a separate asset CDN). The uplink endpoint already normalizes the client-facing tarball URL onto pnpr's own host, so lockfile entries stay integrity-only and parity holds. Add the security rule for pnpr's outbound fetch: a credential is attached only to fetches whose origin matches an explicitly configured uplink origin. A tarball host taken from a packument's dist.tarball is untrusted and never causes pnpr to forward the packument uplink's credential to it. A differing tarball origin must be configured explicitly (separate uplink or additional origin); an unconfigured private tarball origin is fetched without credentials and fails closed rather than leaking the token or acting as an open relay. Written by an agent (Claude Code, claude-opus-4-8).
A bare https://<pnpr>/<uplink> base shares pnpr's root path namespace with package names (/<name> is a packument), so an uplink named like a package collides. Prefix the endpoint path with ~ (https://<pnpr>/~<uplink>): ~ is a URL-unreserved character and an invalid leading character for an npm package name, so it never collides with /<name> or /@scope, and mid-string it does not trigger shell tilde expansion. Record the rejected alternatives in the open-questions list: a bare prefix (collides), + (mis-decoded as space in path segments), and $ (shell and .npmrc variable expansion), with the npm-reserved /-/uplink/<uplink>/ namespace as a clunkier fallback. Written by an agent (Claude Code, claude-opus-4-8).
Make explicit the requirement that the resolver recognize a request registry of https://<self>/~<uplink>/ as its own uplink endpoint and resolve through that uplink (credential, access, footprint) instead of treating it as a third-party upstream or recursively calling itself. This is what lets /resolve emit URLs already canonical for the client's configured endpoint. Tighten the routing failure mode: a fresh resolve of a private route whose scope points at the raw upstream (no usable credential-free URL) must fail closed with an actionable error naming the scope and the uplink endpoint, not emit an unfetchable upstream URL or forward client auth. Add the split-domain tarball-origin config question (separate uplink vs an additional allowlisted origin on the same uplink) to the open-questions list. Written by an agent (Claude Code, claude-opus-4-8).
First step of aligning the resolver with the auth-aware resolution cache RFC (pnpm/rfcs#11): an uplink can now carry an `access:` policy and a `generation`, folding the separate `upstreamAliases` concept into the existing `uplinks` config. An uplink that declares `access:` is eligible to back a proxied private route and be exposed at its `/~<name>/` registry endpoint; an uplink without `access:` stays registry-proxy only. No consumer reads the new fields yet; route classification and endpoint serving are wired up in follow-up commits.
Route classification now sources proxied-route credentials from `uplinks:` entries that declare an `access:` policy, in addition to the legacy `upstreamAliases` block. Uplink-sourced credentials are matched by registry origin (no package glob) and carry the uplink's generation; a plain proxy uplink without `access:` is never offered as a private-route credential. Part of aligning the resolver with the auth-aware resolution cache RFC (pnpm/rfcs#11).
A fetch to pnpr's own `/~<uplink>/` registry endpoint is now classified as a proxied route through that uplink, using the uplink's current generation (the URL carries none). Since a package name can never begin with `~`, such a URL is unambiguously an uplink endpoint, not a hosted package: an unauthorized caller or an unknown uplink fails closed rather than falling through to the hosted-package policy. This is what lets a `/resolve` request whose scope is configured at a `/~<uplink>/` endpoint classify and credential the route through the backing uplink instead of treating its own endpoint as a third-party upstream. Part of aligning the resolver with the auth-aware resolution cache RFC (pnpm/rfcs#11).
Expose each access-bearing uplink as a read-only registry endpoint at `/~<uplink>/`: packument (scoped and unscoped) and tarball reads route through the named uplink instead of the `packages.proxy` chain, gated by the uplink's own `access:` policy. Served packuments have their `dist.tarball` rewritten back onto the same endpoint, so a client that points a scope at `https://<pnpr>/~<uplink>/` gets canonical (integrity-only) lockfile entries. The endpoint is fetch-through: it reads the uplink's packument fresh for the version integrity and streams the tarball through a temp file verified against it, writing nothing to the shared proxy mirror. A private uplink's packuments and tarballs therefore never persist where the public path or another uplink could read them — closing the metadata/tarball leak that a shared, package-keyed mirror would open. Integration tests cover the endpoint rewrite, fail-closed access gating, and the no-persistence property. Part of aligning the resolver with the auth-aware resolution cache RFC (pnpm/rfcs#11).
Replace the opaque per-tarball gateway scheme with the uplink registry endpoints. The resolver now rewrites a proxied route's tarball to its `/~<uplink>/<package>/-/<file>` endpoint URL (canonical for a client whose scope is configured there, so the lockfile entry collapses to integrity-only), keeps a public/unknown route's upstream URL untouched, and reverses endpoint URLs back to upstream when verifying an input lockfile. Deleted: - the `/-/pnpr/v0/tarballs/alias|unknown/...` routes and their handlers; - `TarballGatewayRoutes` (the in-memory HMAC key -> URL map) and the unknown-route gateway machinery; - `GatewayAlias`/`gateway_alias` (replaced by `RouteContext::uplink_registry`); - the `upstreamAliases` config block and `UpstreamAlias` type — proxied-route credentials now come solely from access-bearing `uplinks:` entries, matched by registry origin. Tests across the route, resolver, config, and server suites are migrated from `upstreamAliases` to access-bearing uplinks and from gateway URLs to endpoint URLs; 518 pnpr tests pass. Part of aligning the resolver with the auth-aware resolution cache RFC (pnpm/rfcs#11).
Replace the "unknown route" model (resolve an off-policy registry anonymously as a non-shareable miss) with a default-deny allowlist: pnpr fetches only from operator-configured registries — the built-in npm route, public routes, uplinks (and their /~<uplink>/ endpoints), and pnpr itself — and rejects a client registry/namedRegistries matching none of them before any fetch. This closes the resolver's SSRF surface at the source (a caller can no longer point pnpr at cloud instance metadata or an internal service), strictly stronger than denylisting link-local/metadata hosts, and supersedes that approach (which stays useful only as defense-in-depth). It also removes the unknown-route resolve path, the non-shareable cache state, and the MetadataCacheScope Bypass tier: every resolved route is now public (globally shared) or carries a private access descriptor, so the cache has two states. The built-in npm route becomes host-level (scoped + unscoped) and treats an anonymous success as public; a private scoped npm package 404s anonymously and must be fronted by a uplink, so nothing private is silently shared. The auto-detecting-public-registries section is dropped (no unknown registry to probe), and a new Alternative records the denylist approach and why the allowlist is preferred. Written by an agent (Claude Code, claude-opus-4-8).
The allowlist is the SSRF boundary for every server-side fetch, not just the registry/namedRegistries fetch. Note that direct http(s)/git URL dependency specs, overrides leaves, and input-lockfile tarball URLs are checked the same way, so a caller cannot reach an off-allowlist host through the tarball or git resolver.
Drop the disable-the-built-in toggle: npmjs is always allowlisted and public with no configuration, and ahead of any uplink credential for the same origin.
…sidual The request-boundary allowlist covers the client's registries and direct-URL deps; a transitive dep on an off-allowlist URL fetched during the tree walk, and DNS rebinding, are both closed by the deferred connector-level IP guard.
… generation Replace the manual rotation-generation counter with an automatic digest of the uplink's credential: rotating the upstream credential re-keys the private cache namespace with no operator step to forget. The proxied descriptor key input becomes `uplink + credential-digest`.
Make the pnpr resolver cache authorization-aware and route private dependencies through per-uplink registry endpoints. Add route classification at the single fetch/auth-selection point: pnpr selects its own server-owned upstream credentials instead of forwarding client upstream auth, records a per-resolve footprint, and rejects inline URL credentials. A uplinks: entry that declares an access: policy becomes the private-route credential (folding in the former upstreamAliases block), matched by registry origin and exposed as a read-only registry endpoint at /~<uplink>/. access reuses bearer-token-backed pnpr identities, package access policy, and static groups; a SHA-256 digest of the uplink's credential participates in the footprint, so rotating the upstream credential automatically moves new resolves to a fresh namespace. The credential is attached only over a matching scheme (no token over plain http). Gate every server-side fetch behind a default-deny allowlist (built-in npm host, public routes, configured uplinks and their /~<uplink>/ endpoints, and pnpr itself). A registry/namedRegistries matching none is rejected at the request boundary, closing the resolver's SSRF surface at the source and superseding the link-local denylist. The boundary also covers direct-URL (http(s)/git, incl. scp-style) dependency specs, overrides, and lockfile tarballs; a `..` path segment is rejected; and the same allowlist re-validates every redirect hop. The official npm registry is a built-in host-level public route (scoped names included), so the npmjsPublic toggle is gone. With no off-allowlist route to resolve, every route is public or carries a private descriptor: RouteClass::Unknown, the non-shareable footprint tier, and the MetadataCacheScope::Bypass tier are removed. Transitive deps fetched during the tree walk are a connect-time-guard follow-up (#12705). Route resolver tarball URLs through those endpoints. A proxied route emits its /~<uplink>/ endpoint URL; public routes keep their upstream URL (fetched directly from the registry/CDN); pnpr-hosted packages use pnpr-hosted URLs. An endpoint URL is canonical for a client whose scope points there, so the lockfile entry stays integrity-only and the host comes from the client's registry config rather than the lockfile — a project resolves to the same lockfile through /resolve or a direct/proxied install. verification_lockfile reverses endpoint URLs to upstream for input-lockfile verification, and classification recognizes pnpr's own /~<uplink>/ URLs. The opaque per-tarball gateway scheme and its in-memory key->URL map are removed. Serve each access-bearing uplink as a /~<uplink>/ registry endpoint (packument + tarball, gated by the uplink access policy) with a private cache namespaced by an HMAC of (uplink, credential), so a private install caches like a public one, a rotation re-keys automatically, and a private uplink's content never lands in the shared mirror. Store bounded candidate lists under an auth-excluded base key instead of a single lockfile. Public candidates match every caller; private candidates carry the footprint and descriptor HMAC and are reused only when the caller still satisfies the stored uplink-or-hosted gates. Compute the base key for both no-lockfile and lockfile-seeded requests, using hash_lockfile() for input lockfiles, and evict expired/LRU private candidates before public ones. Record metadata fast-path routes into the footprint. The hook only fires at the auth-selection point, which the npm resolver's metadata fast paths (in-memory hit, offline disk read, version-spec exact match, publishedBy mtime shortcut) bypass. AuthHeaders::record_route drives the hook without a request, called up front in pick_package so every layer contributes to the footprint. Scope the npm metadata mirror by private access descriptor. A MetadataCacheScope (Public / Private) classified per (registry, package) fetch threads through pick_package, fetch_full_metadata_cached, the mirror path, in-memory/fetch-lock keys, and the verifier's local-mirror read. A private route stores its packument under v11/metadata-private/<descriptor-id>/. Fail closed on 401/403/private-404. The CLI installs no hook, so every fetch stays Public and the global mirror is unchanged. Related to #12699 and pnpm/rfcs#11.
Document the behaviors added during review that the RFC hadn't caught up to: - redirect hops are re-validated against the allowlist (an implemented guard, not part of the deferred connect-time residual — which is now just DNS-rebinding and transitive deps); - the boundary gates every scheme:// URL (not just http(s)/git) and scp-style git remotes, and rejects `.`/`..` path segments; - an uplink credential is attached only over a matching scheme (no token over plain http); - a registry-resolved package is classified by its registry route, not its dist.tarball host (so split-domain private tarballs route through /~<uplink>/), and an emitted public tarball URL is sanitized (userinfo + signed-URL query); - public-route rules fail closed on a typo.
Summary
Adds an RFC for pnpr's auth-aware resolver path: restoring resolution-cache sharing for public installs, safely sharing private resolver results for authorized callers, and ensuring resolved tarball URLs do not expose upstream credentials.
Today pnpr disables its resolution cache for any request that forwards upstream credentials, which makes authenticated installs ~2.8x slower than anonymous ones even when every dependency is public. See pnpm/pnpm#12604.
The RFC proposes:
401/403and private-route404responses must fail closed.Notes for reviewers
pnpr/text/0000-auth-aware-resolution-cache.md(number left as0000per the submission instructions).Summary by CodeRabbit
401/403.