RFC: Registry mounts for pnpr #13
📝 Walkthrough
This PR adds a new RFC document (pnpr/text/0000-pnpr-registry-mounts.md) defining a "registry mount" abstraction for pnpr, covering hosted-org, upstream, and router mount kinds, routing/validation invariants, lockfile registry-identity requirements, cache/policy keying, an implementation plan, and open questions. No code or exported entities are changed.
Sequence Diagram(s)
Not applicable: this PR is a documentation-only RFC addition with no executable control flow to visualize.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
Make explicit single-origin `~<mount>` addressing the default and confine cross-origin composition to two named, opt-in types: byte-equivalent mirror groups and a Verdaccio-style blended default (hosted-over-public, first-match shadow, integrity-backstopped, public-only redirect). Drop the dist.tarball-rewriting and client-synthesized namedRegistries alias machinery: serving each mount at its canonical path lets pnpm's existing tarball-URL reconstruction keep the host out of the lockfile, so lockfiles are already portable. The one new requirement is recording registry identity in package identity so the same name@version from two mounts cannot collide. Strengthen the motivation with the concrete pnpm gap (registryName dropped from the package key today) and add vlt's DepID tuple as prior art for registry as a first-class component of package identity. Make the default mount configurable with no implicit hosted uplink, and gate publish-to-root to hosted defaults.
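The collision described above can be illustrated with a small sketch. It assumes a hypothetical `PackageKey` type and placeholder integrity strings; neither is taken from pnpm's or pnpr's actual lockfile code. The only point is that once the registry is part of the key, the same `name@version` from two mounts yields two entries instead of one.

```rust
use std::collections::HashMap;

/// Hypothetical lockfile package key. The registry is part of identity,
/// so `name@version` alone never identifies a package.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PackageKey {
    /// Declared registry (mount) name; the representation is an assumption.
    pub registry: String,
    pub name: String,
    pub version: String,
}

impl PackageKey {
    pub fn new(registry: &str, name: &str, version: &str) -> Self {
        Self {
            registry: registry.to_string(),
            name: name.to_string(),
            version: version.to_string(),
        }
    }
}

/// Index integrity strings by full package identity.
/// Entries that differ only in registry stay distinct.
pub fn index_packages(entries: &[(PackageKey, &str)]) -> HashMap<PackageKey, String> {
    entries
        .iter()
        .map(|(key, integrity)| (key.clone(), integrity.to_string()))
        .collect()
}
```

With a key of only `(name, version)`, the second insert would overwrite the first, which is the silent merge the motivation section describes.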
Replace the deployment-wide publicFallback that auto-derived a /~~<mount> endpoint for every private mount with a per-mount blendedFallback. A mount gains a blended (dependency-confusion-bearing) endpoint only when it names the public mount it overlays, so the surface is created exactly where an operator asks for it, never blanket-exposed by a single global setting. Naming the fallback per mount also lets different private mounts overlay different public fallbacks.
Collapse composition to a single rule: provenance is declared, never inferred, and no configuration can express a cross-origin fall-through (not on not-found, not on unavailable). Remove mirror groups, blended /~~ endpoints, blendedFallback, and existence-based overlay; make upstream exactly one URL with no secondary endpoints; introduce an authoritative router mount that maps name patterns to one concrete source. A matched source is final: not-found returns not-found, a down source returns an error (never a 404), and the router never consults another source. This closes the dependency-confusion class by construction rather than mitigating it. Outage resilience comes from pnpr's cache. Lockfiles record the concrete resolved source so a later route edit cannot relocate a locked package. Update prior art (Nexus group / Artifactory virtual merge, so their names are not reused; conda strict channel priority as select-one prior art) and the rejected alternatives (existence fallback, mirror/multi-endpoint failover) with the integrity-guards-bytes-not-origin reasoning.
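The "matched source is final" rule can be sketched as follows. All type and function names here (`SourceAnswer`, `Outcome`, `Route`, `resolve`) are hypothetical, and the route matcher is reduced to a plain function pointer; this is an illustration of the rule, not pnpr's router.

```rust
/// What a concrete source (hosted org or upstream) answered. Hypothetical type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SourceAnswer {
    Found(String), // e.g. a packument body
    NotFound,
    Unavailable,
}

/// What the router returns to the client. Hypothetical type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Outcome {
    Ok(String),
    NotFound,      // reported as 404
    UpstreamError, // reported as an error status, never as 404
}

pub struct Route<'a> {
    /// Stand-in for a package-name pattern.
    pub matches: fn(&str) -> bool,
    /// Name of the one concrete source this route maps to.
    pub source: &'a str,
}

/// First match wins and is final: no other source is ever consulted,
/// whether the matched source says "not found" or is down.
pub fn resolve(
    routes: &[Route<'_>],
    name: &str,
    fetch: impl Fn(&str, &str) -> SourceAnswer,
) -> Outcome {
    match routes.iter().find(|route| (route.matches)(name)) {
        None => Outcome::NotFound,
        Some(route) => match fetch(route.source, name) {
            SourceAnswer::Found(body) => Outcome::Ok(body),
            SourceAnswer::NotFound => Outcome::NotFound,
            SourceAnswer::Unavailable => Outcome::UpstreamError,
        },
    }
}
```

Note that `fetch` is called at most once per request, so a cross-origin fall-through cannot be expressed in this shape at all.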
Because routing is first-match-in-order, route order is load-bearing: a misordered router (e.g. a catch-all before a narrower private route) can silently send a private scope to a public origin — the one way a config mistake reintroduces the cross-origin hazard the model otherwise forbids. Require pnpr to validate routers at config load and refuse to start (and fail a reload) on an unreachable route, duplicate patterns, or an unknown/self source. Add the matching implementation step and test.
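A minimal sketch of such a config-load check, assuming a simplified four-variant pattern type and hypothetical error names (the real pattern language and error types may differ). A route is unreachable when an earlier pattern covers it, and a source must be a known, concrete (non-router) mount, which also rules out a router naming itself.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the router pattern language.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PackagePattern {
    All,           // `**`
    AnyScoped,     // `@*/*`
    Scope(String), // `@scope/*`, storing "scope"
    Exact(String), // an exact package name
}

fn scope_of(name: &str) -> Option<&str> {
    name.strip_prefix('@')?.split_once('/').map(|(scope, _)| scope)
}

impl PackagePattern {
    /// True when every name matched by `other` is also matched by `self`.
    pub fn covers(&self, other: &PackagePattern) -> bool {
        use PackagePattern::*;
        match (self, other) {
            (All, _) => true,
            (AnyScoped, AnyScoped) | (AnyScoped, Scope(_)) => true,
            (AnyScoped, Exact(name)) => name.starts_with('@'),
            (Scope(a), Scope(b)) => a == b,
            (Scope(a), Exact(name)) => scope_of(name) == Some(a.as_str()),
            (Exact(a), Exact(b)) => a == b,
            _ => false,
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MountKind {
    HostedOrg,
    Upstream,
    Router,
}

/// Hypothetical error type; indices refer to positions in the route list.
#[derive(Debug, PartialEq, Eq)]
pub enum ConfigError {
    DuplicatePattern(usize),
    UnreachableRoute { route: usize, shadowed_by: usize },
    UnknownSource(usize),
    NonConcreteSource(usize), // the router itself or another router
}

pub fn validate_router(
    routes: &[(PackagePattern, &str)],
    mounts: &HashMap<&str, MountKind>,
) -> Result<(), ConfigError> {
    for (i, (pattern, source)) in routes.iter().enumerate() {
        match mounts.get(source) {
            None => return Err(ConfigError::UnknownSource(i)),
            Some(MountKind::Router) => return Err(ConfigError::NonConcreteSource(i)),
            Some(_) => {}
        }
        for (j, (earlier, _)) in routes[..i].iter().enumerate() {
            if earlier == pattern {
                return Err(ConfigError::DuplicatePattern(i));
            }
            if earlier.covers(pattern) {
                return Err(ConfigError::UnreachableRoute { route: i, shadowed_by: j });
            }
        }
    }
    Ok(())
}
```

In this sketch a catch-all placed before `@acme/*` is reported as shadowing it, so the misordering fails at load time rather than silently routing a private scope to a public origin.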
Correct the misleading framing that npm requires a registry at the domain root. npm treats any configured base URL as a registry, so every ~<mount>/ is already a complete registry root and mounts are not sub-registries under a privileged real root. The path-less base (bare host, no mount in the path) is therefore optional: a deployment may omit defaultTarget and expose only ~<mount>/ URLs. The default target exists only to give the path-less base a meaning when wanted, and an unqualified publish to it is allowed only when it resolves to a hosted org.
🧹 Nitpick comments (3)
pnpr/text/0000-pnpr-registry-mounts.md (3)
333-333: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Clarify "contradictory patterns".
The term "contradictory patterns" is undefined. Does this mean:
- Two routes with identical patterns?
- Overlapping patterns where neither fully covers the other?
- Patterns that syntactically cannot match any package name?
Specifying the intended cases would help implementers. If this is intentionally left open, consider adding "(e.g., identical patterns, or patterns that cannot both match any valid npm package name)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pnpr/text/0000-pnpr-registry-mounts.md` at line 333, Clarify the ambiguous phrase in the routing guidance by updating the rule in the registry-mounts document to define what “contradictory patterns” means, using examples like identical patterns or patterns that cannot both match any valid npm package name. Keep the wording aligned with the surrounding router constraints so implementers can tell whether the intent is exact duplicates, overlapping matches, or syntactically impossible patterns.
192-196: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Clarify comment on Router source restriction.
The comment at line 194 reads "never another Router member by existence" — "by existence" appears to be residual phrasing from an earlier draft that discussed existence-based fallback. Rephrase for clarity:
- source: MountId, // a HostedOrg or Upstream mount; never another Router member by existence
+ source: MountId, // a HostedOrg or Upstream mount; never another Router
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pnpr/text/0000-pnpr-registry-mounts.md` around lines 192 - 196, The comment on Route::source is unclear because “by existence” is leftover draft wording; update the inline comment on the source field to clearly state the restriction without that phrase, keeping the intent that the source can only be a HostedOrg or Upstream mount and not another Router member.
658-662: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Clarify "pacquet" terminology.
The implementation plan mentions "pacquet" at lines 660-661 without introduction. If this refers to the Rust pnpm-compatible implementation, consider adding a clarifying parenthetical on first use, or verify that the audience will recognize this term. If it's a typo for "pnpm", correct it.
- 8. Add registry identity to pnpm/pacquet lockfile package identity so the same
+ 8. Add registry identity to pnpm (and its Rust implementation pacquet) lockfile package identity so the same
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pnpr/text/0000-pnpr-registry-mounts.md` around lines 658 - 662, The plan uses “pacquet” without context, so clarify the terminology in the affected section by either correcting it if it was meant to say “pnpm” or adding a brief parenthetical on first use to identify pacquet as the Rust pnpm-compatible implementation. Update the wording in the same paragraph that mentions the reader, writer, and installer so the term is unambiguous and consistent throughout the document.
📒 Files selected for processing (1)
pnpr/text/0000-pnpr-registry-mounts.md
📜 Review details
🧰 Additional context used
🪛 LanguageTool
pnpr/text/0000-pnpr-registry-mounts.md
[grammar] ~22-~22: Ensure spelling is correct
Context: ...ting them. Outage resilience comes from pnpr's own cache, never from trying a differen...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~26-~26: Ensure spelling is correct
Context: ...L reconstruction rather than persisting pnpr URLs. The one genuinely new lockfile re...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~39-~39: Ensure spelling is correct
Context: ...ame. pnpm now has named registries and pnpr already has origin-qualified `/~<uplink...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~575-~575: Consider an alternative for the overused word “exactly”.
Context: ...public origin of the same name. That is exactly the dependency-confusion attack. Declar...
(EXACTLY_PRECISELY)
🔇 Additional comments (9)
pnpr/text/0000-pnpr-registry-mounts.md (9)
1-30: LGTM!
61-87: Accurate and concrete motivation with clear evidence.
The pnpm code reference at lines 71-74 precisely identifies the gap. The explanation that "pnpm does not error, it silently treats them as the same package" (lines 80-81) correctly characterizes the severity: this is a silent data corruption risk, not a loud failure.
211-235: LGTM!
236-257: LGTM!
276-306: Strong security model with clear operational semantics.
The distinction between "not found" (authoritative, return 404) and "unavailable" (return error, never 404) at lines 288-293 is the correct behavior to prevent dependency confusion through downstream fall-through. This aligns with the upstream contract from auth-aware-resolution-cache.md that rejects unknown routes at the request boundary.
343-401: LGTM!
402-495: Well-reasoned lockfile design with correct prior art integration.
The pnpm code references at lines 439-444 and 446-451 accurately support the claim that lockfiles can remain deployment-portable. The vlt DepID model at lines 480-483 provides concrete prior art for registry-qualified package identity. The open question about name→URL mapping at install time is appropriately tracked in Unresolved Questions.
496-556: LGTM!
570-581: Correct and nuanced threat model analysis.
The explicit recognition that "Integrity does not rescue this" (line 579) demonstrates correct understanding of the dependency confusion attack: integrity checks verify content against metadata but cannot validate that the metadata came from the intended origin. This is a common misconception that the RFC correctly avoids.
Each mount is independently addressable at its own ~<mount>/ URL, so a private package's access policy must live on the concrete source that holds it — a router or default target cannot protect it, because the source is reachable directly. Document that access composes as an AND (router policy if any, plus the resolved source policy, which always applies by every path), that a router policy gates only the router entry point, and that a fully private deployment must gate every mount. Add the npm private-plus-public worked example, the 404-not-403 rule for unauthorized private packages, and matching tests.
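The AND composition can be sketched in a few lines. The `Access` type and `authorize` function are hypothetical, and masking every denial as 404 (including a denial at the router entry point) is an assumption of this sketch; the text above states the 404-not-403 rule only for unauthorized private packages.

```rust
/// Hypothetical access policy: public, or restricted to named callers.
pub enum Access<'a> {
    Public,
    Only(&'a [&'a str]),
}

impl Access<'_> {
    fn permits(&self, caller: Option<&str>) -> bool {
        match self {
            Access::Public => true,
            Access::Only(users) => caller.map_or(false, |c| users.contains(&c)),
        }
    }
}

/// Returns the HTTP status for a read.
/// `router` is `None` when the source is addressed directly at its own
/// mount URL; the source policy applies on every path.
pub fn authorize(router: Option<&Access<'_>>, source: &Access<'_>, caller: Option<&str>) -> u16 {
    let router_ok = router.map_or(true, |policy| policy.permits(caller));
    if router_ok && source.permits(caller) {
        200
    } else {
        404 // masked as not-found rather than 403
    }
}
```

Because the source policy is checked regardless of entry point, placing a public router in front of a private source does not widen access, and a private router in front of a public source protects only the router's own URL.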
Define the hosted-org mount by a HostedBackend interface (produce packument and tarball for name@version; publish optional) instead of assuming pnpr owns the storage. pnpr ships a default content-addressed storage backend that accepts writes; the same mount kind can be backed by an external read-only projection that stores no npm packages of its own. bit.cloud is the motivating case — its registry projects a component object store (on-demand tarballs, snap->semver packuments with a componentId field, publish via its own export flow) — and everything around the backend (addressing, authorize-at-the-source, cache, integrity, lockfile registry identity) is identical regardless of backend. Update the enum, implementation step, cache wording, and tests.
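One way to picture the backend split is a trait with an optional `publish`. The trait name comes from the text above, but the method signatures, the `MemoryStore` and `Projection` types, and the use of a plain version list as a stand-in for a packument are all assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum PublishError {
    ReadOnly,
}

pub trait HostedBackend {
    /// Stand-in for a packument: just the list of published versions.
    fn packument(&self, name: &str) -> Option<Vec<String>>;
    fn tarball(&self, name: &str, version: &str) -> Option<Vec<u8>>;
    /// Publishing is optional; the default rejects writes.
    fn publish(&mut self, _name: &str, _version: &str, _bytes: Vec<u8>) -> Result<(), PublishError> {
        Err(PublishError::ReadOnly)
    }
}

/// A writable store that owns its package bytes.
#[derive(Default)]
pub struct MemoryStore {
    pub tarballs: HashMap<(String, String), Vec<u8>>,
}

impl HostedBackend for MemoryStore {
    fn packument(&self, name: &str) -> Option<Vec<String>> {
        let mut versions: Vec<String> = self
            .tarballs
            .keys()
            .filter(|(n, _)| n.as_str() == name)
            .map(|(_, v)| v.clone())
            .collect();
        versions.sort();
        if versions.is_empty() { None } else { Some(versions) }
    }

    fn tarball(&self, name: &str, version: &str) -> Option<Vec<u8>> {
        self.tarballs.get(&(name.to_string(), version.to_string())).cloned()
    }

    fn publish(&mut self, name: &str, version: &str, bytes: Vec<u8>) -> Result<(), PublishError> {
        self.tarballs.insert((name.to_string(), version.to_string()), bytes);
        Ok(())
    }
}

/// A read-only projection that stores no tarballs and builds them on demand.
pub struct Projection {
    pub components: HashMap<String, Vec<String>>, // name -> versions
}

impl HostedBackend for Projection {
    fn packument(&self, name: &str) -> Option<Vec<String>> {
        self.components.get(name).cloned()
    }

    fn tarball(&self, name: &str, version: &str) -> Option<Vec<u8>> {
        let versions = self.components.get(name)?;
        versions
            .iter()
            .any(|v| v.as_str() == version)
            .then(|| format!("{name}@{version}").into_bytes())
    }
}
```

Everything outside the trait (addressing, authorization, caching) would see the two backends identically; only `publish` distinguishes them.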
Model every addressable registry origin in pnpr as a registry mount, exposed at /~<mount>/: a pnpr-hosted organization registry, a single-origin upstream registry, and a router mapping package-name patterns to one concrete source. Provenance is declared, never inferred — a package resolves to exactly one declared concrete origin, and no configuration can express a cross-origin fall-through (not on "not found", not on "unavailable"). This is a full replacement of the legacy Verdaccio-shaped model, not an additive layer. Implements the pnpr side of the RFC only; the lockfile registry-identity changes for the TypeScript CLI and pacquet are out of scope.

New `mount` module (the routing core): a decidable PackagePattern language (`**`, `@*/*`, `@scope/*`, exact) with a covers() superset relation; first-match authoritative resolution; and Mounts::validate, which rejects at config load and reload shadowed/unreachable routes (including a non-last **), duplicate patterns, and sources that are unknown, self-referential, or another router.

Config: `mounts:` (hostedOrg/upstream/router) and `defaultTarget:` are the only routing surface. The legacy `uplinks:`, `packages: proxy:` fallback chains, and `resolve_uplinks` are removed; `packages:` is now an access-control layer only.

Serving and writes: every path-less request and every write (publish, dist-tag, unpublish) routes through the mount graph — the hosted-first-then-proxy path and the multi-uplink tarball fallback are gone. A router that matches no route is a definitive 404 (no fall-through); a matched-but-unavailable source returns an error, never a 404. The path-less base aliases the default-target mount; with no default target the bare host has no registry. Served dist.tarball URLs stay canonical for the base the client addressed (path-less vs /~<mount>/), so lockfiles drop them rather than baking in a mount name. The per-package access ACL is enforced centrally on every mount-served read, regardless of whether the package routes to a hosted org or an upstream.

Hosted-org storage and publishing: each hosted-org mount has its own storage namespace (local dir and S3/R2; an empty `org` is the flat root) so two orgs hosting the same `name@version` cannot collide. Writes route into the resolved org's namespace; a write routed to an upstream is rejected. The org is threaded through staging, commit, and the publish journal (org-namespaced roll-forward) so crash recovery promotes into the right org. Unauthorized reads of a private hosted org return 404, not 403.

Cache: a public upstream mount uses a stable, secret-free cache namespace (`~public/<digest>`), shared across process restarts; a private mount keeps the credential-and-secret-keyed namespace. The now-unused shared-mirror cache and conditional-GET validator machinery are deleted.

The bundled config.yaml is the registry-mock shape in mount form (fixture scopes to a flat-namespace hosted org, ** to npmjs), so registry-mock keeps working with no task or fixture-seed changes. The integrated-benchmark cold-mock config is converted to the mount model.
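The point about `dist.tarball` URLs staying canonical for the addressed base can be sketched as below. It relies on the common npm layout `<base>/<name>/-/<unscoped-name>-<version>.tgz`; the host `pnpr.example` and both function names are hypothetical, and this is not pnpr's serving code.

```rust
/// Registry base for a request: the bare host for the path-less alias,
/// or `<host>/~<mount>` when a mount is addressed explicitly.
pub fn base_for(host: &str, mount: Option<&str>) -> String {
    let host = host.trim_end_matches('/');
    match mount {
        Some(m) => format!("{host}/~{m}"),
        None => host.to_string(),
    }
}

/// Canonical tarball URL under a given base, following the usual npm
/// layout: <base>/<name>/-/<unscoped-name>-<version>.tgz
pub fn tarball_url(base: &str, name: &str, version: &str) -> String {
    let unscoped = name.rsplit('/').next().unwrap_or(name);
    format!("{}/{}/-/{}-{}.tgz", base.trim_end_matches('/'), name, unscoped, version)
}
```

Because the URL is a pure function of the base, name, and version, a client that already knows its registry base can reconstruct it, which is why the lockfile does not need to store it.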
…router patterns
- server: a hosted mount's `access` list now gates the write routing (publish, batch publish, dist-tag add/remove, unpublish, packument update) exactly as it gates reads: `resolve_publish_target` resolves the org through `accessible_hosted_namespace` (the renamed `readable_hosted_namespace`) with the caller's identity, and a denied caller gets the same not-found mask as on a read. Previously the org was looked up without the identity, so any caller passing the default `$authenticated` per-package publish policy could publish into — or retag and unpublish from — a private hosted mount (RFC pnpm/rfcs#13, implementation point 9).
- server: `/-/v1/search` scans the flat hosted store, so it now requires the caller to pass the access list of a hosted mount serving the flat root (`org: ""`) before scanning. A private flat-root mount's packages were 404-masked on packument reads yet enumerable by name, version, description, and maintainers through search, which filtered only on the per-package ACL (default `$all`).
- mount: a wildcard-free router pattern must now be a well-formed package name. A typo like `@acme` (meaning `@acme/*`) parsed as a literal that can never match any request, silently letting the scope fall to a later catch-all route. New `ExactPatternNotAName` config error names the offending pattern and suggests `@scope/*`.
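The `@acme` typo case can be illustrated with a small parser sketch. The error name `ExactPatternNotAName` is taken from the commit message; the `Unsupported` variant, the parser itself, and the deliberately simplified name check (lowercase letters, digits, `-`, `.`, `_`, no leading `.` or `_`) are assumptions and are looser than npm's real naming rules.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PackagePattern {
    All,           // `**`
    AnyScoped,     // `@*/*`
    Scope(String), // `@scope/*`
    Exact(String), // exact package name
}

#[derive(Debug, PartialEq, Eq)]
pub enum PatternError {
    /// A wildcard-free pattern that is not a well-formed package name.
    ExactPatternNotAName(String),
    /// A wildcard form outside the supported set (hypothetical variant).
    Unsupported(String),
}

// Simplified segment check, not npm's full validation.
fn is_segment(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('.')
        && !s.starts_with('_')
        && s.chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || matches!(c, '-' | '.' | '_'))
}

fn is_package_name(s: &str) -> bool {
    match s.strip_prefix('@') {
        Some(rest) => rest
            .split_once('/')
            .map_or(false, |(scope, name)| is_segment(scope) && is_segment(name)),
        None => is_segment(s),
    }
}

pub fn parse_pattern(raw: &str) -> Result<PackagePattern, PatternError> {
    match raw {
        "**" => Ok(PackagePattern::All),
        "@*/*" => Ok(PackagePattern::AnyScoped),
        _ if raw.contains('*') => {
            // The only remaining wildcard form is `@scope/*`.
            match raw.strip_prefix('@').and_then(|r| r.strip_suffix("/*")) {
                Some(scope) if is_segment(scope) => Ok(PackagePattern::Scope(scope.to_string())),
                _ => Err(PatternError::Unsupported(raw.to_string())),
            }
        }
        _ if is_package_name(raw) => Ok(PackagePattern::Exact(raw.to_string())),
        _ => Err(PatternError::ExactPatternNotAName(raw.to_string())),
    }
}

impl PackagePattern {
    pub fn matches(&self, name: &str) -> bool {
        match self {
            PackagePattern::All => true,
            PackagePattern::AnyScoped => name.starts_with('@'),
            PackagePattern::Scope(s) => name
                .strip_prefix('@')
                .and_then(|r| r.split_once('/'))
                .map_or(false, |(scope, _)| scope == s.as_str()),
            PackagePattern::Exact(e) => e.as_str() == name,
        }
    }
}
```

Rejecting `@acme` at parse time turns a route that could never match into a load-time error instead of a silent fall to the catch-all.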
…items
- server: dist-tag read/add/remove, unpublish (packument update, package and tarball delete), whoami, search, and the version manifest are now served under `/~<mount>/...`, routing through the same mount graph as the path-less alias (RFC pnpm/rfcs#13, implementation point 7). A hosted mount not reachable from the default target is now fully operable at its own endpoint. New 4-seg PUT, 6-seg PUT, and 7-seg DELETE routes carry the shapes that only existed path-less before.
- server: search now routes through the mount graph (path-less via the default target): only the hosted sources the addressed registry serves are scanned, each gated by its mount access list, and under a router a name is kept only when the router actually routes it to the scanned source. The per-org namespaces are thereby searchable too, replacing the flat-root-only scan and its TODO.
- server: path-less responses that resolve to a private mount now carry the same `Cache-Control: private, no-store` / `Vary: Authorization` headers the `/~<mount>/` surface applies; public resolutions stay cacheable (the hot install path).
- server: a definitive upstream 404 purges the cached packument (and cached tarballs), so a package unpublished upstream can't be resurrected by the stale-if-error fallback during a later outage.
- server: the uplink tarball path accepts a non-canonical basename preserved from the upstream's `dist.tarball` (still bound to the declaring version's integrity by the packument match, and still required to be a safe path segment), so such versions are fetchable through the URL this server itself advertised.
- server: a version with no `dist.integrity` falls back to verifying against the legacy hex `dist.shasum` (sha1), keeping pre-2017 npm publishes proxyable; a version declaring neither stays unservable.
- server: the per-tarball packument read now deserializes only the `versions[v].dist` projection instead of a full `serde_json::Value`, cutting the allocation cost on the tarball hot path.
- config: `--disable-registry` still builds and validates the mount graph (only upstream credential resolution is skipped), a hosted `org` may not start with `.` (it would alias `.pnpr-cache` / `.pnpr-journal`), and a router with no routes is rejected.
- config/search: drop stale comments referencing the removed `packages: proxy:` routing model.
- tests: mount-addressed surface flow, path-less private-header parity, 404 purge, exotic-basename serve, shasum-only serve, org-namespaced journal roll-forward, empty-router and dot-org rejections, and mount validation under `--disable-registry`.
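The `dist.integrity` to `dist.shasum` fallback in the list above amounts to a three-way choice, sketched here. The `Verifier` type and `choose_verifier` function are hypothetical, the 40-hex-character check for a sha1 digest is an assumption of the sketch, and the actual hashing of tarball bytes is left out.

```rust
/// Which check a tarball will be verified against. Hypothetical type.
#[derive(Debug, PartialEq, Eq)]
pub enum Verifier<'a> {
    /// A `dist.integrity` value (an SRI string such as `sha512-...`).
    Sri(&'a str),
    /// The legacy hex `dist.shasum` (sha1), used only when no integrity exists.
    LegacySha1Hex(&'a str),
    /// Neither field is usable, so the version is not served.
    Unservable,
}

pub fn choose_verifier<'a>(integrity: Option<&'a str>, shasum: Option<&'a str>) -> Verifier<'a> {
    match (integrity, shasum) {
        // Prefer integrity whenever it is declared.
        (Some(sri), _) if !sri.is_empty() => Verifier::Sri(sri),
        // Otherwise accept a shasum that looks like a sha1 hex digest.
        (_, Some(hex)) if hex.len() == 40 && hex.chars().all(|c| c.is_ascii_hexdigit()) => {
            Verifier::LegacySha1Hex(hex)
        }
        _ => Verifier::Unservable,
    }
}
```

Keeping the choice in one function makes the "declares neither" case an explicit outcome rather than an implicit skip of verification.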
…) (#12747)
Model every addressable registry origin in pnpr as a registry mount at /~<mount>/: a pnpr-hosted organization registry, a single-origin upstream, and a router mapping package-name patterns to one concrete source. Provenance is declared, never inferred — no configuration can express a cross-origin fall-through. Full replacement of the legacy Verdaccio-shaped model.

New `mount` module: a decidable PackagePattern language with a covers() superset relation; first-match authoritative resolution; and Mounts::validate, which rejects shadowed/unreachable routes (including a non-last catch-all), duplicate patterns, and unknown/self/non-concrete sources at config load.

mounts:/defaultTarget: is the only routing surface; uplinks:, packages: proxy: fallback chains, hosted-first serving, and multi-uplink tarball fallback are removed. Path-less and write requests route through the mount graph; a router no-route is a 404 with no fall-through and a down source errors rather than 404s. Served tarball URLs stay canonical for the client's base. Per-package ACLs apply on every mount-served read.

Each hosted-org mount has its own storage namespace (local dir and S3/R2) so two orgs hosting the same name@version cannot collide; the org is threaded through staging, commit, and the publish journal so crash recovery lands in the right org. Public upstream mounts use a stable, secret-free cache namespace.

The bundled config.yaml and the integrated-benchmark mock config are converted to the mount model; registry-mock keeps working with no task or seed changes.

Implements the pnpr side of RFC pnpm/rfcs#13 only; lockfile registry-identity changes for the TypeScript CLI and pacquet are out of scope.
Summary
- `mounts:`/`defaultTarget:` is the only routing surface, while `packages:` is an ACL layer only.
- Package patterns (`**`, `@*/*`, `@scope/*`, exact names) and static validation for shadowed routes, duplicate patterns, empty routes, and non-concrete sources.
Notes
Routers provide a compatibility surface for single-registry clients without merging metadata or falling through between sources. PR 12747 implements the pnpr-server side; the client lockfile package-identity work remains separate.
Written by an agent (Codex, GPT-5).