pnpr: separate the proxied upstream cache from locally-published packages #12194

Description

@zkochan

Problem

pnpr currently stores the proxied upstream cache and locally published packages in the same on-disk store, with no way to tell them apart.

Both flows go through the single Cache abstraction (v11/pnpr/crates/pnpr/src/cache.rs), which writes a Verdaccio-shaped tree rooted at config.storage:

<storage>/
  <package>/
    package.json            # packument
    <name>-<version>.tgz    # tarballs, flat
  • Proxied tarballs/packuments are written here by serve_tarball / load_packument_bytes (server.rs).
  • Published tarballs/packuments are written here by publish_package (server.rs), via the same Cache::reserve_tarball_paths / write_packument.

There is no marker on disk distinguishing a proxied foo-1.0.0.tgz from a published one. At the packument level it's worse: merge_manifest (publish.rs) seeds the published package.json from the upstream packument and unions versions/dist-tags into it, so a single package.json interleaves published and proxied versions in one file. The doc comment on Config::storage even states the directory doubles as both cache and source of truth.

Consequences

  1. Cannot clear the proxy cache safely. There is no supported operation to drop just the mirrored upstream artifacts — deleting <storage>/<pkg>/ removes published packages too.
  2. Published packages share a lifecycle with disposable cache. A naive "clear the cache" wipes the source of truth. Published packages must never be able to disappear.
  3. Awkward server operations. Backups, durable/replicated volumes, and upgrades all have to treat the entire (potentially huge, fully reconstructible) proxy cache as if it were precious data.

Proposed solution

Physically separate the two stores so they have independent lifecycles.

|            | Published store                                  | Proxy cache                            |
|------------|--------------------------------------------------|----------------------------------------|
| Contents   | Locally published tarballs + packument fragments | Mirrored upstream tarballs/packuments  |
| Durability | Durable, backed-up. Source of truth.             | Disposable; safe to wipe/GC anytime    |
| Backing    | Persistent volume or object store (S3/GCS)       | Local SSD / ephemeral scratch          |
| Eviction   | Never evicted                                    | TTL + size-cap GC is fine              |
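
To illustrate why an independent lifecycle makes eviction simple, here is a minimal sketch of the TTL half of such a GC. It assumes the proxy cache lives under its own root and keeps the flat `<package>/<name>-<version>.tgz` layout shown earlier. `evict_expired` is a hypothetical helper, not an existing pnpr function; the size cap and scoped (`@scope/name`) directories are left out for brevity.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, SystemTime};

/// Remove cached tarballs under `cache_root` whose mtime is at least `ttl` old.
/// Returns the number of files removed.
/// Hypothetical helper for illustration; not part of pnpr.
pub fn evict_expired(cache_root: &Path, ttl: Duration) -> io::Result<usize> {
    let now = SystemTime::now();
    let mut removed = 0;
    for pkg in fs::read_dir(cache_root)? {
        let pkg = pkg?;
        // Only descend into per-package directories (unscoped packages only).
        if !pkg.file_type()?.is_dir() {
            continue;
        }
        for entry in fs::read_dir(pkg.path())? {
            let entry = entry?;
            let path = entry.path();
            // Only tarballs are evicted here; the cached packument is left alone.
            if path.extension().and_then(|e| e.to_str()) != Some("tgz") {
                continue;
            }
            let modified = entry.metadata()?.modified()?;
            // A file with an mtime in the future is treated as brand new.
            let age = now.duration_since(modified).unwrap_or(Duration::ZERO);
            if age >= ttl {
                fs::remove_file(&path)?;
                removed += 1;
            }
        }
    }
    Ok(removed)
}
```

Because the function is only ever handed the cache root, it cannot touch published tarballs, which is the property the single shared store cannot offer today.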

Implementation sketch:

  • Give Cache two roots (published_root + cache_root). Route publish_package to published_root; route the proxy-fetch paths to cache_root. On read, check published-first, then cache.
  • Stop persisting a blended package.json. Store the locally-published packument fragment separately and compose the served packument at request time (published versions overlaid on the cached upstream packument).
  • Config: add a published-store path alongside storage (keep storage as the cache root for back-compat, or introduce explicit published_storage + cache_storage).
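
The first two bullets could look roughly like the sketch below. It is only an illustration under stated assumptions: `Roots`, `Origin`, `resolve_tarball`, `Versions`, and `compose_versions` are hypothetical names, not the existing `Cache` API, and the packument is reduced to a plain version-to-tarball map so the example needs nothing beyond the standard library.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

/// Hypothetical two-root layout; field names follow the sketch above.
pub struct Roots {
    pub published_root: PathBuf,
    pub cache_root: PathBuf,
}

/// Which store a tarball was found in.
#[derive(Debug, PartialEq, Eq)]
pub enum Origin {
    Published,
    Proxied,
}

impl Roots {
    /// Flat `<root>/<package>/<name>-<version>.tgz` layout, as in the current tree.
    fn tarball_path(root: &Path, package: &str, file_name: &str) -> PathBuf {
        root.join(package).join(file_name)
    }

    /// Check the published store first, then fall back to the proxy cache.
    pub fn resolve_tarball(&self, package: &str, file_name: &str) -> Option<(Origin, PathBuf)> {
        let published = Self::tarball_path(&self.published_root, package, file_name);
        if published.is_file() {
            return Some((Origin::Published, published));
        }
        let cached = Self::tarball_path(&self.cache_root, package, file_name);
        if cached.is_file() {
            return Some((Origin::Proxied, cached));
        }
        None
    }
}

/// Simplified stand-in for a packument: version -> tarball file name.
pub type Versions = BTreeMap<String, String>;

/// Compose the served view at request time: start from the cached upstream
/// versions and overlay the locally published fragment, which wins on conflict.
/// Nothing blended is written back to disk.
pub fn compose_versions(upstream: &Versions, published: &Versions) -> Versions {
    let mut served = upstream.clone();
    for (version, dist) in published {
        served.insert(version.clone(), dist.clone());
    }
    served
}
```

With this shape, deleting everything under `cache_root` only removes the `Origin::Proxied` hits; published tarballs still resolve, and the composed packument simply shrinks to the published fragment until the upstream packument is fetched again.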

Server / deployment benefits

  • Published store → durable PVC or S3-compatible object storage; proxy cache → emptyDir / node-local scratch.
  • Upgrades retain published packages trivially: swap the binary/image, point it at the same published_root; the cache can start cold and self-heals on first request.
  • DR: back up only the published store; the cache never needs backing up.

This is pnpr-only (no pacquet-side port, no changeset needed).


Written by an agent (Claude Code, claude-opus-4-8).
