pnpr: make remaining local state pluggable for stateless / serverless deployment

## Motivation

[#12198](https://github.com/pnpm/pnpm/pull/12198) moves pnpr's **hosted** store (published packages + static content) into an S3-compatible object store (S3 / Cloudflare R2 / MinIO / …). That removes the biggest blocker to running pnpr as multiple stateless replicas, but it isn't sufficient on its own: pnpr still keeps other state on local disk that a horizontally-scaled or serverless runtime won't preserve or share.

This issue tracks making the **remaining** state pluggable so pnpr can run as a stateless, horizontally-scalable service. The realistic near-term target is container-serverless with autoscaling (Cloud Run / Fly / Container Apps / Fargate scale-to-zero); true FaaS is a larger lift, see below.

Organizing principle: **every per-instance disk dependency becomes a config-selected backend, defaulting to today's local behavior**, following the pattern #12198 established (build the client once at config-load time into an `Arc<dyn …>` handle so request handlers stay infallible).
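A minimal sketch of that principle, using only `std`. Every name here (`Config`, `S3Config`, `BlobBackend`, `StandIn`, `build_backend`, `handle_get`) is hypothetical and not taken from pnpr or #12198, and the in-memory `StandIn` replaces the real filesystem and S3 clients so the example runs on its own. The point it illustrates is that all fallible work happens once in `build_backend`, so the handler has no setup errors to deal with.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical config: `s3: None` means the block is absent from the file.
pub struct S3Config {
    pub bucket: String,
}
pub struct Config {
    pub s3: Option<S3Config>,
}

/// Hypothetical narrow interface that request handlers depend on.
pub trait BlobBackend: Send + Sync {
    fn kind(&self) -> &'static str;
    fn put(&self, key: &str, bytes: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

/// In-memory stand-in for both the fs and the S3 implementation.
struct StandIn {
    kind: &'static str,
    objects: Mutex<HashMap<String, Vec<u8>>>,
}

impl StandIn {
    fn new(kind: &'static str) -> Self {
        Self { kind, objects: Mutex::new(HashMap::new()) }
    }
}

impl BlobBackend for StandIn {
    fn kind(&self) -> &'static str {
        self.kind
    }
    fn put(&self, key: &str, bytes: Vec<u8>) {
        self.objects.lock().unwrap().insert(key.to_string(), bytes);
    }
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.objects.lock().unwrap().get(key).cloned()
    }
}

#[derive(Debug)]
pub struct ConfigError(pub String);

/// Runs once at config-load time; this is the only place that can fail.
pub fn build_backend(cfg: &Config) -> Result<Arc<dyn BlobBackend>, ConfigError> {
    match &cfg.s3 {
        // Absent block: keep today's local behavior.
        None => Ok(Arc::new(StandIn::new("fs"))),
        Some(s3) if s3.bucket.is_empty() => {
            Err(ConfigError("s3.bucket must not be empty".to_string()))
        }
        Some(_) => Ok(Arc::new(StandIn::new("s3"))),
    }
}

/// A handler only sees the ready-made handle, so it has no setup errors.
pub fn handle_get(store: &dyn BlobBackend, key: &str) -> Option<Vec<u8>> {
    store.get(key)
}
```

A server would call `build_backend` during startup, fail fast on a bad config, and clone the resulting `Arc` into each handler's state.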

## Two backend "kinds"

Everything pnpr keeps locally is one of:

- **Blob** (content / objects) — already abstracted by `object_store`; reuse the `S3Store` from #12198.
- **Record** (relational / KV state) — narrow async traits, default impl = today's SQLite/file, plus a networked impl.

## Components

| Component | Today (code) | Kind | Externalize via |
|---|---|---|---|
| Hosted store | `storage.rs` `HostedStore {Fs\|S3}` | Blob | ✅ done (#12198) |
| **Proxy cache** | `storage.rs` `cached: Store` (fs-only) | Blob | Generalize to `{Fs\|S3}`, reuse `S3Store` |
| **Accelerator CAS** (`pnpr-store`) | `install_accelerator.rs` `store_dir` | Blob | Content-addressed → S3 maps directly |
| **Auth: users + tokens** | `auth.rs` `UserStore` / `TokenStore` | Record | ✅ done (#12206) — `UserBackend`/`TokenBackend` traits + libsql/Turso impl |
| **Grant table** | `install_accelerator/grant_table.rs` | Record | Trait + networked-SQLite impl |
| **Public-packages** | `install_accelerator/public_packages.rs` | Record | Trait + networked-SQLite impl |
| **Verdict cache** | `install_accelerator/verdict_cache.rs` | Record (pure cache) | Leave ephemeral, or externalize |

### Blob side (cache + accelerator CAS)
Reuses #12198. Two fs-isms to handle in the proxy cache:
- **TTL freshness** — `Store::read_fresh_packument` reads file mtime; the S3 variant uses `ObjectMeta.last_modified` (object_store exposes it).
- **`tee_to_cache`** (`streaming.rs`) writes incrementally to a local file then renames. S3 can't append, so stage to local tmp and upload on stream completion (multipart for large tarballs) — same stage-then-upload shape the publish path already uses.
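Both points can be sketched with `std` alone. The names below (`is_fresh`, `Uploader`, `MemUploader`, `StagedTee`) are hypothetical, the staging buffer is a `Vec<u8>` standing in for the local tmp file, and `MemUploader` stands in for the object-store upload. Treating a timestamp in the future as fresh is an assumption of this sketch, not something the issue specifies.

```rust
use std::io::{self, Write};
use std::time::{Duration, SystemTime};

/// One freshness rule for both variants: the fs store would pass the file
/// mtime, the S3 store the object's last-modified timestamp.
pub fn is_fresh(last_modified: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(last_modified) {
        Ok(age) => age <= ttl,
        // Timestamp lies in the future (clock skew): assumed fresh here.
        Err(_) => true,
    }
}

/// Hypothetical sink standing in for an object-store upload.
pub trait Uploader {
    fn upload(&mut self, key: &str, body: &[u8]) -> io::Result<()>;
}

/// Test double that records every completed upload.
#[derive(Default)]
pub struct MemUploader {
    pub objects: Vec<(String, Vec<u8>)>,
}

impl Uploader for MemUploader {
    fn upload(&mut self, key: &str, body: &[u8]) -> io::Result<()> {
        self.objects.push((key.to_string(), body.to_vec()));
        Ok(())
    }
}

/// Forwards each chunk to the client and stages a copy; nothing reaches the
/// cache until `finish` is called, so an aborted stream leaves no object.
pub struct StagedTee<W: Write> {
    client: W,
    staged: Vec<u8>,
}

impl<W: Write> StagedTee<W> {
    pub fn new(client: W) -> Self {
        Self { client, staged: Vec::new() }
    }

    pub fn write_chunk(&mut self, chunk: &[u8]) -> io::Result<()> {
        self.client.write_all(chunk)?;
        self.staged.extend_from_slice(chunk);
        Ok(())
    }

    /// Uploads the staged bytes in one call and hands the client writer back.
    pub fn finish(self, key: &str, uploader: &mut dyn Uploader) -> io::Result<W> {
        uploader.upload(key, &self.staged)?;
        Ok(self.client)
    }
}
```

Dropping a `StagedTee` without calling `finish` discards the staged bytes, which mirrors the fs variant never renaming its temporary file.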

No new external service — cheapest large win.

### Record side (auth + grants)
Narrow async traits, keeping current impls as defaults:

```rust
trait UserBackend  { async fn get(..); async fn upsert_or_login(..); async fn count(); }
trait TokenBackend { async fn issue(..); async fn lookup(..); async fn list(..); async fn revoke(..); }
trait GrantBackend { async fn is_granted(..); async fn record(..); async fn clear_package(..); }
```

Impls: `InMemory`, `LocalSqlite` (today, `rusqlite`), and a **networked SQLite** impl. Selected by config, built into `Arc<dyn …>` at load time. (Traits with `async fn` are not usable behind `dyn` on their own, so having multiple impls behind `dyn` means using `async-trait`.)
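As an illustration of the `InMemory` flavor, here is a sketch of a grant backend held behind `Arc<dyn …>`. It is deliberately synchronous so it compiles with `std` only, whereas the traits above are async. The parameter shapes (`principal`, `package`) and the type name `InMemoryGrants` are assumptions of this sketch, since the issue elides the real signatures.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Synchronous simplification of the grant trait; argument shapes are
/// hypothetical.
pub trait GrantBackend: Send + Sync {
    fn is_granted(&self, principal: &str, package: &str) -> bool;
    fn record(&self, principal: &str, package: &str);
    fn clear_package(&self, package: &str);
}

/// In-memory impl: a set of (principal, package) rows behind a mutex.
#[derive(Default)]
pub struct InMemoryGrants {
    rows: Mutex<HashSet<(String, String)>>,
}

impl GrantBackend for InMemoryGrants {
    fn is_granted(&self, principal: &str, package: &str) -> bool {
        self.rows
            .lock()
            .unwrap()
            .contains(&(principal.to_string(), package.to_string()))
    }

    fn record(&self, principal: &str, package: &str) {
        self.rows
            .lock()
            .unwrap()
            .insert((principal.to_string(), package.to_string()));
    }

    /// Drops every grant for one package and leaves the others untouched.
    fn clear_package(&self, package: &str) {
        self.rows
            .lock()
            .unwrap()
            .retain(|(_, p)| p.as_str() != package);
    }
}
```

Callers would hold this as `Arc<dyn GrantBackend>`, so swapping in a SQLite-backed impl would not change any call site.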

## DB choice: SQLite-compatible, not Postgres

pnpr is **already** built on SQLite (`rusqlite` backs tokens, grants, public-packages, verdict cache), so the schemas and queries already exist. Standardizing on SQLite-compatible services keeps the SQL and makes an all-Cloudflare stack coherent: **R2 (blobs) + D1 (records) + compute**.

Important nuance: **"SQLite" ≠ "rusqlite" once networked.** `rusqlite` opens a local *file*; a networked SQLite service needs a different driver. Keep the SQL, swap the driver:

| Option | Reach it via | Works with current stack? |
|---|---|---|
| **Cloudflare D1** | Workers binding, or REST API (SQL over HTTP) | REST API: yes (Cloud Run/Lambda). Workers binding: only inside a Worker. |
| **Turso / libSQL** | `libsql` crate (network + **embedded replicas**) | Yes — any tokio runtime |
| **LiteFS (Fly)** | FUSE-replicated file, `rusqlite` unchanged, single writer | Yes |
| **rqlite** | Raft + HTTP | Yes |

Caveats:
- **D1 is happiest inside a Worker**, and Workers can't run the current `tokio`/`axum`/`rusqlite` stack (that would take a WASM rewrite). From Cloud Run/Lambda you'd use D1's **REST API** (per-query HTTP).
- **Auth/token lookup is on the hot path** (≈ every request). A pure-HTTP-to-D1 backend taxes every call. Mitigate with a short-TTL in-process token cache or **Turso embedded replicas** (local-fast reads).
- **Eventual consistency** on replicas — fine for grants/public-packages (grant table is already best-effort, clear-on-discovery); the one to watch is **token-revocation lag**. A config knob (replica vs primary read for auth) covers it.
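The short-TTL in-process token cache mentioned above could look roughly like the following. `TokenCache` and `resolve` are hypothetical names, the backend is modeled as a plain closure rather than the async `TokenBackend`, and the clock is passed in explicitly so the behavior is easy to test. Note that negative results are cached too, so in this sketch the TTL bounds both the revocation lag and the delay before a newly issued token is seen.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Wraps a slow token lookup with a per-process cache whose entries expire
/// after `ttl`. The lookup returns the user name for a valid token.
pub struct TokenCache<F: FnMut(&str) -> Option<String>> {
    lookup: F,
    ttl: Duration,
    entries: HashMap<String, (Instant, Option<String>)>,
}

impl<F: FnMut(&str) -> Option<String>> TokenCache<F> {
    pub fn new(ttl: Duration, lookup: F) -> Self {
        Self { lookup, ttl, entries: HashMap::new() }
    }

    /// Serves from the cache while the entry is younger than `ttl`,
    /// otherwise asks the backend and stores the answer.
    pub fn resolve(&mut self, token: &str, now: Instant) -> Option<String> {
        if let Some((stored_at, user)) = self.entries.get(token) {
            if now.saturating_duration_since(*stored_at) < self.ttl {
                return user.clone();
            }
        }
        let user = (self.lookup)(token);
        self.entries.insert(token.to_string(), (now, user.clone()));
        user
    }
}
```

A revoked token stays valid on a given instance for at most `ttl` after its last backend lookup, which is the trade-off the replica-vs-primary knob is meant to expose.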

**Recommendation:** abstract behind the record-backend trait; default `LocalSqlite`. For the container-serverless target reachable today, **Turso/libSQL** is the pragmatic pick (works with the current stack; embedded replicas solve the hot-path auth read). **D1** is the north-star for the Workers/edge path, gated on the larger stack rewrite.

## Config shape (each block absent ⇒ local default)

```yaml
storage: ./storage
s3: { bucket: pkgs, ... }          # hosted blobs        (done)
cache:
  s3: { bucket: cache, ... }       # proxy-cache blobs   (phase 2)
accelerator:
  store:
    s3: { bucket: accel, ... }     # accelerator CAS     (phase 2)
backend:
  libsql:
    url: ${PNPR_LIBSQL_URL}
    auth_token: ${PNPR_LIBSQL_TOKEN}
  # or: d1: { account_id, database_id, api_token }
```

## Phased rollout

1. **Hosted → S3** — ✅ done (#12198).
2. **Proxy cache + accelerator CAS → S3** — reuses `object_store`, **no new external service**. After this the only local state left is the SQLite record stores. Biggest bang, lowest risk — proposed next step.
3. **Auth (users + tokens) → networked SQLite** — ✅ done (#12206). `UserBackend` / `TokenBackend` async traits; local (htpasswd + SQLite) / in-memory / **libsql/Turso** impls selected by a `backend.libsql:` block. Embedded-replica read acceleration (`replicaPath` / `syncIntervalSecs`) for the hot-path token lookup; strict atomic registration cap.
4. **Grants + public-packages → networked SQLite**; leave verdict-cache ephemeral (pure cache, degrades to lower hit-rate).
5. **Deployment glue** — Cloud Run needs ~a Dockerfile + readiness probe; Lambda needs `lambda_http` + response-streaming handling; Workers is a separate, larger effort.

## Cross-cutting caveat (independent of where bytes live)

With N replicas, the **read-modify-write flows** are the real distributed-systems risk, not storage:
- Publish merges the incoming manifest into the existing packument (`publish.rs::merge_manifest`); partial-unpublish rewrites it. Two replicas publishing the same package concurrently = last-write-wins on the packument object → a lost version.
- Single-instance half ✅ done (#12206): a striped per-package lock serializes the read-modify-write packument flows (publish, dist-tag, partial-unpublish) on one instance. **Cross-replica** still pending: S3 conditional writes (`If-Match` / ETag) or a DB-level conditional update. pnpm clients usually serialize per package, but a shared registry shouldn't rely on that.
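The cross-replica fix can be sketched as a read, merge, conditional-write loop. The store below is an in-memory stand-in whose integer `etag` plays the role of an S3 ETag checked via `If-Match`; `VersionedStore`, `put_if_match`, and `publish_version` are hypothetical names, and a packument is reduced to a list of version strings for brevity.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// In-memory stand-in for an object store with conditional writes.
/// Each object carries an etag that changes on every successful write.
#[derive(Default)]
pub struct VersionedStore {
    objects: Mutex<HashMap<String, (u64, Vec<String>)>>,
}

#[derive(Debug, PartialEq)]
pub struct PreconditionFailed;

impl VersionedStore {
    /// Returns (etag, versions); etag 0 means the object does not exist yet.
    pub fn get(&self, key: &str) -> (u64, Vec<String>) {
        self.objects
            .lock()
            .unwrap()
            .get(key)
            .cloned()
            .unwrap_or((0, Vec::new()))
    }

    /// Writes only if the stored etag still equals `expected`.
    pub fn put_if_match(
        &self,
        key: &str,
        expected: u64,
        versions: Vec<String>,
    ) -> Result<u64, PreconditionFailed> {
        let mut objects = self.objects.lock().unwrap();
        let current = objects.get(key).map(|(etag, _)| *etag).unwrap_or(0);
        if current != expected {
            return Err(PreconditionFailed);
        }
        objects.insert(key.to_string(), (current + 1, versions));
        Ok(current + 1)
    }
}

/// Read, merge, conditionally write; on a conflict, re-read and retry so a
/// concurrent publish is merged instead of overwritten.
pub fn publish_version(
    store: &VersionedStore,
    package: &str,
    version: &str,
    max_attempts: u32,
) -> Result<u64, PreconditionFailed> {
    for _ in 0..max_attempts {
        let (etag, mut versions) = store.get(package);
        if !versions.iter().any(|v| v == version) {
            versions.push(version.to_string());
        }
        match store.put_if_match(package, etag, versions) {
            Ok(new_etag) => return Ok(new_etag),
            Err(PreconditionFailed) => continue,
        }
    }
    Err(PreconditionFailed)
}
```

A replica that read the packument before another replica's publish gets `PreconditionFailed` instead of silently dropping the other version, which is exactly the lost-update case described above.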

Also: keep cold-start cost down — the install accelerator is already lazily built via `OnceLock`; keep that laziness and use a small connection pool.
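For reference, the lazy-build pattern with `std::sync::OnceLock` looks roughly like this. `AppState`, `Accelerator`, and `pool_size` are hypothetical names, and the `builds` counter exists only to make the once-only behavior observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the expensive-to-build accelerator.
pub struct Accelerator {
    pub pool_size: usize,
}

#[derive(Default)]
pub struct AppState {
    accelerator: OnceLock<Accelerator>,
    builds: AtomicUsize,
}

impl AppState {
    /// Builds the accelerator on first use only; startup pays nothing.
    pub fn accelerator(&self) -> &Accelerator {
        self.accelerator.get_or_init(|| {
            self.builds.fetch_add(1, Ordering::SeqCst);
            // Small pool: an assumed value, kept low to limit cold-start cost.
            Accelerator { pool_size: 2 }
        })
    }

    /// Number of times the accelerator was actually constructed.
    pub fn builds(&self) -> usize {
        self.builds.load(Ordering::SeqCst)
    }
}
```

Requests that never touch the accelerator never trigger the build, which matters most on scale-to-zero platforms where every cold start repeats the startup path.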

---
Written by an agent (Claude Code, claude-opus-4-8).

