Skip to content

fix(pnpr): preserve non-canonical upstream tarball filenames#12593

Merged
zkochan merged 8 commits into
mainfrom
fix-main-branch2
Jun 22, 2026
Merged

fix(pnpr): preserve non-canonical upstream tarball filenames#12593
zkochan merged 8 commits into
mainfrom
fix-main-branch2

Conversation

@zkochan

@zkochan zkochan commented Jun 22, 2026

Copy link
Copy Markdown
Member

Summary

The proxied-tarball integrity fix (#12570, GHSA-5f9g-98vq-2jxw) started reconstructing each rewritten dist.tarball filename from its version key (<name>-<version>.tgz). That assumes upstream tarball filenames are always canonical, which breaks packages whose real tarball name differs — e.g. esprima-fb@3001.1.0-dev-harmony-fb, whose npm tarball is esprima-fb-3001.0001.0000-dev-harmony-fb.tgz. The client recorded the wrong lockfile URL, and the proxied fetch 404'd against the upstream.

This regressed two tests on main (@pnpm/installing.deps-installertest/lockfile.ts):

  • save tarball URL when it is non-standard
  • keep non-standard tarball URL when lockfileIncludeTarballUrl is false

This PR restores faithful proxying of non-canonical tarball names while keeping the GHSA integrity protection intact.

Squash Commit Body

fix(pnpr): preserve non-canonical upstream tarball filenames

The proxied-tarball integrity fix (pnpm/pnpm#12570, GHSA-5f9g-98vq-2jxw)
started reconstructing each rewritten dist.tarball filename from its
version key (`<name>-<version>.tgz`). That assumes upstream tarball names
are always canonical, which breaks packages like esprima-fb whose real
npm tarball is `esprima-fb-3001.0001.0000-dev-harmony-fb.tgz` for version
`3001.1.0-dev-harmony-fb`: the rewritten URL no longer matched what npm
hosts, so the client recorded the wrong lockfile URL and the proxied
fetch 404'd.

Preserve the upstream basename verbatim in the rewrite again, and resolve
a tarball request's version and dist.integrity by matching the requested
filename against each version's dist.tarball basename instead of parsing
the version out of the filename. OSV screening re-runs against the
resolved version when it differs from the filename-derived one.

The GHSA protection is unchanged: served bytes are still verified against
the SRI declared by the version that owns that tarball name, so a
preserved name cannot smuggle in bytes of another provenance. Tests that
encoded the canonical-reconstruction behavior are updated to assert
basename preservation, with new coverage for non-canonical names.

Also fix a pre-existing compile break in the pnpr server test target:
UserStore::in_memory gained a MaxUsers argument (pnpm/pnpm#12581) but one
test call site was not updated.

Checklist

  • The change is implemented in both the TypeScript CLI and the Rust pacquet/ port, or the description notes what still needs porting. — N/A: the change is confined to the pnpr registry server; no pnpm/pacquet CLI surface is affected.
  • Added a changeset (pnpm changeset) if this PR changes any published package. — N/A: pnpr is a Rust binary, not a published npm package.
  • Added or updated tests.
  • Updated the documentation if needed. — N/A.

Written by an agent (Claude Code, claude-opus-4-8).

Summary by CodeRabbit

  • Bug Fixes

    • Improved tarball routing and authorization by preserving the dist.tarball basename declared in the packument and ensuring integrity/provenance checks align with the declaring version, even when packuments swap filenames.
  • New Features

    • Added more robust tarball URL rewriting that normalizes basenames consistently, including handling query/signed URLs and non-canonical paths.
  • Tests

    • Expanded unit and server tests for tarball URL edge cases, basename extraction, and failure behavior when multiple versions share an ambiguous tarball basename.

zkochan added 2 commits June 22, 2026 22:00
The proxied-tarball integrity fix (#12570, GHSA-5f9g-98vq-2jxw)
started reconstructing each rewritten dist.tarball filename from its
version key (`<name>-<version>.tgz`). That assumes upstream tarball names
are always canonical, which breaks packages like esprima-fb whose real
npm tarball is `esprima-fb-3001.0001.0000-dev-harmony-fb.tgz` for version
`3001.1.0-dev-harmony-fb`: the rewritten URL no longer matched what npm
hosts, so the client recorded the wrong lockfile URL and the proxied
fetch 404'd.

Preserve the upstream basename verbatim in the rewrite again, and resolve
a tarball request's version and dist.integrity by matching the requested
filename against each version's dist.tarball basename instead of parsing
the version out of the filename. OSV screening re-runs against the
resolved version when it differs from the filename-derived one.

The GHSA protection is unchanged: served bytes are still verified against
the SRI declared by the version that owns that tarball name, so a
preserved name cannot smuggle in bytes of another provenance. Tests that
encoded the canonical-reconstruction behavior are updated to assert
basename preservation, with new coverage for non-canonical names.

Also fix a pre-existing compile break in the pnpr server test target:
`UserStore::in_memory` gained a `MaxUsers` argument (#12581) but
one test call site was not updated.
@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b729f07c-9181-4869-9250-18269e054d45

📥 Commits

Reviewing files that changed from the base of the PR and between 324d952 and 4559f27.

📒 Files selected for processing (2)
  • pnpr/crates/pnpr/src/upstream.rs
  • pnpr/crates/pnpr/src/upstream/tests.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • pnpr/crates/pnpr/src/upstream.rs
  • pnpr/crates/pnpr/src/upstream/tests.rs

📝 Walkthrough

Walkthrough

Tarball URL rewriting is refactored to preserve the upstream dist.tarball basename instead of reconstructing filenames from version strings. A new tarball_basename helper extracts path segments. In serve_tarball, version resolution now matches the requested filename against each packument version's dist.tarball basename via expected_tarball_dist, with a second OSV authorization check when the resolved version differs from the filename-derived one.

Changes

Tarball Basename Preservation and Version Resolution

Layer / File(s) Summary
tarball_basename helper and URL rewriting refactor
pnpr/crates/pnpr/src/upstream.rs
Adds exported tarball_basename that strips query/fragment and returns the final path segment. rewrite_dist_tarball now extracts that basename from the existing dist.tarball string instead of constructing a filename from a version string, falling back to a canonical name when basename extraction fails. rewrite_tarball_urls trims public_url once, iterates only version map values, and handles both versioned and single-version packument shapes. extract_version_manifest calls rewrite_tarball_urls rather than rewrite_dist_tarball with a resolved version.
TarballDist struct, expected_tarball_dist helper, and serve_tarball OSV re-check
pnpr/crates/pnpr/src/server.rs
Imports tarball_basename. Introduces TarballDist { version, integrity }. Replaces expected_tarball_integrity with expected_tarball_dist(packument, name, filename), which iterates packument versions matching the requested filename against each dist.tarball basename, fails closed with tarball_integrity_error on duplicate basenames, and maps integrity parse failures to "malformed dist.integrity". serve_tarball derives name_version for an initial OSV check, then re-runs ensure_osv_allowed when the resolved version differs.
Upstream test coverage for basename behavior
pnpr/crates/pnpr/src/upstream/tests.rs
Imports tarball_basename. Corrects Verdaccio scoped tarball fixture with comment about scope-in-path. Adds tests for non-canonical basename preservation, query-bearing URL rewriting, and basenameless URL fallback. Validates tarball_basename directly. Fixes extracts_version_by_dist_tag fixture where scoped package's dist.tarball version was misaligned.
Integration test updates for basename routing and version binding
pnpr/crates/pnpr/tests/server.rs
Integration test renamed to tarball_route_preserves_basename_and_binds_to_declaring_version with mock packument swapping basenames between versions. Upstream mock expects the basename from the latest version's declared dist.tarball. Assertions updated for swapped-basename routing. Cache entry name updated. New ambiguous_tarball_basename_is_rejected_before_fetch test asserts BAD_GATEWAY with zero upstream fetches when two versions share the same dist.tarball basename.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant serve_tarball
  participant ensure_osv_allowed
  participant expected_tarball_dist
  participant packument

  Client->>serve_tarball: GET /pkg/-/foo-1.0.0.tgz
  serve_tarball->>serve_tarball: parse_tarball_name → name_version (e.g. "1.0.0")
  serve_tarball->>ensure_osv_allowed: check name_version
  serve_tarball->>packument: load packument
  serve_tarball->>expected_tarball_dist: match "foo-1.0.0.tgz" vs dist.tarball basenames
  expected_tarball_dist->>packument: iterate versions, extract tarball_basename
  expected_tarball_dist-->>serve_tarball: TarballDist { version: "2.0.0", integrity } or error
  alt resolved version ≠ name_version
    serve_tarball->>ensure_osv_allowed: re-check resolved version
  end
  serve_tarball-->>Client: serve tarball bytes or not_found / bad_gateway
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • pnpm/pnpm#12566: Both PRs modify serve_tarball to parse the tarball filename and perform OSV-related authorization, with this PR further refining when OSV is re-checked based on the resolved packument tarball version after dist.tarball basename matching.
  • pnpm/pnpm#12570: Directly related — both PRs modify serve_tarball's dist.integrity resolution from packument data and the tarball_integrity_error handling path in the same server module.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix-main-branch2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@qodo-free-for-open-source-projects

Copy link
Copy Markdown

PR Summary by Qodo

Fix pnpr tarball proxying to preserve upstream non-canonical filenames
🐞 Bug fix 🧪 Tests 🕐 20-40 Minutes

Grey Divider

Description

• Preserve upstream tarball basenames when rewriting dist.tarball URLs.
• Resolve tarball requests by matching basename to packument versions and integrity.
• Update/expand tests and fix pnpr server test setup compile break.
Diagram

graph TD
  client(["Client (pnpm)"]) --> pack["pnpr: packument/manifest\n(rewrite dist.tarball basename)"] --> client
  client --> tar["pnpr: serve_tarball"] --> osv{"OSV policy screen"} --> resolve["Resolve by dist.tarball\nbasename + dist.integrity"] --> up(("Upstream registry")) --> cache[("Tarball cache")] --> client

  subgraph Legend
    direction LR
    _a(["Actor"]) ~~~ _s["Service/handler"] ~~~ _d{"Policy/decision"} ~~~ _e(("External")) ~~~ _db[("Storage")]
  end
Loading
High-Level Assessment

The following are alternative approaches to this PR:

1. Canonicalize filenames but add redirect/alias mapping
  • ➕ Keeps a single canonical URL shape for clients/lockfiles
  • ➕ Can preserve previous security model without packument basename reliance
  • ➖ Requires storing and maintaining a mapping of canonical↔upstream basenames
  • ➖ Adds extra request/redirect hop or more complex routing rules
  • ➖ Still needs careful integrity binding to avoid cross-version confusion
2. Resolve tarball version by integrity instead of basename
  • ➕ Avoids reliance on tarball filename conventions entirely
  • ➕ Could handle edge cases where different versions share basenames
  • ➖ Requires reading packument first and possibly more CPU/work per request
  • ➖ Integrity lookup can be ambiguous if duplicates exist; still needs tie-break rules
  • ➖ More invasive change than basename match and less aligned with upstream URL reality

Recommendation: The chosen approach (preserve upstream basenames in rewritten URLs and resolve the declaring version by matching that basename in the packument) is the most direct fix with minimal surface-area impact. It restores correctness for non-canonical upstream tarball names while keeping the GHSA protection intact by continuing to verify served bytes against the declaring version’s dist.integrity, and it keeps caching keyed to the actual upstream-served basename.

Files changed (5) +121 / -40

Bug fix (2) +75 / -30
server.rsResolve tarball integrity by dist.tarball basename, with OSV re-screening +49/-17

Resolve tarball integrity by dist.tarball basename, with OSV re-screening

• Changes tarball serving to treat the filename-derived version as a best-effort hint, then resolve the authoritative declaring version by matching the requested filename against each version’s 'dist.tarball' basename. Returns both resolved version and 'dist.integrity' and re-runs OSV screening when the resolved version differs.

pnpr/crates/pnpr/src/server.rs

upstream.rsPreserve upstream tarball basename when rewriting dist.tarball URLs +26/-13

Preserve upstream tarball basename when rewriting dist.tarball URLs

• Updates 'rewrite_tarball_urls' to rewrite URLs as '{public_url}/{pkg}/-/{basename}' using the original tarball URL’s basename rather than reconstructing '<name>-<version>.tgz'. Applies the rewrite helper to both packument and single-manifest shapes.

pnpr/crates/pnpr/src/upstream.rs

Tests (3) +46 / -10
tests.rsFix auth test setup after UserStore::in_memory signature change +2/-2

Fix auth test setup after UserStore::in_memory signature change

• Updates tests to pass 'MaxUsers::Unlimited' into 'UserStore::in_memory', fixing a compile break in the pnpr server test target.

pnpr/crates/pnpr/src/server/tests.rs

tests.rsAdjust tarball rewrite expectations and add non-canonical basename coverage +28/-2

Adjust tarball rewrite expectations and add non-canonical basename coverage

• Fixes scoped verdaccio tarball expectations to reflect basename-only rewriting and adds a new test ensuring non-canonical upstream basenames (e.g., zero-padded versions) are preserved in rewritten URLs.

pnpr/crates/pnpr/src/upstream/tests.rs

server.rsUpdate end-to-end tarball routing test to assert basename preservation and binding +16/-6

Update end-to-end tarball routing test to assert basename preservation and binding

• Renames and updates the tarball route/cache key test to assert that pnpr fetches/caches under the declaring version’s 'dist.tarball' basename and still verifies bytes against that version’s integrity, even when basenames are swapped between versions in the packument.

pnpr/crates/pnpr/tests/server.rs

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (0) 📎 Requirement gaps (0) 🎨 UX issues (0) 🔗 Cross-repo conflicts (0) 📜 Skill insights (0)

Grey Divider


Action required

1. Tarball rewrite bypass ✓ Resolved 🐞 Bug ⛨ Security
Description
rewrite_dist_tarball() now returns without rewriting when tarball_basename() is None (e.g.,
upstream dist.tarball ends with /), so pnpr can serve a packument that still contains the
upstream tarball URL. That allows an upstream-controlled packument to direct clients to fetch
tarballs from arbitrary URLs instead of pnpr’s /pkg/-/file.tgz endpoint where integrity and OSV
enforcement occur.
Code

pnpr/crates/pnpr/src/upstream.rs[R394-399]

let Some(tarball_value) = dist.get_mut("tarball") else { return };
-    if !tarball_value.is_string() {
+    let Some(basename) = tarball_value.as_str().and_then(tarball_basename) else {
    return;
-    }
-    let filename = pkg.tarball_name_for_version(version);
-    *tarball_value = Value::String(format!("{public_url}/{}/-/{filename}", pkg.as_str()));
+    };
+    *tarball_value = Value::String(format!("{public_url}/{}/-/{basename}", pkg.as_str()));
+}
Evidence
The rewrite helper early-returns when it can’t compute a basename, which means dist.tarball can
remain an upstream URL in the JSON pnpr serves. packument_response() always calls
rewrite_tarball_urls() before serializing the packument, so any skipped rewrite is observable by
clients and can route them away from pnpr’s tarball endpoint (where pnpr enforces integrity/OSV).

pnpr/crates/pnpr/src/upstream.rs[380-411]
pnpr/crates/pnpr/src/server.rs[2542-2552]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`rewrite_tarball_urls()` is intended to ensure all `dist.tarball` URLs served to clients route through pnpr. After this PR, `rewrite_dist_tarball()` skips rewriting when `tarball_basename()` returns `None` (notably when the URL’s path ends in `/`), which can leave the upstream URL in the served packument. This bypasses pnpr’s tarball-serving path where integrity verification and OSV policy are enforced.
## Issue Context
This is a security control-plane regression: a malicious or malformed upstream packument can set `dist.tarball` to an arbitrary URL that triggers `tarball_basename(url) == None` and thus gets passed through to clients.
## Fix Focus Areas
- pnpr/crates/pnpr/src/upstream.rs[380-411]
- pnpr/crates/pnpr/src/server.rs[2542-2552]
### Suggested implementation direction
- Ensure `dist.tarball` is *always* rewritten to the pnpr public URL when `dist.tarball` is a string.
- If a basename cannot be derived (e.g., trailing `/`), do not leave the upstream URL intact. Options (pick one):
- **Fail closed**: make `rewrite_tarball_urls` return `Result<()>` and have `packument_response()` fail (502) when a tarball URL cannot be rewritten.
- **Safe fallback**: when iterating `versions`, keep access to the version key and fall back to `pkg.tarball_name_for_version(version_key)` for rewrite if `tarball_basename()` returns `None` (preserve-basename only for well-formed URLs).
- At minimum, replace unrewritable tarballs with a pnpr URL (or remove the affected version) rather than passing through the upstream URL.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



Remediation recommended

2. Basename-driven cache fragmentation 🐞 Bug ⛨ Security
Description
Because tarball URLs now preserve the upstream dist.tarball basename, the tarball cache is keyed
by an upstream-controlled filename rather than a stable version-derived name. If an upstream changes
the basename for the same version over time, pnpr will miss existing cache entries and store
duplicate tarballs under different filenames, increasing bandwidth/disk usage and creating a
cache-bloat amplification vector.
Code

pnpr/crates/pnpr/src/upstream.rs[R390-415]

+fn rewrite_dist_tarball(value: &mut Value, pkg: &PackageName, public_url: &str) {
+    // Every string `dist.tarball` must be rewritten to a route on *this*
+    // server, where integrity and OSV are enforced — never passed through.
+    // When the upstream URL has no usable basename (e.g. it ends in `/`), fall
+    // back to the version-derived canonical name (the manifest carries its own
+    // `version`) so a malformed URL still points at pnpr (and 404s there)
+    // rather than directing the client at an arbitrary upstream host.
+    let fallback = value
+        .get("version")
+        .and_then(Value::as_str)
+        .map(|version| pkg.tarball_name_for_version(version));
   let Some(dist) = value.get_mut("dist").and_then(Value::as_object_mut) else {
       return;
   };
   let Some(tarball_value) = dist.get_mut("tarball") else { return };
   if !tarball_value.is_string() {
       return;
   }
-    let filename = pkg.tarball_name_for_version(version);
+    let filename = tarball_value
+        .as_str()
+        .and_then(tarball_basename)
+        .map(str::to_owned)
+        .or(fallback)
+        .unwrap_or_default();
   *tarball_value = Value::String(format!("{public_url}/{}/-/{filename}", pkg.as_str()));
}
Evidence
The PR changes tarball rewrite to preserve the upstream basename, and the tarball-serving path uses
the request filename as the on-disk cache key via Storage::tarball_path(name, filename), so
upstream basename churn directly translates into new cache entries and cache misses.

pnpr/crates/pnpr/src/upstream.rs[360-415]
pnpr/crates/pnpr/src/server.rs[828-955]
pnpr/crates/pnpr/src/storage.rs[723-729]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`rewrite_dist_tarball()` now rewrites `dist.tarball` to `{public_url}/{pkg}/-/{basename}` where `basename` is preserved from the upstream URL. Downstream, `serve_tarball()` uses the requested `filename` as the storage/cache key (`open_cached_tarball*` and integrity marker I/O). This makes the cache identity effectively upstream-basename-driven, so an upstream that changes tarball basenames for the same version can cause repeated cache misses and duplicated cache entries.
### Issue Context
This is a resource-amplification risk: even if bytes/integrity are unchanged, varying basenames force extra upstream fetches and extra disk entries because the storage path is `package_dir.join(filename)`.
### Fix Focus Areas
- pnpr/crates/pnpr/src/server.rs[872-966]
- pnpr/crates/pnpr/src/storage.rs[723-729]
- pnpr/crates/pnpr/src/upstream.rs[380-415]
### Suggested fix
- After `expected_tarball_dist()` resolves the declaring `version`, derive a **stable internal cache filename** (e.g. `let cache_filename = name.tarball_name_for_version(&version);`).
- Use `cache_filename` (not the request `filename`) for:
- `open_cached_tarball`, `open_cached_tarball_tmp`
- cached integrity marker read/write/remove
- Keep using the request `filename` for routing and for the upstream fetch path (`fetch_tarball_response(&name, &filename)`), so non-canonical upstream basenames are still fetched correctly.
- (Optional) If you need offline replay for the preserved-basename URL, store a small alias mapping `{requested filename} -> {cache_filename}` (or compute it from cached packument), so requests can still find the stable cache entry without duplicating bytes.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


3. OSV precheck false-denies 🐞 Bug ☼ Reliability
Description
In serve_tarball, OSV is enforced against the filename-derived name_version before resolving the
authoritative declaring version from the packument; when these differ (the PR explicitly allows this
for non-canonical upstream basenames), a vulnerable name_version will reject the request even if
the resolved version is allowed. This creates incorrect FORBIDDEN responses (availability/compat
regression) for some proxied tarballs with non-canonical naming.
Code

pnpr/crates/pnpr/src/server.rs[R838-852]

+    // `name_version` is the version segment carried by the filename. It is
+    // canonical for hosted tarballs (the publish handler enforces it) and a
+    // best-effort screen here; the authoritative version a proxied tarball
+    // resolves to is the `version` matched below, which may differ for a
+    // non-canonical upstream tarball name.
+    let (filename, name_version) = match name.parse_tarball_name(filename) {
      Ok(parsed) => parsed,
      Err(err) => return error_response(&err),
  };
  if let Err(err) = authorize(state, identity, name.as_str(), Action::Access) {
      return error_response(&err);
  }
-    if let Err(err) = ensure_osv_allowed(state, &name, &version) {
+    if let Err(err) = ensure_osv_allowed(state, &name, &name_version) {
      return error_response(&err);
  }
Evidence
The code comment states name_version is only a best-effort screen and the authoritative version
may differ, but the OSV check is still enforced (and can return early) before that authoritative
version is determined.

pnpr/crates/pnpr/src/server.rs[828-885]
pnpr/crates/pnpr/src/server.rs[2651-2668]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`serve_tarball()` performs `ensure_osv_allowed(..., name_version)` before it resolves the authoritative `(version, integrity)` owner of the requested tarball from the packument. The PR explicitly documents that `name_version` may differ from the resolved `version` for non-canonical upstream tarball names, so denying early can incorrectly block an otherwise-allowed tarball.
### Issue Context
This is a behavioral regression introduced by allowing non-canonical basenames while still using the filename-derived version as an enforcement gate.
### Fix Focus Areas
- Adjust the OSV gating so a mismatch can’t cause a false-deny.
- Prefer enforcing OSV on the resolved declaring version; keep any pre-check strictly as an optimization that cannot reject if resolution would allow.
### Fix Focus Areas (code pointers)
- pnpr/crates/pnpr/src/server.rs[838-885]
### Suggested implementation sketch
- Move the initial `ensure_osv_allowed(state, &name, &name_version)` to after `expected_tarball_dist(...)` so the decision is made using the resolved `version`.
- OR: keep the early check, but if it returns an error, continue to resolve via `expected_tarball_dist` and only return FORBIDDEN if the resolved `version` is also disallowed (and ensure you still block when resolved version is vulnerable).

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


4. Encoded basename mismatch 🐞 Bug ☼ Reliability
Description
rewrite_dist_tarball preserves the upstream basename verbatim, but axum percent-decodes the
requested filename path param before serve_tarball/expected_tarball_dist compare it to the
packument’s raw dist.tarball basename. If the upstream basename contains percent-encoded bytes
(e.g. %2B), the decoded request (+) won’t match the stored basename (%2B), causing proxied
tarball requests to 404 (and the upstream fetch URL to be constructed from the decoded form).
Code

pnpr/crates/pnpr/src/server.rs[R988-995]

+    let Some((version, manifest)) = versions.iter().find(|(_, manifest)| {
+        manifest
+            .get("dist")
+            .and_then(|dist| dist.get("tarball"))
+            .and_then(Value::as_str)
+            .and_then(|url| url.rsplit('/').next())
+            .is_some_and(|basename| basename == filename)
+    }) else {
Evidence
The rewrite preserves the raw trailing substring after / from the upstream tarball URL, while the
server’s routing stack percent-decodes path parameters before handler code runs; the tarball
resolver then compares the decoded request filename against the still-encoded basename extracted
from the raw packument URL, so percent-encoded basenames can never match.

pnpr/crates/pnpr/src/upstream.rs[380-399]
pnpr/crates/pnpr/src/server.rs[979-1013]
pnpr/crates/pnpr/src/server.rs[1074-1076]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Tarball basenames that contain percent-encoded bytes in the upstream packument (e.g. `foo-1.0.0%2Bbuild.tgz`) can break after this PR:
- `rewrite_tarball_urls` emits proxy URLs containing the raw `%2B` basename.
- axum’s `Path` extraction percent-decodes path params before `serve_tarball` sees them.
- `expected_tarball_dist` matches the request’s (decoded) `filename` against the packument’s (still encoded) basename via plain string equality.
This can lead to `404 Not Found` for valid upstream tarballs and/or incorrect upstream request URL construction.
### Issue Context
This is introduced by switching tarball URL rewriting from canonical reconstruction to “preserve basename verbatim”. The server still treats route params as decoded strings.
### Fix Focus Areas
- pnpr/crates/pnpr/src/upstream.rs[390-399]
- pnpr/crates/pnpr/src/server.rs[979-1013]
- pnpr/crates/pnpr/src/server.rs[1074-1076]
### Suggested fix approach
- Decode the upstream `dist.tarball` basename to a normalized filename string before storing/using it for comparisons, or decode during comparison:
- In `expected_tarball_dist`, percent-decode the extracted `basename` (from packument) before comparing to `filename`.
- When constructing rewritten `dist.tarball` URLs, ensure the basename is re-encoded as a URL path segment (so the URL is valid and round-trips safely).
- When constructing the upstream fetch URL, build it with a URL builder (or path-segment encoding) so decoded characters are correctly percent-encoded for the outbound request.
The key invariant: **the same canonical representation** of the basename must be used across (1) rewritten metadata, (2) route matching, (3) packument lookup, and (4) upstream fetch URL construction.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


View more (3)
5. Basename query breaks tarballs ✓ Resolved 🐞 Bug ☼ Reliability
Description
rewrite_dist_tarball copies the upstream tarball URL’s last '/' segment verbatim into the proxy URL
path, so if that segment contains URL delimiters like '?' or '#', the rewritten dist.tarball becomes
ambiguous/malformed and clients will request a different path than pnpr later matches in
expected_tarball_dist, resulting in 404s. This is inconsistent with pnpr’s upstream tarball fetch,
which always constructs '{base}/{pkg}/-/{filename}' and cannot honor upstream query/fragment anyway.
Code

pnpr/crates/pnpr/src/upstream.rs[R395-399]

+    let Some(basename) = tarball_value.as_str().and_then(|url| url.rsplit('/').next()) else {
return;
-    }
-    let filename = pkg.tarball_name_for_version(version);
-    *tarball_value = Value::String(format!("{public_url}/{}/-/{filename}", pkg.as_str()));
+    };
+    *tarball_value = Value::String(format!("{public_url}/{}/-/{basename}", pkg.as_str()));
}
Evidence
The rewrite path uses a raw string tail segment as a path component, while tarball routing/matching
is based on decoded path segments (excluding query), and matching also uses raw rsplit basenames
from packument metadata—so reserved URL delimiters in the upstream tail segment can make the
rewritten URL’s path differ from the matched filename.

pnpr/crates/pnpr/src/upstream.rs[360-399]
pnpr/crates/pnpr/src/upstream.rs[301-318]
pnpr/crates/pnpr/src/server.rs[538-555]
pnpr/crates/pnpr/src/server.rs[969-1013]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`rewrite_dist_tarball` currently extracts the upstream tarball `basename` via `rsplit('/').next()` and interpolates it directly into a proxy URL path. If the upstream tarball URL contains a query/fragment (or other reserved delimiters) in the tail, the generated `dist.tarball` becomes ambiguous (everything after `?` is a query, not part of the path), and tarball fetches won’t match what `expected_tarball_dist` looks up.
## Issue Context
- The tarball request handler receives the filename from the URL *path segment* (not including the query/fragment), and `expected_tarball_dist` compares that filename to a `basename` extracted from `dist.tarball`.
- pnpr’s upstream tarball fetch does **not** use the original upstream `dist.tarball` URL at all; it always builds `{upstream_base}/{pkg}/-/{filename}` so any upstream query/fragment is inherently unsupported and should not be preserved into the proxy URL.
## Fix Focus Areas
- pnpr/crates/pnpr/src/upstream.rs[380-399]
- pnpr/crates/pnpr/src/server.rs[979-995]
## Suggested fix approach
1. Replace `rsplit('/').next()` basename extraction with URL-aware parsing:
- Parse `dist.tarball` using the `url` crate (`Url::parse`).
- Extract the last *path segment* only (ignoring query/fragment), e.g. `url.path_segments().and_then(|s| s.last())`.
2. When constructing `{public_url}/{pkg}/-/{basename}` ensure `basename` is a valid single path segment:
- Either percent-encode reserved characters for a path segment, or reject/skip rewriting if the segment contains characters that would change URL structure.
3. Update `expected_tarball_dist` to use the same URL-aware extraction when matching basenames, so it stays consistent with the rewriting logic.
4. Add a targeted unit test in `pnpr/crates/pnpr/src/upstream/tests.rs` (and/or server tests) covering an upstream `dist.tarball` with a query string to ensure the rewritten URL and the server-side matching behavior remain consistent.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


6. Linear tarball dist scan 🐞 Bug ⛨ Security
Description
serve_tarball now resolves (version, integrity) by parsing the full packument and linearly
scanning versions for a dist.tarball basename match, which adds O(number_of_versions) JSON work
to every tarball request (including cache hits and repeated misses). This creates a CPU
amplification/DoS vector on public registries by repeatedly requesting tarballs for packages with
large version histories.
Code

pnpr/crates/pnpr/src/server.rs[R987-997]

+    let Some(versions) = packument.get("versions").and_then(Value::as_object) else {
+        return Ok(None);
+    };
+    let Some((version, manifest)) = versions.iter().find(|(_, manifest)| {
+        manifest
+            .get("dist")
+            .and_then(|dist| dist.get("tarball"))
+            .and_then(Value::as_str)
+            .and_then(|url| url.rsplit('/').next())
+            .is_some_and(|basename| basename == filename)
+    }) else {
Evidence
The tarball handler always loads the packument and resolves expected integrity before attempting to
serve from the tarball cache, and resolution now uses a linear versions.iter().find(...) scan over
every version’s dist.tarball basename.

pnpr/crates/pnpr/src/server.rs[859-911]
pnpr/crates/pnpr/src/server.rs[981-1015]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`expected_tarball_dist()` currently parses the packument and scans all `versions` to find a tarball by basename. This is on the tarball hot path and can be abused for CPU DoS (or just cause latency/CPU regression) for packages with many versions.
### Issue Context
For canonical hosted/npm tarballs, the filename already encodes a version segment (`name.parse_tarball_name()`), and the packument is keyed by version. We can keep the new non-canonical support while restoring a fast path for the common case.
### Fix Focus Areas
- pnpr/crates/pnpr/src/server.rs[820-911]
- pnpr/crates/pnpr/src/server.rs[981-1015]
### Suggested fix
1. Change `expected_tarball_dist` to accept `name_version` (the parsed version segment) in addition to `filename`.
2. Fast-path: `versions.get(name_version)` and verify its `dist.tarball` basename matches `filename`; if it matches, return that `(version, integrity)`.
3. Only fall back to the current full scan when the fast-path fails (to support non-canonical upstream basenames / swapped names).
4. Optionally short-circuit obvious misses (e.g., if `versions` has no entry for `name_version` and `filename` is canonical) to avoid scanning on attacker-driven 404 storms.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


7. Ambiguous basename ownership ✓ Resolved 🐞 Bug ☼ Reliability
Description
expected_tarball_dist returns the first version whose dist.tarball basename matches the
requested filename; if an upstream provides duplicate basenames across versions, the “owning”
version/integrity becomes dependent on packument ordering and may not match the version that
advertised the URL. This can cause inconsistent integrity/OSV decisions and cache-key confusion for
that filename under malformed/adversarial metadata.
Code

pnpr/crates/pnpr/src/server.rs[R990-999]

+    let Some((version, manifest)) = versions.iter().find(|(_, manifest)| {
+        manifest
+            .get("dist")
+            .and_then(|dist| dist.get("tarball"))
+            .and_then(Value::as_str)
+            .and_then(|url| url.rsplit('/').next())
+            .is_some_and(|basename| basename == filename)
+    }) else {
return Ok(None);
};
Evidence
The resolver uses versions.iter().find(...) which inherently selects a single first match and
performs no duplicate detection, and the updated test suite demonstrates that version keys and
tarball basenames can intentionally diverge (so basename-based resolution is authoritative).

pnpr/crates/pnpr/src/server.rs[981-1015]
pnpr/crates/pnpr/tests/server.rs[500-586]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When multiple `versions[*].dist.tarball` entries share the same basename, `expected_tarball_dist()` silently picks the first match. That makes the resolved `(version, integrity)` ambiguous and can break invariants (“a tarball filename maps to exactly one version/integrity”).
### Issue Context
This PR explicitly allows versions to declare tarball basenames that don’t match their own version key (see updated integration test), which increases the importance of making basename→version binding deterministic and unambiguous.
### Fix Focus Areas
- pnpr/crates/pnpr/src/server.rs[981-1015]
- pnpr/crates/pnpr/tests/server.rs[500-594]
### Suggested fix
1. While scanning `versions`, track whether a match was already found.
2. If a second match is found for the same `filename`, return an explicit error (e.g. 502 with a `TarballIntegrity`/metadata error) rather than choosing arbitrarily.
3. If you implement the fast-path from the other finding, you can also disambiguate by preferring the match whose version key equals `name_version`, and only error when multiple matches remain after applying that preference.
4. Add a regression test that constructs a packument with two versions sharing the same `dist.tarball` basename and asserts the request fails closed with a clear error.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects

qodo-free-for-open-source-projects Bot commented Jun 22, 2026

Copy link
Copy Markdown

Code Review by Qodo

Grey Divider

New Review Started

This review has been superseded by a new analysis

Grey Divider

Qodo Logo

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pnpr/crates/pnpr/src/server.rs`:
- Around line 990-997: The current implementation uses find() to locate the
first version with a matching tarball basename in the versions iterator, but
this silently accepts the first match without validating uniqueness. This is a
security issue because if multiple versions declare the same tarball basename,
the SRI/integrity check could be performed against the wrong version. Replace
the find() logic with a check that either rejects the request if multiple
versions claim the same tarball basename, or verifies that all matching versions
agree on the same tarball URL before proceeding. This ensures the served bytes
are validated against the correct declaring version as required by the
REVIEW_GUIDE.

In `@pnpr/crates/pnpr/src/upstream.rs`:
- Around line 395-398: The current implementation extracts the basename from
tarball_value using rsplit('/').next() which includes query strings and
fragments, causing mismatches when the tarball is later served in serve_tarball.
Create a centralized helper function that extracts the URL path basename by
first stripping everything after the first occurrence of '?' or '#' characters,
then extracting the final path segment after the last '/'. Use this helper
function to properly extract basename in the tarball_value assignment block and
also apply it in the expected_tarball_dist function to ensure consistent and
correct basename extraction across all locations where dist.tarball is
processed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 84675763-4624-4cca-b212-18549a1ef7c1

📥 Commits

Reviewing files that changed from the base of the PR and between 88add09 and a7eb549.

📒 Files selected for processing (5)
  • pnpr/crates/pnpr/src/server.rs
  • pnpr/crates/pnpr/src/server/tests.rs
  • pnpr/crates/pnpr/src/upstream.rs
  • pnpr/crates/pnpr/src/upstream/tests.rs
  • pnpr/crates/pnpr/tests/server.rs

Comment thread pnpr/crates/pnpr/src/server.rs Outdated
Comment thread pnpr/crates/pnpr/src/upstream.rs Outdated
@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 187639b

@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 187639b

@github-actions github-actions Bot added the reviewed: coderabbit CodeRabbit submitted an approving review label Jun 22, 2026
@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 324d952

@codecov-commenter

codecov-commenter commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.22642% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.16%. Comparing base (88add09) to head (4559f27).

Files with missing lines Patch % Lines
pnpr/crates/pnpr/src/server.rs 94.11% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #12593      +/-   ##
==========================================
+ Coverage   88.14%   88.16%   +0.01%     
==========================================
  Files         326      326              
  Lines       46523    46556      +33     
==========================================
+ Hits        41006    41044      +38     
+ Misses       5517     5512       -5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread pnpr/crates/pnpr/src/upstream.rs
@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 324d952

@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 4559f27

1 similar comment
@qodo-free-for-open-source-projects

Copy link
Copy Markdown

Code review by qodo was updated up to the latest commit 4559f27

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

product: pnpr reviewed: coderabbit CodeRabbit submitted an approving review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants