fix(registry): verify tarball SHA on publish#11976
Conversation
Hash incoming publish tarball bytes and reject the request if the computed SHA-512 doesn't match the `dist.integrity` declared in the packument, or if `dist.shasum` is present and its SHA-1 doesn't match. A body that arrives without `dist.integrity` is rejected outright — npm always emits it, and accepting a publish whose hash the registry never verified is the silent-corruption vector that issue #11975 calls out. Both failure modes surface as a 400 with an `EINTEGRITY:` prefix so pnpm / npm clients recognize the error. The check runs *before* any cache write, so a bad upload never lands on disk and can't replicate to downstream consumers via a subsequent fetch.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ChangesTarball integrity verification on publish
Sequence DiagramsequenceDiagram
participant Client
participant publish_package
participant Cache
participant stream_verify_write
participant ssri
participant FS
Client->>publish_package: POST publish body with `_attachments` (base64)
publish_package->>publish_package: parse filenames -> (canonical, version)
publish_package->>publish_package: lookup `incoming["versions"][version]["dist"]`
publish_package->>Cache: reserve_tarball_paths(name, canonical)
publish_package->>stream_verify_write: (base64_data, declared_length, dist, tmp_path)
stream_verify_write->>ssri: parse dist.integrity
stream_verify_write->>stream_verify_write: stream base64 decode, compute ssri digest & optional sha1
stream_verify_write->>FS: write decoded bytes to tmp_path
stream_verify_write->>ssri: compare computed digest vs declared
alt integrity matches
stream_verify_write-->>publish_package: Ok(bytes_written)
publish_package->>Cache: finalize_tarball_slot(tmp_path -> final_path)
else integrity mismatch or missing
stream_verify_write-->>publish_package: Err(EINTEGRITY)
publish_package->>FS: cleanup_tmp_slots(tmp_path)
end
publish_package-->>Client: 200 OK or 400 EINTEGRITY
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Review Summary by QodoVerify tarball integrity on publish before cache write
WalkthroughsDescription• Verify tarball SHA-512 and SHA-1 integrity before writing to cache • Reject publishes with missing or mismatched dist.integrity field • Return 400 with EINTEGRITY: prefix for client recognition • Prevent silent-corruption bugs from bad uploads replicating downstream Diagramflowchart LR
A["Incoming publish request"] --> B["Extract attachments"]
B --> C["Parse tarball filename"]
C --> D["Look up versions[v].dist"]
D --> E["Verify SHA-512 integrity"]
E --> F{Integrity match?}
F -->|No| G["Return 400 EINTEGRITY"]
F -->|Yes| H["Verify SHA-1 shasum"]
H --> I{Shasum match?}
I -->|No| G
I -->|Yes| J["Write to cache"]
File Changes1. registry/crates/pnpm-registry/src/package_name.rs
|
Micro-Benchmark ResultsLinux |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #11976 +/- ##
==========================================
+ Coverage 88.03% 88.19% +0.16%
==========================================
Files 228 228
Lines 28068 28737 +669
==========================================
+ Hits 24711 25346 +635
- Misses 3357 3391 +34 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Integrated-Benchmark Report (Linux)Scenario: Isolated linker: fresh restore, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 1.94615672312,
"stddev": 0.08311690951970489,
"median": 1.92196890572,
"user": 2.7607041599999995,
"system": 3.23114208,
"min": 1.84602721922,
"max": 2.08198523722,
"times": [
1.9986837772200001,
1.9997971052199999,
1.91461319822,
1.92932461322,
1.89864583922,
2.05556557622,
1.88404067222,
2.08198523722,
1.84602721922,
1.85288399322
]
},
{
"command": "pacquet@main",
"mean": 1.92817821292,
"stddev": 0.055697303177662034,
"median": 1.9216367082199999,
"user": 2.77955126,
"system": 3.23657398,
"min": 1.8537913592200002,
"max": 2.04783630022,
"times": [
1.96901833622,
1.93852464622,
1.8537913592200002,
1.89989764822,
1.95708187222,
1.90522050522,
2.04783630022,
1.90208022822,
1.93805291122,
1.87027832222
]
}
]
}Scenario: Isolated linker: fresh restore, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 0.6468001888400001,
"stddev": 0.03921493845104728,
"median": 0.63389094984,
"user": 0.37157146,
"system": 1.3408782799999999,
"min": 0.62490847784,
"max": 0.7553942318400001,
"times": [
0.7553942318400001,
0.6435088578400001,
0.6524181268400001,
0.64294234784,
0.6345392588400001,
0.62490847784,
0.6263767878400001,
0.63324264084,
0.62630662284,
0.6283645358400001
]
},
{
"command": "pacquet@main",
"mean": 0.6350578599400001,
"stddev": 0.010946060549885854,
"median": 0.6330758163400001,
"user": 0.36637266,
"system": 1.34045388,
"min": 0.6213649278400001,
"max": 0.6610379798400001,
"times": [
0.6610379798400001,
0.6268870098400001,
0.6314662858400001,
0.63800273484,
0.6346853468400001,
0.63950649684,
0.6399745428400001,
0.63052803284,
0.6213649278400001,
0.6271252418400001
]
}
]
}Scenario: Isolated linker: fresh install, cold cache + cold store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 2.20944355774,
"stddev": 0.030077072250081213,
"median": 2.2135835191399997,
"user": 3.827690080000001,
"system": 3.0086175600000002,
"min": 2.1585499321399997,
"max": 2.2648153561399997,
"times": [
2.2223783021399997,
2.20413852714,
2.22415732614,
2.1585499321399997,
2.1856776811399996,
2.17791006614,
2.2195886031399996,
2.2648153561399997,
2.22964134814,
2.20757843514
]
},
{
"command": "pacquet@main",
"mean": 2.2301428483399994,
"stddev": 0.023759106128624478,
"median": 2.2211385126399996,
"user": 3.86044228,
"system": 3.00369626,
"min": 2.2059987951399997,
"max": 2.2814093131399997,
"times": [
2.2101345101399996,
2.2059987951399997,
2.2190862921399996,
2.2190956361399996,
2.23437836814,
2.25569121614,
2.2108178481399996,
2.24163511514,
2.2231813891399996,
2.2814093131399997
]
}
]
}Scenario: Isolated linker: fresh install, hot cache + hot store
BENCHMARK_REPORT.json{
"results": [
{
"command": "pacquet@HEAD",
"mean": 1.4657587857799999,
"stddev": 0.02529892893238504,
"median": 1.46502933348,
"user": 1.7283687199999995,
"system": 1.8782971199999998,
"min": 1.4219677959800001,
"max": 1.50322056498,
"times": [
1.50322056498,
1.47315731398,
1.4635981679799999,
1.46392911398,
1.4219677959800001,
1.47435575798,
1.45424812698,
1.43566494398,
1.50131651898,
1.46612955298
]
},
{
"command": "pacquet@main",
"mean": 1.4599491793800001,
"stddev": 0.022082840305512687,
"median": 1.4537552754799998,
"user": 1.7154707199999997,
"system": 1.86129812,
"min": 1.4315969929799999,
"max": 1.49377766898,
"times": [
1.49377766898,
1.4315969929799999,
1.48266971598,
1.4629552399799999,
1.4387424549799999,
1.49043263198,
1.45592590498,
1.45158464598,
1.44062592298,
1.45118061498
]
}
]
} |
|
| Branch | pr/11976 |
| Testbed | pacquet |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result milliseconds (ms) (Result Δ%) | Upper Boundary milliseconds (ms) (Limit %) |
|---|---|---|---|
| isolated-linker.fresh-install.cold-cache.cold-store | 📈 view plot 🚷 view threshold | 2,209.44 ms(-7.10%)Baseline: 2,378.43 ms | 2,854.12 ms (77.41%) |
| isolated-linker.fresh-install.hot-cache.hot-store | 📈 view plot 🚷 view threshold | 1,465.76 ms(-3.55%)Baseline: 1,519.75 ms | 1,823.70 ms (80.37%) |
| isolated-linker.fresh-restore.cold-cache.cold-store | 📈 view plot 🚷 view threshold | 1,946.16 ms(-4.20%)Baseline: 2,031.57 ms | 2,437.88 ms (79.83%) |
| isolated-linker.fresh-restore.hot-cache.hot-store | 📈 view plot 🚷 view threshold | 646.80 ms(-2.34%)Baseline: 662.32 ms | 794.78 ms (81.38%) |
Replace the buffer-then-decode-then-hash flow with a streaming pass through ssri's IntegrityChecker. `extract_attachments` now returns the base64 payload verbatim (no eager decode), and a new `stream_decode_verify_and_write` consumes that payload via `base64::read::DecoderReader`, feeding 64 KiB chunks to the on-disk file and to the integrity / shasum hashers in lockstep. The publish handler runs the streaming routine inside `spawn_blocking` and finalizes (or cleans up) the tmp slots asynchronously. The decoded tarball never has to live in memory all at once — only the original base64 string and a fixed-size working buffer do. On a 100 MiB publish that saves roughly 100 MiB of peak heap. Failure modes (mismatched integrity, mismatched shasum, mismatched length, base64 decode error) clean up the in-progress tmp file before returning so a rejected publish leaves no on-disk artifact.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
registry/crates/pnpm-registry/src/server.rs (1)
661-668:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRollback promoted tarballs if the publish still fails.
After Line 662 renames a tarball into its final cache path, any later
finalize_tarball_slotfailure or thewrite_packumentfailure at Line 667 returns immediately without deleting the tarballs that were already promoted. That leaves a failed publish with persisted artifacts and a partially-updated package directory.Suggested direction
- for slot in written_slots { - if let Err(err) = state.inner.cache.finalize_tarball_slot(slot).await { - return error_response(&err); - } - } + let mut finalized_paths = Vec::with_capacity(written_slots.len()); + for slot in written_slots { + let final_path = slot.final_path.clone(); + if let Err(err) = state.inner.cache.finalize_tarball_slot(slot).await { + cleanup_finalized_paths(finalized_paths).await; + return error_response(&err); + } + finalized_paths.push(final_path); + } if let Err(err) = state.inner.cache.write_packument(&name, &merged_bytes).await { + cleanup_finalized_paths(finalized_paths).await; return error_response(&err); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@registry/crates/pnpm-registry/src/server.rs` around lines 661 - 668, The loop promoting tarball slots (written_slots -> state.inner.cache.finalize_tarball_slot) can leave promoted files when a later finalize_tarball_slot or state.inner.cache.write_packument(&name, &merged_bytes) fails; update the publish error path so that on any failure after promotions you iterate over the already-promoted slots and call the cache rollback/removal (e.g., a new state.inner.cache.rollback_tarball_slot or remove_promoted_slot operation) for each slot to revert them, and ensure rollback errors are logged/handled but do not mask the original error returned by error_response; modify the code around finalize_tarball_slot, written_slots and write_packument to perform this cleanup before returning the error_response.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@registry/crates/pnpm-registry/src/publish.rs`:
- Around line 138-159: The two integrity-failure branches in publish.rs should
return the client-visible EINTEGRITY variant rather than a plain
InvalidAttachment message: update the Err(invalid(format!("base64 decode failed:
{err}"))) in the base64-decode failure branch and the
Err(invalid(format!("length mismatch: header says {expected}, decoded
{total}"))) in the declared_length mismatch branch to emit the same
EINTEGRITY-style error used elsewhere (either by prefixing the message with
"EINTEGRITY: " or by calling the existing helper that constructs EINTEGRITY
errors if one exists), so failures in the base64 decode and declared-length
checks match the other integrity errors; keep the same diagnostic text after the
EINTEGRITY marker and leave the RegistryError::Io(err) handling unchanged.
---
Outside diff comments:
In `@registry/crates/pnpm-registry/src/server.rs`:
- Around line 661-668: The loop promoting tarball slots (written_slots ->
state.inner.cache.finalize_tarball_slot) can leave promoted files when a later
finalize_tarball_slot or state.inner.cache.write_packument(&name, &merged_bytes)
fails; update the publish error path so that on any failure after promotions you
iterate over the already-promoted slots and call the cache rollback/removal
(e.g., a new state.inner.cache.rollback_tarball_slot or remove_promoted_slot
operation) for each slot to revert them, and ensure rollback errors are
logged/handled but do not mask the original error returned by error_response;
modify the code around finalize_tarball_slot, written_slots and write_packument
to perform this cleanup before returning the error_response.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 97746665-8f5f-44c8-a0da-7d87f550174a
📒 Files selected for processing (3)
registry/crates/pnpm-registry/src/cache.rsregistry/crates/pnpm-registry/src/publish.rsregistry/crates/pnpm-registry/src/server.rs
📜 Review details
🧰 Additional context used
📓 Path-based instructions (1)
registry/crates/**/*.rs
📄 CodeRabbit inference engine (registry/AGENTS.md)
Follow the pacquet code-style guide (../pacquet/CODE_STYLE_GUIDE.md) for Rust-level conventions including imports, naming, ownership, error handling, and test layout
Files:
registry/crates/pnpm-registry/src/cache.rsregistry/crates/pnpm-registry/src/server.rsregistry/crates/pnpm-registry/src/publish.rs
… EINTEGRITY The remaining two failure modes in stream_decode_verify_and_write — base64 decode error and declared-length mismatch — return plain InvalidAttachment messages rather than the EINTEGRITY-prefixed form the integrity and shasum branches use. Clients that key off EINTEGRITY to decide whether to retry the publish would miss these cases. Both are integrity-class failures (the bytes the client sent can't be reconstructed, or don't add up to what the client claimed), so they get the same prefix as the hash mismatches.
…, and multi-attachment cleanup paths Four branches in the publish flow had no test coverage: * Malformed `dist.integrity` — the SRI parser fails before any bytes get streamed; previously only mismatch-after-streaming was tested. * Invalid base64 payload — the EINTEGRITY-prefixed branch added in response to the CodeRabbit review (commit 41f8582) was inferred-correct but unproven. * Payloads larger than the 64 KiB chunk buffer — every prior happy path used a tiny payload that drained in one read. A regression where the streaming loop returned after the first chunk (or where the hasher only saw the first chunk) would still pass them all. The new test uses 200 KiB. * Multi-attachment cleanup — `cleanup_tmp_slots` removes already- written tmp files when a later attachment fails integrity. Real npm clients only ever publish one attachment, so this code path had no integration coverage. Confirmed by temporarily no-op'ing the cleanup call: the new test fails with the expected leftover `*.tmp.*` file.
`s` in the data-field match arm and `n` for the read byte count were the two single-letter identifiers introduced by this PR. The code style guide prefers descriptive names for production code even where perfectionist's `single_letter_let_binding` allowlist would let `n` slide. Switch to `bytes_read`, and collapse the `Some(Value::String(s)) => s` arm into a `let Some(Value::String(data)) = ... else` so the binding goes away entirely.
Summary
Closes #11975.
pnpm-registry'sPUT /:pkgendpoint accepted tarballs without verifying that their bytes matched the integrity declared in the packument. A buggy or partial-upload client could land a tarball whose hash didn't matchdist.integrity/dist.shasum, and a malicious client could deliberately decouple the advertised hash from the bytes — both classes of silent-corruption bug that only surface when downstreampnpm installlater fails withEINTEGRITY.This change hashes every attachment before any I/O happens and rejects mismatches up-front so the bad bytes never reach the cache. The decode + hash + write runs in one streaming pass so the full decoded payload never lives in memory at once.
What lands
stream_decode_verify_and_writeinpublish.rs— pulls base64 chunks out ofbase64::read::DecoderReader, feeds each 64 KiB chunk to an ssriIntegrityChecker(SHA-512), an optionalIntegrityOpts(Sha1)for legacy shasum, and the on-disk tmp file in lockstep. Verifies declaredlength, integrity, and shasum at the end; any failure removes the tmp file before returning.400 BAD_REQUESTwith anEINTEGRITY:prefix in the body so pnpm / npm clients can recognize the error code.extract_attachmentsno longer base64-decodes eagerly; it returnsPendingAttachment { filename, data, declared_length }and lets the streaming path consumedatadirectly.PackageName::parse_tarball_namereturns(canonical, version)so the publish handler can pull the matchingversions[v].distblock out of the body.canonicalize_tarball_namebecomes a thin wrapper.Cache::reserve_tarball_paths+finalize_tarball_slotexpose a tmp/final path pair so the publish handler can do the actual write insidetokio::task::spawn_blocking(syncstd::fs) and finalize via asynctokio::fs::renameafterward.publish_packagecallsstream_decode_verify_and_writebefore any tarball lands at its final path. On any failure it removes every in-progress tmp file so a rejected publish leaves no on-disk artifact.Memory win
On a 100 MiB tarball publish, the old flow held the full payload in three places simultaneously (HTTP body Bytes + serde_json owned base64 String + decoded
Vec<u8>). The streaming flow drops the decodedVec<u8>entirely — only the base64 string and a 64 KiB working buffer remain. Roughly 100 MiB of peak heap saved per concurrent publish.Test plan
cargo test -p pnpm-registry— 115 tests pass (60 unit + 29 publish/auth + 9 + 17 server).publish.rsforstream_decode_verify_and_write: matching integrity passes; mismatched integrity / missing integrity / missing dist entry / mismatched shasum / mismatched length all fail with the right error, and the tmp file is removed.tests/auth_publish.rs:publish_rejects_integrity_mismatch_and_leaves_no_artifacts— corrupt integrity → 400 EINTEGRITY, no packument or tarball on disk.publish_rejects_shasum_mismatch— valid integrity but bad shasum → 400 EINTEGRITY, nothing on disk.publish_rejects_missing_integrity_field—dist.integrityremoved → 400 with "EINTEGRITY: dist.integrity is required", nothing on disk.cargo clippy -p pnpm-registry --all-targets -- --deny warnings— clean.cargo fmt -p pnpm-registry --check— clean.Follow-up left for separate PRs
serde_json::Value— out of scope here; the current change already removes the largest in-memory copy.Written by an agent (Claude Code, claude-opus-4-7).
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores