Per the HTTP spec, Content-Length refers to the encoded payload when Content-Encoding is set, but undici fetch yields decoded bytes — so the strict size check rejected legitimate downloads with ERR_PNPM_BAD_TARBALL_SIZE. Closes #11506.
Pull request overview
This PR fixes ERR_PNPM_BAD_TARBALL_SIZE for registries that serve tarballs with an end-to-end Content-Encoding (e.g. gzip), by skipping the strict Content-Length vs downloaded-bytes check when the response is content-encoded (since undici fetch yields decoded bytes).
Changes:
- Skip `Content-Length`-based size preallocation/validation when `Content-Encoding` is present and non-identity.
- Add a regression unit test covering a `Content-Encoding` response with a `Content-Length` mismatch.
- Add a changeset for patch releases of `@pnpm/fetching.tarball-fetcher` and `pnpm`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| fetching/tarball-fetcher/src/remoteTarballFetcher.ts | Detect non-identity Content-Encoding and disable Content-Length size validation/preallocation for decoded responses. |
| fetching/tarball-fetcher/test/fetch.ts | Adds coverage ensuring content-encoded responses don’t fail on Content-Length mismatch. |
| .changeset/tarball-content-encoding.md | Declares patch-level changeset documenting the fix and linking the issue. |
Verified the user's report against v10 source: v10 worked because it called node-fetch with `compress: false` (network/fetch/src/fetchFromRegistry.ts on the v10 branch), which suppressed Accept-Encoding and prevented auto-decompression. v11's switch to undici fetch lost that opt-out — undici sends Accept-Encoding: gzip, deflate, br by default and transparently decodes the body, while keeping Content-Length pointed at the encoded payload (confirmed empirically). The strict size check then rejects the download. Restore v10's effective behavior by sending Accept-Encoding: identity for tarball requests, and as defense in depth against misbehaving servers that stamp Content-Encoding regardless, skip the strict size check when the response declares a non-identity Content-Encoding.
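The two layers described above can be sketched roughly as follows. This is a minimal illustration, not pnpm's actual code: the function names `tarballRequestHeaders`, `expectedSize`, and `assertTarballSize` are hypothetical, and the sketch assumes the caller has already decided whether the response is content-encoded.

```typescript
// Hypothetical helper names throughout; a sketch, not pnpm's implementation.

/** Layer 1: ask the server not to apply any content coding to the tarball. */
function tarballRequestHeaders (base: Record<string, string> = {}): Record<string, string> {
  return { ...base, 'accept-encoding': 'identity' }
}

/**
 * Layer 2: return the byte count to validate against, or null to skip the check.
 * When the response is content-encoded, Content-Length describes the encoded
 * payload, so it cannot be compared with the decoded bytes the client receives.
 */
function expectedSize (contentLength: string | null, contentEncoded: boolean): number | null {
  if (contentEncoded || contentLength == null) return null
  const size = Number.parseInt(contentLength, 10)
  return Number.isFinite(size) && size >= 0 ? size : null
}

/** Throws only when a size is expected and the received byte count differs. */
function assertTarballSize (received: number, expected: number | null): void {
  if (expected != null && received !== expected) {
    throw new Error(`BAD_TARBALL_SIZE: expected ${expected} bytes, received ${received}`)
  }
}
```

With this shape, a gzip-encoded response that reports `Content-Length: 100` but decodes to 250 bytes passes, because `expectedSize('100', true)` returns `null`. The same mismatch on an unencoded response still throws.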
Per RFC 9110 §8.4 the header is a comma-separated, case-insensitive list that may include whitespace and mixed codings (e.g. `gzip, identity`). The previous string-equality check misclassified those — the response is now treated as encoded iff any coding is non-`identity`.
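A minimal sketch of the list-aware check described above, assuming a hypothetical helper named `isContentEncoded` (the actual function in `remoteTarballFetcher.ts` may be named and structured differently):

```typescript
/**
 * Hypothetical helper: returns true if the Content-Encoding header value
 * lists at least one coding other than "identity".
 * The value is treated as a comma-separated, case-insensitive list.
 */
function isContentEncoded (contentEncoding: string | null | undefined): boolean {
  if (!contentEncoding) return false
  return contentEncoding
    .split(',')
    .map((coding) => coding.trim().toLowerCase())
    // Ignore empty list elements such as those produced by a trailing comma.
    .some((coding) => coding !== '' && coding !== 'identity')
}
```

Under this sketch, `gzip`, `GZIP`, and `gzip, identity` all count as encoded, while `identity`, ` Identity `, an empty string, and a missing header do not. A plain `header === 'identity'` comparison would get the mixed-case and multi-coding inputs wrong.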
* fix: skip Content-Length check when response is content-encoded

  Per the HTTP spec, Content-Length refers to the encoded payload when Content-Encoding is set, but undici fetch yields decoded bytes, so the strict size check rejected legitimate downloads with ERR_PNPM_BAD_TARBALL_SIZE. Closes #11506.

* fix(tarball-fetcher): match v10's no-compression behavior

  Verified the user's report against v10 source: v10 worked because it called node-fetch with `compress: false` (network/fetch/src/fetchFromRegistry.ts on the v10 branch), which suppressed Accept-Encoding and prevented auto-decompression. v11's switch to undici fetch lost that opt-out: undici sends Accept-Encoding: gzip, deflate, br by default and transparently decodes the body, while keeping Content-Length pointed at the encoded payload (confirmed empirically). The strict size check then rejects the download. Restore v10's effective behavior by sending Accept-Encoding: identity for tarball requests, and as defense in depth against misbehaving servers that stamp Content-Encoding regardless, skip the strict size check when the response declares a non-identity Content-Encoding.

* fix(tarball-fetcher): parse Content-Encoding as a coding list

  Per RFC 9110 §8.4 the header is a comma-separated, case-insensitive list that may include whitespace and mixed codings (e.g. `gzip, identity`). The previous string-equality check misclassified those; the response is now treated as encoded iff any coding is non-`identity`.
Summary
Fixes #11506: `ERR_PNPM_BAD_TARBALL_SIZE` when a registry serves tarballs with `Content-Encoding: gzip`.
Why v10 worked but v11 doesn't
I verified this against the actual v10 source. v10 worked because of one line in `network/fetch/src/fetchFromRegistry.ts` on the `v10` branch: the `compress: false` option passed to `node-fetch`. `node-fetch` honored that flag by suppressing `Accept-Encoding` and not auto-decompressing the body. v11's switch to undici (#10537) silently dropped this: undici's WHATWG `fetch` always sends `Accept-Encoding: gzip, deflate, br` and transparently decodes the response body. Importantly, it leaves `Content-Length` pointed at the encoded payload (confirmed empirically, and consistent with the spec). When the strict size check landed in #11151, this began rejecting any tarball served with end-to-end content encoding.
Per MDN: "When the Content-Encoding header is present, other metadata (e.g., Content-Length) refer to the encoded form of the data, not the original resource."
Fix
Two layers:
1. Send `Accept-Encoding: identity` for tarball requests. Tarballs are already gzipped, so asking for an additional encoding wastes CPU and triggers the bug. This restores v10's effective behavior.
2. Skip the strict `Content-Length` check when the response declares a non-identity `Content-Encoding`. This is defense in depth against misbehaving servers that stamp `Content-Encoding` regardless of `Accept-Encoding` (the OP's case looked like Artifactory). Integrity verification via SHA still catches genuinely corrupt payloads.
Test plan
- `Accept-Encoding: identity` is sent on the tarball request
- A response with mismatched `Content-Length` and `Content-Encoding: gzip` is accepted
- The `fetch.ts` suite still passes (24/24)
Written by an agent (Claude Code, claude-opus-4-7).