Skip to content

fix: skip Content-Length check when response is content-encoded#11508

Merged
zkochan merged 3 commits into
mainfrom
fix/11506
May 7, 2026
Merged

fix: skip Content-Length check when response is content-encoded#11508
zkochan merged 3 commits into
mainfrom
fix/11506

Conversation

@zkochan

@zkochan zkochan commented May 6, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #11506ERR_PNPM_BAD_TARBALL_SIZE when a registry serves tarballs with Content-Encoding: gzip.

Why v10 worked but v11 doesn't

I verified this against the actual v10 source. v10 worked because of one line in network/fetch/src/fetchFromRegistry.ts on the v10 branch:

compress: opts?.compress ?? false,

node-fetch honored that flag by suppressing Accept-Encoding and not auto-decompressing the body. v11's switch to undici (#10537) silently dropped this — undici's WHATWG fetch always sends Accept-Encoding: gzip, deflate, br and transparently decodes the response body. Importantly, it leaves Content-Length pointed at the encoded payload (confirmed empirically — and matches the spec). When the strict size check landed in #11151, this began rejecting any tarball served with end-to-end content encoding.

Per MDN: "When the Content-Encoding header is present, other metadata (e.g., Content-Length) refer to the encoded form of the data, not the original resource."

Fix

Two layers:

  1. Send Accept-Encoding: identity for tarball requests. Tarballs are already gzipped — asking for an additional encoding wastes CPU and triggers the bug. This restores v10's effective behavior.
  2. Skip the strict Content-Length check when the response declares a non-identity Content-Encoding. Defense in depth against misbehaving servers that stamp Content-Encoding regardless of Accept-Encoding (the OP's case looked like Artifactory). Integrity verification via SHA still catches genuinely corrupt payloads.

Test plan

  • New unit test asserts Accept-Encoding: identity is sent on the tarball request
  • New unit test asserts a server response with mismatched Content-Length and Content-Encoding: gzip is accepted
  • Existing fetch.ts suite still passes (24/24)
  • Lint clean

Written by an agent (Claude Code, claude-opus-4-7).

Summary by CodeRabbit

  • Bug Fixes
    • Fixed ERR_PNPM_BAD_TARBALL_SIZE by requesting tarballs with Accept-Encoding: identity and by relaxing strict Content-Length validation for responses that are content-encoded.
  • Tests
    • Added regression tests for the tarball request header and for handling responses with various Content-Encoding forms and size behaviors.
  • Documentation
    • Bumped patch versions and added a changelog entry documenting the fix.

Per the HTTP spec, Content-Length refers to the encoded payload when
Content-Encoding is set, but undici fetch yields decoded bytes — so the
strict size check rejected legitimate downloads with
ERR_PNPM_BAD_TARBALL_SIZE. Closes #11506.
Copilot AI review requested due to automatic review settings May 6, 2026 23:06
@coderabbitai

coderabbitai Bot commented May 6, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4c4b5468-a774-4271-ba20-e801d54651f9

📥 Commits

Reviewing files that changed from the base of the PR and between e1c09ee and ad65f41.

📒 Files selected for processing (2)
  • fetching/tarball-fetcher/src/remoteTarballFetcher.ts
  • fetching/tarball-fetcher/test/fetch.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • fetching/tarball-fetcher/src/remoteTarballFetcher.ts

📝 Walkthrough

Walkthrough

The tarball fetcher now sends Accept-Encoding: identity and treats Content-Length as trusted only when Content-Encoding is absent or identity, avoiding size-validation failures for content-encoded responses.

Changes

Tarball Content-Encoding Handling

Layer / File(s) Summary
Request Header
fetching/tarball-fetcher/src/remoteTarballFetcher.ts
Adds Accept-Encoding: identity to tarball GET requests to prevent server-side content-encoding.
Response Detection
fetching/tarball-fetcher/src/remoteTarballFetcher.ts
Adds isContentEncoded helper and logic to detect Content-Encoding; only parse Content-Length when encoding is absent or identity.
Size Handling / Buffering
fetching/tarball-fetcher/src/remoteTarballFetcher.ts
Treats size as unknown (null) when response is content-encoded so preallocation and strict size checks are skipped.
Tests / Regression Coverage
fetching/tarball-fetcher/test/fetch.ts
Adds tests asserting Accept-Encoding: identity is sent; parameterized tests ensure downloads don't fail when Content-Encoding is present despite mismatched Content-Length; test verifies strict check still applies when encoding is effectively identity.
Versioning / Changelog
.changeset/tarball-content-encoding.md
Patch bumps for @pnpm/fetching.tarball-fetcher and pnpm and changelog entry describing the ERR_PNPM_BAD_TARBALL_SIZE fix.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I tap my paw on bytes and streams,
Asked for identity, avoided schemes.
When headers lie about their size,
I hop around those misled ties.
Tarballs land safe — beneath moonbeams. ✨📦

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately describes the main change: skipping Content-Length validation when responses are content-encoded, which directly addresses the root cause of issue #11506.
Linked Issues check ✅ Passed All coding requirements from issue #11506 are met: Accept-Encoding: identity is sent to prevent encoding, Content-Length check is skipped for content-encoded responses, and integrity verification (SHA) is preserved.
Out of Scope Changes check ✅ Passed All changes are directly related to the linked issue #11506: tarball fetcher logic, test coverage for Content-Encoding handling, and changelog entry documenting the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/11506

Comment @coderabbitai help to get the list of available commands and usage tips.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes ERR_PNPM_BAD_TARBALL_SIZE for registries that serve tarballs with an end-to-end Content-Encoding (e.g. gzip), by skipping the strict Content-Length vs downloaded-bytes check when the response is content-encoded (since undici fetch yields decoded bytes).

Changes:

  • Skip Content-Length-based size preallocation/validation when Content-Encoding is present and non-identity.
  • Add a regression unit test covering a Content-Encoding response with a Content-Length mismatch.
  • Add a changeset for patch releases of @pnpm/fetching.tarball-fetcher and pnpm.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
fetching/tarball-fetcher/src/remoteTarballFetcher.ts Detect non-identity Content-Encoding and disable Content-Length size validation/preallocation for decoded responses.
fetching/tarball-fetcher/test/fetch.ts Adds coverage ensuring content-encoded responses don’t fail on Content-Length mismatch.
.changeset/tarball-content-encoding.md Declares patch-level changeset documenting the fix and linking the issue.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread fetching/tarball-fetcher/src/remoteTarballFetcher.ts Outdated
zkochan added 2 commits May 7, 2026 01:16
Verified the user's report against v10 source: v10 worked because it
called node-fetch with `compress: false` (network/fetch/src/fetchFromRegistry.ts
on the v10 branch), which suppressed Accept-Encoding and prevented
auto-decompression. v11's switch to undici fetch lost that opt-out — undici
sends Accept-Encoding: gzip, deflate, br by default and transparently
decodes the body, while keeping Content-Length pointed at the encoded
payload (confirmed empirically). The strict size check then rejects the
download.

Restore v10's effective behavior by sending Accept-Encoding: identity for
tarball requests, and as defense in depth against misbehaving servers
that stamp Content-Encoding regardless, skip the strict size check when
the response declares a non-identity Content-Encoding.
Per RFC 9110 §8.4 the header is a comma-separated, case-insensitive list
that may include whitespace and mixed codings (e.g. `gzip, identity`).
The previous string-equality check misclassified those — the response is
now treated as encoded iff any coding is non-`identity`.
Copilot AI review requested due to automatic review settings May 6, 2026 23:54

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@zkochan zkochan merged commit 0cf64b7 into main May 7, 2026
19 of 20 checks passed
@zkochan zkochan deleted the fix/11506 branch May 7, 2026 06:14
zkochan added a commit that referenced this pull request May 7, 2026
* fix: skip Content-Length check when response is content-encoded

Per the HTTP spec, Content-Length refers to the encoded payload when
Content-Encoding is set, but undici fetch yields decoded bytes — so the
strict size check rejected legitimate downloads with
ERR_PNPM_BAD_TARBALL_SIZE. Closes #11506.

* fix(tarball-fetcher): match v10's no-compression behavior

Verified the user's report against v10 source: v10 worked because it
called node-fetch with `compress: false` (network/fetch/src/fetchFromRegistry.ts
on the v10 branch), which suppressed Accept-Encoding and prevented
auto-decompression. v11's switch to undici fetch lost that opt-out — undici
sends Accept-Encoding: gzip, deflate, br by default and transparently
decodes the body, while keeping Content-Length pointed at the encoded
payload (confirmed empirically). The strict size check then rejects the
download.

Restore v10's effective behavior by sending Accept-Encoding: identity for
tarball requests, and as defense in depth against misbehaving servers
that stamp Content-Encoding regardless, skip the strict size check when
the response declares a non-identity Content-Encoding.

* fix(tarball-fetcher): parse Content-Encoding as a coding list

Per RFC 9110 §8.4 the header is a comma-separated, case-insensitive list
that may include whitespace and mixed codings (e.g. `gzip, identity`).
The previous string-equality check misclassified those — the response is
now treated as encoded iff any coding is non-`identity`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

pnpm 11: Content-Encoded tarballs are rejected with ERR_PNPM_BAD_TARBALL_SIZE

2 participants