Skip to content

Refactor unpacking logic for archive files#13870

Merged
notatallshaw merged 8 commits into
pypa:mainfrom
f3flight:patch-1
Apr 19, 2026
Merged

Refactor unpacking logic for archive files#13870
notatallshaw merged 8 commits into
pypa:mainfrom
f3flight:patch-1

Conversation

@f3flight

@f3flight f3flight commented Mar 26, 2026

Copy link
Copy Markdown
Contributor

Make unpacking logic resilient against ambiguous signature check, and order logic by check confidence instead of by file format.

Former unpacking logic was like that:

  1. run all possible zip checks - content type, filename, signature. If match - unzip.
  2. same for tar
  3. fail if we reached here

It was discovered that zipfile.is_zipfile may come out false positive,
so this logic can fail on a valid tar.gz falsely identified as zip.
Additionally, it may be unnecessary to do the magic signature check, if
content type is explicitly provided, or format is obvious from filename.

New logic works the following way:

  1. run content_type check. If match found, either unzip or untar.
  2. run filename check. Same as above.
  3. run magic signature check (is_zipfile, is_tarfile). Only unzip or untar if only one of these matched.
  4. fail if we reached here. Additionally, log error if the file is ambiguous (is_zipfile AND is_tarfile are both True).

This fixes the bug and makes logic more sensible in terms of check order.

This change aims to address #13867

Make unpacking logic resilient against ambiguous signature check, and order logic by check confidence instead of by file format.
@f3flight f3flight marked this pull request as draft March 26, 2026 18:31
@f3flight f3flight marked this pull request as ready for review March 26, 2026 21:47
@notatallshaw

Copy link
Copy Markdown
Member

Thanks for the PR, please be aware review capacity is quite low at the moment. That said, I have two major concerns with this PR:

  1. It needs to be more clear what it's doing, i.e. what is expected to fail now but work with this PR, what is expected to work now but fail with this PR. Whether that's PR description, comments, or NEWS entry.
  2. It lacks any tests.

Sorry, I know this is a much bigger ask of a PR, but we have to be very cautious as pip maintainers. I will do my best to make some review time before pip 26.1 and try and get it over the line for that.

@notatallshaw notatallshaw added this to the 26.1 milestone Mar 27, 2026
@f3flight

f3flight commented Mar 27, 2026

Copy link
Copy Markdown
Contributor Author

@notatallshaw no problem - added unit tests and a detailed description of change in the news file. This code was already decently covered by existing tests, as I learned after I initially created the PR and bunch of tests failed :) creating a commit in GitHub web UI wasn't the best idea haha. Let's see if CI is green now, and let me know if the PR looks good now.

@notatallshaw

Copy link
Copy Markdown
Member

Hope you don't mind, but I've done a little clean up and added a lot more tests to get this ready for 26.1. This looks good to me but I want to give a heads up to other pip maintainers that this more formally sets an expectation on how pip detects zip vs. tar files:

  1. Content Type
  2. File Extension
  3. Python standard library magic detection functions IF they unambiguously agree

This should only break someone's workflow if they were relying on magic detection being ambiguous about the file type.

@sbidoul sbidoul left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun bug indeed :)

@notatallshaw notatallshaw merged commit c4cdbf9 into pypa:main Apr 19, 2026
33 checks passed
@westurner

Copy link
Copy Markdown

CVE for this; per the helpful ANN to the Python security-announce mailing list today: https://www.cve.org/CVERecord?id=CVE-2026-3219

Also found this:

"Preventing ZIP parser confusion attacks on Python package installers" (2025-08) The Python Package Index Blog https://blog.pypi.org/posts/2025-08-07-wheel-archive-confusion-attacks/ :

What is PyPI doing to prevent ZIP confusion attacks?

Is this sufficient; what else should PyPI to prevent malformed uploads that worked before these changes?

@notatallshaw

Copy link
Copy Markdown
Member

@f3flight fyi this was indepently reported by Google as a security issue and the advisory has been published: https://www.cve.org/CVERecord?id=CVE-2026-3219

Thanks again for reporting and fixing, I don't think I would have had capacity to fix this myself before 26.1

@notatallshaw

Copy link
Copy Markdown
Member

Is this sufficient; what else should PyPI to prevent malformed uploads that worked before these changes?

Warehouse (PyPIs backend) already rejects polyglot files: pypi/warehouse#19638

stydxm pushed a commit to AOSC-Dev/aosc-os-abbs that referenced this pull request Apr 21, 2026
JoaoPucci added a commit to JoaoPucci/ephemera that referenced this pull request Apr 24, 2026
Prior commit assumed pip 26.1 shipped (based on the CVE advisory
page), but PyPI's latest is still 26.0.1; the CI runner failed to
resolve pip>=26.1. The fix has merged in pypa/pip#13870 but no
release has been cut yet.

Replace with a surgical --ignore-vuln CVE-2026-3219 on the audit
step, with a comment anchoring when to remove it (once pip 26.1+
is on PyPI). The runtime exploitation path against our deploys is
independently blocked by --require-hashes at install time, so this
is a legitimate temporary exception, not a security regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thebscolaro added a commit to thebscolaro/ownlock that referenced this pull request Apr 24, 2026
The advisory is a pip-in-pip polyglot archive installation issue (local, user
interaction required, no confidentiality impact). Upstream fix is merged in
pypa/pip#13870 but a patched pip has not shipped to PyPI yet. Removing the
ignore is tracked by that PR landing in a release.

Made-with: Cursor
nlebovits added a commit to geoparquet/geoparquet-io that referenced this pull request Apr 24, 2026
Low severity (CVSS 4.6) tar/ZIP handling issue in pip.
Fix available in pip 26.1 which is not yet released.
See: pypa/pip#13870

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nlebovits added a commit to geoparquet/geoparquet-io that referenced this pull request Apr 24, 2026
Low severity (CVSS 4.6) tar/ZIP handling issue in pip.
Ignore auto-expires when pip 26.1+ detected.
See: pypa/pip#13870

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
aallan added a commit to aallan/vera that referenced this pull request Apr 24, 2026
Two unrelated CI failures on redesign-index, fixed together:

1. tests/test_html.py::test_vera_block_count expected 5 Vera blocks
   in docs/index.html but the redesign consolidated safe_classify
   into classify_sentiment, leaving 4. Test expectation updated
   5 -> 4, comment rewritten, and the CHANGELOG 'five-sample' claim
   corrected to 'four-sample' to match the actual layout.

2. pip-audit flags CVE-2026-3219 in pip 26.0.1 (GHSA-58qw-9mgm-455v)
   an archive-handling hardening bug. Upstream fix merged in
   pypa/pip#13870, milestone pip 26.1 -- not yet released, so
   there is nothing to upgrade to. Add --ignore-vuln CVE-2026-3219
   to the dependency-audit step with a dated comment linking the
   CVE, GHSA, upstream PR, and internal tracking issue #527.
   Threat model (untrusted ambiguous archives) does not apply to
   our CI. The ignore is a bridge; #527 carries the 'remove when
   pip 26.1 ships' action item.

Also:
- Add a 'CI ignores' table to KNOWN_ISSUES.md covering both
  CVE-2026-4539 (pygments, existing) and CVE-2026-3219 (pip, new)
  with explicit removal triggers.
- Add a ### CI bullet to CHANGELOG [Unreleased] recording the
  ignore and linking #527.

Co-Authored-By: Claude <noreply@anthropic.invalid>
biostochastics added a commit to biostochastics/APMODE that referenced this pull request Apr 25, 2026
Pip <= 26.0.1 misinterprets concatenated tar/ZIP archives. The
upstream fix (pypa/pip#13870) is merged for milestone 26.1 but not
yet on PyPI, so a constraint-dependency would block uv resolution
today. Exposure on this project is dev-only — pip is pulled in
transitively via ``pip-audit -> pip-api`` and ``pip-audit`` lives
in ``[project.optional-dependencies].dev``; APMODE never invokes
pip at runtime. The note in ``[tool.uv]`` records the intent to
pin ``pip>=26.1`` to ``override-dependencies`` once 26.1 ships.

The Dependabot alert (#3) is dismissed as ``tolerable_risk`` with
the same rationale; no open advisories remain.
joenarvaez added a commit to engineering-with-ai/python-ml-pipeline that referenced this pull request Apr 25, 2026
GHSA-58qw-9mgm-455v: pip 0–26.0.1 mishandles archives that are both
valid tar AND ZIP, resolving as ZIP regardless of filename. Severity
MODERATE (CWE-434). pypa/pip#13870 has the fix but is unreleased.

26.0.1 is the latest version; no clean version exists yet. Adding to
the ignore list per the existing pattern. Remove when pip ships a fix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
joenarvaez added a commit to engineering-with-ai/python-base that referenced this pull request Apr 25, 2026
Drop mypy in favor of Astral's ty (~10-100x faster). Recover annotation
enforcement via Ruff's flake8-annotations rules. Add types-pyyaml as an
explicit dev dep (mypy --install-types previously auto-installed it).

Also ignore CVE-2026-3219 (pip ZIP/tar handling) — no fix released yet,
pypa/pip#13870 is the patch but unreleased.

Accepted regressions: disallow_any_generics, disallow_subclassing_any —
codebase already mandates Pydantic models, so bare generics are rare.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
josh-iterabloom pushed a commit to iterabloom/hypergumbo that referenced this pull request Apr 25, 2026
Failing gate on PR #3347: pip-audit flagged pip 26.0.1 for CVE-2026-3219
(concatenated-ZIP+tar archive confusion). CVSS 4.6 MEDIUM, AV:L with
active user interaction required, integrity-only low impact. Fix lands
in pip 26.1 (pypa/pip#13870) but that release was not yet published
when the CVE was disclosed 2026-04-20; our CI's existing
pip install --upgrade pip step will pick it up automatically once it
ships.

Zero attack surface on our single-tenant self-hosted runners — no
user-supplied archives ever reach pip — so mirrors the posture of the
existing CVE-2025-71176 ignore.

Applied identically to ci.yml and release.yml to keep the two gates
aligned.

Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
rjmurillo added a commit to rjmurillo/ai-agents that referenced this pull request Apr 26, 2026
…1779)

pip 26.0/26.0.1 mishandle concatenated tar/ZIP archives (CWE-434, CVSS
4.6 MEDIUM, local + user interaction). Fix is merged on pip main
(pypa/pip#13870), pending pip 26.1 release. No released pip version is
patched, so we cannot remediate by upgrade.

CI installs trusted deps from PyPI via uv lockfile, so the local-only
attack surface does not apply here. Extend the existing CVE-2026-4539
ignore pattern to cover CVE-2026-3219 with explanatory comments and a
removal trigger.

Tracking removal in #1778.

Co-authored-by: rjmurillo[bot] <250269933+rjmurillo-bot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rjmurillo pushed a commit to rjmurillo/ai-agents that referenced this pull request Apr 26, 2026
Resolves conflict in .github/workflows/pytest.yml. Both branches added
--ignore-vuln CVE-2026-3219 to pip-audit (this branch in aee5278, main
in #1779). Took main's version since #1779 has the canonical, more
detailed rationale comment block (links to pypa/pip#13870 fix PR,
explains the local-only attack surface and why it does not apply to CI).

Also addresses the latest Copilot review comments on this PR:

- tests/eval/test_eval_prompt_change.py: scope the sys.path mutation
  to module load only and restore it in a finally block. eval-prompt-
  change.py imports sibling modules (_anthropic_api, _eval_common) by
  bare name, so EVAL_DIR must be on sys.path during exec_module();
  removing the entry afterwards prevents the addition from leaking
  into other tests' import resolution. Tests still pass: 85 in this
  file, 107 with workspace-limits tests included.

- .agents/sessions/2026-04-25-session-1714.json: rewrite workLog steps
  1 and 7 evidence so they no longer reference .agents/audit/eval-1741/
  or .agents/audit/eval-1755/ paths that do not exist in the repo (the
  artifacts were transient and dropped in 31ed283). Step 1 now points
  to the reproducer command; step 7 points to the per-agent table in
  the PR Honesty notes section.

Pulls in main's PR #1779 (pip CVE ignore w/ rationale) and PR #1762
(enterprise-patterns rule) as part of the merge.

Refs #1755

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

https://claude.ai/code/session_01XALNMgBrrH7Ya8dM1CjYzo
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants