Refactor unpacking logic for archive files#13870
Conversation
Make unpacking logic resilient against ambiguous signature check, and order logic by check confidence instead of by file format.
|
Thanks for the PR, please be aware review capacity is quite low at the moment. That said, I have two major concerns with this PR:
Sorry, I know this is a much bigger ask of a PR, but we have to be very cautious as pip maintainers. I will do my best to make some review time before pip 26.1 and try and get it over the line for that. |
|
@notatallshaw no problem - added unit tests and a detailed description of change in the news file. This code was already decently covered by existing tests, as I learned after I initially created the PR and bunch of tests failed :) creating a commit in GitHub web UI wasn't the best idea haha. Let's see if CI is green now, and let me know if the PR looks good now. |
|
Hope you don't mind, but I've done a little clean up and added a lot more tests to get this ready for 26.1. This looks good to me but I want to give a heads up to other pip maintainers that this more formally sets an expectation on how pip detects zip vs. tar files:
This should only break someone's workflow if they were relying on magic detection being ambiguous about the file type. |
|
CVE for this; per the helpful ANN to the Python security-announce mailing list today: https://www.cve.org/CVERecord?id=CVE-2026-3219 Also found this: "Preventing ZIP parser confusion attacks on Python package installers" (2025-08) The Python Package Index Blog https://blog.pypi.org/posts/2025-08-07-wheel-archive-confusion-attacks/ :
Is this sufficient; what else should PyPI to prevent malformed uploads that worked before these changes? |
|
@f3flight fyi this was indepently reported by Google as a security issue and the advisory has been published: https://www.cve.org/CVERecord?id=CVE-2026-3219 Thanks again for reporting and fixing, I don't think I would have had capacity to fix this myself before 26.1 |
Warehouse (PyPIs backend) already rejects polyglot files: pypi/warehouse#19638 |
Backport a patch for https://www.cve.org/CVERecord?id=CVE-2026-3219. Link: https://www.cve.org/CVERecord?id=CVE-2026-3219 Link: pypa/pip#13870
Prior commit assumed pip 26.1 shipped (based on the CVE advisory page), but PyPI's latest is still 26.0.1; the CI runner failed to resolve pip>=26.1. The fix has merged in pypa/pip#13870 but no release has been cut yet. Replace with a surgical --ignore-vuln CVE-2026-3219 on the audit step, with a comment anchoring when to remove it (once pip 26.1+ is on PyPI). The runtime exploitation path against our deploys is independently blocked by --require-hashes at install time, so this is a legitimate temporary exception, not a security regression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The advisory is a pip-in-pip polyglot archive installation issue (local, user interaction required, no confidentiality impact). Upstream fix is merged in pypa/pip#13870 but a patched pip has not shipped to PyPI yet. Removing the ignore is tracked by that PR landing in a release. Made-with: Cursor
Low severity (CVSS 4.6) tar/ZIP handling issue in pip. Fix available in pip 26.1 which is not yet released. See: pypa/pip#13870 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Low severity (CVSS 4.6) tar/ZIP handling issue in pip. Ignore auto-expires when pip 26.1+ detected. See: pypa/pip#13870 Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Two unrelated CI failures on redesign-index, fixed together: 1. tests/test_html.py::test_vera_block_count expected 5 Vera blocks in docs/index.html but the redesign consolidated safe_classify into classify_sentiment, leaving 4. Test expectation updated 5 -> 4, comment rewritten, and the CHANGELOG 'five-sample' claim corrected to 'four-sample' to match the actual layout. 2. pip-audit flags CVE-2026-3219 in pip 26.0.1 (GHSA-58qw-9mgm-455v) an archive-handling hardening bug. Upstream fix merged in pypa/pip#13870, milestone pip 26.1 -- not yet released, so there is nothing to upgrade to. Add --ignore-vuln CVE-2026-3219 to the dependency-audit step with a dated comment linking the CVE, GHSA, upstream PR, and internal tracking issue #527. Threat model (untrusted ambiguous archives) does not apply to our CI. The ignore is a bridge; #527 carries the 'remove when pip 26.1 ships' action item. Also: - Add a 'CI ignores' table to KNOWN_ISSUES.md covering both CVE-2026-4539 (pygments, existing) and CVE-2026-3219 (pip, new) with explicit removal triggers. - Add a ### CI bullet to CHANGELOG [Unreleased] recording the ignore and linking #527. Co-Authored-By: Claude <noreply@anthropic.invalid>
Pip <= 26.0.1 misinterprets concatenated tar/ZIP archives. The upstream fix (pypa/pip#13870) is merged for milestone 26.1 but not yet on PyPI, so a constraint-dependency would block uv resolution today. Exposure on this project is dev-only — pip is pulled in transitively via ``pip-audit -> pip-api`` and ``pip-audit`` lives in ``[project.optional-dependencies].dev``; APMODE never invokes pip at runtime. The note in ``[tool.uv]`` records the intent to pin ``pip>=26.1`` to ``override-dependencies`` once 26.1 ships. The Dependabot alert (#3) is dismissed as ``tolerable_risk`` with the same rationale; no open advisories remain.
GHSA-58qw-9mgm-455v: pip 0–26.0.1 mishandles archives that are both valid tar AND ZIP, resolving as ZIP regardless of filename. Severity MODERATE (CWE-434). pypa/pip#13870 has the fix but is unreleased. 26.0.1 is the latest version; no clean version exists yet. Adding to the ignore list per the existing pattern. Remove when pip ships a fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop mypy in favor of Astral's ty (~10-100x faster). Recover annotation enforcement via Ruff's flake8-annotations rules. Add types-pyyaml as an explicit dev dep (mypy --install-types previously auto-installed it). Also ignore CVE-2026-3219 (pip ZIP/tar handling) — no fix released yet, pypa/pip#13870 is the patch but unreleased. Accepted regressions: disallow_any_generics, disallow_subclassing_any — codebase already mandates Pydantic models, so bare generics are rare. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Failing gate on PR #3347: pip-audit flagged pip 26.0.1 for CVE-2026-3219 (concatenated-ZIP+tar archive confusion). CVSS 4.6 MEDIUM, AV:L with active user interaction required, integrity-only low impact. Fix lands in pip 26.1 (pypa/pip#13870) but that release was not yet published when the CVE was disclosed 2026-04-20; our CI's existing pip install --upgrade pip step will pick it up automatically once it ships. Zero attack surface on our single-tenant self-hosted runners — no user-supplied archives ever reach pip — so mirrors the posture of the existing CVE-2025-71176 ignore. Applied identically to ci.yml and release.yml to keep the two gates aligned. Signed-off-by: jgstern-agent <josh-agent@iterabloom.com>
…1779) pip 26.0/26.0.1 mishandle concatenated tar/ZIP archives (CWE-434, CVSS 4.6 MEDIUM, local + user interaction). Fix is merged on pip main (pypa/pip#13870), pending pip 26.1 release. No released pip version is patched, so we cannot remediate by upgrade. CI installs trusted deps from PyPI via uv lockfile, so the local-only attack surface does not apply here. Extend the existing CVE-2026-4539 ignore pattern to cover CVE-2026-3219 with explanatory comments and a removal trigger. Tracking removal in #1778. Co-authored-by: rjmurillo[bot] <250269933+rjmurillo-bot@users.noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves conflict in .github/workflows/pytest.yml. Both branches added --ignore-vuln CVE-2026-3219 to pip-audit (this branch in aee5278, main in #1779). Took main's version since #1779 has the canonical, more detailed rationale comment block (links to pypa/pip#13870 fix PR, explains the local-only attack surface and why it does not apply to CI). Also addresses the latest Copilot review comments on this PR: - tests/eval/test_eval_prompt_change.py: scope the sys.path mutation to module load only and restore it in a finally block. eval-prompt- change.py imports sibling modules (_anthropic_api, _eval_common) by bare name, so EVAL_DIR must be on sys.path during exec_module(); removing the entry afterwards prevents the addition from leaking into other tests' import resolution. Tests still pass: 85 in this file, 107 with workspace-limits tests included. - .agents/sessions/2026-04-25-session-1714.json: rewrite workLog steps 1 and 7 evidence so they no longer reference .agents/audit/eval-1741/ or .agents/audit/eval-1755/ paths that do not exist in the repo (the artifacts were transient and dropped in 31ed283). Step 1 now points to the reproducer command; step 7 points to the per-agent table in the PR Honesty notes section. Pulls in main's PR #1779 (pip CVE ignore w/ rationale) and PR #1762 (enterprise-patterns rule) as part of the merge. Refs #1755 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> https://claude.ai/code/session_01XALNMgBrrH7Ya8dM1CjYzo
Make unpacking logic resilient against ambiguous signature check, and order logic by check confidence instead of by file format.
Former unpacking logic was like that:
It was discovered that zipfile.is_zipfile may come out false positive,
so this logic can fail on a valid tar.gz falsely identified as zip.
Additionally, it may be unnecessary to do the magic signature check, if
content type is explicitly provided, or format is obvious from filename.
New logic works the following way:
This fixes the bug and makes logic more sensible in terms of check order.
This change aims to address #13867