Bizarre bug with wheel build from tar.gz - tar.gz identified as zip file #13867

@f3flight

Description

I stumbled upon a pretty unique bug, caused by this code:

if (
    content_type == "application/zip"
    or filename.lower().endswith(ZIP_EXTENSIONS)
    or zipfile.is_zipfile(filename)
):
    unzip_file(filename, location, flatten=not filename.endswith(".whl"))

In my org I ran into a tar.gz file for which zipfile.is_zipfile(...) returns True.
This false positive makes pip treat the file as a zip, and the zipfile library then raises BadZipFile when it tries to unpack it (because it's a tar.gz!).

The file contains proprietary code so I cannot share it, but I can share the output:

$ tar -tvf my-tar-file.tar.gz | head -n3
drwxr-xr-x runner/runner     0 2026-03-25 04:05 some-package-1.2.3/
-rw-r--r-- runner/runner   145 2026-03-25 04:05 some-package-1.2.3/MANIFEST.in
-rw-r--r-- runner/runner   257 2026-03-25 04:05 some-package-1.2.3/PKG-INFO
$ python -c 'import zipfile; print(zipfile.is_zipfile("my-tar-file.tar.gz"))'
True
$

Some AI prompting suggested this is due to the magic byte string "PK\x05\x06" (the ZIP End-of-Central-Directory
signature), which can appear by chance in any file; finding it is enough for zipfile.is_zipfile to return True.
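To check that diagnosis against an affected file, one could scan its tail for the signature. This is only a sketch: `find_eocd_signature` is a hypothetical helper, not part of pip or zipfile, and the search window (the last 64 KiB plus the 22-byte fixed record) is my understanding of where CPython's zipfile looks, so treat it as an assumption.

```python
import os

EOCD_SIGNATURE = b"PK\x05\x06"  # ZIP End-of-Central-Directory signature
EOCD_FIXED_SIZE = 22            # size of the record without a trailing comment
MAX_COMMENT = 1 << 16           # assumed upper bound on the trailing comment


def find_eocd_signature(path):
    """Return the absolute offset of the last EOCD signature found in the
    tail of the file, or None if the signature does not occur there."""
    size = os.path.getsize(path)
    start = max(size - MAX_COMMENT - EOCD_FIXED_SIZE, 0)
    with open(path, "rb") as fp:
        fp.seek(start)
        tail = fp.read()
    pos = tail.rfind(EOCD_SIGNATURE)
    return None if pos < 0 else start + pos
```

A non-None result for a file that tar lists correctly would support the explanation above.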

Since there's no way to bypass the zipfile.is_zipfile check when using the "pip wheel <tar.gz>" command, I will have to modify our tooling to either unpack the archive first or use "uv" instead.

I think the fix would be to check the filename first and not call zipfile.is_zipfile on a file with a ".tar.gz" extension.
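A minimal sketch of that ordering, to make the proposal concrete. This is not pip's actual code: `classify_archive` and both extension tuples are hypothetical stand-ins, and only `zipfile.is_zipfile` and `tarfile.is_tarfile` are real standard-library calls.

```python
import tarfile
import zipfile

# Illustrative extension lists; pip defines its own constants.
ZIP_EXTENSIONS = (".zip", ".whl")
TAR_EXTENSIONS = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz", ".tar.xz", ".txz")


def classify_archive(filename, content_type=None):
    """Return "zip", "tar", or None, trusting explicit hints before sniffing."""
    lower = filename.lower()
    if content_type == "application/zip" or lower.endswith(ZIP_EXTENSIONS):
        return "zip"
    if lower.endswith(TAR_EXTENSIONS):
        return "tar"
    # Sniff the content only when neither hint is conclusive.
    if tarfile.is_tarfile(filename):
        return "tar"
    if zipfile.is_zipfile(filename):
        return "zip"
    return None
```

With this ordering a file named "my-tar-file.tar.gz" is classified by its extension and never reaches zipfile.is_zipfile. Sniffing for tar before zip in the fallback is a further assumption on my part, not something the report asks for.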

Fun bug)

Expected behavior

No response

pip version

I believe the latest code has the same issue, but we're using pip 21 or 24

Python version

3.12

OS

linux

How to Reproduce

I cannot share the file, so reproducing this is tricky: you have to be really lucky to have a tar.gz which contains the 4-byte sequence 50 4b 05 06.
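A synthetic stand-in may be easier than waiting for a lucky file. The sketch below is a reconstruction under stated assumptions, not the reporter's file: it writes a tar.gz at gzip compression level 0 so the member's bytes are stored uncompressed, and plants a 22-byte empty End-of-Central-Directory record in that member. The function name and member name are made up. Whether is_zipfile then returns True depends on the zlib build keeping the record contiguous and on the Python version, so the demo prints the results instead of assuming them.

```python
import io
import os
import tarfile
import tempfile
import zipfile

# A 22-byte "empty archive" End-of-Central-Directory record:
# the signature followed by 18 zero bytes (all counts, sizes, offsets zero).
EOCD_RECORD = b"PK\x05\x06" + b"\x00" * 18


def make_lookalike_targz(path):
    """Write a valid tar.gz whose raw bytes are expected to contain EOCD_RECORD."""
    payload = b"not a zip archive\n" + EOCD_RECORD + b"\n"
    info = tarfile.TarInfo("some-package-1.2.3/data.bin")  # made-up member name
    info.size = len(payload)
    # compresslevel=0 asks zlib for stored (uncompressed) deflate blocks, so
    # the payload should appear verbatim inside the .tar.gz file.
    with tarfile.open(path, "w:gz", compresslevel=0) as tar:
        tar.addfile(info, io.BytesIO(payload))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "lookalike.tar.gz")
        make_lookalike_targz(target)
        print("is_tarfile:", tarfile.is_tarfile(target))
        print("is_zipfile:", zipfile.is_zipfile(target))
```

Because the planted record describes an empty archive, zipfile.ZipFile may open such a file without raising, so this would exercise the misclassification but not necessarily the exact "Bad offset for central directory" error shown in the output below.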

Output

...
       File "/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 596, in _prepare_linked_requirement
         local_file = unpack_url(
                      ^^^^^^^^^^^
       File "/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 178, in unpack_url
         unpack_file(file.path, location, file.content_type)
       File "/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 240, in unpack_file
         unzip_file(filename, location, flatten=not filename.endswith(".whl"))
       File "/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 115, in unzip_file
         zip = zipfile.ZipFile(zipfp, allowZip64=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       File "/python/3.12.6/lib/python3.12/zipfile/__init__.py", line 1349, in __init__
         self._RealGetContents()
       File "/python/3.12.6/lib/python3.12/zipfile/__init__.py", line 1435, in _RealGetContents
         raise BadZipFile("Bad offset for central directory")
     zipfile.BadZipFile: Bad offset for central directory

Labels

type: bug (a confirmed bug or unintended behavior)