Description
I stumbled upon a pretty unique bug, caused by this code (pip/src/pip/_internal/utils/unpacking.py, lines 340 to 345 in fc9550b):
    if (
        content_type == "application/zip"
        or filename.lower().endswith(ZIP_EXTENSIONS)
        or zipfile.is_zipfile(filename)
    ):
        unzip_file(filename, location, flatten=not filename.endswith(".whl"))
In my org I ran into a tar.gz file for which zipfile.is_zipfile(...) returns True.
This false positive sends the file down the zip branch, and the zipfile library then raises BadZipFile when it tries to unpack it (because it's a tar.gz!)
The file contains proprietary code so I cannot share it, but I can share the output:
$ tar -tvf my-tar-file.tar.gz | head -n3
drwxr-xr-x runner/runner 0 2026-03-25 04:05 some-package-1.2.3/
-rw-r--r-- runner/runner 145 2026-03-25 04:05 some-package-1.2.3/MANIFEST.in
-rw-r--r-- runner/runner 257 2026-03-25 04:05 some-package-1.2.3/PKG-INFO
$ python -c 'import zipfile; print(zipfile.is_zipfile("my-tar-file.tar.gz"))'
True
$
Some AI prompting revealed that this is due to the magic byte string "PK\x05\x06" (the ZIP End-of-Central-Directory signature), which can appear by chance in any file and is enough for zipfile.is_zipfile to return True.
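To check whether a given file is affected, something like the following could be used. It is only a diagnostic sketch: find_eocd_offsets and the constants are names made up for this example, and the window size is an assumption based on my reading of CPython's zipfile source, which appears to search only the tail of the file (the 22-byte record plus up to 64 KiB of trailing comment).

```python
import os

# Names below are hypothetical helpers for this sketch, not part of zipfile or pip.
EOCD_SIGNATURE = b"PK\x05\x06"   # ZIP End-of-Central-Directory signature
TAIL_WINDOW = (1 << 16) + 22     # assumed search window: 64 KiB comment + 22-byte record


def find_eocd_offsets(path, window=TAIL_WINDOW):
    """Return absolute offsets of the EOCD signature within the file's tail."""
    with open(path, "rb") as fp:
        fp.seek(0, os.SEEK_END)
        size = fp.tell()
        start = max(size - window, 0)
        fp.seek(start)
        tail = fp.read()

    offsets = []
    pos = tail.find(EOCD_SIGNATURE)
    while pos != -1:
        offsets.append(start + pos)
        pos = tail.find(EOCD_SIGNATURE, pos + 1)
    return offsets
```

Calling find_eocd_offsets("my-tar-file.tar.gz") on the problematic file should list where the stray signature sits; an empty list would suggest that the false positive has some other cause.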
Since there's no way to bypass this zipfile.is_zipfile check when using the "pip wheel <tar.gz>" command, I will have to modify our tooling to either unpack the archive first or use "uv" instead.
I think the fix would be to do a filename check first, and not try to call zipfile.is_zipfile on a file with "tar.gz" extension.
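Roughly what I have in mind, as a standalone sketch rather than a patch: should_unzip and both extension tuples are hypothetical names for illustration (pip has its own constants and control flow), and the content sniffing only runs when the filename gives no hint.

```python
import zipfile

# Hypothetical extension lists for this sketch; pip defines its own tuples.
ZIP_EXTENSIONS = (".zip", ".whl")
TAR_EXTENSIONS = (".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tbz", ".tar.xz", ".txz")


def should_unzip(filename, content_type=None):
    """Decide whether to treat the file as a zip, trusting the name before the content."""
    lowered = filename.lower()
    if content_type == "application/zip" or lowered.endswith(ZIP_EXTENSIONS):
        return True
    # A tar-like extension wins over content sniffing, so a stray
    # "PK\x05\x06" inside a tarball cannot trigger the zip branch.
    if lowered.endswith(TAR_EXTENSIONS):
        return False
    # Only files with no recognised extension fall back to sniffing.
    return zipfile.is_zipfile(filename)
```

The trade-off is that a zip file misnamed as .tar.gz would no longer be unpacked as a zip, which seems acceptable to me but is a behaviour change.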
Fun bug)
Expected behavior
No response
pip version
latest code has the same issue I believe, but we're using 21 or 24
Python version
3.12
OS
linux
How to Reproduce
I cannot share the file, so this is tricky to reproduce: you have to be really lucky to have a tar.gz which contains the 4-byte sequence 50 4b 05 06.
Output
...
  File "/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 596, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 178, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 240, in unpack_file
    unzip_file(filename, location, flatten=not filename.endswith(".whl"))
  File "/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 115, in unzip_file
    zip = zipfile.ZipFile(zipfp, allowZip64=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python/3.12.6/lib/python3.12/zipfile/__init__.py", line 1349, in __init__
    self._RealGetContents()
  File "/python/3.12.6/lib/python3.12/zipfile/__init__.py", line 1435, in _RealGetContents
    raise BadZipFile("Bad offset for central directory")
zipfile.BadZipFile: Bad offset for central directory