Hi there! First of all, thanks for continuing to maintain this package -- it's extremely useful 🙂
I'm one of the maintainers of pip-audit, and we had a user report some strange dependency resolution behavior: pypa/pip-audit#248
We traced the bug to a release of cffi (1.0.2-2) that uses the implicit post-release syntax (a bare -N suffix) for the post-release number, rather than the canonical .postN form. This release of cffi is published on PyPI here, without canonicalization, so it was likely uploaded before PyPI began normalizing versions.
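For reference, here is a minimal sketch of how packaging itself treats the two spellings. It only assumes that packaging is installed; the variable names are purely illustrative.

```python
from packaging.version import Version

# The implicit post-release spelling found in the cffi filename.
implicit = Version("1.0.2-2")
# The canonical spelling of the same release.
canonical = Version("1.0.2.post2")

# Both spellings parse to the same version, and str() gives the normalized form.
print(implicit == canonical)
print(str(implicit))
```

So the version string on its own is accepted; the trouble only starts once it is embedded in a filename.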
Because 1.0.2-2 contains a dash, the assumption made in the body of packaging.utils.parse_sdist_filename below does not hold, and the function parses the source distribution filename incorrectly:
def parse_sdist_filename(filename: str) -> Tuple[NormalizedName, Version]:
    if filename.endswith(".tar.gz"):
        file_stem = filename[: -len(".tar.gz")]
    elif filename.endswith(".zip"):
        file_stem = filename[: -len(".zip")]
    else:
        raise InvalidSdistFilename(
            f"Invalid sdist filename (extension must be '.tar.gz' or '.zip'):"
            f" {filename}"
        )

    # We are requiring a PEP 440 version, which cannot contain dashes,
    # so we split on the last dash.
    name_part, sep, version_part = file_stem.rpartition("-")
    if not sep:
        raise InvalidSdistFilename(f"Invalid sdist filename: {filename}")

    name = canonicalize_name(name_part)
    version = Version(version_part)
    return (name, version)
yielding:
>>> from packaging.utils import parse_sdist_filename
>>> parse_sdist_filename("cffi-1.0.2-2.tar.gz")
('cffi-1-0-2', <Version('2')>)
whereas we expected:
>>> from packaging.utils import parse_sdist_filename
>>> parse_sdist_filename("cffi-1.0.2-2.tar.gz")
('cffi', <Version('1.0.2.post2')>)
TL;DR: parse_sdist_filename shouldn't rely on the last dash as a separator between the distribution name and the version, since PEP 440 allows dashes in non-normalized versions. Parsing this correctly poses a bit of a challenge, since distribution names can also contain dashes and numbers and might even contain them in pathological ways, such as:
# package foo3, version 1.0.0.post1
foo3-1.0.0-1.tar.gz
# package foo-3, version 1.0.0.post1
foo-3-1.0.0-1.tar.gz
# package 3_3, version 1.0.0.post1
# I'm not sure this one is valid, but I can't find anything in the packaging PEPs that rules it out
3-3-1.0.0-1.tar.gz
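To make the ambiguity concrete, here is a rough sketch (not a proposed fix) that tries every dash as the name/version boundary and keeps each split whose right-hand side is a valid PEP 440 version. `candidate_splits` is a hypothetical helper, not part of packaging; it relies only on the existing `Version`, `InvalidVersion`, and `canonicalize_name`.

```python
from typing import List, Tuple

from packaging.utils import canonicalize_name
from packaging.version import InvalidVersion, Version


def candidate_splits(filename: str) -> List[Tuple[str, Version]]:
    """Hypothetical helper: list every (name, version) reading of an sdist filename."""
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"unsupported sdist extension: {filename}")

    candidates = []
    # Try every dash as the name/version boundary, leftmost first.
    for index, char in enumerate(stem):
        if char != "-":
            continue
        name_part, version_part = stem[:index], stem[index + 1 :]
        if not name_part:
            continue  # an empty name is never a valid reading
        try:
            version = Version(version_part)
        except InvalidVersion:
            continue  # the right-hand side is not a PEP 440 version
        candidates.append((canonicalize_name(name_part), version))
    return candidates


for name, version in candidate_splits("cffi-1.0.2-2.tar.gz"):
    print(name, version)
```

For the cffi filename this yields two readings: the intended one (cffi, 1.0.2.post2) and the one parse_sdist_filename currently returns (cffi-1-0-2, 2). Picking the leftmost valid split would work here and for foo-3-1.0.0-1.tar.gz, but a name like foo-3-1.tar.gz stays ambiguous under this scheme, so the filename alone may not be enough to decide.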