Conversation
parse_wheel_filename
|
I would want to understand the real world impact of this change, in particular is this going to reject a lot of wheels uploaded to PyPI? |
|
Good question. The validation only triggers on compressed tag sets — filenames where a single component contains multiple dot-separated values (e.g. All major build tools I'm aware of ( If this is a concern, I could scope the check to only |
|
Sorry, I posted that before I realized similar discussion was taking place in #909 (comment), let's move discussion on what we want to do about compliance/backwards comparability there. |
|
CI failures highlight an important edge case: lexicographic string sorting doesn't work for version-embedded platform tags. Failing examples:
This confirms @notatallshaw's concern about real-world impact. The sorted-order check needs either:
Happy to implement whichever direction you prefer — will wait for the discussion on #909 to settle before pushing a fix. |
|
The only reason to sort these is to have a completely deterministic filename. So there isn't an "intended" order, it's not there to make it easier for humans. So only simple sorting, which is easy to implement in other languages too (like Rust) makes sense, anything other than lexicographic sorting would be much harder to implement, slower, and would serve no purpose. |
henryiii
left a comment
There was a problem hiding this comment.
I think this would be fine as an opt-in parameter. I don't think we can add it to a legacy function without the opt-in.
|
I think we must make this opt-in for a legacy API. There are at least 370,069 wheels uploaded to PyPI (out of the 1,023,6875 wheels I checked with a compressed tag set) that have unsorted compressed tags. We need to be able to parse those old wheel filenames. |
|
Thanks for the data, @henryiii — 370K out of 10M+ is significant. I'll update this to add a |
|
Great, thanks! Auditwheel produced unsorted tag sets for many years, it got fixed only a few years ago. It didn't really matter originally, as it didn't produce multiple tags, but when it started making manylinux tags in both old and new styles, it did so unsorted, and it was at least a few years before sorting was added. I wouldn't be surprised if multi-tagged universal macOS wheels were sometimes wrong, too. |
|
Updated per review feedback:
|
|
Mind running |
|
Done — pushed a fix for the trailing newline and line length. Thanks for the heads-up on |
parse_wheel_filenameparse_wheel_filename
|
Wow, that's a strange test failure! The failure happens if you run the full test suite, but not if you run just |
8820a00 to
4e0731c
Compare
4e0731c to
484cf52
Compare
|
I pulled the test fix out into #1152, this should work when that goes in. |
484cf52 to
57ff8d9
Compare
Per review suggestion from pradyunsg: the PEP 425 sorted-order invariant is now enforced at the parse_tag level, raising ValueError when a compressed tag set component is not sorted. parse_wheel_filename catches this and re-raises as InvalidWheelFilename with the full filename for better error context.
Per review suggestion from pradyunsg: the PEP 425 sorted-order invariant is now enforced at the parse_tag level, raising ValueError when a compressed tag set component is not sorted. parse_wheel_filename catches this and re-raises as InvalidWheelFilename with the full filename for better error context.
Per review suggestion from pradyunsg: the PEP 425 sorted-order invariant is now enforced at the parse_tag level, raising ValueError when a compressed tag set component is not sorted. parse_wheel_filename catches this and re-raises as InvalidWheelFilename with the full filename for better error context.
Per review feedback: 370K+ wheels on PyPI have unsorted compressed tag sets, so the check must be opt-in for backwards compatibility. - Add validate_order=False to parse_tag() and parse_wheel_filename() - Only check sorted order when validate_order=True - Fix exception chaining (raise ... from None) - Update tests: unsorted tags parse by default, rejected with validate
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
Signed-off-by: Henry Schreiner <henryfs@princeton.edu>
46d06a8 to
5884358
Compare
|
@woodruffw, @brettcannon, and/or @dstufft, since you engaged on #909, it would be nice to get a opinion here from at least one of you. I think opt-in is the best way to go for this legacy API, this looks good to me. |
I agree with that. |
|
Yeah, opt-in seems reasonable given that it's a legacy API 🙂 |
|
Thanks for these! |
Summary
Fixes #909.
parse_wheel_filenameaccepted wheel filenames where compressed tag sets had components in unsorted order, even though PEP 425 explicitly requires them to be sorted:For example,
pyvirtualcam-0.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whlwas accepted silently, but the correct filename ispyvirtualcam-0.13.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl(since'2' < '_'in ASCII,manylinux2014_x86_64sorts beforemanylinux_2_17_x86_64).This was surfaced by a real-world interaction:
auditwheel repairproduces incorrectly-ordered tag strings (tracked at pypa/auditwheel#583), andpackagingaccepted them without error, masking the upstream bug.Changes
src/packaging/utils.py: Inparse_wheel_filename, before callingparse_tag, validate that each--separated component's.-separated parts are in lexicographic sorted order. RaisesInvalidWheelFilenameif not.tests/test_utils.py:test_parse_wheel_invalid_filename:pyvirtualcam-0.13.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl(platform tags unsorted)foo-1.0-py3.py2-none-any.whl(interpreter tags unsorted)test_parse_wheel_filename:pyvirtualcam-0.13.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl(correctly sorted)Design note
The check is placed in
parse_wheel_filename(notparse_tag) so thatparse_tagremains a general-purpose utility without new validation semantics. Per brettcannon's comment on #909, the fix touches the tag-parsing path inutils.py.