fix(py_wheel): produce deterministic wheel files#1453
fix(py_wheel): produce deterministic wheel files#1453aignas merged 3 commits intobazel-contrib:mainfrom
Conversation
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
fcea928 to
7524345
Compare
|
I'm not a wheel expert, but dropping the permissions might be problematic? My basic thinking is if a +x bit gets dropped, that might be a problem. Can't wheel contains arbitrary data, and someone might intend for a data file to have the executable bit? Maybe mask such that we keep the executable bit if set, but normalize to read only? |
|
If it helps, I've got code that canonicalizes wheels here. Another thing to watch out for is |
|
I talked to a Bazel dev about how permissions are handled
Pretty much any of those are a good reason for us to not preserve permissions. I find the second and third one particularly compelling. So I think its really just a question of: do we set rx or rwx? My vote is for rwx; it seems what is commonly done, and it "preserves" the writability of any files. |
|
(Ignas posted this, but it only shows up in my email not on the pr for some reason)
Is the order coming from depsets? Or is it coming from e.g. a Python dict/set re-ordering things? If it's coming from a depset, then, insofar as Bazel is concerned, this is fine. Depsets (and input lists) have ordering and it's reflected in how bazel computes what is affected/changed (e.g. the argv of the action's command line). I don't think that can be avoided. The zip itself is still deterministic, just not "predictable". If it's coming from Python doing a set/dict call somewhere, then we should fix that -- that's a source of non-determinism Bazel isn't aware of. Having that in a separate PR is fine; this one has been out for awhile. But yes, I agree that sorting the zip entries is a good idea regardless. |
We're sorting file lists in
|
|
Yeah, my previous comment (that I have deleted) was all wrong, I think all is good with the sorting of the contents within the wheelmaker, so the only things that I would love to see in this PR would be: |
|
I have now realized what had confused me in the beginning - when writing the |
Current implementation does not produce deterministic output because: - `ZipFile.writestr()` leaks current date and time - `ZipFile.write()` leaks the source file's mtime and mode bits (permissions) into the resulting zip archive. By manually creating our own `ZipInfo` objects we can explicitly set date and time fields to `Jan 1, 1980, 00:00` (minimum value allowed by the zip file standard), and ensure that other file attributes are uniform across all entries in a zip file.
cbf36ff to
66be710
Compare
|
Rebased on top of main. |
#1453 added logic to manually set zipinfo for each file in the wheel archive. When wheel archives generated by `_Whlfile` are installed by the `installer` module (or `pip`), the file mode is inspected as such: * `installer`: https://github.com/pypa/installer/blob/fcc0d6f14f99974316c2e490cead07a9a0b7a6ac/src/installer/sources.py#L318-L321 * `pip`: https://github.com/pypa/pip/blob/0f21fb92/src/pip/_internal/utils/unpacking.py#L96-L100 as you can tell, if the regular file bit is not set, the installer thinks that the file being installed is not an executable and therefore the executable bit will not be preserved when the file from wheel is extracted onto the host filesystem. Since all files being archived into Whlfile are regular files anyway, we set `S_IFREG` on the file mode for all files in the zip archive. Fixes #1711 --------- Signed-off-by: Thomas Lam <thomaslam@canva.com> Co-authored-by: Ignas Anikevicius <240938+aignas@users.noreply.github.com>
Current implementation does not produce deterministic output because:
ZipFile.writestr()leaks current date and timeZipFile.write()leaks the source file's mtime and mode bits (permissions) into the resulting zip archive.By manually creating our own
ZipInfoobjects we can explicitly set date and time fields toJan 1, 1980, 00:00(minimum value allowed by the zip file standard), and ensure that other file attributes are uniform across all entries in a zip file.