Remove usages of zip crate#19365
Conversation
bfe5a07 to
55df007
Compare
| async-trait = { version = "0.1.82" } | ||
| async_http_range_reader = { version = "0.11.0", package = "astral_async_http_range_reader" } | ||
| async_zip = { version = "0.0.17", package = "astral_async_zip", features = [ | ||
| async_zip = { git = "https://github.com/astral-sh/rs-async-zip.git", branch = "charlie/restore-zip-writing", package = "astral_async_zip", features = [ |
|
Wow, that performance gain is pretty awesome! |
| assert_snapshot!( | ||
| format!("{:x}", sha2::Sha256::digest(fs_err::read(&wheel_path).unwrap())), | ||
| @"dbe56fd8bd52184095b2e0ea3e83c95d1bc8b4aa53cf469cec5af62251b24abb" | ||
| @"0affdf7d3fda7e0a27007f6bf61524cd58bf3fce8e456bafb58b4a22faf0d6d0" |
There was a problem hiding this comment.
Noting: we should probably figure out why this reproduces differently, even if it's consistent across platforms with the new ZIP encoder.
There was a problem hiding this comment.
pkgdiff might help, not sure: https://lvc.github.io/pkgdiff/
There was a problem hiding this comment.
I think it's just metadata, but I will confirm.
There was a problem hiding this comment.
The hash changed because the new astral_async_zip writer emits different ZIP container metadata:
- Sets UTF-8 flag bit 11 on all entries, and creator version 6.3 instead of 2.0.
- Writes zero DOS timestamp fields, while the old zip output used 1980-01-01 00:00:00.
- For the 11 streamed source-file entries, writes ZIP64-style local headers with placeholder sizes/CRC, a 20-byte local ZIP64 extra field, a 24-byte data descriptor, and a 28-byte central-directory ZIP64 extra field.
- Adds a ZIP64 end-of-central-directory record plus locator, another 76 bytes.
- Compressed bytes are identical for 22 of 23 entries. Only built_by_uv-0.1.0.dist-info/RECORD has different compressed bytes, but its decompressed content is identical; it is 762 compressed bytes on main and 751 on the branch.
There was a problem hiding this comment.
We may want to change this? astral-sh/rs-async-zip#36
There was a problem hiding this comment.
Huh, the ZIP64 EOCDR is interesting -- I guess we're now defaulting to writing ZIP64 rather than only doing it when the input exceeds the ZIP limits.
(I think that's fine.)
| indoc = { workspace = true } | ||
| insta = { workspace = true } | ||
| regex = { workspace = true } | ||
| zip = { workspace = true } |
There was a problem hiding this comment.
Should we remove zip as a dev-dep too?
There was a problem hiding this comment.
I considered it... I'll do it in a separate change.
|
Did you also measure if performance changes when building a large wheel with the uv build backend? |
| sync::{Arc, Mutex}, | ||
| }; | ||
|
|
||
| const BUFFER_SIZE: usize = 64 * 1024; |
There was a problem hiding this comment.
It's from benchmarking.
There was a problem hiding this comment.
Can you add a comment on that symbol?
| fn fill_buf(&mut self) -> std::io::Result<&[u8]> { | ||
| if self.buffered_len() == 0 { | ||
| let mut underlying_file = self.file.lock().map_err(|_| { | ||
| std::io::Error::other("cloneable seekable reader mutex was poisoned") |
There was a problem hiding this comment.
This doesn't match our usual handling for poisoned locks (Same with the occurrences below)
b10888e to
74f2a52
Compare
Box the enlarged command futures that Clippy now rejects after the zip transition. Co-authored-by: Codex <noreply@openai.com>
Pin astral_async_zip to the published v0.0.18-rc2 tag and add the version requirement needed for cargo publish dry-run. Co-authored-by: Codex <noreply@openai.com>
The published async zip rc2 writer produces the same wheel contents with a different deterministic archive hash, so update the inline insta snapshot used by the build-backend reproducibility test. Co-authored-by: Codex <noreply@openai.com>
74f2a52 to
857bdc7
Compare
|
This branch appears a bit slower than
But I'm going to follow-up with a change that improves it past main. |
## Summary A staged change that I forgot to push prior to #19365.
## Summary The motivation and approach here is identical to #19365, but for tar. This doesn't have a meaningful effect on build size (since we _only_ use `tar` in `uv-build` and `astral-tokio-tar` in `uv`), but it does mean we aren't exposed to CVEs in `tar`.
Summary
Historically, we've relied on both the synchronous
zipcrate and our ownrs-async-zipfork. The async variant is used for streaming (i.e., unzipping a wheel as it's downloaded), while the sync variant is used to parallel-unzip local files, which turns out to be very fast.This PR removes all usages of the
zipcrate (apart from in development dependencies), in favor of usingrs-async-zipeverywhere. This appears to both reduce crate size and improve performance while also cutting off a source of CVEs. (Thers-async-zipcrate is, of course, also vulnerable to potential CVEs, but having two crates is strictly greater exposure for us.)Per Codex, the parallel unzip is now a bit faster:
Per Codex, the
uvanduv-buildbinaries are both smaller:Closes #19363.