Skip to content

Remove usages of zip crate#19365

Merged
charliermarsh merged 7 commits into
mainfrom
charlie/zip
May 12, 2026
Merged

Remove usages of zip crate#19365
charliermarsh merged 7 commits into
mainfrom
charlie/zip

Conversation

@charliermarsh

@charliermarsh charliermarsh commented May 11, 2026

Copy link
Copy Markdown
Member

Summary

Historically, we've relied on both the synchronous zip crate and our own rs-async-zip fork. The async variant is used for streaming (i.e., unzipping a wheel as it's downloaded), while the sync variant is used to parallel-unzip local files, which turns out to be very fast.

This PR removes all usages of the zip crate (apart from in development dependencies), in favor of using rs-async-zip everywhere. This appears to both reduce crate size and improve performance while also cutting off a source of CVEs. (The rs-async-zip crate is, of course, also vulnerable to potential CVEs, but having two crates is strictly greater exposure for us.)

Per Codex, the parallel unzip is now a bit faster:

- 55 KB wheel: async -1.0%
- 393 KB wheel: async -7.8%
- 11.4 MB shellcheck wheel: async -3.8%
- 12.5 MB Ruff wheel: async -9.1%
- 21.3 MB NumPy wheel: async -11.0%
- 66.5 MB pandas wheel: async -4.3%

Per Codex, the uv and uv-build binaries are both smaller:

  ┌──────────┬──────────────┬──────────────┬─────────────────────────────────┐
  │ Binary   │         main │       branch │                           delta │
  ├──────────┼──────────────┼──────────────┼─────────────────────────────────┤
  │ uv       │ 16,422,784 B │ 16,274,608 B │ -148,176 B (-144.7 KiB, -0.90%) │
  │ uv-build │  2,683,584 B │  2,652,512 B │   -31,072 B (-30.3 KiB, -1.16%) │
  └──────────┴──────────────┴──────────────┴─────────────────────────────────┘

Closes #19363.

@charliermarsh charliermarsh force-pushed the charlie/zip branch 2 times, most recently from bfe5a07 to 55df007 Compare May 11, 2026 17:28
@charliermarsh charliermarsh marked this pull request as ready for review May 11, 2026 19:19
Comment thread Cargo.toml Outdated
async-trait = { version = "0.1.82" }
async_http_range_reader = { version = "0.11.0", package = "astral_async_http_range_reader" }
async_zip = { version = "0.0.17", package = "astral_async_zip", features = [
async_zip = { git = "https://github.com/astral-sh/rs-async-zip.git", branch = "charlie/restore-zip-writing", package = "astral_async_zip", features = [

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@woodruffw

Copy link
Copy Markdown
Member

Wow, that performance gain is pretty awesome!

assert_snapshot!(
format!("{:x}", sha2::Sha256::digest(fs_err::read(&wheel_path).unwrap())),
@"dbe56fd8bd52184095b2e0ea3e83c95d1bc8b4aa53cf469cec5af62251b24abb"
@"0affdf7d3fda7e0a27007f6bf61524cd58bf3fce8e456bafb58b4a22faf0d6d0"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting: we should probably figure out why this reproduces differently, even if it's consistent across platforms with the new ZIP encoder.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pkgdiff might help, not sure: https://lvc.github.io/pkgdiff/

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just metadata, but I will confirm.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash changed because the new astral_async_zip writer emits different ZIP container metadata:
- Sets UTF-8 flag bit 11 on all entries, and creator version 6.3 instead of 2.0.
- Writes zero DOS timestamp fields, while the old zip output used 1980-01-01 00:00:00.
- For the 11 streamed source-file entries, writes ZIP64-style local headers with placeholder sizes/CRC, a 20-byte local ZIP64 extra field, a 24-byte data descriptor, and a 28-byte central-directory ZIP64 extra field.
- Adds a ZIP64 end-of-central-directory record plus locator, another 76 bytes.
- Compressed bytes are identical for 22 of 23 entries. Only built_by_uv-0.1.0.dist-info/RECORD has different compressed bytes, but its decompressed content is identical; it is 762 compressed bytes on main and 751 on the branch.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to change this? astral-sh/rs-async-zip#36

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, the ZIP64 EOCDR is interesting -- I guess we're now defaulting to writing ZIP64 rather than only doing it when the input exceeds the ZIP limits.

(I think that's fine.)

indoc = { workspace = true }
insta = { workspace = true }
regex = { workspace = true }
zip = { workspace = true }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove zip as a dev-dep too?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it... I'll do it in a separate change.

@woodruffw woodruffw added internal A refactor or improvement that is not user-facing dependencies labels May 12, 2026
@konstin

konstin commented May 12, 2026

Copy link
Copy Markdown
Member

Did you also measure if performance changes when building a large wheel with the uv build backend?

sync::{Arc, Mutex},
};

const BUFFER_SIZE: usize = 64 * 1024;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is that number from?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's from benchmarking.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment on that symbol?

Comment thread crates/uv-build-backend/src/wheel.rs
fn fill_buf(&mut self) -> std::io::Result<&[u8]> {
if self.buffered_len() == 0 {
let mut underlying_file = self.file.lock().map_err(|_| {
std::io::Error::other("cloneable seekable reader mutex was poisoned")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't match our usual handling for poisoned locks (Same with the occurrences below)

@charliermarsh charliermarsh force-pushed the charlie/zip branch 3 times, most recently from b10888e to 74f2a52 Compare May 12, 2026 19:48
charliermarsh and others added 7 commits May 12, 2026 15:48
Box the enlarged command futures that Clippy now rejects after the zip transition.

Co-authored-by: Codex <noreply@openai.com>
Pin astral_async_zip to the published v0.0.18-rc2 tag and add the version requirement needed for cargo publish dry-run.

Co-authored-by: Codex <noreply@openai.com>
The published async zip rc2 writer produces the same wheel contents with a different deterministic archive hash, so update the inline insta snapshot used by the build-backend reproducibility test.

Co-authored-by: Codex <noreply@openai.com>
@charliermarsh

Copy link
Copy Markdown
Member Author

This branch appears a bit slower than main for writing:

candidate run 1 mean reversed mean
origin/main 390ae4620 569.1 ms 571.3 ms
charlie/zip 857bdc712 591.8 ms 601.0 ms

But I'm going to follow-up with a change that improves it past main.

@charliermarsh charliermarsh merged commit f82a2cf into main May 12, 2026
112 checks passed
@charliermarsh charliermarsh deleted the charlie/zip branch May 12, 2026 20:48
zanieb pushed a commit that referenced this pull request May 12, 2026
## Summary

A staged change that I forgot to push prior to
#19365.
woodruffw pushed a commit that referenced this pull request May 15, 2026
## Summary

The motivation and approach here is identical to #19365, but for tar.
This doesn't have a meaningful effect on build size (since we _only_ use
`tar` in `uv-build` and `astral-tokio-tar` in `uv`), but it does mean we
aren't exposed to CVEs in `tar`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies internal A refactor or improvement that is not user-facing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Remove use of synchronous zip crate

3 participants