Skip to content

perf: skip BufReader for Stored files in make_reader#739

Merged
Pr0methean merged 3 commits into
zip-rs:masterfrom
lovasoa:skip-bufreader-stored
Mar 22, 2026
Merged

perf: skip BufReader for Stored files in make_reader#739
Pr0methean merged 3 commits into
zip-rs:masterfrom
lovasoa:skip-bufreader-stored

Conversation

@lovasoa

@lovasoa lovasoa commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Currently, by_index() allocates an 8 KiB BufReader and a Decompressor for every entry, even when CompressionMethod is Stored (when decompression is a no-op pass-through). For a 23K-entry JAR (mostly Stored files), this means 2 heap allocations per call totaling 8,632 bytes each — 47,262 allocations just to iterate and drop.

Fix: add a ZipFileReader::Stored variant that wraps CryptoReader directly in Crc32Reader, bypassing both BufReader and Decompressor.

This required adding a Debug impl for CryptoReader since the new Stored variant needs it for the derive(Debug) on ZipFileReader.

Measured on bazel.jar (23,631 Stored entries):

  by_index() allocations:
    before: 2.0 allocs/entry, 8,632 bytes alloc'd each
    after:  1.0 allocs/entry,   256 bytes alloc'd each
  cargo bench read_entry (1 MiB Stored file):
    before: 106,657 ns/iter (9,831 MB/s)
    after:  101,276 ns/iter (10,353 MB/s, +5.3%)

Problem: by_index() allocates an 8 KiB BufReader and a Decompressor
for every entry, even CompressionMethod::Stored where decompression
is a no-op pass-through. For a 23K-entry JAR (mostly Stored files),
this means 2 heap allocations per call totaling 8,632 bytes each —
47,262 allocations just to iterate and drop.

Fix: add a ZipFileReader::Stored variant that wraps CryptoReader
directly in Crc32Reader, bypassing both BufReader and Decompressor.

This required adding a Debug impl for CryptoReader since the new
Stored variant needs it for the derive(Debug) on ZipFileReader.

Measured on bazel.jar (23,631 Stored entries):
  by_index() allocations:
    before: 2.0 allocs/entry, 8,632 bytes alloc'd each
    after:  1.0 allocs/entry,   256 bytes alloc'd each
  cargo bench read_entry (1 MiB Stored file):
    before: 106,657 ns/iter (9,831 MB/s)
    after:  101,276 ns/iter (10,353 MB/s, +5.3%)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the handling of uncompressed (Stored) files within the ZIP archive reading logic. By introducing a dedicated reader path for these files, it eliminates redundant buffering and decompression steps, leading to more efficient memory usage and faster data access. The changes are focused on optimizing resource consumption when dealing with files that do not require actual decompression.

Highlights

  • Performance Optimization for Stored Files: Introduced a specialized ZipFileReader::Stored variant to bypass unnecessary BufReader and Decompressor allocations for files with CompressionMethod::Stored. This significantly reduces memory allocations and slightly improves read performance for uncompressed entries.
  • Reduced Allocations: The change reduces allocations from 2.0 allocs/entry (8,632 bytes each) to 1.0 allocs/entry (256 bytes each) for by_index() operations on Stored files.
  • Improved Read Speed: Benchmarking shows a 5.3% improvement in read speed for 1 MiB Stored files, from 106,657 ns/iter to 101,276 ns/iter.
  • Debug Implementation: Added a Debug implementation for CryptoReader to support the derive(Debug) on ZipFileReader after the introduction of the new Stored variant.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

@lovasoa lovasoa marked this pull request as ready for review March 20, 2026 15:44
@Its-Just-Nans

Copy link
Copy Markdown
Member

Hi

Thanks for the MR

I'm curious, which tools do you use to get this result?

  by_index() allocations:
    before: 2.0 allocs/entry, 8,632 bytes alloc'd each
    after:  1.0 allocs/entry,   256 bytes alloc'd each

@Its-Just-Nans Its-Just-Nans self-requested a review March 20, 2026 17:31
@Its-Just-Nans Its-Just-Nans added this to the 8.4.0 milestone Mar 20, 2026
@lovasoa

lovasoa commented Mar 20, 2026

Copy link
Copy Markdown
Contributor Author

My harness is here: https://github.com/lovasoa/zip2/blob/reduce-allocations/examples%2Fmemory_bench_alloc.rs

@Pr0methean

Copy link
Copy Markdown
Member

/Gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the ZipFileReader in src/read.rs by introducing explicit handling for Stored (uncompressed) data. A new Stored variant is added to the ZipFileReader enum, along with corresponding implementations for the Read trait methods (read, read_exact, read_to_end, read_to_string) and the into_inner method. The make_reader function is updated to correctly construct a ZipFileReader::Stored when the compression method is Stored. Additionally, a Debug implementation was added for the CryptoReader enum to facilitate debugging.

@Pr0methean

Copy link
Copy Markdown
Member

@gemini-code-assist Will this cause a performance regression when reading from a file?

@Pr0methean Pr0methean added this pull request to the merge queue Mar 22, 2026
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Hi @Pr0methean,

That's a great question! The benchmark results provided in the pull request description show a 5.3% improvement in read speed for 1 MiB Stored files. This suggests that the changes should not cause a performance regression when reading from a file, and may even improve it in some cases.

However, it's important to consider that these benchmarks are just one data point, and performance may vary depending on the specific file being read, the hardware being used, and other factors. It would be helpful to run additional benchmarks on a wider variety of files and systems to get a more complete picture of the performance impact of these changes.

Additionally, it's worth noting that the primary goal of this pull request is to reduce memory allocations, which can have a positive impact on overall performance even if read speed is not significantly affected. The reduced allocations can lead to less garbage collection overhead and improved memory locality, which can improve performance in memory-constrained environments.

@Pr0methean

Copy link
Copy Markdown
Member

@gemini-code-assist Can this change increase the number of I/O syscalls?

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Hi @Pr0methean,

That's another excellent question! Whether this change increases the number of I/O syscalls depends on a few factors:

  • Buffering: The original code uses BufReader, which performs its own internal buffering. By removing BufReader for stored files, we might reduce buffering in some cases, potentially leading to more frequent, smaller reads from the underlying CryptoReader.
  • Read Sizes: If the read sizes requested by the user are smaller than the internal buffer size of BufReader, then removing it could lead to more syscalls. Conversely, if the read sizes are larger, the impact might be negligible or even beneficial.
  • Underlying Reader: The CryptoReader itself might perform buffering, depending on its implementation and the underlying reader it wraps. If CryptoReader already buffers efficiently, the impact of removing BufReader might be minimal.

To get a definitive answer, it would be necessary to:

  1. Profile the code: Use profiling tools to measure the number of syscalls before and after the change, under various scenarios (different file sizes, read sizes, etc.).
  2. Analyze CryptoReader: Examine the implementation of CryptoReader to understand its buffering behavior.

In summary, it's possible that this change could increase the number of I/O syscalls in certain situations, but it's not guaranteed. Profiling and further analysis are needed to determine the actual impact.

Merged via the queue into zip-rs:master with commit c2999bc Mar 22, 2026
130 checks passed
@Pr0methean Pr0methean mentioned this pull request Mar 22, 2026
@lovasoa lovasoa deleted the skip-bufreader-stored branch March 22, 2026 13:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants