fix: allow for `[u8]` as filename by Its-Just-Nans · Pull Request #775 · zip-rs/zip2

Its-Just-Nans · 2026-04-11T21:34:22Z

The PR title must conform to Conventional Commits and start with one of the types specified by the Angular convention.

amazon-q-developer

The PR changes the internal representation of file names from Box<str> to Box<[u8]> to support non-UTF-8 filenames. However, there's a critical defect in finish_into_readable() where expect("invalid UTF-8") will cause runtime panics for ZIP archives containing non-UTF-8 filenames. This breaks existing functionality since ZIP files can legally contain such filenames using encodings like CP437.

You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.

gemini-code-assist

Code Review

This pull request updates the internal file mapping to use Box<[u8]> instead of Box for filenames, facilitating support for non-UTF-8 paths. However, the implementation introduces a potential panic in the finish method when converting keys back to UTF-8, which contradicts the goal of supporting arbitrary bytes. Additionally, there are opportunities to optimize memory allocations during file insertion by using existing raw filename fields instead of re-allocating byte slices.

gemini-code-assist · 2026-04-11T21:41:20Z

+        if self.files.contains_key(file.file_name.as_bytes()) {
            return Err(invalid!("Duplicate filename: {}", file.file_name));
        }
-        let (index, _) = self.files.insert_full(file.file_name.clone(), file);
+        let (index, _) = self
+            .files
+            .insert_full(file.file_name.as_bytes().to_vec().into_boxed_slice(), file);


The implementation of insert_file_data uses file.file_name.as_bytes() to generate the map key, which is inconsistent with the goal of supporting raw byte filenames. Additionally, the sequence .to_vec().into_boxed_slice() performs redundant allocations. It is more efficient and consistent to use file.file_name_raw.clone().

if self.files.contains_key(file.file_name_raw.as_ref()) { return Err(invalid!("Duplicate filename: {}", file.file_name)); } let (index, _) = self.files.insert_full(file.file_name_raw.clone(), file);

References

To skip bytes from a reader, use std::io::copy with std::io::sink() or a stack-allocated buffer instead of allocating a Vec on the heap to avoid unnecessary allocations.

Avoid exposing internal data structures in public APIs. Instead, accept primitive types (e.g., &[u8]) and perform validation and construction internally.

Pr0methean

Mostly looks good, but some of the details are unclear to me.

Pr0methean · 2026-04-16T01:35:23Z

    /// Returns an iterator over all the file and directory names in this archive.
    pub fn file_names(&self) -> impl Iterator<Item = &str> {
-        self.shared.files.keys().map(std::convert::AsRef::as_ref)
+        self.shared.files.values().map(|f| f.file_name.as_ref())


What purpose does this change in implementation serve?

If the key is [u8] item cannot be str

Previously we used the key which was str
Now the key is [u8] so we cannot
And we use the file name inside the data

The thing is, to remove the triple allocation of file name, this method will be a breaking change (in another MR)

Why can't we just convert the key back to str? Granted, it doesn't handle the case where the name is an unknown encoding; but this PR doesn't handle that case either.

If this change lets us piggyback on some existing handling of non-UTF8, then that may be a valid reason but should be documented in a comment.

Yes this is the goal of the MR.

I'm not sure how a good idea it is to save the file name as a string. String are UTF-8, the zip specs doesn't enforce that.

I also think that a futur MR would be needed to change the way we are handling the file_name in ZipFileData

Two points:

the file name raw is duplicated between the key and the ZipFileData, I think this is removable

the file name (non-raw) is also saved in zipfiledata. I don't think that's a good way to do it. But this type of change would mean a breaking change (method using file name will return a results, and the UTF-8 string will be lazy computed on call, not saved in the struct)

I'll probably have to take another look at this situation on the weekend.

Removing the raw filename would probably require the values in the files-by-name map to borrow the key, assuming that's even possible (although if we went back to an IndexMap we could store the index instead), so that a reverse lookup would be possible. Is that what you're suggesting?

Removing the raw filename would probably require the values in the files-by-name map to borrow the key, assuming that's even possible (although if we went back to an IndexMap we could store the index instead), so that a reverse lookup would be possible. Is that what you're suggesting?

Yes kinda that will be this https://github.com/zip-rs/zip2/compare/only-file-name?expand=1 which is a MR on top of this current branch

ZipFileData will become (file_name_raw, ZipFileDataWithout_file_name_raw)

Its-Just-Nans added 2 commits April 11, 2026 13:19

use bytes direclty

79f0d36

cargo fmt

f1eb6c4

Its-Just-Nans marked this pull request as draft April 11, 2026 21:34

amazon-q-developer Bot reviewed Apr 11, 2026

View reviewed changes

Comment thread src/write.rs Outdated

github-advanced-security AI found potential problems Apr 11, 2026

View reviewed changes

Comment thread src/write.rs Fixed

Comment thread src/write.rs Fixed

gemini-code-assist Bot reviewed Apr 11, 2026

View reviewed changes

Its-Just-Nans added 2 commits April 11, 2026 16:35

fix

7f4e340

also on read

f6ae0a3

Its-Just-Nans mentioned this pull request Apr 12, 2026

v9.0.0: New version needs #691

Open

2 tasks

Its-Just-Nans added 2 commits April 11, 2026 19:26

add as bytes

3edd51f

fmt

f717f24

Its-Just-Nans marked this pull request as ready for review April 12, 2026 06:02

Pr0methean reviewed Apr 16, 2026

View reviewed changes

Pr0methean approved these changes Apr 23, 2026

View reviewed changes

Pr0methean added this pull request to the merge queue Apr 23, 2026

Its-Just-Nans mentioned this pull request Apr 23, 2026

refactor!: store the file_name_raw only as key #789

Merged

Merged via the queue into master with commit 0329d9d Apr 23, 2026
133 checks passed

Pr0methean deleted the test-internal-index-bytes branch April 23, 2026 23:46

Pr0methean mentioned this pull request Apr 23, 2026

chore: release v8.6.0 #781

Merged

Its-Just-Nans self-assigned this May 5, 2026

Conversation

Its-Just-Nans commented Apr 11, 2026

Uh oh!

amazon-q-developer Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

gemini-code-assist Bot Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Pr0methean left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Its-Just-Nans Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Its-Just-Nans Apr 21, 2026 •

edited

Loading