-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
package manager: handle archives without leading root folder #19111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c389f67 to
716adc0
Compare
andrewrk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does the behavior for any kind of resource, however, this logic is intended to only be executed for archives.
src/Package/Fetch.zig
Outdated
| const rand_int = std.crypto.random.int(u64); | ||
| const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); | ||
| const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); // root of the temporary directory | ||
| var tmp_package_root_sub_path: ?[]const u8 = null; // package root inside temporary directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With some code restructuring, this does not need to be a variable. Notice how the surrounding code avoids using variables, with the exception of loops and resource management.
src/Package/Fetch.zig
Outdated
| const cache_root = f.job_queue.global_cache; | ||
| const rand_int = std.crypto.random.int(u64); | ||
| const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); | ||
| const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); // root of the temporary directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment disagrees with the local's name. If you pick a good name instead then there is no need to comment about the meaning of the name.
andrewrk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I think this is the right approach.
I identified 3 syscall optimizations to be implemented before merging. I believe these changes will result in equivalent - if not more - maintainability for this code, while being more optimal.
src/Package/Fetch.zig
Outdated
| fn archivePackageDir(arena: Allocator, out_dir: fs.Dir) !?[]const u8 { | ||
| var iter = out_dir.iterate(); | ||
| if (try iter.next()) |entry| { | ||
| if (try iter.next() == null and entry.kind == .directory) { | ||
| return try arena.dupe(u8, entry.name); | ||
| } | ||
| } | ||
| return null; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This information should be instead tracked by pipeToFileSystem to avoid these two syscalls (open and readdir).
| uri_path: []const u8, | ||
| tmp_directory: Cache.Directory, | ||
| ) RunError!void { | ||
| ) RunError!?[]const u8 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Enhancing this function to support returning a sub-path inside the temporary directory makes sense because otherwise the tar unpacking code would need to sometimes move all the top-level entries.
It took me a while to understand the meaning of this return value. Let's make it easier for the next person by adding a doc comment to the function to explain what return value means.
Something like this:
/// A `null` return value indicates the `tmp_directory` is populated directly with the package contents.
/// A non-null return value means that the package contents are inside a sub-directory indicated by the named path.
src/Package/Fetch.zig
Outdated
| if (package_dir) |dir_name| { | ||
| // Position tmp_directory to dir_name inside tmp_dir_sub_path. | ||
| const path = try cache_root.join(arena, &.{ tmp_dir_sub_path, dir_name }); | ||
| const handle = tmp_directory.handle.openDir(dir_name, .{ .iterate = true }) catch |err| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to open the sub directory since the parent is already open. The Path abstraction already has a sub_path field. It looks like you may need to update computeHash to accept Path rather than Directory.
src/Package/Fetch.zig
Outdated
| // Remove temporary directory root. | ||
| cache_root.handle.deleteTree(tmp_dir_sub_path) catch {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Syscall should be avoided when it is known to not exist already.
- When it is needed, it is known to be an empty directory, so it is better to use deleteDir.
Prior to [this](ziglang@ef9966c#diff-08c935ef8c633bb630641d44230597f1cff5afb0e736d451e2ba5569fa53d915R805) commit tar was not a valid extension. After that this one is valid case.
Unpack tar without removing leading root folder. Then find package root in unpacked tmp folder.
From review comments: ziglang#19111
While iterating over all files in tarball set root_dir in diagnostic if there is single root in tarball. Will be used in package manager with strip_components = 0 to find the root of the fetched package.
Based on comment: ziglang#19111 (comment) computeHash finds all files in temporary directory. There is no difference on what path are they. When calculating hash normalized_path must be set relative to package root. That's the place where we strip root if needed.
Based on: ziglang#19111 (comment) In pipeToFileSystem we are already iterating over all files in tarball. Inspecting them there for existence of single root folder saves two syscalls later.
ffb0460 to
8545cb0
Compare
andrewrk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thank you.
Filter should be applied on path where package root folder (if there is any) is stripped. Manifest is inside package root and has paths relative to package root not temporary directory root.
|
I made a bug in handling paths while filtering files not included in manifest! Those paths are relative to the package root not to the temporary directory root. After removing reopening temporary directory to the package root those two are different in computeHash. There is entry file system path and entry package path. First has to be used for file system operations (deleting, reading), package path has to be used for filtering and hashing. Fixed now. |
Prepare test cases, store them in Fetch/testdata. They cover changes in this PR as well from previous one ziglang#19111.
From review comments: ziglang#19111
Based on comment: ziglang#19111 (comment) computeHash finds all files in temporary directory. There is no difference on what path are they. When calculating hash normalized_path must be set relative to package root. That's the place where we strip root if needed.
Based on: ziglang#19111 (comment) In pipeToFileSystem we are already iterating over all files in tarball. Inspecting them there for existence of single root folder saves two syscalls later.
Prepare test cases, store them in Fetch/testdata. They cover changes in this PR as well from previous one ziglang#19111.
Prepare test cases, store them in Fetch/testdata. They cover changes in this PR as well from previous one ziglang#19111.
Instead of striping leading root folder while unpacking tar, tar archive is unpacked as is into temporary directory. Then we check result and decide whether to skip root folder. That allows us to handle archives which don't have that leading root folder.
This if follow up of PR #19090.
Fixes #17779