Skip to content

Conversation

@ianic
Copy link
Contributor

@ianic ianic commented Feb 27, 2024

Instead of striping leading root folder while unpacking tar, tar archive is unpacked as is into temporary directory. Then we check result and decide whether to skip root folder. That allows us to handle archives which don't have that leading root folder.

This if follow up of PR #19090.

Fixes #17779

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does the behavior for any kind of resource, however, this logic is intended to only be executed for archives.

const rand_int = std.crypto.random.int(u64);
const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int);
const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); // root of the temporary directory
var tmp_package_root_sub_path: ?[]const u8 = null; // package root inside temporary directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With some code restructuring, this does not need to be a variable. Notice how the surrounding code avoids using variables, with the exception of loops and resource management.

const cache_root = f.job_queue.global_cache;
const rand_int = std.crypto.random.int(u64);
const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int);
const tmp_dir_sub_path = "tmp" ++ s ++ Manifest.hex64(rand_int); // root of the temporary directory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment disagrees with the local's name. If you pick a good name instead then there is no need to comment about the meaning of the name.

@ianic ianic requested a review from andrewrk April 2, 2024 17:09
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I think this is the right approach.

I identified 3 syscall optimizations to be implemented before merging. I believe these changes will result in equivalent - if not more - maintainability for this code, while being more optimal.

Comment on lines 633 to 641
fn archivePackageDir(arena: Allocator, out_dir: fs.Dir) !?[]const u8 {
var iter = out_dir.iterate();
if (try iter.next()) |entry| {
if (try iter.next() == null and entry.kind == .directory) {
return try arena.dupe(u8, entry.name);
}
}
return null;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information should be instead tracked by pipeToFileSystem to avoid these two syscalls (open and readdir).

uri_path: []const u8,
tmp_directory: Cache.Directory,
) RunError!void {
) RunError!?[]const u8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Enhancing this function to support returning a sub-path inside the temporary directory makes sense because otherwise the tar unpacking code would need to sometimes move all the top-level entries.

It took me a while to understand the meaning of this return value. Let's make it easier for the next person by adding a doc comment to the function to explain what return value means.

Something like this:

/// A `null` return value indicates the `tmp_directory` is populated directly with the package contents.
/// A non-null return value means that the package contents are inside a sub-directory indicated by the named path.

if (package_dir) |dir_name| {
// Position tmp_directory to dir_name inside tmp_dir_sub_path.
const path = try cache_root.join(arena, &.{ tmp_dir_sub_path, dir_name });
const handle = tmp_directory.handle.openDir(dir_name, .{ .iterate = true }) catch |err| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to open the sub directory since the parent is already open. The Path abstraction already has a sub_path field. It looks like you may need to update computeHash to accept Path rather than Directory.

Comment on lines 535 to 536
// Remove temporary directory root.
cache_root.handle.deleteTree(tmp_dir_sub_path) catch {};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Syscall should be avoided when it is known to not exist already.
  2. When it is needed, it is known to be an empty directory, so it is better to use deleteDir.

ianic added 9 commits April 3, 2024 17:06
Prior to
[this](ziglang@ef9966c#diff-08c935ef8c633bb630641d44230597f1cff5afb0e736d451e2ba5569fa53d915R805)
commit tar was not a valid extension. After that this one is valid case.
Unpack tar without removing leading root folder. Then find package root
in unpacked tmp folder.
While iterating over all files in tarball set root_dir in diagnostic if
there is single root in tarball. Will be used in package manager with
strip_components = 0 to find the root of the fetched package.
Based on comment:
ziglang#19111 (comment)

computeHash finds all files in temporary directory. There is no
difference on what path are they. When calculating hash normalized_path
must be set relative to package root. That's the place where we strip
root if needed.
Based on:
ziglang#19111 (comment)

In pipeToFileSystem we are already iterating over all files in tarball.
Inspecting them there for existence of single root folder saves two
syscalls later.
@ianic ianic force-pushed the no_strip_components branch from ffb0460 to 8545cb0 Compare April 3, 2024 19:25
@ianic ianic requested a review from andrewrk April 3, 2024 19:26
Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, thank you.

@andrewrk andrewrk enabled auto-merge April 3, 2024 20:22
@ianic ianic disabled auto-merge April 3, 2024 23:18
@ianic ianic marked this pull request as draft April 3, 2024 23:18
Filter should be applied on path where package root folder (if
there is any) is stripped. Manifest is inside package root and has paths
relative to package root not temporary directory root.
@ianic ianic marked this pull request as ready for review April 4, 2024 00:08
@ianic
Copy link
Contributor Author

ianic commented Apr 4, 2024

I made a bug in handling paths while filtering files not included in manifest!

Those paths are relative to the package root not to the temporary directory root. After removing reopening temporary directory to the package root those two are different in computeHash. There is entry file system path and entry package path. First has to be used for file system operations (deleting, reading), package path has to be used for filtering and hashing.

Fixed now.

@andrewrk andrewrk enabled auto-merge April 4, 2024 00:18
@andrewrk andrewrk merged commit e5d9002 into ziglang:master Apr 4, 2024
@ianic ianic deleted the no_strip_components branch April 4, 2024 14:27
ianic added a commit to ianic/zig that referenced this pull request Apr 4, 2024
Prepare test cases, store them in Fetch/testdata.
They cover changes in this PR as well from previous one ziglang#19111.
RossComputerGuy pushed a commit to MidstallSoftware/zig that referenced this pull request Apr 5, 2024
RossComputerGuy pushed a commit to MidstallSoftware/zig that referenced this pull request Apr 5, 2024
Based on comment:
ziglang#19111 (comment)

computeHash finds all files in temporary directory. There is no
difference on what path are they. When calculating hash normalized_path
must be set relative to package root. That's the place where we strip
root if needed.
RossComputerGuy pushed a commit to MidstallSoftware/zig that referenced this pull request Apr 5, 2024
Based on:
ziglang#19111 (comment)

In pipeToFileSystem we are already iterating over all files in tarball.
Inspecting them there for existence of single root folder saves two
syscalls later.
ianic added a commit to ianic/zig that referenced this pull request Apr 6, 2024
Prepare test cases, store them in Fetch/testdata.
They cover changes in this PR as well from previous one ziglang#19111.
ianic added a commit to ianic/zig that referenced this pull request Apr 9, 2024
Prepare test cases, store them in Fetch/testdata.
They cover changes in this PR as well from previous one ziglang#19111.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

error: unable to unpack tarball to temporary directory: TarComponentsOutsideStrippedPrefix

3 participants