Add tar index mode to erofs snapshotter #11919
Hi @aadhar-agarwal. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
b7ecd1d to b6cb58f
|
Hi @aadhar-agarwal, I think you could mark this as non-draft: it looks finished, and we've already reviewed it over several rounds in aadhar-agarwal#1. Marking it as non-draft signals to other folks that it's mature enough to be reviewed. |
59e13d3 to b936a8d
f5dfbf5 to bac7024
|
@aadhar-agarwal can you merge the commits? |
|
/cc @dmcgowan @fuweid @AkihiroSuda @djdongjin |
bac7024 to 78e5e38
|
Yes, confidential containers require cryptographic verification of all image content before execution to maintain the trusted computing base. dm-verity can provide this verification by building merkle trees over block devices, but doing a full tar extraction before verification creates a performance problem. Tar index mode can address this by allowing a tar file to be mounted directly as a dm-verity block device with on-demand file access, which eliminates the need for full extraction before verification. This matters for confidential workloads because extraction time directly impacts container startup performance. Without tar index mode, the overhead of dm-verity computation during full extraction could significantly degrade performance. |
I guess that is because we already have a tar diffID for each layer according to the OCI image spec, so we don't need to invent a new way to verify image layer content for confidential containers. In the guest, we can just calculate the sha256 of the original tar data exposed by tar index mode and compare it with each diffID. This works because erofs can reuse the tar data as-is with a 512-byte fs block size and build only a minimal index for mounting the tar directly. If possible, could you also document this use case (in the doc and the PR message)? |
78e5e38 to 8956352
Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>
Minor style updates to erofs.md and differ_linux.go
Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>
Add use case for tar index in erofs.md
Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>
8956352 to b641933
AkihiroSuda
left a comment
Can we have a test?
|
/ok-to-test |
I think we could have a test (let's see if @aadhar-agarwal can help with this), though the other CI PR is still failing because of device mapper snapshotter flakiness. |
Yeah, I can create another PR to add a test for the tar index mode |
Summary
This PR introduces support for a new "tar index" mode in the EROFS snapshotter and differ. The tar index mode enables more efficient handling of OCI image layers by generating a tar index and appending the original tar content.
Key Changes
GenerateTarIndexAndAppendTar to create a combined EROFS layer with a tar index and tar content.
SupportGenerateFromTar to detect mkfs.erofs tar mode support.
Motivation
The tar index approach provides computational advantages, particularly when integrated with dm-verity. When testing with an Ubuntu 20.04 image layer, it takes about 6s to generate the merkle tree. We would like to offload this step so that the merkle tree is generated ahead of time, off the container host, and stored in the registry. We will also use the registry to store the dm-verity root hash signature, so we would need to fetch from it anyway.
Since we will be fetching the dm-verity merkle tree and the root hash signature from the registry, we can also fetch the tar index generated by erofs-utils. Although generating the tar index is much less computationally intensive, doing it on every node would still be unnecessary computation.
Finally, we would like to have a fallback mechanism that is consistent with the artifacts published to the registry (the merkle tree and the tar index). For that, we would like to not only have the logic in the differ to support appending tar to the tar index fetched from the registry, but also the ability to generate the tar index. This way, if the index is not available in the registry, it can be generated on the fly on the node.
We prefer the erofs tar index over the erofs blob because we have already pulled the layer tar and don't want to pull the full erofs blob as well, which would be similar in size to the tar layer. The tar index is much smaller.
In addition, we have a tar diffID for each layer according to the OCI image spec, so we don't need to invent a new way to verify image layer content for confidential containers. In the guest, we can just calculate the sha256 of the original tar data exposed by tar index mode and compare it with each diffID. This works because erofs can reuse the tar data as-is with a 512-byte fs block size and build only a minimal index for mounting the tar directly.
Configuration
To enable tar index mode, set enable_tar_index = true in the differ plugin configuration.
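Enabling this mode only makes sense when the installed mkfs.erofs supports tar input, which is what the SupportGenerateFromTar check listed under Key Changes is for. One plausible way to probe for that is sketched below; it is an assumption, not necessarily what the PR does. It runs `mkfs.erofs --help` and looks for the `--tar=` option in the output, and it further assumes that `--help` exits successfully. `helpMentionsTarMode` and `probeTarModeSupport` are hypothetical names.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// helpMentionsTarMode reports whether the given help text advertises a
// --tar= option. Kept separate from process execution so it is easy to test.
func helpMentionsTarMode(helpText string) bool {
	return strings.Contains(helpText, "--tar=")
}

// probeTarModeSupport runs "<binary> --help" and inspects its output.
// Assumption: --help exits with status 0; a non-zero exit or a missing
// binary is reported as an error.
func probeTarModeSupport(binary string) (bool, error) {
	out, err := exec.Command(binary, "--help").CombinedOutput()
	if err != nil {
		return false, fmt.Errorf("running %s --help: %w", binary, err)
	}
	return helpMentionsTarMode(string(out)), nil
}

func main() {
	ok, err := probeTarModeSupport("mkfs.erofs")
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("tar mode supported:", ok)
}
```

A caller could run such a probe once at plugin start-up and refuse to honor enable_tar_index = true when it fails, rather than discovering the missing feature on the first layer.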