@aadhar-agarwal
Contributor

@aadhar-agarwal aadhar-agarwal commented May 30, 2025

Summary

This PR introduces support for a new "tar index" mode in the EROFS snapshotter and differ. The tar index mode enables more efficient handling of OCI image layers by generating a tar index and appending the original tar content.

Key Changes

  • docs/snapshotters/erofs.md: Added documentation for the new tar index mode, including configuration and usage details.
  • internal/erofsutils/mount_linux.go:
    • Added GenerateTarIndexAndAppendTar to create a combined EROFS layer with a tar index and tar content.
    • Added SupportGenerateFromTar to detect mkfs.erofs tar mode support.
  • plugins/diff/erofs/differ_linux.go:
    • Refactored to support tar index mode via options.
    • Differentiated between standard and tar index conversion logic.
  • plugins/diff/erofs/plugin/plugin_linux.go:
    • Updated plugin config to support enabling tar index mode via TOML.
    • Checked for mkfs.erofs tar mode support during plugin initialization.
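
As an illustration of the kind of check SupportGenerateFromTar performs, here is a minimal sketch that looks for the `--tar=` option in the output of `mkfs.erofs --help`. The function names and the exact detection strategy are assumptions for illustration and are not copied from this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
)

// helpMentionsTarMode reports whether the given mkfs.erofs help text
// advertises the --tar= option. (Hypothetical helper, not from the PR.)
func helpMentionsTarMode(helpOutput []byte) bool {
	return bytes.Contains(helpOutput, []byte("--tar="))
}

// supportsTarMode runs "<mkfsPath> --help" and inspects its output.
// It returns an error only when the command produced no output at all,
// for example because the binary is missing.
func supportsTarMode(mkfsPath string) (bool, error) {
	out, err := exec.Command(mkfsPath, "--help").CombinedOutput()
	if err != nil && len(out) == 0 {
		return false, fmt.Errorf("running %s --help: %w", mkfsPath, err)
	}
	return helpMentionsTarMode(out), nil
}
```

A plugin could call `supportsTarMode("mkfs.erofs")` once during initialization and refuse to enable tar index mode when it returns false.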

Motivation

The tar index approach provides computational advantages, particularly when integrated with dm-verity. When testing with an Ubuntu 20.04 image layer, generating the merkle tree takes about 6s. We would like to offload this work from the container host by generating the merkle tree ahead of time and storing it in the registry. We will also use the registry to store the dm-verity root hash signature, so we would need to fetch from the registry anyway.

Since we will be fetching the dm-verity merkle tree and the root hash signature from the registry, we can also fetch the tar index generated by erofs utils. While generating the tar index is much less computationally intensive, it would still result in unnecessary computation on a per-node basis.

Finally, we would like to have a fallback mechanism that is consistent with the artifacts published to the registry (the merkle tree and the tar index). For that, we would like to not only have the logic in the differ to support appending tar to the tar index fetched from the registry, but also the ability to generate the tar index. This way, if the index is not available in the registry, it can be generated on the fly on the node.

We prefer the erofs tar index over the erofs blob because we have already pulled the layer tar and don't want to re-pull the full erofs blob, which would be similar in size to the tar layer. The tar index is much smaller.

In addition, the OCI image spec already defines a tar diffID for each layer, so we don't need to invent a new way to verify image layer content for confidential containers. In tar index mode, erofs reuses the tar data as-is (with a 512-byte fs block size) and builds only a minimal index for mounting the tar directly, so the guest can simply calculate the sha256 of the original tar data and compare it with each diffID.

Configuration

To enable tar index mode, set enable_tar_index = true in the differ plugin configuration.

* **Add tar index mode to EROFS snapshotter**

@k8s-ci-robot

Hi @aadhar-agarwal. Thanks for your PR.

I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@hsiangkao
Member

Hi @aadhar-agarwal, I think you could mark it as non-draft, since I think it's finished and we've already reviewed it for several rounds: aadhar-agarwal#1

Mark it as non-draft so that other folks know it's mature enough to be reviewed.

@aadhar-agarwal aadhar-agarwal marked this pull request as ready for review May 30, 2025 23:18
@dosubot dosubot bot added the area/distribution Image Distribution label May 30, 2025
@aadhar-agarwal aadhar-agarwal force-pushed the aadagarwal/add_tar_index_mode branch 2 times, most recently from 59e13d3 to b936a8d Compare June 12, 2025 20:53
@aadhar-agarwal aadhar-agarwal force-pushed the aadagarwal/add_tar_index_mode branch from f5dfbf5 to bac7024 Compare June 17, 2025 22:22
@hsiangkao
Member

hsiangkao commented Jun 18, 2025

@aadhar-agarwal can you merge the commits?
It seems there is no need for separate intermediate commits, and they are also not bisectable.

@hsiangkao
Member

hsiangkao commented Jun 18, 2025

/cc @dmcgowan @fuweid @AkihiroSuda @djdongjin
Could you help take a look at this commit? I think the use cases are still not fully documented, though.
In particular, I heard it's also quite useful for confidential container use cases. I'm not sure if @miz060 would like to reveal it now, because it also makes sense for the tar index mode.

@miz060

miz060 commented Jun 24, 2025

Yes, confidential containers require cryptographic verification of all image content before execution to maintain the trusted computing base. Dm-verity can provide this verification by creating merkle trees over block devices, but this creates a performance challenge (if we are doing full tar extraction before verification).

The tar index mode can address this by enabling direct mounting of tar files as dm-verity block devices with on-demand file access. This eliminates the need for full extraction before verification.

This matters for confidential workloads because extraction time directly impacts container startup performance. Without tar index mode, the overhead of dm-verity computation during full extraction could significantly degrade performance.

@hsiangkao
Member

hsiangkao commented Jun 25, 2025

Yes, confidential containers require cryptographic verification of all image content before execution to maintain the trusted computing base. Dm-verity can provide this verification by creating merkle trees over block devices, but this creates a performance challenge (if we are doing full tar extraction before verification).

The tar index mode can address this by enabling direct mounting of tar files as dm-verity block devices with on-demand file access. This eliminates the need for full extraction before verification.

This matters for confidential workloads because extraction time directly impacts container startup performance. Without tar index mode, the overhead of dm-verity computation during full extraction could significantly degrade performance.

I guess that is because we have a tar diffID for each layer according to the OCI image spec, so we don't need to reinvent a new way to verify the image layer content for confidential containers: in tar index mode, the guest can just calculate the sha256 of the original tar data and compare it with each diffID (erofs can reuse the tar data with a 512-byte fs block size and build a minimal index for direct mounting of the tar).

If it's possible, could you also document this use case (in the doc and PR message) too?

@aadhar-agarwal aadhar-agarwal force-pushed the aadagarwal/add_tar_index_mode branch from 78e5e38 to 8956352 Compare July 8, 2025 21:38
Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>

Minor style updates to erofs.md and differ_linux.go

Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>

Add use case for tar index in erofs.md

Signed-off-by: Aadhar Agarwal <aadagarwal@microsoft.com>
@aadhar-agarwal aadhar-agarwal force-pushed the aadagarwal/add_tar_index_mode branch from 8956352 to b641933 Compare July 8, 2025 21:44
Member

@AkihiroSuda AkihiroSuda left a comment

Can we have a test?

@github-project-automation github-project-automation bot moved this from Needs Triage to Review In Progress in Pull Request Review Jul 9, 2025
@AkihiroSuda AkihiroSuda added this pull request to the merge queue Jul 9, 2025
@AkihiroSuda
Member

/ok-to-test

@hsiangkao
Member

Can we have a test?

I think we could have a test (let's see if @aadhar-agarwal can help with this), but the other CI PR still failed because of the flaky device mapper snapshotter.

@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 9, 2025
@AkihiroSuda AkihiroSuda added this pull request to the merge queue Jul 9, 2025
Merged via the queue into containerd:main with commit 8576e59 Jul 9, 2025
52 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Jul 9, 2025
@aadhar-agarwal
Contributor Author

Can we have a test?

Yeah, I can create another PR to add a test for the tar index mode.

@dmcgowan dmcgowan changed the title from "erofs snapshotter: Add tar index mode" to "Add tar index mode to erofs snapshotter" Sep 11, 2025
@dmcgowan dmcgowan added area/storage Image Storage and removed area/distribution Image Distribution labels Sep 11, 2025
