Skip to content

Conversation

@dmcgowan
Copy link
Member

@dmcgowan dmcgowan commented Sep 26, 2025

Updates the erofs snapshotter to use the mount manager. With this change, the erofs snapshotter no longer needs to mount erofs block files, but can pass it along to the runtime to mount either when a task is created or inside the shim. This add a couple new features along side this, including..

  • Blockfile support for the upper directory, allowing quota support and some runtimes to completely avoid storing container files on the host filesystem.
  • Darwin support, this now compiles and works on Darwin. We currently don't have a darwin testsuite, but this could work with a future testsuite which does not rely on mounts and pull integration testing. Pull and unpack will work on Darwin when using the erofs snapshotter.

This change is based on #12063

* **EROFS enhancements using mount manager** ([#12333](https://github.com/containerd/containerd/pull/12333))

  Improvements to EROFS snapshotter using the new mount manager service
  * **Quota Support**: Support for sized block devices as the upper layer for overlayfs
  * **Mount Lifecycle**: Loopback setup, block device creation, and overlayfs argument formatting is moved to the
     mount  manager to be performed on-demand or within the runtime.
  * **Mount handler**: To allow optimization of EROFS mount types based on the current system
  * **macOS Support**: EROFS snapshotter can now be used on Darwin to natively allow image pulls
  * **Tar index mode**: Efficiently generate EROFS metadata backed by original tar content ([#11919](https://github.com/containerd/containerd/pull/11919))

@k8s-ci-robot
Copy link

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@dmcgowan dmcgowan added this to the 2.2 milestone Sep 30, 2025
Signed-off-by: Derek McGowan <derek@mcg.dev>
Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan dmcgowan force-pushed the erofs-block-files branch 2 times, most recently from 9f53900 to fbedc23 Compare October 14, 2025 06:13
Extend the mount manager to support more transformers than format. The
transformers allow altering the mount before it is passed to the mount
handlers. These could be one-time actions which are needed to perform
the mount.

Adds mkdir and mkfs actions which can be used to prepare the arguments
for a mount. The actions can be limited to actions within the target
mount directory or plugin directories.

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan
Copy link
Member Author

The process cannot access the file because it is being used by another process.

I believe these are failing on windows because this is using os.OpenRoot, which will hold a file descriptor to the root directories. I will add a cleanup for this to make sure it can teardown.

Adding a close method allows the mount manager to close any open file
descriptors. The method will also be called automatically by containerd
on shutdown.

Ensure the tests call Close to avoid leaking file descriptors or errors
on Windows cleaning up directories that are in use.

Signed-off-by: Derek McGowan <derek@mcg.dev>
Reduce the size of the test files. Even using sparse files, the reported
large size may cause issues in some test environments.

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan dmcgowan moved this from Needs Update to Review In Progress in Pull Request Review Oct 17, 2025
return fmt.Errorf("failed to remove snapshot %s: %w", key, err)
}

removals, err = s.getCleanupDirectories(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can introduce asyncRemove flag for this.

@dmcgowan dmcgowan added this pull request to the merge queue Oct 19, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 19, 2025
@dmcgowan
Copy link
Member Author

One flake in the loopback test I'll look into

Signed-off-by: Derek McGowan <derek@mcg.dev>
@dmcgowan
Copy link
Member Author

Added one more commit to address the flakiness in the loopback tests. I saw that flake outside of this PR as well, but it flaked in merge queue so should just include it here.

@fuweid fuweid added this pull request to the merge queue Oct 20, 2025
Merged via the queue into containerd:main with commit e95415b Oct 20, 2025
93 of 96 checks passed
@github-project-automation github-project-automation bot moved this from Review In Progress to Done in Pull Request Review Oct 20, 2025
aadhar-agarwal added a commit to aadhar-agarwal/containerd that referenced this pull request Oct 20, 2025
This commit integrates dm-verity integrity verification into the EROFS
snapshotter, adapted for the new mount manager architecture introduced
in PR containerd#12333.

Key changes:
- Add enableDmverity configuration option to SnapshotterConfig
- Add WithDmverity() option function for enabling dm-verity
- Implement formatDmverityLayer() to format EROFS blobs with dm-verity
  hash trees during commit phase
- Store dm-verity root hash in .roothash file for each layer
- Modify mount specifications to use 'dmverity/erofs' mount type when
  dm-verity is enabled, delegating device setup to mount manager
- Add plugin configuration option (enable_dmverity)
- Import internal/dmverity package for dm-verity operations

Architecture:
Instead of the snapshotter directly managing dm-verity devices, this
implementation uses the mount manager pattern where:
1. During Commit: format the EROFS blob with hash tree, save root hash
2. During mount: return mount spec with type 'dmverity/erofs' and
   X-containerd.dmverity.roothash-file option
3. Mount manager (via mount transformer) handles device setup/teardown

Benefits:
- Cleaner separation of concerns
- Runtime handles mounting (not snapshotter)
- No device lifecycle management in snapshotter
- Follows new architecture patterns
- Compatible with graceful shutdown

Note: This requires a corresponding mount transformer to handle the
'dmverity/erofs' mount type (to be implemented separately).
@halaney
Copy link
Contributor

halaney commented Oct 21, 2025

@dmcgowan I've been playing around with erofs the last few days, and this PR is causing issues if I've got:

  1. A user namespace enabled pod
  2. erofs snapshotter / differ enabled
  3. This series applied

Without it the pod starts fine and user namespaces are indeed used.
I dumped some debug info on the mount failure:

=== Mount syscall failed ===
mount("overlay", "/mnt/containerd/tmpmounts/containerd-mount986718758", "format/mkdir/overlay", 0x0, "workdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/work,upperdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/fs,lowerdir={{ mount 0 }},volatile")
Error: mount source: "overlay", target: "/mnt/containerd/tmpmounts/containerd-mount986718758", fstype: format/mkdir/overlay, flags: 0, data: "workdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/work,upperdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/fs,lowerdir={{ mount 0 }},volatile", err: no such device
Stack trace:
goroutine 1043 [running]:
runtime/debug.Stack()
        /usr/lib/golang/src/runtime/debug/stack.go:26 +0x6b
github.com/containerd/containerd/v2/core/mount.(*Mount).mount(0xc000a41b18, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount_linux.go:176 +0x155b
github.com/containerd/containerd/v2/core/mount.(*Mount).Mount(0xc000a41b18, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount.go:106 +0x30c
github.com/containerd/containerd/v2/core/mount.All({0xc001698e10, 0x2, 0x2}, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount.go:52 +0x157
github.com/containerd/containerd/v2/core/mount.WithTempMount({0x2b10768, 0xc000f04330}, {0xc001698e10, 0x2, 0x2}, 0xc000a41ed8)
        /containerd/core/mount/temp.go:67 +0x514
github.com/containerd/containerd/v2/client.remapRootFS({0x2b10768, 0xc000f04330}, {0xc001698e10, 0x2, 0x2}, {{0xc000925bf0, 0x1, 0x1}, {0xc000925c70, 0x1, ...}})
        /containerd/client/container_opts_unix.go:122 +0x105
github.com/containerd/containerd/v2/client.resolveSnapshotOptions({0x2b10768, 0xc000f04330}, 0xc0004b4a00, {0xc0009242cb, 0x5}, {0x2b19ce0, 0xc0005fa180}, {0xc0004a8910, 0x47}, {0xc001685020, ...})
        /containerd/client/snapshotter_opts_unix.go:129 +0x12b3
github.com/containerd/containerd/v2/client.withNewSnapshot.func1({0x2b10768, 0xc000f04330}, 0xc0004b4a00, 0xc001032f00)
        /containerd/client/container_opts.go:253 +0x4d6
github.com/containerd/containerd/v2/internal/cri/opts.WithNewSnapshot.func1({0x2b10768, 0xc000f04330}, 0xc0004b4a00, 0xc001032f00)
        /containerd/internal/cri/opts/container.go:42 +0x85
github.com/containerd/containerd/v2/client.(*Client).NewContainer(0xc0004b4a00, {0x2b10768, 0xc000f04330}, {0xc000b842c0, 0x40}, {0xc000a44168, 0x6, 0x6})
        /containerd/client/client.go:361 +0x793
github.com/containerd/containerd/v2/internal/cri/server/podsandbox.(*Controller).Start(0xc00041eb60, {0x2b10768, 0xc001c1f6e0}, {0xc000b842c0, 0x40})
        /containerd/internal/cri/server/podsandbox/sandbox_run.go:200 +0x24b2
github.com/containerd/containerd/v2/internal/cri/server.(*criSandboxService).StartSandbox(0xc000e0b120, {0x2b10768, 0xc001c1f6e0}, {0x28db0a6, 0xa}, {0xc000b842c0, 0x40})
        /containerd/internal/cri/server/sandbox_service.go:64 +0x1f9
github.com/containerd/containerd/v2/internal/cri/server.(*criService).RunPodSandbox(0xc0003acb48, {0x2b10768, 0xc001c1f6e0}, 0xc0010d5400)
        /containerd/internal/cri/server/sandbox_run.go:280 +0x3bf3
github.com/containerd/containerd/v2/internal/cri/instrument.(*instrumentedService).RunPodSandbox(0xc0007a49d0, {0x2b10768, 0xc001c1f3b0}, 0xc0010d5400)
        /containerd/internal/cri/instrument/instrumented_service.go:83 +0x3af
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_RunPodSandbox_Handler.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400})
        /containerd/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api_grpc.pb.go:774 +0xf4
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.UnaryServerInterceptor.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d74e8)
        /containerd/vendor/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/server.go:22 +0x368
google.golang.org/grpc.getChainUnaryHandler.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400})
        /containerd/vendor/google.golang.org/grpc/server.go:1243 +0x11a
github.com/containerd/containerd/v2/cmd/containerd/server.unaryNamespaceInterceptor({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d5540)
        /containerd/cmd/containerd/server/namespace.go:31 +0x13b
google.golang.org/grpc.chainUnaryInterceptors.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d74e8)
        /containerd/vendor/google.golang.org/grpc/server.go:1234 +0xf5
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_RunPodSandbox_Handler({0x28a56e0, 0xc0007a49d0}, {0x2b10768, 0xc001c1f3b0}, 0xc001510580, 0xc000226560)
        /containerd/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api_grpc.pb.go:776 +0x22c
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003cc200, {0x2b10768, 0xc001c1f290}, 0xc000e1c780, 0xc000280330, 0x37bd638, 0x0)
        /containerd/vendor/google.golang.org/grpc/server.go:1431 +0x192d
google.golang.org/grpc.(*Server).handleStream(0xc0003cc200, {0x2b11010, 0xc000622d00}, 0xc000e1c780)
        /containerd/vendor/google.golang.org/grpc/server.go:1842 +0xcc5
google.golang.org/grpc.(*Server).serveStreams.func2.1()
        /containerd/vendor/google.golang.org/grpc/server.go:1061 +0x178
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 468
        /containerd/vendor/google.golang.org/grpc/server.go:1072 +0x1ae
============================

I think that code path needs to learn about the mount manager?

@hsiangkao
Copy link
Member

@dmcgowan I've been playing around with erofs the last few days, and this PR is causing issues if I've got:

  1. A user namespace enabled pod
  2. erofs snapshotter / differ enabled
  3. This series applied

Without it the pod starts fine and user namespaces are indeed used. I dumped some debug info on the mount failure:

=== Mount syscall failed ===
mount("overlay", "/mnt/containerd/tmpmounts/containerd-mount986718758", "format/mkdir/overlay", 0x0, "workdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/work,upperdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/fs,lowerdir={{ mount 0 }},volatile")
Error: mount source: "overlay", target: "/mnt/containerd/tmpmounts/containerd-mount986718758", fstype: format/mkdir/overlay, flags: 0, data: "workdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/work,upperdir=/mnt/containerd/io.containerd.snapshotter.v1.erofs/snapshots/3524/fs,lowerdir={{ mount 0 }},volatile", err: no such device
Stack trace:
goroutine 1043 [running]:
runtime/debug.Stack()
        /usr/lib/golang/src/runtime/debug/stack.go:26 +0x6b
github.com/containerd/containerd/v2/core/mount.(*Mount).mount(0xc000a41b18, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount_linux.go:176 +0x155b
github.com/containerd/containerd/v2/core/mount.(*Mount).Mount(0xc000a41b18, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount.go:106 +0x30c
github.com/containerd/containerd/v2/core/mount.All({0xc001698e10, 0x2, 0x2}, {0xc001c1b380, 0x33})
        /containerd/core/mount/mount.go:52 +0x157
github.com/containerd/containerd/v2/core/mount.WithTempMount({0x2b10768, 0xc000f04330}, {0xc001698e10, 0x2, 0x2}, 0xc000a41ed8)
        /containerd/core/mount/temp.go:67 +0x514
github.com/containerd/containerd/v2/client.remapRootFS({0x2b10768, 0xc000f04330}, {0xc001698e10, 0x2, 0x2}, {{0xc000925bf0, 0x1, 0x1}, {0xc000925c70, 0x1, ...}})
        /containerd/client/container_opts_unix.go:122 +0x105
github.com/containerd/containerd/v2/client.resolveSnapshotOptions({0x2b10768, 0xc000f04330}, 0xc0004b4a00, {0xc0009242cb, 0x5}, {0x2b19ce0, 0xc0005fa180}, {0xc0004a8910, 0x47}, {0xc001685020, ...})
        /containerd/client/snapshotter_opts_unix.go:129 +0x12b3
github.com/containerd/containerd/v2/client.withNewSnapshot.func1({0x2b10768, 0xc000f04330}, 0xc0004b4a00, 0xc001032f00)
        /containerd/client/container_opts.go:253 +0x4d6
github.com/containerd/containerd/v2/internal/cri/opts.WithNewSnapshot.func1({0x2b10768, 0xc000f04330}, 0xc0004b4a00, 0xc001032f00)
        /containerd/internal/cri/opts/container.go:42 +0x85
github.com/containerd/containerd/v2/client.(*Client).NewContainer(0xc0004b4a00, {0x2b10768, 0xc000f04330}, {0xc000b842c0, 0x40}, {0xc000a44168, 0x6, 0x6})
        /containerd/client/client.go:361 +0x793
github.com/containerd/containerd/v2/internal/cri/server/podsandbox.(*Controller).Start(0xc00041eb60, {0x2b10768, 0xc001c1f6e0}, {0xc000b842c0, 0x40})
        /containerd/internal/cri/server/podsandbox/sandbox_run.go:200 +0x24b2
github.com/containerd/containerd/v2/internal/cri/server.(*criSandboxService).StartSandbox(0xc000e0b120, {0x2b10768, 0xc001c1f6e0}, {0x28db0a6, 0xa}, {0xc000b842c0, 0x40})
        /containerd/internal/cri/server/sandbox_service.go:64 +0x1f9
github.com/containerd/containerd/v2/internal/cri/server.(*criService).RunPodSandbox(0xc0003acb48, {0x2b10768, 0xc001c1f6e0}, 0xc0010d5400)
        /containerd/internal/cri/server/sandbox_run.go:280 +0x3bf3
github.com/containerd/containerd/v2/internal/cri/instrument.(*instrumentedService).RunPodSandbox(0xc0007a49d0, {0x2b10768, 0xc001c1f3b0}, 0xc0010d5400)
        /containerd/internal/cri/instrument/instrumented_service.go:83 +0x3af
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_RunPodSandbox_Handler.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400})
        /containerd/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api_grpc.pb.go:774 +0xf4
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors.UnaryServerInterceptor.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d74e8)
        /containerd/vendor/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/server.go:22 +0x368
google.golang.org/grpc.getChainUnaryHandler.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400})
        /containerd/vendor/google.golang.org/grpc/server.go:1243 +0x11a
github.com/containerd/containerd/v2/cmd/containerd/server.unaryNamespaceInterceptor({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d5540)
        /containerd/cmd/containerd/server/namespace.go:31 +0x13b
google.golang.org/grpc.chainUnaryInterceptors.func1({0x2b10768, 0xc001c1f3b0}, {0x276b420, 0xc0010d5400}, 0xc000f0a960, 0xc0010d74e8)
        /containerd/vendor/google.golang.org/grpc/server.go:1234 +0xf5
k8s.io/cri-api/pkg/apis/runtime/v1._RuntimeService_RunPodSandbox_Handler({0x28a56e0, 0xc0007a49d0}, {0x2b10768, 0xc001c1f3b0}, 0xc001510580, 0xc000226560)
        /containerd/vendor/k8s.io/cri-api/pkg/apis/runtime/v1/api_grpc.pb.go:776 +0x22c
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0003cc200, {0x2b10768, 0xc001c1f290}, 0xc000e1c780, 0xc000280330, 0x37bd638, 0x0)
        /containerd/vendor/google.golang.org/grpc/server.go:1431 +0x192d
google.golang.org/grpc.(*Server).handleStream(0xc0003cc200, {0x2b11010, 0xc000622d00}, 0xc000e1c780)
        /containerd/vendor/google.golang.org/grpc/server.go:1842 +0xcc5
google.golang.org/grpc.(*Server).serveStreams.func2.1()
        /containerd/vendor/google.golang.org/grpc/server.go:1061 +0x178
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 468
        /containerd/vendor/google.golang.org/grpc/server.go:1072 +0x1ae
============================

I think that code path needs to learn about the mount manager?

should we apply the similar logic in pkg/oci/spec_opts.go:withReadonlyFS() to
client/snapshotter_opts_unix.go:resolveSnapshotOptions() and
client/container_opts_unix.go: withRemappedSnapshotBase() ?

@dmcgowan dmcgowan deleted the erofs-block-files branch October 21, 2025 02:14
@dmcgowan
Copy link
Member Author

I probably need to better understand the id-mapped mount code path we have. My understanding is that we should be able to avoid the expensive snapshotter remapping logic. We could add the similar logic to those functions to make it complete, however, it is still slow and not the best way to achieve this. Especially with erofs, it is way better to just remap the metadata and avoid the costly mount->chown->commit. That inefficient approach was only intended as a stop gap until id-mapped mounts support or native snapshotter support.

@hsiangkao
Copy link
Member

hsiangkao commented Oct 21, 2025

I probably need to better understand the id-mapped mount code path we have. My understanding is that we should be able to avoid the expensive snapshotter remapping logic. We could add the similar logic to those functions to make it complete, however, it is still slow and not the best way to achieve this. Especially with erofs, it is way better to just remap the metadata and avoid the costly mount->chown->commit. That inefficient approach was only intended as a stop gap until id-mapped mounts support or native snapshotter support.

Yeah, agreed. I think idmap support is fine but this feature can only be used for newer kernels.

Also for older kernels, mkfs.erofs has --{u,g}id-offset= options and with fsmerge feature, mount->chown->commit cycle can be avoided (assuming that we don't want to regenerate each layer with shifted-ids).

Yet if fsmerge feature is off and kernels don't have idmap support, I'm not sure if we need such compat logic.

@halaney
Copy link
Contributor

halaney commented Oct 21, 2025

I probably need to better understand the id-mapped mount code path we have. My understanding is that we should be able to avoid the expensive snapshotter remapping logic. We could add the similar logic to those functions to make it complete, however, it is still slow and not the best way to achieve this. Especially with erofs, it is way better to just remap the metadata and avoid the costly mount->chown->commit. That inefficient approach was only intended as a stop gap until id-mapped mounts support or native snapshotter support.

Yeah, agreed. I think idmap support is fine but this feature can only be used for newer kernels.

Also for older kernels, mkfs.erofs has --{u,g}id-offset= options and with fsmerge feature, mount->chown->commit cycle can be avoided (assuming that we don't want to regenerate each layer with shifted-ids).

Yet if fsmerge feature is off and kernels don't have idmap support, I'm not sure if we need such compat logic.

Oh wow, I thought we were using idmap support for the container in the end and it was just a particular call path that was bugging out and using the remap chown path. I realize now looking at my test setup that we're not doing the idmap at all. I need to revisit that all in the morning, I thought since (at least prior to this series) the Mount::Type was overlay for the erofs snapshotter that we'd hit this path during mount and end up doing the idmap bind mount for the lower dirs still, but that doesn't seem to be the case.

We should definitely switch things over to using idmap support for erofs too, but I do wonder how we should handle cases like this where it seems the snapshotter expects mount manager to handle things but the call path isn't setup for that. To be honest I also need to look more at the mount manager integration too to give any useful suggestions.

@hsiangkao
Copy link
Member

I probably need to better understand the id-mapped mount code path we have. My understanding is that we should be able to avoid the expensive snapshotter remapping logic. We could add the similar logic to those functions to make it complete, however, it is still slow and not the best way to achieve this. Especially with erofs, it is way better to just remap the metadata and avoid the costly mount->chown->commit. That inefficient approach was only intended as a stop gap until id-mapped mounts support or native snapshotter support.

Yeah, agreed. I think idmap support is fine but this feature can only be used for newer kernels.
Also for older kernels, mkfs.erofs has --{u,g}id-offset= options and with fsmerge feature, mount->chown->commit cycle can be avoided (assuming that we don't want to regenerate each layer with shifted-ids).
Yet if fsmerge feature is off and kernels don't have idmap support, I'm not sure if we need such compat logic.

Oh wow, I thought we were using idmap support for the container in the end and it was just a particular call path that was bugging out and using the remap chown path. I realize now looking at my test setup that we're not doing the idmap at all. I need to revisit that all in the morning, I thought since (at least prior to this series) the Mount::Type was overlay for the erofs snapshotter that we'd hit this path during mount and end up doing the idmap bind mount for the lower dirs still, but that doesn't seem to be the case.

We should definitely switch things over to using idmap support for erofs too, but I do wonder how we should handle cases like this where it seems the snapshotter expects mount manager to handle things but the call path isn't setup for that. To be honest I also need to look more at the mount manager integration too to give any useful suggestions.

The erofs snapshotter doesn't adapt containerd idmapped mount support initially, but the kernel already supports this feature. Anyway, if we really need idmapped mount (in addition to old chown), it needs some extra development anyway.

@halaney
Copy link
Contributor

halaney commented Oct 21, 2025

I probably need to better understand the id-mapped mount code path we have. My understanding is that we should be able to avoid the expensive snapshotter remapping logic. We could add the similar logic to those functions to make it complete, however, it is still slow and not the best way to achieve this. Especially with erofs, it is way better to just remap the metadata and avoid the costly mount->chown->commit. That inefficient approach was only intended as a stop gap until id-mapped mounts support or native snapshotter support.

Yeah, agreed. I think idmap support is fine but this feature can only be used for newer kernels.
Also for older kernels, mkfs.erofs has --{u,g}id-offset= options and with fsmerge feature, mount->chown->commit cycle can be avoided (assuming that we don't want to regenerate each layer with shifted-ids).
Yet if fsmerge feature is off and kernels don't have idmap support, I'm not sure if we need such compat logic.

Oh wow, I thought we were using idmap support for the container in the end and it was just a particular call path that was bugging out and using the remap chown path. I realize now looking at my test setup that we're not doing the idmap at all. I need to revisit that all in the morning, I thought since (at least prior to this series) the Mount::Type was overlay for the erofs snapshotter that we'd hit this path during mount and end up doing the idmap bind mount for the lower dirs still, but that doesn't seem to be the case.
We should definitely switch things over to using idmap support for erofs too, but I do wonder how we should handle cases like this where it seems the snapshotter expects mount manager to handle things but the call path isn't setup for that. To be honest I also need to look more at the mount manager integration too to give any useful suggestions.

The erofs snapshotter doesn't adapt containerd idmapped mount support initially, but the kernel already supports this feature. Anyway, if we really need idmapped mount (in addition to old chown), it needs some extra development anyway.

I've worked on this some today and think I got something going, I'll try and post a PR soon. Need to rebase with the mount manager support and see if that breaks anything, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants