Skip to content

Import cstor-rs, use for reflinked imports from containers-storage#218

Draft
cgwalters wants to merge 8 commits intocomposefs:mainfrom
cgwalters:import-cstor-rs
Draft

Import cstor-rs, use for reflinked imports from containers-storage#218
cgwalters wants to merge 8 commits intocomposefs:mainfrom
cgwalters:import-cstor-rs

Conversation

@cgwalters
Copy link
Collaborator

Obviously massively Assisted-by: OpenCode (Opus 4.5) here; I did a lot of design work of course and but this is looking pretty nice to me, and it seems to work well!

Copy link
Collaborator Author

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a review here so it's more visible

@giuseppe
Copy link
Collaborator

I've not looked into details yet, but are we comfortable with the risks of these intrusive handling of the storage library? The storage library doesn't have any stability guarantees and can change at any time

@cgwalters
Copy link
Collaborator Author

I've not looked into details yet, but are we comfortable with the risks of these intrusive handling of the storage library? The storage library doesn't have any stability guarantees and can change at any time

Is there anything fundamentally different from the storage access in bootc-dev/bootc#1962 though?

As far as the "can change at any time" - in reality we have pretty strong constraints there on any truly backwards-incompatible changes right? Because even though there's one Go library for it, there's multiple tools using that library which may not always be upgraded at the same time and I definitely remember an incident with cri-o + podman on different schedules.

I think if we did any future breaking changes, we'd just need to remember to also update this library right? We can enforce that with CI.

@giuseppe
Copy link
Collaborator

Is there anything fundamentally different from the storage access in bootc-dev/bootc#1962 though?

No, but I've added that only for the PoC, I think that logic belongs to container-libs.

As far as the "can change at any time" - in reality we have pretty strong constraints there on any truly backwards-incompatible changes right? Because even though there's one Go library for it, there's multiple tools using that library which may not always be upgraded at the same time and I definitely remember an incident with cri-o + podman on different schedules.

Sure, but testing with the previous version of the container tools will be caught earlier (and hopefully block a release), while it could be too late for a separate implementation as the one proposed here.

As long as we are OK with the potential risks, I am OK with this. It is better to get unblocked and make some progress instead of waiting for the "perfect" solution.

@cgwalters
Copy link
Collaborator Author

No, but I've added that only for the PoC, I think that logic belongs to container-libs.

Yes, agreed. I think we can pursue a proper API for this (and I like the jsonrpc-fdpass + splitfdstream, but obviously up for debate)

That said, there's an entirely different path to pursue here, which is a Go implementation of what's in composefs-rs (this repo) (or redoing the containers-storage-composefs backend on IPC to cfsctl which we productize and ship, or a hybrid). I suspect that such a path would be relatively straightforward, and would force us to standardize even more the file formats etc.

@allisonkarlitskaya
Copy link
Collaborator

So from what I can tell, the main thrust here is to get the reflinking thing between c-storage and composefs-rs working, right?

I'd have a really strong preference here to add a new API to the existing skopeo proxy and containers-image-store-rs. There are already fd-passing APIs there: we'd just need a lot more fds...

@giuseppe's argument about having a parallel implementation has traction with me. I'm not sure that I'd prefer a vibecoded containers-storage reimplementation, even if it is in Rust, compared to the real thing, specifically because we're gonna RPC to it anyway...

See also #137 for existing code that understands enough of tarsplit (because it's part of zstd:chunked) to get its job done...

I know you're trying to reduce CI pain here by decreasing the size of the stack of modules we have, but I'm not sure how I feel about mass-importing a vibecoding session here... the fact that most of that is in a separate crate helps a bit, of course.

Also: are you still 100% sure "pull into c-storage and then copy to composefs" is the way we definitely want to do this? I know the arguments about exotic authentication mechanisms, but skopeo would cover us there. I also know the arguments about zstd:chunked, but we have our own support in #137. I also know the arguments about combining zstd:chunked with the exotic authentication mechanisms thing and that's where I have a bit of sympathy for this approach, but ... it seems like maybe we could possibly figure something out still? Also: if this is the plan, then we'll need skopeo (at least) around anyway to do the pulling for us, so having a separate implementation of c-storage won't really gain us anything in terms of reducing on-disk size, etc...

@travier
Copy link
Collaborator

travier commented Jan 30, 2026

Also: are you still 100% sure "pull into c-storage and then copy to composefs" is the way we definitely want to do this?

Another argument for this logic is local layering. Pull the updated image in c-storage, do a local layered build with podman, import into composefs "for free".

@allisonkarlitskaya
Copy link
Collaborator

Also: are you still 100% sure "pull into c-storage and then copy to composefs" is the way we definitely want to do this?

Another argument for this logic is local layering. Pull the updated image in c-storage, do a local layered build with podman, import into composefs "for free".

Indeed. That's exactly what we do for our examples, and it'd be nice to see those be more efficient...

@cgwalters
Copy link
Collaborator Author

but I'm not sure how I feel about mass-importing a vibecoding session here...

I understand the point of what you're saying, but the definition of vibecoding is not looking at the code, and I can assure you that I have and will continue to read and sanity check all "core" code to which I attach my name. This code will be responsible for updating my operating system and those of people and customers, and so for sure it will not be "vibecoded" by the generally accepted definition of that term. "Includes substantial AI generation" - absolutely. But this took quite a bit of iteration.
This is of course an industry-wide debate, with many reasonable positions to take.

To me another key input into this is "could the person have written it themselves"? And I hope you'd agree for this the answer is yes, because I have a long track record from before LLMs even came on the scene. It's going to be a lot harder for people joining the industry now, but that's a whole other thing...


Also: are you still 100% sure "pull into c-storage and then copy to composefs" is the way we definitely want to do this?

I don't think that has to be the only way, no. I'm totally fine with an opt-in mode here that uses e.g. oci-distribution - it would be a very obvious thing to support. I just don't think we can use it in bootc at least by default.

Also: if this is the plan, then we'll need skopeo (at least) around anyway to do the pulling for us, so having a separate implementation of c-storage won't really gain us anything in terms of reducing on-disk size, etc...

I agree it makes sense to put this in skopeo (and/or podman). There's no question there and I already have some PoC patches there, but we need to come to consensus on the design (jsonrpc-fdpass for IPC + splitfdstream for tar layer interchange is again my proposal).

But that said: this whole problem has been so painful for so long and I really want to make fowrard progress - patching skopeo also implies waiting on a whole review-merge-release-ship cycle, probably fastest path a month. Not the end of the world if we agree and commit. There's a flip side though - we can make this an opt-in feature and gain more experience with it. The places that would use this in bootc at least are already both experimental (unified storage and composefs). Although we'd need to be careful to turn off this code path for bootc install and bootc switch --transport containers-storage by default if we did that.

@allisonkarlitskaya
Copy link
Collaborator

But that said: this whole problem has been so painful for so long and I really want to make fowrard progress - patching skopeo also implies waiting on a whole review-merge-release-ship cycle, probably fastest path a month.

How about incredibly evil hacks? We know the ID of the container and the ID of the layers and the names of the files in the tar archives... and you know what the layout of the repository looks like... could we just go looking for the files? skopeo would waste its time reassembling the tar archive and streaming it to us (just to have us ignore it), but it would have to do that anyway in order to hash it...

The the thing on the other side of the RPC looks more like "open this [named] file for me"...

@cgwalters
Copy link
Collaborator Author

How about incredibly evil hacks? We know the ID of the container and the ID of the layers and the names of the files in the tar archives... and you know what the layout of the repository looks like... could we just go looking for the files?

How is that different than what this code is doing? What specific portion of the code here feels riskier than re-implementing it a different way?

but it would have to do that anyway in order to hash it...

I don't think we need to validate the tar digest ("diffid") when pulling from containers-storage: by default. Should we support it? Absolutely. But there's a notable difference in efficiency. IMO where we really want to go is "composefs digest per layer as replacement for diffid" and I have some work on that.

@cgwalters
Copy link
Collaborator Author

See also #137 for existing code that understands enough of tarsplit (because it's part of zstd:chunked) to get its job done...

What's funny of course is that you and I are almost arguing opposite ends here in the two PRs of "should we reimplement here parts of container-libs" 😄

(But to clarify for me though, unified storage is WAY more important than zstd:chunked)

@allisonkarlitskaya
Copy link
Collaborator

What's funny of course is that you and I are almost arguing opposite ends here in the two PRs of "should we reimplement here parts of container-libs" 😄

I suppose I always dreamed of containers-storage backending directly on to composefs and no longer having its own repository. That would be a better future, I think we could agree...

And in that context, stuff like "ability to pull directly from the network without skopeo" starts to make a lot more sense...

@allisonkarlitskaya
Copy link
Collaborator

I suppose I always dreamed of containers-storage backending directly on to composefs.

Actually, there's a thought: we could teach skopeo a bit about how to write to composefs... the object storage part is obviously the most important bit, but it's also the easiest bit: instead of it handing us a tonne of fds that we do reflinks to we could just hand it our object directory (as an fd, or by name) and have it do that part...

@cgwalters
Copy link
Collaborator Author

I suppose I always dreamed of containers-storage backending directly on to composefs and no longer having its own repository. That would be a better future, I think we could agree...

Yes I think as part of this we must deliver sealed OCI and a revamp of the current containers-storage: composefs backend is in order IMO. However: we need a solid impl of https://github.com/containers/composefs-rs/blob/main/doc/plans/oci-sealing-spec.md
(I'll post some WIP on that soon)

Actually, there's a thought: we could teach skopeo a bit about how to write to composefs...

Yes but there's a shorter term sticking point: For bootc because we have logically bound images at the current time we must expose our images in a way that current podman (i.e. container-libs) understands.

For this reason among others I would like to stick with the plan of pulling into containers-storage first, then reflinking to composefs.

@giuseppe
Copy link
Collaborator

giuseppe commented Feb 2, 2026

Actually, there's a thought: we could teach skopeo a bit about how to write to composefs...

Yes but there's a shorter term sticking point: For bootc because we have logically bound images at the current time we must expose our images in a way that current podman (i.e. container-libs) understands.

For this reason among others I would like to stick with the plan of pulling into containers-storage first, then reflinking to composefs.

why wouldn't this work if we teach skopeo to copy from the containers-storage to composefs (of course we need to fix the part where we go through a tarstream)?

@cgwalters
Copy link
Collaborator Author

why wouldn't this work if we teach skopeo to copy from the containers-storage to composefs (of course we need to fix the part where we go through a tarstream)?

I'm obviously fine with that, however I think it's more generally useful to ensure we can reflink containers-storage:containers-storage: and ostree too - and I think the combination of splitfdstream (and jsonrpc-fdpass for cross-lang/process IPC) is a good solution, but that's what I'm looking for feedback on.

@cgwalters
Copy link
Collaborator Author

If changes on the container-libs side are going to stall for a long time, I have a preference to clean this one up, rebase etc. and get it in. We can always cut over to using container-libs/skopeo for this once support lands there!

@cgwalters cgwalters force-pushed the import-cstor-rs branch 3 times, most recently from 6c16d20 to 77beaa2 Compare February 27, 2026 18:59
Move imports only used within #[cfg(feature = "oci")] blocks behind
the feature flag to eliminate unused import warnings when building
without the oci feature.

Assisted-by: OpenCode (Claude claude-opus-4-5@20251101)
Signed-off-by: Colin Walters <walters@verbum.org>
This new internal crate provides safe, zero-copy parsing of tar archive
headers using the zerocopy crate. It supports:

- POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats
- Base-256 encoding for large values (GNU extension)
- EntryType enum for all standard tar entry types
- Checksum verification

The goal is to share code between composefs-oci and cstorage (which both
do tar header parsing), and eventually enable upstream contribution to
the tar-rs crate (ref: alexcrichton/tar-rs#392).

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Migrate composefs-oci to use the shared tar-header crate instead of
the tar crate for header parsing. The tar crate is still used for
PaxExtensions (PAX header parsing) and Builder (test utilities).

This eliminates duplicate tar header parsing logic and uses
zerocopy for safe, zero-copy access to header fields.

Assisted-by: OpenCode (Opus 4.5)
Signed-off-by: Colin Walters <walters@verbum.org>
Provide read-only access to containers-storage (the storage backend
used by podman, buildah, and other container tools) so that composefs
can import layers directly without re-downloading them.

Key components:
- Storage: discovers and opens storage locations, image/layer lookup
- Layer: overlay layer with content access via diff_dir
- Image: OCI image with manifest/config parsing
- TarSplitFdStream: tar-split metadata streaming for zero-copy import
- LockFile: wire-compatible locking with Go containers/storage
- userns helper: JSON-RPC process spawned via `podman unshare` for
  rootless access to files with restrictive permissions

Uses the tar-header crate for header parsing and cap-std for
capability-based file operations.

Adapted from cgwalters/cstor-rs.

Assisted-by: OpenCode (Claude Opus)
Signed-off-by: Colin Walters <walters@verbum.org>
Enable importing container images directly from podman/buildah's local
storage into composefs repositories. This avoids re-downloading layers
that are already present on disk, and uses FICLONE reflinks when the
filesystem supports them (btrfs, XFS) for zero-copy object storage.

The `ensure_object_from_file` method on Repository tries reflink first,
falling back to regular copy on EOPNOTSUPP/EXDEV. A `Reflinked` variant
is added to `ObjectStoreMethod` so stats can distinguish zero-copy from
copied storage.

The cstor module reads tar-split metadata from containers-storage and
streams it into splitstreams. When running rootless, a helper process
is spawned via `podman unshare` to read files with restrictive permissions.

The `pull()` function in composefs-oci now automatically routes
`containers-storage:` references to the native import path.

Usage: cfsctl oci pull containers-storage:alpine:latest

Assisted-by: OpenCode (Claude Opus)
Signed-off-by: Colin Walters <walters@verbum.org>
Add a human-readable Display impl for ImportStats that formats
object counts and byte sizes (e.g. "42 new + 100 already present
objects; 1.5 MB stored, 800 B inlined"), plus a total_objects()
convenience method. This makes it easy for consumers like bootc
to log pull statistics.

Assisted-by: OpenCode (claude-opus-4-6)
Now that bootc vendors the latest composefs-rs, we can
re-sync the CI here to re-test using the bootc tests.

Update the patch URL, add cfsctl to the patched crates, and pass the
required arguments through to bootc's test-composefs target (defaulting
to grub + ext4).

Assisted-by: OpenCode (Claude claude-opus-4-6)
Signed-off-by: Colin Walters <walters@verbum.org>
I think we were just above 1h before, let's give ourselves some
headroom.

(obviously it'd be nicer to make this faster)

Signed-off-by: Colin Walters <walters@verbum.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants