Import cstor-rs, use for reflinked imports from containers-storage#218
Import cstor-rs, use for reflinked imports from containers-storage#218cgwalters wants to merge 8 commits intocomposefs:mainfrom
Conversation
ed16d50 to
6df2401
Compare
cgwalters
left a comment
There was a problem hiding this comment.
Doing a review here so it's more visible
|
I've not looked into details yet, but are we comfortable with the risks of these intrusive handling of the storage library? The storage library doesn't have any stability guarantees and can change at any time |
Is there anything fundamentally different from the storage access in bootc-dev/bootc#1962 though? As far as the "can change at any time" - in reality we have pretty strong constraints there on any truly backwards-incompatible changes right? Because even though there's one Go library for it, there's multiple tools using that library which may not always be upgraded at the same time and I definitely remember an incident with cri-o + podman on different schedules. I think if we did any future breaking changes, we'd just need to remember to also update this library right? We can enforce that with CI. |
6df2401 to
455891a
Compare
No, but I've added that only for the PoC, I think that logic belongs to container-libs.
Sure, but testing with the previous version of the container tools will be caught earlier (and hopefully block a release), while it could be too late for a separate implementation as the one proposed here. As long as we are OK with the potential risks, I am OK with this. It is better to get unblocked and make some progress instead of waiting for the "perfect" solution. |
Yes, agreed. I think we can pursue a proper API for this (and I like the jsonrpc-fdpass + splitfdstream, but obviously up for debate) That said, there's an entirely different path to pursue here, which is a Go implementation of what's in composefs-rs (this repo) (or redoing the containers-storage-composefs backend on IPC to |
|
So from what I can tell, the main thrust here is to get the reflinking thing between c-storage and composefs-rs working, right? I'd have a really strong preference here to add a new API to the existing skopeo proxy and containers-image-store-rs. There are already fd-passing APIs there: we'd just need a lot more fds... @giuseppe's argument about having a parallel implementation has traction with me. I'm not sure that I'd prefer a vibecoded containers-storage reimplementation, even if it is in Rust, compared to the real thing, specifically because we're gonna RPC to it anyway... See also #137 for existing code that understands enough of tarsplit (because it's part of zstd:chunked) to get its job done... I know you're trying to reduce CI pain here by decreasing the size of the stack of modules we have, but I'm not sure how I feel about mass-importing a vibecoding session here... the fact that most of that is in a separate crate helps a bit, of course. Also: are you still 100% sure "pull into c-storage and then copy to composefs" is the way we definitely want to do this? I know the arguments about exotic authentication mechanisms, but skopeo would cover us there. I also know the arguments about zstd:chunked, but we have our own support in #137. I also know the arguments about combining zstd:chunked with the exotic authentication mechanisms thing and that's where I have a bit of sympathy for this approach, but ... it seems like maybe we could possibly figure something out still? Also: if this is the plan, then we'll need skopeo (at least) around anyway to do the pulling for us, so having a separate implementation of c-storage won't really gain us anything in terms of reducing on-disk size, etc... |
Another argument for this logic is local layering. Pull the updated image in c-storage, do a local layered build with podman, import into composefs "for free". |
Indeed. That's exactly what we do for our examples, and it'd be nice to see those be more efficient... |
I understand the point of what you're saying, but the definition of vibecoding is not looking at the code, and I can assure you that I have and will continue to read and sanity check all "core" code to which I attach my name. This code will be responsible for updating my operating system and those of people and customers, and so for sure it will not be "vibecoded" by the generally accepted definition of that term. "Includes substantial AI generation" - absolutely. But this took quite a bit of iteration. To me another key input into this is "could the person have written it themselves"? And I hope you'd agree for this the answer is yes, because I have a long track record from before LLMs even came on the scene. It's going to be a lot harder for people joining the industry now, but that's a whole other thing...
I don't think that has to be the only way, no. I'm totally fine with an opt-in mode here that uses e.g. oci-distribution - it would be a very obvious thing to support. I just don't think we can use it in bootc at least by default.
I agree it makes sense to put this in skopeo (and/or podman). There's no question there and I already have some PoC patches there, but we need to come to consensus on the design (jsonrpc-fdpass for IPC + splitfdstream for tar layer interchange is again my proposal). But that said: this whole problem has been so painful for so long and I really want to make fowrard progress - patching skopeo also implies waiting on a whole review-merge-release-ship cycle, probably fastest path a month. Not the end of the world if we agree and commit. There's a flip side though - we can make this an opt-in feature and gain more experience with it. The places that would use this in bootc at least are already both experimental (unified storage and composefs). Although we'd need to be careful to turn off this code path for |
b9fa9db to
3c7573b
Compare
How about incredibly evil hacks? We know the ID of the container and the ID of the layers and the names of the files in the tar archives... and you know what the layout of the repository looks like... could we just go looking for the files? skopeo would waste its time reassembling the tar archive and streaming it to us (just to have us ignore it), but it would have to do that anyway in order to hash it... The the thing on the other side of the RPC looks more like "open this [named] file for me"... |
How is that different than what this code is doing? What specific portion of the code here feels riskier than re-implementing it a different way?
I don't think we need to validate the tar digest ("diffid") when pulling from containers-storage: by default. Should we support it? Absolutely. But there's a notable difference in efficiency. IMO where we really want to go is "composefs digest per layer as replacement for diffid" and I have some work on that. |
What's funny of course is that you and I are almost arguing opposite ends here in the two PRs of "should we reimplement here parts of container-libs" 😄 (But to clarify for me though, unified storage is WAY more important than zstd:chunked) |
I suppose I always dreamed of containers-storage backending directly on to composefs and no longer having its own repository. That would be a better future, I think we could agree... And in that context, stuff like "ability to pull directly from the network without skopeo" starts to make a lot more sense... |
Actually, there's a thought: we could teach skopeo a bit about how to write to composefs... the object storage part is obviously the most important bit, but it's also the easiest bit: instead of it handing us a tonne of fds that we do reflinks to we could just hand it our object directory (as an fd, or by name) and have it do that part... |
Yes I think as part of this we must deliver sealed OCI and a revamp of the current containers-storage: composefs backend is in order IMO. However: we need a solid impl of https://github.com/containers/composefs-rs/blob/main/doc/plans/oci-sealing-spec.md
Yes but there's a shorter term sticking point: For bootc because we have logically bound images at the current time we must expose our images in a way that current podman (i.e. container-libs) understands. For this reason among others I would like to stick with the plan of pulling into containers-storage first, then reflinking to composefs. |
3c7573b to
6c16d20
Compare
why wouldn't this work if we teach skopeo to copy from the containers-storage to composefs (of course we need to fix the part where we go through a tarstream)? |
I'm obviously fine with that, however I think it's more generally useful to ensure we can reflink |
|
If changes on the container-libs side are going to stall for a long time, I have a preference to clean this one up, rebase etc. and get it in. We can always cut over to using container-libs/skopeo for this once support lands there! |
6c16d20 to
77beaa2
Compare
14b5355 to
10f9af9
Compare
7b2ae45 to
3dbcbb4
Compare
Move imports only used within #[cfg(feature = "oci")] blocks behind the feature flag to eliminate unused import warnings when building without the oci feature. Assisted-by: OpenCode (Claude claude-opus-4-5@20251101) Signed-off-by: Colin Walters <walters@verbum.org>
This new internal crate provides safe, zero-copy parsing of tar archive headers using the zerocopy crate. It supports: - POSIX.1-1988, UStar (POSIX.1-2001), and GNU tar header formats - Base-256 encoding for large values (GNU extension) - EntryType enum for all standard tar entry types - Checksum verification The goal is to share code between composefs-oci and cstorage (which both do tar header parsing), and eventually enable upstream contribution to the tar-rs crate (ref: alexcrichton/tar-rs#392). Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
Migrate composefs-oci to use the shared tar-header crate instead of the tar crate for header parsing. The tar crate is still used for PaxExtensions (PAX header parsing) and Builder (test utilities). This eliminates duplicate tar header parsing logic and uses zerocopy for safe, zero-copy access to header fields. Assisted-by: OpenCode (Opus 4.5) Signed-off-by: Colin Walters <walters@verbum.org>
Provide read-only access to containers-storage (the storage backend used by podman, buildah, and other container tools) so that composefs can import layers directly without re-downloading them. Key components: - Storage: discovers and opens storage locations, image/layer lookup - Layer: overlay layer with content access via diff_dir - Image: OCI image with manifest/config parsing - TarSplitFdStream: tar-split metadata streaming for zero-copy import - LockFile: wire-compatible locking with Go containers/storage - userns helper: JSON-RPC process spawned via `podman unshare` for rootless access to files with restrictive permissions Uses the tar-header crate for header parsing and cap-std for capability-based file operations. Adapted from cgwalters/cstor-rs. Assisted-by: OpenCode (Claude Opus) Signed-off-by: Colin Walters <walters@verbum.org>
Enable importing container images directly from podman/buildah's local storage into composefs repositories. This avoids re-downloading layers that are already present on disk, and uses FICLONE reflinks when the filesystem supports them (btrfs, XFS) for zero-copy object storage. The `ensure_object_from_file` method on Repository tries reflink first, falling back to regular copy on EOPNOTSUPP/EXDEV. A `Reflinked` variant is added to `ObjectStoreMethod` so stats can distinguish zero-copy from copied storage. The cstor module reads tar-split metadata from containers-storage and streams it into splitstreams. When running rootless, a helper process is spawned via `podman unshare` to read files with restrictive permissions. The `pull()` function in composefs-oci now automatically routes `containers-storage:` references to the native import path. Usage: cfsctl oci pull containers-storage:alpine:latest Assisted-by: OpenCode (Claude Opus) Signed-off-by: Colin Walters <walters@verbum.org>
Add a human-readable Display impl for ImportStats that formats object counts and byte sizes (e.g. "42 new + 100 already present objects; 1.5 MB stored, 800 B inlined"), plus a total_objects() convenience method. This makes it easy for consumers like bootc to log pull statistics. Assisted-by: OpenCode (claude-opus-4-6)
Now that bootc vendors the latest composefs-rs, we can re-sync the CI here to re-test using the bootc tests. Update the patch URL, add cfsctl to the patched crates, and pass the required arguments through to bootc's test-composefs target (defaulting to grub + ext4). Assisted-by: OpenCode (Claude claude-opus-4-6) Signed-off-by: Colin Walters <walters@verbum.org>
I think we were just above 1h before, let's give ourselves some headroom. (obviously it'd be nicer to make this faster) Signed-off-by: Colin Walters <walters@verbum.org>
3dbcbb4 to
5e35d0f
Compare
Obviously massively
Assisted-by: OpenCode (Opus 4.5)here; I did a lot of design work of course and but this is looking pretty nice to me, and it seems to work well!