Skip to content

[Additional Layer Store] Use TOCDigest as ID of each layer (patch for c/image)#2416

Merged
mtrmac merged 2 commits intocontainers:mainfrom
ktock:store-tocdigest-id
May 20, 2024
Merged

[Additional Layer Store] Use TOCDigest as ID of each layer (patch for c/image)#2416
mtrmac merged 2 commits intocontainers:mainfrom
ktock:store-tocdigest-id

Conversation

@ktock
Copy link
Copy Markdown
Contributor

@ktock ktock commented May 14, 2024

Needs: containers/storage#1924

This commit changes the store implementation to use TOCDigset as the ID of each layer. This ensures the runtime to pass TOCDigest to the FUSE backend everytime requiring a layer.

This commit modifies c/storage's LookupAdditionalLayer to receive a TOCDigest as one of the arguments. Then that TOCDigest is passed to Additional Layer Store as one of the path elements of the FUSE fs. For example, the extracted view of a layer diff contents is provided in the following path:

 <mountpoint>/base64(imageref)/<TOCDigest>/diff

Additional Layer Store implementation attempts lazy pulling of the layer that has the specified TOCDigest in the image.
If that image doesn't contain the layer matching to the TOCDigest, accessing to the layer directly results in an error.

Example draft implementation of Additional Layer Store is at containerd/stargz-snapshotter#1673

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Just a very quick first skim.

@ktock ktock force-pushed the store-tocdigest-id branch 2 times, most recently from 76644c2 to 773cb67 Compare May 15, 2024 01:06
@TomSweeneyRedHat
Copy link
Copy Markdown
Member

@ktock You're fighting some lint errors on this one.

@ktock ktock force-pushed the store-tocdigest-id branch from 773cb67 to a1e2fff Compare May 16, 2024 01:09
@ktock
Copy link
Copy Markdown
Contributor Author

ktock commented May 16, 2024

Thanks for the comment. Fixed the linter error.

} else if err == nil {
s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
d := aLayer.TOCDigest()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have passed down the expected TOC digest in LookupAdditionalLayer. Would this ever return a different value? If so, what does that mean / what is the guaranteed relationship between the two TOCs?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the implementation in containerd/stargz-snapshotter#1673 , this seems to always return the digest that is used in the FUSE request, i.e. it is exactly the value we passed in … unless the FUSE server is old and does not return the value at all; and in that case we don’t want to use any layers in the ALS — the FUSE server would be looking up a layer with compressed (non-TOC) digest options.TOCDigest.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2428 modifies this to only trust options.TOCDigest (but to require the one from the ALS to match).

s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
d := aLayer.TOCDigest()
if d == "" {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen? Wouldn’t it be appropriate to pull the layer in the ordinary way, instead of failing?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the other comment, this can happen if the FUSE server is old and not expecting TOCs at all.

return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
sum = 0
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that if there 3 layers, and the middle one has an unknown size, the total returned size of the image is… some value larger than the true size, but the particular value can AFAICS not be reasonably used for anything.

Why is that better than returning -1 to indicate “unknown”, or than returning an error? (One possible answer might be “Podman callers assume this always succeeds and returns a non--1 value, I didn’t check whether that’s the case. But I think, if that were true, the callers would need to be updated to deal with images where the size is unknown, instead.)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2428 modifies this to treat the unknown-size layer as zero-size, while returning the size of the other components.

if (layer.TOCDigest == "" && layer.UncompressedDigest == "") || (layer.TOCDigest == "" && layer.UncompressedSize < 0) {
return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a TOC-identified layer can have UncompressedSize == -1, that breaks more assumptions. At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.

… but, also, note to self: is a new behavior of the AdditionalLayerStore code? We did previously assume that the blobDigest-identified ALS layers did have a valid UncompressedSize. Was that always broken?

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit says

Layers from additional layer store possibly doesn't know the uncompressed size until it fully downloads the entire contents to the node.

but we collect all metadata at the time of layer creation (layerStore.PutAdditionalLayer reads AdditionalLayer.Info for that), and don’t update it afterwards. Does that mean that the UncompressedSize is typically not available?


Actually, in https://github.com/containerd/stargz-snapshotter/pull/1673/files , it is now reported always as -1.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.

That’s actually fine because that is already conditional on UncompressedDigest != "", and that’s false for recent ALS layers.

Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac May 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and I don’t know whether UncompressedSize was always returned with the older version of the FUSE server, but that’s not a concern going forward, anyway.

ktock added 2 commits May 20, 2024 17:28
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Layers from additional layer store possibly doesn't know the uncompressed size
until it fully downloads the entire contents to the node.

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
@mtrmac mtrmac force-pushed the store-tocdigest-id branch from a1e2fff to 52101a0 Compare May 20, 2024 15:28
Copy link
Copy Markdown
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging now because otherwise c/image does not compile after the c/storage API change; and because, as of today, containerd/stargz-snapshotter#1673 is not merged, so we actually never create ALS layers after merging this, and most of the concerns become hypothetical (in the short term).

I’ll immediately follow up with a PR that mitigates some of the concerns, or at least adds explicit FIXMEs. That’s not to preclude any other work on improving those aspects, just to get us back to a production-ready state.

@mtrmac mtrmac merged commit cf26b3c into containers:main May 20, 2024
@ktock ktock deleted the store-tocdigest-id branch May 20, 2024 23:46
@influentcoder
Copy link
Copy Markdown

@ktock I've looked at several pull requests related to this change - what's the reasoning behind using the TOCDigest instead of the blob digest? Previously, with the blob digest, I could use images with the non-seekable, standard tar+gzip format, and use an additional layer stored backed by a NFS. But now that's not the case, I'm forced to store my images in either stargz or sztd:chunked format

@mtrmac
Copy link
Copy Markdown
Collaborator

mtrmac commented Oct 22, 2025

Layer IDs are used for deduplication. So, it must not be possible for a layer with the same “ID” to be consumed in two different ways.

For zstd:chunked, the blob (identified with the full digest) can, in principle, be consumed either as a non-chunked tarball, or as a TOC-based chunked object, and (in the case of ALS) there is ~nothing we can do to enforce the two are exactly the same filesystem (other than reading all files and validating against a DiffID: we do that on various code paths, but it would entirely defeat the on-demand nature of ALS).

So, for ALS, the ID must match the data one actually used to fetch the contents. For zstd:chunked on-demand access, that’s the TOC; the ALS backend is then trusted to not have a code path consuming the full tarball and ignoring the TOC.

I’m afraid that ALS change an incompatible, and barely-announced change. Due to the fairly weird way ALS use a FUSE filesystem basically as a RPC API, the features of that “API mechanism” are rather limited, and doing things properly over that API is… complex.

@mtrmac
Copy link
Copy Markdown
Collaborator

mtrmac commented Oct 22, 2025

… and, to finish the thought, that makes it very unattractive to me to consider just reverting the behavior, in an invisible-incompatible way, again. If we were to change this, or to add any substantial features to ALS, I’d personally want to see a more extensible/maintainable/standard RPC mechanism — but, also, it’s not something I can spend significant time on.

@chvish
Copy link
Copy Markdown

chvish commented Oct 23, 2025

Would it not be possible to pass the ALS arg/config in a way that you can tell if you want the ALS to be used as either a non-chunked layer or not. Then you can use the blob digest or TOC checksum based on what the arg/config is, instead of the behaviour being inferred based on the image compression.

The current implementation means that the feature is ties to specific implementation of the ALS, one which assumes a layer store can only be used with seekable images, which should not be the case, atleast not intuitively.

@influentcoder
Copy link
Copy Markdown

From a user perspective (or at least my perspective), it seems natural that if we specify a location as an additional layer store, that podman uses this location to source the layers, regardless of the compression format in the remote registry.

I imagine that the base case is to have simply the layer expanded in a filesystem. For example to store commonly used layers in a remote filesystem. The fact that we can use a fuse mount to perform lazy loading directly from a remote registry is just an extension of that primary use case in my view.

@mtrmac Would you be open if I submit a PR that brings backwards compatibility to the previous behavior? E.g. if a layer isn't compressed with a TOC, then fallback to lookup to the store based on the layer digest.

Or alternatively, as @chvish is proposing, a way to provide provide optionality on how we intend to use the layer store

@mtrmac
Copy link
Copy Markdown
Collaborator

mtrmac commented Oct 23, 2025

I am trying to hold to a policy that I will not be reviewing PRs that add extra options and code paths to drivers/overlay because I just can’t deal with the complexity.

I realize that’s not great after this ABI break, and anyway there are other maintainers of the project who could review PRs.

E.g. if a layer isn't compressed with a TOC, then fallback to lookup to the store based on the layer digest.

Given that the layer ID must not cause collisions, I’m skeptical this can be done securely, at least given the existing API between c/storage and c/image. But I didn’t look into this in detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants