[Additional Layer Store] Use TOCDigest as ID of each layer (patch for c/image) #2416
mtrmac merged 2 commits into containers:main
Conversation
mtrmac left a comment
Thanks!
Just a very quick first skim.
(force-pushed from 76644c2 to 773cb67)
@ktock You're fighting some lint errors on this one.

Thanks for the comment. Fixed the linter error.
```go
} else if err == nil {
	s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
	s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
	d := aLayer.TOCDigest()
```
We have passed down the expected TOC digest in LookupAdditionalLayer. Would this ever return a different value? If so, what does that mean / what is the guaranteed relationship between the two TOCs?
Looking at the implementation in containerd/stargz-snapshotter#1673, this seems to always return the digest that is used in the FUSE request, i.e. it is exactly the value we passed in … unless the FUSE server is old and does not return the value at all; and in that case we don’t want to use any layers in the ALS — the FUSE server would be looking up a layer by its compressed (non-TOC) digest, not by options.TOCDigest.
#2428 modifies this to only trust options.TOCDigest (but to require the one from the ALS to match).
```go
s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
d := aLayer.TOCDigest()
if d == "" {
```
When can this happen? Wouldn’t it be appropriate to pull the layer in the ordinary way, instead of failing?
Per the other comment, this can happen if the FUSE server is old and not expecting TOCs at all.
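The behavior agreed on in these two threads (require the ALS-reported TOC digest to match the expected one, and fall back to an ordinary pull when an old FUSE server reports none) can be sketched as follows. The type and function names are illustrative stand-ins, not the real c/storage API:

```go
package main

import "fmt"

// alsLayer is a hypothetical stand-in for the c/storage AdditionalLayer
// interface; only the TOCDigest accessor matters for this sketch.
type alsLayer struct{ tocDigest string }

func (l alsLayer) TOCDigest() string { return l.tocDigest }

// useALSLayer decides whether an ALS-provided layer can be used for a blob
// whose expected TOC digest is wantTOC. It returns false (caller falls back
// to an ordinary pull) when the FUSE server is too old to report a TOC
// digest, and an error when the reported digest contradicts the expected one.
func useALSLayer(l alsLayer, wantTOC string) (bool, error) {
	got := l.TOCDigest()
	if got == "" {
		// Old FUSE server: it was looking up the layer by its compressed
		// (non-TOC) digest, so this ALS layer must not be used here.
		return false, nil
	}
	if got != wantTOC {
		return false, fmt.Errorf("ALS layer TOC digest %q does not match expected %q", got, wantTOC)
	}
	return true, nil
}

func main() {
	ok, err := useALSLayer(alsLayer{"sha256:abc"}, "sha256:abc")
	fmt.Println(ok, err) // matching digest: layer is usable

	ok, err = useALSLayer(alsLayer{""}, "sha256:abc")
	fmt.Println(ok, err) // no digest reported: fall back, no error

	_, err = useALSLayer(alsLayer{"sha256:def"}, "sha256:abc")
	fmt.Println(err != nil) // mismatch: hard error
}
```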
```go
return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
	sum = 0
```
This means that if there are 3 layers, and the middle one has an unknown size, the total returned size of the image is… some value larger than the true size, but the particular value can AFAICS not be reasonably used for anything.
Why is that better than returning -1 to indicate “unknown”, or than returning an error? (One possible answer might be “Podman callers assume this always succeeds and returns a non-(-1) value”; I didn’t check whether that’s the case. But I think, if that were true, the callers would need to be updated to deal with images where the size is unknown, instead.)
#2428 modifies this to treat the unknown-size layer as zero-size, while returning the size of the other components.
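The #2428 approach described above can be sketched with a simplified size computation; the type here is a stand-in, not the actual c/storage metadata:

```go
package main

import "fmt"

// layerInfo is a simplified stand-in for c/storage layer metadata.
type layerInfo struct {
	UncompressedSize int64 // -1 means unknown
}

// imageSize sums layer sizes. Following the approach described for #2428,
// a layer with an unknown size contributes zero, so the sizes of the other
// components are still reported instead of failing the whole computation.
func imageSize(layers []layerInfo) int64 {
	var sum int64
	for _, l := range layers {
		if l.UncompressedSize < 0 {
			continue // unknown size: treat as zero
		}
		sum += l.UncompressedSize
	}
	return sum
}

func main() {
	layers := []layerInfo{{100}, {-1}, {50}}
	fmt.Println(imageSize(layers)) // 150: the unknown middle layer counts as zero
}
```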
```go
if (layer.TOCDigest == "" && layer.UncompressedDigest == "") || (layer.TOCDigest == "" && layer.UncompressedSize < 0) {
	return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
```
If a TOC-identified layer can have UncompressedSize == -1, that breaks more assumptions. At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.
… but, also, note to self: is this a new behavior of the AdditionalLayerStore code? We did previously assume that the blobDigest-identified ALS layers did have a valid UncompressedSize. Was that always broken?
The commit says
Layers from an additional layer store may not know the uncompressed size until the entire contents are fully downloaded to the node.
but we collect all metadata at the time of layer creation (layerStore.PutAdditionalLayer reads AdditionalLayer.Info for that), and don’t update it afterwards. Does that mean that the UncompressedSize is typically not available?
Actually, in https://github.com/containerd/stargz-snapshotter/pull/1673/files , it is now reported always as -1.
At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.
That’s actually fine because that is already conditional on UncompressedDigest != "", and that’s false for recent ALS layers.
… and I don’t know whether UncompressedSize was always returned with the older version of the FUSE server, but that’s not a concern going forward, anyway.
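The guard being discussed here can be illustrated with a minimal sketch. The field names mirror the discussion, but this is not the actual tryReusingBlobAsPending code:

```go
package main

import "fmt"

// layerMeta is a simplified stand-in for c/storage layer metadata.
type layerMeta struct {
	TOCDigest          string
	UncompressedDigest string
	UncompressedSize   int64 // -1 means unknown
}

// canReuseByDiffID mirrors the guard discussed above: the size is only
// consulted when the uncompressed digest is known. TOC-identified ALS
// layers (UncompressedDigest == "", UncompressedSize == -1) therefore
// never reach the code that assumes a valid size.
func canReuseByDiffID(l layerMeta) bool {
	return l.UncompressedDigest != "" && l.UncompressedSize >= 0
}

func main() {
	alsLayer := layerMeta{TOCDigest: "sha256:toc", UncompressedSize: -1}
	ordinary := layerMeta{UncompressedDigest: "sha256:diff", UncompressedSize: 1024}
	fmt.Println(canReuseByDiffID(alsLayer), canReuseByDiffID(ordinary)) // false true
}
```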
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Layers from an additional layer store may not know the uncompressed size until the entire contents are fully downloaded to the node. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
(force-pushed from a1e2fff to 52101a0)
mtrmac left a comment
Merging now because otherwise c/image does not compile after the c/storage API change; and because, as of today, containerd/stargz-snapshotter#1673 is not merged, so we actually never create ALS layers after merging this, and most of the concerns become hypothetical (in the short term).
I’ll immediately follow up with a PR that mitigates some of the concerns, or at least adds explicit FIXMEs. That’s not to preclude any other work on improving those aspects, just to get us back to a production-ready state.
@ktock I've looked at several pull requests related to this change — what's the reasoning behind using the TOCDigest instead of the blob digest? Previously, with the blob digest, I could use images in the non-seekable, standard tar+gzip format, with an additional layer store backed by an NFS. But now that's not the case: I'm forced to store my images in either the stargz or zstd:chunked format.
Layer IDs are used for deduplication, so it must not be possible for a layer with the same “ID” to be consumed in two different ways. For zstd:chunked, the blob (identified with the full digest) can, in principle, be consumed either as a non-chunked tarball, or as a TOC-based chunked object, and (in the case of ALS) there is ~nothing we can do to enforce that the two are exactly the same filesystem (other than reading all files and validating against a DiffID: we do that on various code paths, but it would entirely defeat the on-demand nature of ALS). So, for ALS, the ID must match the data one actually used to fetch the contents. For zstd:chunked on-demand access, that’s the TOC; the ALS backend is then trusted not to have a code path consuming the full tarball and ignoring the TOC.

I’m afraid the ALS change is an incompatible, and barely-announced, change. Due to the fairly weird way ALS uses a FUSE filesystem basically as an RPC API, the features of that “API mechanism” are rather limited, and doing things properly over that API is… complex.
… and, to finish the thought, that makes it very unattractive to me to consider just reverting the behavior, in an invisible-incompatible way, again. If we were to change this, or to add any substantial features to ALS, I’d personally want to see a more extensible/maintainable/standard RPC mechanism — but, also, it’s not something I can spend significant time on.
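The deduplication invariant described above can be illustrated with a hypothetical ID-derivation function; this is not the real c/storage logic, only a sketch of the constraint that the ID must come from whichever digest was actually used to fetch the contents:

```go
package main

import "fmt"

// layerID derives a storage ID from the digest that was actually used to
// obtain the layer contents. Hypothetical sketch: the real c/storage logic
// is more involved, but the invariant is the same — two different ways of
// consuming the same blob must never map to the same ID.
func layerID(tocDigest, blobDigest string, consumedViaTOC bool) string {
	if consumedViaTOC {
		return "toc:" + tocDigest
	}
	return "blob:" + blobDigest
}

func main() {
	// The same zstd:chunked blob, consumed two different ways, gets two
	// distinct IDs, so an on-demand TOC view can never be silently reused
	// where the fully-validated tarball contents are expected.
	fmt.Println(layerID("sha256:toc", "sha256:blob", true))
	fmt.Println(layerID("sha256:toc", "sha256:blob", false))
}
```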
Would it not be possible to pass the ALS arg/config in a way that tells whether the ALS should be used as a non-chunked layer or not? Then you could use the blob digest or the TOC checksum based on that arg/config, instead of the behaviour being inferred from the image compression. The current implementation ties the feature to a specific implementation of the ALS, one which assumes a layer store can only be used with seekable images, which should not be the case, at least not intuitively.
From a user perspective (or at least my perspective), it seems natural that if we specify a location as an additional layer store, podman uses this location to source the layers, regardless of the compression format in the remote registry. I imagine that the base case is to simply have the layer expanded in a filesystem, for example to store commonly used layers in a remote filesystem. The fact that we can use a FUSE mount to perform lazy loading directly from a remote registry is just an extension of that primary use case, in my view.

@mtrmac Would you be open to a PR that restores backwards compatibility with the previous behavior? E.g. if a layer isn't compressed with a TOC, then fall back to looking up the store by the layer digest. Or alternatively, as @chvish is proposing, a way to provide optionality on how we intend to use the layer store.
I am trying to hold to a policy that I will not be reviewing PRs that add extra options and code paths to drivers/overlay, because I just can’t deal with the complexity. I realize that’s not great after this ABI break; anyway, there are other maintainers of the project who could review PRs.

Given that the layer ID must not cause collisions, I’m skeptical this can be done securely, at least given the existing API between c/storage and c/image. But I didn’t look into this in detail.
Needs: containers/storage#1924
This commit changes the store implementation to use TOCDigest as the ID of each layer. This ensures the runtime passes the TOCDigest to the FUSE backend every time it requires a layer.

This commit modifies c/storage's LookupAdditionalLayer to receive a TOCDigest as one of the arguments. That TOCDigest is then passed to the Additional Layer Store as one of the path elements of the FUSE fs, under which the extracted view of the layer's diff contents is provided. The Additional Layer Store implementation attempts lazy pulling of the layer that has the specified TOCDigest in the image.

If the image doesn't contain a layer matching the TOCDigest, accessing the layer results in an error.
Example draft implementation of Additional Layer Store is at containerd/stargz-snapshotter#1673
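The lookup flow described in this PR can be sketched as follows, assuming a simplified stand-in for the real LookupAdditionalLayer signature and an illustrative store path:

```go
package main

import (
	"errors"
	"fmt"
)

// store is a minimal stand-in for the c/storage API after this change; the
// real LookupAdditionalLayer signature and path layout differ — this only
// sketches the flow: look up by TOC digest, error out when the ALS does not
// provide a matching layer.
type store struct {
	// layers available in the Additional Layer Store, keyed by TOC digest;
	// the values are illustrative FUSE paths.
	byTOC map[string]string
}

var errLayerUnknown = errors.New("additional layer not found")

func (s *store) LookupAdditionalLayer(tocDigest string) (string, error) {
	if path, ok := s.byTOC[tocDigest]; ok {
		return path, nil
	}
	// The image doesn't contain a layer matching the TOCDigest: accessing
	// it results in an error, and the caller falls back to an ordinary pull.
	return "", errLayerUnknown
}

func main() {
	s := &store{byTOC: map[string]string{"sha256:toc1": "/store/sha256:toc1/diff"}}
	if p, err := s.LookupAdditionalLayer("sha256:toc1"); err == nil {
		fmt.Println("lazy-pull from", p)
	}
	if _, err := s.LookupAdditionalLayer("sha256:missing"); err != nil {
		fmt.Println("fallback: ordinary pull")
	}
}
```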