[Additional Layer Store] Use TOCDigest as ID of each layer (patch for c/image) #2416
mtrmac merged 2 commits into containers:main
Conversation
mtrmac left a comment
Thanks!
Just a very quick first skim.
(force-pushed from 76644c2 to 773cb67)
@ktock You're fighting some lint errors on this one.

Thanks for the comment. Fixed the linter error.
```go
} else if err == nil {
	s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
	s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
	d := aLayer.TOCDigest()
```
We have passed down the expected TOC digest in LookupAdditionalLayer. Would this ever return a different value? If so, what does that mean / what is the guaranteed relationship between the two TOCs?
Looking at the implementation in containerd/stargz-snapshotter#1673, this seems to always return the digest that is used in the FUSE request, i.e. it is exactly the value we passed in … unless the FUSE server is old and does not return the value at all; and in that case we don’t want to use any layers in the ALS — the FUSE server would be looking up a layer by its compressed (non-TOC) digest, not by options.TOCDigest.
#2428 modifies this to only trust options.TOCDigest (but to require the one from the ALS to match).
```go
s.lockProtected.blobDiffIDs[blobDigest] = aLayer.UncompressedDigest()
s.lockProtected.blobAdditionalLayer[blobDigest] = aLayer
d := aLayer.TOCDigest()
if d == "" {
```
When can this happen? Wouldn’t it be appropriate to pull the layer in the ordinary way, instead of failing?
Per the other comment, this can happen if the FUSE server is old and not expecting TOCs at all.
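The behavior agreed on in these two threads (require the ALS-reported TOC digest to match the expected one, and fall back to an ordinary pull when an old FUSE server reports none) can be sketched as follows. The type and function names are illustrative stand-ins, not the real c/storage API:

```go
package main

import "fmt"

// alsLayer is a hypothetical stand-in for the c/storage AdditionalLayer
// interface; only the TOCDigest accessor matters for this sketch.
type alsLayer struct{ tocDigest string }

func (l alsLayer) TOCDigest() string { return l.tocDigest }

// useALSLayer decides whether an ALS-provided layer can be used for a blob
// whose expected TOC digest is wantTOC. It returns false (caller falls back
// to an ordinary pull) when the FUSE server is too old to report a TOC
// digest, and an error when the reported digest contradicts the expected one.
func useALSLayer(l alsLayer, wantTOC string) (bool, error) {
	got := l.TOCDigest()
	if got == "" {
		// Old FUSE server: it was looking up the layer by its compressed
		// (non-TOC) digest, so this ALS layer must not be used here.
		return false, nil
	}
	if got != wantTOC {
		return false, fmt.Errorf("ALS layer TOC digest %q does not match expected %q", got, wantTOC)
	}
	return true, nil
}

func main() {
	ok, err := useALSLayer(alsLayer{"sha256:abc"}, "sha256:abc")
	fmt.Println(ok, err) // matching digest: layer is usable

	ok, err = useALSLayer(alsLayer{""}, "sha256:abc")
	fmt.Println(ok, err) // no digest reported: fall back, no error

	_, err = useALSLayer(alsLayer{"sha256:def"}, "sha256:abc")
	fmt.Println(err != nil) // mismatch: hard error
}
```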
```go
return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
	sum = 0
```
This means that if there are 3 layers, and the middle one has an unknown size, the total returned size of the image is… some value larger than the true size, but the particular value can AFAICS not be reasonably used for anything.
Why is that better than returning -1 to indicate “unknown”, or than returning an error? (One possible answer might be “Podman callers assume this always succeeds and returns a non-(-1) value”; I didn’t check whether that’s the case. But I think, if that were true, the callers would need to be updated to deal with images where the size is unknown, instead.)
#2428 modifies this to treat the unknown-size layer as zero-size, while returning the size of the other components.
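The #2428 approach described above can be sketched with a simplified size computation; the type here is a stand-in, not the actual c/storage metadata:

```go
package main

import "fmt"

// layerInfo is a simplified stand-in for c/storage layer metadata.
type layerInfo struct {
	UncompressedSize int64 // -1 means unknown
}

// imageSize sums layer sizes. Following the approach described for #2428,
// a layer with an unknown size contributes zero, so the sizes of the other
// components are still reported instead of failing the whole computation.
func imageSize(layers []layerInfo) int64 {
	var sum int64
	for _, l := range layers {
		if l.UncompressedSize < 0 {
			continue // unknown size: treat as zero
		}
		sum += l.UncompressedSize
	}
	return sum
}

func main() {
	layers := []layerInfo{{100}, {-1}, {50}}
	fmt.Println(imageSize(layers)) // 150: the unknown middle layer counts as zero
}
```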
```go
if (layer.TOCDigest == "" && layer.UncompressedDigest == "") || (layer.TOCDigest == "" && layer.UncompressedSize < 0) {
	return -1, fmt.Errorf("size for layer %q is unknown, failing getSize()", layerID)
}
if layer.UncompressedSize < 0 {
```
If a TOC-identified layer can have UncompressedSize == -1, that breaks more assumptions. At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.
… but, also, note to self: is this a new behavior of the AdditionalLayerStore code? We did previously assume that the blobDigest-identified ALS layers did have a valid UncompressedSize. Was that always broken?
The commit says
Layers from an additional layer store may not know the uncompressed size until the entire contents are fully downloaded to the node.
but we collect all metadata at the time of layer creation (layerStore.PutAdditionalLayer reads AdditionalLayer.Info for that), and don’t update it afterwards. Does that mean that the UncompressedSize is typically not available?
Actually, in https://github.com/containerd/stargz-snapshotter/pull/1673/files , it is now reported always as -1.
At least the LayersByTOCDigest path in tryReusingBlobAsPending is using the UncompressedSize value, and the value can’t be “unknown” on that path.
That’s actually fine because that is already conditional on UncompressedDigest != "", and that’s false for recent ALS layers.
… and I don’t know whether UncompressedSize was always returned with the older version of the FUSE server, but that’s not a concern going forward, anyway.
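The guard being discussed here can be illustrated with a minimal sketch. The field names mirror the discussion, but this is not the actual tryReusingBlobAsPending code:

```go
package main

import "fmt"

// layerMeta is a simplified stand-in for c/storage layer metadata.
type layerMeta struct {
	TOCDigest          string
	UncompressedDigest string
	UncompressedSize   int64 // -1 means unknown
}

// canReuseByDiffID mirrors the guard discussed above: the size is only
// consulted when the uncompressed digest is known. TOC-identified ALS
// layers (UncompressedDigest == "", UncompressedSize == -1) therefore
// never reach the code that assumes a valid size.
func canReuseByDiffID(l layerMeta) bool {
	return l.UncompressedDigest != "" && l.UncompressedSize >= 0
}

func main() {
	alsLayer := layerMeta{TOCDigest: "sha256:toc", UncompressedSize: -1}
	ordinary := layerMeta{UncompressedDigest: "sha256:diff", UncompressedSize: 1024}
	fmt.Println(canReuseByDiffID(alsLayer), canReuseByDiffID(ordinary)) // false true
}
```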
Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
Layers from an additional layer store may not know the uncompressed size until the entire contents are fully downloaded to the node. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
(force-pushed from a1e2fff to 52101a0)
mtrmac left a comment
Merging now because otherwise c/image does not compile after the c/storage API change; and because, as of today, containerd/stargz-snapshotter#1673 is not merged, so we actually never create ALS layers after merging this, and most of the concerns become hypothetical (in the short term).
I’ll immediately follow up with a PR that mitigates some of the concerns, or at least adds explicit FIXMEs. That’s not to preclude any other work on improving those aspects, just to get us back to a production-ready state.
@ktock I've looked at several pull requests related to this change — what's the reasoning behind using the TOCDigest instead of the blob digest? Previously, with the blob digest, I could use images in the non-seekable, standard tar+gzip format, with an additional layer store backed by an NFS. But now that's not the case: I'm forced to store my images in either the stargz or zstd:chunked format.
Layer IDs are used for deduplication, so it must not be possible for a layer with the same “ID” to be consumed in two different ways. For zstd:chunked, the blob (identified with the full digest) can, in principle, be consumed either as a non-chunked tarball, or as a TOC-based chunked object, and (in the case of ALS) there is ~nothing we can do to enforce that the two are exactly the same filesystem (other than reading all files and validating against a DiffID: we do that on various code paths, but it would entirely defeat the on-demand nature of ALS). So, for ALS, the ID must match the data one actually used to fetch the contents. For zstd:chunked on-demand access, that’s the TOC; the ALS backend is then trusted not to have a code path consuming the full tarball and ignoring the TOC.

I’m afraid the ALS change is an incompatible, and barely-announced, change. Due to the fairly weird way ALS uses a FUSE filesystem basically as an RPC API, the features of that “API mechanism” are rather limited, and doing things properly over that API is… complex.
… and, to finish the thought, that makes it very unattractive to me to consider just reverting the behavior, in an invisible-incompatible way, again. If we were to change this, or to add any substantial features to ALS, I’d personally want to see a more extensible/maintainable/standard RPC mechanism — but, also, it’s not something I can spend significant time on.
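The deduplication invariant described above can be illustrated with a hypothetical ID-derivation function; this is not the real c/storage logic, only a sketch of the constraint that the ID must come from whichever digest was actually used to fetch the contents:

```go
package main

import "fmt"

// layerID derives a storage ID from the digest that was actually used to
// obtain the layer contents. Hypothetical sketch: the real c/storage logic
// is more involved, but the invariant is the same — two different ways of
// consuming the same blob must never map to the same ID.
func layerID(tocDigest, blobDigest string, consumedViaTOC bool) string {
	if consumedViaTOC {
		return "toc:" + tocDigest
	}
	return "blob:" + blobDigest
}

func main() {
	// The same zstd:chunked blob, consumed two different ways, gets two
	// distinct IDs, so an on-demand TOC view can never be silently reused
	// where the fully-validated tarball contents are expected.
	fmt.Println(layerID("sha256:toc", "sha256:blob", true))
	fmt.Println(layerID("sha256:toc", "sha256:blob", false))
}
```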
Would it not be possible to pass the ALS arg/config in a way that tells whether the ALS should be used as a non-chunked layer or not? Then you could use the blob digest or the TOC checksum based on that arg/config, instead of the behaviour being inferred from the image compression. The current implementation ties the feature to a specific implementation of the ALS, one which assumes a layer store can only be used with seekable images, which should not be the case, at least not intuitively.
From a user perspective (or at least my perspective), it seems natural that if we specify a location as an additional layer store, podman uses this location to source the layers, regardless of the compression format in the remote registry. I imagine that the base case is to simply have the layer expanded in a filesystem, for example to store commonly used layers in a remote filesystem. The fact that we can use a FUSE mount to perform lazy loading directly from a remote registry is just an extension of that primary use case, in my view.

@mtrmac Would you be open to a PR that restores backwards compatibility with the previous behavior? E.g. if a layer isn't compressed with a TOC, then fall back to looking up the store by the layer digest. Or alternatively, as @chvish is proposing, a way to provide optionality on how we intend to use the layer store.
I am trying to hold to a policy that I will not be reviewing PRs that add extra options and code paths to drivers/overlay, because I just can’t deal with the complexity. I realize that’s not great after this ABI break; anyway, there are other maintainers of the project who could review PRs.

Given that the layer ID must not cause collisions, I’m skeptical this can be done securely, at least given the existing API between c/storage and c/image. But I didn’t look into this in detail.
Needs: containers/storage#1924
This commit changes the store implementation to use TOCDigest as the ID of each layer. This ensures the runtime passes the TOCDigest to the FUSE backend every time it requires a layer.

This commit modifies c/storage's LookupAdditionalLayer to receive a TOCDigest as one of the arguments. That TOCDigest is then passed to the Additional Layer Store as one of the path elements of the FUSE fs, under which the extracted view of the layer's diff contents is provided. The Additional Layer Store implementation attempts lazy pulling of the layer that has the specified TOCDigest in the image.

If the image doesn't contain a layer matching the TOCDigest, accessing the layer results in an error.
Example draft implementation of Additional Layer Store is at containerd/stargz-snapshotter#1673
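The lookup flow described in this PR can be sketched as follows, assuming a simplified stand-in for the real LookupAdditionalLayer signature and an illustrative store path:

```go
package main

import (
	"errors"
	"fmt"
)

// store is a minimal stand-in for the c/storage API after this change; the
// real LookupAdditionalLayer signature and path layout differ — this only
// sketches the flow: look up by TOC digest, error out when the ALS does not
// provide a matching layer.
type store struct {
	// layers available in the Additional Layer Store, keyed by TOC digest;
	// the values are illustrative FUSE paths.
	byTOC map[string]string
}

var errLayerUnknown = errors.New("additional layer not found")

func (s *store) LookupAdditionalLayer(tocDigest string) (string, error) {
	if path, ok := s.byTOC[tocDigest]; ok {
		return path, nil
	}
	// The image doesn't contain a layer matching the TOCDigest: accessing
	// it results in an error, and the caller falls back to an ordinary pull.
	return "", errLayerUnknown
}

func main() {
	s := &store{byTOC: map[string]string{"sha256:toc1": "/store/sha256:toc1/diff"}}
	if p, err := s.LookupAdditionalLayer("sha256:toc1"); err == nil {
		fmt.Println("lazy-pull from", p)
	}
	if _, err := s.LookupAdditionalLayer("sha256:missing"); err != nil {
		fmt.Println("fallback: ordinary pull")
	}
}
```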