@giuseppe zstd:chunked uses a TarSplitChecksumKey annotation, separate from the TOC digest.
That’s fine as far as per-image individual trust goes, because the annotation is authenticated by the manifest digest.
But we deduplicate individual layers by the TOC digest alone, so two images can use a layer with the same TOC digest but with different tar-split locations and contents (e.g. one with an added set-UID bit somewhere).
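To make the collision concrete, here is a minimal sketch of a layer store that deduplicates on the TOC digest alone. Everything in it is hypothetical: LayerStore, PulledLayer and the toy byte strings are illustrative stand-ins, not the containers/storage API, and real tar-split data holds raw tar headers rather than a "mode=..." string.

```python
import hashlib
from dataclasses import dataclass


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class PulledLayer:
    toc: bytes        # serialized TOC (toy stand-in for the real zstd:chunked TOC)
    tar_split: bytes  # toy stand-in for tar-split data used to rebuild the tar stream


class LayerStore:
    """Hypothetical store that deduplicates layers by TOC digest alone."""

    def __init__(self) -> None:
        self._by_toc: dict[str, PulledLayer] = {}

    def pull(self, layer: PulledLayer) -> PulledLayer:
        key = sha256_digest(layer.toc)
        # If a layer with this TOC digest already exists it is reused as-is;
        # the incoming tar-split is never compared against the stored one.
        return self._by_toc.setdefault(key, layer)


toc = b'{"version":1,"entries":[{"name":"usr/bin/tool","mode":493}]}'
benign = PulledLayer(toc, tar_split=b"mode=0755")
setuid = PulledLayer(toc, tar_split=b"mode=4755")  # same TOC, set-UID bit in tar-split

store = LayerStore()
store.pull(setuid)            # attacker-controlled image pulled first
reused = store.pull(benign)   # a different image with the same TOC digest
print(reused.tar_split)       # b'mode=4755': the attacker's tar-split is reused
```

Each image is individually valid (its manifest authenticates its own annotation), but the shared lookup key cannot distinguish them.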
Locally, it seems by far easiest to move the tar-split digest into the TOC. That way the by-now-widespread assumption that the TOC digest is a sufficient identifier can be maintained.
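A sketch of what "moving the tar-split digest into the TOC" buys, assuming a hypothetical tarSplitDigest field (the actual field name and TOC schema would be whatever the format defines). Once the digest is part of the TOC bytes, two layers with different tar-split data necessarily get different TOC digests, and the tar-split can be verified against the TOC alone.

```python
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_toc(entries: list, tar_split: bytes) -> bytes:
    # "tarSplitDigest" is a hypothetical field name used only for illustration.
    toc = {"version": 1, "entries": entries, "tarSplitDigest": sha256_digest(tar_split)}
    # Deterministic serialization so equal content yields equal bytes.
    return json.dumps(toc, sort_keys=True, separators=(",", ":")).encode()


def verify_tar_split(toc_bytes: bytes, tar_split: bytes) -> None:
    # The expected value comes from the TOC itself, so it is covered by the TOC digest.
    expected = json.loads(toc_bytes)["tarSplitDigest"]
    if sha256_digest(tar_split) != expected:
        raise ValueError("tar-split does not match the digest recorded in the TOC")


entries = [{"name": "usr/bin/tool", "mode": 0o755}]
toc_a = build_toc(entries, b"mode=0755")
toc_b = build_toc(entries, b"mode=4755")
print(sha256_digest(toc_a) == sha256_digest(toc_b))  # False
verify_tar_split(toc_a, b"mode=0755")  # passes; b"mode=4755" would raise ValueError
```

The design point is that no separate annotation has to be consulted: the single TOC digest remains a sufficient deduplication key.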
For other purposes, like the BlobInfoCache and metadata-only reuse, the value returned from toc.GetTOCDigest is used for reuse/match lookups, so whatever this function returns must be a complete identifier: it must be unambiguous, and we can never add annotations that change the semantics of a layer with a matching GetTOCDigest value. Even if we replaced GetTOCDigest now with something that accounts for tar-split, we need this guarantee to be forward-compatible (consider pulling an image just before a new annotation type is added, upgrading, and then pulling another just after). I think that also argues for a long-term design of keeping everything anchored to a single digest value.
A bit more generally, I’ve been wondering whether the TOC digest is a good identifier long-term.
The way things are now, we basically can’t significantly change the format of the TOC, otherwise we risk the possibility of the same TOC blob being valid for both the old and new formats, allowing an attacker to trigger deduplication between layers with the same TOC digest but different TOC formats and differently-understood contents.
For example, we just can’t support a new partial-pull format with some other JSON-based TOC format if it could be constructed ambiguously with the zstd:chunked TOC format. I’m not 100% sure this is worth worrying about, but if it ever happened, we would need a fairly large restructuring, or at least renaming/re-documenting. So I’m raising it now, in case such things are expected. (E.g. will this be required for composeFS?)
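The ambiguity concern can be illustrated with two hypothetical TOC readers that accept the same JSON blob but interpret it differently. Neither parser reflects the real zstd:chunked format; format A simply ignores unknown keys (as lenient JSON decoding often does), while the imagined format B gives meaning to an extra key.

```python
import hashlib
import json


def sha256_digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def parse_format_a(blob: bytes) -> dict:
    """Hypothetical current TOC reader: unknown keys are silently ignored."""
    doc = json.loads(blob)
    return {e["name"]: e["mode"] for e in doc["entries"]}


def parse_format_b(blob: bytes) -> dict:
    """Hypothetical future reader: same 'entries' key, plus an 'overrides' map."""
    doc = json.loads(blob)
    modes = {e["name"]: e["mode"] for e in doc["entries"]}
    modes.update(doc.get("overrides", {}))
    return modes


blob = json.dumps({
    "entries": [{"name": "usr/bin/tool", "mode": 0o755}],
    "overrides": {"usr/bin/tool": 0o4755},
}).encode()

key = sha256_digest(blob)  # a single deduplication key for both readings
print(parse_format_a(blob) == parse_format_b(blob))  # False: two different layers
```

A store keyed only on this digest has no way to tell which reading produced the layer it already holds.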