chunked: generate tar-split as part of zstd:chunked#1627
chunked: generate tar-split as part of zstd:chunked#1627rhatdan merged 13 commits intocontainers:mainfrom
Conversation
da1b8ad to
561ef29
Compare
5408a8f to
c4b7edb
Compare
1704458 to
62f24ea
Compare
02d9ae3 to
c65dd10
Compare
|
LGTM |
c65dd10 to
91a1fe9
Compare
|
these fixes are independent of the work in c/image, except for chunked: add function to retrieve TOC digest |
|
LGTM |
0fd208b to
3919680
Compare
|
rebased |
|
I'd like to use these fixes for the new composefs integration work. Could we move forward and if there is anything to change/fix we could address it later? |
|
all comments adressed, and the CI is green |
| zstdWriter = nil | ||
|
|
||
| return internal.WriteZstdChunkedManifest(dest, outMetadata, uint64(dest.Count), metadata, level) | ||
| if err := tarSplitData.zstd.Flush(); err != nil { |
There was a problem hiding this comment.
This branch doesn't close tarSplitData.zstd, and a number of branches above it don't close zstdWriter.
There was a problem hiding this comment.
there is a func for zstdWriter to clean it up if it is not closed (and thus set to nil afterwards):
defer func() {
if zstdWriter != nil {
zstdWriter.Close()
}
}()
is that sufficient?
I've added the same pattern for tarSplitData
|
Mostly LGTM, just nits. |
|
LGTM other than @nalind's comments |
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
change the file format to store the tar-split as part of the zstd:chunked image. This will allow clients to rebuild the entire tarball without having to download it fully. also store the uncompressed digest for the tarball, so that it can be stored into the storage database. Needs: containers/image#1976 Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
thanks, I've added a separate patch to drop it since it was already this way (so in case of regressions, it is easier to bisect it) |
1c356e2 to
032aae3
Compare
|
LGTM |
mtrmac
left a comment
There was a problem hiding this comment.
Catching up for containers/image#1980 … quite possibly the questions have been resolved since. Follow containers/image#1980 for progress.
| } | ||
| } | ||
| for k, v := range diffOutput.BigData { | ||
| if err := r.SetBigData(id, k, bytes.NewReader(v)); err != nil { |
There was a problem hiding this comment.
SetBigData already calls saveFor(layer), so moving the saveFor in this function past this loop can not reliably achieve anything unusual, and to me only seems confusing.
| manifestType = binary.LittleEndian.Uint64(footer[24:32]) | ||
| if !isZstdChunkedFrameMagic(footer[32:40]) { | ||
| return nil, 0, errors.New("invalid magic number") | ||
| if !isZstdChunkedFrameMagic(footer[48:56]) { |
There was a problem hiding this comment.
AFAICS WriteZstdChunkedManifest silte writes the magic at 32:40, and the other semantics (offset/length/…) match the offsets in that method.
So how does this work?!
There was a problem hiding this comment.
thanks for spotting this.
I've opened a PR to fix it and add some tests: #1697
I am not 100% sure what to do with the footer data, this code path is not used by us at the moment, since we store this information in the OCI annotations for the layer (and if the OCI annotations are missing we don't even attempt a partial pull).
On the other hand it seems nice to have this information somewhere in the layer file itself, considering it is not a lot of data anyway.
There was a problem hiding this comment.
I think plausible alternatives are:
- Keep it as a frozen artifact, supporting only the old features. If the old features are sufficient to pull/push useful images.
- Design some kind of forward-compatibility mechanism. Maybe a new magic value, and now including a version identifier. And the design would need to allow adding more fields, ie.e. the magic/version ID would need to be at a predictable place regardless of version. And it wouldn’t be a replacement for the annotations because we need the layer’s metadata to be authenticated without having to read all of the layer. If we are not consuming the data at all, all of this seems to me to be very likely to keep breaking, and just not worth the effort.
- Redesign so that this data does not need extending: only include data about the manifest in here, and add all new fields in the manifest instead. That seems to me to be the cleanest way to have the layer blob be ~self-describing… OTOH in principle it would be possible for an attacker to point at one manifest in this footer and another in the annotation, so I’m not sure there’s that much of a benefit to being self-describing, either.
- Completely drop support, and rely on annotations only? I didn’t look into the history, whether we can do that.
| } | ||
|
|
||
| d, err := digest.Parse(manifestChecksumAnnotation) | ||
| func decodeAndValidateBlob(blob []byte, lengthUncompressed uint64, expectedUncompressedChecksum string) ([]byte, error) { |
There was a problem hiding this comment.
The digest is actually a compressed digest.
| if tocDigest, ok := annotations[estargz.TOCJSONDigestAnnotation]; ok { | ||
| d, err := digest.Parse(tocDigest) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| return &d, nil | ||
| } | ||
| if tocDigest, ok := annotations[internal.ManifestChecksumKey]; ok { |
There was a problem hiding this comment.
GetDiffer processes the digests in the opposite order. Doesn’t that end up using an arbitrary layer ID which differs from the manifest we actually process?
There was a problem hiding this comment.
This was fixed here; #1844 also makes GetTOCDigest unambiguous.
change the file format to store the tar-split as part of the zstd:chunked image. This will allow clients to rebuild the entire tarball without having to download it fully.
also store the uncompressed digest for the tarball, so that it can be stored into the storage database.
Needs: containers/image#1976
Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com