Ensure chunked TOC and tar-split metadata are consistent#2035
Ensure chunked TOC and tar-split metadata are consistent#2035openshift-merge-bot[bot] merged 4 commits intocontainers:mainfrom
Conversation
| // This is horrible, but we don’t know how much padding to skip. (It can be computed from the previous hdr.Size for non-sparse | ||
| // files, but for sparse files that is set to the logical size.) |
There was a problem hiding this comment.
This is a pretty horrible hack.
It should be possible for tar-split/archive/tar to expose the expected padding size in Header.
(Alternatively, the OCI spec suggests that sparse files should not be used, but I don’t expect that can be relied upon.)
The test failure is on unrealistic data in a different test in this package, I’ll update that tomorrow. |
cgwalters
left a comment
There was a problem hiding this comment.
Only a relatively superficial skim but generally LGTM.
| pendingFiles := map[string]*internal.FileMetadata{} // Name -> an entry in toc.Entries | ||
| for i := range toc.Entries { | ||
| e := &toc.Entries[i] | ||
| if e.Type != internal.TypeChunk { |
There was a problem hiding this comment.
Very minor but I paused on this for a bit, uncertain; perhaps:
// Chunks are just part of files, they won't appear explicitly
// in the tar stream, so we don't validate them.
if e.Type == internal.TypeChunk {
continue
}
?
There was a problem hiding this comment.
I think this would be best documented somewhere around pkg/chunked/internal/compression.go, this is “just” straightforwardly consuming the structure as designed.
#1939 did add the basic documentation of TypeChunk, although the full semantics of Offset/ChunkOffset etc. is not written down anywhere.
There was a problem hiding this comment.
Just to be a good citizen, I have added a commit documenting the format (to the extent I can reverse-engineer it from the code) to the documentation of internal.FileMetadata.
|
|
||
| // iterateTarSplit calls handler for each tar header in tarSplit | ||
| func iterateTarSplit(tarSplit []byte, handler func(hdr *tar.Header) error) error { | ||
| // This, strictly speaking, hard-codes undocumented assumptions about how github.com/vbatts/tar-split/tar/asm.NewInputTarStream |
There was a problem hiding this comment.
In theory this API could land as a PR to that repo?
There was a problem hiding this comment.
That would certainly be better. The timing will probably force us to carry this in c/storage at for a few days/weeks, but I should definitely prepare a tar-split PR.
There was a problem hiding this comment.
I have recorded the task to file a tar-split PR as an item on https://github.com/containers/image/issues/2189 .
pkg/chunked/tar_split_linux_test.go
Outdated
| "github.com/vbatts/tar-split/tar/storage" | ||
| ) | ||
|
|
||
| func testTarheader(index int, typeFlag byte, size int64) tar.Header { |
There was a problem hiding this comment.
The name made me thing this is itself a test, how about createTestTarheader?
983b14b to
886c158
Compare
28222ae to
85d7be5
Compare
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
|
Podman tests passed in containers/podman#23307 . |
|
/lgtm |
In addition to the existing use when creating a TOC from tar data, we will also need it when parsing TOC and tar-split data. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
We are going to be checking its consistency with the TOC. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, giuseppe, mtrmac The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This should resolve containers/image#2014. Basically untested, filing now for early design review.
We simply enforce that the TOC and tar-split have exactly the same contents. To the extent the zstd:chunked format is only being produced by this package, we can expect an equal match (right now? ), but that may become harder as the format evolves (e.g. the recent
timeIfNotZerochange).@giuseppe PTAL. Cc: @cgwalters .