Ensure chunked TOC and tar-split metadata are consistent by mtrmac · Pull Request #2035 · containers/storage

mtrmac · 2024-07-16T20:48:22Z

This should resolve containers/image#2014. Basically untested, filing now for early design review.

We simply enforce that the TOC and tar-split have exactly the same contents. To the extent the zstd:chunked format is only being produced by this package, we can expect an equal match (right now?), but that may become harder as the format evolves (e.g. the recent timeIfNotZero change).

@giuseppe PTAL. Cc: @cgwalters .

mtrmac · 2024-07-16T20:51:01Z

pkg/chunked/tar_split_linux.go

+			// This is horrible, but we don’t know how much padding to skip. (It can be computed from the previous hdr.Size for non-sparse
+			// files, but for sparse files that is set to the logical size.)


This is a pretty horrible hack.

It should be possible for tar-split/archive/tar to expose the expected padding size in Header.

(Alternatively, the OCI spec suggests that sparse files should not be used, but I don’t expect that can be relied upon.)
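
For the non-sparse case mentioned above, the padding follows directly from the tar format's fixed 512-byte block size. A minimal sketch, assuming the stored payload size is known; `paddingAfter` and `tarBlockSize` are hypothetical names, not part of tar-split or c/storage:

```go
package main

import "fmt"

// tarBlockSize is the fixed block size of the tar format.
const tarBlockSize = 512

// paddingAfter returns the number of zero bytes that follow a payload of
// storedSize bytes so that the next header starts on a block boundary.
// storedSize must be the number of bytes physically present in the archive;
// for a sparse file that is not the logical size reported in the header,
// which is why this computation alone is not enough there.
func paddingAfter(storedSize int64) int64 {
	return (tarBlockSize - storedSize%tarBlockSize) % tarBlockSize
}

func main() {
	for _, size := range []int64{0, 1, 512, 513, 1000} {
		fmt.Printf("stored size %4d -> padding %3d\n", size, paddingAfter(size))
	}
}
```

The outer modulo keeps the result at 0 when the payload already ends on a block boundary.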

mtrmac · 2024-07-16T22:16:58Z

tar-split and TOC data is inconsistent: reading tar-split entries: invalid character 'T' looking for beginning of value

The test failure is on unrealistic data in a different test in this package, I’ll update that tomorrow.

cgwalters

Only a relatively superficial skim but generally LGTM.

cgwalters · 2024-07-16T22:22:39Z

pkg/chunked/compression_linux.go

+	pendingFiles := map[string]*internal.FileMetadata{} // Name -> an entry in toc.Entries
+	for i := range toc.Entries {
+		e := &toc.Entries[i]
+		if e.Type != internal.TypeChunk {


Very minor but I paused on this for a bit, uncertain; perhaps:

// Chunks are just part of files, they won't appear explicitly
// in the tar stream, so we don't validate them.
if e.Type == internal.TypeChunk {
	continue
}

?

I think this would be best documented somewhere around pkg/chunked/internal/compression.go, this is “just” straightforwardly consuming the structure as designed.

#1939 did add the basic documentation of TypeChunk, although the full semantics of Offset/ChunkOffset etc. is not written down anywhere.

Just to be a good citizen, I have added a commit documenting the format (to the extent I can reverse-engineer it from the code) to the documentation of internal.FileMetadata.
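
The overall shape of the check discussed in this thread can be sketched as follows. This is a simplified illustration, not the code in the PR: `entry`, `typeChunk`, and `checkConsistent` are hypothetical stand-ins, only three fields are compared, and rejecting duplicate TOC names is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// entry is a hypothetical, reduced stand-in for internal.FileMetadata.
type entry struct {
	Type string // e.g. "reg", "dir", "chunk"
	Name string
	Size int64
}

// typeChunk marks entries that describe a part of a file, not a file.
const typeChunk = "chunk"

// checkConsistent reports an error unless every non-chunk TOC entry has
// exactly one identical counterpart among the tar entries, and vice versa.
func checkConsistent(tocEntries, tarEntries []entry) error {
	pending := map[string]*entry{} // Name -> an entry in tocEntries
	for i := range tocEntries {
		e := &tocEntries[i]
		if e.Type == typeChunk {
			continue // chunks never appear as separate tar headers
		}
		if _, dup := pending[e.Name]; dup {
			// Assumption of this sketch: duplicate names are rejected.
			return fmt.Errorf("TOC contains duplicate entry %q", e.Name)
		}
		pending[e.Name] = e
	}
	for i := range tarEntries {
		h := &tarEntries[i]
		e, ok := pending[h.Name]
		if !ok {
			return fmt.Errorf("tar stream contains %q, which is not in the TOC", h.Name)
		}
		delete(pending, h.Name)
		if *e != *h {
			return fmt.Errorf("metadata mismatch for %q", h.Name)
		}
	}
	if len(pending) != 0 {
		return errors.New("TOC contains entries that are missing from the tar stream")
	}
	return nil
}

func main() {
	toc := []entry{{"reg", "a", 3}, {typeChunk, "a", 0}, {"dir", "d", 0}}
	tarHeaders := []entry{{"reg", "a", 3}, {"dir", "d", 0}}
	fmt.Println("consistent:", checkConsistent(toc, tarHeaders) == nil)
}
```

Deleting each matched name from the map means that anything left over at the end is a TOC entry with no tar counterpart, so a single pass covers both directions of the comparison.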

cgwalters · 2024-07-16T22:27:21Z

pkg/chunked/tar_split_linux.go

+
+// iterateTarSplit calls handler for each tar header in tarSplit
+func iterateTarSplit(tarSplit []byte, handler func(hdr *tar.Header) error) error {
+	// This, strictly speaking, hard-codes undocumented assumptions about how github.com/vbatts/tar-split/tar/asm.NewInputTarStream


In theory this API could land as a PR to that repo?

That would certainly be better. The timing will probably force us to carry this in c/storage for a few days/weeks, but I should definitely prepare a tar-split PR.

I have recorded the task to file a tar-split PR as an item on https://github.com/containers/image/issues/2189 .

Filed as vbatts/tar-split#71 .
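
The handler-per-header shape of iterateTarSplit can be illustrated with the standard library alone. Note the substitution: this sketch walks a plain tar stream with archive/tar and does not parse tar-split metadata; `forEachHeader`, `buildSampleTar`, and `listNames` are hypothetical names used only for illustration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// forEachHeader calls handler for each header in the tar stream r and
// stops at the first error returned by the reader or by handler.
func forEachHeader(r io.Reader, handler func(hdr *tar.Header) error) error {
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next() // also skips any unread payload and padding
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return fmt.Errorf("reading tar header: %w", err)
		}
		if err := handler(hdr); err != nil {
			return err
		}
	}
}

// buildSampleTar creates a small in-memory archive for the demonstration.
func buildSampleTar() ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := []struct{ name, body string }{{"a.txt", "hello"}, {"b.txt", "tar-split"}}
	for _, f := range files {
		hdr := &tar.Header{Name: f.name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(f.body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(f.body)); err != nil {
			return nil, err
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listNames collects the header names of an archive using forEachHeader.
func listNames(data []byte) ([]string, error) {
	var names []string
	err := forEachHeader(bytes.NewReader(data), func(hdr *tar.Header) error {
		names = append(names, hdr.Name)
		return nil
	})
	return names, err
}

func main() {
	data, err := buildSampleTar()
	if err != nil {
		panic(err)
	}
	names, err := listNames(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

Passing a callback keeps the parsing loop in one place, so callers such as a consistency check only see fully parsed headers.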

cgwalters · 2024-07-16T22:32:16Z

pkg/chunked/tar_split_linux_test.go

+	"github.com/vbatts/tar-split/tar/storage"
+)
+
+func testTarheader(index int, typeFlag byte, size int64) tar.Header {


The name made me think this is itself a test; how about createTestTarheader?

Thanks, fixed.

giuseppe

LGTM

mtrmac · 2024-07-18T21:25:41Z

Podman tests passed in containers/podman#23307 .

cgwalters · 2024-07-18T21:36:51Z

/lgtm

giuseppe

/lgtm

openshift-ci · 2024-07-19T18:16:34Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, giuseppe, mtrmac

Needs approval from an approver in each of these files:

~~OWNERS~~ [giuseppe,mtrmac]

openshift-ci bot added do-not-merge/work-in-progress approved labels Jul 16, 2024

mtrmac commented Jul 16, 2024

mtrmac force-pushed the wip branch from fa8153d to 9e51fdb Compare July 16, 2024 20:51

cgwalters approved these changes Jul 16, 2024

mtrmac force-pushed the wip branch 2 times, most recently from 983b14b to 886c158 Compare July 17, 2024 16:43

mtrmac mentioned this pull request Aug 27, 2025

Zstd(:chunked) work tracking checklist containers/container-libs#205

Open

37 tasks

mtrmac force-pushed the wip branch 2 times, most recently from 28222ae to 85d7be5 Compare July 17, 2024 19:01

mtrmac added a commit to mtrmac/libpod that referenced this pull request Jul 17, 2024

DO NOT MERGE: Vendor UNMERGED containers/storage#2035

eedca2b

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

giuseppe approved these changes Jul 18, 2024

openshift-ci bot assigned cgwalters Jul 18, 2024

openshift-ci bot added the lgtm label Jul 18, 2024

mtrmac added 4 commits July 18, 2024 23:36

Split NewFileMetadata from pkg/chunked/compressor

2ba2dd1

In addition to the existing use when creating a TOC from tar data, we will also need it when parsing TOC and tar-split data. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Use realistic tar-split data in TestGenerateAndParseManifest

9af9f57

We are going to be checking its consistency with the TOC. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Ensure that the metadata in the TOC matches the tar-split

a1acfed

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Document the TypeReg/TypeChunk storage format

2c4c5b8

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac force-pushed the wip branch from 85d7be5 to 2c4c5b8 Compare July 18, 2024 21:37

openshift-ci bot removed the lgtm label Jul 18, 2024

mtrmac changed the title ~~WIP: Ensure chunked TOC and tar-split metadata are consistent~~ Ensure chunked TOC and tar-split metadata are consistent Jul 18, 2024

mtrmac marked this pull request as ready for review July 18, 2024 23:25

openshift-ci bot removed the do-not-merge/work-in-progress label Jul 18, 2024

giuseppe approved these changes Jul 19, 2024

openshift-ci bot assigned giuseppe Jul 19, 2024

openshift-ci bot added the lgtm label Jul 19, 2024

openshift-merge-bot bot merged commit 8d26ede into containers:main Jul 19, 2024

mtrmac deleted the wip branch July 19, 2024 18:24

mtrmac mentioned this pull request Aug 27, 2025

Zstd(:chunked) work tracking checklist containers/container-libs#210

Closed

37 tasks
