Skip to content

MergeOp docs#2212

Closed
sipsma wants to merge 7 commits intomoby:masterfrom
sipsma:mergeop_doc
Closed

MergeOp docs#2212
sipsma wants to merge 7 commits intomoby:masterfrom
sipsma:mergeop_doc

Conversation

@sipsma
Copy link
Collaborator

@sipsma sipsma commented Jun 30, 2021

These docs outline the proposed approach to implementing MergeOp and the interrelated DiffOp. They are split up into 4 docs to keep things more manageable and will probably make the most sense to read in the order specified by the file name's prefix. Feel free to just review one at a time as it's obviously a lot to read all at once.

@sipsma sipsma requested a review from tonistiigi June 30, 2021 00:21
@sipsma
Copy link
Collaborator Author

sipsma commented Jun 30, 2021

cc @AkihiroSuda @coryb @ktock please take a look if you have time and let me know if you have any concerns with the new features proposed here.

Copy link
Collaborator

@coryb coryb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I am not too familiar with the caching internals or with import/export but what you have written makes sense to me.

* Additionally, to support (de)serialization of this data structure, we will simply use the tar format, but instead of including actual file data the tar archive will consist of zero-sized files and thus consist only of tar headers, which have all the metadata we need for the file.
* When storing serialized data in the cache metadata, we will also include a format specifier, enabling us to support other formats in the future if ever needed
* It’s worth noting that there are degenerate cases where storing this metadata may noticeably increase disk+memory usage, namely where there are very large numbers of files created in a layer.
* For example, if there is 128 bytes of file/dir metadata, 100k files+dirs would be about 12 MB of metadata. As a reference, the base ubuntu image has just under 3k files, so it’s unlikely this should become meaningful for all but the most extreme cases.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect with compression we could reduce this size by >90%, the metadata should be highly compressable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, once we get to this stage of implementation I'll experiment and see what ratios can be achieved

@sipsma sipsma marked this pull request as draft July 6, 2021 17:50
@sipsma
Copy link
Collaborator Author

sipsma commented Jul 6, 2021

Marking this as a draft as @tonistiigi doesn't want us to merge the docs for now; the docs are ready to review though and we will still use this PR as a way of collecting comments

@AkihiroSuda
Copy link
Member

Maybe we should put them in ./docs/proposal directory until the implementation gets merged?

Or just have a header to clarity that the implementation of the proposal is not merged yet.

Copy link
Collaborator

@ktock ktock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the great write-up!

Comment on lines +3 to +4
## **Requirements**

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if we have a description about the high-level use-cases/benefits by MergeOp. (e.g. which Dockerfile instructions are expected to be optimized by this?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I had only discussed it with people familiar with the use cases before this and forgot to include them here :-)

Added a Use Cases section.


stateC := Scratch().
File(Mkfile("/foo", 0777, []byte("C"))).
File(Mkfile("/c", 0777, []byte("C")))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If here we do File(Rm("/a")), will Merge(stateB, stateC) contain /a ?
IIUC stateC doesn't contain /a so the whiteout won't be recorded by overlayfs and Merge(stateB, stateC) will do contain /a. Is this an expected result?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good question but that's the expected behavior. In general I think it will be very rare for a user to want to create a layer with a deletion at a non-existent file, but in those obscure cases they can create a base layer with the file (i.e. Scratch().File(Mkfile("/a"))) and then do the deletion on top of that.

I don't love that but it's a fairly simple workaround that works and can be made seamless by higher-level tools consuming LLB if needed.

The only significant interface change will be extending cache.Accessor with this method:
* `Merge(ctx context.Context, parents []ImmutableRef, opts ...RefOption) (ImmutableRef, error)`

Merge, as you’d expect, creates an ImmutableRef that represents the merge of the given parents and returns it to the caller. That ImmutableRef can then be used just like any other ImmutableRef can be today.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to use the "merged" ImmutableRef just like any other ImmutableRef (e.g. using this as a parent of other snapshots), the merged result needs to be recorded to the snapshotter as a snapshot, right? IIUC containerd's overlayfs snaphsotter doesn't support importing a directory as a snapshot, I guess we need to fork our own snapshotter implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, we shouldn't need to fork snapshotters; I believe we can implement the whole design on top of any overlay-based snapshotter, including containerd's overlay and even stargz.

For the case you brought up, using a merged ref as the parent of a new snapshot, all of the special handling will take place in Buildkit's cache package. So, if you want to merge together A and B and then create a new snapshot C whose parent is that merged ref, the cache package will create a brand new empty snapshot for C and then merge its mounts on top of the merged mounts of A and B. Essentially, while the end user of LLB thinks of C as a snapshot on top of Merge(A, B), in the underlying snapshotter there are actually 3 separate snapshot chains for A, B and C.

I detailed exactly how this will work in the Efficient Snapshot Merge of the 1_mergeop.md doc (plus the Efficient Snapshot Merge Details section in the appendix).

There are some significant corner cases surrounding all this described in the DiffOp and Import/Export docs, but they all have fairly straightforward workarounds thankfully. Let me know if you can think of any I missed though!

* Cons
* The format is not perfectly optimized for our use case of conflict detection (same as Tar)
* Not clear if the continuity package is maintained much these days besides the util functions under fs used by containerd’s differ (which the proto is separate from). We may become partially or fully responsible for maintenance of the proto and thus have to deal with any other consumers of it that exist out in the world.
* Custom Proto or JSON
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOC by stargz/eStargz (doc) might be another good candidate of JSON-formatted filesystem metadata :)
(zstd:chunked by Podman also uses the similar one.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting, thanks for pointing me to that! I never thought to look but makes sense that stargz had to solve a similar problem too. I think for now I still want to go with the proposal in the doc of using tar because it simplifies some of details around import/export of this metadata and doesn't tie us into any changes you may need to make to the stargz format in the future.

However, I added a section mentioning it in case we ever want to reconsider this decision in the future and are looking back for other options.


## Conflict Detection Alternatives

There are a few existing userspace emulations of overlayfs that were considered for implementing our layer conflict detection features:
Copy link
Collaborator

@ktock ktock Jul 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need userspace filesystem for conflict calculation? Isn't it enough to just compare filesystem metadata between layers? (maybe I'm missing something)

EDIT: sorry, nevermind. This section says they are alternatives so the primary method of conflict detection is to compare filesystem metadata, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, yeah these are the alternatives, I just wanted to mention them to give context on why we're proposing doing our own custom code instead of using a library.

Added an extra sentence clarifying this.

sipsma added 6 commits July 11, 2021 16:33
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@sipsma
Copy link
Collaborator Author

sipsma commented Jul 11, 2021

Maybe we should put them in ./docs/proposal directory until the implementation gets merged?

Good idea, moved it to proposals.

Signed-off-by: Erik Sipsma <erik@sipsma.dev>
@sipsma sipsma mentioned this pull request Sep 3, 2021
@sipsma
Copy link
Collaborator Author

sipsma commented Jan 6, 2022

Closing this in favor of a better style of docs, as mentioned+tracked here.

@sipsma sipsma closed this Jan 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants