Skip to content

dockerfile: add COPY --link for copy via merging layers#2596

Merged
tonistiigi merged 2 commits intomoby:masterfrom
tonistiigi:copy-link
Feb 14, 2022
Merged

dockerfile: add COPY --link for copy via merging layers#2596
tonistiigi merged 2 commits intomoby:masterfrom
tonistiigi:copy-link

Conversation

@tonistiigi
Copy link
Member

@tonistiigi tonistiigi commented Feb 4, 2022

replaces #2423
fixes #2414

Enables smarter file copy logic using the mergeop implementation. See docs in the PR contents.

Not 100% sure about the flag name.

Currently on top of #2513 for the progress groups support.

TODO/follow-up:

  • Tests for cache semantics
  • Support for --chown (currently same as with --link=false)

@tonistiigi
Copy link
Member Author

Pushed frontend for testing.

#syntax=tonistiigi/dockerfile:mergeop

To setup buildx with mergeop support:

docker buildx create --name=withmergeop --driver-opt image=moby/buildkit:master
export BUILDX_BUILDER=withmergeop

The progress will show multiple rows for COPY --link unless buildctl from #2513 is used.

Docs: https://github.com/moby/buildkit/blob/67bbbd1f90fefd583449bc8c0bca58a32968b996/frontend/dockerfile/docs/syntax.md#linked-copies-copy---link-add---link

Copy link
Collaborator

@sipsma sipsma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of the flag name, I think --link is fine. I like --rebase too, which is sort of suggestive of how it works in terms of structuring layers, but that might only appeal to me because I know the internal details of how it works.

d.state = d.state.File(a, fileOpt...)
if cfg.opt.llbCaps.Supports(pb.CapMergeOp) == nil && cfg.link && cfg.chmod == "" {
mergeOpt := append(fileOpt, llb.WithCustomName(prefixCommand(d, "LINK "+name, d.prefixPlatform, &platform, env)))
d.state = llb.Merge([]llb.State{d.state, llb.Scratch().File(a, fileOpt...)}, mergeOpt...)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If it's easy to implement, it seems like if the user is copying from another stage (i.e. not a local ref) and isn't using any special flags, then the case of COPY --link --from=foo / / could skip the file op and just merge directly.

This wouldn't be needed if we manage to get the more optimized implementation where all copies are directly hardlinked into merges before the next release, but not sure how plausible that is as it seemed fairly complicated when we last discussed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, though I just thought of something that's worth considering. I remembered that this behavior implies that any deletions present in the layers of the source of the link will be applied on top of the new base, which is different than the behavior you get with copies where all the source layers are squashed and deletions are discarded.

I can't think of a single realistic Dockerfile use case where this would matter, but did want to bring it up in case you think this somewhat divergent behavior might matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the whiteouts but thought about another case. This currently would mess up the history array in the image as history array will only expect that one layer was generated by the command. Possibly something can be done with adding some markers to overcome this (not quite sure) but I think I will revert this part for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay cool sgtm, the use case for this optimization was pretty niche anyways so doesn't hurt much to save it for the future.

@tonistiigi tonistiigi force-pushed the copy-link branch 2 times, most recently from 709b5ca to b784ea0 Compare February 9, 2022 00:48
@thaJeztah
Copy link
Member

Question; would this option only apply to COPY operations that take a --from ? (so --link means "apply the source from --from as-is)?

If that's the case, should it be an option on --from? (similar to --mount type=xxx,src=xxx)

@thaJeztah
Copy link
Member

Ignore me; I was commenting before reading (let me get back to that)


To use this flag set Dockerfile version to at least `1.4`.

```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add dockerfile code-hints to the blocks that show Dockerfile examples? (so that highlighting will work if we publish it)

COPY /foo /bar
```

and merging all the layers of both images together.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merging all the layers

Does that mean that this produces a single layer (alpine + copy) ? Won't that result in the alpine layer no longer being shared? (I recall we tried to avoid that when implementing docker build --squash, where we kept the base-layer out of the squashing to allow it to still be shared).

FROM alpine
COPY --link /foo /bar

This somewhat implies that;

FROM alpine
COPY --link /foo /bar
COPY --link /bar /baz

Would result in a single layer (alpine + copy + copy) (is that correct?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is no flatten. Merging layers means that if you merge 2 layers with 3 then result is 5 layers. Layers are immutable blobs in the registry.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, I'm working on some user docs for the lower-level MergeOp. We can consider updating these Dockerfile docs to mention those ones once I'm done, which might help clarify questions like this that have to do with finer-grain details.

Enables smarter file copy logic using the mergeop
implementation.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
Copy link
Member

@crazy-max crazy-max left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, also looking forward for typical use cases in our docs https://github.com/moby/buildkit/pull/2596/files#r803941857 to avoid confusion with layers aggregation.

@sipsma
Copy link
Collaborator

sipsma commented Feb 14, 2022

SGTM, also looking forward for typical use cases in our docs https://github.com/moby/buildkit/pull/2596/files#r803941857 to avoid confusion with layers aggregation.

Still WIP, but pushed the draft here if you want to take a look now: #2633


#### Incompatibilities with `--link=false`

When using `--link` the `COPY/ADD` commands are not allowed to read any files from the previous state. This means that if in previous state the destination directory was a path that contained a symlink, `COPY/ADD` can not follow it. In the final image the destination path created with `--link` will always be a path containing only directories.
Copy link
Collaborator

@tiborvass tiborvass Feb 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: ignore comment, i'm completely wrong
Need also mention that you can no longer merge directories from a previous layer:

$ mkdir dir_to_be_merged ; echo foo > dir_to_be_merged/foo
$ cat > Dockerfile <<EOF
FROM alpine
RUN mkdir /dir && echo bar > /dir/bar
COPY dir_to_be_merged /dir
RUN test "$(cat /dir/bar)" = bar
EOF
$ docker build . # should work without COPY --link=false and fail with COPY --link

@tiborvass
Copy link
Collaborator

tiborvass commented Feb 14, 2022

My main concern is that anything named link next to a copy rings "symlink/hardlink", even if I can see the link to "link" (pun intended). Unfortunately, I only have other bad ideas but maybe it will spur better ideas:

  • copy --mount, pros: --mount exists already, cons: --mount exists already on RUN, + different syntax
  • copy --overlay
  • copy --insert
  • copy --smart
  • copy -x
  • new keywords: insert, overlay, mount, copy2, copyx

@tonistiigi tonistiigi merged commit 6102755 into moby:master Feb 14, 2022
@rcjsuen
Copy link

rcjsuen commented Apr 27, 2022

@tonistiigi Hi, just to confirm...

  1. Is --link the same as --link=true?
  2. Is --link=false automatically implied in all regular ADD and COPY instructions with no flags? If not, when would a user want to use --link=false?

Thank you for the information!

@thaJeztah
Copy link
Member

thaJeztah commented Apr 27, 2022

Usually, =false (or =true) wouldn't be used, but for boolean options, it's common to have the =true/false "implicit". The same applies to most command-line tools, for example;

This;

docker --version
Docker version 20.10.14, build a224086

Is "implicitly" the same as an explicit:

docker --version=true
Docker version 20.10.14, build a224086

Same for "don't pass the --version flag";

docker
# (no --version set prints usage)

docker --version=false
# (same - prints usage)

That said, it's possible that (perhaps!) these options can at some point be filled in through a build-arg, in which case being able to explicitly specify the true / false value could become useful;

FROM foobar
ARG USE_LINK=false
COPY --link=$USE_LINK ....

@rcjsuen
Copy link

rcjsuen commented Apr 27, 2022

@thaJeztah Thank you very much for clarifying. I work on a Dockerfile linter so I just wanted to confirm the different syntax variants that are valid. Have a nice day!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

dockerfile: eliminate dependency on dest directory for COPY

6 participants