
dockerfile: eliminate dependency on dest directory for COPY #2414

@tonistiigi


In MergeOp #2335 we are adding the capability for COPY layers to be rebased and reused via --cache-from even if the cache for previous layers gets invalidated. All of this works remotely with blobs in the registry: you can rebase an image on top of another image without the layers ever being downloaded or uploaded.

In the Dockerfile frontend, every copy(src, dest) will change to merge(dest, copy(src, scratch())).

For a copy to work on remote objects only, it cannot access any individual paths from the destination directory.

The problem is the behavior when the destination directory does not exist. In that case, a new directory is currently created with new properties, but if the directory already exists, nothing about it is changed.

E.g. when we have a Dockerfile

FROM alpine
COPY foo a/b/c/foo

and, after a change, a new Dockerfile

FROM alpine
RUN mkdir -p a/b/c && chmod -Rf 0600 a/b/c
COPY foo a/b/c/foo

If we rebase the copy layer blob directly, the result would be wrong: the layer already contains the directory a/b/c with permissions 0755, which would overwrite the 0600 set by the previous layer. If the second Dockerfile is built directly instead, a/b/c remains 0600.
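To make the mismatch concrete, here is a toy model of it, not BuildKit code. It assumes a filesystem can be represented as a dict from path to mode and that applying a layer simply lets the layer's records win; the function names are made up for this illustration.

```python
def apply_layer(base, layer):
    """Overlay a layer on a base filesystem; records in the layer win."""
    merged = dict(base)
    merged.update(layer)
    return merged


def copy_direct(base, dest, file_mode=0o644, dir_mode=0o755):
    """COPY run against the real destination: only missing parents are created."""
    result = dict(base)
    parts = dest.split("/")
    for i in range(1, len(parts)):
        parent = "/".join(parts[:i])
        # An existing directory keeps whatever mode it already has.
        result.setdefault(parent, dir_mode)
    result[dest] = file_mode
    return result


def copy_onto_scratch(dest):
    """COPY run against an empty filesystem: every parent is created with defaults."""
    return copy_direct({}, dest)


# State after: RUN mkdir -p a/b/c && chmod -Rf 0600 a/b/c
base = {"a": 0o755, "a/b": 0o755, "a/b/c": 0o600}

direct = copy_direct(base, "a/b/c/foo")
rebased = apply_layer(base, copy_onto_scratch("a/b/c/foo"))

print(oct(direct["a/b/c"]))   # mode kept from the RUN step
print(oct(rebased["a/b/c"]))  # mode taken from the COPY layer's own record
```

The two results differ only in the mode of a/b/c, which is exactly the record the rebased layer should not have carried.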

Cases where we can solve this problem

When USER is root and no --chown/--chmod is set, we can fix this by never putting records for the implied parent dirs in the tarball that COPY creates. The tarball will contain only one record, a/b/c/foo. When the image is pulled, a container runtime like docker will fill in the missing directories for a/b/c with the default configuration when they do not exist.

To make this work, we need to log the actual changes COPY made so that we can exclude the implied parent directories when making the tarball. I started on that in tonistiigi/fsutil#113.
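A rough sketch of the difference between the two tarballs, using Python's standard tarfile module rather than the actual fsutil code. The helper names and the include_implied_parents switch are invented for this example.

```python
import io
import tarfile


def build_copy_layer(dest, data, include_implied_parents):
    """Build an in-memory layer tarball for a single copied file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        if include_implied_parents:
            parents = dest.split("/")[:-1]
            for i in range(1, len(parents) + 1):
                entry = tarfile.TarInfo("/".join(parents[:i]))
                entry.type = tarfile.DIRTYPE
                entry.mode = 0o755
                tar.addfile(entry)
        entry = tarfile.TarInfo(dest)
        entry.size = len(data)
        entry.mode = 0o644
        tar.addfile(entry, io.BytesIO(data))
    return buf.getvalue()


def record_names(layer_bytes):
    """List the record names stored in a layer tarball."""
    with tarfile.open(fileobj=io.BytesIO(layer_bytes), mode="r") as tar:
        return tar.getnames()


with_parents = record_names(build_copy_layer("a/b/c/foo", b"hello", True))
without_parents = record_names(build_copy_layer("a/b/c/foo", b"hello", False))
print(with_parents)
print(without_parents)
```

Only the second tarball is safe to rebase in this model, since it says nothing about a, a/b or a/b/c and leaves their properties to whatever the lower layers (or the runtime defaults) provide.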

Cases that can't be solved

When COPY contains --chown=username, there is no way this copy can be rebased with remote objects only. The username-to-uid mapping lives in the parent image, and the only way to check whether it has changed is to extract the image and read /etc/passwd. This is unfortunate, as this mapping pretty much never changes, but I don't see any solution.
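The dependency can be illustrated with a small sketch. This is not the frontend's actual resolution logic; resolve_chown_user is a hypothetical helper that only handles the user half of --chown and assumes the usual colon-separated /etc/passwd format.

```python
def resolve_chown_user(owner, passwd_text=None):
    """Return the numeric uid for a --chown user value.

    A numeric value needs nothing from the parent image. A name can only be
    resolved by reading the parent image's /etc/passwd content.
    """
    if owner.isdigit():
        return int(owner)
    if passwd_text is None:
        raise LookupError(f"cannot resolve {owner!r} without the parent image's /etc/passwd")
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[0] == owner:
            return int(fields[2])
    raise LookupError(f"user {owner!r} not found in /etc/passwd")


passwd = "root:x:0:0:root:/root:/bin/sh\napp:x:1000:1000::/home/app:/bin/sh\n"

print(resolve_chown_user("1000"))         # no parent image content needed
print(resolve_chown_user("app", passwd))  # needs the extracted passwd file
```

The second call is the problematic one: its result depends on file content inside the parent image, which a remote-only rebase never gets to see.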

Cases that could be solved with some additional syntax

COPY --chown=uid and COPY --chmod=non-default-perms would not work by default. We can't just exclude the implied parents, as docker would only create these parents with the default perms/user, while in Dockerfile, unfortunately, the rule is that implied parents also get these chown/chmod values (which doesn't really make any sense, but we can't just break it and I don't want to create v2 just for this).

We could allow rebases with these COPY instructions if there were some additional (opt-in) syntax (e.g. a new flag) where the user confirms either that COPY should not create implied parent directories or that it should always create them (up to a point). We need to eliminate the need to stat the destination directory in order to determine what the resulting state should be. From the user's standpoint, they almost always already know whether the directory exists or should be created. Ideally, it would be something for which we could at least write a linter rule and suggest that all users always use this syntax.
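The three groups of cases above could be summarized as a small decision function. This is only a sketch of the classification described in this issue: parents_opt_in stands for the not-yet-designed opt-in syntax (no such flag exists), and the handling of a non-root USER is an assumption, since the issue only discusses root.

```python
def rebase_status(chown=None, chmod=None, user="root", parents_opt_in=False):
    """Classify whether a COPY could be rebased using remote objects only."""
    if chown is not None:
        # --chown=username (or a group name) needs /etc/passwd or /etc/group
        # from the parent image, so it can never be resolved remotely.
        if not all(part.isdigit() for part in chown.split(":")):
            return "unsolvable"
    if chown is None and chmod is None:
        # Implied parents can simply be left out of the tarball.
        # Assumption: anything other than root is treated conservatively.
        return "rebasable" if user in ("root", "0") else "not covered"
    # Numeric --chown or any --chmod: implied parents would inherit these
    # values, so the user has to state how parents should be handled.
    return "rebasable" if parents_opt_in else "needs opt-in"


print(rebase_status())
print(rebase_status(chown="app"))
print(rebase_status(chown="1000:1000"))
print(rebase_status(chmod="0600", parents_opt_in=True))
```

A linter rule along these lines could flag every COPY that lands in the "needs opt-in" bucket and suggest the explicit syntax.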

Suggestions?

@sipsma @thaJeztah @crazy-max @AkihiroSuda @aaronlehmann
