- relates to Remove squash from build #34565
Multi-stage builds are a better alternative to the experimental --squash option for most use cases. However, there is one use case that this feature does not address (see #34565 (comment)). The examples below are deliberately silly and only serve to illustrate the issue.
Copying all files from a build-stage to the final image (from scratch) produces a single-layer image:
FROM alpine AS build-stage
RUN apk add --no-cache man
RUN apk add --no-cache nginx
RUN apk del --no-cache man
FROM scratch
COPY --from=build-stage / /
While this works correctly if the final image is built from scratch (and may be the desired result), the resulting image is fully squashed, including the base image, and therefore cannot share the base image layer with other images.
When attempting to use the same base image for both stages:
FROM alpine AS build-stage
RUN apk add --no-cache man
RUN apk add --no-cache nginx
RUN apk del --no-cache man
FROM alpine
COPY --from=build-stage / /
The entire base image is included twice in the final image (once as the "base" layer, and once as part of the squashed / copied layer):
REPOSITORY TAG IMAGE ID CREATED SIZE
squashed2 latest ad22751e06e6 About an hour ago 9.28MB
squashed latest 94080bdf30e4 About an hour ago 5.31MB
Ideally the builder would automatically skip files that are already present in the base image (based on their checksum/metadata). Doing so would resolve cases where the same base image is used for both stages.
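To make the "skip files already present in the base image" idea concrete, here is a minimal sketch in Python. It is not how the builder is implemented. It assumes both the base image and the build stage have been unpacked into plain directories, and it compares regular files by relative path, file mode, and SHA-256 content digest. The function names (`file_digest`, `index_tree`, `files_to_copy`) are hypothetical and exist only for this illustration.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def index_tree(root):
    """Map each regular file's path (relative to root) to (mode, digest)."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Symlinks and special files are out of scope for this sketch.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root)
            index[rel] = (os.stat(full).st_mode, file_digest(full))
    return index


def files_to_copy(base_root, stage_root):
    """Hypothetical skip-base filter.

    Returns the sorted relative paths of files in stage_root that are
    either absent from base_root or differ from it in mode or content.
    """
    base = index_tree(base_root)
    stage = index_tree(stage_root)
    return sorted(rel for rel, meta in stage.items() if base.get(rel) != meta)
```

Calling `files_to_copy("/path/to/base-rootfs", "/path/to/stage-rootfs")` (both paths are placeholders) would list only the files the build stage added or modified. The sketch ignores files that the stage deleted, as well as ownership, timestamps, and extended attributes, all of which a real implementation would have to decide how to handle.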
In many cases this won't be enough, though: the build-stage could use a different base image (containing build tools), and only artifacts produced by the build-stage itself should be copied. For example, in the following Dockerfile:
FROM buildpack-deps:stretch AS build-stage
COPY . /usr/src/
WORKDIR /usr/src/
RUN gcc -g .......
FROM debian:stretch
COPY --from=build-stage / /
Being specific about what to copy from a stage (i.e., only the artifacts you need) is best practice, but keeping track of the exact paths and files to copy can be cumbersome and error-prone (for example, the user may not know which files an apt-get install foo installs). Artifacts can also be spread over many (sub)directories, some of which contain files inherited from the base image.
Proposal: add option to skip files from the base image
To address the issues mentioned above, the COPY command should have an option to ignore any file that was already present in the base image. I don't have a proper name for this option yet, so I'm using --skip-base as a placeholder. Example uses would look something like:
Copy every file that was added/modified in build-stage to the final image:
FROM alpine AS build-stage
RUN apk add --no-cache nginx
FROM alpine
COPY --skip-base --from=build-stage / /
Copy every file inside /usr that was created in build-stage to the final image:
FROM buildpack-deps:stretch AS build-stage
COPY . /usr/src/
WORKDIR /usr/src/
RUN gcc -g .......
FROM debian:stretch
COPY --skip-base --from=build-stage /usr /usr/
cc @ijc @dnephin @tonistiigi PTAL