Update dockerfiles to do staged builds#19952
Merged
yxieca merged 30 commits intosonic-net:masterfrom May 14, 2025
Merged
Conversation
5 tasks
5287e97 to
0115b4d
Compare
k-v1
reviewed
Oct 17, 2024
Contributor
There was a problem hiding this comment.
@saiarcot895
You can use debian slim images to reduce final image size as it was suggested here: #19008.
Contributor
Author
There was a problem hiding this comment.
Agreed. I want to keep the focus of this PR on unblocking Docker upgrades, but I had to bring in some space optimization stuff (see the COPY at the end of this file) to get things to work.
On newer versions of Docker, only the buildkit builder is supported, and cannot be disabled by setting DOCKER_BUILDKIT to 0. The side effect of this is that the behavior of `--squash` is different (see moby/moby#38903). This will result in the container sizes being significantly higher. To work around this, make all of our builds two-stage builds, with the `--squash` flag entirely removed. The way this works is in the first stage, whatever new files/packages need to be added are added (along with files/packages that need to be removed). Then, in the second stage, all of the files from the final state of the first stage are copied to the second stage. As part of this, also consolidate the container cleanup code into `post_run_cleanup`, and remove it from the individual containers (for consistency). Also experiment a bit with not needing to explcitly install library dependencies, and let apt install it as necessary. This will help during upgrades in the case of ABI changes for packages. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This shouldn't be committed Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The docker root cleanup is removing the contents of the docker root directory we create from within a container. However, this container isn't using the container registry variable, which menas it may fail depending on the network environment. Fix this by prefixing the container registry variable The docker root directory creation is missing the `shell` at the beginning, which means the directory doesn't actually get created. While the docker command later will still create the directory automatically, fix this and make sure it gets created here. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
It seems that on the Bullseye slave container (not sure about Buster), the nofile ulimit is set to 1048576:1048576 (as in, 1048576 for both the soft and hard limit). However, the Docker startup script in version 25 and newer sets the hard limit to 524288 (because of moby/moby#c8930105b), which fails because then the soft limit will be higher than the hard limit, which doesn't make sense. However, on a Bookworm slave container, the nofile ulimit is set to 1024:1048576, and the startup script's ulimit command goes through. A simple workaround would be to explicitly set the nofile ulimit to be 1024:1048576 for all slave containers. However, sonic-swss's tests needs more than 1024 file descriptors open, because the test code doesn't clean up file descriptors at the end of each test case/test suite. This results in FD leaks. Therefore, set the ulimit to 524288:1048576, so that Docker's startup script can lower it to 524288 and swss can open file descriptors. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
With the new approach of building the images (where the entire final rootfs is copied into the second stage), if the system building the containers is using the overlay2 storage driver (which is the default) and is able to use native diffs (which could be true if CONFIG_OVERLAY_FS_REDIRECT_DIR isn't enabled in the kernel), then the final result of the image will be different than if naive diffs (where Docker compares the metadata of each file and, if needed, the contents to find out if something has changed) were used. Specifically, with native diffs, each container would be much larger, since technically speaking, the whole rootfs is being written to, even if the content ends up the same. This appears to be a known issue (in some form), and workarounds are being thought of in moby/moby#35280. As a workaround, install rsync into the base container, copy the entirely of that into an empty base image, and use rsync to copy only the changed files into the layer in one shot. This does mean that rsync will remain installed in the final built containers, but hopefully this is fine. Signe-off-by: Saikrishna Arcot <sarcot@microsoft.com>
7968803 to
0b85785
Compare
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
Author
|
/azpw run Azure.sonic-buildimage |
Collaborator
|
/AzurePipelines run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
Author
|
/azpw ms_conflict |
1 similar comment
Contributor
|
/azpw ms_conflict |
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Collaborator
|
/azp run Azure.sonic-buildimage |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Contributor
Author
|
/azpw ms_conflict |
saiarcot895
added a commit
to saiarcot895/sonic-buildimage
that referenced
this pull request
May 19, 2025
This reverts commit fd1a0ea.
qiluo-msft
pushed a commit
that referenced
this pull request
May 21, 2025
DavidZagury
pushed a commit
to DavidZagury/sonic-buildimage
that referenced
this pull request
May 27, 2025
This reverts commit fd1a0ea.
mssonicbld
added a commit
to mssonicbld/sonic-buildimage
that referenced
this pull request
May 27, 2025
Reverts sonic-net#19952 due to regressions in KVM debug image size, issues with certain containers (such as docker-sysmgr) not getting updated, and docker-sonic-telemetry having the wrong base container.
mssonicbld
added a commit
that referenced
this pull request
May 28, 2025
Reverts #19952 due to regressions in KVM debug image size, issues with certain containers (such as docker-sysmgr) not getting updated, and docker-sonic-telemetry having the wrong base container.
saiarcot895
added a commit
to saiarcot895/sonic-buildimage
that referenced
this pull request
Jul 1, 2025
…onic-net#22647) This reverts commit a83309f.
9 tasks
This was referenced Feb 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why I did it
On newer versions of Docker, only the buildkit builder is supported, and
cannot be disabled by setting DOCKER_BUILDKIT to 0. The side effect of
this is that the behavior of
--squashis different (seemoby/moby#38903). This will result in the container sizes being
significantly higher.
Work item tracking
How I did it
To work around this, make all of our builds two-stage builds, with the
--squashflag entirely removed. The way this works is in the firststage, whatever new files/packages need to be added are added (along
with files/packages that need to be removed). Then, in the second stage,
use rsync to copy over the changed files as a single command/layer. In the
case of the base layer for each Debian version, the final result of the first
stage will be copied into an empty base layer.
As part of this, also consolidate the container cleanup code into
post_run_cleanup, and remove it from the individual containers (forconsistency). Also experiment a bit with not needing to explicitly
install library dependencies, and let apt install it as necessary. This
will help during upgrades in the case of ABI changes for packages.
Also, remove the
SONIC_USE_DOCKER_BUILDKIToption, and don'tset
DOCKER_BUILDKIToption. This option will eventually have no impact.This also means that builds will now use buildkit, as that is the default now.
How to verify it
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)