Skip to content

Update dockerfiles to do staged builds#19952

Merged
yxieca merged 30 commits intosonic-net:masterfrom
saiarcot895:update-dockerfiles
May 14, 2025
Merged

Update dockerfiles to do staged builds#19952
yxieca merged 30 commits intosonic-net:masterfrom
saiarcot895:update-dockerfiles

Conversation

@saiarcot895
Copy link
Copy Markdown
Contributor

@saiarcot895 saiarcot895 commented Aug 19, 2024

Why I did it

On newer versions of Docker, only the buildkit builder is supported, and
cannot be disabled by setting DOCKER_BUILDKIT to 0. The side effect of
this is that the behavior of --squash is different (see
moby/moby#38903). This will result in the container sizes being
significantly higher.

Work item tracking
  • Microsoft ADO (number only): 29277538

How I did it

To work around this, make all of our builds two-stage builds, with the
--squash flag entirely removed. The way this works is in the first
stage, whatever new files/packages need to be added are added (along
with files/packages that need to be removed). Then, in the second stage,
use rsync to copy over the changed files as a single command/layer. In the
case of the base layer for each Debian version, the final result of the first
stage will be copied into an empty base layer.

As part of this, also consolidate the container cleanup code into
post_run_cleanup, and remove it from the individual containers (for
consistency). Also experiment a bit with not needing to explicitly
install library dependencies, and let apt install it as necessary. This
will help during upgrades in the case of ABI changes for packages.

Also, remove the SONIC_USE_DOCKER_BUILDKIT option, and don't
set DOCKER_BUILDKIT option. This option will eventually have no impact.
This also means that builds will now use buildkit, as that is the default now.

How to verify it

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saiarcot895
You can use debian slim images to reduce final image size as it was suggested here: #19008.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I want to keep the focus of this PR on unblocking Docker upgrades, but I had to bring in some space optimization stuff (see the COPY at the end of this file) to get things to work.

saiarcot895 and others added 14 commits January 9, 2025 08:55
On newer versions of Docker, only the buildkit builder is supported, and
cannot be disabled by setting DOCKER_BUILDKIT to 0. The side effect of
this is that the behavior of `--squash` is different (see
moby/moby#38903). This will result in the container sizes being
significantly higher.

To work around this, make all of our builds two-stage builds, with the
`--squash` flag entirely removed. The way this works is in the first
stage, whatever new files/packages need to be added are added (along
with files/packages that need to be removed). Then, in the second stage,
all of the files from the final state of the first stage are copied to
the second stage.

As part of this, also consolidate the container cleanup code into
`post_run_cleanup`, and remove it from the individual containers (for
consistency). Also experiment a bit with not needing to explcitly
install library dependencies, and let apt install it as necessary. This
will help during upgrades in the case of ABI changes for packages.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This shouldn't be committed

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
The docker root cleanup is removing the contents of the docker root
directory we create from within a container. However, this container
isn't using the container registry variable, which menas it may fail
depending on the network environment. Fix this by prefixing the
container registry variable

The docker root directory creation is missing the `shell` at the
beginning, which means the directory doesn't actually get created. While
the docker command later will still create the directory automatically,
fix this and make sure it gets created here.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
It seems that on the Bullseye slave container (not sure about Buster),
the nofile ulimit is set to 1048576:1048576 (as in, 1048576 for both the
soft and hard limit). However, the Docker startup script in version 25
and newer sets the hard limit to 524288 (because of
moby/moby#c8930105b), which fails because then the soft limit will be
higher than the hard limit, which doesn't make sense.  However, on a
Bookworm slave container, the nofile ulimit is set to
1024:1048576, and the startup script's ulimit command goes through.

A simple workaround would be to explicitly set the nofile ulimit to be
1024:1048576 for all slave containers. However, sonic-swss's tests needs
more than 1024 file descriptors open, because the test code doesn't
clean up file descriptors at the end of each test case/test suite. This
results in FD leaks.

Therefore, set the ulimit to 524288:1048576, so that Docker's startup
script can lower it to 524288 and swss can open file descriptors.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
With the new approach of building the images (where the entire final
rootfs is copied into the second stage), if the system building the
containers is using the overlay2 storage driver (which is the default)
and is able to use native diffs (which could be true if
CONFIG_OVERLAY_FS_REDIRECT_DIR isn't enabled in the kernel), then the
final result of the image will be different than if naive diffs (where
Docker compares the metadata of each file and, if needed, the contents
to find out if something has changed) were used. Specifically, with
native diffs, each container would be much larger, since technically
speaking, the whole rootfs is being written to, even if the content ends
up the same. This appears to be a known issue (in some form), and
workarounds are being thought of in moby/moby#35280.

As a workaround, install rsync into the base container, copy the
entirely of that into an empty base image, and use rsync to copy only
the changed files into the layer in one shot. This does mean that rsync
will remain installed in the final built containers, but hopefully this
is fine.

Signe-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Copy Markdown
Contributor Author

/azpw run Azure.sonic-buildimage

@mssonicbld
Copy link
Copy Markdown
Collaborator

/AzurePipelines run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Copy Markdown
Contributor Author

/azpw ms_conflict

1 similar comment
@liushilongbuaa
Copy link
Copy Markdown
Contributor

/azpw ms_conflict

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
@mssonicbld
Copy link
Copy Markdown
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines
Copy link
Copy Markdown

Azure Pipelines successfully started running 1 pipeline(s).

@saiarcot895
Copy link
Copy Markdown
Contributor Author

/azpw ms_conflict

@yxieca yxieca merged commit fd1a0ea into sonic-net:master May 14, 2025
19 checks passed
@saiarcot895 saiarcot895 deleted the update-dockerfiles branch May 16, 2025 20:50
saiarcot895 added a commit to saiarcot895/sonic-buildimage that referenced this pull request May 19, 2025
qiluo-msft pushed a commit that referenced this pull request May 21, 2025
Reverts #19952 due to regressions in KVM debug image size, issues with certain containers (such as docker-sysmgr) not getting updated, and docker-sonic-telemetry having the wrong base container.
DavidZagury pushed a commit to DavidZagury/sonic-buildimage that referenced this pull request May 27, 2025
mssonicbld added a commit to mssonicbld/sonic-buildimage that referenced this pull request May 27, 2025
Reverts sonic-net#19952 due to regressions in KVM debug image size, issues with certain containers (such as docker-sysmgr) not getting updated, and docker-sonic-telemetry having the wrong base container.
mssonicbld added a commit that referenced this pull request May 28, 2025
Reverts #19952 due to regressions in KVM debug image size, issues with certain containers (such as docker-sysmgr) not getting updated, and docker-sonic-telemetry having the wrong base container.
saiarcot895 added a commit to saiarcot895/sonic-buildimage that referenced this pull request Jul 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants