
push: improve performance by 3 to 5 times, use gzip level 1 by default #48106

Open
morotti wants to merge 1 commit into moby:master from man-group:compression-fix

Conversation


@morotti morotti commented Jul 1, 2024

Hello,

Could you correct the compression level used by docker push?

There have been open issues about this for 11 years and it has not been fixed. The fix is trivial and takes one line of code. #1266

If you want to do it very cleanly, you could add a setting allowing users to specify a compression level, either 1-9 or 0-9.
(Level 0 is no compression; it's debatable whether no compression should be allowed, because it affects storage on the container registry, whether Docker Hub or an internal registry.)
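If such a setting were added, one natural shape would be a daemon configuration key. A purely hypothetical sketch (no such key exists today; the name is made up for illustration):

```
{
  "push-compression-level": 1
}
```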

docker push compresses the image on the fly with gzip during the push.
gzip compression is slow, and the default level 6 is very slow, a poor performance/compression tradeoff.
I see gzip compression limited to 12 MB/s on my servers.
I see docker push taking whole minutes at the end of builds, with all the time spent in gzip compression during the push. It's single-threaded compression, by the way.
For people working with GPUs and ML, images can approach 10 GB of content these days. It takes 10-20 minutes to push an image, and all that time is taken by the inefficient gzip compression.

Rather than the default level 6, you should use level 1, which should make compression 3 to 5 times faster at the cost of a 1-2% larger image.
You may want to run a build of docker with this fix and try a few images to see how much difference you get.

I've had a look at the Go compressor in the standard library (compress/gzip).
From what I have gathered, at the time docker was written and the old tickets were discussed, circa 2015, the Go standard library was bugged: setting the compression level did not work. Levels barely made any difference, when levels 9 to 1 are supposed to span roughly 10 to 100 MB/s on compression (or that order of magnitude). The Go standard library was fixed a few years later.

(There are other improvements to use lz4/zstd or to compress on multiple threads, but that is outside of the scope of this issue, and a much bigger piece of work.)

fixes: #1266

- Description for the changelog

push: improve performance, use gzip compression level 1 instead of 6

- A picture of a cute animal (not mandatory but encouraged)

image

@morotti morotti force-pushed the compression-fix branch 2 times, most recently from 107e383 to 555abaf Compare July 1, 2024 16:36
Signed-off-by: rmorotti <romain.morotti@man.com>
@morotti morotti force-pushed the compression-fix branch from 555abaf to 70463c3 Compare July 1, 2024 16:38
Author

morotti commented Jul 1, 2024

(the commit is now signed off, so the git signature check passes)

Contributor

@vvoland vvoland left a comment


+1 on the change; however, I think we should still provide an option to customize this (or at least to get the original behavior back).

@vvoland vvoland added area/distribution Image Distribution kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/performance labels Jul 2, 2024
@thaJeztah
Member

Yes, I think this needs a closer look;

  • This flow would only work when not using the containerd image store; with the containerd image store, layers are kept in their compressed format, so either the layers as pulled from the registry, or layers as created by docker build.
    • ^^ In that situation, no compression happens during push
    • ^^ Given that our goal is to ultimately move towards the containerd image store as default, we should be careful implementing features for the non-containerd situation if we don't have a design yet for the containerd image store, otherwise we will regress on features in the new situation.
  • We must have a close look at the behavior with this change. This part is slightly more "complicated";
    • When pushing an image to the same registry it was pulled from, the engine has an optimisation to check whether the content is already there, in which case no compressed layers are pushed (and thus none are created).
    • ☝️ we need to double-check the behavior there
    • 👉 because if changing the compression here would unconditionally compress layers at the new level, base layers would always be created with a new digest
    • 👉 so instead of "push only the layers I created on top of my FROM <some base image> image", we could now end up with "push new versions of ALL layers, including <base image>", and such layers would no longer be shared when pulling.

Member

@thaJeztah thaJeztah left a comment


This needs further discussion (see above)


fersarr commented Jul 11, 2024

Hi @thaJeztah, thanks for having a look! This change is quite important for us too, as docker push currently takes a really long time. I saw you added the "1-design-review" label; what does that mean? Does it imply the change will be reviewed further by the Docker team, or do you expect @morotti to provide anything else? Would it help if this change was off by default and configurable?

roolebo added a commit to roolebo/moby that referenced this pull request Aug 14, 2024
gzip compression is notoriously slow. On Skylake, compression runs at
7 MiB/s. That push rate is not sustainable for large multi-gigabyte
images.

zstd/fastest runs at 81 MiB/s for the same image. That gives a tenfold
push speedup with default flags.

Resolves: moby#1266
See also: moby#48106

Signed-off-by: Roman Bolshakov <rbolshakov@ddn.com>

Labels

area/distribution Image Distribution area/performance kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/1-design-review
