docker squash: Consolidate image layers #13929

Closed
jlhawn wants to merge 6 commits into moby:master from jlhawn:layer_squash

@jlhawn
Contributor

@jlhawn jlhawn commented Jun 13, 2015

This patch adds a new CLI command:

$ docker squash --help

Usage: docker squash [OPTIONS] IMAGE [ANCESTOR]

Merge filesystem layers of an image into a new image

  --help=false        Print usage
  --no-trunc=false    Don't truncate output

Example use cases:

  • Consolidating all layers of an image into a new single-layered image.

    Let's inspect the history of the busybox image.

    $ docker history busybox
    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    8c2e06607696        8 weeks ago         /bin/sh -c #(nop) CMD ["/bin/sh"]               0 B                 
    6ce2e90b0bc7        8 weeks ago         /bin/sh -c #(nop) ADD file:8cf517d90fe79547c4   2.43 MB             
    cf2616975b4a        8 weeks ago         /bin/sh -c #(nop) MAINTAINER Jérôme Petazzo     0 B
    

    It has 3 layers total, but we can now squash it and tag it as something else.

    $ docker squash --tag better_busybox busybox
    ccbe958582b2
    

    Now we have these 2 images.

    $ docker images
    REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
    better_busybox      latest              ccbe958582b2        About a minute ago   2.43 MB
    busybox             latest              8c2e06607696        8 weeks ago          2.43 MB
    

    But better_busybox only has a single layer in its history.

    $ docker history better_busybox
    IMAGE               CREATED              CREATED BY          SIZE                COMMENT
    ccbe958582b2        About a minute ago                       2.43 MB 
    
  • Merging layers of a newly built image into a single layer relative to the base image.

    Let's build an image using this Dockerfile:

    FROM busybox:latest
    
    MAINTAINER Josh Hawn <josh.hawn@docker.com>
    
    ADD docker/docs /docs
    RUN echo hello > /hello.txt

    $ docker build -t test_build .
    Sending build context to Docker daemon 93.87 MB
    Step 0 : FROM busybox:latest
     ---> 8c2e06607696
    Step 1 : MAINTAINER Josh Hawn <josh.hawn@docker.com>
     ---> Running in 0c2b29e087d5
     ---> 4ac4b75a88f6
    Removing intermediate container 0c2b29e087d5
    Step 2 : ADD docker/docs /docs
     ---> c9e50d91bc69
    Removing intermediate container ad9e921d943f
    Step 3 : RUN echo hello > /hello.txt
     ---> Running in e15283387c37
     ---> 49644b91f444
    Removing intermediate container e15283387c37
    Successfully built 49644b91f444
    

    And inspect the history of this new image:

    $ docker history test_build
    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    49644b91f444        24 minutes ago      /bin/sh -c echo hello > /hello.txt              6 B                 
    c9e50d91bc69        24 minutes ago      /bin/sh -c #(nop) ADD dir:24ae8a736955084e39d   8.537 MB            
    4ac4b75a88f6        24 minutes ago      /bin/sh -c #(nop) MAINTAINER Josh Hawn <josh.   0 B                 
    8c2e06607696        8 weeks ago         /bin/sh -c #(nop) CMD ["/bin/sh"]               0 B                 
    6ce2e90b0bc7        8 weeks ago         /bin/sh -c #(nop) ADD file:8cf517d90fe79547c4   2.43 MB             
    cf2616975b4a        8 weeks ago         /bin/sh -c #(nop) MAINTAINER Jérôme Petazzo     0 B
    

    See that it has 6 layers. That's 3 more than were already in the base image. We can squash these 3 into an image with one additional layer relative to the base image, and re-tag this new image.

    $ docker squash --tag test_build test_build busybox
    

    Now test_build only has a single new layer on top of the original 3 layers from busybox:

    $ docker history test_build
    IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
    97561533b0fa        2 minutes ago                                                       8.537 MB            
    8c2e06607696        8 weeks ago         /bin/sh -c #(nop) CMD ["/bin/sh"]               0 B                 
    6ce2e90b0bc7        8 weeks ago         /bin/sh -c #(nop) ADD file:8cf517d90fe79547c4   2.43 MB             
    cf2616975b4a        8 weeks ago         /bin/sh -c #(nop) MAINTAINER Jérôme Petazzo     0 B
    

@thaJeztah
Member

Thanks @jlhawn! Before we proceed, I'm interested to hear whether @crosbymichael agrees with this implementation, based on #332 (comment)

To relieve the immediate pain, we also suggest updating the documentation on how to manually squash layers with external tools, if people really need it. @crosbymichael volunteered to write such a tool.

ping @crosbymichael interested in your opinion, to prevent giving people "false hope"

@jlhawn
Contributor Author

jlhawn commented Jun 14, 2015

@thaJeztah I'm happy to wait to see what the rest of the team thinks, but I'll go ahead and implement it anyway for the following reasons:

  • I think there's a need for much more than just 'absolute' squashing (which can be done manually today with docker export, but that is kind of a hack). With 'relative' layer squashing you still get all of the benefits of shared rootfs layers.
  • I think the functionality can be added to Docker much more elegantly than any external tool because it's already really simple to do with the graphdriver interface.
  • I'm motivated to do it now so I'd rather not wait and potentially be discouraged.
  • It's fun.

@jsm

jsm commented Jun 14, 2015

@heph

@thaJeztah
Member

I'm motivated to do it now so I'd rather not wait and potentially be discouraged. It's fun.

👍 don't let me stop you!

@jlhawn jlhawn force-pushed the layer_squash branch 3 times, most recently from 44b9636 to 75aec15 Compare June 15, 2015 23:13
@jlhawn jlhawn changed the title [WIP] Image Layer squash command Image Layer squash command Jun 15, 2015
@jlhawn jlhawn changed the title Image Layer squash command Image Layer squash command: docker squash Jun 15, 2015
@jlhawn jlhawn changed the title Image Layer squash command: docker squash docker squash: Consolidate image layers Jun 15, 2015
@jlhawn jlhawn added the kind/proposal, area/cli, area/distribution, and kind/feature labels Jun 15, 2015
@jlhawn
Contributor Author

jlhawn commented Jun 16, 2015

I've added integration tests and docs, so this is no longer a work-in-progress.

@vincentwoo @tianon @cpuguy83 @SvenDowideit @thaJeztah @yosifkit @rhatdan please take a look and/or try it out and let me know what you think.

Contributor

why have a deprecated option on a new cmd?

Contributor Author

oh, is that what the #notrunc means? I actually just copied this section from the implementation of the docker history command. 😞

...

Just got back from reading the comments/docs on pkg/mflag. It all makes sense now 😵

@duglin
Contributor

duglin commented Jun 16, 2015

Oops, sorry, I jumped the gun on the design review. Wanna squash (LOL) your commits? :-)

@vincentwoo
Contributor

@jlhawn thanks for going the distance on this one. I'm glad I pinged you.

As an end user who would start using this immediately, I'd also want to know:

  • Can we add an easy -t option for tagging? There's no situation in which I'd use this and not immediately tag the result.
  • Does this handle whiteouts efficiently? A lot of the charm is I can unroll all my long RUN commands with && into individual RUNs, but only if the layers that get deleted don't actually end up in the final image.
  • Does it handle all the metadata merging, too? ENV, USER, etc?

@jlhawn
Contributor Author

jlhawn commented Jun 17, 2015

  • Can we add an easy -t option for tagging? There's no situation in which I'd use this and not immediately tag the result.

Yeah, I understand that can be pretty common. I should have a little extra time to add this.

  • Does this handle whiteouts efficiently? A lot of the charm is I can unroll all my long RUN commands with && into individual RUNs, but only if the layers that get deleted don't actually end up in the final image.

whiteouts - now there's an implementation detail that shouldn't be exposed!

The efficiency of squashing layers depends solely on how many files and directories are in the filesystem. It uses the same method that all storage drivers (aside from AUFS) use to generate a layer diff against a parent layer: mount both the image rootfs and its ancestor, then perform a filepath walk of both, detecting all additions, changes, and deletions. The more there is in either filesystem, the longer squashing will take. It's really no different than docker commit in that respect.

  • Does it handle all the metadata merging, too? ENV, USER, etc?

Yes, the resulting image will have the same runtime config as the original image.

@vincentwoo
Contributor

Ah, what I meant by "efficiently" was "removed nodes do not end up in the squashed final image", rather than "does the squash go fast". I take it from your answer that they do not, which is dope!

@jlhawn
Contributor Author

jlhawn commented Jun 17, 2015

Oh, sorry. Yeah, they will not end up in the resulting image. Keep in mind, though, that squashing does require additional disk space (on the machine that builds and squashes the layers) as it effectively copies everything that was in the final image. So you will need to delete the unused original layers to reclaim that disk space.

@vincentwoo
Contributor

Sounds great. There are quite a few things that I think will make this CLI tool better, but they're all pointless unless this gets merged. +1 to getting this in!

@icecrime icecrime removed the area/cli, area/distribution, kind/feature, and kind/proposal labels Jun 17, 2015
@rhatdan
Contributor

rhatdan commented Jun 17, 2015

I also want to get this in. We have a home-grown version of this running for our internal builds because of the number of layers required to build some JBOSS applications. Getting a standard way is a big plus.

Josh Hawn added 5 commits June 18, 2015 09:11
graph.Squash consolidates all layers in between the given image and an
ancestor image into a single layer and returns the newly created image. A nil
ancestor argument indicates that the layer should be squashed all the way to
a single new base image.

tagStore.Squash first performs a lookup of a given image name or ID and the
ancestor name or ID before calling graph.Squash. This allows you to use
image tags rather than IDs to squash image layers.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
A new API endpoint is needed to remotely request that a new image be created
by consolidating the layers of one image or the layers between an image and
an ancestor image. Image IDs or names/tags are accepted.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
This new CLI command allows users to merge the layers of an image into a
new image, either all layers or up to an ancestor layer.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Tests absolute and relative squashing as well as squash conditions which
should result in no-ops returning the same image.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Updates API v20 markdown as well as adds manpage markdown for
`docker squash` and adds `squash` to the cli reference markdown.

Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
@jlhawn
Contributor Author

jlhawn commented Jun 18, 2015

@vincentwoo @duglin

  • Can we add an easy -t option for tagging? There's no situation in which I'd use this and not immediately tag the result.

Done 😃

@vincentwoo
Contributor

@jlhawn DockerCon's over. What's it gonna take to get this bad boy merged?

@jessfraz jessfraz added the area/distribution label and removed the Distribution label Jul 10, 2015
@vincentwoo
Contributor

@jlhawn can you give us an update on the status of this PR? Is it still awaiting review?

@tiborvass tiborvass added the status/needs-attention label Jul 31, 2015
@tiborvass
Contributor

@jlhawn I think this will have to be reevaluated/rethought once the client-side builder is in place. WDYT? Would you mind closing it?

@tiborvass tiborvass removed the status/needs-attention label Aug 6, 2015
@icecrime
Contributor

icecrime commented Aug 7, 2015

  1. We want to move the builder out first
  2. Layers are an implementation detail that should not leak in the UX

Sorry, closing this.

@icecrime icecrime closed this Aug 7, 2015
@vincentwoo
Contributor

Does that mean you're nixing stuff like docker history et al?

@jlhawn jlhawn deleted the layer_squash branch November 22, 2016 19:05