Proposal: Add a NOCACHE instruction to Dockerfiles #10682

Closed
duglin wants to merge 1 commit into moby:master from duglin:1996-NoCache

Conversation

@duglin
Contributor

duglin commented Feb 10, 2015

This adds a NOCACHE instruction to Dockerfiles which will disable all
caching from that point forward. The cache is still populated, but the
look-up processing is disabled.

Closes #1996

Signed-off-by: Doug Davis <dug@us.ibm.com>

@curtiszimmerman

I'll definitely use it. Thank you.

@tianon
Member

tianon commented Feb 10, 2015

I'm still very strongly -1 on this (#1996 (comment)):

I have yet to see a good example of a Dockerfile that couldn't be rewritten to cache-bust appropriately and naturally when the underlying resource needed to be updated.

Having a flag on "docker build" to stop the cache in a particular place would be much more flexible, IMO (and put the control of the cache back into the hands of the system operator who gets to manage that cache anyhow).

@duglin
Contributor Author

duglin commented Feb 10, 2015

@tianon how does someone automatically bust the cache at a certain spot in the Dockerfile w/o touching the Dockerfile each time?

@tianon
Member

tianon commented Feb 10, 2015

Why would you need to do that, and why wouldn't that be covered more flexibly by docker build --bust-cache=.*apt-get.*?

For example, https://github.com/docker/docker/blob/f4dc496d36d31cf9ca1b3508f10954066ff7f8bc/Dockerfile#L51 and https://github.com/docker/docker/blob/f4dc496d36d31cf9ca1b3508f10954066ff7f8bc/Dockerfile#L119 and https://github.com/docker/docker/blob/f4dc496d36d31cf9ca1b3508f10954066ff7f8bc/Dockerfile#L127. Using these techniques instead ensures that everyone either gets a Dockerfile that won't build, or an image that is functionally equivalent -- there's not much wiggle room in between. We also get a nice, natural cache-bust if those underlying resources need to change because the Dockerfile itself needs to change, which IMO makes good, reasonable sense.

@duglin
Contributor Author

duglin commented Feb 10, 2015

@tianon reading thru the issue it's clear to me that there are a number of people who want the ability to turn off caching at a certain point in the Dockerfile processing every time. Asking them to modify their Dockerfile each time, while it may be possible, isn't very user friendly.

As to your docker build --bust-cache=.*apt-get.* example, there are a couple of problems with this:
1 - it's not very user friendly. It would require all instances of "docker build" to be modified if the RUN command were modified and a new regex was needed. Or, just if the author of the Dockerfile wants to move the 'bust cache' line to some place else - they would need a mechanism in place to automatically notify all users of their Dockerfile - that can be quite hard.
2 - it doesn't work for the DockerHub + github integration model. As I understand it, although I haven't used it myself, the hub just does a docker build on your repo/Dockerfile. There's no mechanism in place to pass in additional flags. But, even if there were, you now require additional information to be shared with all users of this Dockerfile to ensure everyone busts their cache at the right spot.

Unless I'm missing something, the links you included don't show how to bust the cache each time w/o modifying the Dockerfile, and that's what I'm interested in seeing because if that's possible then this PR isn't needed (I think).

I'd like to understand why you think it's so bad to have something within the Dockerfile say "stop caching every time right here"? If that's what people need, as expressed by the original issue, I'm having a hard time understanding why we wouldn't offer a nice/easy solution. It's not like it breaks some fundamental Docker philosophy, does it?

Now, if we want to discuss why this feature is needed at all, then I think that would be good. While I implemented it to help scratch an itch that people had, I have to be honest and say that I still don't fully understand when people need it. But I'm willing to accept that not everyone has the same needs that I do :-)

@thaJeztah
Member

Another creative way to bust the cache would be;

COPY cache-buster cache-buster
RUN do-some-stuff

to bust the cache run

touch cache-buster && docker build ....

Not sure about the hub, does it use caching at all? Especially when it does a checkout of the GitHub repo, the files would always be newer, thus not cached.

@tianon
Member

tianon commented Feb 10, 2015 via email

@duglin
Contributor Author

duglin commented Feb 10, 2015

@thaJeztah the issue I can see with that is that it still requires work by the person kicking off the build. I think any solution that requires the person kicking off the build to have "extra" knowledge is probably not the right solution. As I understand it, the requirement is that the Dockerfile author wants to have control (bust it here!).

@thaJeztah
Member

@duglin well, the "cache-buster" enables the user to decide if the cache has to be busted.

If I want the cache to be always busted from a specific point onwards, my usual approach is just;

FROM my-base-image-that-contains-things-that-DONT-change-often

And then docker build --no-cache. Now that we have docker build -f, Dockerfiles for the "base" and "extends" image can even reside in the same location, so it's easy to maintain as well.
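
A minimal sketch of that split, assuming hypothetical file names (Dockerfile.base, Dockerfile.app), image tags (my-base-image, my-app), a debian:jessie base and an example.com repository URL. The script only writes the two Dockerfiles and prints the build commands rather than invoking Docker:

```shell
#!/bin/sh
# Sketch only: file names, image tags, base image and repository URL are hypothetical.
set -e

workdir=$(mktemp -d)

# Slow-changing layers live in the base image, which is built with the cache.
cat > "$workdir/Dockerfile.base" <<'EOF'
FROM debian:jessie
RUN apt-get update && apt-get install -y git
EOF

# Fast-changing steps extend the base image and are rebuilt every time.
cat > "$workdir/Dockerfile.app" <<'EOF'
FROM my-base-image
RUN git clone https://example.com/app.git /app
EOF

# Print the commands instead of running them, so no Docker daemon is needed.
base_cmd="docker build -f $workdir/Dockerfile.base -t my-base-image $workdir"
app_cmd="docker build --no-cache -f $workdir/Dockerfile.app -t my-app $workdir"
echo "$base_cmd"
echo "$app_cmd"
```

Only the second command carries --no-cache, so the expensive base layers are reused while everything in Dockerfile.app is re-executed.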

Please, don't see my comments as negative, I appreciate your attempt to solve this, I just think most cases can already be solved with the existing options, and those options are not even that hacky.

@duglin
Contributor Author

duglin commented Feb 10, 2015

@thaJeztah your first approach puts control in the hands of the user; my understanding is that the requirement is that it's in the control of the Dockerfile author. As for the "create a non-changing base image" approach... if that works for people I'll close the issue. But, it feels like a hack and makes people jump through a lot of hoops when a simple NOCACHE seems to do the trick.
But I'd like to hear from the people who claim they need this.

@yosifkit
Contributor

I definitely agree with @tianon. While you should design your file with the ability to run --no-cache while maintaining consistency, that should not be thrust upon you from the Dockerfile. I would be far more interested in a docker build flag that stops the cache on a matching line like @tianon suggested (as well as shykes and crosbymichael in previous discussions). It seems the only reason we do not have the --break-cache flag is that the PR was abandoned (#4322).

I still don't understand why you would have users of the Dockerfile needing a forced cache-bust. Shouldn't they either a) be using the image with docker pull or b) be understanding the Dockerfile (ie. know how to break the cache if they need it).

I would love a concrete example where this would actually be more useful than the flag.

jessfraz changed the title from "Add a NOCACHE instruction to Dockerfiles" to "Proposal: Add a NOCACHE instruction to Dockerfiles" on Feb 10, 2015
@curtiszimmerman

Your "know how to break the cache if they need it" is where the ugly kludge comes in of writing in some arbitrary data into the Dockerfile that gets changed when the cache needs to be abandoned. A flag is far more user-friendly. Regex in the build command is a headache waiting to happen, imo. What's the harm in a Dockerfile flag, seriously?

@duglin
Contributor Author

duglin commented Feb 10, 2015

I think it would really help (at least for me) if someone could enumerate some of the use cases that require invalidating the cache 1/2 thru a Dockerfile every single time it is processed.

@curtiszimmerman

Git repo updates. Having to write a commit hash into a Dockerfile which changes and therefore triggers an actual "git clone" is an awful practice.

@curtiszimmerman

There are at least a couple of use cases around in the various requests for this feature. At least I know you've seen a couple suggested in one of the issues we've talked it over in.

@cpuguy83
Member

@curtiszimmerman I think that's a perfectly acceptable/reasonable thing to do.
You could also put your commit hash into a file that gets added in, so you just have to update the file with the new hash, and not the Dockerfile itself.
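
A rough sketch of the hash-in-a-file variant, assuming a hypothetical file name (app.commit), repository URL, base image and placeholder hash. It writes the files only and does not call Docker:

```shell
#!/bin/sh
# Sketch only: app.commit, the repository URL and the base image are hypothetical.
set -e

workdir=$(mktemp -d)

# Placeholder hash; in practice this could come from `git rev-parse HEAD`.
commit="0123456789abcdef0123456789abcdef01234567"
printf '%s\n' "$commit" > "$workdir/app.commit"

cat > "$workdir/Dockerfile" <<'EOF'
FROM debian:jessie
RUN apt-get update && apt-get install -y git
# A change in app.commit's contents invalidates the cache from this step on.
COPY app.commit /tmp/app.commit
RUN git clone https://example.com/app.git /app \
    && cd /app \
    && git checkout "$(cat /tmp/app.commit)"
EOF

echo "wrote $workdir/Dockerfile pinned to $commit"
```

The Dockerfile itself never changes; only app.commit is rewritten when a new commit should be pulled.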

I agree with @tianon that embedding the control of the caching into the Dockerfile itself feels wrong.

@duglin
Contributor Author

duglin commented Feb 11, 2015

If we want control to be in the hands of the builder then it seems like some of the solutions mentioned would work just fine - even saying "if you don't like what the Dockerfile defines then modify it". This PR is more about letting the Dockerfile author set an initial baseline for what's supposed to happen w/o requiring intervention from the builder.

I'd really like to understand why it feels wrong to specify "don't use the cache" from within the Dockerfile if the authors know they'll want it run/updated/etc... every time. People are clearly asking for this so why make it harder on them to do what they want?

Contributor

it would be nice to have an example why someone would choose to do this

Contributor Author

done!

@curtiszimmerman

@cpuguy83 I really disagree. Dockerfiles are state transformation vehicles. I'm operating on an image, permuting through states until I get to an end state I am happy with. As an image builder, I want maximum control over this process. But what you're suggesting is that it's reasonable to force people to use a third-party state vehicle (e.g. sed -ri "s/%HASH%.*?%XHASH%/%HASH%\necho $(git rev-parse HEAD)\n%XHASH%/" ./Dockerfile && docker build . -- and who knows if that's right, it's just awful to think about and look at) just to convey a desired state back into the first. Why can't we just skip the scenic route? This seems like really a lot of trouble just to get docker to pull new code.

@duglin
Contributor Author

duglin commented Feb 11, 2015

Been thinking more about this, and it seems to me that there might be two different requirements at play here:

  1. Dockerfile authors want control over when to turn off caching while processing a Dockerfile
  2. User/builder wants control over when caching is turned off while processing a Dockerfile

While these both end up tweaking the "UtilizeCache" flag during the build process, they are quite a bit different w.r.t. who is being given control. And unfortunately I don't think there's a solution that can satisfy both requirements. After all, they are different users working with a different set of tools (vi/Dockerfile vs docker build) and working at different points in the workflow.

With that in mind, I think we may need two different solutions:

  1. This PR (I think) satisfies the first requirement. So, perhaps we first need to decide if people accept the requirement. IOW, do folks like @tianon @cpuguy83 even accept that Dockerfile authors (not builders) need this level of control? If we can't get agreement on this then we might as well close this PR. But, personally, given the number of people asking for this (and the painful hacks that have been mentioned) I think we'd be doing a disservice to them if we rejected it. But in the end, my goal here is to resolve the issue one way or another (gotta get that issue count down! :-) )

  2. Provide a flag on the docker build command to say "turn off caching at this spot in the Dockerfile". This one might have two flavors to it:
    2a) Builder can modify Dockerfile. Whether it's to add some change to force the cache to break, or whether it's to add some "label" that people can then reference (e.g. docker build --bust=label1) is still TBD, but the high-order bit here to me is that they can modify the Dockerfile.
    2b) Builder can not modify the Dockerfile. I'm thinking of cases where the Dockerfile (and the build context) is pulled in via a remote URL/git-repo and so asking them to modify the Dockerfile might be too much to ask. In those cases I think a solution like @tianon's regex suggestion is a good fit here.
    However, I think this solution is different enough that it should be done under a different PR. I'm willing to work on that PR if there's interest but I do think it should be done separately.

Sorry for the rambling, but I think the net here is:

  • does Docker accept the first requirement? (author getting control)
  • do people need the 2nd PR? (a mechanism in which the builder has control)

@tianon
Member

tianon commented Feb 11, 2015

I'm definitely still personally -1 on that first one.

I'm indifferent on the second -- in my experience, combining "rmi" and the rare "--no-cache" solves all my cache problems, as @thaJeztah summarized so nicely previously in this thread (and it's definitely even easier to solve this way now that we have "docker build -f").

IMO, the cache isn't a feature a Dockerfile author should be counting on -- it's a convenience provided to speed up subsequent builds.

https://en.wikipedia.org/wiki/Cache_%28computing%29

In computing, a cache is a component that transparently stores data so that future requests for that data can be served faster.

@duglin
Contributor Author

duglin commented Feb 11, 2015

@tianon I believe the first one is exactly related to your comment about not counting on the cache. I think people are talking about the case where the Dockerfile author knows that using a cache would result in the wrong output so relying on the builder to use --no-cache (or something else to bust it) is exactly what NOCACHE is trying to address. So, it addresses the exact opposite problem - instead of trying to use (or rely on) a cache people are saying "we don't ever want to use the cache" and want that baked into their instructions for Docker build.

Also, since you're still -1 on the first one, I have to ask you to elaborate on why you think its an invalid usecase/requirement.

@cpuguy83
Member

Yes @tianon, that's my thought as well, just more eloquently put... the cache is an implementation detail.
It's unfortunate (and IMHO a bug in the caching mechanism) that the cache leaks through in how we design Dockerfiles today (e.g. RUN apt-get update && apt-get install -y <stuff> && <do some stuff> && apt-get remove --purge <stuff I don't need in the final image>)

@dreamcat4

Hello! Sorry I could not find a more exact issue, but it can solve some of the problems related to this issue:

I believe strongly now that we have the new --opts parsing in place, a thing we can do is slightly different:

What would help is a generic global --no-commit flag that can be valid and applied to any Dockerfile directive, not just RUN. Why?

Well, that is the simplest basic way to implement build-time secrets support. Because then we could ADD --no-commit <secrets_file> /tmp during the build time. And the secret would not be saved / remembered into a next layer. Instead, a subsequent RUN command straight after could use, then delete (discard) the secret file before it has the chance to be committed into a layer.

BUT ALSO: being a general flag, it can actually be used for other purposes too. For example, when ADDing extra build-time files (e.g. to build something), the temporary ADDed stuff CAN be deleted before the next layer commit to keep the image size down and stop it from bloating into some unnecessary multi-GB image. So in effect we would be killing 2 birds with 1 stone there.

I would also like to add that this is a much requested feature, so if anyone else can suggest a better and simpler solution than --no-commit as described above ^^ then please say so.

@duglin
Contributor Author

duglin commented May 17, 2015

I assume a --no-commit would disable the cache from that point on, right?
And using it on the last command in a Dockerfile would basically be like not having run the command at all?

@dreamcat4

@duglin Nope. It would only apply to that specific instruction. Committing would automatically re-enable upon the next instruction. Get it? Because that way, you do not need to write 3 commands to do what can be achieved with only 1 command.

For what you want that would need a more full (and coherent) feature set and look something like a total of 4 related flags:

# One-time command flags that apply on current Docker <CMD> only
<CMD> --no-commit <args>
<CMD> --commit <args>

# The toggle flags
RUN --commit-off true
<CMDS...>
RUN --commit-on true

But I'm not suggesting to actually implement those additional 3 flags to fill out the full feature set. Rather, just leave them in reserve, because they would be extraneous and unnecessary for solving those 2 immediate problems I previously highlighted. Just the --no-commit flag will do the job for those use cases.

Heck I really need that --no-commit flag. Simple build time secrets are not possible without it. My justification (of secrets) being that:

Running a special local secrets server is way too much hassle and totally unjustified for just 1 API key. And not actually feasible when building images on dockerhub. Whilst the other main option of a new docker secrets API would also be a much more complex solution to secrets. And at the moment such a new feature seems to be a big reach to be counting upon. Never mind the fact that a fully-blown secrets API would take ages to implement / test / etc.

@dreamcat4

@duglin Of course! If we do not wish for so many individual flags to maintain, then this exact same (the full feature set) can easily be implemented in just 1 flag. Setting it to one of four possible / recognised values. It would be:

<CMD> --cache=<on|off|true|false>

Again, where --cache=on and --cache=off are the toggle flags. While --cache=true and --cache=false are the transient 1-time-only flags. I think that's the most expressive / powerful syntax I can ever come up with. What do you think?

@ronsmits
Contributor

ronsmits commented Sep 3, 2015

Ok I am lost, Is this going to be there or not?

My use case:

  • jenkins builds a war, deploys it to a repository
  • jenkins kicks off a docker build where the war is retrieved from the repository
  • jenkins restarts the container.

Now I run it with --no-cache to make sure the new war is retrieved. If I could do

NOCACHE
RUN wget .....

I would be certain it is always done right.

@thaJeztah
Member

Is this going to be there or not?

No, this won't be implemented, see: #10682 (comment).

However, we're finalizing a PR to allow build-time parameters (see #15182), which would allow you to do, for example:

# Define "BUILDNUMBER" arg, and (optionally) set a default
ARG BUILDNUMBER=1234
RUN wget https://foobar.example.com/builds/app-1.0.0.$BUILDNUMBER.war

And build it with;

docker build --build-arg BUILDNUMBER=319485 --tag myapp:build-319485 .

Just an example, but demonstrates the possibilities.

@ganey

ganey commented Nov 24, 2015

So, just to clarify, if I'm using a service such as Google Managed VM's, the only way for me to get non cached content from git is to alter my Dockerfile each time?

https://cloud.google.com/sdk/gcloud/reference/preview/app/deploy

These are built based on a supplied Dockerfile, so due to the deployment mechanisms for the Google services, I am not aware of any way to pass any build time parameters to the docker build process, so cannot use --no-cache or the suggested --build-arg or anything like that.

A flag in the Dockerfile that works the same as the --no-cache would allow for dynamic deployment / scaling of the latest code, without any manipulation of the Dockerfile.

@thaJeztah
Member

@ganey that sounds like something to ask Google. For example, automated builds on Docker Hub default to --no-cache, possibly something they could consider.

@ganey

ganey commented Nov 24, 2015

@thaJeztah okay, thanks. Hopefully they'll consider it!

@toots

toots commented Dec 2, 2015

Well I'm late to the party but honestly, this is the kind of situation that I hate to see in open-source projects. Y'all might not see a "real world" use for it or have some kind of weird, intricate ways to do it already but unless you have a real technical opposition to a feature -- for instance if the codebase makes it hard or hacky to implement -- then you should listen to your users.

Anyway, here's my "real world" use case: we want to have the youtube-dl binary as up to date as possible in our images. We could regularly update a build number and rebuild but that's really annoying. Instead, it'd be nice to be able to do:

NOCACHE

RUN curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl

RUN chmod a+rx /usr/local/bin/youtube-dl

And, yeah, we might put this at the very end of the Dockerfile so as to benefit from the cache where we can as well.

Functionally, the binary is always the same but it is updated regularly to adapt to the latest youtube/vimeo layout. There is no reason to force upon us a change in our code.

Also, having an explicit instruction in the Dockerfile makes it explicit what we are doing, instead of relying on implicit techniques that a dev unfamiliar with the technology might overlook or not understand.

@thaJeztah
Member

@toots also check my earlier example using build-time args, which would allow you to break the cache "on demand" by specifying a value for the arg (e.g. current time, current date); #10682 (comment)
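
A rough sketch of that on-demand bust applied to the youtube-dl case, assuming a hypothetical arg name (CACHE_BUST), base image and image tag. The RUN step references the arg explicitly so that a changed value causes a cache miss at that step. The script writes the Dockerfile and prints the build command without running Docker:

```shell
#!/bin/sh
# Sketch only: CACHE_BUST, the base image and the myimg tag are hypothetical.
set -e

workdir=$(mktemp -d)

cat > "$workdir/Dockerfile" <<'EOF'
FROM debian:jessie
RUN apt-get update && apt-get install -y curl
# Steps above stay cached; the step below re-runs whenever CACHE_BUST changes.
ARG CACHE_BUST=none
RUN echo "cache bust: $CACHE_BUST" \
    && curl https://yt-dl.org/latest/youtube-dl -o /usr/local/bin/youtube-dl \
    && chmod a+rx /usr/local/bin/youtube-dl
EOF

# A date-based value changes once per day, so the download is refreshed daily.
cache_bust=$(date +%Y-%m-%d)
build_cmd="docker build --build-arg CACHE_BUST=$cache_bust -t myimg $workdir"
echo "$build_cmd"
```

Using a finer-grained value such as `date +%s` would bust the cache on every build instead of once per day.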

@macmacbr

macmacbr commented Jan 5, 2016

I really like the NOCACHE or even an ALWAYSRUN Dockerfile command for those situations.
We could celebrate the anniversary of this ticket with a new version of docker.
Cheers.

@paul-callahan

as others have mentioned, commands like RUN git pull seem to be always cached when they should never be.

My simple workaround is:
incorrect

edit:

ADD http://www.timeapi.org/utc/now /tmp/bustcache
RUN git pull

@duglin
Contributor Author

duglin commented Feb 2, 2016

@paul-callahan I'm pretty sure that doesn't work since $(date) is evaluated after the cache check is done - and by the shell, not docker.

@cpuguy83
Member

cpuguy83 commented Feb 2, 2016

@paul-callahan Right, RUN is always cached. The command itself is not inspected to determine what it should do.

It might be worthwhile to use --build-arg and then specify the commit/tag that you want to actually pull via the arg.
This will cause a cache bust if the specified --build-arg changes.
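
A minimal sketch of pinning the ref through a build arg, assuming a hypothetical arg name (GIT_REF), repository URL, base image and image tag. The ref is taken from the script's first argument and defaults to master. Nothing is built; the script writes the Dockerfile and prints the command:

```shell
#!/bin/sh
# Sketch only: GIT_REF, the repository URL, the base image and the myapp tag are hypothetical.
set -e

workdir=$(mktemp -d)

# Ref to build: first script argument, defaulting to master.
git_ref=${1:-master}

cat > "$workdir/Dockerfile" <<'EOF'
FROM debian:jessie
RUN apt-get update && apt-get install -y git
# The clone step is re-run only when the value of GIT_REF differs from the cached build.
ARG GIT_REF=master
RUN git clone https://example.com/app.git /app \
    && cd /app \
    && git checkout "$GIT_REF"
EOF

build_cmd="docker build --build-arg GIT_REF=$git_ref -t myapp:$git_ref $workdir"
echo "$build_cmd"
```

Passing a full commit hash rather than a branch name keeps the build reproducible, since the same value always checks out the same code.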

@paul-callahan

@duglin actually, you're right, that doesn't work.
This seems to, though. Obviously creates a new layer, though. Which I'll take over having to use --no-cache.

ADD http://www.timeapi.org/utc/now /tmp/bustcache
RUN git pull

@cpuguy83 if --build-arg variable value changing causes cache invalidation for RUN, then there's an angle for generalizing the hack using the bash $SECONDS var which changes every second.

docker build --build-arg CACHE_BUSTER=$SECONDS -t myimg .

@jakirkham

Yeah, I'm pretty sad to see this closed. It's clear to me that there are tons of use cases where someone grabs something from the internet in a Dockerfile and that resource is mutable. The notion that we should simply let these mutable things be cached is simply backwards IMHO. @duglin proposed a simple and clear way around this problem that avoids trying to interpret in advance which of these things might be cached by giving power to the end user. This still seems like the right approach even a year later. The traffic here should indicate that is the case.

@lillem4n

I'm also very sad to see this closed. In our use case we are pulling from a git repository very late in a larger build, and using --no-cache when building the image has two problems:

  1. It slows down considerably since everything BEFORE our git pull is also not cached
  2. It might be forgotten by the devops when publishing a new version in a semi-manual build

@LouisKottmann

LouisKottmann commented Oct 21, 2016

What I do is templatize my Dockerfiles (via bash heredocs) and I use a construct like:

cat <<EOF
ENV BUILT_AT=$(date "+%Y-%m-%d")
RUN DEBIAN_FRONTEND=noninteractive \
    apt-get update \
    && apt-get upgrade -y \
    && apt-get autoremove -y \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* \
              /tmp/* \
              /var/tmp/*

EOF

Which basically busts the cache daily and can be inserted at any point of the Dockerfile. Tweak that BUILT_AT variable as desired.

@thaJeztah
Member

thaJeztah commented Oct 21, 2016

@LouisKottmann @lillem4n no need to construct the Dockerfile, you can simply pass it as a build-arg. However, for the Git repo, I'd suggest to use the actual commit over using a date. Using the commit would make it reproducible. Also see #10682 (comment)

However if you're having a big part that does not change, and only the last part of the Dockerfile should be always rebuilt; I'd consider using a base image for the infrequent changing parts.

@csymeonides-mf

When everyone has to resort to a hacky solution to make something happen, it's a powerful argument for a new feature.

Development

Successfully merging this pull request may close these issues.

New feature request: Selectively disable caching for specific RUN commands in Dockerfile