Added recursive .dockerignore capabilities by ObsessiveOrange · Pull Request #21020 · moby/moby

ObsessiveOrange · 2016-03-08T06:13:01Z

Added new functionality: .dockerignore files can now be placed at any directory, and relative paths will be based on the directory the .dockerignore file is in.

Closes Proposal: recursive .dockerignore files #20944
builder/dockerignore/dockerignore.go changed to allow for reading of entire recursive directory tree
Tests added to verify functionality of recursive .dockerignore files, including * matching and exceptions.

Closes #20944 >builder/dockerignore/dockerignore.go now contains one extra method to read all files recursively. The old method is still in use for places where a single-directory read is still required >Tests have been updated, and a few new ones added to make sure the recursive search is fully function, including exclusions. >Documentation has been updated to reflect the new ability to recursively define .dockerignore files. Signed-off-by: Benedict Wong <bennydictwong@gmail.com>

ObsessiveOrange · 2016-03-08T06:33:48Z

Seeing as this is my first time working on a non-trivial library (and also my first time contributing Go code), I would really appreciate any feedback, specifically about design, code quality, testing and documentation. I'm just another student trying to learn from the pros.

Thanks! :)

thaJeztah · 2016-03-08T11:26:29Z

Seeing as this is my first time working on a non-trivial library (and also my first time contributing Go code), I would really appreciate any feedback, specifically about design, code quality, testing and documentation. I'm just another student trying to learn from the pros.

No worries, thanks so much for giving it a go 😄

thaJeztah · 2016-03-08T11:33:51Z

builder/dockerignore/dockerignore.go

+// reads all .dockerignore files in subdirectories and returns the list of file
+// patterns to ignore. Note this will trim whitespace from each line as well
+// as use GO's "clean" func to get the shortest/cleanest path for each.
+func ReadAllRecursive(baseDir string, directory string) ([]string, error) {


Without looking at the code (I just looked at the description), will this always do a deep scan for .dockerignore files? Thinking of this situation, where a deep scan is not needed;

cat ./.dockerignore *

In this case, the .dockerignore at the root of the build-context excludes all files, so no scanning for other .dockerignore files should be needed.

So the ReadAllRecursive method first reads the .dockerignore file at the current level, and then gets ready to recurse into subdirectories to do the deep scan. Prior to actually recursing, however, it does a check to make sure that the directory does not match any of the already-found exclusions. (This includes from previous levels).

So, in the case that you mentioned, it would stop at the root, since every subsequent subdirectory is excluded.

I hope that answers your question?

Thanks, yes it does! Sorry for the noise, I just glanced over the changes 👍

thaJeztah · 2016-03-08T14:59:49Z

let me ping @duglin, who's done a lot of work in this area, so may be a good person do an initial review

duglin · 2016-03-08T22:46:25Z

Will try to look it over deeper over the next day or so but my initial take on this is that I'm concerned about the performance of it. We already know that dockerignore processing is too slow, which is why #20128 was created and I still need to review it. I believe this will add another traversal over the entire build context - which can't be good for our timing.

I would prefer if we looked to merge the dockerignore processing into the tar'ing process so that we only walk the tree once. @vbatts would this totally break things because it introduces docker specific stuff into that process? I kind of think we've crossed that bridge already but may not have admitted it.

thaJeztah · 2016-03-09T11:48:43Z

windowsTP4 timed out after 2 hours, restarting https://jenkins.dockerproject.org/job/Docker-PRs-WoW-TP4/2320/console

tonistiigi · 2016-03-09T18:02:59Z

builder/dockerignore/dockerignore.go

+	} else {
+		// If reader is nil, it means that something odd happened:
+		// lack of permissions, etc.
+		if reader == nil {


You don't need to check that. But probably makes sense to return error if err != nil && !os.IsNotExist(err)

The original reason that it was checked was to make sure the behavior was maintained. I found that removing it would make some of the build tests fail, so i left it in. Should I be looking at replacing/removing it?

ObsessiveOrange · 2016-03-10T05:41:44Z

I just took a look at the TarWithOptions, and had a few thoughts.

Firstly, I took a look at #20128, and noticed that pwaller mentioned most of the time was spent creating the regular expressions from the patterns. If these were to be cached (perhaps in a map?), would it speed up the process? (At the cost of extra space. However, even with 100 rules, each at a filepath length of 100 4-byte utf-8 characters, this map would only take up 80kb).

Secondly, are we allowing a subdirectory's .dockerignore file state that a file in it's parent folder should be ignored?

So, if our directory structure is:

root
>subdir1
>>.dockerignore
>file1

Should root/subdir1/.dockerignore be able to list a rule ../file1?

If so, wouldn't that require two traversals, the first to parse all the rules, then the second to actually create the archive?

@thaJeztah, @duglin I'm also taking a look at pre-generating all the regular expressions, so that in cases where there are massive numbers of files, they aren't being regenerated hundreds or thousands of times over. That might be able to improve the performance. I think i have a replacement for the clean/build/match cycle that fileutils goes through, and am running tests. I'm looking to have that done before the end of the weekend.

thaJeztah · 2016-03-10T16:04:47Z

I don't think ../file1 should be supported. We should try to match .gitignore, which treats the location of the .gitignore file itself as the "root" of the exclusion, i.e. https://git-scm.com/docs/gitignore.

Supporting ../...... also doesn't make sense if a .dockerignore is put at the root of the build-context, because Docker doesn't use files outside of the build-context (parent directories)

ObsessiveOrange · 2016-03-14T18:32:27Z

I just pushed a new commit which should address @duglin's performance concerns. Based on pwaller's (why can't i mention/ping him? :s) comments in #20128, i have added a new package, implementing a form of pre-compiled regular expressions. Unlike in #20128, the gobwas/glob library is not used, as it appears that it would require some adapting from the current regular expression set that is currently supported to those that glob supports.

Based on this, a back-of-the envelope estimate that I got by running all DockerSuite.TestBuild* tests are as follows:

Docker master branch: ~ 917.926s
Previous commit (Added recursive .dockerignore capabilities): ~900.838s
Current commit (Added Pre-compiled Regular Expressions): ~889.742s

As always, comments would be much appreciated! :)

Concerns were previously raised about the impact of recursive .dockerignore files on performance, since it would require another context walk. Pwaller noted that after profiling, most of the time is due to regular expressions having to be constantly regenerated by fileutils.Match. This patch attempts to address the performance concerns, creating precompiled regular expressions, and reusing them as needed. Signed-off-by: Benedict Wong <bennydictwong@gmail.com>

ObsessiveOrange · 2016-03-18T14:59:07Z

ping @thaJeztah @duglin

Just wanted to check back in on this PR. If there's any feedback, or anything that needs to be changed, I'd love to know.

pwaller · 2016-03-18T16:40:21Z

Hi @ObsessiveOrange you can ping me, but I'm not a docker contributor so I don't show up in auto-complete (unless I participate in this PR). Feel free to ping me.

The PR looks good to me, although pkg/precompiledregexp looks a bit weird as a name. Only a minor comment.

I'll try and find a moment to take performance measurements with/without your changes from my side. No promises though.

ObsessiveOrange · 2016-03-19T00:03:21Z

Hi @pwaller! Thanks for your comments! The whole non auto-complete thing makes a lot more sense now.

If you could get some performance data, that'd be great! I don't actually have anything to test it on, except building the standard docker container.

Cheers!

ObsessiveOrange · 2016-04-06T00:55:09Z

Hi @pwaller, any chance that you would have had time to take a look at the performance of these patches?

Also, @duglin, is there any chance that you would you have time to look through some of this code?

thaJeztah · 2016-04-14T19:15:01Z

@ObsessiveOrange we just discussed this PR in the maintainers review session and we are worried about breaking backward compatibility for a "nice to have". For example; we know many users have a directory structure containing multiple Dockerfiles and associated .dockerignore files (e.g. for different builds from the same source). Currently, those .dockerignore files are "ignored", but with this change, they will actually do something, and break existing setups.

We don't want to break backward compatibility for a "nice to have", so decided to not proceed with this PR.

We really appreciate you working on this, and taking the time to contribute, so, sorry for closing this PR, but I hope you understand 👍

ObsessiveOrange · 2016-04-17T16:20:25Z

@thaJeztah Thanks for keeping me informed! I completely understand your (and the other maintainer's) decision.

If i may take a little more of your time, I just have one last thought regarding a change that might not break backward compatibility, while still allowing for this new feature.

I'm thinking that we could add a comment at the top of each .dockerIgnore file that should be recursive, something like the #!/usr/bin/env bash that bash scripts have. Maybe something along the lines of: #!recursive-root: SampleName for the root ignore file, and #! recursive-child: SampleName for all children .dockerignore files. This would then be required to be added to any subsequent .dockerignore file for it to be parsed. The recursive-child annotation would allow us to tell which root it was meant for. In this way, we could even have multiple sets of recursive .dockerignore trees, a few .dockerignore files that are entirely non-recursive, all without conflicting with each other.

Root .dockerignore:

#! recursive-root: Sample
build/*
runner.sh

Child .dockerignore:

#! recursive-child: Sample
src/test/resources/*
bin/*
!bin/compiled.exe

Further, by making the .dockerignore files only recursive when the comments are present, older .dockerignore entries are not affected unless explicitly set.

If this would work, i'd love to work on it, but if not, I completely understand :)

thaJeztah · 2016-04-18T14:34:38Z

@ObsessiveOrange we briefly discussed other options, but adding such special "meta" information in the ignore files is adding too much complexity (both from coding, and from a UX perspective). For now, we decided to not add this as a feature.

ObsessiveOrange · 2016-04-18T14:52:26Z

Yep. sounds good! Thanks for letting me know! :) 👍

GordonTheTurtle added area/distribution Image Distribution status/0-triage labels Mar 8, 2016

thaJeztah added status/1-design-review and removed area/distribution Image Distribution status/0-triage labels Mar 8, 2016

thaJeztah reviewed Mar 8, 2016
View reviewed changes

GordonTheTurtle added dco/no Automatically set by a bot when one of the commits lacks proper signature and removed dco/no Automatically set by a bot when one of the commits lacks proper signature labels Mar 9, 2016

tonistiigi reviewed Mar 9, 2016
View reviewed changes

pwaller mentioned this pull request Mar 18, 2016

[WIP] New .dockerignore implementation #20128

Closed

thaJeztah assigned duglin Apr 14, 2016

duglin removed their assignment Apr 14, 2016

thaJeztah closed this Apr 14, 2016

thaJeztah mentioned this pull request Apr 14, 2016

Proposal: recursive .dockerignore files #20944

Closed

wcalderipe mentioned this pull request Oct 24, 2018

.dockerignore does not support recursive discovery wcalderipe/easy-toggles#1

Closed

keskiju mentioned this pull request Feb 6, 2019

.dockerignore files ignored during prod build TaitoUnited/full-stack-template#57

Open

thaJeztah mentioned this pull request Dec 19, 2019

[epic] builder: collected issues on improving .dockerignore #40319

Open

8 tasks

pxpeterxu mentioned this pull request May 4, 2022

fix(docker): exclude plugin-server node_modules in Dockerfile copy PostHog/posthog#9635

Closed

Conversation

ObsessiveOrange commented Mar 8, 2016

Uh oh!

ObsessiveOrange commented Mar 8, 2016

Uh oh!

thaJeztah commented Mar 8, 2016

Uh oh!

thaJeztah Mar 8, 2016

Choose a reason for hiding this comment

Uh oh!

ObsessiveOrange Mar 8, 2016

Choose a reason for hiding this comment

Uh oh!

thaJeztah Mar 8, 2016

Choose a reason for hiding this comment

Uh oh!

thaJeztah commented Mar 8, 2016

Uh oh!

duglin commented Mar 8, 2016

Uh oh!

thaJeztah commented Mar 9, 2016

Uh oh!

tonistiigi Mar 9, 2016

Choose a reason for hiding this comment

Uh oh!

ObsessiveOrange Mar 22, 2016

Choose a reason for hiding this comment

Uh oh!

ObsessiveOrange commented Mar 10, 2016

Uh oh!

thaJeztah commented Mar 10, 2016

Uh oh!

ObsessiveOrange commented Mar 14, 2016

Uh oh!

ObsessiveOrange commented Mar 18, 2016

Uh oh!

pwaller commented Mar 18, 2016

Uh oh!

ObsessiveOrange commented Mar 19, 2016

Uh oh!

ObsessiveOrange commented Apr 6, 2016

Uh oh!

thaJeztah commented Apr 14, 2016

Uh oh!

ObsessiveOrange commented Apr 17, 2016

Uh oh!

thaJeztah commented Apr 18, 2016

Uh oh!

ObsessiveOrange commented Apr 18, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants