Added recursive .dockerignore capabilities#21020
Added recursive .dockerignore capabilities#21020ObsessiveOrange wants to merge 2 commits intomoby:masterfrom ObsessiveOrange:20944-recursive-dockerignore-files
Conversation
Closes #20944 >builder/dockerignore/dockerignore.go now contains one extra method to read all files recursively. The old method is still in use for places where a single-directory read is still required >Tests have been updated, and a few new ones added to make sure the recursive search is fully function, including exclusions. >Documentation has been updated to reflect the new ability to recursively define .dockerignore files. Signed-off-by: Benedict Wong <bennydictwong@gmail.com>
|
Seeing as this is my first time working on a non-trivial library (and also my first time contributing Go code), I would really appreciate any feedback, specifically about design, code quality, testing and documentation. I'm just another student trying to learn from the pros. Thanks! :) |
No worries, thanks so much for giving it a go 😄 |
builder/dockerignore/dockerignore.go
Outdated
| // reads all .dockerignore files in subdirectories and returns the list of file | ||
| // patterns to ignore. Note this will trim whitespace from each line as well | ||
| // as use GO's "clean" func to get the shortest/cleanest path for each. | ||
| func ReadAllRecursive(baseDir string, directory string) ([]string, error) { |
There was a problem hiding this comment.
Without looking at the code (I just looked at the description), will this always do a deep scan for .dockerignore files? Thinking of this situation, where a deep scan is not needed;
cat ./.dockerignore
*
In this case, the .dockerignore at the root of the build-context excludes all files, so no scanning for other .dockerignore files should be needed.
There was a problem hiding this comment.
So the ReadAllRecursive method first reads the .dockerignore file at the current level, and then gets ready to recurse into subdirectories to do the deep scan. Prior to actually recursing, however, it does a check to make sure that the directory does not match any of the already-found exclusions. (This includes from previous levels).
So, in the case that you mentioned, it would stop at the root, since every subsequent subdirectory is excluded.
I hope that answers your question?
There was a problem hiding this comment.
Thanks, yes it does! Sorry for the noise, I just glanced over the changes 👍
|
let me ping @duglin, who's done a lot of work in this area, so may be a good person do an initial review |
|
Will try to look it over deeper over the next day or so but my initial take on this is that I'm concerned about the performance of it. We already know that dockerignore processing is too slow, which is why #20128 was created and I still need to review it. I believe this will add another traversal over the entire build context - which can't be good for our timing. I would prefer if we looked to merge the dockerignore processing into the tar'ing process so that we only walk the tree once. @vbatts would this totally break things because it introduces docker specific stuff into that process? I kind of think we've crossed that bridge already but may not have admitted it. |
|
windowsTP4 timed out after 2 hours, restarting https://jenkins.dockerproject.org/job/Docker-PRs-WoW-TP4/2320/console |
| } else { | ||
| // If reader is nil, it means that something odd happened: | ||
| // lack of permissions, etc. | ||
| if reader == nil { |
There was a problem hiding this comment.
You don't need to check that. But probably makes sense to return error if err != nil && !os.IsNotExist(err)
There was a problem hiding this comment.
The original reason that it was checked was to make sure the behavior was maintained. I found that removing it would make some of the build tests fail, so i left it in. Should I be looking at replacing/removing it?
|
I just took a look at the TarWithOptions, and had a few thoughts. Firstly, I took a look at #20128, and noticed that pwaller mentioned most of the time was spent creating the regular expressions from the patterns. If these were to be cached (perhaps in a map?), would it speed up the process? (At the cost of extra space. However, even with 100 rules, each at a filepath length of 100 4-byte utf-8 characters, this map would only take up 80kb). Secondly, are we allowing a subdirectory's So, if our directory structure is: Should If so, wouldn't that require two traversals, the first to parse all the rules, then the second to actually create the archive? @thaJeztah, @duglin I'm also taking a look at pre-generating all the regular expressions, so that in cases where there are massive numbers of files, they aren't being regenerated hundreds or thousands of times over. That might be able to improve the performance. I think i have a replacement for the clean/build/match cycle that fileutils goes through, and am running tests. I'm looking to have that done before the end of the weekend. |
|
I don't think Supporting |
|
I just pushed a new commit which should address @duglin's performance concerns. Based on pwaller's (why can't i mention/ping him? :s) comments in #20128, i have added a new package, implementing a form of pre-compiled regular expressions. Unlike in #20128, the gobwas/glob library is not used, as it appears that it would require some adapting from the current regular expression set that is currently supported to those that glob supports. Based on this, a back-of-the envelope estimate that I got by running all DockerSuite.TestBuild* tests are as follows: Docker master branch: ~ 917.926s As always, comments would be much appreciated! :) |
Concerns were previously raised about the impact of recursive .dockerignore files on performance, since it would require another context walk. Pwaller noted that after profiling, most of the time is due to regular expressions having to be constantly regenerated by fileutils.Match. This patch attempts to address the performance concerns, creating precompiled regular expressions, and reusing them as needed. Signed-off-by: Benedict Wong <bennydictwong@gmail.com>
|
ping @thaJeztah @duglin Just wanted to check back in on this PR. If there's any feedback, or anything that needs to be changed, I'd love to know. |
|
Hi @ObsessiveOrange you can ping me, but I'm not a docker contributor so I don't show up in auto-complete (unless I participate in this PR). Feel free to ping me. The PR looks good to me, although I'll try and find a moment to take performance measurements with/without your changes from my side. No promises though. |
|
Hi @pwaller! Thanks for your comments! The whole non auto-complete thing makes a lot more sense now. If you could get some performance data, that'd be great! I don't actually have anything to test it on, except building the standard docker container. Cheers! |
|
@ObsessiveOrange we just discussed this PR in the maintainers review session and we are worried about breaking backward compatibility for a "nice to have". For example; we know many users have a directory structure containing multiple Dockerfiles and associated We don't want to break backward compatibility for a "nice to have", so decided to not proceed with this PR. We really appreciate you working on this, and taking the time to contribute, so, sorry for closing this PR, but I hope you understand 👍 |
|
@thaJeztah Thanks for keeping me informed! I completely understand your (and the other maintainer's) decision. If i may take a little more of your time, I just have one last thought regarding a change that might not break backward compatibility, while still allowing for this new feature. I'm thinking that we could add a comment at the top of each Root
Child
Further, by making the If this would work, i'd love to work on it, but if not, I completely understand :) |
|
@ObsessiveOrange we briefly discussed other options, but adding such special "meta" information in the ignore files is adding too much complexity (both from coding, and from a UX perspective). For now, we decided to not add this as a feature. |
|
Yep. sounds good! Thanks for letting me know! :) 👍 |
Added new functionality:
.dockerignorefiles can now be placed at any directory, and relative paths will be based on the directory the.dockerignorefile is in.builder/dockerignore/dockerignore.gochanged to allow for reading of entire recursive directory tree.dockerignorefiles, including * matching and exceptions.