Skip to content

Added recursive .dockerignore capabilities#21020

Closed
ObsessiveOrange wants to merge 2 commits intomoby:masterfrom
ObsessiveOrange:20944-recursive-dockerignore-files
Closed

Added recursive .dockerignore capabilities#21020
ObsessiveOrange wants to merge 2 commits intomoby:masterfrom
ObsessiveOrange:20944-recursive-dockerignore-files

Conversation

@ObsessiveOrange
Copy link
Copy Markdown

Added new functionality: .dockerignore files can now be placed at any directory, and relative paths will be based on the directory the .dockerignore file is in.

  • Closes Proposal: recursive .dockerignore files #20944
  • builder/dockerignore/dockerignore.go changed to allow for reading of entire recursive directory tree
  • Tests added to verify functionality of recursive .dockerignore files, including * matching and exceptions.

Closes #20944
>builder/dockerignore/dockerignore.go now contains one extra method to
read all files recursively. The old method is still in use for places
where a single-directory read is still required
>Tests have been updated, and a few new ones added to make sure the
recursive search is fully function, including exclusions.
>Documentation has been updated to reflect the new ability to
recursively define .dockerignore files.

Signed-off-by: Benedict Wong <bennydictwong@gmail.com>
@ObsessiveOrange
Copy link
Copy Markdown
Author

Seeing as this is my first time working on a non-trivial library (and also my first time contributing Go code), I would really appreciate any feedback, specifically about design, code quality, testing and documentation. I'm just another student trying to learn from the pros.

Thanks! :)

@thaJeztah
Copy link
Copy Markdown
Member

Seeing as this is my first time working on a non-trivial library (and also my first time contributing Go code), I would really appreciate any feedback, specifically about design, code quality, testing and documentation. I'm just another student trying to learn from the pros.

No worries, thanks so much for giving it a go 😄

// reads all .dockerignore files in subdirectories and returns the list of file
// patterns to ignore. Note this will trim whitespace from each line as well
// as use GO's "clean" func to get the shortest/cleanest path for each.
func ReadAllRecursive(baseDir string, directory string) ([]string, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without looking at the code (I just looked at the description), will this always do a deep scan for .dockerignore files? Thinking of this situation, where a deep scan is not needed;

cat ./.dockerignore
*

In this case, the .dockerignore at the root of the build-context excludes all files, so no scanning for other .dockerignore files should be needed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the ReadAllRecursive method first reads the .dockerignore file at the current level, and then gets ready to recurse into subdirectories to do the deep scan. Prior to actually recursing, however, it does a check to make sure that the directory does not match any of the already-found exclusions. (This includes from previous levels).

So, in the case that you mentioned, it would stop at the root, since every subsequent subdirectory is excluded.

I hope that answers your question?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, yes it does! Sorry for the noise, I just glanced over the changes 👍

@thaJeztah
Copy link
Copy Markdown
Member

let me ping @duglin, who's done a lot of work in this area, so may be a good person do an initial review

@duglin
Copy link
Copy Markdown
Contributor

duglin commented Mar 8, 2016

Will try to look it over deeper over the next day or so but my initial take on this is that I'm concerned about the performance of it. We already know that dockerignore processing is too slow, which is why #20128 was created and I still need to review it. I believe this will add another traversal over the entire build context - which can't be good for our timing.

I would prefer if we looked to merge the dockerignore processing into the tar'ing process so that we only walk the tree once. @vbatts would this totally break things because it introduces docker specific stuff into that process? I kind of think we've crossed that bridge already but may not have admitted it.

@GordonTheTurtle GordonTheTurtle added dco/no Automatically set by a bot when one of the commits lacks proper signature and removed dco/no Automatically set by a bot when one of the commits lacks proper signature labels Mar 9, 2016
@thaJeztah
Copy link
Copy Markdown
Member

windowsTP4 timed out after 2 hours, restarting https://jenkins.dockerproject.org/job/Docker-PRs-WoW-TP4/2320/console

} else {
// If reader is nil, it means that something odd happened:
// lack of permissions, etc.
if reader == nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to check that. But probably makes sense to return error if err != nil && !os.IsNotExist(err)

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original reason that it was checked was to make sure the behavior was maintained. I found that removing it would make some of the build tests fail, so i left it in. Should I be looking at replacing/removing it?

@ObsessiveOrange
Copy link
Copy Markdown
Author

I just took a look at the TarWithOptions, and had a few thoughts.

Firstly, I took a look at #20128, and noticed that pwaller mentioned most of the time was spent creating the regular expressions from the patterns. If these were to be cached (perhaps in a map?), would it speed up the process? (At the cost of extra space. However, even with 100 rules, each at a filepath length of 100 4-byte utf-8 characters, this map would only take up 80kb).

Secondly, are we allowing a subdirectory's .dockerignore file state that a file in it's parent folder should be ignored?

So, if our directory structure is:

root
>subdir1
>>.dockerignore
>file1

Should root/subdir1/.dockerignore be able to list a rule ../file1?

If so, wouldn't that require two traversals, the first to parse all the rules, then the second to actually create the archive?

@thaJeztah, @duglin I'm also taking a look at pre-generating all the regular expressions, so that in cases where there are massive numbers of files, they aren't being regenerated hundreds or thousands of times over. That might be able to improve the performance. I think i have a replacement for the clean/build/match cycle that fileutils goes through, and am running tests. I'm looking to have that done before the end of the weekend.

@thaJeztah
Copy link
Copy Markdown
Member

I don't think ../file1 should be supported. We should try to match .gitignore, which treats the location of the .gitignore file itself as the "root" of the exclusion, i.e. https://git-scm.com/docs/gitignore.

Supporting ../...... also doesn't make sense if a .dockerignore is put at the root of the build-context, because Docker doesn't use files outside of the build-context (parent directories)

@ObsessiveOrange
Copy link
Copy Markdown
Author

I just pushed a new commit which should address @duglin's performance concerns. Based on pwaller's (why can't i mention/ping him? :s) comments in #20128, i have added a new package, implementing a form of pre-compiled regular expressions. Unlike in #20128, the gobwas/glob library is not used, as it appears that it would require some adapting from the current regular expression set that is currently supported to those that glob supports.

Based on this, a back-of-the envelope estimate that I got by running all DockerSuite.TestBuild* tests are as follows:

Docker master branch: ~ 917.926s
Previous commit (Added recursive .dockerignore capabilities): ~900.838s
Current commit (Added Pre-compiled Regular Expressions): ~889.742s

As always, comments would be much appreciated! :)

Concerns were previously raised about the impact of recursive .dockerignore files on performance, since it would require another context walk. Pwaller noted that after profiling, most of the time is due to regular expressions having to be constantly regenerated by fileutils.Match.

This patch attempts to address the performance concerns, creating precompiled regular expressions, and reusing them as needed.

Signed-off-by: Benedict Wong <bennydictwong@gmail.com>
@ObsessiveOrange
Copy link
Copy Markdown
Author

ping @thaJeztah @duglin

Just wanted to check back in on this PR. If there's any feedback, or anything that needs to be changed, I'd love to know.

@pwaller
Copy link
Copy Markdown
Contributor

pwaller commented Mar 18, 2016

Hi @ObsessiveOrange you can ping me, but I'm not a docker contributor so I don't show up in auto-complete (unless I participate in this PR). Feel free to ping me.

The PR looks good to me, although pkg/precompiledregexp looks a bit weird as a name. Only a minor comment.

I'll try and find a moment to take performance measurements with/without your changes from my side. No promises though.

@ObsessiveOrange
Copy link
Copy Markdown
Author

Hi @pwaller! Thanks for your comments! The whole non auto-complete thing makes a lot more sense now.

If you could get some performance data, that'd be great! I don't actually have anything to test it on, except building the standard docker container.

Cheers!

@ObsessiveOrange
Copy link
Copy Markdown
Author

Hi @pwaller, any chance that you would have had time to take a look at the performance of these patches?

Also, @duglin, is there any chance that you would you have time to look through some of this code?

@duglin duglin removed their assignment Apr 14, 2016
@thaJeztah
Copy link
Copy Markdown
Member

@ObsessiveOrange we just discussed this PR in the maintainers review session and we are worried about breaking backward compatibility for a "nice to have". For example; we know many users have a directory structure containing multiple Dockerfiles and associated .dockerignore files (e.g. for different builds from the same source). Currently, those .dockerignore files are "ignored", but with this change, they will actually do something, and break existing setups.

We don't want to break backward compatibility for a "nice to have", so decided to not proceed with this PR.

We really appreciate you working on this, and taking the time to contribute, so, sorry for closing this PR, but I hope you understand 👍

@thaJeztah thaJeztah closed this Apr 14, 2016
@ObsessiveOrange
Copy link
Copy Markdown
Author

@thaJeztah Thanks for keeping me informed! I completely understand your (and the other maintainer's) decision.

If i may take a little more of your time, I just have one last thought regarding a change that might not break backward compatibility, while still allowing for this new feature.

I'm thinking that we could add a comment at the top of each .dockerIgnore file that should be recursive, something like the #!/usr/bin/env bash that bash scripts have. Maybe something along the lines of: #!recursive-root: SampleName for the root ignore file, and #! recursive-child: SampleName for all children .dockerignore files. This would then be required to be added to any subsequent .dockerignore file for it to be parsed. The recursive-child annotation would allow us to tell which root it was meant for. In this way, we could even have multiple sets of recursive .dockerignore trees, a few .dockerignore files that are entirely non-recursive, all without conflicting with each other.

Root .dockerignore:

#! recursive-root: Sample
build/*
runner.sh

Child .dockerignore:

#! recursive-child: Sample
src/test/resources/*
bin/*
!bin/compiled.exe

Further, by making the .dockerignore files only recursive when the comments are present, older .dockerignore entries are not affected unless explicitly set.

If this would work, i'd love to work on it, but if not, I completely understand :)

@thaJeztah
Copy link
Copy Markdown
Member

@ObsessiveOrange we briefly discussed other options, but adding such special "meta" information in the ignore files is adding too much complexity (both from coding, and from a UX perspective). For now, we decided to not add this as a feature.

@ObsessiveOrange
Copy link
Copy Markdown
Author

Yep. sounds good! Thanks for letting me know! :) 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants