fileutils: Fix incorrect handling of "**/foo" pattern#42676
fileutils: Fix incorrect handling of "**/foo" pattern#42676aaronlehmann merged 4 commits intomoby:masterfrom
Conversation
(*PatternMatcher).Matches includes a special case for when the pattern matches a parent dir, even though it doesn't match the current path. However, it assumes that the parent dir which would match the pattern must have the same number of separators as the pattern itself. This doesn't hold true with a patern like "**/foo". A file foo/bar would have len(parentPathDirs) == 1, which is less than the number of path len(pattern.dirs) == 2... therefore this check would be skipped. Given that "**/foo" matches "foo", I think it's a bug that the "parent subdir matches" check is being skipped in this case. It seems safer to loop over the parent subdirs and check each against the pattern. It's possible there is a safe optimization to check only a certain subset, but the existing logic seems unsafe. Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
|
Yes, this fixes #41433 |
|
@AkihiroSuda @tonistiigi PTAL |
|
Is anyone available to review this one? Thanks! |
tonistiigi
left a comment
There was a problem hiding this comment.
I'm not quite sure how the parent check is used in the common cases. Is it ever needed for the walk cases used in dockerignore/buildkit and if it is, can't there be more optimal caching. Asking because this seems to introduce more complexity per single file due to another loop.
Yes, #41433 is a dockerignore issue this PR fixes.
Are you suggesting a cache in |
|
A possible optimization here would be to introduce a new method: func (pm *PatternMatcher) MatchesUsingParentResult(file string, anyParentMatched bool) (bool, error) {
matched := anyParentMatched
file = filepath.FromSlash(file)
for _, pattern := range pm.patterns {
// Skip evaluation if this is an inclusion and the filename already matched the
// pattern, or it's an exclusion and it has not matched the pattern yet.
if pattern.exclusion != matched {
continue
}
match, err := pattern.match(file)
if err != nil {
return false, err
}
if match {
matched = !pattern.exclusion
}
}
return matched, nil
}Then callers doing a complete walk which can keep track of whether any parent matched can use this method instead of the slower implementation in |
|
That latest suggestion sgtm. Might even make sense to detect if there are any exclusions at all in the matcher and avoid checking children completely if not. |
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
|
CI failure looks unrelated. |
|
|
||
| // Matches matches path against all the patterns. Matches is not safe to be | ||
| // called concurrently | ||
| // Matches returns true if "file" matches any of the patterns |
There was a problem hiding this comment.
Could we rename this and remove the range loop. So all callers need to update and choose if they don't need to match parents or if they track the parents on the caller side.
There was a problem hiding this comment.
What's the use case for not matching parent dirs? I think that would give incorrect behavior with a number of patterns. Are there cases where we know in advance we wouldn't encounter such patterns?
Removing Matches would complicate code that isn't doing a walk. For example, removeDockerfile currently uses Matches, and it would need similar logic to the loop in Matches. Even TarWithOptions, which I modified to use MatchesUsingParentResult, uses Matches as a convenience on the first file/dir it encounters matching the current include prefix.
There was a problem hiding this comment.
I think it is unexpected that this function now has m*n complexity and this would make it sure it doesn't go unnoticed. If you think there are cases that need parent matches but would still prefer to use Matches (eg. they only call a single file anyway) maybe leave it as is but rename to MatchesOrAnyParentMatches.
There was a problem hiding this comment.
Renamed it. I kept the original (buggy) implementation as well because some vendored code calls it. We can remove it as a second step.
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
|
@tonistiigi: Do the latest changes look good? |
|
Thanks! Looks like CI failed but I don't think it's related. Can someone please retrigger it? |
This updates fsutil to use the new methods introduced in moby/moby#42676. This fixes the issue with patterns like `**/foo` which was the motivation for that fix.
This updates fsutil to use the new methods introduced in moby/moby#42676. This fixes the issue with patterns like `**/foo` which was the motivation for that fix.
This updates fsutil to use the new methods introduced in moby/moby#42676. This fixes the issue with patterns like `**/foo` which was the motivation for that fix.
(*PatternMatcher).Matchesincludes a special case for when the patternmatches a parent dir, even though it doesn't match the current path.
However, it assumes that the parent dir which would match the pattern
must have the same number of separators as the pattern itself. This
doesn't hold true with a patern like
**/foo. A filefoo/barwould havelen(parentPathDirs) == 1, which is less than the number of pathlen(pattern.dirs) == 2... therefore this check would be skipped.Given that
**/foomatchesfoo, I think it's a bug that the "parentsubdir matches" check is being skipped in this case.
It seems safer to loop over the parent subdirs and check each against
the pattern. It's possible there is a safe optimization to check only a
certain subset, but the existing logic seems unsafe.
This was found while using the
IncludePatternsfeature of BuildKit's"copy" op.
cc @coryb @tonistiigi