Skip to content

fileutils: Fix incorrect handling of "**/foo" pattern#42676

Merged
aaronlehmann merged 4 commits intomoby:masterfrom
aaronlehmann:patternmatcher-doublestar-bug
Aug 17, 2021
Merged

fileutils: Fix incorrect handling of "**/foo" pattern#42676
aaronlehmann merged 4 commits intomoby:masterfrom
aaronlehmann:patternmatcher-doublestar-bug

Conversation

@aaronlehmann
Copy link
Copy Markdown

@aaronlehmann aaronlehmann commented Jul 26, 2021

(*PatternMatcher).Matches includes a special case for when the pattern
matches a parent dir, even though it doesn't match the current path.
However, it assumes that the parent dir which would match the pattern
must have the same number of separators as the pattern itself. This
doesn't hold true with a patern like **/foo. A file foo/bar would have
len(parentPathDirs) == 1, which is less than the number of path
len(pattern.dirs) == 2... therefore this check would be skipped.

Given that **/foo matches foo, I think it's a bug that the "parent
subdir matches" check is being skipped in this case.

It seems safer to loop over the parent subdirs and check each against
the pattern. It's possible there is a safe optimization to check only a
certain subset, but the existing logic seems unsafe.

This was found while using the IncludePatterns feature of BuildKit's
"copy" op.

cc @coryb @tonistiigi

(*PatternMatcher).Matches includes a special case for when the pattern
matches a parent dir, even though it doesn't match the current path.
However, it assumes that the parent dir which would match the pattern
must have the same number of separators as the pattern itself. This
doesn't hold true with a patern like "**/foo". A file foo/bar would have
len(parentPathDirs) == 1, which is less than the number of path
len(pattern.dirs) == 2... therefore this check would be skipped.

Given that "**/foo" matches "foo", I think it's a bug that the "parent
subdir matches" check is being skipped in this case.

It seems safer to loop over the parent subdirs and check each against
the pattern. It's possible there is a safe optimization to check only a
certain subset, but the existing logic seems unsafe.

Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
@thaJeztah
Copy link
Copy Markdown
Member

Relates to #40319 (I see there's #41433 - not sure if this addresses that though)

@thaJeztah thaJeztah added this to the 21.xx milestone Jul 27, 2021
@aaronlehmann
Copy link
Copy Markdown
Author

Yes, this fixes #41433

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah
Copy link
Copy Markdown
Member

@AkihiroSuda @tonistiigi PTAL

@aaronlehmann
Copy link
Copy Markdown
Author

Is anyone available to review this one? Thanks!

Copy link
Copy Markdown
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure how the parent check is used in the common cases. Is it ever needed for the walk cases used in dockerignore/buildkit and if it is, can't there be more optimal caching. Asking because this seems to introduce more complexity per single file due to another loop.

@aaronlehmann
Copy link
Copy Markdown
Author

Is it ever needed for the walk cases used in dockerignore/buildkit

Yes, #41433 is a dockerignore issue this PR fixes.

can't there be more optimal caching

Are you suggesting a cache in PatternMatcher to memoize these calls to match?

@aaronlehmann
Copy link
Copy Markdown
Author

aaronlehmann commented Aug 11, 2021

A possible optimization here would be to introduce a new method:

func (pm *PatternMatcher) MatchesUsingParentResult(file string, anyParentMatched bool) (bool, error) {
	matched := anyParentMatched
	file = filepath.FromSlash(file)

	for _, pattern := range pm.patterns {
		// Skip evaluation if this is an inclusion and the filename already matched the
		// pattern, or it's an exclusion and it has not matched the pattern yet.
		if pattern.exclusion != matched {
			continue
		}

		match, err := pattern.match(file)
		if err != nil {
			return false, err
		}

		if match {
			matched = !pattern.exclusion
		}
	}
	return matched, nil
}

Then callers doing a complete walk which can keep track of whether any parent matched can use this method instead of the slower implementation in Matches... but this comes at the cost of more complexity in the calling functions.

@tonistiigi
Copy link
Copy Markdown
Member

That latest suggestion sgtm. Might even make sense to detect if there are any exclusions at all in the matcher and avoid checking children completely if not.

Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
@aaronlehmann
Copy link
Copy Markdown
Author

CI failure looks unrelated.


// Matches matches path against all the patterns. Matches is not safe to be
// called concurrently
// Matches returns true if "file" matches any of the patterns
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we rename this and remove the range loop. So all callers need to update and choose if they don't need to match parents or if they track the parents on the caller side.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for not matching parent dirs? I think that would give incorrect behavior with a number of patterns. Are there cases where we know in advance we wouldn't encounter such patterns?

Removing Matches would complicate code that isn't doing a walk. For example, removeDockerfile currently uses Matches, and it would need similar logic to the loop in Matches. Even TarWithOptions, which I modified to use MatchesUsingParentResult, uses Matches as a convenience on the first file/dir it encounters matching the current include prefix.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is unexpected that this function now has m*n complexity and this would make it sure it doesn't go unnoticed. If you think there are cases that need parent matches but would still prefer to use Matches (eg. they only call a single file anyway) maybe leave it as is but rename to MatchesOrAnyParentMatches.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed it. I kept the original (buggy) implementation as well because some vendored code calls it. We can remove it as a second step.

Aaron Lehmann added 2 commits August 12, 2021 18:10
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
@aaronlehmann
Copy link
Copy Markdown
Author

@tonistiigi: Do the latest changes look good?

@aaronlehmann
Copy link
Copy Markdown
Author

Thanks! Looks like CI failed but I don't think it's related. Can someone please retrigger it?

@aaronlehmann aaronlehmann merged commit ba2adee into moby:master Aug 17, 2021
@aaronlehmann aaronlehmann deleted the patternmatcher-doublestar-bug branch August 17, 2021 02:59
aaronlehmann pushed a commit to aaronlehmann/fsutil that referenced this pull request Aug 17, 2021
This updates fsutil to use the new methods introduced in
moby/moby#42676. This fixes the issue with
patterns like `**/foo` which was the motivation for that fix.
aaronlehmann pushed a commit to aaronlehmann/fsutil that referenced this pull request Aug 17, 2021
This updates fsutil to use the new methods introduced in
moby/moby#42676. This fixes the issue with
patterns like `**/foo` which was the motivation for that fix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

wildcard directories not implemented correctly in dockerignore

3 participants