Exclude files in search.largeFiles#480
Conversation
| // IgnoreSizeMax determines whether the max size should be ignored. | ||
| func (o *Options) IgnoreSizeMax(name string) bool { | ||
| matched := false | ||
| for _, pattern := range o.LargeFiles { |
There was a problem hiding this comment.
We could bucket patterns to distinguish negate/exclude patterns from include patterns. Then we could visit each and every exclude pattern before any include pattern, just to ensure a file is not disallowed by the former. I don't think this optimization would save us much though.
| } | ||
|
|
||
| // if negated then strip prefix meta characters representing negated filter pattern | ||
| if strings.Index(pattern, negate) == 0 { |
There was a problem hiding this comment.
Maybe it is better to use strings.HasPrefix here? I looked at the implementations of both and HasPrefix seems to have less overhead.
There was a problem hiding this comment.
That makes sense, I'll replace with HasPrefix
There was a problem hiding this comment.
Indeed this function would be relatively straightforward using HasPrefix. You essentially only need to deal with strings.HasPrefix(pattern, "!") and strings.HasPrefix(pattern, "\\!")
keegancsmith
left a comment
There was a problem hiding this comment.
The idea of using ! and \! LGTM. Some inline comments.
FYI here is the documentation from gitignore about ! which I think we should try be consistent with. However, we always explore all files so we don't need to be consistent with the performance note about directories. https://git-scm.com/docs/gitignore#_pattern_format
An optional prefix "!" which negates the pattern; any matching file excluded by a previous pattern will become included again. It is not possible to re-include a file if a parent directory of that file is excluded. Git doesn’t list excluded directories for performance reasons, so any patterns on contained files have no effect, no matter where they are defined. Put a backslash ("\") in front of the first "!" for patterns that begin with a literal "!", for example, "\!important!.txt".
| } | ||
|
|
||
| // if negated then strip prefix meta characters representing negated filter pattern | ||
| if strings.Index(pattern, negate) == 0 { |
There was a problem hiding this comment.
Indeed this function would be relatively straightforward using HasPrefix. You essentially only need to deal with strings.HasPrefix(pattern, "!") and strings.HasPrefix(pattern, "\\!")
…dless of order. use HasPrefix()
|
You can hold off on reviewing my recent push until I finish the sourcegraph/sourcegraph PR |
|
https://github.com/sourcegraph/sourcegraph/pull/45318 PR for sourcegraph/sourcegraph to update filter for searches and add test cases. |
|
@keegancsmith btw do I need to do any updates for these version My line of thinking is that for the specific case of file names prefixed with literal |
keegancsmith
left a comment
There was a problem hiding this comment.
btw do I need to do any updates for these version consts?
Nah, I think the likelyhood of a pattern prefixed with ! are so low it isn't worth the tradeoff on causing reindexing. Even then, we will eventually converge assuming the repo has commits to it. All in all, I think it is fine and I'd consider noting that in the changelog entry.
The site config setting search.largeFiles should allow file exclusions so that known large files can be excluded from the index.
This PR introduces extra steps in builder for handling interpretation of largeFiles patterns. Although doublestar still handles parsing for us we now handle additional cases:
a pattern is prefixed by
!: We use this character to define negation.Any file matching the pattern following theReplaced with behavior that matches gitignore more closely. The current behavior is now to take the latest entry in the pattern list, and it will take precedence / override preceding entries in the pattern list. We iterate in reverse and return first pattern that matches, regardless of if it is an exclude pattern or not. Note that doublestar in a more recent version uses negate meta characters!is excluded from the index, regardless if another pattern in largeFiles is a positive match on the file.^and!but only for character classes.a pattern escapes an exclamation mark
\!: No interpretation of the!so no negation is applied, and ensure our doublestar patternMatch includes the!(and no escape\).GH Issue: https://github.com/sourcegraph/sourcegraph/issues/40688