Conversation
Used by FilenamePassesIncludeExcludeFilter
|
Looks like the |
ttaylorr
left a comment
There was a problem hiding this comment.
Looking good, few comments. I think the underlying implementation is correct, but the tests should verify that for me 👍
| filepath.Walk(config.LocalWorkingDir, func(path string, info os.FileInfo, err error) error { | ||
| if err != nil { | ||
| return err | ||
| fchan, errchan := tools.FastWalkGitRepo(config.LocalWorkingDir) |
There was a problem hiding this comment.
Nice. What do you think about pulling this out to a private helper function in the commands package? I know the logic is duplicated in a few places across this PR, but I'm not so much thinking about that as I want the findAttributeFiles func to read cleanly.
There was a problem hiding this comment.
It's only actually used once in production code, the other place is in the tests. Since what this findAttributeFiles function is doing is only hooking out items from the channel that are .gitattributes files it seems to me to make sense like it is, processing the channel and skipping anything else. I certainly don't want another function that builds a full list and then needs to be iterated, the other option is a function which takes search criteria but that's pretty specific and nothing else is using it yet. Maybe if something else needs it later it could be refactored but wrapping it speculatively right now might not actually be re-usable anyway.
| } | ||
|
|
||
| if len(includePaths) > 0 { | ||
| matched := false |
There was a problem hiding this comment.
We could get rid of the above if by changing this line to:
matched := len(includePaths) == 0There was a problem hiding this comment.
But that conditional guards a lot more than just setting matched := false, in fact matched is deliberate scoped inside that block. It's that way because early-out by lack of inclusion requires checking every item and only concluding at the end, as opposed to early-out by exclusion which can return as soon as it's matched. That's why the exclusion block is simpler.
There was a problem hiding this comment.
Ah, you're right; nice catch.
| break | ||
| } | ||
| } | ||
| if !matched { |
There was a problem hiding this comment.
Or if we keep the if on L139, I don't think we need this one, since we break out whenever matched is true on L144.
There was a problem hiding this comment.
I don't understand your logic, sorry. You'll need to write me the complete block as you see it because to me that doesn't seem to equate to the same result.
There was a problem hiding this comment.
No worries, I think we can keep this as-is in the interest of merging this soon.
Yeah and yet the very same commit worked fine in the other AppVeyor run. Seems dubious to me, I can't reproduce it. |
|
Re-tested everything on Windows this morning, totally fine. Looks like it's passing on AppVeyor now too, did someone re-run it? If so thanks, I don't have access to do that and was going to push a pointless change just to force it 😄 Appreciate if someone could approve this, I don't think any changes are needed. @ttaylorr @technoweenie |
technoweenie
left a comment
There was a problem hiding this comment.
This looks good. I'd just prefer not to export anything in tools that isn't necessary.
tools/filetools.go
Outdated
| // all be of the same type. includePaths/excludePaths can be nil. | ||
| func FastWalk(dir string, includePaths, excludePaths []string) (<-chan FastWalkInfo, <-chan error) { | ||
| return FastWalkWithExcludeFiles(dir, "", includePaths, excludePaths) | ||
| } |
There was a problem hiding this comment.
Do we need this func? Looks like it's only used in tests, while FastWalkGitRepo() is used in code.
|
Since they’re all related variations of the same functionality it seemed sensible to expose them all in case it was useful to re-use them, but can be hidden if you prefer.
|
ttaylorr
left a comment
There was a problem hiding this comment.
Looking awesome. Few minor nits here and there, otherwise I think this is ready to merge.
tools/filetools.go
Outdated
|
|
||
| // Main recursive implementation of fast walk | ||
| // Increment waitg.Add(1) for each new goroutine launched internally | ||
| func fastWalkItem(parentDir string, itemFi os.FileInfo, excludeFilename string, |
There was a problem hiding this comment.
This method's name is confusing, mainly because it does not specify what an item is in context. Is an item a directory? A file? Something else?
If you could specify what that refers to specifically, and improve the documentation to point out what some of the parameters mean (in particular, I'm thinking of itemFi and fiChan) that would be 👍
There was a problem hiding this comment.
Is an item a directory? A file? Something else?
It's both, that's why I used Item and not File or Dir. Guess it could be fastWalkFileOrDir
| return | ||
| } | ||
| defer df.Close() | ||
| jobSize := 100 |
There was a problem hiding this comment.
This variable is effectively a constant. Would you mind extracting it and giving it a name outside of this method?
There was a problem hiding this comment.
Why bother, unless something else needs to use the same size?
There was a problem hiding this comment.
I figured that might be a good place to document why the number is what it is and what it's used for. I suppose that can be done inline, but that's just my preference 😄
There was a problem hiding this comment.
Yeah I don't want to pollute the namespace, it's only a named var at all to self-document what it's for.
| for children, err := df.Readdir(jobSize); err == nil; children, err = df.Readdir(jobSize) { | ||
| // Parallelise all dirs, and chop large dirs into batches | ||
| waitg.Add(1) | ||
| go func(subitems []os.FileInfo) { |
tools/filetools.go
Outdated
| // If any changes are made a copy of the array is taken so the original is not | ||
| // modified | ||
| func loadExcludeFilename(filename, parentDir string, excludePaths []string) ([]string, error) { | ||
|
|
There was a problem hiding this comment.
Nit: please remove this newline.
| defer f.Close() | ||
|
|
||
| retPaths := excludePaths | ||
| modified := false |
There was a problem hiding this comment.
For these variables that hold the zero-value of their type, I prefer the:
var modified booldeclaration, but that's just a stylistic preference, which you should feel free to ignore.
There was a problem hiding this comment.
I prefer the way it is, lol. If this is later changed to require a default value of true, then the diff of that update would just be:
-modified := false
+modified := trueThere was a problem hiding this comment.
Yeah I prefer explicitness of defaults, but maybe just because I come from languages that don't guarantee initialised variables
|
I think |
|
All done - CLA check seems to be stalled though? |
Fixes #1611
Provides a number of more optimal replacements for
filepath.Walk. The easiest of which istools.FastWalkGitRepowhich automatically avoids walking.gitand anything that matches any.gitignoreit discovers as it walks.Lower-level versions are available to include/exclude custom patterns.
I've used this in
git lfs trackto significantly speed up the enumeration of.gitattributesfiles, particularly when a repo contains a lot of files, most of which are in the.gitignore, as in #1611.I've also refactored
FilenamePassesIncludeExcludeFilterinto thetoolspackage, and extracted a utility functionFileMatchwhich replacesfilepath.Matchby mirroring the behaviour ofgitignorepattern matching, which wasn't completely supported before. This has a side effect of enhancing the capabilities of the include/exclude filters used elsewhere to support a more complete range of wildcard options including the "**" folder match.