Reworked to use git ls-files in some circumstances instead of FastWalkGitRepo#3823
Conversation
|
The approach I'm taking, which I'm still waiting on feedback on, is the The general approach of using |
|
RE large repos: I think the limiting factor is going to be the size of the working tree, and the complexity of the gitignores. To that end, our repo is the result of an SVN conversion, and it is fairly large at 278,000 files, with a gitignore over 3k lines. I will take a look at your changes, and see where we differ. If you are open to accepting a PR, I will keep working on this. |
|
I spent a minute and reviewed your change. It looks pretty similar to mine, except you are hitting the filesystem a little more. I think I need to understand a little more:
|
|
For 1, we need to honor For 2, it isn't practically useful to lock a file that isn't yet committed, since nobody else will be contending over it. I don't think we have to support that case. For 3, I don't believe we're querying the server if you're talking about the code in I don't think we intrinsically need to call |
|
I ran some trivial bench marking on the two change sets: My guess is that the majority of the difference is the calls to stat. I will rework my change so the test cases pass. My guess is it will end up looking very similar to your change, except for the calls to stat, and the NUL separated output from |
14ade2b to
d542f07
Compare
Due to the complexities of handling .gitignores, rely directly on git-ls-files rather than a custom gitignore-aware directory walking routine. As a side-effect, there has been a minor behaviorial change related to locking. Previously, if a file was locked, then a matching pattern was added to a .gitignore, that file was no longer considered locked and the writeable attribute was set on that file. With this change, files which match a .gitignore, but have been addded to the repository, will still have their write bit cleared. Fixes git-lfs#3797
|
Updated benchmark: As expected, it is faster than running stat on all of the files, but not as fast as the initial version because it has to scan the working directory ( |
|
Not sure why the windows ci build is failing. Is that expected? |
|
I believe there's something broken with Windows CI. Reported as #3828. |
|
@SeamusConnor Please, rebase onto latest |
|
Alright, looks like whatever was done to the windows CI build on master fixed my issue. |
bk2204
left a comment
There was a problem hiding this comment.
First, sorry it's taken me so long to get to this.
Overall, this looks like a great improvement. I have some concerns about the memory usage with the technique implemented here, especially on large repositories. I think using a goroutine approach to return items incrementally may be a better approach.
| FullPath: scanner.Text(), | ||
| } | ||
| rv.Files[scanner.Text()] = finfo | ||
| rv.FilesByName[base] = append(rv.FilesByName[base], finfo) |
There was a problem hiding this comment.
I think this approach has the real possibility of using a lot of memory on large repositories, since we're storing data twice for each file instead of operating incrementally. I think we'd want to continue to do things incrementally like we do for the current walker.
There was a problem hiding this comment.
Yeah, I considered that when I did it. For my repo (which IMO is quite large w/ a working tree of ~300k files), ls-files -z -o --cached produced around 20 MiB of data. I made a completely untested tradeoff for speed here, with the assumption that a callback or channel based approach would be slower. Perhaps the most prudent thing to do would be for me to try it with an iterative interface, and see what happens to the execution time/memory consumption?
Anyway, that was the thought process. I could go either way. Just let me know.
There was a problem hiding this comment.
Largest git installation I worked on was 1M files with 1.5TB .git/objects (before git-lfs existed) and this was much beyond what any sane person would do. So if what you did works reasonably well for 300K files, I believe that's Good Enough to leave as is.
There was a problem hiding this comment.
Okay, let's go with it. It's going to scale better than the current status quo, and it's not too difficult to fix if it doesn't.
In commit 13a8af6 of PR git-lfs#1616 we added a FastWalkGitRepo() function to our "tools" package for the purpose of improving the performance of the "git lfs track" command. At the time, this command used the Walk() function of the "filepath" package from the Go standard library to traverse the contents of the current Git working tree and locate all ".gitattributes" files. Then in commit f1fdc85 of the same PR git-lfs#1616 we updated the "git lfs track" command's findAttributeFiles() function to use our new FastWalkGitRepo() function instead of the "filepath" package's Walk() function. This change made searches for ".gitattributes" files in large repositories faster for several reasons. Unlike the Walk() function from the "filepath" package, the functions called by the FastWalkGitRepo() function to traverse a directory hierarchy did not sort the entries in each directory, and also ignored all ".git" directories and all entries which matched any pattern found in a ".gitignore" file. Later, in PRs git-lfs#1870 and git-lfs#2689, we expanded the number of callers of the FastWalkGitRepo() function. In particular, in PR git-lfs#2689 we began to make use of the function during the final phase of all Git LFS commands where we find and delete any stale temporary files stored in our ".lfs/tmp" directory. This PR introduced our "fs" package whose cleanupTmp() method calls the FastWalkGitRepo() function, passing the path to the ".lfs/tmp" directory and an anonymous callback function which removes any temporary object data files that are more than an hour old. We then added a FastWalkGitRepoAll() function to our "fs" package in PR git-lfs#3190, which operated in a similar fashion as the FastWalkGitRepo() function but did not read ".gitignore" files and so also did not skip directory entries matching any patterns found in those files. Next, in PR git-lfs#3686, we updated the internal implementation of the FastWalkGitRepo() and FastWalkGitRepoAll() functions to avoid entering submodules when traversing a Git working tree. To make this change, we added a check to the Walk() method of the "fastWalker" structure so it would return immediately when processing a directory if the directory contained an entry named ".git", unless the directory was the root of the working tree. Note that the Walk() method already ignored any directory entries with the name ".git", but this only meant it would traverse through the contents of a submodule checkout in a working tree while skipping the submodule's ".git" directory. Then in commit 83d7f76 of PR git-lfs#3823 we first implemented the NewLsFiles() function in our "git" package, which executes a "git ls-files" command and returns list of files it outputs. As well, we updated the findAttributeFiles() function of our "git" package and the fixFileWriteFlags() function in our "locking" package to both call the NewLsFiles() function instead of calling either the FastWalkGitRepo() or FastWalkGitRepoAll() functions. Since these were the only instances where we actually used the FastWalkGitRepo() or FastWalkGitRepoAll() functions to traverse a Git working tree, in the same commit of PR git-lfs#3823 we removed the FastWalkGitRepo() function and renamed the FastWalkGitRepoAll() function to FastWalkDir(). We also simplified some of the internal functions called by the FastWalkDir() function because they no longer needed to detect or parse ".gitignore" files, or skip directory entries matching the patterns from those files. Although the two remaining use cases for the FastWalkDir() function also did not need the function to detect and skip submodules or directories named ".git", the logic to do so was retained in the internal functions of the "tools" package. Specifically, the fastWalkWithExcludeFiles() function, which is called by the FastWalkDir() function, establishes two file path filters with the patterns ".git" and "**/.git", which are then passed down to the Walk() method of the "fastWalker" structure. That method checks whether the current directory entry's name matches either of the filter patterns, and if it does returns immediately without processing the entry any further. In addition, the Walk() method still also performs the check added in PR git-lfs#3686 to try to avoid traversing into Git submodules. As noted above, however, neither of the two remaining callers of the FastWalkDir() function require these checks, because they only need the function to traverse directory hierarchies within the ".git/lfs" directory. The cleanupTmp() method of the Filesystem structure in our "fs" package uses the FastWalkDir() function to find stale temporary files within the ".git/lfs/tmp" directory. The EachObject() method of the same structure, meanwhile, uses the FastWalkDir() function to invoke a callback function for each object file under the ".git/lfs/objects" directory. In both cases, the directory hierarchy traversed by the FastWalkDir() function is entirely within the ".git/lfs" directory, so there is no value in trying to exclude ".git" directories while performing the traversal. We therefore now simplify the FastWalkDir() function and the internal functions it invokes by removing the unnecessary checks for submodules and for directory entries named ".git". First, we update the fastWalkWithExcludeFiles() function so that does not initialize any file path filters, and we rename the function to fastWalkDir(). Next, we remove the Walk() method's "excludePaths" parameter, and alter the method so that it no longer skips directory entries based on whether their names match the patterns in that parameter's file path filters. We also eliminate the check in the Walk() method for directories containing ".git" directory entries, as this check's only purpose was to skip Git submodules within a working tree. As well, we revise the code comments relating to all of these functions and methods to reflect their new names and their simplified behaviours. To minimize the changes in this commit, however, we leave the names of the functions' parameters and internal variables intact, even though some of them still reflect the original design and the expectation that the functions would be used with Git working trees. In a subsequent commit in this PR we will then rename these variables and parameters, along with the "rootDir" field of the "fastWalker" structure, so that they more accurately represent the functions' current purpose and implementation. Finally, note that the "cleans only temp files and directories older than an hour" test in our "t/t-tempfile.sh" shell test script verifies the behaviour of cleanupTmp() function, which employs the FastWalkDir() function, while the TestFastWalkBasic() test function in our Go test suite directly exercises the fastWalkDir() internal function. Both tests thus provide some assurance that our changes in this commit have not introduced any unexpected regressions.
In commit 13a8af6 of PR git-lfs#1616 we added a FastWalkGitRepo() function to our "tools" package for the purpose of improving the performance of the "git lfs track" command. At the time, this command used the Walk() function of the "filepath" package from the Go standard library to traverse the contents of the current Git working tree and locate all ".gitattributes" files. Then in commit f1fdc85 of the same PR git-lfs#1616 we updated the "git lfs track" command's findAttributeFiles() function to use our new FastWalkGitRepo() function instead of the "filepath" package's Walk() function. This change made searches for ".gitattributes" files in large repositories faster for several reasons. Unlike the Walk() function from the "filepath" package, the functions called by the FastWalkGitRepo() function to traverse a directory hierarchy did not sort the entries in each directory, and also ignored all ".git" directories and all entries which matched any pattern found in a ".gitignore" file. Later, in PRs git-lfs#1870 and git-lfs#2689, we expanded the number of callers of the FastWalkGitRepo() function. In particular, in PR git-lfs#2689 we began to make use of the function during the final phase of all Git LFS commands where we find and delete any stale temporary files stored in our ".lfs/tmp" directory. This PR introduced our "fs" package whose cleanupTmp() method calls the FastWalkGitRepo() function, passing the path to the ".lfs/tmp" directory and an anonymous callback function which removes any temporary object data files that are more than an hour old. We then added a FastWalkGitRepoAll() function to our "fs" package in PR git-lfs#3190, which operated in a similar fashion as the FastWalkGitRepo() function but did not read ".gitignore" files and so also did not skip directory entries matching any patterns found in those files. Next, in PR git-lfs#3686, we updated the internal implementation of the FastWalkGitRepo() and FastWalkGitRepoAll() functions to avoid entering submodules when traversing a Git working tree. To make this change, we added a check to the Walk() method of the "fastWalker" structure so it would return immediately when processing a directory if the directory contained an entry named ".git", unless the directory was the root of the working tree. Note that the Walk() method already ignored any directory entries with the name ".git", but this only meant it would traverse through the contents of a submodule checkout in a working tree while skipping the submodule's ".git" directory. Then in commit 83d7f76 of PR git-lfs#3823 we first implemented the NewLsFiles() function in our "git" package, which executes a "git ls-files" command and returns list of files it outputs. As well, we updated the findAttributeFiles() function of our "git" package and the fixFileWriteFlags() function in our "locking" package to both call the NewLsFiles() function instead of calling either the FastWalkGitRepo() or FastWalkGitRepoAll() functions. Since these were the only instances where we actually used the FastWalkGitRepo() or FastWalkGitRepoAll() functions to traverse a Git working tree, in the same commit of PR git-lfs#3823 we removed the FastWalkGitRepo() function and renamed the FastWalkGitRepoAll() function to FastWalkDir(). We also simplified some of the internal functions called by the FastWalkDir() function because they no longer needed to detect or parse ".gitignore" files, or skip directory entries matching the patterns from those files. Although the two remaining use cases for the FastWalkDir() function also did not need the function to detect and skip submodules or directories named ".git", the logic to do so was retained in the internal functions of the "tools" package. Specifically, the fastWalkWithExcludeFiles() function, which is called by the FastWalkDir() function, establishes two file path filters with the patterns ".git" and "**/.git", which are then passed down to the Walk() method of the "fastWalker" structure. That method checks whether the current directory entry's name matches either of the filter patterns, and if it does returns immediately without processing the entry any further. In addition, the Walk() method still also performs the check added in PR git-lfs#3686 to try to avoid traversing into Git submodules. As noted above, however, neither of the two remaining callers of the FastWalkDir() function require these checks, because they only need the function to traverse directory hierarchies within the ".git/lfs" directory. The cleanupTmp() method of the Filesystem structure in our "fs" package uses the FastWalkDir() function to find stale temporary files within the ".git/lfs/tmp" directory. The EachObject() method of the same structure, meanwhile, uses the FastWalkDir() function to invoke a callback function for each object file under the ".git/lfs/objects" directory. In both cases, the directory hierarchy traversed by the FastWalkDir() function is entirely within the ".git/lfs" directory, so there is no value in trying to exclude ".git" directories while performing the traversal. We therefore now simplify the FastWalkDir() function and the internal functions it invokes by removing the unnecessary checks for submodules and for directory entries named ".git". First, we update the fastWalkWithExcludeFiles() function so that does not initialize any file path filters, and we rename the function to fastWalkDir(). Next, we remove the Walk() method's "excludePaths" parameter, and alter the method so that it no longer skips directory entries based on whether their names match the patterns in that parameter's file path filters. We also eliminate the check in the Walk() method for directories containing ".git" directory entries, as this check's only purpose was to skip Git submodules within a working tree. As well, we revise the code comments relating to all of these functions and methods to reflect their new names and their simplified behaviours. To minimize the changes in this commit, however, we leave the names of the functions' parameters and internal variables intact, even though some of them still reflect the original design and the expectation that the functions would be used with Git working trees. In a subsequent commit in this PR we will then rename these variables and parameters, along with the "rootDir" field of the "fastWalker" structure, so that they more accurately represent the functions' current purpose and implementation. Finally, note that the "cleans only temp files and directories older than an hour" test in our "t/t-tempfile.sh" shell test script verifies the behaviour of cleanupTmp() function, which employs the FastWalkDir() function, while the TestFastWalkBasic() test function in our Go test suite directly exercises the fastWalkDir() internal function. Both tests thus provide some assurance that our changes in this commit have not introduced any unexpected regressions.
In commit 13a8af6 of PR git-lfs#1616 we added a FastWalkGitRepo() function to our "tools" package for the purpose of improving the performance of the "git lfs track" command. At the time, this command used the Walk() function of the "filepath" package from the Go standard library to traverse the contents of the current Git working tree and locate all ".gitattributes" files. Then in commit f1fdc85 of the same PR git-lfs#1616 we updated the "git lfs track" command's findAttributeFiles() function to use our new FastWalkGitRepo() function instead of the "filepath" package's Walk() function. This change made searches for ".gitattributes" files in large repositories faster for several reasons. Unlike the Walk() function from the "filepath" package, the functions called by the FastWalkGitRepo() function to traverse a directory hierarchy did not sort the entries in each directory, and also ignored all ".git" directories and all entries which matched any pattern found in a ".gitignore" file. Later, in PRs git-lfs#1870 and git-lfs#2689, we expanded the number of callers of the FastWalkGitRepo() function. In particular, in PR git-lfs#2689 we began to make use of the function during the final phase of all Git LFS commands where we find and delete any stale temporary files stored in our ".lfs/tmp" directory. This PR introduced our "fs" package whose cleanupTmp() method calls the FastWalkGitRepo() function, passing the path to the ".lfs/tmp" directory and an anonymous callback function which removes any temporary object data files that are more than an hour old. We then added a FastWalkGitRepoAll() function to our "fs" package in PR git-lfs#3190, which operated in a similar fashion as the FastWalkGitRepo() function but did not read ".gitignore" files and so also did not skip directory entries matching any patterns found in those files. Next, in PR git-lfs#3686, we updated the internal implementation of the FastWalkGitRepo() and FastWalkGitRepoAll() functions to avoid entering submodules when traversing a Git working tree. To make this change, we added a check to the Walk() method of the "fastWalker" structure so it would return immediately when processing a directory if the directory contained an entry named ".git", unless the directory was the root of the working tree. Note that the Walk() method already ignored any directory entries with the name ".git", but this only meant it would traverse through the contents of a submodule checkout in a working tree while skipping the submodule's ".git" directory. Then in commit 83d7f76 of PR git-lfs#3823 we first implemented the NewLsFiles() function in our "git" package, which executes a "git ls-files" command and returns list of files it outputs. As well, we updated the findAttributeFiles() function of our "git" package and the fixFileWriteFlags() function in our "locking" package to both call the NewLsFiles() function instead of calling either the FastWalkGitRepo() or FastWalkGitRepoAll() functions. Since these were the only instances where we actually used the FastWalkGitRepo() or FastWalkGitRepoAll() functions to traverse a Git working tree, in the same commit of PR git-lfs#3823 we removed the FastWalkGitRepo() function and renamed the FastWalkGitRepoAll() function to FastWalkDir(). We also simplified some of the internal functions called by the FastWalkDir() function because they no longer needed to detect or parse ".gitignore" files, or skip directory entries matching the patterns from those files. Although the two remaining use cases for the FastWalkDir() function also did not need the function to detect and skip submodules or directories named ".git", the logic to do so was retained in the internal functions of the "tools" package. Specifically, the fastWalkWithExcludeFiles() function, which is called by the FastWalkDir() function, establishes two file path filters with the patterns ".git" and "**/.git", which are then passed down to the Walk() method of the "fastWalker" structure. That method checks whether the current directory entry's name matches either of the filter patterns, and if it does returns immediately without processing the entry any further. In addition, the Walk() method still also performs the check added in PR git-lfs#3686 to try to avoid traversing into Git submodules. As noted above, however, neither of the two remaining callers of the FastWalkDir() function require these checks, because they only need the function to traverse directory hierarchies within the ".git/lfs" directory. The cleanupTmp() method of the Filesystem structure in our "fs" package uses the FastWalkDir() function to find stale temporary files within the ".git/lfs/tmp" directory. The EachObject() method of the same structure, meanwhile, uses the FastWalkDir() function to invoke a callback function for each object file under the ".git/lfs/objects" directory. In both cases, the directory hierarchy traversed by the FastWalkDir() function is entirely within the ".git/lfs" directory, so there is no value in trying to exclude ".git" directories while performing the traversal. We therefore now simplify the FastWalkDir() function and the internal functions it invokes by removing the unnecessary checks for submodules and for directory entries named ".git". First, we update the fastWalkWithExcludeFiles() function so that does not initialize any file path filters, and we rename the function to fastWalkDir(). Next, we remove the Walk() method's "excludePaths" parameter, and alter the method so that it no longer skips directory entries based on whether their names match the patterns in that parameter's file path filters. We also eliminate the check in the Walk() method for directories containing ".git" directory entries, as this check's only purpose was to skip Git submodules within a working tree. As well, we revise the code comments relating to all of these functions and methods to reflect their new names and their simplified behaviours. To minimize the changes in this commit, however, we leave the names of the functions' parameters and internal variables intact, even though some of them still reflect the original design and the expectation that the functions would be used with Git working trees. In a subsequent commit in this PR we will then rename these variables and parameters, along with the "rootDir" field of the "fastWalker" structure, so that they more accurately represent the functions' current purpose and implementation. Finally, note that the "cleans only temp files and directories older than an hour" test in our "t/t-tempfile.sh" shell test script verifies the behaviour of cleanupTmp() function, which employs the FastWalkDir() function, while the TestFastWalkBasic() test function in our Go test suite directly exercises the fastWalkDir() internal function. Both tests thus provide some assurance that our changes in this commit have not introduced any unexpected regressions.
In commit 13a8af6 of PR git-lfs#1616 we added a FastWalkGitRepo() function to our "tools" package for the purpose of improving the performance of the "git lfs track" command. At the time, this command used the Walk() function of the "filepath" package from the Go standard library to traverse the contents of the current Git working tree and locate all ".gitattributes" files. Then in commit f1fdc85 of the same PR git-lfs#1616 we updated the "git lfs track" command's findAttributeFiles() function to use our new FastWalkGitRepo() function instead of the "filepath" package's Walk() function. This change made searches for ".gitattributes" files in large repositories faster for several reasons. Unlike the Walk() function from the "filepath" package, the functions called by the FastWalkGitRepo() function to traverse a directory hierarchy did not sort the entries in each directory, and also ignored all ".git" directories and all entries which matched any pattern found in a ".gitignore" file. Later, in PRs git-lfs#1870 and git-lfs#2689, we expanded the number of callers of the FastWalkGitRepo() function. In particular, in PR git-lfs#2689 we began to make use of the function in our "git lfs prune" command to locate all the object files in our local storage directories, which by default are located within the ".git/lfs/objects" directory. As well, with this PR we began to use the FastWalkGitRepo() function during the final phase of all Git LFS commands where we find and delete any stale temporary files from the directory where we store such files, which by default is the ".git/lfs/tmp" directory. PR git-lfs#2689 introduced our "fs" package, whose Filesystem structure's EachObject() and cleanupTmp() methods both called the FastWalkGitRepo() function, passing the paths to the local object storage directory and the local temporary file storage directory, respectively. We then added a FastWalkGitRepoAll() function to our "fs" package in PR git-lfs#3190. This function operated in a similar fashion as the FastWalkGitRepo() function but did not read ".gitignore" files and so also did not skip directory entries matching any patterns found in those files. Next, in PR git-lfs#3686, we updated the internal implementation of the FastWalkGitRepo() and FastWalkGitRepoAll() functions to avoid entering submodules when traversing a Git working tree. To make this change, we added a check to the "fastWalker" structure's Walk() method so it would return immediately when processing a directory if the directory contained an entry named ".git", unless the directory was the root of the working tree. Note that the Walk() method already ignored any directory entries with the name ".git", but this only meant it would traverse through the contents of a submodule checkout in a working tree while skipping the submodule's ".git" directory. Then in commit 83d7f76 of PR git-lfs#3823 we first implemented the NewLsFiles() function in our "git" package, which executes a "git ls-files" command and returns the list of files it outputs. As well, we updated the findAttributeFiles() function of our "git" package and the fixFileWriteFlags() function in our "locking" package to both call the NewLsFiles() function instead of calling either the FastWalkGitRepo() or FastWalkGitRepoAll() functions. Since these were the only instances where we actually used the FastWalkGitRepo() or FastWalkGitRepoAll() functions to traverse a Git working tree, in the same commit of PR git-lfs#3823 we removed the FastWalkGitRepo() function and renamed the FastWalkGitRepoAll() function to FastWalkDir(). We also simplified some of the internal functions called by the FastWalkDir() function because they no longer needed to detect or parse ".gitignore" files, or skip directory entries matching the patterns from those files. Although the two remaining use cases for the FastWalkDir() function also did not need the function to detect and skip submodules or directories named ".git", the logic to do so was retained in the internal functions of the "tools" package. Specifically, the fastWalkWithExcludeFiles() function, which is called by the FastWalkDir() function, establishes two file path filters with the patterns ".git" and "**/.git", which are then passed down to the Walk() method of the "fastWalker" structure. That method checks whether the current directory entry's name matches either of the filter patterns, and if it does returns immediately without processing the entry any further. In addition, the Walk() method still also performs the check added in PR git-lfs#3686 to try to avoid traversing into Git submodules. As noted above, however, neither of the two remaining callers of the FastWalkDir() function require these checks, because they only need the function to traverse directory hierarchies which the Git LFS client has created, and which are specifically not Git working trees. The cleanupTmp() method of the Filesystem structure in our "fs" package uses the FastWalkDir() function to find stale files within the local temporary file storage directory. The EachObject() method of the same structure, meanwhile, uses the FastWalkDir() function to invoke a callback function for each object file in the local object storage directory. By default, these local storage directories are the ".git/lfs/tmp" and ".git/lfs/objects" directories. If the "lfs.storage" configuration option is set to a relative path, then these directories will be located somewhere within the ".git" directory, while if the option is set to an absolute path, our local storage directories will be located under that arbitrary location. In all these cases the Git LFS client creates and manages these local storage directories, so we can expect them not to contain ".git" directories or submodules. This is true even when a user has configured the "lfs.storage" option with an absolute path, since the client is still responsible for creating and managing "tmp" and "objects" directories within that arbitrary location. We therefore now simplify the FastWalkDir() function and the internal functions it invokes by removing the unnecessary checks for submodules and for directory entries named ".git". First, we update the fastWalkWithExcludeFiles() function so that does not initialize any file path filters, and we rename the function to fastWalkDir(). Next, we remove the Walk() method's "excludePaths" parameter, and alter the method so that it no longer skips directory entries based on whether their names match the patterns in that parameter's file path filters. We also eliminate the check in the Walk() method for directories containing ".git" directory entries, as this check's only purpose was to skip Git submodules within a working tree. As well, we revise the code comments relating to all of these functions and methods to reflect their new names and their simplified behaviours. To minimize the changes in this commit, however, we leave the names of the functions' parameters and internal variables intact, even though some of them still reflect the original design and the expectation that the functions would be used with Git working trees. In a subsequent commit in this PR we will then rename these variables and parameters, along with the "rootDir" field of the "fastWalker" structure, so that they more accurately represent the functions' current purpose and implementation. Finally, note that the "cleans only temp files and directories older than an hour" test in our "t/t-tempfile.sh" shell test script verifies the behaviour of cleanupTmp() function, which employs the FastWalkDir() function, while the TestFastWalkBasic() test function in our Go test suite directly exercises the fastWalkDir() internal function. Both tests thus provide some assurance that our changes in this commit have not introduced any unexpected regressions.
This is a rough cut of addressing #3797 by replacing FastWalkGitRepo with
git ls-files. For my use case, it cuts post-hook execution time down from ~40 seconds, to a few hundred ms.At this point, I am looking for a high-level review of the changes, and direction on wether I should proceed w/ completing this change.
There are a few things that need to be addressed:
I am also requesting any feedback that anyone familiar w/ the code might have. As it stands, git-lfs is unusable for our project without this (or a similar) fix, and I would like to verify that this fix is sound before we use it internally.