Avoid pruning when identical files both match and do not match lfs.fetchexclude#4973
Merged
chrisd8088 merged 1 commit intogit-lfs:mainfrom Apr 28, 2022
Merged
Conversation
In commit d2221dc of PR git-lfs#2851 the "git lfs prune" command was changed to respect the "lfs.fetchexclude" configuration option such that objects would always be pruned if they were referenced by files whose paths matched one of the patterns in the configuration option (unless they were referenced by an unpushed commit). However, this filter is applied using the GitScanner.ScanRef() method, which indirectly utilizes the internal scanRefsToChan() function, and that function only visits unique OIDs a single time each, even if they are referenced by multiple tree entries (i.e., if there are multiple files with the same content). This means that if an LFS object appears in both a file that matches a pattern from "lfs.fetchexclude" and in a file that does not match, the object may be pruned if the file path seen during the scan is the matching one regardless of whether the non-matching file would otherwise have its object retained. To resolve this we change the pruneTaskGetRetainedAtRef() function to use the GitScanner.ScanTree() method instead of ScanRef(), because ScanTree() visits all file paths in each commit. We need to pass our callback to the ScanTree() method so that we can save all non-matching files' OIDs into our list of OIDs to be retained; therefore we need to add a callback argument to ScanTree() in the same manner as is done for ScanRef() and various other GitScanner methods. We also introduce additional checks in our "prune all excluded paths" test to ensure that we always retain objects when they appear in a commit to be retained and at least one of the files referencing that object ID does not match the "lfs.fetchexclude" filter.
bk2204
approved these changes
Apr 28, 2022
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
In PR #2851 the
git lfs prunecommand was changed to respect thelfs.fetchexcludeconfiguration option such that objects would always be pruned if they were referenced by files whose paths matched one of the patterns in the configuration option (unless they were referenced by an unpushed commit).However, this filter is applied using the
GitScanner.ScanRef()method, which indirectly utilizes the internalscanRefsToChan()function, and that function only visits unique Git LFS OIDs a single time each, even if they are referenced by multiple tree entries (i.e., if there are multiple files with the same content).This means that if an LFS object appears in both a file that matches a pattern from
lfs.fetchexcludeand in a file thatdoes not match, the object may be pruned if the file path seen during the scan is the matching one regardless of whether the non-matching file would otherwise have its object retained.
To resolve this we change the
pruneTaskGetRetainedAtRef()function to use theGitScanner.ScanTree()method instead ofScanRef(), becauseScanTree()visits all file paths in each commit. We need to pass our callback to theScanTree()method so that we can save all non-matching files' OIDs into our list of OIDs to be retained; therefore we need to add a callback argument toScanTree()in the same manner as is done forScanRef()and various otherGitScannermethods.We also introduce additional checks in our
"prune all excluded paths"test to ensure that we always retain objects when they appear in a commit to be retained and at least one of the files referencing that object ID does not match thelfs.fetchexcludefilter./cc @larsxschneider as author of #2851.