Call migrate() BlobFn on every blob#4671
Merged
bk2204 merged 3 commits intogit-lfs:mainfrom Oct 12, 2021
ycongal-smile:4628_dup_files_v2
Merged
Call migrate() BlobFn on every blob#4671bk2204 merged 3 commits intogit-lfs:mainfrom ycongal-smile:4628_dup_files_v2
bk2204 merged 3 commits intogit-lfs:mainfrom
ycongal-smile:4628_dup_files_v2
Conversation
Duplicated files = same name and same content. This reproduce issue #4628
Caching is now done inside the BlobFn function instead of in the caller. To do this, the OID of the original blob is given as an argument to BlobFn. Adapt migrate() calls to this : * For "migrate info", a set of "seen" OID is cached to avoid counting them twice. * For "migrate import", the cleaned blob is cached and reused. Fixes #4628
TestRewriterDoesntVisitUnchangedSubtrees:
* Changed to TestRewriterDoesVisitUnchangedSubtrees (BlobFn will now be
called on every blobs whether they changed or not)
* Assert changed to match that.
TestRewriterIgnoresPathsThatDontMatchFilter:
* Update "seen" count: "subdir/b.txt" should now be seen twice.
TestHistoryRewriterCallbacksSubtrees:
* Update asserted call order. BlobFn("a.txt") is now called twice.
Contributor
Author
|
(Fixed format errors) |
Contributor
Author
|
Thanks! |
bk2204
added a commit
to bk2204/git-lfs
that referenced
this pull request
Oct 25, 2021
Call migrate() BlobFn on every blob
bk2204
added a commit
to bk2204/git-lfs
that referenced
this pull request
Jan 18, 2022
When we cache files, do so on the full path instead of just the directory entry. This means that when we have an identical file with the same name in two different direectories, we distinguish between the two paths and ensure both are added to .gitattributes. This is an alternate solution to git-lfs#4671 which should perform better. For compmarison, with a clone of Git's main repository with the following command, we get: git lfs migrate import --everything --include="*.h": * v3.0.1 (broken): 608s user, 53s system, 5:34 total * v3.0.2 (fixed): 13435s user, 1255s system, 1:43:17 total * this commit (fixed): 716s user, 67s system, 6:59 total This is a much better performance characteristic for equivalent results. Preserve the integration from the earlier attempt at fixing this plus add an additional one. Avoid using assert_pointer in the new test because that helper doesn't always work correctly when there are two files with the same file name.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Caching is now done inside the BlobFn function instead of in the caller. To do this, the OID of the original blob is given as an argument to BlobFn.
Adapt migrate() calls to this :
Fixes #4628
Also add a test case for #4628 and adapt unittest to follow the BlobFn change