Skip to content

Fix migrate import speed regression#4813

Merged
bk2204 merged 3 commits intogit-lfs:mainfrom
bk2204:slow-migrate-import
Jan 18, 2022
Merged

Fix migrate import speed regression#4813
bk2204 merged 3 commits intogit-lfs:mainfrom
bk2204:slow-migrate-import

Conversation

@bk2204
Copy link
Member

@bk2204 bk2204 commented Jan 12, 2022

When we cache files, do so on the full path instead of just the directory entry. This means that when we have an identical file with the same name in two different direectories, we distinguish between the two paths and ensure both are added to .gitattributes.

This is an alternate solution to #4671 which should perform better. For compmarison, with a clone of Git's main repository with the following command, we get:

git lfs migrate import --everything --include="*.h":

  • v3.0.1 (broken): 608s user, 53s system, 5:34 total
  • v3.0.2 (fixed): 13435s user, 1255s system, 1:43:17 total
  • this commit (fixed): 716s user, 67s system, 6:59 total

This is a much better performance characteristic for equivalent results.

Preserve the integration from the earlier attempt at fixing this plus add an additional one. Avoid using assert_pointer in the new test because that helper doesn't always work correctly when there are two files with the same file name.

Fixes #4750

This caused a substantial performance regression.  We should solve this
in a different way.

This reverts commit 791690a.
This caused a substantial performance regression.  We should solve this
in a different way.

This reverts commit 3acbec2.
@bk2204 bk2204 force-pushed the slow-migrate-import branch from 86f3671 to 9edcbc7 Compare January 13, 2022 21:56
@bk2204 bk2204 marked this pull request as ready for review January 14, 2022 14:28
@bk2204 bk2204 requested a review from a team as a code owner January 14, 2022 14:28
Copy link
Member

@chrisd8088 chrisd8088 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be an incredibly tidy and elegant solution to a long-standing problem, along with reversing the regression. Wow, thank you! 🙇

I believe this may also resolve #4167, from my own testing. I don't know if it's worth adding another test case, though; I think the one you've added here might be sufficient, although it deals with copied files, not moved ones.

One note: I notice that the commit description and this PR's description mention PR #4176, but I think the PR in question is actually #4671. It might be good to update at least the commit description, for future clarity. (Note that that's different again from #4167 I mentioned above, which is a still-open bug report.)

Again, thank you for this. ❤️

When we cache files, do so on the full path instead of just the
directory entry.  This means that when we have an identical file with
the same name in two different direectories, we distinguish between the
two paths and ensure both are added to .gitattributes.

This is an alternate solution to git-lfs#4671 which should perform better.  For
compmarison, with a clone of Git's main repository with the following
command, we get:

git lfs migrate import --everything --include="*.h":

* v3.0.1      (broken):  608s user,   53s system,    5:34 total
* v3.0.2      (fixed): 13435s user, 1255s system, 1:43:17 total
* this commit (fixed):   716s user,   67s system,    6:59 total

This is a much better performance characteristic for equivalent results.

Preserve the integration from the earlier attempt at fixing this plus
add an additional one.  Avoid using assert_pointer in the new test
because that helper doesn't always work correctly when there are two
files with the same file name.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

git lfs migrate import performance issues with version 3.0.2

2 participants