Skip to content

Fix git lfs prune is deleting staged files in the index#5637

Merged
bk2204 merged 6 commits intogit-lfs:mainfrom
Anchorpoint-Software:prune-index
Feb 6, 2024
Merged

Fix git lfs prune is deleting staged files in the index#5637
bk2204 merged 6 commits intogit-lfs:mainfrom
Anchorpoint-Software:prune-index

Conversation

@jochenhz
Copy link
Contributor

This PR should fix #5636

Running git lfs prune right after git add will prune the indexed files from the LFS cache so that any commit will result in a rightfully rejected push. Even worse, doing multiple commits on changes of the same file will permanently delete LFS data that is not recoverable.

This change retains all files that are in the index by diffing the index against the HEAD.

@jochenhz jochenhz requested a review from a team as a code owner January 31, 2024 09:45
Copy link
Member

@bk2204 bk2204 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a situation with multiple worktrees we'll probably want to handle here (and some tests for that case would be helpful as well). In addition, it looks like we're failing some tests, too.

func pruneTaskGetRetainedIndex(gitscanner *lfs.GitScanner, retainChan chan string, errorChan chan error, waitg *sync.WaitGroup, sem *semaphore.Weighted) {
defer waitg.Done()

err := gitscanner.ScanIndex("HEAD", func(p *lfs.WrappedPointer, err error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to handle multiple indices, one for each worktree. It might therefore be helpful to call this from pruneTaskGetRetainedWorktree. If we don't do that, we might produce wrong results if I do git add in worktree A and then git lfs prune in another worktree.

@jochenhz
Copy link
Contributor Author

Good point, I'll take a look at this

@bk2204
Copy link
Member

bk2204 commented Jan 31, 2024

I do also want to just say thanks for sending a patch for this before we even got a chance to look at the issue. I'm overall really happy with the direction this is going in.

@jochenhz
Copy link
Contributor Author

I do also want to just say thanks for sending a patch for this before we even got a chance to look at the issue. I'm overall really happy with the direction this is going in.

Sure thing, I appreciate that you guys are very busy and would like to increase the likelihood of this being fixed.

I am actually a little bit stuck how to implement this for worktrees. The problem is that to use git diff-index I'd need to cd into the worktree and don't really know how to do that as I cannot change the config of the GitScanner being used

@bk2204
Copy link
Member

bk2204 commented Jan 31, 2024

I am actually a little bit stuck how to implement this for worktrees. The problem is that to use git diff-index I'd need to cd into the worktree and don't really know how to do that as I cannot change the config of the GitScanner being used

The way I would do this is to pass an additional argument to ScanIndex that would eventually get passed down to DiffIndex that, if it's not the empty string, is the working directory. That can be passed used as git -C WORKING-DIRECTORY diff-index to invoke Git in the proper location. If it is the empty string, then it's ignored and we don't change the working directory. Does that make sense to you?

@jochenhz
Copy link
Contributor Author

I am actually a little bit stuck how to implement this for worktrees. The problem is that to use git diff-index I'd need to cd into the worktree and don't really know how to do that as I cannot change the config of the GitScanner being used

The way I would do this is to pass an additional argument to ScanIndex that would eventually get passed down to DiffIndex that, if it's not the empty string, is the working directory. That can be passed used as git -C WORKING-DIRECTORY diff-index to invoke Git in the proper location. If it is the empty string, then it's ignored and we don't change the working directory. Does that make sense to you?

Makes perfect sense, thanks for your input. I'll update the PR once I'd like some feedback!

Running git lfs prune right after git add will prune the indexed files from the LFS cache so that any commit will result in a rightfully rejected push. Even worse, doing multiple commits on changes of the same file will permanently delete LFS data that is not recoverable.

This change retains all files that are in the indices of all worktrees by diffing each index against the corresponding HEAD.
@jochenhz
Copy link
Contributor Author

jochenhz commented Feb 1, 2024

@bk2204 I have force pushed a new solution taking worktrees into account

@bk2204
Copy link
Member

bk2204 commented Feb 2, 2024

It looks like we still have one or two failing tests here. Once that's fixed, I'll do a final review.

@jochenhz
Copy link
Contributor Author

jochenhz commented Feb 2, 2024

@bk2204 yeah, it was failing with git 2.0.0 as no worktrees were supported back then and the git.go RootDir() function was not handling missing working directories correctly. Should be all green now 🤞

Copy link
Member

@bk2204 bk2204 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Thanks again for the patch.

@bk2204 bk2204 merged commit df0eab4 into git-lfs:main Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Git LFS prune deletes staged objects

2 participants