Conversation
This ensures a file is closed if there’s an IO error while hashing the contents
Good catch finding the part where I leaked a file descriptor on that error return path (2f6c3be). The only problem with using the `defer f.Close()` trick is that you'll hold on to all of the file descriptors in that for loop and then close them all at once on function exit. Imagine if a user has thousands of tracked LFS files... we could potentially exhaust the pool of available file descriptors. I suggest closing them as early as possible to avoid this.
Unless I'm missing something, I think the only unsurprising (from the user's perspective) policy choices regarding corrupted objects are to simply detect them, or to overwrite them with good versions from remotes. In some cases, the corrupted objects might not exist on any of the remotes, and deleting the local corrupted objects feels aggressive. Corrupted photos might only have minor aberrations, mp3s a tiny glitch...

Another concern I have is the interaction of an aggressive delete-on-corruption policy with the kind of ephemeral data corruption that can occur on RAID hardware that doesn't validate block/stripe checksums on the fly. At time t=x, a 'git-lfs fsck' read might get serviced by a RAID disk suffering from bitrot in some of the LFS object data. This would cause the fsck to fail. But at time t=x+𝛿, the same 'git-lfs fsck' read might pull data from a different RAID disk that doesn't suffer from bitrot. If 'fsck' warns the user of data corruption, there's still an opportunity for a RAID scrub to cleanse the transient bitrot errors. But if fsck eagerly deletes the corrupted objects, and the data doesn't exist anywhere else, then I think we've exacerbated things.
I'm cool with swapping the default to the dry run, and requiring an additional flag to delete the corrupted objects. I don't want it logged as an exception like in the original implementation, though. I think it should list them out, along with a couple of next steps:
This is on hold until #306 is merged and we can properly test this!
link #262
This builds on #280 by @zeroshirts, adding an fsck command. Right now, all it does is make sure the `.git/lfs/objects` files all match the OID (a sha256 hash of the contents). By default, it removes any bad files so that Git LFS can re-download them the next time they're referenced.

Questions:
- Is `fsck` the right name? It would be cool to eventually have something like Homebrew's `doctor` command for checking multiple things besides file system stuff.
- Is deleting the corrupt objects the right default? I'm not sure what we could do to fix files.
Observations (not necessarily blockers for this PR):
We should think hard about how `git fsck` decides what Git refs to scan. It currently only looks in HEAD and the index. Though I can come up with other useful scenarios:

- A specific commit: `git lfs fsck --commit abcdef`
- Everything in `.git/lfs/objects`: `git lfs fsck --all`

It'd also be nice if there was some consistency in how options work with `git lfs fsck` and potentially similar commands like `git lfs get`.

Also, we may want to build a higher level scanner in the `lfs` package, instead of calling `lfs.ScanRefs()` and `lfs.ScanIndex()` directly. Something that we can re-use in other commands, without having to export internal things like `*lfs.wrappedPointer`.
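One shape such a higher-level scanner could take is a callback API that hands callers a small exported value type, so internal types like `*lfs.wrappedPointer` never leak out. Everything below (`ScannedPointer`, `ScanFunc`, `scanAll`) is hypothetical, a sketch of the idea rather than a proposed implementation:

```go
package main

import "fmt"

// ScannedPointer is the exported view of one discovered LFS pointer;
// internal scanner types would be converted into this before callers
// ever see them. (Hypothetical name, not the git-lfs API.)
type ScannedPointer struct {
	Name string // working-tree path the pointer stands in for
	Oid  string // sha256 OID recorded in the pointer
	Size int64  // size of the real content in bytes
}

// ScanFunc is called once per pointer found during a scan.
type ScanFunc func(ScannedPointer) error

// scanAll stands in for a wrapper over lfs.ScanRefs()/lfs.ScanIndex();
// here it just replays a fixed slice to show the callback shape.
func scanAll(pointers []ScannedPointer, fn ScanFunc) error {
	for _, p := range pointers {
		if err := fn(p); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	demo := []ScannedPointer{
		{Name: "photo.jpg", Oid: "abc123", Size: 1024},
		{Name: "track.mp3", Oid: "def456", Size: 2048},
	}
	var total int64
	scanAll(demo, func(p ScannedPointer) error {
		total += p.Size
		return nil
	})
	fmt.Println(total)
}
```

A command like fsck or `git lfs get` could then consume the same scanner by supplying its own `ScanFunc`, without the package exporting its wrapped pointer type.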