Skip to content

Fsck command#290

Closed
technoweenie wants to merge 12 commits intomasterfrom
fsck-command
Closed

Fsck command#290
technoweenie wants to merge 12 commits intomasterfrom
fsck-command

Conversation

@technoweenie
Copy link
Contributor

This builds on #280 by @zeroshirts, adding an fsck command. Right now, all it does is makes sure the .git/lfs/objects files all match the OID (a sha256 hash of the contents). By default, it removes any bad files so that Git LFS can re-download them the next time they're referenced.

Questions:

Is fsck the right name? It would be cool to eventually have something like Homebrew's doctor command for checking multiple things besides file system stuff.

Is deleting the corrupt objects the right default? I'm not sure what we could do to fix files.

Observations (not necessarily blockers for this PR):

We should think hard about how git fsck decides what Git refs to scan. It currently only looks in HEAD and the index. Though I can come up with other useful scenarios:

  • Scan a commit by name: git lfs fsck --commit abcdef
  • Scan everything in .git/lfs/objects: git lfs fsck --all
  • Scan all commits.

It'd also be nice if there was some consistency in how options work with git lfs fsck and potentially similar commands like git lfs get.

Also, we may want to build a higher level scanner in the lfs package, instead of calling lfs.ScanRefs() and lfs.ScanIndex() directly. Something that we can re-use in other commands, without having to export internal things like *lfs.wrappedPointer.

cfg := &lfs.ScanConfig{
  StartCommit: "abcdef",
  EndCommit: "ghijkl",
}

err := lfs.Scan(cfg, func(path string, p *lfs.Pointer) error {
  ...
})

@technoweenie technoweenie mentioned this pull request May 8, 2015
@christophershirk
Copy link

Good catch finding the part where I leaked a file descriptor on that error return path ( 2f6c3be ). The only problem with using the defer f.Close() trick is that you'll hold to all of the file descriptors in that for loop and then close them all at once on function exit. Imagine if a user has thousands of tracked LFS files... we could potentially exhaust the pool of available file descriptors. I suggest closing them as 'early' as possible to avoid this.

@christophershirk
Copy link

Unless I'm missing something, I think the only unsurprising (from the user's perspective) policy choices regarding corrupted objects is to simply detect them, or to overwrite them with good versions from remotes. In some cases, the corrupted objects might not exist on any of the remotes, and deleting the local corrupted objects feels aggressive. Corrupted photos might only have minor aberations, mp3s a tiny glitch...

Another concern I have is the interaction of an aggressive-delete-on-corruption policy with the kind of ephemeral data corruption that can occur on RAID hardware that doesn't validate block/stripe checksums on-the-fly. At time t=x, a 'git-lfs fsck' read might get serviced by a RAID disk suffering from bitrot in some of the LFS object data. This would cause the fsck to fail. But at time t=x+𝛿, the same 'git-lfs fsck' read might pull data from a different RAID disk that doesn't suffer from bitrot. If 'fsck' warned the user of data corruption, it still allows for an opportunity for a raid scrub to cleanse the transient bitrot errors. But if fsck eagerly deletes the corruption, and the data doesn't exist anywhere else, then I think we've exacerbated things.

@technoweenie
Copy link
Contributor Author

I'm cool with swapping the default to the dry run, and requiring an addition flag to delete the corrupted objects. I don't want it to log it as an exception like in the original implementation, though. I think it should list them out, along with a couple next steps:

  1. Purge the corrupt files
  2. Purge and re-download from a Git LFS server. This flag should probably optionally take a remote name if you don't want to get it from origin.

@technoweenie
Copy link
Contributor Author

This is on hold until #306 is merged and we can properly test this!

@michael-k michael-k mentioned this pull request May 27, 2015
38 tasks
@michael-k michael-k mentioned this pull request Jun 7, 2015
@michael-k
Copy link
Contributor

link #262

@technoweenie technoweenie deleted the fsck-command branch June 19, 2015 23:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants