Conversation
This ensures a file is closed if there’s an IO error while hashing the contents
Good catch finding the part where I leaked a file descriptor on that error return path (2f6c3be). The only problem with using the `defer f.Close()` trick is that you'll hold on to all of the file descriptors in that for loop and then close them all at once on function exit. Imagine if a user has thousands of tracked LFS files... we could potentially exhaust the pool of available file descriptors. I suggest closing them as early as possible to avoid this.
Unless I'm missing something, I think the only unsurprising (from the user's perspective) policy choices regarding corrupted objects are to simply detect them, or to overwrite them with good versions from remotes. In some cases, the corrupted objects might not exist on any of the remotes, and deleting the local corrupted objects feels aggressive. Corrupted photos might only have minor aberrations, mp3s a tiny glitch...

Another concern I have is the interaction of an aggressive delete-on-corruption policy with the kind of ephemeral data corruption that can occur on RAID hardware that doesn't validate block/stripe checksums on the fly. At time t=x, a 'git-lfs fsck' read might get serviced by a RAID disk suffering from bitrot in some of the LFS object data. This would cause the fsck to fail. But at time t=x+𝛿, the same 'git-lfs fsck' read might pull data from a different RAID disk that doesn't suffer from bitrot. If 'fsck' warns the user of data corruption, there's still an opportunity for a RAID scrub to cleanse the transient bitrot errors. But if fsck eagerly deletes the corrupted objects, and the data doesn't exist anywhere else, then I think we've exacerbated things.
I'm cool with swapping the default to the dry run, and requiring an additional flag to delete the corrupted objects. I don't want it logged as an exception like in the original implementation, though. I think it should list them out, along with a couple of next steps:
This is on hold until #306 is merged and we can properly test this!
link #262
This builds on #280 by @zeroshirts, adding an fsck command. Right now, all it does is make sure the `.git/lfs/objects` files all match the OID (a sha256 hash of the contents). By default, it removes any bad files so that Git LFS can re-download them the next time they're referenced.

Questions:
- Is `fsck` the right name? It would be cool to eventually have something like Homebrew's `doctor` command for checking multiple things besides file system stuff.
- Is deleting the corrupt objects the right default? I'm not sure what we could do to fix files.
Observations (not necessarily blockers for this PR):
We should think hard about how `git fsck` decides what Git refs to scan. It currently only looks in HEAD and the index. Though I can come up with other useful scenarios:

- A specific commit: `git lfs fsck --commit abcdef`
- Everything in `.git/lfs/objects`: `git lfs fsck --all`

It'd also be nice if there was some consistency in how options work with `git lfs fsck` and potentially similar commands like `git lfs get`.

Also, we may want to build a higher level scanner in the `lfs` package, instead of calling `lfs.ScanRefs()` and `lfs.ScanIndex()` directly. Something that we can re-use in other commands, without having to export internal things like `*lfs.wrappedPointer`.
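One shape such a higher-level scanner could take is a callback API that hands callers a small exported value type, so internal types like `*lfs.wrappedPointer` never leak out. Everything below (`ScannedPointer`, `ScanFunc`, `scanAll`) is hypothetical, a sketch of the idea rather than a proposed implementation:

```go
package main

import "fmt"

// ScannedPointer is the exported view of one discovered LFS pointer;
// internal scanner types would be converted into this before callers
// ever see them. (Hypothetical name, not the git-lfs API.)
type ScannedPointer struct {
	Name string // working-tree path the pointer stands in for
	Oid  string // sha256 OID recorded in the pointer
	Size int64  // size of the real content in bytes
}

// ScanFunc is called once per pointer found during a scan.
type ScanFunc func(ScannedPointer) error

// scanAll stands in for a wrapper over lfs.ScanRefs()/lfs.ScanIndex();
// here it just replays a fixed slice to show the callback shape.
func scanAll(pointers []ScannedPointer, fn ScanFunc) error {
	for _, p := range pointers {
		if err := fn(p); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	demo := []ScannedPointer{
		{Name: "photo.jpg", Oid: "abc123", Size: 1024},
		{Name: "track.mp3", Oid: "def456", Size: 2048},
	}
	var total int64
	scanAll(demo, func(p ScannedPointer) error {
		total += p.Size
		return nil
	})
	fmt.Println(total)
}
```

A command like fsck or `git lfs get` could then consume the same scanner by supplying its own `ScanFunc`, without the package exporting its wrapped pointer type.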