See the discussion in issue #3635 for more context. But here is a short summary:
The `lfs.storage` config option lets the user choose where LFS blobs are stored, for example on a different disk or in a shared global location.
The current implementation of `lfs.storage`, however, has a serious drawback: it doesn't work correctly with `git lfs prune`. Pruning a repository that shares its `lfs.storage` with another repository results in data loss, because the pruning repository treats all "foreign" blobs as ordinary dangling blobs and deletes them.
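To make the failure mode concrete, here is a toy model in Go (the language git-lfs is written in). It is not the actual git-lfs code: `Storage`, `pruneCandidates` and the short object IDs are hypothetical, and real prune also retains recent and checked-out objects. The only point is that the set of objects to delete is computed from the pruning repository's own references, so anything another repository stored in the shared directory looks unreferenced.

```go
package main

import (
	"fmt"
	"sort"
)

// Storage models a shared lfs.storage directory as a set of object IDs.
// Real LFS objects are addressed by SHA-256 OIDs; short strings are used here.
type Storage map[string]bool

// pruneCandidates returns every object in storage that the pruning
// repository does not reference. It has no way of knowing which other
// repositories share the same storage directory.
func pruneCandidates(storage Storage, referenced map[string]bool) []string {
	var out []string
	for oid := range storage {
		if !referenced[oid] {
			out = append(out, oid)
		}
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	// repo-a and repo-b both set lfs.storage to the same directory.
	shared := Storage{"a1": true, "a2": true, "b1": true}
	repoARefs := map[string]bool{"a1": true, "a2": true}

	// Pruning from repo-a treats b1 as unreferenced and deletes it,
	// even though repo-b still needs it.
	for _, oid := range pruneCandidates(shared, repoARefs) {
		delete(shared, oid)
		fmt.Println("pruned", oid)
	}
}
```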
I hope to discuss 2 problems in this issue:
1. Documentation/Safety
Unless I missed something, the problem above is mentioned in the documentation in two places.
man git-lfs-prune:
DESCRIPTION
...
Note: you should not run git lfs prune if you have different repositories sharing the same custom storage directory; see git-lfs-config(1) for more details about
lfs.storage option.
and
man git-lfs-config:
LIST OF OPTIONS
...
Other settings
...
○ lfs.storage
Allow override LFS storage directory. Non-absolute path is relativized to inside of Git repository directory (usually .git).
Note: you should not run git lfs prune if you have different repositories sharing the same storage directory.
Default: lfs in Git repository directory (usually .git/lfs).
Both notes say essentially the same thing. In my opinion, they are inadequate, given that overlooking them will most likely lead to data loss.
- These should be WARNINGs, not Notes.
- The text should explicitly state that this will lead to data loss. "you should not run" is way too ambiguous.
- An extra warning should be added to the VERIFY REMOTE section of git-lfs-prune, explicitly stating that the checks/guarantees given by `--verify-remote` don't apply in this case.
- The description of VERIFY REMOTE should also mention that the guarantees given by `--verify-remote` only apply to files currently tracked by the current repository, and that orphaned or untracked LFS blobs will still always be deleted, even if they don't exist on the remote.
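A minimal sketch of the `--verify-remote` scoping described in the last bullet, as a model of the behaviour rather than git-lfs's implementation. `object` and `shouldDelete` are hypothetical names, and each object is assumed to already be a prune candidate (old enough and not checked out).

```go
package main

import "fmt"

// object is one local LFS object that prune has already selected as a
// candidate (i.e. not retained as recent or checked out).
type object struct {
	OID       string
	Reachable bool // referenced somewhere in this repository's history
	OnRemote  bool // a copy is known to exist on the remote
}

// shouldDelete models the described behaviour: verifyRemote only
// protects objects this repository can reach. Unreachable objects, which
// include blobs belonging to another repository sharing lfs.storage, are
// deleted without checking the remote.
func shouldDelete(o object, verifyRemote bool) bool {
	if !o.Reachable {
		return true // never verified, regardless of verifyRemote
	}
	if verifyRemote && !o.OnRemote {
		return false // kept: no confirmed copy on the remote
	}
	return true
}

func main() {
	objs := []object{
		{OID: "old-tracked", Reachable: true, OnRemote: false},
		{OID: "foreign", Reachable: false, OnRemote: false},
	}
	for _, o := range objs {
		fmt.Printf("%s: delete=%v\n", o.OID, shouldDelete(o, true))
	}
}
```

With verification enabled, the tracked object missing from the remote survives, while the foreign blob is deleted anyway, which is exactly the gap the documentation should call out.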
Additionally, it's very easy to forget about this problem or to accidentally run a prune command in the wrong repository. I think you should consider adding runtime safety nets to `git lfs prune` for when `lfs.storage` is set. At a minimum, if `lfs.storage` is set to an absolute path, print a WARNING to the tty and wait before deleting any blobs. Preferably, refuse to prune at all in that case unless a `--force` option (or something similar) is specified.
2. Better lfs.storage
Here is a relatively simple proposal for improving `lfs.storage`: introduce a new `lfs.storage_prefix` option. Setting it would be equivalent to setting `lfs.storage` to `{lfs.storage_prefix}/{unique_repository_identifier}`, where `unique_repository_identifier` is a string that is unique to each repository. For example, it could be:
- the absolute path to the repository with every `/` replaced with `%`
- the string stored in a `.git/lfs_uuid` file, which would be generated on the first LFS operation inside the repository
This would allow the user to specify a centralized location for their LFS storage without compromising the ability to prune it. Unfortunately, this solution doesn't allow blobs to be deduplicated across repositories, but that is much harder to implement without storing extra metadata for every blob in the storage.