Sharing lfs.storage locations between distinct repositories. #4530

@ruro

Description

See the discussion in issue #3635 for more context. But here is a short summary:

The lfs.storage config option lets the user override the directory used to store LFS blobs, for example to point several repositories at one shared global location.

lfs.storage is a way you can place the storage for your LFS data on a different disk or location

The current implementation of lfs.storage, however, has a serious drawback: it does not work correctly with git lfs prune. Pruning a repository that shares its lfs.storage with another repository results in data loss, because the pruning repository treats all "foreign" blobs as ordinary dangling blobs and deletes them.
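To make the failure mode concrete, here is a deliberately simplified toy model in Go. It is not git-lfs's actual prune logic; the function name `naivePrune` and the in-memory map standing in for the storage directory are illustrative assumptions. The point it shows is that prune only knows what the *current* repository needs, so anything else in a shared directory is indistinguishable from garbage.

```go
package main

import (
	"fmt"
	"sort"
)

// naivePrune is a toy model (not git-lfs code) of the failure described
// above. Prune only knows which object IDs the current repository still
// needs, so every other object in the storage directory, including objects
// that belong to other repositories, looks like a dangling blob.
func naivePrune(storage map[string]bool, neededByThisRepo map[string]bool) []string {
	var deleted []string
	for oid := range storage {
		if !neededByThisRepo[oid] {
			delete(storage, oid) // deleting during range is safe in Go
			deleted = append(deleted, oid)
		}
	}
	sort.Strings(deleted)
	return deleted
}

func main() {
	// One lfs.storage directory shared by repositories A and B.
	storage := map[string]bool{"oid-a1": true, "oid-a2": true, "oid-b1": true}
	neededByA := map[string]bool{"oid-a1": true, "oid-a2": true}

	// Pruning from repository A deletes B's object.
	fmt.Println(naivePrune(storage, neededByA)) // [oid-b1]
}
```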


I hope to discuss 2 problems in this issue:

1. Documentation/Safety

Unless I missed something, the problem above is mentioned in the documentation in two places.
man git-lfs-prune:

DESCRIPTION
       ...
       Note:  you should not run git lfs prune if you have different repositories sharing the same custom storage directory; see git-lfs-config(1) for more details about
       lfs.storage option.

and
man git-lfs-config:

LIST OF OPTIONS
   ...
   Other settings
       ...
       ○   lfs.storage
           Allow override LFS storage directory. Non-absolute path is relativized to inside of Git repository directory (usually .git).
           Note: you should not run git lfs prune if you have different repositories sharing the same storage directory.
           Default: lfs in Git repository directory (usually .git/lfs).

Both notes say essentially the same thing. In my opinion, they are inadequate, given that overlooking them will most likely lead to data loss.

  • These should be WARNINGs, not Notes.
  • The text should explicitly state that this will lead to data loss. "You should not run" is way too ambiguous.
  • An extra warning should be added to the VERIFY REMOTE section in git-lfs-prune, explicitly stating that the checks/guarantees given by --verify-remote don't apply in this case.
  • The description of VERIFY REMOTE should also mention that the guarantees given by --verify-remote apply only to files currently tracked by the current repository, and that orphaned or untracked LFS blobs will always be deleted, even if they do not exist on the remote.

Additionally, it's very easy to forget about this problem or to accidentally run a prune command in the wrong repository. I think you should consider adding runtime safety nets to git lfs prune for when lfs.storage is set. At the very least, it should print a WARNING to the tty and pause before deleting any blobs if lfs.storage is set to an absolute path. Preferably, it should refuse to prune at all in that case unless --force (or a similar option) is specified.

2. Better lfs.storage

Here is a relatively simple proposal for improving lfs.storage: introduce a new lfs.storage_prefix option. Setting it would be equivalent to setting lfs.storage to {lfs.storage_prefix}/{unique_repository_identifier}, where unique_repository_identifier is a string that is unique to each repository. For example, it could be:

  • an absolute path to the repo with all the / replaced with %
  • the string stored in a .git/lfs_uuid file, which would be generated on the first LFS operation inside the repository

This would let users specify a centralized location for their LFS storage without compromising the ability to prune it. Unfortunately, this solution does not deduplicate blobs across repositories; that is much harder to implement without storing extra metadata for every blob in the storage.
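To illustrate what "extra metadata for every blob" might look like, here is a toy sketch of one possible scheme: each object records the set of repositories that still claim it (essentially reference counting). This is an assumption for illustration only, not something git-lfs implements; `sharedStore`, `add` and `prune` are hypothetical names. A prune only releases the pruning repository's claims, and an object becomes deletable only once no repository claims it.

```go
package main

import (
	"fmt"
	"sort"
)

// sharedStore is a toy model of a deduplicated store that keeps, for each
// object ID, the set of repository identifiers that still reference it.
type sharedStore struct {
	refs map[string]map[string]bool // oid -> set of repo IDs
}

func newSharedStore() *sharedStore {
	return &sharedStore{refs: make(map[string]map[string]bool)}
}

// add records that repo references oid (the blob itself is stored once).
func (s *sharedStore) add(repo, oid string) {
	if s.refs[oid] == nil {
		s.refs[oid] = make(map[string]bool)
	}
	s.refs[oid][repo] = true
}

// prune drops repo's claim on every object it no longer keeps and returns
// the objects whose reference sets became empty, i.e. the only ones that
// are actually safe to delete from disk.
func (s *sharedStore) prune(repo string, keep map[string]bool) []string {
	var deletable []string
	for oid, owners := range s.refs {
		if !owners[repo] || keep[oid] {
			continue
		}
		delete(owners, repo)
		if len(owners) == 0 {
			delete(s.refs, oid)
			deletable = append(deletable, oid)
		}
	}
	sort.Strings(deletable)
	return deletable
}

func main() {
	s := newSharedStore()
	s.add("repoA", "oid-shared")
	s.add("repoB", "oid-shared")
	s.add("repoA", "oid-old")

	// repoA no longer needs either object, but repoB still claims oid-shared.
	fmt.Println(s.prune("repoA", map[string]bool{})) // [oid-old]
}
```

A real version would have to persist this metadata alongside the blobs, update it safely when several repositories run concurrently, and deal with stale claims left behind by repositories that were deleted without pruning, which is why it is much harder than the per-repository prefix above.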
