shared_cache.md

Shared LFS object cache proposal

Problem

Git LFS stores all objects in a per-repository directory (.git/lfs/objects/). The lfs.storage setting allows multiple repositories to share a single storage directory, but this creates a safety problem: git lfs prune cannot determine which repositories depend on which objects, so a prune run from one repository can delete objects that another repository has not yet pushed to a remote.

For users with many clones of the same or related repositories (e.g., CI workers, developers working on forks), significant disk space is wasted storing duplicate copies of the same LFS objects.

This is a long-standing request in the git-lfs project. Related issues include:

- #4530: Sharing lfs.storage locations between distinct repositories. Proposes safeguards or per-repo subdirectories under a shared root, but does not separate downloaded objects from locally-created ones.
- #1875: Common object store. Requests shared local object storage across repos.
- #2147: Disable local LFS cache. Requests avoiding duplication of large objects.

The existing lfs.storage option conflates the cache with the repo's storage, with no concept of a safe-to-discard, read-through cache layer. The git lfs prune command exists to evict objects that have already been pushed to the remote server, but it is not run automatically, and it is unsafe when multiple repositories share the same storage. This proposal addresses that gap.

Design

The solution separates LFS objects into two tiers:

- Per-repository storage (.git/lfs/objects/): Objects created locally through git add. These may not yet be on any remote server and are not safe to remove until pushed.
- Shared cache (lfs.cachedir): Objects downloaded from remote servers. Every object in this directory came from a remote and can be re-downloaded, making it always safe to evict.

Object lifecycle

Locally created objects (git add / clean filter):

working tree -> clean filter -> per-repo .git/lfs/objects/

Downloaded objects (checkout / fetch / pull):

remote server -> shared cache lfs.cachedir/objects/

After push (when lfs.cachetransferonpush is enabled):

per-repo .git/lfs/objects/ -> push to remote -> move to shared cache

Object lookup

When Git LFS needs to read an object (smudge, upload, etc.), it checks:

- Per-repository storage first
- Shared cache second
- Remote server (download) if not found locally

This priority ensures that locally-created objects always take precedence.
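The lookup order above can be sketched as follows. This is a minimal Python illustration, not the git-lfs implementation (which is written in Go): the function names, the `download` callback, and the choice to refresh the mtime on a cache hit inside the lookup are assumptions made for the example.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def object_path(root: Path, oid: str) -> Path:
    """Path of an object under <root>/objects/ab/cd/<oid>."""
    return root / "objects" / oid[:2] / oid[2:4] / oid


def find_object(
    oid: str,
    repo_lfs_dir: Path,
    cache_dir: Optional[Path],
    download: Callable[[str], Path],
) -> Path:
    """Resolve an object: per-repo storage, then shared cache, then remote."""
    # 1. Per-repository storage always wins, so locally created objects
    #    take precedence over anything in the cache.
    local = object_path(repo_lfs_dir, oid)
    if local.is_file():
        return local

    # 2. Shared cache, only when one is configured.
    if cache_dir is not None:
        cached = object_path(cache_dir, oid)
        if cached.is_file():
            # Assumption: a cache hit refreshes the mtime so that
            # mtime-based eviction treats the object as recently used.
            os.utime(cached)
            return cached

    # 3. Not available locally: fall back to the (hypothetical) downloader.
    return download(oid)
```

Passing `cache_dir=None` models the case where lfs.cachedir is not set: the lookup then degrades to per-repo storage followed by a download.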

Cache safety

The shared cache is always safe to prune or evict entirely because:

- Every object in the cache was either downloaded from a remote server or moved there after a successful push.
- Git LFS already handles missing objects gracefully:
  - git status and git diff do not read LFS object files (they work with pointer blobs in the git object database).
  - git add reads from the working tree, not LFS storage.
  - The smudge filter writes the pointer back when an object is missing and reports "content not local, use fetch to download."
  - Files already checked out in the working tree are unaffected by cache eviction.

The only consequence of cache eviction is that checkout or restore operations may need to re-download objects from the remote. This is the same behavior as checking out an old reference whose LFS objects have not yet been downloaded.

Per-repository prune

git lfs prune operates only on per-repository storage and does not touch the shared cache. This is correct because:

- The per-repo prune logic (retain unpushed, recent refs, worktrees, stash) only applies to objects that might not be on a remote yet.
- Cache objects are, by definition, already on a remote.

Interaction with existing features

lfs.storage: The shared cache is independent of lfs.storage. When both are set, lfs.storage controls where locally-created objects go, and lfs.cachedir controls where downloads go.

Reference directories (GIT_ALTERNATE_OBJECT_DIRECTORIES, .git/objects/info/alternates): The existing LinkOrCopyFromReference mechanism copies objects from alternates into per-repo storage. This is correct for git clone --reference where the referenced repo may contain unpushed objects. The shared cache is a separate, complementary mechanism that reads directly from the cache without copying.

Automatic eviction

Note: this section describes a possible eviction implementation. The proposal can first be discussed on its merits and on the features it provides to users, even if the actual internal implementation may follow a different path later.

Credits: this high-level eviction strategy takes inspiration from ccache's cache management approach, but the proposed implementation is original: it relies on a single memory-mapped file with atomic counters, instead of multiple files with locking operations.

By default, the shared cache grows without limit. When lfs.cachemaxsize or lfs.cachemaxfiles is set, the cache uses approximate LRU (Least Recently Used) eviction via the cacheevict module:

- A persistent .cache-sizes file in the cache objects directory tracks total size, file count, and per-partition counters using mmap'd atomic operations.
- After each download completes, eviction is triggered if the limits are exceeded. The tracking hooks are centralized in the common transfer adapter base layer, so all download methods (basic HTTP, SSH, custom transfer agents) are covered automatically. Objects moved to the cache by other paths (post-push via lfs.cachetransferonpush, or git lfs prune --move-to-cache) are also tracked.
- An eviction pass selects the partition with the most files and deletes its oldest files (by mtime) until the partition is within budget. Only one partition is processed per automatic eviction pass. With the default of 256 partitions, each pass lists and processes roughly 1/256 of the objects in the cache.
- Hit/miss statistics are tracked: cache hits (objects read from cache) and misses (objects downloaded) are visible via git lfs env.
- Reading a cached object updates its mtime, so frequently used objects are less likely to be evicted.
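A single eviction pass, as described above, might look like the following Python sketch. Several things here are assumptions rather than part of the proposal: the counters live in a plain dict instead of the mmap'd .cache-sizes file, a partition is taken to be a first-level directory under objects/, and the per-partition budget is passed in as `max_files_per_partition` because the text does not say how it is derived from the configured limits.

```python
import os
from pathlib import Path
from typing import Dict, List, Tuple


def evict_one_partition(
    objects_dir: Path,
    file_counts: Dict[str, int],
    max_files_per_partition: int,
) -> List[Path]:
    """Trim the fullest partition down to its budget, oldest files first."""
    if not file_counts:
        return []

    # Pick the partition whose counter reports the most files.
    partition = max(file_counts, key=file_counts.get)

    # List only that partition; the rest of the cache is never scanned.
    files: List[Tuple[float, Path]] = []
    for dirpath, _dirs, names in os.walk(objects_dir / partition):
        for name in names:
            path = Path(dirpath) / name
            try:
                files.append((path.stat().st_mtime, path))
            except FileNotFoundError:
                continue  # removed concurrently by another process

    files.sort(key=lambda item: item[0])  # oldest mtime first
    excess = max(len(files) - max_files_per_partition, 0)

    removed: List[Path] = []
    for _mtime, path in files[:excess]:
        try:
            path.unlink()
        except FileNotFoundError:
            continue
        removed.append(path)

    # Resynchronize the counter with what was actually observed on disk.
    file_counts[partition] = len(files) - excess
    return removed
```

Because the oldest files are chosen by mtime within one partition only, the result is approximate LRU: an older object in a different partition can survive a pass that deletes a newer one here.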

The eviction handler is cross-process safe: multiple git-lfs processes sharing the same cache coordinate via atomic operations on the mmap'd file, with a non-blocking eviction lock that detects dead processes.

The cacheevict module is only activated when at least one limit is configured. When neither lfs.cachemaxsize nor lfs.cachemaxfiles is set (the default), no mmap file is created and the shared cache operates without any eviction or statistics tracking.

Note: the mmap'd coordination file is designed for single-machine use. A cache directory on a network filesystem (NFS, CIFS) works fine when accessed from a single machine, but the eviction limits must not be set if the same cache directory is shared between multiple machines, as cross-machine mmap coherency is not guaranteed. In that case, cache eviction should be handled externally, or by running git lfs cache trim periodically.

Concurrent access

Multiple processes may read from and write to the shared cache simultaneously. Safety is ensured by:

- Downloads are written to a temporary file in the cache's tmp/ directory, then atomically renamed to the final location. If two processes download the same object, the second rename either overwrites the existing file with identical content or, on platforms where rename does not overwrite, is treated as a success because the file already exists.
- The cache temp directory is on the same filesystem as the cache objects directory, ensuring os.Rename is atomic.
- The eviction handler uses lock-free atomic counters for size and file tracking, and a non-blocking CAS lock for eviction coordination.
- Reads are naturally safe since LFS objects are immutable.
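The write-then-rename pattern from the first two points can be sketched in Python as below. The function name and the `chunks` parameter are hypothetical, and `os.replace` stands in for the Go `os.Rename` call mentioned above. Content verification against the OID is left out to keep the example short.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def store_download(cache_dir: Path, oid: str, chunks: Iterable[bytes]) -> Path:
    """Write a downloaded object into the cache via tmp/ plus atomic rename."""
    final = cache_dir / "objects" / oid[:2] / oid[2:4] / oid
    tmp_dir = cache_dir / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    final.parent.mkdir(parents=True, exist_ok=True)

    # The temp file lives under the cache root, so it shares a filesystem
    # with objects/ and the rename below does not degrade to a copy.
    fd, tmp_name = tempfile.mkstemp(prefix=oid + "-", dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        # Readers see either no file or the complete file, never a partial
        # one. A concurrent writer of the same OID just replaces identical
        # content.
        os.replace(tmp_name, final)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return final
```

Calling `store_download` twice for the same OID leaves a single complete file in place and no leftovers in tmp/, which is the property the concurrent-download case relies on.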

Configuration

`lfs.cachedir`

Absolute path to a shared cache directory for downloaded LFS objects. Multiple repositories may share the same cache directory. When not set, all objects are stored in per-repository storage as before.

The cache uses the same directory layout as per-repo storage:

lfs.cachedir/
  objects/
    ab/
      cd/
        abcd1234...  (full OID)
  tmp/               (download temp files)
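As an illustration of the layout above, the following Python sketch maps an OID to its location in the cache. The helper name is hypothetical. The validation assumes OIDs are 64-character lowercase hexadecimal SHA-256 digests, as in Git LFS pointer files.

```python
from pathlib import Path


def cache_object_path(cache_dir: Path, oid: str) -> Path:
    """Return <cache_dir>/objects/<oid[0:2]>/<oid[2:4]>/<oid>."""
    if len(oid) != 64 or any(c not in "0123456789abcdef" for c in oid):
        raise ValueError(f"not a SHA-256 OID: {oid!r}")
    # The first two hex pairs of the OID become the two directory levels.
    return cache_dir / "objects" / oid[:2] / oid[2:4] / oid
```

For an OID beginning with `abcd`, the object therefore lands in `objects/ab/cd/` under the configured lfs.cachedir.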

`lfs.cachetransferonpush`

Boolean, default false. When true and lfs.cachedir is set, objects that are successfully pushed to a remote are moved from per-repo storage into the shared cache.

This is useful for CI workers that continuously push data and would otherwise accumulate unbounded per-repo storage.

The default is false to preserve git's inherent distributed-backup property: even if the remote git server is lost, every LFS file was uploaded by at least one developer on the team, and that developer's copy stays in per-repo storage until they explicitly opt in to removing it (by running git lfs prune, or by setting this value to true to move objects on every push).

`lfs.cachemaxsize` and `lfs.cachemaxfiles`

See the "Automatic eviction" section.

Migrating existing repositories

To move existing LFS objects from per-repo storage into the shared cache, use git lfs prune --move-to-cache. This reuses the standard prune retain logic: objects that are referenced by current/recent/unpushed refs are kept in per-repo storage, while everything else is moved to the cache.

To move all pushed objects (leaving only unpushed/stashed/index objects in per-repo storage):

git lfs prune --move-to-cache --force

Use --dry-run to preview what would be moved without making changes.
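The split between objects that stay and objects that move can be expressed as simple set arithmetic. The Python sketch below is only a model of the behavior described in this section: the function name is hypothetical, and the `unpushed` and `recent` sets are assumed to come from the existing prune retain logic (with `unpushed` standing for unpushed, stashed, and index objects).

```python
from typing import Iterable, Set, Tuple


def plan_move_to_cache(
    local: Iterable[str],
    unpushed: Iterable[str],
    recent: Iterable[str],
    force: bool = False,
) -> Tuple[Set[str], Set[str]]:
    """Return (keep, move): OIDs to keep per-repo and OIDs to move to the cache."""
    # Unpushed objects are never moved: they may not exist on any remote.
    retained = set(unpushed)
    if not force:
        # Without --force, objects referenced by current or recent refs
        # also stay in per-repo storage.
        retained |= set(recent)
    local_set = set(local)
    return local_set & retained, local_set - retained
```

A --dry-run would correspond to computing and printing the `move` set without touching any files.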

Cache management

The git lfs cache command provides direct cache management:

- git lfs cache stats: show cache size, file count, and hit/miss statistics
- git lfs cache clear: remove all cached files
- git lfs cache trim: remove oldest files by size, count, or age limits

These commands work by scanning the directory tree and do not require the mmap-based eviction handler. This makes them suitable for caches on network filesystems or in multi-machine environments where automatic eviction cannot be enabled.

When no flags are passed, trim falls back to the configured lfs.cachemaxsize/lfs.cachemaxfiles limits, making it suitable as a periodic cron job for full-rescan trimming.
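A full-rescan trim of the kind described above could work roughly as in this Python sketch. It is a model under stated assumptions, not the git lfs cache trim implementation: the function and parameter names are hypothetical, and dot-files are skipped on the assumption that bookkeeping files such as .cache-sizes live alongside the objects.

```python
import os
import time
from pathlib import Path
from typing import List, Optional, Tuple


def trim_cache(
    objects_dir: Path,
    max_size: Optional[int] = None,
    max_files: Optional[int] = None,
    max_age_seconds: Optional[float] = None,
    now: Optional[float] = None,
) -> List[Path]:
    """Scan the whole cache and delete oldest files until all limits hold."""
    now = time.time() if now is None else now

    entries: List[Tuple[float, int, Path]] = []
    for dirpath, _dirs, names in os.walk(objects_dir):
        for name in names:
            if name.startswith("."):
                continue  # assumed bookkeeping file, not an object
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except FileNotFoundError:
                continue
            entries.append((st.st_mtime, st.st_size, path))

    entries.sort(key=lambda e: e[0])  # oldest mtime first
    total_size = sum(e[1] for e in entries)
    total_files = len(entries)

    removed: List[Path] = []
    for mtime, size, path in entries:
        too_old = max_age_seconds is not None and now - mtime > max_age_seconds
        too_big = max_size is not None and total_size > max_size
        too_many = max_files is not None and total_files > max_files
        if not (too_old or too_big or too_many):
            break  # entries are sorted, so nothing later can violate a limit
        try:
            path.unlink()
            removed.append(path)
        except FileNotFoundError:
            pass  # already gone; still adjust the running totals
        total_size -= size
        total_files -= 1
    return removed
```

Since this relies only on directory listings and file metadata, it needs no shared in-memory state, which is why a scan-based trim remains usable when several machines share the cache directory.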

Local clones

When lfs.cachedir is configured per-repo (not globally), a local clone (git clone --local, --shared, or --reference) will not inherit the setting. Downloaded objects that only exist in the source repo's cache will not be found by the new clone, and checkout will skip those files (writing pointers instead, as with any missing LFS object).

Setting lfs.cachedir in the global git config (~/.gitconfig) avoids this entirely. For per-repo configurations, two optional mechanisms are proposed (implemented in separate commits so that either can be included or reverted independently):

Option A: Alternate discovery

Git creates an alternates file pointing to the source repo's objects directory. Git LFS can read the source repo's lfs.cachedir config and add its objects directory to the reference dirs, allowing LinkOrCopyFromReference to find and copy cached objects into the new clone's per-repo storage.

This is non-invasive (no config changes to the new clone) but only works while the alternates file exists, and copies objects into per-repo storage rather than reading directly from the cache.

Option B: Config propagation

On the first git-lfs command in the new clone, if lfs.cachedir is not already configured, git-lfs reads the source repo's setting from the alternates and sets it in the new clone's local git config. This gives the new clone full shared cache support permanently.

This modifies the new clone's .git/config automatically, which may be unexpected in some workflows. It is idempotent: it only sets the config if not already configured at any level.

Experimental implementation

Support for the shared cache feature as described in this proposal is implemented in an experimental git-lfs version for testing and is pending submission upstream after this proposal is approved.

It is available at:

https://github.com/InSimo/git-lfs/tree/insimo

It includes more changes than necessary for this proposal, and the quality of the implementation is not adequate for submission as-is. The code changes, tests, and documentation were created with the assistance of AI agents. The design, implementation, and tests were directed by human developers, who have so far reviewed them only partially. The implementation should be considered experimental only, and may be modified or rewritten based on feedback received on the proposal.
