'delay' filter-process capability #2466

@ttaylorr

Goal

Improve the speed of Git LFS checkouts in which most objects are not yet
cached.

Background

Late last year, we added support for the long-running filter process introduced
in Git v2.11.0 as a means to convert files during check-in and checkout. This
greatly improved the performance of both operations, most notably on Windows,
where spawning git-lfs-clean(1) or git-lfs-smudge(1) once for each file in the
operation was expensive. (See: #1329 (comment).)

We noted, however, that this protocol (like the separate 'smudge' and 'clean'
filters) still leaves room to improve performance when objects are not yet
cached. Consider the following:

  1. Git does a checkout operation after a fetch or clone on a repository with
     many LFS objects.
  2. Git asks LFS for the first remaining object in the checkout.
  3. LFS sees that the object is not in the cache, since this is a clone and no
     objects are cached yet.
  4. LFS downloads the file in its entirety, and then responds to Git.
  5. If the checkout is done, go to step 6; otherwise, go to step 2.
  6. Done.
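To make the bottleneck concrete, here is a minimal Go sketch of steps 2-6 (Go because Git LFS is written in it). The names `object` and `checkoutSerial`, and the `fetch` callback, are hypothetical stand-ins rather than Git LFS APIs; `fetch` takes the place of a real network transfer.

```go
package main

import "fmt"

// object is a hypothetical stand-in for one LFS pointer that Git asks the
// filter to smudge during checkout.
type object struct {
	Path, OID string
}

// checkoutSerial models steps 2-6. fetch is a hypothetical download function;
// each call must return before the next object can be handled, so total time
// is roughly the sum of every download.
func checkoutSerial(objs []object, cache map[string][]byte, fetch func(oid string) []byte) (map[string][]byte, int) {
	worktree := make(map[string][]byte)
	downloads := 0
	for _, o := range objs { // step 2: Git asks for the next object
		data, ok := cache[o.OID]
		if !ok { // step 3: cache miss, the usual case right after a clone
			data = fetch(o.OID) // step 4: block until the whole file arrives
			cache[o.OID] = data
			downloads++
		}
		worktree[o.Path] = data // respond to Git, then loop (step 5)
	}
	return worktree, downloads // step 6: done
}

func main() {
	objs := []object{{"a.psd", "oid1"}, {"b.psd", "oid2"}, {"c.psd", "oid1"}}
	fetch := func(oid string) []byte { return []byte("contents of " + oid) }
	wt, n := checkoutSerial(objs, map[string][]byte{}, fetch)
	fmt.Println(len(wt), n) // prints "3 2": c.psd reuses oid1 from the cache
}
```

Because `fetch` must return before the loop can answer Git and receive the next request, wall-clock time grows with the sum of all download times, regardless of how much bandwidth is available.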

In the above, LFS must download each large file in full, one at a time, before
it can move on to the next. While thinking about ways to improve this, I
summarized some discussion that @larsxschneider and I had been having in
#1632 (comment):

@larsxschneider and I have already begun talking about adding a new capability
to the protocol to allow the git-filter-server (LFS) to "promise" that it
will convert a file and yield the path on disk where it will dump the result
of that conversion. [...]

This will allow us to process many transfers concurrently, taking full
advantage of the proposed new transfer queue design, and yielding even greater
speeds for the new filter protocol implementation. Perhaps more importantly,
it will allow us to do normal Git operations (git clone, git push,
git pull, etc.) without needing to temporarily disable the smudge filter
to parallelize the transfers. [...]
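A rough sketch of how such a promise could enable concurrent transfers, assuming the shape of the 'delay' protocol as it appears in gitattributes(5) in released Git versions: the filter answers an uncached smudge request with status "delayed", Git later asks which delayed paths are ready (list_available_blobs), and then re-requests each ready path with empty content. `delayFilter`, `Smudge`, `ListAvailable`, and the `maxTransfers` bound are hypothetical names for illustration; a real implementation would feed LFS's transfer queue rather than start one goroutine per object.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// delayFilter is a hypothetical sketch: uncached smudge requests are answered
// with "delayed" and downloaded in the background, a bounded number at a time.
type delayFilter struct {
	mu      sync.Mutex        // guards cache, which background downloads write
	cache   map[string][]byte // OID -> downloaded contents
	oids    map[string]string // delayed path -> OID, for Git's re-request
	pending int               // delayed paths not yet reported as available
	done    chan string       // paths whose download has finished
	slots   chan struct{}     // semaphore bounding concurrent transfers
	fetch   func(oid string) []byte
}

func newDelayFilter(fetch func(string) []byte, maxTransfers int) *delayFilter {
	return &delayFilter{
		cache: make(map[string][]byte),
		oids:  make(map[string]string),
		done:  make(chan string),
		slots: make(chan struct{}, maxTransfers),
		fetch: fetch,
	}
}

// Smudge answers one smudge request. Git re-requests a delayed path with empty
// content, so oid is "" then and is looked up by path.
func (f *delayFilter) Smudge(path, oid string, canDelay bool) (string, []byte) {
	if oid == "" {
		oid = f.oids[path]
	}
	f.mu.Lock()
	data, cached := f.cache[oid]
	f.mu.Unlock()
	if cached {
		return "success", data
	}
	if !canDelay { // Git will not wait for this path: download synchronously
		data = f.fetch(oid)
		f.mu.Lock()
		f.cache[oid] = data
		f.mu.Unlock()
		return "success", data
	}
	f.oids[path] = oid
	f.pending++
	go func() {
		f.slots <- struct{}{} // wait for a free transfer slot
		data := f.fetch(oid)
		<-f.slots
		f.mu.Lock()
		f.cache[oid] = data
		f.mu.Unlock()
		f.done <- path // hand the finished path to ListAvailable
	}()
	return "delayed", nil // the "promise": Git will ask for this path again
}

// ListAvailable blocks until at least one delayed path is ready and returns all
// paths ready so far. An empty result means nothing is left to wait for. Like
// Git's request loop, Smudge and ListAvailable run on a single goroutine.
func (f *delayFilter) ListAvailable() []string {
	if f.pending == 0 {
		return nil
	}
	paths := []string{<-f.done}
drain:
	for {
		select {
		case p := <-f.done:
			paths = append(paths, p)
		default:
			break drain
		}
	}
	f.pending -= len(paths)
	return paths
}

func main() {
	f := newDelayFilter(func(oid string) []byte { return []byte("contents of " + oid) }, 2)
	for _, p := range []string{"a.bin", "b.bin", "c.bin"} {
		status, _ := f.Smudge(p, "oid-"+p, true)
		fmt.Println(p, status) // "delayed" for each path
	}
	var ready []string
	for batch := f.ListAvailable(); len(batch) > 0; batch = f.ListAvailable() {
		ready = append(ready, batch...)
	}
	sort.Strings(ready)
	for _, p := range ready {
		_, data := f.Smudge(p, "", false) // Git's follow-up request
		fmt.Println(p, string(data))
	}
}
```

The buffered `slots` channel acts as a semaphore, so at most `maxTransfers` downloads run at once while Git keeps sending requests; that overlap is exactly what the serial loop cannot achieve.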

Propositions

Lars has already implemented a spike of what support for this would look like
in #1646.

Since Lars' patch to Git (see: git/git@487fe1f) is on 'next' and 'pu', I propose
that we build on his spike above and add support for the 'delay' capability to Git LFS.

Technical Overview

See the documentation in gitattributes.txt.
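The filter-process protocol exchanges pkt-lines: four hex digits giving the packet length (counting the four length bytes themselves), followed by the payload, with "0000" serving as a flush packet. The Go sketch below encodes the filter's two 'delay'-specific replies as shown in the gitattributes documentation of released Git versions; `pktLine`, `delayedResponse`, and `availableBlobsResponse` are hypothetical helper names, and the version of the patch on 'next' may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// flushPkt is the special "0000" pkt-line that ends a list or section.
const flushPkt = "0000"

// pktLine encodes one pkt-line: four lowercase hex digits giving the total
// length (including those four bytes), then the payload. Text payloads end
// in "\n". Payloads above 65516 bytes would have to be split; this sketch
// only encodes short text lines.
func pktLine(payload string) string {
	return fmt.Sprintf("%04x%s", len(payload)+4, payload)
}

// delayedResponse is a hypothetical helper that builds the filter's reply to a
// smudge request (sent with "can-delay=1") that it chooses to delay.
func delayedResponse() string {
	return pktLine("status=delayed\n") + flushPkt
}

// availableBlobsResponse is a hypothetical helper that builds the reply to
// "command=list_available_blobs": the ready pathnames, a flush, then a
// "status=success" section. An empty list tells Git to stop asking.
func availableBlobsResponse(paths []string) string {
	var b strings.Builder
	for _, p := range paths {
		b.WriteString(pktLine("pathname=" + p + "\n"))
	}
	b.WriteString(flushPkt)
	b.WriteString(pktLine("status=success\n"))
	b.WriteString(flushPkt)
	return b.String()
}

func main() {
	fmt.Printf("%q\n", delayedResponse()) // prints "0013status=delayed\n0000"
	fmt.Printf("%q\n", availableBlobsResponse([]string{"path/testfile.dat"}))
}
```

Keeping the encoding in one small helper means the delayed and list replies cannot disagree about length prefixes or flush placement.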

Non-goals

This would allow us to remove two helper commands:

  • git lfs clone
  • git lfs fetch

However, I don't think that we can do this until the next MAJOR semantic
version, which would be v3.0.0. At that time, or before, I'd like to discuss
the feasibility of removing these commands.


/cc @git-lfs/core
/cc @larsxschneider
