Skip to content

commands: teach 'delay' capability to git-lfs-filter-process(1)#2511

Merged
ttaylorr merged 20 commits intomasterfrom
delay
Aug 24, 2017
Merged

commands: teach 'delay' capability to git-lfs-filter-process(1)#2511
ttaylorr merged 20 commits intomasterfrom
delay

Conversation

@ttaylorr
Copy link
Contributor

@ttaylorr ttaylorr commented Aug 15, 2017

This pull requests teaches the git-lfs-filter-process(1) command the 'delay' capability that was proposed by @larsxschneider in #1632 (comment) and added to Git in git/git@51b8aec.

This implementation allows the process filter to respond with a status=delayed when asked to smudge the contents of a file that it does not already have in the local .git/lfs/objects cache.

If any items were delayed, Git will later ask for blobs that were available with the list_available_blobs command, which Git LFS will either respond with the immediately available blobs, wait until at least one becomes available, or respond with an empty set indicating that the transfer is complete.

To implement this, I added a new delayedSmudge helper function, which behaves the same way as smudge(), but instead adds the pointer to *tq.TransferQueue given as an argument.

Here are some benchmarks when running git clone on GitHub.com and a local copy of GitHub:

lfs.concurrenttransfers=1 ...=3 ...=8 ...=16 ...=32
v2.13.2
github.com 29.2297 - - - -
github.dev 18.4657 - - - -
v2.14.1.next
github.com 45.6183 28.6867 14.9877 13.8893 13.7767
github.dev 9.02767 7.41867 7.70833 7.33933 7.71067

For a delta of:

lfs.concurrenttransfers=1 ...=3 ...=8 ...=16 ...=32
github.com -43.3866 0.543 14.242 15.3404 15.453
github.dev 9.43803 11.04703 10.75737 11.12637 10.75503

Excluding outliers:

v2.13.2 v2.14.1.next delta delta %
github.com 29.2297 17.8351 11.4s faster 164% faster
github.dev 18.4657 7.840934 10.6s faster 236% faster

There were three big performance changes that impact this improvement:

  1. tq: make Add() accept new items during batch processing #2483: Infinitely buffering the tq.(*TransferQueue).Add() function, which allows the TransferQueue to accept new items sent by Git, instead of waiting until the first batch of items were transferred.
  2. tq: increase default lfs.concurrenttransfers to 8 #2506: Increasing the default number of concurrently running transfers from 3 to 8 allows LFS to take advantage of more network bandwidth and process items faster.
  3. Not blocking on list_available_blobs until tq.Wait() returns. Instead, read from tq.Watch() until a read becomes blocking (via select { case <-tq.Wait(): ... default: }. This allows Git to perform the checkout while the tq.(*TransferQueue) is still transferring items.

Following this, I'd like to deprecate the git lfs clone command, since git clone is now as fast or faster than git lfs clone as of the latest tip of git.git.

EDIT: the each number is the average of 5 runs of a given command, with the outliers removed. The benchmark command is written as: benchmark.sh. The average-timings command is written as: average-timings.sh.

Closes: #2466.


/cc @git-lfs/core
/cc @larsxschneider

@ttaylorr ttaylorr added this to the v2.3.0 milestone Aug 15, 2017
Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. I pointed out a few things below:

status = delayedStausFromErr(err)
} else {
status = statusFromErr(err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this nested if is complicated. Would something like this work? It looks like calling Flush() on nil is a no-op any way.

if delayed {
  s.WriteStatus(delayedStausFromErr(err))
} else if ferr := w.Flush(); ferr != nil {
  s.WriteStatus(statusFromErr(ferr))
} else {
  s.WriteStatus(statusFromErr(err))
}

Continuing to declare status outside of that scope is fine too, but it didn't look like it was needed anywhere else.

BTW delayedStausFromErr has a typo :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Way better -- thanks for this suggestion. Changed things up in 1001e1c (and I got that typo!)

//
// delayedSmudge returns the number of bytes written, whether the checkout was
// delayed, the *lfs.Pointer that was smudged, and an error, if one occurred.
func delayedSmudge(s *git.FilterProcessScanner, to io.Writer, from io.Reader, tq *tq.TransferQueue, filename string, skip bool, filter *filepathfilter.Filter) (int64, bool, *lfs.Pointer, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tq parameter clobbers the tq package name. Not critical of course, but may be worth changing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call -- I think we should be fine for now, since we don't use any other exported members from the tq package, but this will be a good way to prevent an ambiguous diff later on if that changes. Covered this one in: a9942cd.

}
}
case "list_available_blobs":
closeOnce.Do(func() { go q.Wait() })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could use some clarification here. It's my understanding that a transfer queue can accept new transfers until Wait() is called. If I'm interpreting the protocol docs correctly, then the process should look like this:

  1. Git sends all pointers through smudge with can-delay=1
  2. Git sends list_available_blobs. The transfer queue is closed.
  3. Git sends more smudge commands for each available blob, never calling list_available_blobs again.
  4. Profit!!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 not quite -- I think the confusion has to do with the readAvailable() func. readAvailable does as many reads as possible before <-tq.Watch() becomes blocking, responding to a list_available_blobs command with any blobs that it currently has ready, or blocking until at least one blob is ready.

The process looks like this:

  1. Git sends all pointers through command=smudge with can-delay=1.
  2. Git issues a list_available_blobs command, indicating that no more new blobs will be sent as above. The transfer queue is accordingly closed.
  3. We read as many currently available blobs as we can before blocking, or block until a) at least one blob becomes available, or b) the transfer queue has no more blobs to send, in which case we send an empty list, and the process is complete.
  4. Git send a command=smudge with can-delay=0 for each blob in the list above.
  5. Git issues another list_available_blobs command.
  6. See step 3.

I did my best to clarify this in 8816463.

if !ok {
return ts
}
return append(ts, t)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this part need to exist? If ts is empty, the for should start another iteration that'll attempt to read from the channel. Very possible there's some detail about select that I'm forgetting...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, though the duplication is unfortunate. Specifically, this return append, could be turned into an append only, but the blocking <-ch read causes the enclosing for loop to not spin.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jeez, the return makes all the difference in the world. Thanks.


if [ ! -d .git/info ]; then
mkdir .git/info
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I think you could replace this with mkdir -p .git/info.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call -- I got this one in: bc376b8.

Copy link
Contributor

@technoweenie technoweenie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really excited to see this land!

@larsxschneider
Copy link
Member

As suggested by @ttaylorr, I added a progress indicator for filtering content in Git and posted the patch to the Git mailing list.

If you clone a repo it would look like this:

$ git clone https://server/some/repo.git
Cloning into 'repo'...
remote: Counting objects: 6077, done.
remote: Total 6077 (delta 0), reused 0 (delta 0), pack-reused 6077
Receiving objects: 100% (6077/6077), 6.00 MiB | 1.24 MiB/s, done.
Resolving deltas: 100% (4177/4177), done.
Filtering content: 100% (62/62), 7.63 MiB | 304.00 KiB/s, done.

If you checkout a commit that references many LFS files that you don't have locally on your machine yet, then you would see something like this:

$ git checkout mybranch
Filtering content: 100% (62/62), 7.63 MiB | 312.00 KiB/s, done.
Switched to branch 'mybranch'

A nice side effect is that users can easily see the bandwidth performance of their LFS servers 🙂

@larsxschneider
Copy link
Member

There might be a bug in the current (7115b57) delay implementation. Try these steps:

(1) Checkout a branch of a repo with files in LFS
(2) Delete all files in that repo except for the .git folder
(3) Run git checkout -f HEAD

The checkout operation should be blazingly fast as all objects should be available locally already. However, it looks like as if Git LFS downloads all objects, again.

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Aug 20, 2017

The checkout operation should be blazingly fast as all objects should be available locally already. However, it looks like as if Git LFS downloads all objects, again.

@larsxschneider great catch, this was definitely a bug. With c04745c:

~/D/git-lfs-test (master) $ find .git/lfs/objects -type f | wc -l
     187

~/D/git-lfs-test (master) $ rm -rf ./000 ./001 ./002 ./003 ./004 ./005 ./006 ./007 ./008 ./009 ./010 ./script

~/D/git-lfs-test (master!) $ time git checkout -f HEAD
Your branch is up-to-date with 'origin/master'.
git checkout -f HEAD  0.05s user 0.07s system 84% cpu 0.146 total

EDIT: the above commit had a bug where it would not smudge pointers for which the local object was cached. I fixed that in: 50ecd67.

}

// Write 'statusFromErr(nil)', even though 'perr != nil', since
// we are about to write non-delayed smudged contents to "to".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix seems to work! Thank you 👍! However, I still try to decipher the comment 😄. perr should be nil here, no? TBH I don't really understand the same comment a few lines above either. Can you give me a hint?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plus, could it be that this was the reason for the degraded performance in the lfs.concurrenttransfers=1 case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good question -- I think the comment is out of date but the code is correct here: this case occurs when we have the contents locally and can smudge without delay. It is certainly possible that this could affect cases where lfs.concurrenttransfers=1, but I think this would be negligible. I fixed this up in 4cefa24.

@larsxschneider
Copy link
Member

Cloning repositories via SSH with a large number of LFS files seems to consistently result in a "batch response: Bad credentials" error:

$ git clone git@github.com:larsxschneider/lfstest-manyfiles.git testrepo
Cloning into 'testrepo'...
warning: templates not found /Users/lars/share/git-core/templates
remote: Counting objects: 15012, done.
remote: Total 15012 (delta 0), reused 0 (delta 0), pack-reused 15012
Receiving objects: 100% (15012/15012), 2.02 MiB | 1.57 MiB/s, done.
Downloading 1/test11791.bin (6 B)00), 17.69 KiB | 0 bytes/s
Error downloading object: 1/test11791.bin (c5a39d8): Smudge error: Error downloading 1/test11791.bin (c5a39d8045fd604bdcf307bbe06e7e41dd59a2399dc2a1807e4a6e3dfef5148b): batch response: Bad credentials

Errors logged to /Users/lars/Temp/lfstest/testrepo/.git/lfs/objects/logs/20170820T190408.392912345.log
Use `git lfs logs last` to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: 1/test11791.bin: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

Plus, in case of an error Git LFS prints Downloading 1/test11791.bin (6 B) over the Filtering content: 100% (62/62), 7.63 MiB | 312.00 KiB/s progress.
Can/should we disable the Downloading... output in Git LFS?

@technoweenie
Copy link
Contributor

Cloning repositories via SSH with a large number of LFS files seems to consistently result in a "batch response: Bad credentials" error:

Hmm, I wouldn't think this PR would affect that. It's probably an unrelated issue. Can you try with an HTTPS remote (just to confirm whether this is a general transfer queue issue, an ssh auth issue, or a delay filter issue)?

As for the SSH thing, my hunch is that the expiration timeout is not being respected. Is this happening at around the 10 minute mark?

$ ssh git@github.com git-lfs-authenticate github/git-lfs-test download
{
  "href": "https://lfs.github.com/github/git-lfs-test",
  "header": {
    "Authorization": "RemoteAuth booya"
  },
  "expires_at": "2017-08-20T18:10:12Z",
  "expires_in": 599
}

Two potential spots that I can think of:

  • Does the Batch API try to refresh the Authorization header after it expires? I thought it recognizes that and automatically retries. Maybe there's a race condition, and it should refresh the token if expires_in is < 5 or something.
  • Does the SSH cred cacher respect the expires_in? It should be falling back to a real ssh call after expires_in seconds have passed.

Can/should we disable the Downloading... output in Git LFS?

I think we still need some output for legacy smudges. The filter should disable them (or even implement its own progress output).

Copy link
Member

@larsxschneider larsxschneider left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few nits that you can address if you want (most relevant from my point of view is the delayedStatusFromErr suggestion).

I am happy to see that merged 👍

@ttaylorr
Copy link
Contributor Author

🙇 thanks for the review, everybody!

@jjgod
Copy link

jjgod commented Sep 14, 2017

What git version do I need for this?

~$ git lfs version
git-lfs/2.3.0 (GitHub; darwin amd64; go 1.8.3; git 4dd2bf73)
~$ git --version
git version 2.14.1

Yet it doesn't seem to work, no concurrent transfers when I do git clone .

For cloning the same repository:

git lfs clone:

real	0m28.256s
user	0m24.853s
sys	0m9.224s

git clone:

real	2m5.616s
user	0m15.168s
sys	0m10.439s

@larsxschneider
Copy link
Member

@jjgod You need the upcoming Git version 2.15 😄
See https://twitter.com/kit3bus/status/908579232422486016

@DJm00n
Copy link

DJm00n commented Oct 31, 2017

@larsxschneider

$ git --version
git version 2.15.0.windows.1
$ git lfs version
git-lfs/2.3.4 (GitHub; windows amd64; go 1.8.3; git d2f6752f)

Clone speed is fine but no progress output and looks like hang:

Speed:
git clone

real 0m34.317s
user 0m0.000s
sys 0m0.000s

git lfs clone

Git LFS: (16 of 16 files) 130.94 MB / 130.94 MB
real    0m38.161s
user    0m0.000s
sys     0m0.015s

@DJm00n
Copy link

DJm00n commented Oct 31, 2017

@larsxschneider Strange, seems like git/git@52f1d62e is merged to git 2.15 already. macOS output is without progress meter too.

@larsxschneider
Copy link
Member

larsxschneider commented Oct 31, 2017 via email

@DJm00n
Copy link

DJm00n commented Oct 31, 2017

@larsxschneider, missed that thread. On your test repo
git clone https://github.com/larsxschneider/lfstest-manyfiles prints "Filtering content" progress too on my Macbook. But not on my vsonline test repo. So it looks like more general git issue not git-for-windows only.

MacBook-Pro:Develop djm00n$ git version
git version 2.15.0
MacBook-Pro:Develop djm00n$ git lfs version
git-lfs/2.3.4 (GitHub; darwin amd64; go 1.9.1)

Git and Git LFS are installed from Homebrew.

@DJm00n
Copy link

DJm00n commented Dec 4, 2017

@larsxschneider just updated to git 2.15.1 and still no progress showing on clone. Any updates?
If any info/logs are needed just ask and ill provide them.
my test repo:
repo https://dimitriy-ryazantcev.visualstudio.com/DefaultCollection/_git/Test1
gittest1/gittest999!

@larsxschneider
Copy link
Member

Thanks @DJm00n ! I can reproduce the issue with your repo! Let's see if I can find the problem...

@larsxschneider
Copy link
Member

@DJm00n I think I found and fixed the issue. Let's see what the Git experts think: https://public-inbox.org/git/20171204203647.30546-1-lars.schneider@autodesk.com/

@ttaylorr
Copy link
Contributor Author

ttaylorr commented Dec 5, 2017

@DJm00n I think I found and fixed the issue. Let's see what the Git experts think: https://public-inbox.org/git/20171204203647.30546-1-lars.schneider@autodesk.com/

@larsxschneider Excellent find, and thanks for keeping us up-to-date.

@awmcclain
Copy link

awmcclain commented May 9, 2018

Sorry for the late question, but does this mean that by upgrading lfs & git that the speed will increase for git pull's where a large number of added files are tracked?

@ttaylorr
Copy link
Contributor Author

ttaylorr commented May 9, 2018

but does this mean that by upgrading lfs & git that the speed will increase for git pull's where a large number of added files are tracked?

That's correct!

chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request Oct 16, 2023
The "git lfs clone" command has been deprecated since Git LFS v2.3.0
and Git v2.15.0 were released, so we update our manual page for the
command to reflect this fact and clarify when and why the command
should still be used, if at all.

As discussed in git-lfs#2466, Git's support for long-running filter processes
(introduced in Git v2.11.0 via the filter.<driver>.process setting) was
enhanced in v2.15.0 with the "delay" capability whereby the filter can
signal that it will operate asynchronously on multiple blobs, thus allowing
it to process multiple blobs in parallel.  Git LFS was updated to make
use of the "delay" capability in a series of PRs culminating in PR git-lfs#2511,
all of which were included in the Git LFS v2.3.0 release.

We also make some minor edits to the manual page to improve the formatting
of literal strings.
chrisd8088 added a commit to chrisd8088/git-lfs that referenced this pull request May 15, 2025
The "git lfs clone" command has been deprecated since Git LFS v2.3.0
and Git v2.15.0 were released, so we update our manual page for the
command to reflect this fact and clarify when and why the command
should still be used, if at all.

As discussed in git-lfs#2466, Git's support for long-running filter processes
(introduced in Git v2.11.0 via the "filter.<driver>.process" setting) was
enhanced in v2.15.0 with the "delay" capability whereby the filter can
signal that it will operate asynchronously on multiple blobs, thus allowing
it to process multiple blobs in parallel.  Git LFS was updated to make
use of the "delay" capability in a series of PRs culminating in PR git-lfs#2511,
all of which were included in the Git LFS v2.3.0 release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants