WIP: promised downloads (proof of concept) #1646
larsxschneider wants to merge 1 commit into master
If a file is not in the GitLFS cache, then GitLFS returns an empty file upon Git's smudge request and starts the download right away. At the end of the "process-filter" protocol GitLFS waits for the downloads to finish and writes the files to the Git working tree. This allows us to get rid of the git lfs clone|pull|fetch commands.
|
Looking like a great start. I'm really psyched to get this in; it's going to be a great addition combined with the filter process stuff we implemented in #1617. I can take a stab at this and have it handle transfers in parallel, which shouldn't be too tricky. What's required on the Git side?
    var dd DeferredDownload
    dd.Ptr = ptr
    dd.WorkingFile = workingfile
    deferredDownloads = append(deferredDownloads, dd)
Here we write nothing to writer, which means the smudge result is an empty file. We might want to write the GitLFS pointer to writer instead, or a message like "file is being downloaded". It doesn't matter what we write; the file is overwritten at the end anyway. This intermediate step is only relevant in case GitLFS dies...
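To illustrate the pointer-as-placeholder idea, here is a minimal sketch that writes the pointer text instead of nothing. The `Pointer` type, `placeholderText`, and `writePlaceholder` are hypothetical names introduced for this example; only the three-line pointer layout follows the Git LFS pointer format.

```go
package main

import (
	"fmt"
	"io"
)

// Pointer is a hypothetical, simplified stand-in for a Git LFS pointer.
type Pointer struct {
	Oid  string // SHA-256 of the object, hex encoded
	Size int64  // object size in bytes
}

// placeholderText renders the pointer in the Git LFS pointer file layout.
func placeholderText(p Pointer) string {
	return fmt.Sprintf(
		"version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %d\n",
		p.Oid, p.Size)
}

// writePlaceholder writes the pointer text to w as the interim smudge
// result, so the working file is not empty while the download is pending.
func writePlaceholder(w io.Writer, p Pointer) error {
	_, err := io.WriteString(w, placeholderText(p))
	return err
}
```

With this in place, a crash before the final copy would leave pointer text in the working tree rather than empty files. Whether Git then reports those files as unmodified depends on how the clean filter treats pointer content, which this sketch does not cover.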
|
Nothing is required on the Git side. The regular Git 2.11 release candidate is sufficient.
My concern is that by writing empty contents to the file and then doing the copy on the LFS side, the story of failing a transfer part of the way through is not great. If our filter dies before we have a chance to copy the files from their temporary locations, the working tree is left in a dirty state which would show all of the pointer files being removed. I see a few options on how to remediate this:
My preference would be to teach Git the promise-based approach, that way we can respect the EDIT: @larsxschneider mentioned to me that the |
|
I looked a bit deeper into a possible Git core implementation and realized that the "temporary file" idea might be too complicated with the current Git core filter code. I proposed an alternative idea as RFC to the Git list: http://public-inbox.org/git/D10F7C47-14E8-465B-8B7A-A09A1B28A39F@gmail.com/ Note the last sentence of Eric's reply:
That might be interesting for #1649 /cc @technoweenie ! |
|
Interesting. Both this idea, and the comment about |
|
I posted a patch to support "promised downloads" natively in Git: |
|
Just found this. Love the idea! It would be great for general LFS support. Though hopefully we can still support "skip smudge" for some use cases. |
Don't see any reason why we couldn't 👍 . |
|
If this was implemented as is would there be any way of supporting workflows that make use of the |
In Git today, you'd have to use the
|
|
I posted my v3 of the relevant Git parts for this change: Feedback would be highly appreciated for Git filter protocol change described here: |
|
The necessary Git filter protocol changes have been merged into Git core. I hope they make it into the Git 2.14 release 😊
🎉 Hooray! Excellent job on this patch series, as always. With regards to implementing support for this in LFS, I am more than happy to help. My current roadmap is focused on performance improvements to the migrator which I anticipate taking about a month. This change should be self-contained enough that I am able to work on it at the same time as the other items on my roadmap. @larsxschneider how do you feel about adopting a similar workflow to when we worked on the initial |
|
Thanks @ttaylorr ! Yeah, I think a similar workflow as before would be great. Please let me know if I can help in any way. |
Will do, thanks! 😄 |
|
@larsxschneider Thanks and congrats on the git patch! Excited to see how this improves clone/fetch times 🤘 |
|
This PR is superseded by #2511 |
Goal (implemented as proof of concept demo):
If a file is not in the GitLFS cache, then GitLFS returns an empty file upon Git's smudge request and starts the download right away. At the end of the "process-filter" protocol GitLFS waits for the downloads to finish and writes the files to the Git working tree.
This should allow us to get rid of the git lfs clone|pull|fetch commands.
@ttaylorr If you like it then please change the code to use the proper parallel download machinery. 😉