Skip to content

WIP: promised downloads (proof of concept)#1646

Closed
larsxschneider wants to merge 1 commit intomasterfrom
promised-downloads
Closed

WIP: promised downloads (proof of concept)#1646
larsxschneider wants to merge 1 commit intomasterfrom
promised-downloads

Conversation

@larsxschneider
Copy link
Member

Goal (implemented as proof of concept demo):
If a file is not in the GitLFS cache, then GitLFS returns an empty file upon Git's smudge request and starts the download right away. At the end of the "process-filter" protocol GitLFS waits for the downloads to finish and writes the files to the Git working tree.

This should allows us to get rid of the git lfs clone|pull|fetch commands.


@ttaylorr If you like it then please change the code to use the proper parallel download machinery. 😉

If a file is not in the GitLFS cache, then GitLFS returns an empty file
upon Git's smudge request and starts the download right away. At the end
of the "process-filter" protocol GitLFS waits for the downloads to
finish and writes the files to the Git working tree.

This allows us to get rid of the git lfs clone|pull|fetch commands.
@ttaylorr
Copy link
Contributor

Looking like a great start. I'm really psyched to get this in, it's going to be a great addition combined with the filter process stuff we implemented in #1617.

I can take a stab at this and have it handle transfers in parallel, shouldn't be too tricky.

What's required on the Git side?

var dd DeferredDownload
dd.Ptr = ptr
dd.WorkingFile = workingfile
deferredDownloads = append(deferredDownloads, dd)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we write nothing to writer which means the smudge result is an empty file. We might want to write the GitLFS pointer there to writer or a message like file is being downloaded. It doesn't matter what we write ... the file is overwritten at the end anyways. This intermedia step is only relevant in case GitLFS dies...

@larsxschneider
Copy link
Member Author

Nothing required on the Git side. Regular 2.11 release candidate is sufficient.

@ttaylorr
Copy link
Contributor

ttaylorr commented Nov 12, 2016

Nothing required on the Git side. Regular 2.11 release candidate is sufficient.

My concern is that by writing empty contents to the file and then doing the copy on the LFS side, the story of failing a transfer part of the way through is not great. If our filter dies before we have a chance to copy the files from their temporary locations, the working tree is left in a dirty state which would show all of the pointer files being removed.

I see a few options on how to remediate this:

  • Implement a small patch in Git adding a protocol capability to "roll back" the conversions performed by the filter in the event of a failure. We could send this packet at the very end if a transfer failed.
  • Implement the patch as originally discussed where Git would know about the temporary locations of the files on disk and would perform the copy itself, respected the filter.<driver>.required config option.
  • Maintain a set of those temporary files ourselves and write another filter which would execute the copy ONLY if the transfer succeeded. With this approach, I think our safest bet would be to echo back the contents of the file as given to us by the filter, but this doesn't really solve the problem since we would have to write a potentially huge amount of data over the pipe, leaving us where we started.

My preference would be to teach Git the promise-based approach, that way we can respect the filter.<driver>.required option and defer the responsibility of bringing the changes into the working-tree/index to Git itself. This way, if a transfer fails, Git can choose to not copy the files over, not wasting disk I/O and (more importantly) not leaving the working tree or index in a dirty state.

EDIT: @larsxschneider mentioned to me that the git lfs clone command (and its peers) actually have this behavior of leaving the working tree in a dirty state if a transfer fails. [1] I agree that a good first step would be to simply parallelize the transfers here, and then leave the tree in a dirty state if we fail. We can reinvestigate after and see if it still makes sense to implement this on the Git side.

@larsxschneider
Copy link
Member Author

I looked a bit deeper into a possible Git core implementation and realized that the "temporary file" idea might be too complicated with the current Git core filter code. I proposed an alternative idea as RFC to the Git list: http://public-inbox.org/git/D10F7C47-14E8-465B-8B7A-A09A1B28A39F@gmail.com/

Note the last sentence of Eric's reply:

Anyways, I'll plan on doing something similar (in Perl) with the
synchronous parts of public-inbox which relies on "cat-file --batch"
at some point... (my rotational disks are sloooooooow :<)

That might be interesting for #1649 /cc @technoweenie !

@technoweenie
Copy link
Contributor

Interesting. Both this idea, and the comment about cat-file --batch.

@larsxschneider
Copy link
Member Author

larsxschneider commented Jan 8, 2017

I posted a patch to support "promised downloads" natively in Git:
http://public-inbox.org/git/20170108191736.47359-1-larsxschneider@gmail.com/

@rjbell4
Copy link
Contributor

rjbell4 commented Mar 12, 2017

Just found this. Love the idea! It would be great for general LFS support. Though hopefully we can still support "skip smudge" for some use cases.

@ttaylorr
Copy link
Contributor

Just found this. Love the idea! It would be great for general LFS support. Though hopefully we can still support "skip smudge" for some use cases.

Don't see any reason why we couldn't 👍 .

@Spunkie
Copy link

Spunkie commented Mar 27, 2017

If this was implemented as is would there be any way of supporting workflows that make use of the -X/--exclude= and -I/--include= params when using straight git clone|pull|fetch or would I still need to fallback on git lfs clone|pull|fetch cmds? I'm not really sure if that is equivalent to the "skip smudge" you are talking about above?

@ttaylorr
Copy link
Contributor

If this was implemented as is would there be any way of supporting workflows that make use of the -X/--exclude= and -I/--include= params when using straight git clone|pull|fetch or would I still need to fallback on git lfs clone|pull|fetch cmds?

In Git today, you'd have to use the git lfs variants to make use of LFS-specific flags. That being said, there are a few alternatives:

  1. Git forwards all un-recognized flags to filter drivers, hooks, etc.
  2. You git lfs install --skip-smudge, and then do a git lfs checkout -I ... -X ... once inside your repository.

@larsxschneider
Copy link
Member Author

I posted my v3 of the relevant Git parts for this change:
http://public-inbox.org/git/20170409191107.20547-1-larsxschneider@gmail.com/

Feedback would be highly appreciated for Git filter protocol change described here:
larsxschneider/git@08a461f103#diff-6fb066a3d66afe00c0995f97f5c680fdR521

@larsxschneider
Copy link
Member Author

The necessary Git filter protocol changes have been merged to the Git core next branch and are scheduled for the master branch: git/git@2841e8f

I hope they make it into the Git 2.14 release 😊

@ttaylorr
Copy link
Contributor

The necessary Git filter protocol changes have been merged to the Git core next branch and are scheduled for the master branch: git/git@2841e8f

🎉 Hooray! Excellent job on this patch series, as always.

With regards to implementing support for this in LFS, I am more than happy to help. My current roadmap is focused on performance improvements to the migrator which I anticipate taking about a month.

This change should be self-contained enough that I am able to work on it at the same time as the other items on my roadmap.

@larsxschneider how do you feel about adopting a similar workflow to when we worked on the initial process filter?

@larsxschneider
Copy link
Member Author

Thanks @ttaylorr ! Yeah, I think a similar workflow as before would be great. Please let me know if I can help in any way.

@ttaylorr
Copy link
Contributor

Thanks @ttaylorr ! Yeah, I think a similar workflow as before would be great. Please let me know if I can help in any way.

Will do, thanks! 😄

@technoweenie
Copy link
Contributor

@larsxschneider Thanks and congrats on the git patch! Excited to see how this improves clone/fetch times 🤘

@larsxschneider
Copy link
Member Author

This PR is superseded by #2511

@larsxschneider larsxschneider deleted the promised-downloads branch August 23, 2017 08:25
@larsxschneider larsxschneider deleted the promised-downloads branch August 23, 2017 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants