Endpoint for batch upload/download operations #285
Interesting. I imagined having endpoints like
/cc @ddanier @meltingice since you've both worked on Git LFS server implementations.
Any thoughts on always sending and returning objects instead of a bare hash? It gives us a place to add extra metadata for the request. Maybe we want to tell the server which local commit we're downloading the objects for?
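For illustration, a request along these lines could carry that extra metadata; the `commit` field and the exact key names here are assumptions for the sake of the sketch, not part of any agreed spec:

```python
import json

# Hypothetical batch request body: each entry is a full object rather
# than a bare hash, which leaves room for optional request-level
# metadata (the "commit" field name is illustrative only).
request = {
    "commit": "0123456789abcdef0123456789abcdef01234567",
    "objects": [
        {"oid": "4d7a2146cc4fbbff1f0cbbeed8d1e7029b78b8e9"
                "52899c670c14bb4cb5b62f4a",
         "size": 12345},
    ],
}
body = json.dumps(request, indent=2)
print(body)
```

A server that doesn't care about the metadata can simply ignore everything outside the `objects` array.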
I like that idea 👍
Cool, I like this idea. Should help to speed things up a lot. Is this going to be a mandatory endpoint, or is there some way the client can discover whether or not the server offers batching? Perhaps try the batch endpoint first, and then fall back to non-batched endpoints if it 404s?
I'd like to transition the code to using the batch endpoint as the single required endpoint for servers to implement for Git LFS v1.0.
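The discovery fallback suggested above could look something like this on the client side; this is a rough sketch with a stand-in server type, not the actual git-lfs client code:

```python
class FakeServer:
    """Stand-in for an LFS server; a real client would speak HTTP."""
    def __init__(self, supports_batch):
        self.supports_batch = supports_batch

    def post(self, path, payload):
        # Servers without a batch endpoint answer 404 for it.
        if path == "/objects/batch" and not self.supports_batch:
            return {"status": 404}
        return {"status": 200, "objects": payload}

def check_objects(server, objects):
    # Try the batch endpoint first; on 404, fall back to one request
    # per object against the non-batched endpoints.
    resp = server.post("/objects/batch", objects)
    if resp["status"] == 404:
        return [server.post("/objects/" + o["oid"], o) for o in objects]
    return [resp]
```

With batching, N objects cost one round trip; without it, the client degrades gracefully to N round trips.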
I think a batch job would be really nice. However, I don't like the idea of adding additional data to the requests (commit SHA-1, branch). The git-lfs server is currently pretty easy to implement, as it's basically a simple key/value store. This is, in my eyes, one of the key features that make it a feasible solution for many git providers (GitLab, for example, could add this without many problems; I did my own implementation in just under 10 hours). Adding more information might not complicate things right away, but it opens the door to doing so, and in the end that could make implementations much harder.

Another thing to keep in mind is that we are talking about big files, so the API request to check a file's existence should have a pretty low impact on overall performance; most of the time should be spent uploading or downloading the file itself. So although adding a batch job is a great thing from a technical perspective, it might not necessarily be that much better from a usability perspective. My personal opinion is to implement it anyway, as you cannot change this so easily once more servers exist.
I want to add this metadata in a way that's not required for servers to process. It should be easy to just focus on the proposed "objects" key if that's all your implementation cares about. I'm hoping to keep the API simple for small servers, with optional features that bigger servers can choose to implement.
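As a sketch of what "just focus on the objects key" could mean for a minimal server, a handler might parse only that key and ignore any optional metadata (the key names are taken from this discussion, the handler itself is hypothetical):

```python
import json

def handle_batch(body):
    # A small implementation only cares about "objects"; any optional
    # request metadata (commit, branch, ...) is safely ignored.
    payload = json.loads(body)
    return [obj["oid"] for obj in payload["objects"]]
```

Optional fields added later wouldn't break a server written this way, which is the point of making the metadata non-mandatory.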
I hope to ship support for both APIs in the client first, defaulting to the current key/value methods. If we can't prove that reducing the API overhead helps in a meaningful way, we can scrap it. I'm anxious to compare this to the work done in #258.
I should add that since github.com does not yet have a batch API, any testing will need to be done against the LFS test server. A matching PR containing a batch endpoint is here: git-lfs/lfs-test-server#27
* Ensure concurrent values are at least 1
* Ensure batch boolean follows git config's rules
* Tests for each
test/test-batch-transfer.sh
I've started using this pattern:

git commit -m "add a.dat" 2>&1 | tee commit.log
grep "master (root-commit)" commit.log

Nice thing is that the output is only printed to STDOUT once with tee. If the last grep fails, you'll only see the commit output once, and the matches of the successful grep commands.
Reorganize the transfer queue to provide a channel to watch for object OIDs as they finish. This can be used in the `get` command to feed a goroutine that will copy the file to the working directory and inform the update-index process about it as the transfers finish. This leads to a greatly reduced amount of time spent updating the index after a get.
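The commit message above describes a channel of finished OIDs feeding a worker that updates the index incrementally. In Python terms, the shape of that design is roughly the following producer/consumer sketch (not the actual Go code in this PR):

```python
import queue
import threading

def run_transfers(oids, finished):
    # Producer: the transfer queue pushes each OID as its transfer
    # completes, then a None sentinel marks the end of the stream.
    for oid in oids:
        finished.put(oid)
    finished.put(None)

def update_index(finished, updated):
    # Consumer: copy each file to the working directory and inform
    # update-index as objects arrive, instead of waiting for every
    # transfer to finish first.
    while True:
        oid = finished.get()
        if oid is None:
            break
        updated.append(oid)

finished, updated = queue.Queue(), []
worker = threading.Thread(target=update_index, args=(finished, updated))
worker.start()
run_transfers(["oid1", "oid2", "oid3"], finished)
worker.join()
```

Overlapping the index updates with the transfers is what yields the reduced post-`get` time the commit message mentions.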
is no longer needed, just 'git lfs' grep and string comparison improvements from test-happy-path.sh
Merging this, as it seems like a great improvement, and will serve as a base for future changes.
As per #341 I'd really appreciate it if this capability could be made available via the SSH API in addition to HTTP. |
This PR allows the LFS client to batch its upload and download operations.
Typically, the client has to ask the server about each object it wants to upload or download, resulting in many HTTP round trips. This causes significant delay when dealing with a large number of objects.
This PR proposes a change to the API that provides a batch endpoint that gives the client information about a number of objects at once.
The client will POST an array of oid/size objects to the batch endpoint, and the endpoint will return an array of link relations for each object asked about. The presence or absence of any of the `download`, `upload`, and `verify` relations can be used to determine the state of the object.

* `download` present - server has and has verified the object
* `upload` present - server does not have the object; the client can upload and follow `verify` (if given)

TODO:

* `get` command to bulk download
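The relation rules above boil down to a small decision function; this is a sketch, and the state names are mine rather than anything from the spec:

```python
def object_state(relations):
    # "relations" maps a relation name to its link object, taken from
    # one entry of a hypothetical batch response.
    if "download" in relations:
        return "exists"   # server has and has verified the object
    if "upload" in relations:
        return "missing"  # client should upload, then follow "verify" if given
    return "unknown"
```

A client can dispatch each response entry through a function like this to decide whether to download, upload, or skip an object.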