Endpoint for batch upload/download operations #285

Merged
technoweenie merged 38 commits into master from multitransfer
May 28, 2015
Conversation

@rubyist
Contributor

@rubyist rubyist commented May 5, 2015

This PR allows the LFS client to batch its upload and download operations.

Typically, the client has to ask the server about each object it wants to upload or download, resulting in many http round trips. This causes significant delay when dealing with a large amount of objects.

This PR proposes a change to the API that provides a batch endpoint that gives the client information about a number of objects at once.

The client will POST an array of oid/size objects to the batch endpoint, and the endpoint will return an array of link relations for each object asked about. The presence or absence of the download, upload, and verify relations indicates the state of each object:

  • download present - the server has the object and has verified it
  • upload present - the server does not have the object; the client can upload it and then follow verify (if given)
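The relation-based states above can be sketched as a small client-side dispatch. This is a minimal illustration of the idea, not the actual client code; the `links` key and relation names here are assumptions drawn from this thread, not a final spec.

```python
def classify(obj):
    """Decide what to do with one object based on which relations are present.

    `obj` is one entry from a hypothetical batch response, with a `links`
    dict keyed by relation name (illustrative structure only).
    """
    links = obj.get("links", {})
    if "download" in links:
        # server has the object and has verified it
        return "download"
    if "upload" in links:
        # client should upload, then follow "verify" if that relation is given
        return "verify-after-upload" if "verify" in links else "upload"
    return "unknown"

response = [
    {"oid": "abcdef", "links": {"download": {"href": "..."}}},
    {"oid": "123456", "links": {"upload": {"href": "..."}, "verify": {"href": "..."}}},
]
actions = [classify(o) for o in response]
```

The point of deciding by presence/absence of relations (rather than an explicit status field) is that the server controls the next step purely through hypermedia, which is what the thread below calls out as the appeal of a single endpoint.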

TODO:

  • Spec the batch endpoints
  • Upload queue can work with batch endpoint
  • Add a get command to bulk download

@rubyist rubyist added the wip label May 5, 2015
@technoweenie
Contributor

Interesting. I imagined having endpoints like /download, /upload, and /verify. Having one endpoint that can send the correct hypermedia relations is better 👍

@technoweenie
Contributor

/cc @ddanier @meltingice since you've both worked on Git LFS server implementations.

@technoweenie
Contributor

Any thoughts on always sending and returning objects? So instead of:

[
  { "oid": "abcdef" }, ...
]

Send a hash instead. It gives us a place to add extra metadata to the request. Maybe we want to tell the server which local commit we're downloading the objects for?

{
  "commit": "sha1",
  "branch": "master",
  "objects": [
    { "oid": "abcdef" }, ...
  ]
}

@rubyist
Contributor Author

rubyist commented May 5, 2015

I like that idea 👍

@meltingice

Cool, I like this idea. Should help to speed things up a lot.

Is this going to be a mandatory endpoint, or is there some way the client can discover whether or not the server offers batching? Perhaps try the batch endpoint first, and then fall back to non-batched endpoints if it 404's?

@technoweenie
Contributor

Is this going to be a mandatory endpoint, or is there some way the client can discover whether or not the server offers batching? Perhaps try the batch endpoint first, and then fall back to non-batched endpoints if it 404's?

I'd like to transition the code to using the batch endpoint as the single required endpoint for servers to implement for Git LFS v1.0.

  1. First, ship experimental support. Enable some local lfs.batch config on a repository to use it.
  2. Make the batch API the default, and fall back to the original API if the server returns a 404 or lfs.batch is false.
  3. Remove support for the old /objects API.
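The transition plan in steps 1-3 amounts to a simple try-batch-then-fall-back flow. A minimal sketch, assuming hypothetical stand-ins `batch_request` and `legacy_object_request` for the real HTTP calls (these names are illustrative, not the actual git-lfs client API):

```python
def legacy_object_request(obj):
    # stand-in for the original /objects API: one round trip per object
    return {"oid": obj["oid"], "via": "legacy"}

def batch_request(objects, server_supports_batch=True):
    # stand-in for the batch endpoint: one round trip for all objects,
    # returning (http_status, body)
    if not server_supports_batch:
        return 404, None
    return 200, [{"oid": o["oid"], "via": "batch"} for o in objects]

def fetch_object_info(objects, config, server_supports_batch=True):
    # lfs.batch=false forces the old per-object API (step 2's opt-out)
    if config.get("lfs.batch") == "false":
        return [legacy_object_request(o) for o in objects]
    status, body = batch_request(objects, server_supports_batch)
    if status == 404:
        # server has no batch endpoint: fall back per object
        return [legacy_object_request(o) for o in objects]
    return body
```

The 404 fallback is what lets step 2 ship safely against servers that never implement batching, until step 3 removes the old path entirely.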

@ddanier
Contributor

ddanier commented May 6, 2015

I think a batch job would be really nice.

However I don't like the idea of adding additional data to the requests (commit sha1, branch). The Git LFS server is currently pretty easy to implement, as it's basically a simple key/value store. This is, in my eyes, one of the key features that makes it a feasible solution for many git providers (GitLab, for example, could add this without many problems; I did my own implementation in just under 10 hours). Adding more information might not complicate things now, but it opens the door to doing so, and in the end that could make implementations much harder.

Another thing to keep in mind is that we are talking about big files, so the API request to check a file's existence should have a pretty low impact on overall performance; most of the time should be spent uploading or downloading the file itself. So although adding a batch endpoint is a great thing from a technical perspective, it might not necessarily be that much better from a usability perspective. My personal opinion is to implement it anyway, as you cannot change this so easily once more servers exist.

@technoweenie
Contributor

However I don't like the idea of adding additional data to the requests (commit sha1, branch).

I want to add this metadata in a way that servers aren't required to process. It should be easy to focus on just the proposed "objects" key if that's all your implementation cares about. I'm hoping to keep the API simple for small servers, with optional features that bigger servers can choose to implement.
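To make the "optional metadata" point concrete: a minimal server handler can read only the "objects" key and never look at "commit" or "branch". This is an illustrative sketch (the `status` field is invented for the example), not a real server implementation:

```python
def handle_batch(payload):
    # Optional top-level keys such as "commit" or "branch" are simply
    # never read; a small server only cares about "objects".
    return [{"oid": o["oid"], "status": "ok"} for o in payload["objects"]]

# Request shape follows the example payloads earlier in this thread.
req = {"commit": "sha1", "branch": "master", "objects": [{"oid": "abcdef"}]}
resp = handle_batch(req)
```

Because JSON consumers ignore unknown keys by default, the extra metadata costs small implementations nothing while remaining available to servers that want it.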

So the API request to check a file's existence should have a pretty low impact on overall performance; most of the time should be spent uploading or downloading the file itself.

I hope to ship support for both APIs in the client first, defaulting to the current key/value methods. If we can't prove that reducing the API overhead helps in a meaningful way, we can scrap it. I'm anxious to compare this to the work done in #258.

@rubyist
Contributor Author

rubyist commented May 26, 2015

I should add that since github.com does not yet have a batch api, any testing will need to be done against the lfs test server. A matching PR containing a batch endpoint is here: git-lfs/lfs-test-server#27

rubyist added 2 commits May 26, 2015 15:46
* Ensure concurrent values are at least 1
* Ensure batch boolean follows git config's rules
* Tests for each
Contributor

I've started using this pattern:

git commit -m "add a.dat" 2>&1 | tee commit.log
grep "master (root-commit)" commit.log

The nice thing is that the output is only printed to STDOUT once, via tee. If the last grep fails, you'll still see the commit output once, along with the matches from the successful grep commands.

rubyist and others added 9 commits May 27, 2015 15:03
Reorganize the transfer queue to provide a channel to watch for object
OIDs as they finish. This can be used in the `get` command to feed a
goroutine that will copy the file to the working directory and inform
the update-index process about it as the transfers finish. This leads to
a greatly reduced amount of time spent updating the index after a get.
 is no longer needed, just 'git lfs'
grep and string comparison improvements from test-happy-path.sh
@technoweenie
Contributor

Merging this, as it seems like a great improvement, and will serve as a base for future changes.

technoweenie added a commit that referenced this pull request May 28, 2015
Endpoint for batch upload/download operations
@technoweenie technoweenie merged commit ac67b68 into master May 28, 2015
@technoweenie technoweenie deleted the multitransfer branch May 28, 2015 16:50
@catphish
Contributor

As per #341 I'd really appreciate it if this capability could be made available via the SSH API in addition to HTTP.

