Skip to content

Support concurrent uploads#258

Merged
rubyist merged 21 commits intomasterfrom
concurrent-uploads
Apr 30, 2015
Merged

Support concurrent uploads#258
rubyist merged 21 commits intomasterfrom
concurrent-uploads

Conversation

@rubyist
Copy link
Contributor

@rubyist rubyist commented Apr 23, 2015

This PR will add support to allow uploads to happen concurrently.

  • Create an upload queue
  • Synchronize initial git config call
  • Synchronize first credentials access
  • Display all upload errors
  • Add a configuration option with a sensible default
  • Nicer progress output
  • Docs
  • Fix all the TODOs I added

lfs/client.go Outdated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, closing these channels will stop the for loops in the goroutines once they've processed the last messages. Right? Not that it matters much since the command will exit right after this happens.

Also, I think Wait() is a strange name. It makes sense here, looking at the implementation of the queue. But I don't want the upload queue in the push command to wait, I want it to process or run or something. Prior art: http://golang.org/pkg/os/exec/#Cmd.Run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct on the goroutines.

For Wait(), starting or running is implicit, we don't need to wait until the loop that figures out what to upload to finish before starting uploads. See http://golang.org/pkg/os/exec/#Cmd.Wait and http://golang.org/pkg/sync/#WaitGroup.Wait (which is what it's actually doing).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<bike shedding comment>. Cool 👍

@technoweenie
Copy link
Contributor

Synchronize first credentials access

It might be cool to restructure lfs.Upload so it can take a number of files. Then, it could post an array of objects and get back multiple link relations. One Git LFS API request for every n files.

Probably out of scope for this PR. Just thinking....

@technoweenie
Copy link
Contributor

How will upload progress work on the cli output? We almost need something like the pb package that can track multiple, separate progress bars.

@rubyist
Copy link
Contributor Author

rubyist commented Apr 23, 2015

I think posting an array would be good, that'll save a lot of round trips when there's a lot of files. I would agree that's out of scope for this PR, that would require an API change because I don't think we currently say that is allowed.

For progress, it probably will have to be something with pb showing multiple bars. That should have been added to the TODO list.

rubyist added 3 commits April 23, 2015 12:20
This refactoring will make a number of things easier. These didn't
really need to be in their own packages. Having them separate packages,
there are a few refactorings I've tried to do that end up with circular
dependencies due to things outside of `lfs` depending on `lfs`. Pushing
these into `lfs` makes refactoring simpler.
@rubyist
Copy link
Contributor Author

rubyist commented Apr 23, 2015

A progress bar that shows multiple entries is going to be problematic. It's not too complicated on unixy environments with ansi codes, but windows is a different story. I think you can access some terminal manipulation routines through the windows syscalls and kernel32, but that's a bit beyond my knowledge. There are a couple cross platform terminal ui libraries (e.g. go-termbox), but those want to take over the entire screen, and with git also dumping output to the screen that's not very practical.

We have a few options we can do:

  • Don't show the progress bar(s) for concurrent uploads
  • Show a single progress bar that covers the entire upload set
  • Show a multi-bar on unix, and nothing or the single bar on windows (perhaps someone can contribute a windows multi-bar)

@technoweenie
Copy link
Contributor

Show a single progress bar that covers the entire upload set

I like this idea. Instead of a single "uploading foo.bar (100MB)" message per file, have a single message showing the number of files and total storage, as well as a single progress bar for the whole thing.

@rubyist
Copy link
Contributor Author

rubyist commented Apr 23, 2015

That's my leaning as well, because it's the simplest :).

rubyist added 5 commits April 23, 2015 17:18
If wg.Add is called from the loop reading from the channel, there are
scenarios (number of uploads <= number of workers) where Wait() will
shut down the channels before they're read from.
@rubyist
Copy link
Contributor Author

rubyist commented Apr 24, 2015

pb doesn't really know about progress bars that span multiple items, but we can sneak it in with the newer Prefix() method by updating the prefix as transfers complete:

uploads

@technoweenie
Copy link
Contributor

Super cool 👍

rubyist added 3 commits April 29, 2015 10:37
To prevent being asked for credentials n times (where n = # of
concurrent uploads), we process the upload queue synchronously until one
of the uploads succeeds, and then allow concurrent uploads.

This could be a bit faster if the condition is fired after the first
successful API hit, rather than the first successful total upload, but
some deeper refactoring will need to be done.
This lets the api client code broadcast success/fail events that the
upload queue can listen for. The queue will process synchronously until
it receives a success event, then it will run concurrently.
Conflicts:
	commands/command_push.go
	lfs/pointer.go
	lfs/pointer_clean.go
	lfs/pointer_smudge.go
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be ScanRefs or something?

@rubyist
Copy link
Contributor Author

rubyist commented Apr 29, 2015

This is now ready for 👀

The number of uploads that will happen concurrently is configured via lfs.concurrentuploads (better name?). It defaults to 3 (better default?).

When push or pre-push happens, the files to be uploaded are calculated as before and added to a queue. The queue is then processed. In order to prevent being prompted for authentication n times simultaneously, the queue is processed sequentially until an API call completes successfully. This should happen right after the first API call. Once an API call succeeds the upload workers will wake up and begin processing the queue concurrently until it's finished.

Progress output is now a single bar that covers the combined sizes of all uploads (see gif above).

Edit to add: any upload errors that occur are collected and displayed after processing is complete.

@rubyist rubyist added review and removed wip labels Apr 29, 2015
@technoweenie
Copy link
Contributor

Found a small issue:

So I used the new git lfs push command twice in row, without a git push in between. The progress counter stayed at "0 of 100 files" until the first upload. It skipped the first 50 since the first git lfs push call already uploaded them. It'd be cool if it could update the file counter and the byte counter on an upload that can be skipped because the remote server already has it.

$ git lfs push origin master 
(61 of 100 files) 3.91 MB / 19.53 MB  20.00 % 1m56s     

I'm using this to generate 50 Git LFS files to upload.

require "time"
require "fileutils"

now = Time.now.strftime("%Y%m%d-%I%M%S")
FileUtils.mkdir_p(now)

50.times do |i|
  File.open("#{now}/#{i}.dat", "w") do |f|
    f.write(`openssl rand -hex 102400`)
  end
end

puts `git add #{now}`
puts `git commit -m "files for #{now}"`

An easy way to keep upload progress bars up to date when the file
doesn't need to be uploaded is to just drain the callback reader into
ioutil.Discard.
rubyist added a commit that referenced this pull request Apr 30, 2015
@rubyist rubyist merged commit 085e17a into master Apr 30, 2015
@rubyist rubyist deleted the concurrent-uploads branch April 30, 2015 15:53
@rubyist rubyist removed the review label Apr 30, 2015
@technoweenie
Copy link
Contributor

Here's how it looks now:

# create 50 files and commit

$ g lfs push origin master
(50 of 50 files) 9.77 MB / 9.77 MB  100.00 % 34s                                                                                                                                                                                                                                                                                                                            

# create 50 more files and another commit

$ git push origin master
(100 of 100 files) 19.53 MB / 19.53 MB  100.00 % 49s

The progress bar looks good. There's a lot of overhead, but I think we'll need to make some major API changes to solve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants