lfs/transfer_queue: support multiple retries per object #1535

Merged

ttaylorr merged 2 commits into master from multiple-retries on Sep 22, 2016
Conversation

@ttaylorr (Contributor) commented:

This pull request teaches the lfs/transfer_queue how to retry an object more than once per transfer.

Previously

The transfer queue used to follow these steps when executing a transfer:

  1. Receive a bunch of objects from calls to Add()
  2. Start either the batchApiRoutine or individualApiRoutine based on its capabilities.
  3. Attempt to transfer the object(s), and commit to a retry if it failed with a retriable error.
    1. Collect the transfers to be retried in the retryCollector
    2. Close the transfer channel to kick off all of the objects to be retried.
    3. Re-add all of the objects to be retried and re-process them once.
  4. Exit.

This works fine, but limits the transfer queue to:

  • Responding to one set of retries per transfer.
  • Responding to one retry per unique OID in the queue.
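The old single-retry-pass behavior can be sketched roughly as follows. This is an illustrative simplification, not git-lfs's actual API: `transferable`, `transferOnce`, and `runOldQueue` are hypothetical names, and the retry collector is modeled as a plain slice.

```go
package main

import "fmt"

// transferable is a stand-in for the queue's Transferable interface.
type transferable struct{ oid string }

// transferOnce reports whether a single transfer attempt succeeded;
// "failing" simulates objects that always fail retriably.
func transferOnce(t transferable, failing map[string]bool) bool {
	return !failing[t.oid]
}

// runOldQueue does one main pass plus exactly one retry pass,
// mirroring the limitation described above.
func runOldQueue(objects []transferable, failing map[string]bool) (done, abandoned []string) {
	var retries []transferable
	// First pass: collect retriable failures (the "retryCollector" role).
	for _, t := range objects {
		if transferOnce(t, failing) {
			done = append(done, t.oid)
		} else {
			retries = append(retries, t)
		}
	}
	// Single retry pass: objects that fail again are abandoned.
	for _, t := range retries {
		if transferOnce(t, failing) {
			done = append(done, t.oid)
		} else {
			abandoned = append(abandoned, t.oid)
		}
	}
	return done, abandoned
}

func main() {
	done, abandoned := runOldQueue(
		[]transferable{{"a"}, {"b"}},
		map[string]bool{"b": true}, // "b" always fails
	)
	fmt.Println(done, abandoned)
}
```

An object that fails twice is abandoned here, no matter how large a retry budget we might want, which is exactly the limitation the PR removes.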

Now

Now, the transfer queue knows how to immediately retry an object. It works like this:

  1. Steps 1-4 from above.
  2. If an object failed, and can be retried, enqueue a retry.
  3. Collect the retry, and...
    1. Make sure that we have the budget to retry the object.
    2. Add() it to the next batch, or API channel (in legacy mode)
    3. If in batch mode, immediately flush the batch, forcing the batchApiRoutine function to receive a new batch.
  4. Wait until all objects have been either transferred, or abandoned.

(The ability to flush a batch was introduced in #1528, and enables the ability to immediately retry an object in a batch.)
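The new flow can be sketched as a loop in which failed objects are re-enqueued into an immediately flushed retry batch until their budget runs out. All names here (`object`, `process`, `maxRetries`) are illustrative, and channels/goroutines are replaced by plain slices for clarity.

```go
package main

import "fmt"

// maxRetries stands in for the per-object retry budget.
const maxRetries = 1

type object struct {
	oid     string
	retries int
}

// process loops until every object is either transferred or abandoned,
// modeling "flush the batch" as starting the next loop iteration.
func process(batch []object, failing map[string]bool) (done, abandoned []string) {
	for len(batch) > 0 {
		var next []object // the immediately flushed retry batch
		for _, o := range batch {
			switch {
			case !failing[o.oid]:
				done = append(done, o.oid)
			case o.retries < maxRetries:
				o.retries++
				next = append(next, o) // enqueue a retry
			default:
				abandoned = append(abandoned, o.oid) // budget exhausted
			}
		}
		batch = next
	}
	return done, abandoned
}

func main() {
	done, abandoned := process(
		[]object{{oid: "a"}, {oid: "b"}},
		map[string]bool{"b": true}, // "b" always fails
	)
	fmt.Println(done, abandoned)
}
```

Raising `maxRetries` in this sketch allows more passes without any structural change, which is the flexibility the real queue gains.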

In order to make sure that all items are processed before exiting, the way we treat the internal waitgroup q.wait has changed slightly. Previously, q.wait was incremented every time we tried to perform a transfer on an item. To prevent a situation where the WaitGroup could reach zero while waiting between failing an object and retrying it, the WaitGroup is now incremented only the first time an object begins a transfer.
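A minimal sketch of that WaitGroup change, assuming a simplified queue in which first attempts and retries are distinguished by a `started` set (the field names and the `adds` counter are illustrative, not the real q.wait code):

```go
package main

import (
	"fmt"
	"sync"
)

type queue struct {
	wait    sync.WaitGroup
	mu      sync.Mutex
	started map[string]bool
	adds    int // for illustration: how many times Add(1) actually ran
}

// begin marks an attempt for an OID; only the first attempt increments
// the WaitGroup, so a failure-then-retry gap cannot drop it to zero.
func (q *queue) begin(oid string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if !q.started[oid] {
		q.started[oid] = true
		q.adds++
		q.wait.Add(1)
	}
}

// finish is called exactly once per OID, when it is transferred or abandoned.
func (q *queue) finish() { q.wait.Done() }

func main() {
	q := &queue{started: map[string]bool{}}
	q.begin("a") // first attempt: Add(1)
	q.begin("a") // retry: no Add, so one finish() releases Wait
	go q.finish()
	q.wait.Wait()
	fmt.Println("all objects settled")
}
```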

The transfer queue also now keeps track of the number of retries made per OID, in order to prevent retrying a transfer infinitely. Currently, the maximum number of retries per object is set to 1 to preserve the behavior from before this PR, which is why there are no new tests.
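The per-OID accounting amounts to a counter checked against the budget before each retry. `canRetry`, `retryCounts`, and `maxRetriesPerOID` are hypothetical names for illustration, with the budget set to 1 as the PR describes:

```go
package main

import "fmt"

// maxRetriesPerOID mirrors the PR's current budget of one retry per object.
const maxRetriesPerOID = 1

// retryCounts tracks how many retries each OID has consumed.
var retryCounts = map[string]int{}

// canRetry consumes one unit of the OID's retry budget if any remains;
// once the budget is exhausted, the transfer is abandoned instead of
// retried forever.
func canRetry(oid string) bool {
	if retryCounts[oid] >= maxRetriesPerOID {
		return false
	}
	retryCounts[oid]++
	return true
}

func main() {
	fmt.Println(canRetry("deadbeef")) // first retry: allowed
	fmt.Println(canRetry("deadbeef")) // second retry: denied
}
```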

I've also left some line comments throughout the code to clarify some further things.

/cc @technoweenie @rubyist @sinbad for 👀 and 💭

Review context (the diff excerpts the comment below is attached to):

```go
for _, t := range batch {
	q.retry(t.(Transferable))
```

```go
var errOnce sync.Once
for _, o := range batch {
```
@ttaylorr (author) commented:
If eventually we end up grouping retried objects and "fresh" objects in the same batch, we should only fail objects that have exceeded their retry budget, and not all of the items in the batch at once. To keep the old behavior of reporting to the errorc channel the same, I wrapped it in a sync.Once, so it can only happen once.

@technoweenie (Contributor) left a comment:
This is a great small step forward 👍

@ttaylorr ttaylorr merged commit fae6810 into master Sep 22, 2016
@ttaylorr ttaylorr deleted the multiple-retries branch September 22, 2016 19:22