Retry git clone on git clone failure in gh repo fork --clone#6962
Retry git clone on git clone failure in gh repo fork --clone#6962
git clone on git clone failure in gh repo fork --clone#6962Conversation
samcoe
left a comment
There was a problem hiding this comment.
@jsoref Thanks for starting on this work. It looks good so far. I left a comment regarding keeping the retry mechanism internal and not exposing those low level implementation details to our users. Let me know what you think.
Additionally, if we could get a test verifying this retry behavior works I think this is good to ship.
|
I'm happy to drop the flags. I have no idea how to test the failure case. |
|
@jsoref No problem. Feel free to commit code for dropping of the flags and one of our team members will take over writing of a test. |
|
Great. I couldn't even get it to build. The Makefile wasn't helpful; I ran into the incredibly confounding: $ go mod download
go: modules disabled by GO111MODULE=off; see 'go help modules'(Which does not explain anything relating to the message.) And even when I forced it to on, nothing happened. |
It might help you to remove the legacy GO111MODULE environment variable from wherever it's being set. Our project builds in a pristine Go development environment, but we can't guarantee building in an environment you have modified. |
|
That didn't work at all (I've tried repeatedly while fighting various golang projects), apparently golang hid its setting in some config file, so I wanted |
|
Maybe then explicitly setting it to "on" just for the gh project might be the way to go? |
|
That worked. Now I have a |
mislav
left a comment
There was a problem hiding this comment.
Thanks for taking this on! My main concern is that this doesn't solve your original problem (more about that in comments). My secondary concerns are that it would be best that the retry code is testable and that it supports Context cancellation.
pkg/cmd/repo/clone/clone.go
Outdated
| if err == nil { | ||
| break | ||
| } | ||
| time.Sleep(retryBackoff * (time.Duration(i) + 1)) |
There was a problem hiding this comment.
Testable Go code should never be allowed to sleep the same period of time in tests as it does during normal runtime. Even a relatively short duration like sleeping 100ms would make tests really slow. Finding a way to inject a shorter (fake) Duration in tests would be crucial.
Also, the Sleep statement would not return early if a Context has been cancelled, therefore if a Context is involved, it would be better to use a Timer and use a select statement to react to either a Timer completing or a Context being cancelled, whichever comes first. Example: https://github.com/cli/oauth/blob/adf5f733ffd14993aad9f3ff02c757a737f7c63b/device/poller.go#L30-L37
There was a problem hiding this comment.
This is beyond what I'm up for doing at this time.
It should involve introducing a poller.go, and really there should be a poller_test.go -- and even that repo doesn't have one.
git clone on git clone failure in gh repo fork --clone
|
@jsoref @mislav I went ahead and updated this PR to use a new version of the BackOff package which helped clean things up a lot. I also wrote a couple tests. Let me know what you think. Only thing I was unsure of was the exponential backoff duration variables. Not sure we really need exponential backoff here and we could simplify to just a set backoff duration. The BackOff package is fairly easy to use and well documented, I would vote for using it going forward in regards to #6985. I did have to rebase the PR to fix the tests I wrote that were failing due the recent |
| ios.SetStdoutTTY(tt.tty) | ||
| ios.SetStderrTTY(tt.tty) | ||
| tt.opts.IO = ios | ||
| t.Run(tt.name, func(t *testing.T) { |
There was a problem hiding this comment.
Something that I had not thought about until now is that when using the -run flag with go test that all this set up happens for each test even though the tests won't be run. If we encapsulate all the code in t.Run the setup won't happen unless the test is actually going to be run.
| cloneDir, err := backoff.RetryWithData(func() (string, error) { | ||
| forkedRepoURL := ghrepo.FormatRemoteURL(forkedRepo, protocol) | ||
| dir, err := gitClient.Clone(ctx, forkedRepoURL, opts.GitArgs) | ||
| if err == nil { | ||
| return dir, err | ||
| } | ||
| var execError errWithExitCode | ||
| if errors.As(err, &execError) && execError.ExitCode() == 128 { | ||
| return "", err | ||
| } | ||
| return "", backoff.Permanent(err) | ||
| }, backoff.WithContext(backoff.WithMaxRetries(opts.BackOff, 3), ctx)) |
There was a problem hiding this comment.
Will users see an indication of the failures/retries?
Since I trigger this problem regularly, I'd love to have a sense of how many retries I'm spending.
There was a problem hiding this comment.
As of right now, no they will not. Could be added in though.
There was a problem hiding this comment.
I think it's worth adding. That'd help inform decisions about the right time values.
Alternatively, I suppose I could try using this version of gh w/ that logging and see what they say...
Lemme see if I can manage to make that work locally...
pkg/cmd/repo/fork/fork.go
Outdated
| bo.Multiplier = 1.1 | ||
| bo.MaxInterval = 10 * time.Second | ||
| bo.MaxElapsedTime = 5 * time.Minute |
There was a problem hiding this comment.
I have absolutely no idea what proper numbers would be. I stole mine from something else because it was suggested.
IME, a clone very shortly after the failed clone works, but I don't have a great sense of scale and I'm not sure if it relates to the size of a repository (and if that's by commits, by number of files in the working directory, or some other measure).
My clones come from repositories of various sizes somewhat randomly/haphazardly...
Addressed comments and had offline conversation about changes.
Fixes #5118 -- hopefully
This is entirely untested, but the linter likes it, so there's that.