util: fix retry.WithMaxAttempts context cancelled before run.#25612
util: fix retry.WithMaxAttempts context cancelled before run.#25612craig[bot] merged 1 commit intocockroachdb:masterfrom
retry.WithMaxAttempts context cancelled before run.#25612Conversation
|
@a-robinson Testing could be really tricky because I've run |
|
@windchan7 try reproducing on a gceworker ( Also, to have GH auto-close the issues, you need to have each one on its own line: Fixes #a. You can just edit the PR message and leave the commit message as is. |
a-robinson
left a comment
There was a problem hiding this comment.
Nice find!
@a-robinson Testing could be really tricky because I've run
make stress PKG=./pkg/serverlocally. It's been over 200 runs but I still can't observe the seg fault.
You'd probably have a better time just stressing a test from one of the other issues, since there we at least know which test it happened in. And running on a gceworker, as @tschottdorf suggested.
pkg/util/retry/retry.go
Outdated
| // It guarantees fn will run at least once. Otherwise, an error will be returned. | ||
| func WithMaxAttempts(ctx context.Context, opts Options, n int, fn func() error) error { | ||
| if n <= 0 { | ||
| return errors.Errorf("max attempts should not be 0 or below") |
There was a problem hiding this comment.
In case this ever triggers, it'd likely be helpful to include the value of n as part of the message
pkg/util/retry/retry.go
Outdated
| } | ||
| } | ||
| if err == nil { | ||
| err = errors.Errorf("function never gets run") |
There was a problem hiding this comment.
Can we make this error message a little clearer? For example, errors.Wrap("did not run function", ctx.Err())
If context gets cancelled right after `retry.WithMaxAttempts` runs, the function passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run the function once otherwise an error will be returned. Making this change because there are places such as `show_cluster_setting.go` require the passed in function to be run at least once. Otherwise there will be seg fault. Fixes: 25600, 25603, 25570, 25567, 25566, 25511, 25485. Release note: None
|
LGTM but wait for Alex! Reviewed 1 of 1 files at r1, 1 of 1 files at r2. pkg/util/retry/retry.go, line 169 at r1 (raw file):
s/gets/ or better what Alex said. Comments from Reviewable |
|
@a-robinson Hey Alex, is it good to go? |
|
Thanks a lot! |
|
bors r+ |
Build failed |
|
@tschottdorf In this case, I should file an issue about the test flakiness on |
|
bors r+ |
Normally, yes, but in this case I'll fix it up as part of #25625, since I introduced a handful of similar flakes. Thanks for commenting about it or else I wouldn't have noticed! |
Canceled (will resume) |
24883: dep: Bump grpc-go version r=a-robinson a=a-robinson And pull in new packages -- in particular, the encoding/proto package isn't needed for compilation but is needed at runtime. Release note: None --------------------- We should wait to merge this until they've cut the 1.12 release tag so that we aren't just at an arbitrary commit in their git history but I'm sending out the PR now so that I (or whoever would have done this) don't have to deal with debugging the missing encoding/proto package when it comes time to merge this. As tested in #17370 (comment), this gives a 5-10% boost in whole-cluster throughput and improved tail latencies when run with a highly concurrent workload. It appears to have little performance effect for lower-concurrency workloads. 25410: sql: run schema changes after CREATE TABLE in txn r=vivekmenezes a=vivekmenezes Includes a commit from #25362 and should be reviewed after that change. 25612: util: fix `retry.WithMaxAttempts` context cancelled before run. r=windchan7 a=windchan7 If context gets cancelled right after `retry.WithMaxAttempts` runs, the function passed to it will never gets run. Now `retry.WithMaxAttempts` will at least run the function once otherwise an error will be returned. Making this change because there are places such as `show_cluster_setting.go` require the passed in function to be run at least once. Otherwise there will be seg fault. Fixes: #25600. Fixes: #25603. Fixes: #25570. Fixes: #25567. Fixes: #25566. Fixes: #25511. Fixes: #25485. Release note: None 25625: storage: Adding testing knob to disable automatic lease renewals r=a-robinson a=a-robinson In order to fix the test flakes caused by automatic lease renewals Fixes #25537 Fixes #25540 Fixes #25568 Fixes #25573 Fixes #25576 Fixes #25589 Fixes #25594 Fixes #25599 Fixes #25605 Fixes #25620 Release note: None Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com> Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com> Co-authored-by: Victor Chen <victor@cockroachlabs.com>
|
👍 |
|
I think this should be backported to 2.0 as well, given that it fixes a segfault. |
Build succeeded |
If context gets cancelled right after
retry.WithMaxAttemptsruns, the functionpassed to it will never gets run. Now
retry.WithMaxAttemptswill at least runthe function once otherwise an error will be returned.
Making this change because there are places such as
show_cluster_setting.gorequire the passed in function to be run at least once. Otherwise there will
be seg fault.
Fixes: #25600.
Fixes: #25603.
Fixes: #25570.
Fixes: #25567.
Fixes: #25566.
Fixes: #25511.
Fixes: #25485.
Release note: None