Skip to content

server: busy loop through resolver list during join process#53713

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
irfansharif:200831.join-busy-loop
Sep 4, 2020
Merged

server: busy loop through resolver list during join process#53713
craig[bot] merged 2 commits intocockroachdb:masterfrom
irfansharif:200831.join-busy-loop

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

Deferred doing this in #52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.

Release justification: low risk, high benefit changes to existing functionality
Release note: None

@irfansharif irfansharif requested a review from tbg August 31, 2020 22:02
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

return nil
}

if errors.Is(err, ErrJoinRPCUnsupported) || errors.Is(err, ErrIncompatibleBinaryVersion) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should handle ctx.Done() and ctx.Err() here too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was an intentional omission. This API is a bit crufty as written (I've just added a TODO for what needs to change), but basically as written it needs to return nil errors on context cancellation. Which is what we're doing below. Here I just wanted to avoid doing a non-nil error check and return a nil error anyway.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea there were some pretty crappy hacks in there, so I'll wait for #53934 to land and have this sit on top of that instead.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I'll let you rebase and then we can wrap this up

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, PTAL.

@irfansharif irfansharif force-pushed the 200831.join-busy-loop branch from 6aa2985 to c6b518f Compare September 3, 2020 14:58
Previously we used two separate channels to relay the results of using
the join RPC, one for possible error in attempting to do and one for the
actual result if successful. There was no real reason for this, and the
code gluing them together was a bit fragile (and hid a latent bug).

Previously we could deadlock if the select loop in ServeAndWait, for a
successful join attempt, drained the errCh before draining joinCh.
Because the successful joinCh drain then followed up with trying to
drain the errCh (already drained by now), it would just block
indefinitely.

We just simplify all this to using one channel to encapsulate the
results of the join attempt.

Release justification: bug fixes and low-risk updates to new functionality
Release note: None
Deferred doing this in cockroachdb#52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.

Release justification: low risk, high benefit changes to existing functionality
Release note: None
@irfansharif irfansharif force-pushed the 200831.join-busy-loop branch from c6b518f to c6bc053 Compare September 4, 2020 17:37
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @tbg)

@irfansharif
Copy link
Copy Markdown
Contributor Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Sep 4, 2020

Build succeeded:

@craig craig bot merged commit 7cd72a7 into cockroachdb:master Sep 4, 2020
@irfansharif irfansharif deleted the 200831.join-busy-loop branch September 4, 2020 21:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants