server: busy loop through resolver list during join process#53713
server: busy loop through resolver list during join process#53713craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
pkg/server/init.go
Outdated
| return nil | ||
| } | ||
|
|
||
| if errors.Is(err, ErrJoinRPCUnsupported) || errors.Is(err, ErrIncompatibleBinaryVersion) { |
There was a problem hiding this comment.
you should handle ctx.Done() and ctx.Err() here too
There was a problem hiding this comment.
That was an intentional omission. This API is a bit crufty as written (I've just added a TODO for what needs to change), but basically as written it needs to return nil errors on context cancellation. Which is what we're doing below. Here I just wanted to avoid doing a non-nil error check and return a nil error anyway.
There was a problem hiding this comment.
Yea there were some pretty crappy hacks in there, so I'll wait for #53934 to land and have this sit on top of that instead.
There was a problem hiding this comment.
ok I'll let you rebase and then we can wrap this up
6aa2985 to
c6b518f
Compare
Previously we used two separate channels to relay the results of using the join RPC, one for possible error in attempting to do and one for the actual result if successful. There was no real reason for this, and the code gluing them together was a bit fragile (and hid a latent bug). Previously we could deadlock if the select loop in ServeAndWait, for a successful join attempt, drained the errCh before draining joinCh. Because the successful joinCh drain then followed up with trying to drain the errCh (already drained by now), it would just block indefinitely. We just simplify all this to using one channel to encapsulate the results of the join attempt. Release justification: bug fixes and low-risk updates to new functionality Release note: None
Deferred doing this in cockroachdb#52526. Probably a good idea to do have it, it'll bring down the cluster convergence time (time taken for all nodes to find out about the initialization) by a bit. Release justification: low risk, high benefit changes to existing functionality Release note: None
c6b518f to
c6bc053
Compare
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg)
|
TFTR! bors r+ |
|
Build succeeded: |
Deferred doing this in #52526. Probably a good idea to do have it, it'll
bring down the cluster convergence time (time taken for all nodes to
find out about the initialization) by a bit.
Release justification: low risk, high benefit changes to existing functionality
Release note: None