server: make the Drain interfaces available to SQL-only servers#76294
server: make the Drain interfaces available to SQL-only servers#76294craig[bot] merged 8 commits intocockroachdb:masterfrom
Conversation
57057ae to
e141e61
Compare
Release note: None
Release note: None
Release note: None
Release note: None
Release note: None
cameronnunez
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 3 of 3 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 4 of 4 files at r7.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)
pkg/server/tenant.go, line 78 at r8 (raw file):
} // SQLServerWrapper is a utility structs that encapsulates
nit: "is a utility struct"
jeffswenson
left a comment
There was a problem hiding this comment.
LGTM
It would be nice if there were an rpc to cancel a drain. There is a mismatch between the CRDB draining state and the serverless draining state. When the operator decides to reduce the number of pods allocated to a serverless cluster, it marks the surplus pods in a draining state. The sqlproxy maintains existing connections to draining pods, but it will not send new connections to them. If the cluster's utilization increases, the pods are put back in a serving state. If there was a way to cancel CRDB drains, we could notify the sql server it is draining as soon as the operator decides to reduce capacity. With one way drains, the operator needs to wait to send the drain until it is ready to remove the pod.
| ctx = t.AnnotateCtx(ctx) | ||
|
|
||
| // Which node is this request for? | ||
| parsedInstanceID, local, err := t.status.parseInstanceID(req.NodeId) |
There was a problem hiding this comment.
FYI: We do not track instance ID in the CC operator. The operator thinks in terms of pods. Determining the instance ID of the pod we want to remove is more difficult than sending an RPC to the specific pod we want to drain. I suspect we will pass NodeId == "" when sending a Drain rpc.
There was a problem hiding this comment.
It would be nice if there were an rpc to cancel a drain.
Point taken. let's discuss this in #76423.
I also encourage you to raise that issue to your nearest PM, so we can put it in our roadmap.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @jeffswenson)
pkg/server/tenant.go, line 78 at r8 (raw file):
Previously, cameronnunez (Cameron Nunez) wrote…
nit: "is a utility struct"
Fixed. Thanks
pkg/server/tenant_admin.go, line 126 at r6 (raw file):
I suspect we will pass NodeId == "" when sending a Drain rpc.
Yes, that works.
The semantics as implemented here ensure that we're transparently compatible with what is supported on storage nodes, so the command-line tooling can stay the same.
|
TFYRs! bors r=cameronnunez,JeffSwenson |
|
bors r- |
|
Canceled. |
|
bors r=cameronnunez,JeffSwenson |
|
Build failed: |
|
hm lint failures |
Release note: None
Release note: None
This brings the `mt start-sql` command "under the fold" with regards to server shutdown. It now properly waits for client connections to go away, subject to the same cluster settings as usual. Release note: None
|
bors r=cameronnunez,JeffSwenson |
|
Build succeeded: |
Fixes #74412
First commit from #76292; the reviewers are invited to ignore it in this PR.
See individual commits for details.
cc @cockroachdb/obs-inf-prs