server: make the Drain interfaces available to SQL-only servers by knz · Pull Request #76294 · cockroachdb/cockroach

knz · 2022-02-09T15:00:31Z

Fixes #74412

First commit from #76292; the reviewers are invited to ignore it in this PR.

See individual commits for details.

cc @cockroachdb/obs-inf-prs

cockroach-teamcity · 2022-02-09T15:00:44Z

This change is

Release note: None

cameronnunez

Good work!

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 3 of 3 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 4 of 4 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)

pkg/server/tenant.go, line 78 at r8 (raw file):

}

// SQLServerWrapper is a utility structs that encapsulates

nit: "is a utility struct"

jeffswenson

LGTM

It would be nice if there were an rpc to cancel a drain. There is a mismatch between the CRDB draining state and the serverless draining state. When the operator decides to reduce the number of pods allocated to a serverless cluster, it marks the surplus pods as draining. The sqlproxy maintains existing connections to draining pods, but it will not send new connections to them. If the cluster's utilization increases, the pods are put back in a serving state. If there were a way to cancel CRDB drains, we could notify the SQL server that it is draining as soon as the operator decides to reduce capacity. With one-way drains, the operator needs to wait to send the drain until it is ready to remove the pod.
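To make the mismatch concrete, here is a minimal sketch of the serverless-side behavior described above. The `Pod`, `Route`, and `CancelDrain` names are hypothetical and are not taken from the operator, the sqlproxy, or CockroachDB; the sketch only models "draining pods keep existing connections, get no new ones, and can go back to serving".

```go
package main

import "fmt"

// PodState is a hypothetical model of how the operator labels a SQL pod.
type PodState int

const (
	Serving PodState = iota
	Draining
)

// Pod is an illustrative record; the fields are assumptions for this sketch.
type Pod struct {
	Name  string
	State PodState
	Conns int // number of existing client connections
}

// Route assigns a new connection to the first serving pod. Draining pods
// keep the connections they already have but never receive new ones.
func Route(pods []*Pod) (*Pod, bool) {
	for _, p := range pods {
		if p.State == Serving {
			p.Conns++
			return p, true
		}
	}
	return nil, false
}

// CancelDrain puts a draining pod back in service. This is the reversible
// step that a one-way drain on the SQL server cannot mirror.
func (p *Pod) CancelDrain() {
	if p.State == Draining {
		p.State = Serving
	}
}

func main() {
	a := &Pod{Name: "pod-a", State: Serving, Conns: 2}
	b := &Pod{Name: "pod-b", State: Draining, Conns: 1}
	pods := []*Pod{b, a}

	p, _ := Route(pods)
	fmt.Println(p.Name, b.Conns) // pod-a 1: the draining pod is skipped

	b.CancelDrain() // utilization went up again
	p, _ = Route(pods)
	fmt.Println(p.Name, b.Conns) // pod-b 2: the pod serves new connections again
}
```

With a one-way drain on the CRDB side, there is no counterpart to `CancelDrain`, which is why the operator has to delay the Drain request until the pod is really going away.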

jeffswenson · 2022-02-10T22:24:24Z

pkg/server/tenant_admin.go

+	ctx = t.AnnotateCtx(ctx)
+
+	// Which node is this request for?
+	parsedInstanceID, local, err := t.status.parseInstanceID(req.NodeId)


FYI: We do not track instance ID in the CC operator. The operator thinks in terms of pods. Determining the instance ID of the pod we want to remove is more difficult than sending an RPC to the specific pod we want to drain. I suspect we will pass NodeId == "" when sending a Drain rpc.

knz

It would be nice if there were an rpc to cancel a drain.

Point taken. Let's discuss this in #76423.
I also encourage you to raise that issue to your nearest PM, so we can put it in our roadmap.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @jeffswenson)

pkg/server/tenant.go, line 78 at r8 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

nit: "is a utility struct"

Fixed. Thanks

pkg/server/tenant_admin.go, line 126 at r6 (raw file):

I suspect we will pass NodeId == "" when sending a Drain rpc.

Yes, that works.

The semantics as implemented here ensure that we're transparently compatible with what is supported on storage nodes, so the command-line tooling can stay the same.
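As a rough illustration of the "empty NodeId means this server" convention discussed here, the sketch below uses a hypothetical `resolveTarget` helper. It is not the `parseInstanceID` method from the diff, and the exact set of accepted inputs in the real code may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// resolveTarget is a hypothetical helper for illustration only. It maps the
// NodeId string of a request to a numeric ID and reports whether the request
// targets the server that received it. An empty string is treated as "this
// server", so a caller that does not know the instance ID can still drain
// the pod it is connected to.
func resolveTarget(nodeID string, selfID int) (int, bool, error) {
	if nodeID == "" {
		return selfID, true, nil
	}
	n, err := strconv.Atoi(nodeID)
	if err != nil || n <= 0 {
		return 0, false, fmt.Errorf("invalid node ID %q", nodeID)
	}
	return n, n == selfID, nil
}

func main() {
	id, local, err := resolveTarget("", 7)
	fmt.Println(id, local, err) // 7 true <nil>

	id, local, err = resolveTarget("3", 7)
	fmt.Println(id, local, err) // 3 false <nil>
}
```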

knz · 2022-02-11T12:12:14Z

TFYRs!

bors r=cameronnunez,JeffSwenson

knz · 2022-02-11T12:19:30Z

bors r-

craig · 2022-02-11T12:19:32Z

Canceled.

knz · 2022-02-11T12:24:34Z

bors r=cameronnunez,JeffSwenson

craig · 2022-02-11T12:48:11Z

Build failed:

GitHub CI (Cockroach)

knz · 2022-02-11T12:49:44Z

hm lint failures
bors r-

knz · 2022-02-11T12:53:57Z

bors r=cameronnunez,JeffSwenson

craig · 2022-02-11T13:58:18Z

Build succeeded:

GitHub CI (Cockroach)

knz requested a review from a team February 9, 2022 15:00

knz requested a review from a team as a code owner February 9, 2022 15:00

knz requested review from erikgrinaker and removed request for a team February 9, 2022 15:00

blathers-crl bot added the T-server-and-security DB Server & Security label Feb 9, 2022

erikgrinaker removed their request for review February 9, 2022 15:12

knz force-pushed the 20220209-drain branch 3 times, most recently from 57057ae to e141e61 Compare February 9, 2022 17:53

knz added 5 commits February 10, 2022 13:08

server: move the drain delegate code to a separate function

bd6749a

Release note: None

server: split the shutdown code to a different function

de87350

Release note: None

server: split the local RPC handler to a separate function

6ef6482

Release note: None

server: lift the drain logic into a properly encapsulated object

e7bb503

Release note: None

server: make SQL-only server use a regular grpcServer

f451b0e

Release note: None

knz force-pushed the 20220209-drain branch from e141e61 to 20bdaf1 Compare February 10, 2022 13:21

knz requested review from a team as code owners February 10, 2022 13:21

cameronnunez approved these changes Feb 10, 2022

jeffswenson approved these changes Feb 10, 2022

knz mentioned this pull request Feb 11, 2022

server: provide the ability to cancel a server shutdown after it has started #76423

Open

knz force-pushed the 20220209-drain branch from 20bdaf1 to 9434b48 Compare February 11, 2022 12:24

knz force-pushed the 20220209-drain branch from 9434b48 to dc7ef8d Compare February 11, 2022 12:50

knz added 3 commits February 11, 2022 13:52

server: implement the Drain RPC for SQL-only servers

5476fe3

Release note: None

server: define DrainClients on both TestServer and TestTenant

30a73f0

Release note: None

cli,server: make the CLI handle shutdown in a uniform manner

bbf9613

This brings the `mt start-sql` command "under the fold" with regard to server shutdown. It now properly waits for client connections to go away, subject to the same cluster settings as usual.

Release note: None

knz force-pushed the 20220209-drain branch from dc7ef8d to bbf9613 Compare February 11, 2022 12:52

craig bot merged commit 2356961 into cockroachdb:master Feb 11, 2022

knz mentioned this pull request Feb 15, 2022

query_wait does not work with tenant clusters #67783

Closed

jeffswenson mentioned this pull request Apr 20, 2022

sql: high create and alter user tail latency #80288

Closed
