Skip to content

server: make the Drain interfaces available to SQL-only servers#76294

Merged
craig[bot] merged 8 commits intocockroachdb:masterfrom
knz:20220209-drain
Feb 11, 2022
Merged

server: make the Drain interfaces available to SQL-only servers#76294
craig[bot] merged 8 commits intocockroachdb:masterfrom
knz:20220209-drain

Conversation

@knz
Copy link
Copy Markdown
Contributor

@knz knz commented Feb 9, 2022

Fixes #74412

First commit from #76292; the reviewers are invited to ignore it in this PR.

See individual commits for details.

cc @cockroachdb/obs-inf-prs

@knz knz requested a review from a team February 9, 2022 15:00
@knz knz requested a review from a team as a code owner February 9, 2022 15:00
@knz knz requested review from erikgrinaker and removed request for a team February 9, 2022 15:00
@blathers-crl blathers-crl bot added the T-server-and-security DB Server & Security label Feb 9, 2022
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@erikgrinaker erikgrinaker removed their request for review February 9, 2022 15:12
@knz knz force-pushed the 20220209-drain branch 3 times, most recently from 57057ae to e141e61 Compare February 9, 2022 17:53
@knz knz requested review from a team as code owners February 10, 2022 13:21
Copy link
Copy Markdown
Contributor

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Good work!

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, 3 of 3 files at r4, 1 of 1 files at r5, 3 of 3 files at r6, 4 of 4 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @knz)


pkg/server/tenant.go, line 78 at r8 (raw file):

}

// SQLServerWrapper is a utility structs that encapsulates

nit: "is a utility struct"

Copy link
Copy Markdown
Collaborator

@jeffswenson jeffswenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

It would be nice if there were an rpc to cancel a drain. There is a mismatch between the CRDB draining state and the serverless draining state. When the operator decides to reduce the number of pods allocated to a serverless cluster, it marks the surplus pods in a draining state. The sqlproxy maintains existing connections to draining pods, but it will not send new connections to them. If the cluster's utilization increases, the pods are put back in a serving state. If there was a way to cancel CRDB drains, we could notify the sql server it is draining as soon as the operator decides to reduce capacity. With one way drains, the operator needs to wait to send the drain until it is ready to remove the pod.

ctx = t.AnnotateCtx(ctx)

// Which node is this request for?
parsedInstanceID, local, err := t.status.parseInstanceID(req.NodeId)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: We do not track instance ID in the CC operator. The operator thinks in terms of pods. Determining the instance ID of the pod we want to remove is more difficult than sending an RPC to the specific pod we want to drain. I suspect we will pass NodeId == "" when sending a Drain rpc.

Copy link
Copy Markdown
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if there were an rpc to cancel a drain.

Point taken. let's discuss this in #76423.
I also encourage you to raise that issue to your nearest PM, so we can put it in our roadmap.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez and @jeffswenson)


pkg/server/tenant.go, line 78 at r8 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

nit: "is a utility struct"

Fixed. Thanks


pkg/server/tenant_admin.go, line 126 at r6 (raw file):

I suspect we will pass NodeId == "" when sending a Drain rpc.

Yes, that works.

The semantics as implemented here ensure that we're transparently compatible with what is supported on storage nodes, so the command-line tooling can stay the same.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 11, 2022

TFYRs!

bors r=cameronnunez,JeffSwenson

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 11, 2022

bors r-

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 11, 2022

Canceled.

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 11, 2022

bors r=cameronnunez,JeffSwenson

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 11, 2022

Build failed:

@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 11, 2022

hm lint failures
bors r-

knz added 3 commits February 11, 2022 13:52
This brings the `mt start-sql` command "under the fold" with regards
to server shutdown. It now properly waits for client connections to go
away, subject to the same cluster settings as usual.

Release note: None
@knz
Copy link
Copy Markdown
Contributor Author

knz commented Feb 11, 2022

bors r=cameronnunez,JeffSwenson

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 11, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

T-server-and-security DB Server & Security

Projects

None yet

Development

Successfully merging this pull request may close these issues.

server: equip the mt-start-sql server code with a drain process

4 participants