Method to ask cockroachdb if it is "safe" to decommission a node #70486

@data-matt

Is your feature request related to a problem? Please describe.
Our operators have automated the provisioning of CockroachDB clusters on-premises. We would like to be able to ask CockroachDB whether it is safe to remove a node.

The main concern is data redundancy, i.e., how do we know whether we will still have enough replicas in each zone or region?
We don't want to inspect zone constraints ourselves; we simply want to ask CockroachDB whether we can remove a node without causing an outage. We want to guarantee that we can maintain the correct replication factor (RF) for all databases on the clusters.

For more context, imagine that end users have access to a web UI portal where they can remove nodes. At scale, we can't manually verify every node removal across hundreds of clusters.

For example:
If we have 9 nodes across 3 regions, can we safely remove 4 nodes and maintain quorum for the databases with RF 5?
If we have 6 nodes in 1 region, can we safely remove 1 node?
Are there under-replicated ranges that are about to be up-replicated to node X?
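
To make the arithmetic behind the first two questions concrete, here is a minimal Python sketch of the node-count check we would otherwise have to do by hand. The function names are hypothetical, and the second call assumes the default replication factor of 3, which the example does not state. It only counts nodes: it ignores zone constraints, per-region placement, and disk capacity, which is exactly why we would prefer CockroachDB to answer the question itself.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas for a given replication factor."""
    return replication_factor // 2 + 1


def check_removal(total_nodes: int, nodes_to_remove: int, replication_factor: int) -> dict:
    """Naive node-count check; does NOT consider localities or constraints."""
    remaining = total_nodes - nodes_to_remove
    return {
        "remaining_nodes": remaining,
        "quorum": quorum(replication_factor),
        # Enough nodes left to place every replica on a distinct node.
        "can_hold_full_rf": remaining >= replication_factor,
        # Enough nodes left for a majority of replicas (ranges stay available
        # but would be under-replicated if this is True and the above is False).
        "can_hold_quorum": remaining >= quorum(replication_factor),
    }


# 9 nodes, remove 4, RF 5 (first example above).
print(check_removal(9, 4, 5))
# 6 nodes, remove 1, RF assumed to be 3 (second example above).
print(check_removal(6, 1, 3))
```

In the first case exactly 5 nodes remain, so the count alone says RF 5 is still possible, but whether those 5 nodes satisfy per-region constraints is the part this sketch cannot tell us.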

Describe the solution you'd like
A way to ask this question from the SQL layer would be easy for operators to use.

Alternatively:
cockroach node decommission --dry_run

Describe alternatives you've considered
Running SQL statements to retrieve the replication factor for every zone and then comparing it to node counts.
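
A rough Python sketch of that comparison follows. It assumes the zone configurations have already been fetched (for example with `SHOW ALL ZONE CONFIGURATIONS`) and that each row's `raw_config_sql` text carries a `num_replicas = N` clause when the zone sets one explicitly. The helper names and the sample strings are hypothetical, and zones that inherit `num_replicas` are simply skipped, so this is an illustration of the workaround rather than a reliable check.

```python
import re
from typing import Iterable, Optional

# Matches an explicit "num_replicas = N" clause in a zone config statement.
_NUM_REPLICAS = re.compile(r"num_replicas\s*=\s*(\d+)")


def max_replication_factor(raw_configs: Iterable[str]) -> Optional[int]:
    """Return the largest explicit num_replicas found, or None if there is none."""
    factors = []
    for sql in raw_configs:
        match = _NUM_REPLICAS.search(sql)
        if match:  # zones that inherit num_replicas have no clause; skip them
            factors.append(int(match.group(1)))
    return max(factors) if factors else None


def enough_nodes_after_removal(raw_configs: Iterable[str],
                               live_nodes: int,
                               nodes_to_remove: int) -> bool:
    """Node-count comparison only; ignores constraints, localities, capacity."""
    rf = max_replication_factor(raw_configs)
    if rf is None:
        raise ValueError("no explicit num_replicas found in zone configs")
    return live_nodes - nodes_to_remove >= rf


# Hypothetical sample rows, shaped like raw_config_sql values.
SAMPLE_CONFIGS = [
    "ALTER RANGE default CONFIGURE ZONE USING num_replicas = 3",
    "ALTER DATABASE app CONFIGURE ZONE USING num_replicas = 5",
    "ALTER TABLE app.public.events CONFIGURE ZONE USING gc.ttlseconds = 600",
]

print(enough_nodes_after_removal(SAMPLE_CONFIGS, live_nodes=9, nodes_to_remove=4))
```

Keeping this logic outside the database means every operator tool has to reimplement it and keep it in step with how zone configs evolve, which is the main reason we would rather not rely on it.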

Additional context
We are aware of "cockroach node decommission". However, it does not appear to finish gracefully in situations like those described above.

gz#9825

gz#10113

gz#10216

Jira issue: CRDB-10098

Epic CRDB-20924

Labels

A-cli-admin: CLI commands that pertain to controlling and configuring nodes
A-kv-decom-rolling-restart: Decommission and Rolling Restarts
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
O-postmortem: Originated from a Postmortem action item.
T-kv: KV Team
