
kvserver: allocating replacement replicas needs to consider fully satisfying constraints #94809

@AlexTalks

Description

During a recent investigation, we found that when constraints apply to some but not all of the replicas a range needs, a replacement operation (such as during decommission) can fail to consider whether all constraints remain satisfied afterwards. This occurs with configurations such as num_replicas = 3, constraints = '{<some constraint>: 1}', where we expect one replica to satisfy the constraint and the other 2 replicas not to need to satisfy any constraint (these are known as "unconstrained replicas"). Replacing the one replica that satisfies the constraint with a replica that does not satisfy it should not be possible, yet it currently is.
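To make the expected behavior concrete, here is a minimal toy model of the check in Python. It is not CockroachDB's allocator code: the function names `unsatisfied_constraints` and `valid_replacement`, and the representation of a replica as a set of attribute strings, are assumptions made purely for illustration.

```python
def unsatisfied_constraints(replica_attrs, constraints):
    """Return {attribute: shortfall} for every constraint that is not met.

    replica_attrs: list of sets of attribute strings, one set per replica
                   (hypothetical representation, e.g. {"rack=3"}).
    constraints:   dict mapping a required attribute to the number of
                   replicas that must carry it, e.g. {"rack=3": 1}.
    """
    missing = {}
    for attr, required in constraints.items():
        have = sum(1 for attrs in replica_attrs if attr in attrs)
        if have < required:
            missing[attr] = required - have
    return missing


def valid_replacement(replicas, leaving_index, candidate, constraints):
    """Check the full replica set that would exist after the replacement.

    The point of the issue: the candidate must be judged against the
    constraints of the whole range, not treated as just another
    unconstrained replica.
    """
    remaining = [r for i, r in enumerate(replicas) if i != leaving_index]
    return not unsatisfied_constraints(remaining + [candidate], constraints)


# Toy scenario: 3 replicas, one of which must be on rack=3.
replicas = [{"rack=0"}, {"rack=1"}, {"rack=3"}]
constraints = {"rack=3": 1}

# Replacing an unconstrained replica with one on rack=2 is fine.
print(valid_replacement(replicas, 0, {"rack=2"}, constraints))
# Replacing the rack=3 replica with one on rack=2 breaks the constraint.
print(valid_replacement(replicas, 2, {"rack=2"}, constraints))
```

In this toy model the second call is rejected because the check runs over the replica set as it would look after the swap, which is the behavior this issue argues the real replacement path should have.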

This can be reproduced simply with the following:

```
roachprod create local -n4
roachprod stage local release v22.1.12
roachprod start local --racks=4
roachprod ssh local:1 -- './cockroach workload init kv --splits=100'
roachprod sql local:1 -- -e "alter database kv configure zone using num_replicas=3, constraints='{+rack=3: 1}';"
# wait for rebalancing
roachprod ssh local:1 -- './cockroach node decommission 4 --insecure'
# this should not succeed, but it does
```

Jira issue: CRDB-23152

Metadata

Labels

A-kv-decom-rolling-restart: Decommission and Rolling Restarts
A-kv-distribution: Relating to rebalancing and leasing.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv: KV Team
