kvserver: lease counts diverge when a new node is added to a cluster #67740
Description
Describe the problem
In a cluster with at least 3 localities, adding a new node to an existing locality reliably triggers a lease count divergence. The lease counts continue to diverge until the newly added node is fully hydrated with roughly the mean number of replicas.
This looks something like the following (graph of diverging per-node lease counts):

Cause
To understand how this happens, consider a cluster with 9 nodes spread across 3 racks: rack=0, rack=1, and rack=2. Let's walk through adding a new node (n10) to rack=0. Because of the diversity heuristic, only the other existing nodes in rack=0 are allowed to shed their replicas to n10. For ranges that also have their leases in rack=0, this means those leaseholders must first shed their lease to one of the nodes in racks 1 or 2 and expect those nodes to execute the rebalance. This continues until n10 has received roughly the mean number of replicas relative to the rest of the nodes in the cluster.
However, in a cluster with enough data, fully hydrating the new node will take a while, sometimes on the order of hours (note that n10 can only receive new replicas at the snapshot rate dictated by kv.snapshot_rebalance.max_rate). Until this happens, nodes in rack=0 will continue shedding leases to nodes in racks 1 and 2 until they hold essentially zero leases.
To Reproduce
I can reproduce this by following the steps outlined above on both 20.2 and 21.1.
Additional details
Until n10 is fully hydrated, nodes in rack=0 will continue hitting the `considerRebalance` path (for the ranges for which they are the leaseholders) in the allocator:
cockroach/pkg/kv/kvserver/replicate_queue.go, line 487 @ 7495434:

```go
return rq.considerRebalance(ctx, repl, voterReplicas, nonVoterReplicas, canTransferLeaseFrom, dryRun)
```
Since there is indeed a valid rebalance candidate, `RebalanceVoter` will return `ok == true` here:
cockroach/pkg/kv/kvserver/replicate_queue.go, lines 1079 to 1087 @ 7495434:

```go
addTarget, removeTarget, details, ok := rq.allocator.RebalanceVoter(
	ctx,
	zone,
	repl.RaftStatus(),
	existingVoters,
	existingNonVoters,
	rangeUsageInfo,
	storeFilterThrottled,
)
```
This will then lead to the call to `maybeTransferLeaseAway` here:
cockroach/pkg/kv/kvserver/replicate_queue.go, lines 1106 to 1107 @ 7495434:

```go
} else if done, err := rq.maybeTransferLeaseAway(
	ctx, repl, removeTarget.StoreID, dryRun, canTransferLeaseFrom,
```
This will transfer the lease away to another replica.
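Putting the pieces together, the control flow can be sketched as follows. The names below are simplified stand-ins for the replicate queue's actual types and signatures, not the real API:

```go
package main

import "fmt"

type rebalanceTarget struct{ storeID int }

// rebalanceVoter is a stand-in for Allocator.RebalanceVoter: it reports a
// candidate store to add and the existing replica to remove.
func rebalanceVoter() (add, remove rebalanceTarget, ok bool) {
	// n10 is a valid target, and the overfull rack-0 store s1 is the
	// replica to remove.
	return rebalanceTarget{10}, rebalanceTarget{1}, true
}

// maybeTransferLeaseAway mirrors the guard in the replicate queue: a store
// cannot remove its own replica while it holds the lease, so it must
// transfer the lease away first; the rebalance itself is deferred.
func maybeTransferLeaseAway(leaseholderStore, removeStore int) (done bool) {
	return leaseholderStore == removeStore
}

func main() {
	_, remove, ok := rebalanceVoter()
	if !ok {
		return
	}
	if maybeTransferLeaseAway(1, remove.storeID) {
		fmt.Println("lease transferred away; rebalance deferred")
	}
}
```

Each such iteration moves a lease out of rack=0 without yet moving a replica onto n10, which is exactly the divergence described above.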
/cc. @cockroachdb/kv
gz#5876
Epic CRDB-10569
gz#9817