Describe the problem
In a cluster with at least 3 localities, adding a new node to an existing locality reliably triggers lease count divergence. The lease counts continue to diverge until the newly added node is fully hydrated with roughly the mean number of replicas per node.
This looks something like the following:

[graph of per-node lease counts omitted]
Cause
To understand how this happens, consider a cluster with 9 nodes spread across 3 racks: rack=0, rack=1, and rack=2. Let's walk through adding a new node (n10) to rack=0. Because of the diversity heuristic, only the other existing nodes in rack=0 are allowed to shed their replicas to n10. For ranges whose leases are also in rack=0, those leaseholders must first transfer their lease to one of the nodes in rack 1 or 2 and rely on that node to execute the rebalance. This continues until n10 has received roughly the mean number of replicas relative to the rest of the nodes in the cluster.
However, in a cluster with enough data, fully hydrating the new node takes a while, sometimes on the order of hours (note that n10 can only receive new replicas at the snapshot rate dictated by kv.snapshot_rebalance.max_rate). Until that happens, nodes in rack=0 keep shedding leases to nodes in racks 1 and 2 until they hold essentially zero leases.
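The dynamic above can be sketched with a toy simulation. This is not CockroachDB code: the lease counts, lease-transfer rate, and snapshot rate below are made-up numbers chosen only to show that lease transfers (cheap) finish long before snapshot-driven hydration (slow) does.

```go
package main

import "fmt"

// simulate models the lease-shedding dynamic with made-up rates. Each
// tick, n10 receives snapshotRate replicas (bounded in the real system by
// kv.snapshot_rebalance.max_rate), while rack=0 leaseholders shed up to
// leaseRate leases to racks 1 and 2 so those nodes can execute the
// pending rebalances. It returns the tick at which n10 reaches `target`
// replicas and the tick at which rack=0 first holds zero leases.
func simulate(rack0Leases, target, leaseRate, snapshotRate int) (hydratedAt, leasesGoneAt int) {
	hydrated := 0
	leasesGoneAt = -1
	for tick := 1; hydrated < target; tick++ {
		hydrated += snapshotRate
		shed := leaseRate
		if shed > rack0Leases {
			shed = rack0Leases
		}
		rack0Leases -= shed
		if rack0Leases == 0 && leasesGoneAt == -1 {
			leasesGoneAt = tick
		}
		hydratedAt = tick
	}
	return hydratedAt, leasesGoneAt
}

func main() {
	// 300 leases in rack=0; n10 needs 900 replicas to reach the mean.
	hydratedAt, leasesGoneAt := simulate(300, 900, 100, 5)
	fmt.Printf("rack=0 out of leases at tick %d; n10 hydrated at tick %d\n",
		leasesGoneAt, hydratedAt)
	// Prints: rack=0 out of leases at tick 3; n10 hydrated at tick 180
}
```

Under these assumptions rack=0 is drained of leases after 3 ticks while hydration takes 180, matching the "essentially zero leases" behavior described above.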
To Reproduce
I can reproduce this by following the steps outlined above on both v20.2 and v21.1.
Additional details
Until n10 is fully hydrated, nodes in rack=0 will continue hitting the considerRebalance path (for the ranges for which they are the leaseholders) in the allocator (`pkg/kv/kvserver/replicate_queue.go`, line 487 at `7495434`):

```go
return rq.considerRebalance(ctx, repl, voterReplicas, nonVoterReplicas, canTransferLeaseFrom, dryRun)
```
Since there is indeed a valid rebalance candidate, RebalanceVoter will return ok==true here (`pkg/kv/kvserver/replicate_queue.go`, lines 1079 to 1087 at `7495434`):

```go
addTarget, removeTarget, details, ok := rq.allocator.RebalanceVoter(
	ctx,
	zone,
	repl.RaftStatus(),
	existingVoters,
	existingNonVoters,
	rangeUsageInfo,
	storeFilterThrottled,
)
```
This will then lead to the call to maybeTransferLeaseAway here (`pkg/kv/kvserver/replicate_queue.go`, lines 1106 to 1107 at `7495434`):

```go
} else if done, err := rq.maybeTransferLeaseAway(
	ctx, repl, removeTarget.StoreID, dryRun, canTransferLeaseFrom,
```
This will transfer the lease away to another replica.
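Putting the three snippets together, the decision sequence can be condensed into a simplified sketch. This is not the actual CockroachDB implementation: the struct and return strings are invented for illustration, and only the function names are borrowed from the snippets above.

```go
package main

import "fmt"

// rebalanceDecision is a stand-in for the (addTarget, removeTarget, ok)
// tuple returned by RebalanceVoter in the snippets above.
type rebalanceDecision struct {
	addTarget, removeTarget int // store IDs
	ok                      bool
}

// considerRebalance mirrors the shape of the real code path: when the
// allocator finds a valid voter rebalance (ok == true) and the replica
// being removed lives on the leaseholder's own store, the lease must
// first be transferred away (maybeTransferLeaseAway) so another node can
// execute the rebalance.
func considerRebalance(d rebalanceDecision, leaseholderStore int) string {
	if !d.ok {
		return "no rebalance needed"
	}
	if d.removeTarget == leaseholderStore {
		return "transfer lease away, then rebalance"
	}
	return "rebalance directly"
}

func main() {
	// n10 (store 10) is the underfull add target; the leaseholder on
	// store 1 is also the remove target, so it must shed its lease first.
	d := rebalanceDecision{addTarget: 10, removeTarget: 1, ok: true}
	fmt.Println(considerRebalance(d, 1))
	// Prints: transfer lease away, then rebalance
}
```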
/cc. @cockroachdb/kv
gz#5876
Epic CRDB-10569
gz#9817