kvserver: lease counts diverge when a new node is added to a cluster #67740

@aayushshah15

Description

Describe the problem

In a cluster with at least 3 localities, adding a new node to an existing locality reliably triggers a lease count divergence. The lease counts continue to diverge until this newly added node is fully hydrated with the ~mean number of replicas.

This looks something like the following:
[screenshot: per-node lease counts diverging after the new node is added]

Cause

To understand how this happens, consider a cluster with 9 nodes spread across 3 racks: rack=0, rack=1, and rack=2. Let's walk through adding a new node (n10) to rack=0. Because of the diversity heuristic, only the other existing nodes in rack=0 are allowed to shed their replicas to n10 (moving a replica from rack=1 or rack=2 into rack=0 would reduce a range's locality diversity). For ranges that have their leases also in rack=0, there is an added wrinkle: a leaseholder cannot remove its own replica, so those leaseholders first need to shed their lease to one of the nodes in racks 1 or 2 and rely on the new leaseholder to execute the rebalance. This continues until n10 has received roughly the mean number of replicas relative to the rest of the nodes in the cluster.

However, in a cluster with enough data, fully hydrating the new node will take a while, sometimes on the order of hours (note that n10 can only receive new replicas at the snapshot rate dictated by the kv.snapshot_rebalance.max_rate cluster setting). Until this happens, nodes in rack=0 will continue shedding leases to nodes in racks 1 and 2 until they hold essentially zero leases.

To Reproduce

I can reproduce this by following the steps outlined above on both 20.2 and 21.1.

Additional details

Until n10 is fully hydrated, nodes in rack=0 will continue hitting the considerRebalance path (for the ranges for which they are the leaseholders) in the allocator:

return rq.considerRebalance(ctx, repl, voterReplicas, nonVoterReplicas, canTransferLeaseFrom, dryRun)

Since there is indeed a valid rebalance candidate, RebalanceVoter will return ok==true here:

addTarget, removeTarget, details, ok := rq.allocator.RebalanceVoter(
	ctx,
	zone,
	repl.RaftStatus(),
	existingVoters,
	existingNonVoters,
	rangeUsageInfo,
	storeFilterThrottled,
)

This will then lead to the call to maybeTransferLeaseAway here:

} else if done, err := rq.maybeTransferLeaseAway(
	ctx, repl, removeTarget.StoreID, dryRun, canTransferLeaseFrom,

This will transfer the lease away to another replica.

/cc. @cockroachdb/kv

gz#5876

Epic CRDB-10569

gz#9817

Labels

A-kv-distribution: Relating to rebalancing and leasing.
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
T-kv: KV Team
