kvserver: eagerly move leases to preferred regions #106100
Description
When a leaseholder is lost, any surviving replica may acquire the lease, even if doing so violates the lease preferences. There are two main reasons for this: a new Raft leader must be elected to acquire the lease, and Raft leader election is agnostic to lease preferences; and there may not be any surviving replicas that satisfy the lease preferences at all.
However, after acquiring a lease, we rely on the replicate queue to transfer the lease back to a replica that conforms with the preferences, which can take several minutes in some cases. In multi-region clusters, this can cause severe latency degradation if the lease is acquired in a remote region.
We should eagerly move leases back to preferred regions when acquiring them outside of the preferred regions.
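As a rough illustration of the proposed behavior, the check could look something like the toy model below. This is not CockroachDB code: the `Replica` type and the `matches_pref` and `eager_transfer_target` helpers are hypothetical names, and preferences are modeled as an ordered list of tag lists (so `[[+rack=0]]` becomes `[["rack=0"]]`), which is a simplification. The idea is that, right after acquiring a lease, the new leaseholder checks whether a live replica satisfies a higher-priority preference than it does, and if so transfers the lease immediately rather than waiting for the replicate queue.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for a replica: a node ID plus locality tags."""
    node_id: int
    locality: str  # comma-separated tags, e.g. "rack=0"


def matches_pref(replica: Replica, pref: List[str]) -> bool:
    """True if the replica's locality carries every tag in one preference."""
    tags = set(replica.locality.split(","))
    return all(tag in tags for tag in pref)


def eager_transfer_target(
    leaseholder: Replica,
    live_replicas: List[Replica],
    preferences: List[List[str]],
) -> Optional[Replica]:
    """Pick a replica to hand the lease to right after acquisition.

    Preferences are tried in priority order. The first preference that any
    live replica satisfies decides the outcome: if the current leaseholder
    already satisfies it, no transfer is needed (None); otherwise the first
    matching replica is returned. If no live replica satisfies any
    preference, the lease stays where it is (None).
    """
    for pref in preferences:
        matches = [r for r in live_replicas if matches_pref(r, pref)]
        if not matches:
            continue  # nobody alive satisfies this preference; try the next
        if leaseholder in matches:
            return None
        return matches[0]
    return None
```

For example, with a single preference `[["rack=0"]]`, a lease acquired on a `rack=1` replica while a `rack=0` replica is still live would yield that `rack=0` replica as the transfer target. If no `rack=0` replica survives, the function returns `None` and the lease stays put, matching the second reason given above.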
To reproduce:
$ roachprod create local -n 5
$ roachprod start local --racks 3
> create database kv;
> alter database kv configure zone using num_replicas=5, constraints='{"+rack=0": 2, "+rack=1": 2, "+rack=2": 1}', lease_preferences='[[+rack=0]]';
$ ./cockroach workload init kv --splits 1000
# Wait for all leases to move to rack=0 nodes.
> select range_id, start_key, end_key, lease_holder, lease_holder_locality from [show ranges from database kv with details] where lease_holder_locality not like '%rack=0';
# Alternatively, mass-relocate all violating leases to n4:
> alter range relocate lease to 4 for select range_id from [show ranges from database kv with details] where lease_holder_locality not like '%rack=0';
$ roachprod stop local:4
# In this case, we actually see leases move back to rack=0 quite soon. In a production cluster, we saw leases linger for 10 minutes. Unclear why.
> select range_id, start_key, end_key, lease_holder, lease_holder_locality from [show ranges from database kv with details] where lease_holder_locality not like '%rack=0';
# However, if we move leases to a different node, it will take several minutes before they're moved back to the preferences.
> alter range relocate lease to 5 for select range_id from [show ranges from database kv];
> select range_id, start_key, end_key, lease_holder, lease_holder_locality from [show ranges from database kv with details] where lease_holder_locality not like '%rack=0';
# Remember to use --racks 3 when restarting n4.
$ roachprod start local:4 --racks 3
Jira issue: CRDB-29399
Epic CRDB-27235