kvserver: honor lease preferences in store balance#84863
craig[bot] merged 1 commit into cockroachdb:master
Conversation
Force-pushed 05fba76 to 2445fbc
On the configuration from the issue, it hasn't violated the lease preference. The leases for the kv table remain within us-east1, which is the first three stores.

```shell
export gce_project=cockroach-ephemeral
export email=austen-lease-prefs

roachprod create $email -n 9 --clouds=gce --gce-zones=us-east1-b:1,us-east1-c:1,us-east1-d:1,us-east4-a:1,us-east4-b:1,us-east4-c:1,us-west1-a:1,us-west1-b:1,us-west1-c:1 --gce-machine-type=n1-standard-4
roachprod put $email cockroach-remote cockroach
roachprod start $email
roachprod run $email:1 -- './cockroach workload init kv'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv SET PRIMARY REGION "us-east1";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv ADD REGION "us-west1";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv ADD REGION "us-east4";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv SURVIVE REGION FAILURE;'

while true
do
  roachprod sql $email:1 -- -e "SELECT now(), range_id, node_id, node_id = lease_holder AS is_leaseholder, ARRAY[node_id] <@ voting_replicas AS voter, locality, database_name, table_name, start_pretty, end_pretty FROM [ SELECT start_pretty, end_pretty, range_id, unnest(replicas) AS node_id, unnest(replica_localities) AS locality, lease_holder, voting_replicas, database_name, table_name FROM [ SELECT * FROM crdb_internal.ranges WHERE database_name = 'kv' ] GROUP BY range_id, node_id, lease_holder, start_pretty, end_pretty, locality, voting_replicas, database_name, table_name ] WHERE node_id = lease_holder AND locality NOT LIKE 'cloud=gce,region=us-east1%' ORDER BY range_id ASC, node_id = lease_holder DESC, ARRAY[node_id] <@ voting_replicas ASC;"
  sleep 5
done

roachprod run $email -- './cockroach workload run kv --max-rate=1000 --read-percent=95 --max-block-bytes=2'
```
lidorcarmel
left a comment
awesome testing :)
Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)
pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):
```go
for i := 0; i < len(targetVoterRepls); i++ {
	if targetVoterRepls[i].StoreID == leaseCandidates[newLeaseIdx].StoreID {
```
say our voters were on stores 5,6,7, and the allocator said the only valid target is 7.
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
then, here we will decide to move to 7 because newLeaseIdx is 0.
is that right?
IOW, if all candidates (from the allocator) are raft-behind, then we will move the lease to a replica that is behind, instead of a replica that is not behind but doesn't respect the lease preference.
IOW(2), what's more important? lease preferences or raft-replica-behind state? probably better to stick with the old behavior if all candidates are behind?
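The scenario above can be sketched with a minimal, self-contained Go example (hypothetical types and names; the real loop lives in store_rebalancer.go): because newLeaseIdx starts at 0 and is only updated for candidates that are not behind, an all-behind candidate set silently falls back to index 0.

```go
package main

import (
	"fmt"
	"math"
)

type replica struct {
	storeID int
	behind  bool // behind on raft, per the allocator's view
	qps     float64
}

// pickLeaseIdx mirrors the pattern under discussion: skip candidates that
// are behind on raft and keep the lowest-QPS survivor. If every candidate
// is behind, newLeaseIdx is never updated and keeps its zero value, so the
// caller ends up transferring the lease to candidates[0] anyway.
func pickLeaseIdx(candidates []replica) int {
	newLeaseIdx := 0
	newLeaseQPS := math.MaxFloat64
	for i, c := range candidates {
		if c.behind {
			continue // ReplicaIsBehind: not considered
		}
		if c.qps < newLeaseQPS {
			newLeaseIdx = i
			newLeaseQPS = c.qps
		}
	}
	return newLeaseIdx
}

func main() {
	// The only valid target (store 7) is behind, yet index 0 is returned.
	fmt.Println(pickLeaseIdx([]replica{{storeID: 7, behind: true, qps: 100}}))
}
```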
lidorcarmel
left a comment
Reviewed 1 of 2 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)
pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
say our voters were on stores 5,6,7, and the allocator said the only valid target is 7.
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
then, here we will decide to move to 7 because newLeaseIdx is 0.
is that right?
IOW, if all candidates (from the allocator) are raft-behind, then we will move the lease to a replica that is behind, instead of a replica that is not behind but doesn't respect the lease preference.
IOW(2), what's more important? lease preferences or raft-replica-behind state? probably better to stick with the old behavior if all candidates are behind?
Looking some more: in addition to the above, PreferredLeaseholders is a bit misleading, returning a slice that can only be nil or have one element, so the code here could be simplified if we change that function to return a ReplicaDescriptor instead of a slice.

Also, that function tries to return the first good candidate, which means we should pass in a slice sorted by QPS.

Last, the allocator has a FilterBehindReplicas function which might make the code here cleaner? Maybe. I guess something like:

```go
upToDateReplicas := FilterBehindReplicas(ctx, raftStatus, replicas)
```

(copied from somewhere) right after calling PreferredLeaseholders.
WDYT?
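A hedged sketch of how that pre-filter could look (simplified, hypothetical types; the real FilterBehindReplicas inspects a raft status rather than a boolean flag): filtering first makes the "no up-to-date candidate" case explicit instead of falling through to index 0.

```go
package main

import "fmt"

type replica struct {
	storeID int
	behind  bool
}

// filterBehind keeps only replicas that are up to date on raft, in the
// spirit of the allocator's FilterBehindReplicas. An empty result tells
// the caller explicitly that no candidate is safe to take the lease.
func filterBehind(replicas []replica) []replica {
	var upToDate []replica
	for _, r := range replicas {
		if !r.behind {
			upToDate = append(upToDate, r)
		}
	}
	return upToDate
}

func main() {
	upToDate := filterBehind([]replica{{5, false}, {6, true}, {7, true}})
	fmt.Println(len(upToDate), upToDate[0].storeID)
}
```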
aayushshah15
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @kvoli)
pkg/kv/kvserver/store_rebalancer.go line 653 at r1 (raw file):
```go
// it is more likely than not that this would only occur under
// misconfiguration.
leaseCandidates := sr.rq.allocator.PreferredLeaseholders(rebalanceCtx.conf, targetVoterRepls)
```
We should use the ValidLeaseTargets() method here, which has other checks in addition to lease preferences, such as whether the replica needs a snapshot, or whether the replica belongs to a suspect/draining node (which aren't allowed to receive leases). This perhaps answers part of Lidor's concerns from below as well.
kvoli
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @lidorcarmel)
pkg/kv/kvserver/store_rebalancer.go line 653 at r1 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
We should use the ValidLeaseTargets() method here, which has other checks in addition to lease preferences, such as whether the replica needs a snapshot, or whether the replica belongs to a suspect/draining node (which aren't allowed to receive leases). This perhaps answers part of Lidor's concerns from below as well.
Originally I used that method. Unfortunately it won't work with replicas that aren't already initialized: it filters out replicas that are behind on raft or need snapshots, in which case it removes the incoming voters which don't yet exist.
I think it might be best to add in a flag, similar to the testing knobs, which ignores this.
pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
That is a good point.
The filter-behind-replicas helper won't be able to be used here as-is, as it also filters incoming replicas which don't yet exist.

I think we could alter the allocator to take a flag so that the check is performed only if the candidate exists in the range descriptor, and otherwise skipped.
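That flag-gated check could be sketched roughly as follows (hypothetical names, not the actual allocator API): the raft-behind check applies only to replicas that already exist in the range descriptor, since uninitialized incoming voters have no raft progress to inspect.

```go
package main

import "fmt"

// replicaIsBehind reports whether a candidate should be excluded as
// behind. With allowNonExisting set, candidates missing from the range
// descriptor skip the check entirely rather than being treated as behind.
func replicaIsBehind(inDescriptor, behindOnRaft, allowNonExisting bool) bool {
	if !inDescriptor {
		// Incoming voter not yet initialized: no raft state to check.
		return !allowNonExisting
	}
	return behindOnRaft
}

func main() {
	// Incoming voter with the flag set: not filtered out.
	fmt.Println(replicaIsBehind(false, false, true))
	// Existing replica that is behind: still filtered.
	fmt.Println(replicaIsBehind(true, true, true))
}
```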
lidorcarmel
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)
pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):
Previously, kvoli (Austen) wrote…
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
That is a good point. The filter behind replicas won't be able to be used here, as it filters incoming replicas which don't yet exist too.
I think we could alter the allocator to take a flag so that the check is performed only if the candidate exists in the range descriptor, and otherwise skipped.
I see, makes sense, thanks!
Force-pushed 5923c33 to 95cdb1b
lidorcarmel
left a comment
LGTM!
Reviewed 6 of 7 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)
pkg/kv/kvserver/store_rebalancer.go line 669 at r2 (raw file):
```go
// this range post-rebalance would have no suitable location.
if len(validTargets) == 0 {
	continue
```
can you please add a level 3 VEventf log here? similar to some above. we're skipping a lease (rarely, very rarely) so a log would be great.
pkg/kv/kvserver/store_rebalancer.go line 676 at r2 (raw file):
```go
newLeaseQPS := math.MaxFloat64
// Find the incoming voting replica, which is a valid lease target and
```
not necessarily incoming voting, right? might be existing?
Code quote:
Find the incoming voting replica

pkg/kv/kvserver/allocator/base.go line 56 at r2 (raw file):
```go
if len(constraints) == 0 {
	return true
```
perhaps an unintentional empty line?
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1477 at r2 (raw file):
```go
	Desc() *roachpb.RangeDescriptor
},
// excludeLeaseRepl dictates whether the result set can include the source
```
drop this comment?
Code quote:
// excludeLeaseRepl dictates whether the result set can include the source
// replica.

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1523 at r2 (raw file):
```go
// candidates which are not in the current range descriptors
// replica set, however are in the candidate list. Uninitialized
// replicas, will always need a snapshot.
```
remove the comma?
Code quote:
,

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1539 at r2 (raw file):
```go
if a.knobs != nil && a.knobs.RaftStatusFn != nil {
	status = a.knobs.RaftStatusFn(leaseRepl)
```
empty line?
Force-pushed 95cdb1b to 8fcd897
kvoli
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/store_rebalancer.go line 669 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
can you please add a level 3 VEventf log here? similar to some above. we're skipping a lease (rarely, very rarely) so a log would be great.
Added
pkg/kv/kvserver/store_rebalancer.go line 676 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
not necessarily incoming voting, right? might be existing?
Updated
pkg/kv/kvserver/allocator/base.go line 56 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
perhaps an unintentional empty line?
Updated
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1477 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
drop this comment?
Updated
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1523 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
remove the comma?
Updated
pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1539 at r2 (raw file):
Previously, lidorcarmel (Lidor Carmel) wrote…
empty line?
Updated
lidorcarmel
left a comment
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)
pkg/kv/kvserver/store_rebalancer.go line 672 at r3 (raw file):
```go
ctx, 3, "could not find rebalance opportunities for r%d, replica found to hold lease",
```
maybe "no" replica found?
Code quote:
replica found to hold lease

Force-pushed 8fcd897 to 49a5f23
aayushshah15
left a comment
and ditto on the nice testing.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)
bors r+
49a5f23 to
5942101
Compare
Canceled.

Mentioned merge conflict, rebased on master.
Force-pushed 5942101 to dce3b43
Previously, lease preferences were not considered when selecting a voter to become the leaseholder after replica rebalancing. Instead, the voter on the store with the least QPS was selected.

This patch introduces a check to try and satisfy the lease preference from the rebalanced voting set of replicas, if possible. Among voters satisfying the preference, the one on a store with the least QPS is selected to be the leaseholder. When this is not possible, i.e. none of the voting replicas satisfy the lease preference, the store rebalancer picks the minimum-QPS store, as before.

resolves: cockroachdb#83951

Release note: None
Force-pushed dce3b43 to 54484c1
bors r+

This PR was included in a batch that was canceled, it will be automatically retried.

Build failed (retrying...)

Build failed (retrying...)

Build succeeded
blathers backport 22.1 |
Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from 54484c1 to blathers/backport-release-22.1-84863: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1 failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Manual reconstruction of cockroachdb#84863, mostly due to merge conflicts from cockroachdb#79991, which will not make it to 22.1.

Previously, lease preferences were not considered when selecting a voter to become the leaseholder after replica rebalancing. Instead, the voter on the store with the least QPS was selected.

This patch introduces a check to try and satisfy the lease preference from the rebalanced voting set of replicas, if possible. Among voters satisfying the preference, the one on a store with the least QPS is selected to be the leaseholder. When this is not possible, i.e. none of the voting replicas satisfy the lease preference, the store rebalancer picks the minimum-QPS store, as before.

Release note (bug fix, performance improvement): Previously it was possible for leases to temporarily move outside of explicitly configured regions. This often happened during load-based rebalancing, something CRDB continually does across the cluster. Because of this it was also possible to observe a continual rate of lease thrashing, as leases moved out of configured zones, triggered rebalancing, and induced other leases to move out of configured zones while the original set moved back, and so on. This is now fixed.

Release justification: Bug fix.

Previously, lease preferences were not considered when selecting a
voter to become the leaseholder after replica rebalancing. Instead, the
voter on the store with the least QPS was selected.
This patch introduces a check to try and satisfy the lease preference
from the rebalanced voting set of replicas, if possible. Among voters
satisfying the preference, the one on a store with the least QPS is
selected to be the leaseholder.
When this is not possible, i.e. none of the voting replicas satisfy the
lease preference, the store rebalancer picks the minimum-QPS store, as
before.
resolves: #83951
Release note: None
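The selection policy described above can be sketched end to end (hypothetical types; the actual implementation lives in store_rebalancer.go): prefer the lowest-QPS voter that satisfies the lease preference, otherwise fall back to the lowest-QPS voter overall, matching the old behavior.

```go
package main

import "fmt"

type voter struct {
	storeID   int
	qps       float64
	satisfies bool // satisfies the configured lease preference
}

// chooseLeaseholder tracks two candidates in one pass: the lowest-QPS
// voter overall (best) and the lowest-QPS voter among those satisfying
// the lease preference (bestPref). The preference-satisfying candidate
// wins whenever one exists.
func chooseLeaseholder(voters []voter) int {
	best, bestPref := -1, -1
	for i, v := range voters {
		if best == -1 || v.qps < voters[best].qps {
			best = i
		}
		if v.satisfies && (bestPref == -1 || v.qps < voters[bestPref].qps) {
			bestPref = i
		}
	}
	if bestPref != -1 {
		return voters[bestPref].storeID
	}
	return voters[best].storeID
}

func main() {
	// Store 2 satisfies the preference, so it wins despite store 3's lower QPS.
	fmt.Println(chooseLeaseholder([]voter{
		{storeID: 1, qps: 900, satisfies: true},
		{storeID: 2, qps: 500, satisfies: true},
		{storeID: 3, qps: 100, satisfies: false},
	}))
}
```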