kvserver: honor lease preferences in store balance#84863

Merged
craig[bot] merged 1 commit into cockroachdb:master from kvoli:220718.store-rebalancer-leasepref
Jul 28, 2022

Conversation

@kvoli
Contributor

@kvoli kvoli commented Jul 21, 2022

Previously, lease preferences were not considered when selecting a
voter to become the leaseholder after replica rebalancing. Instead, the
voter on the store with the least QPS was selected.

This patch introduces a check to try to satisfy the lease preference
from the rebalanced voting set of replicas, if possible. Among voters
satisfying the preference, the one on a store with the least QPS is
selected to be the leaseholder.

When this is not possible, i.e. none of the voting replicas satisfy the
lease preference, the store rebalancer will pick the minimum QPS store,
as before.

resolves: #83951

Release note: None
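The selection logic described in the commit message can be distilled into a minimal sketch. This is a hypothetical, simplified illustration, not the actual code (which lives in pkg/kv/kvserver/store_rebalancer.go and goes through allocator APIs); the `replica` type and `pickLeaseholder` name are invented for the example:

```go
package main

import "fmt"

// replica is a hypothetical, simplified stand-in for the real
// replica/store descriptors used by the store rebalancer.
type replica struct {
	StoreID            int
	QPS                float64
	SatisfiesLeasePref bool
}

// pickLeaseholder returns the voter with the least QPS among those
// satisfying the lease preference; if none satisfy it, it falls back
// to the overall minimum-QPS voter (the pre-patch behavior).
func pickLeaseholder(voters []replica) replica {
	best, bestPref := -1, -1
	for i, r := range voters {
		if best == -1 || r.QPS < voters[best].QPS {
			best = i
		}
		if r.SatisfiesLeasePref && (bestPref == -1 || r.QPS < voters[bestPref].QPS) {
			bestPref = i
		}
	}
	if bestPref != -1 {
		return voters[bestPref]
	}
	return voters[best]
}

func main() {
	voters := []replica{
		{StoreID: 1, QPS: 100, SatisfiesLeasePref: false},
		{StoreID: 2, QPS: 300, SatisfiesLeasePref: true},
		{StoreID: 3, QPS: 200, SatisfiesLeasePref: true},
	}
	// Store 3: least QPS among the preference-satisfying voters,
	// even though store 1 has less QPS overall.
	fmt.Println(pickLeaseholder(voters).StoreID)
}
```

Note that store 1 has the lowest QPS overall but is skipped because it does not satisfy the preference; only when no voter satisfies it does raw QPS decide.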

@kvoli kvoli self-assigned this Jul 21, 2022
@cockroach-teamcity
Member

This change is Reviewable

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from 05fba76 to 2445fbc Compare July 21, 2022 18:35
@kvoli
Contributor Author

kvoli commented Jul 21, 2022

With the configuration from the issue, the lease preference is no longer violated: the leases for the kv table remain within us-east1, which covers the first three stores.

[screenshot: leases for the kv table remaining within us-east1]

export gce_project=cockroach-ephemeral
export email=austen-lease-prefs

roachprod create $email -n 9 --clouds=gce --gce-zones=us-east1-b:1,us-east1-c:1,us-east1-d:1,us-east4-a:1,us-east4-b:1,us-east4-c:1,us-west1-a:1,us-west1-b:1,us-west1-c:1 --gce-machine-type=n1-standard-4 

roachprod put $email cockroach-remote cockroach
roachprod start $email

roachprod run $email:1 -- './cockroach workload init kv'

roachprod sql $email:1 -- -e 'ALTER DATABASE kv SET PRIMARY REGION "us-east1";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv ADD REGION "us-west1";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv ADD REGION "us-east4";'
roachprod sql $email:1 -- -e 'ALTER DATABASE kv SURVIVE REGION FAILURE;'


while true 
do
roachprod sql $email:1 -- -e "SELECT now(), range_id, node_id, node_id = lease_holder as is_leaseholder, ARRAY[node_id] <@ voting_replicas as voter, locality, database_name, table_name, start_pretty, end_pretty FROM [ SELECT start_pretty, end_pretty, range_id, unnest(replicas) as node_id, unnest(replica_localities) as locality, lease_holder, voting_replicas, database_name, table_name FROM [ SELECT * FROM crdb_internal.ranges WHERE database_name = 'kv' ] GROUP BY range_id, node_id, lease_holder, start_pretty, end_pretty, locality, voting_replicas, database_name, table_name ] WHERE node_id = lease_holder and locality not like 'cloud=gce,region=us-east1%' ORDER BY range_id ASC, node_id = lease_holder DESC, ARRAY[node_id] <@ voting_replicas ASC;"
sleep 5
done

roachprod run $email -- './cockroach workload run kv --max-rate=1000 --read-percent=95 --max-block-bytes=2'

@kvoli kvoli marked this pull request as ready for review July 21, 2022 19:31
@kvoli kvoli requested a review from a team as a code owner July 21, 2022 19:31
Contributor

@lidorcarmel lidorcarmel left a comment


awesome testing :)

Reviewed 1 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):

		for i := 0; i < len(targetVoterRepls); i++ {
			if targetVoterRepls[i].StoreID == leaseCandidates[newLeaseIdx].StoreID {

say our voters were on stores 5,6,7, and the allocator said the only valid target is 7.
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
then, here we will decide to move to 7 because newLeaseIdx is 0.
is that right?
IOW, if all candidates (from the allocator) are raft-behind, then we will move the lease to a replica that is behind, instead of a replica that is not behind but doesn't respect the lease preference.
IOW(2), what's more important? lease preferences or raft-replica-behind state? probably better to stick with the old behavior if all candidates are behind?
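The hazard Lidor describes can be sketched in a few lines. This is a hypothetical distillation, not the actual store rebalancer code; `chooseLease` and its parameters are invented for illustration:

```go
package main

import "fmt"

// chooseLease mimics the loop under discussion: newLeaseIdx defaults
// to 0 and is only advanced when a candidate passes the "not behind"
// check.
func chooseLease(candidates []string, behind map[string]bool) string {
	newLeaseIdx := 0
	for i, c := range candidates {
		if !behind[c] {
			newLeaseIdx = i
			break
		}
	}
	// If every candidate is behind, candidates[0] is returned even
	// though it never passed the check.
	return candidates[newLeaseIdx]
}

func main() {
	// Store 7 is the only preference-satisfying candidate, but it is
	// behind on raft; the default index still selects it.
	fmt.Println(chooseLease([]string{"s7"}, map[string]bool{"s7": true}))
}
```

The silent fallback to index 0 is exactly the case where a behind replica can win the lease.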

Contributor

@lidorcarmel lidorcarmel left a comment


Reviewed 1 of 2 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

say our voters were on stores 5,6,7, and the allocator said the only valid target is 7.
then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
then, here we will decide to move to 7 because newLeaseIdx is 0.
is that right?
IOW, if all candidates (from the allocator) are raft-behind, then we will move the lease to a replica that is behind, instead of a replica that is not behind but doesn't respect the lease preference.
IOW(2), what's more important? lease preferences or raft-replica-behind state? probably better to stick with the old behavior if all candidates are behind?

Looking some more: in addition to the above, PreferredLeaseholders is a bit misleading, returning a slice that can only be nil or contain one element, so the code here can be simplified if we change that function to return a ReplicaDescriptor instead of a slice.
Also, that function tries to return the first good candidate, which means we should pass in a slice sorted by QPS.
Last, the allocator has a FilterBehindReplicas function which might make the code here cleaner? maybe.. I guess something like:

upToDateReplicas := FilterBehindReplicas(ctx, raftStatus, replicas)

(copied from somewhere)
right after calling PreferredLeaseholders.

WDYT?

Contributor

@aayushshah15 aayushshah15 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @kvoli)


pkg/kv/kvserver/store_rebalancer.go line 653 at r1 (raw file):

		// it is more likely than not that this would only occur under
		// misconfiguration.
		leaseCandidates := sr.rq.allocator.PreferredLeaseholders(rebalanceCtx.conf, targetVoterRepls)

We should use the ValidLeaseTargets() method here, which has other checks in addition to lease preferences, such as whether the replica needs a snapshot or whether it belongs to a suspect/draining node (which aren't allowed to receive leases). This perhaps answers part of Lidor's concerns from below as well.

Contributor Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @lidorcarmel)


pkg/kv/kvserver/store_rebalancer.go line 653 at r1 (raw file):

Previously, aayushshah15 (Aayush Shah) wrote…

We should use the ValidLeaseTargets() method here, which has other checks in addition to lease preferences, such as whether the replica needs a snapshot or whether it belongs to a suspect/draining node (which aren't allowed to receive leases). This perhaps answers part of Lidor's concerns from below as well.

Originally I used that method. Unfortunately, it won't work with replicas that aren't already initialized: it filters out replicas that are behind or need a raft snapshot, and in doing so removes the incoming voters that don't yet exist.

I think it might be best to add in a flag, similar to the testing knobs, which ignores this.


pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):

then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
That is a good point.

The filter behind replicas won't be able to be used here, as it filters incoming replicas which don't yet exist too.

I think we could alter the allocator to take a flag that performs the check only if the candidate exists in the range descriptor; otherwise we skip it.

Contributor

@lidorcarmel lidorcarmel left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/store_rebalancer.go line 678 at r1 (raw file):

Previously, kvoli (Austen) wrote…

then, when we iterate over the candidate (only 7), we see that it is ReplicaIsBehind, so we don't update newLeaseIdx.
That is a good point.

The filter behind replicas won't be able to be used here, as it filters incoming replicas which don't yet exist too.

I think we could alter the allocator to take a flag that performs the check only if the candidate exists in the range descriptor; otherwise we skip it.

I see, makes sense, thanks!

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch 4 times, most recently from 5923c33 to 95cdb1b Compare July 25, 2022 22:07
Contributor

@lidorcarmel lidorcarmel left a comment


LGTM!

Reviewed 6 of 7 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @kvoli)


pkg/kv/kvserver/store_rebalancer.go line 669 at r2 (raw file):

		// this range post-rebalance would have no suitable location.
		if len(validTargets) == 0 {
			continue

can you please add a level 3 VEventf log here? similar to some above. we're skipping a lease (rarely, very rarely) so a log would be great.


pkg/kv/kvserver/store_rebalancer.go line 676 at r2 (raw file):

		newLeaseQPS := math.MaxFloat64

		// Find the incoming voting replica, which is a valid lease target and

not necessarily incoming voting, right? might be existing?

Code quote:

Find the incoming voting replica

pkg/kv/kvserver/allocator/base.go line 56 at r2 (raw file):

	if len(constraints) == 0 {
		return true

perhaps an unintentional empty line?


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1477 at r2 (raw file):

		Desc() *roachpb.RangeDescriptor
	},
	// excludeLeaseRepl dictates whether the result set can include the source

drop this comment?

Code quote:

	// excludeLeaseRepl dictates whether the result set can include the source
	// replica.

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1523 at r2 (raw file):

			// candidates which are not in the current range descriptors
			// replica set, however are in the candidate list. Uninitialized
			// replicas, will always need a snapshot.

remove the comma?

Code quote:

,

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1539 at r2 (raw file):

		if a.knobs != nil && a.knobs.RaftStatusFn != nil {
			status = a.knobs.RaftStatusFn(leaseRepl)

empty line?

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from 95cdb1b to 8fcd897 Compare July 26, 2022 21:38
Contributor Author

@kvoli kvoli left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)


pkg/kv/kvserver/store_rebalancer.go line 669 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

can you please add a level 3 VEventf log here? similar to some above. we're skipping a lease (rarely, very rarely) so a log would be great.

Added


pkg/kv/kvserver/store_rebalancer.go line 676 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

not necessarily incoming voting, right? might be existing?

Updated


pkg/kv/kvserver/allocator/base.go line 56 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

perhaps an unintentional empty line?

Updated


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1477 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

drop this comment?

Updated


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1523 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

remove the comma?

Updated


pkg/kv/kvserver/allocator/allocatorimpl/allocator.go line 1539 at r2 (raw file):

Previously, lidorcarmel (Lidor Carmel) wrote…

empty line?

Updated

Contributor

@lidorcarmel lidorcarmel left a comment


Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)


pkg/kv/kvserver/store_rebalancer.go line 672 at r3 (raw file):

				ctx,
				3,
				"could not find rebalance opportunities for r%d, replica found to hold lease",

maybe "no" replica found?

Code quote:

replica found to hold lease

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from 8fcd897 to 49a5f23 Compare July 26, 2022 22:26
Contributor

@aayushshah15 aayushshah15 left a comment


:lgtm_strong: and ditto on the nice testing.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @kvoli, and @lidorcarmel)

@kvoli
Contributor Author

kvoli commented Jul 27, 2022

bors r+

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from 49a5f23 to 5942101 Compare July 28, 2022 00:07
@craig
Contributor

craig bot commented Jul 28, 2022

Canceled.

@kvoli
Contributor Author

kvoli commented Jul 28, 2022

Canceled.

Mentioned merge conflict, rebased on master.

@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from 5942101 to dce3b43 Compare July 28, 2022 13:21
Previously, lease preferences were not considered when selecting a
voter to become the leaseholder after replica rebalancing. Instead, the
voter on the store with the least QPS was selected.

This patch introduces a check to try to satisfy the lease preference
from the rebalanced voting set of replicas, if possible. Among voters
satisfying the preference, the one on a store with the least QPS is
selected to be the leaseholder.

When this is not possible, i.e. none of the voting replicas satisfy the
lease preference, the store rebalancer will pick the minimum QPS store,
as before.

resolves: cockroachdb#83951

Release note: None
@kvoli kvoli force-pushed the 220718.store-rebalancer-leasepref branch from dce3b43 to 54484c1 Compare July 28, 2022 14:31
@kvoli
Contributor Author

kvoli commented Jul 28, 2022

bors r+

@craig
Contributor

craig bot commented Jul 28, 2022

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Contributor

craig bot commented Jul 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Jul 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Jul 28, 2022

Build succeeded:

@craig craig bot merged commit 8c7cfe8 into cockroachdb:master Jul 28, 2022
@kvoli kvoli deleted the 220718.store-rebalancer-leasepref branch July 28, 2022 19:20
@irfansharif
Contributor

blathers backport 22.1

@blathers-crl

blathers-crl bot commented Oct 14, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 54484c1 to blathers/backport-release-22.1-84863: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1 failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 14, 2022
Manual reconstruction of cockroachdb#84863, mostly due to merge conflicts from
cockroachdb#79991 which will not make it to 22.1. Previously, lease preferences
were not considered when selecting a voter to become the leaseholder
after replica rebalancing. Instead, the voter on the store with the
least QPS was selected. This patch introduces a check to try to satisfy
the lease preference from the rebalanced voting set of replicas, if
possible. Among voters satisfying the preference, the one on a store
with the least QPS is selected to be the leaseholder.

When this is not possible, i.e. none of the voting replicas satisfy the
lease preference, the store rebalancer will pick the minimum QPS store,
as before.

Release note (bug fix, performance improvement): Previously it was
possible for leases to temporarily move outside of explicitly configured
regions. This often happened during load-based rebalancing, something
CRDB continually does across the cluster. Because of this, it was also
possible to observe a continual rate of lease thrashing as leases moved
out of configured zones and triggered rebalancing, inducing other leases
to move out of their configured zones while the original set moved back,
and so on. This is now fixed.

Release justification: Bug fix.
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 14, 2022
Manual reconstruction of cockroachdb#84863, mostly due to merge conflicts from
cockroachdb#79991 which will not make it to 22.1. Previously, lease preferences
were not considered when selecting a voter to become the leaseholder
after replica rebalancing. Instead, the voter on the store with the
least QPS was selected. This patch introduces a check to try to satisfy
the lease preference from the rebalanced voting set of replicas, if
possible. Among voters satisfying the preference, the one on a store
with the least QPS is selected to be the leaseholder.

When this is not possible, i.e. none of the voting replicas satisfy the
lease preference, the store rebalancer will pick the minimum QPS store,
as before.

Release note (bug fix, performance improvement): Previously it was
possible for leases to temporarily move outside of explicitly configured
regions. This often happened during load-based rebalancing, something
CRDB continually does across the cluster. Because of this, it was also
possible to observe a continual rate of lease thrashing as leases moved
out of configured zones and triggered rebalancing, inducing other leases
to move out of their configured zones while the original set moved back,
and so on. This is now fixed.

Release justification: Bug fix.


Development

Successfully merging this pull request may close these issues.

Leaseholders temporarily move outside of regional table's configured region

5 participants