Skip to content

kvserver: stop reconciling lease count imbalances in the StoreRebalancer#75999

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
aayushshah15:20220203_dontReconcileLeaseCountImbalanceInStoreRebalancer
Feb 8, 2022
Merged

kvserver: stop reconciling lease count imbalances in the StoreRebalancer#75999
craig[bot] merged 2 commits intocockroachdb:masterfrom
aayushshah15:20220203_dontReconcileLeaseCountImbalanceInStoreRebalancer

Conversation

@aayushshah15
Copy link
Copy Markdown
Contributor

Generally, the StoreRebalancer doesn't try to attempt lease transfers for
replicas that account for less than 0.1% of the store's total QPS. However, it
allows such lease transfers if the store has a higher than average number of
leases. This is fraught because of the following reasons:

  1. It doesn't adhere to any padding. It will do these tiny lease transfers
    until the store has less than or equal to the average number of leases.
  2. Additionally, "average" here means the average across the cluster, not just
    the stores that have replicas for the range. This is clearly not good, as this
    doesn't translate well to heterogeneously loaded clusters.
  3. These lease transfers also don't adhere to
    kv.allocator.min_lease_transfer_interval. This means they are not rate
    limited like the lease transfers done by the replicateQueue.

This patch fixes this behavior. This patch is similar to #64559.

Noticed while running a tpcc workload on a multi-region cluster, but on a
database constrained to a single region. This was done to simulate a
"heterogeneously loaded cluster".

Release note: None

@aayushshah15 aayushshah15 requested a review from a team as a code owner February 4, 2022 01:59
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@aayushshah15 aayushshah15 requested a review from nvb February 4, 2022 01:59
@aayushshah15
Copy link
Copy Markdown
Contributor Author

Noticed while running a tpcc workload on a multi-region cluster, but on a
database constrained to a single region. This was done to simulate a
"heterogeneously loaded cluster".

image

Nodes 1 to 7 are in the region the tpcc database is constrained to. Nodes 8 - 13 are in a region that has ~no data in it.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 443 at r1 (raw file):

		// Don't bother moving leases whose QPS is below some small fraction of the
		// store's QPS (unless the store has extra leases to spare anyway). It's

Do we want to remove this part of the comment? And should we have done the same thing in #64559?

Code quote:

(unless the store has extra leases to spare anyway)

…ncer`

Generally, the `StoreRebalancer` doesn't try to attempt lease transfers for
replicas that account for less than 0.1% of the store's total QPS. However, it
allows such lease transfers if the store has a higher than average number of
leases. This is fraught because of the following reasons:

1. It doesn't adhere to any padding. It will do these tiny lease transfers
until the store has less than or equal to the average number of leases.
2. Additionally, "average" here means the average across the cluster, not just
the stores that have replicas for the range. This is clearly not good, as this
doesn't translate well to heterogeneously loaded clusters.
3. These lease transfers also don't adhere to
`kv.allocator.min_lease_transfer_interval`. This means they are not rate
limited like the lease transfers done by the replicateQueue.

This patch fixes this behavior. This patch is similar to cockroachdb#64559.

Noticed while running a tpcc workload on a multi-region cluster, but on a
database constrained to a single region. This was done to simulate a
"heterogeneously loaded cluster".

Release note: None
@aayushshah15 aayushshah15 force-pushed the 20220203_dontReconcileLeaseCountImbalanceInStoreRebalancer branch from a490d84 to 5657073 Compare February 7, 2022 20:11
Copy link
Copy Markdown
Contributor Author

@aayushshah15 aayushshah15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/store_rebalancer.go, line 443 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want to remove this part of the comment? And should we have done the same thing in #64559?

Done and done.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 7, 2022

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 8, 2022

Build succeeded:

@craig craig bot merged commit 8ec5bf1 into cockroachdb:master Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants