
kvserver: stop reconciling range count imbalance in the StoreRebalancer#64559

Merged
craig[bot] merged 1 commit into cockroachdb:master from aayushshah15:20210503_noRangeCountRebalanceByStoreRebalancer
May 4, 2021

Conversation

@aayushshah15
Contributor

@aayushshah15 aayushshah15 commented May 3, 2021

Previously, the StoreRebalancer would rebalance ranges off of a node if
it held more replicas than the cluster average. This was undesirable
because the StoreRebalancer did not respect the {over,under}fullness
thresholds that the replicateQueue does, which could lead to redundant
range relocations even in a well-balanced cluster.
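The difference between the two behaviors can be sketched as follows. This is an illustrative Go sketch, not CockroachDB's actual code: the function names, the 5% padding value, and the exact threshold formula are assumptions used only to show why a raw mean comparison causes thrashing that a padded comparison avoids.

```go
package main

import "fmt"

// shouldRebalanceByMean models the old StoreRebalancer behavior:
// any excess over the raw cluster mean triggers a relocation.
func shouldRebalanceByMean(rangeCount, clusterMean float64) bool {
	return rangeCount > clusterMean
}

// shouldRebalanceWithThreshold models the replicateQueue-style behavior:
// only act once the store is "overfull", i.e. above mean * (1 + threshold).
func shouldRebalanceWithThreshold(rangeCount, clusterMean, threshold float64) bool {
	return rangeCount > clusterMean*(1+threshold)
}

func main() {
	mean, threshold := 100.0, 0.05 // 5% overfullness padding (illustrative)
	// A store with 103 ranges is above the mean but within the padding:
	fmt.Println(shouldRebalanceByMean(103, mean))                   // true: would relocate redundantly
	fmt.Println(shouldRebalanceWithThreshold(103, mean, threshold)) // false: considered well balanced
}
```

With a padded check, a cluster whose stores all sit within the threshold band stays quiet, which is the delineation the commit aims for.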

This commit is an opinionated fix to prevent this behavior and an attempt to
more clearly delineate the responsibilities of the replica rebalancing aspects
of the StoreRebalancer and the replicateQueue.

Noticed while looking into #64064.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@aayushshah15 aayushshah15 force-pushed the 20210503_noRangeCountRebalanceByStoreRebalancer branch from 28093c6 to ab47c93 on May 3, 2021 08:52
@aayushshah15 aayushshah15 requested a review from nvb May 4, 2021 00:30
Contributor

@nvb nvb left a comment


:lgtm: but I can't say that I fully understand the effect that this will have. The replicateQueue is already responsible for making rebalancing decisions based on range count imbalance? So this change removes any consideration of range count imbalance from the StoreRebalancer once the hottest range on a node is responsible for less than .1% of its load? I guess that all seems reasonable.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@aayushshah15
Contributor Author

So this change removes any consideration of range count imbalance from the StoreRebalancer once the hottest range on a node is responsible for less than .1% of its load?

yup, exactly.

Thanks for reviewing.

bors r+

@craig craig bot merged commit 181dd6f into cockroachdb:master May 4, 2021
@craig
Contributor

craig bot commented May 4, 2021

Build succeeded:

aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Feb 4, 2022
…ncer`

Generally, the `StoreRebalancer` doesn't attempt lease transfers for
replicas that account for less than 0.1% of the store's total QPS. However, it
allows such lease transfers if the store has a higher than average number of
leases. This is fraught for the following reasons:

1. It doesn't adhere to any padding: it will keep making these tiny lease
transfers until the store has less than or equal to the average number of leases.
2. "Average" here means the average across the entire cluster, not just across
the stores that hold replicas for the range, which translates poorly to
heterogeneously loaded clusters.
3. These lease transfers also don't adhere to
`kv.allocator.min_lease_transfer_interval`, so they are not rate
limited like the lease transfers done by the `replicateQueue`.

This patch fixes this behavior. This patch is similar to cockroachdb#64559.

Noticed while running a tpcc workload on a multi-region cluster, but on a
database constrained to a single region. This was done to simulate a
"heterogeneously loaded cluster".

Release note: None
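The gating condition described in that commit message can be sketched as follows. This is a hypothetical Go sketch, not CockroachDB's actual code: the function names and parameters are illustrative; only the 0.1% threshold and the lease-count escape hatch come from the commit message.

```go
package main

import "fmt"

const minQPSFraction = 0.001 // the 0.1% threshold from the commit message

// considerLeaseTransfer models the old behavior: a replica below the QPS
// threshold could still be transferred whenever the store held more than
// the cluster-wide average number of leases, with no padding or rate limit.
func considerLeaseTransfer(replicaQPS, storeQPS float64, storeLeases, clusterMeanLeases int) bool {
	if replicaQPS >= minQPSFraction*storeQPS {
		return true // significant replica: always a candidate
	}
	return storeLeases > clusterMeanLeases
}

// considerLeaseTransferPatched models the new behavior: tiny replicas are
// never considered here; lease-count imbalance is left to the replicateQueue.
func considerLeaseTransferPatched(replicaQPS, storeQPS float64) bool {
	return replicaQPS >= minQPSFraction*storeQPS
}

func main() {
	// A replica serving 0.5 QPS on a 1000-QPS store (0.05% of its load):
	fmt.Println(considerLeaseTransfer(0.5, 1000, 120, 100)) // true under the old logic
	fmt.Println(considerLeaseTransferPatched(0.5, 1000))    // false after the patch
}
```

The removed branch is exactly the second return in the old version: a cluster-wide mean comparison with no padding, which is what made the tiny transfers both redundant and unbounded.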
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Feb 7, 2022
…ncer`

craig bot pushed a commit that referenced this pull request Feb 8, 2022
…76205

75541: rfc: kv lock observability via `crdb_locks` r=AlexTalks a=AlexTalks

_Link to [RFC text](https://github.com/AlexTalks/cockroach/blob/rfc_crdb_locks/docs/RFCS/20220104_crdb_locks.md)._

This RFC proposes a feature for observing locks (and contention on
locks) via a new virtual table called `crdb_internal.crdb_locks`.

Release note: None

75738: roachtest/follower_reads: unskip follower-reads/mixed-version/single-region r=andreimatei a=andreimatei

It should have been unskipped with the rest of the follower reads tests
when #69817 was closed.

Release note: None

75898: sql: consider virtual PK columns as stored in the optimizer r=mgartner a=mgartner

Allowing virtual primary key columns created a contradiction of two
hard-coded assumptions of the optimizer:

  1. Primary key columns are written to the primary index (and all
     secondary indexes).
  2. Virtual columns are not written to the primary index.

To prevent this contradiction from causing bugs while planning, the opt
catalog presents columns that are marked as virtual PK columns in the
descriptor as stored columns to the optimizer. This is valid because
virtual computed PK columns behave exactly like stored computed PK
columns: they are written to all indexes.

The test catalog has been updated to mimic this behavior.

Fixes #75147
Fixes #75874

Release note (bug fix): A bug has been fixed that caused internal errors
when querying tables with virtual columns in the primary key. This bug
was only present since version 22.1.0-alpha.1 and does not appear in any
production releases.

75912: util/tracing: don't return recordings for non-recording spans r=andreimatei a=andreimatei

This patch makes sp.GetRecording() return nil if the span was not
recording. This is already what the documentation on GetRecording() was
implying, but the actual behavior was different: a non-recording span
(that's not a no-op span) would return a non-nil recording consisting
only of the span's metadata. This behavior is not necessarily
unreasonable, but it turns out it's a bit dangerous from a performance
perspective - there's a cost (e.g. some allocations) for generating this
recording, and it was too easy for unsuspecting callers to pay that when
they didn't want to. In particular, DistSQL was indiscriminately calling
GetRecording() (through its GetTraceData* utilities), and paying this
measurable cost unintentionally.

Eliminating this cost is particularly important when the Tracer is
configured in the "active spans registry" tracing mode; when not in that
mode, all the spans would be no-ops and the GetRecording calls were
shortcircuited anyway. This is seen in BenchmarkTracing//trace=on,
particularly on "3node/scan", which was the benchmark with the biggest
difference between tracing on and off.

name                              old time/op    new time/op    delta
Tracing/1node/scan/trace=on-32       214µs ± 4%     206µs ± 5%    ~     (p=0.222 n=5+5)
Tracing/1node/insert/trace=on-32     338µs ± 5%     334µs ± 4%    ~     (p=0.690 n=5+5)
Tracing/3node/scan/trace=on-32       515µs ± 3%     484µs ± 3%  -6.11%  (p=0.008 n=5+5)
Tracing/3node/insert/trace=on-32     635µs ± 1%     627µs ± 1%    ~     (p=0.095 n=5+5)

name                              old alloc/op   new alloc/op   delta
Tracing/1node/scan/trace=on-32      27.2kB ± 2%    25.6kB ± 2%  -5.94%  (p=0.008 n=5+5)
Tracing/1node/insert/trace=on-32    45.2kB ± 2%    45.5kB ± 2%    ~     (p=0.690 n=5+5)
Tracing/3node/scan/trace=on-32      83.2kB ± 3%    79.7kB ± 5%    ~     (p=0.095 n=5+5)
Tracing/3node/insert/trace=on-32     102kB ± 2%     103kB ± 1%    ~     (p=0.548 n=5+5)

name                              old allocs/op  new allocs/op  delta
Tracing/1node/scan/trace=on-32         242 ± 1%       229 ± 0%  -5.37%  (p=0.000 n=5+4)
Tracing/1node/insert/trace=on-32       338 ± 1%       339 ± 1%    ~     (p=0.563 n=5+5)
Tracing/3node/scan/trace=on-32         819 ± 1%       788 ± 6%    ~     (p=0.135 n=5+5)
Tracing/3node/insert/trace=on-32       871 ± 0%       880 ± 0%  +0.99%  (p=0.016 n=4+5)
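The change described above amounts to an early-out guard. A minimal sketch, using simplified hypothetical types rather than the actual `util/tracing` API: a span that is not recording now returns nil instead of allocating a metadata-only recording.

```go
package main

import "fmt"

// RecordedSpan is a stand-in for a recorded span's payload.
type RecordedSpan struct{ Operation string }

// Span is a simplified stand-in for a tracing span.
type Span struct {
	operation string
	recording bool
}

// GetRecording returns the span's recording, or nil if the span was not
// recording, so callers pay no allocation cost for recordings they
// never asked for.
func (s *Span) GetRecording() []RecordedSpan {
	if !s.recording {
		// Previously: a non-nil, metadata-only recording was built here,
		// paying allocation costs even when nobody wanted a recording.
		return nil
	}
	return []RecordedSpan{{Operation: s.operation}}
}

func main() {
	s := &Span{operation: "scan", recording: false}
	fmt.Println(s.GetRecording() == nil) // true: non-recording spans cost nothing
}
```

The benchmark deltas above reflect this: the savings show up only where spans exist but are not recording, i.e. in the "active spans registry" mode.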

Release note: None

75981: kvserver: clean-up the interaction between the mergeQueue and `AdminRelocateRange` r=aayushshah15 a=aayushshah15

This commit makes it such that the `mergeQueue` no longer has to take special
care to avoid a lease transfer when calling into `AdminRelocateRange`. This was
enabled by #75676.

Release note: None


75999: kvserver: stop reconciling lease count imbalances in the `StoreRebalancer` r=aayushshah15 a=aayushshah15



76078: opt: plan inner lookup joins on virtual column indexes in more cases r=mgartner a=mgartner

ExtractJoinEqualities now reuses computed columns instead of
synthesizing new columns when it creates projections that exactly match
a computed column expression of a base table. This allows
GenerateLookupJoinsWithVirtualCols to generate lookup joins in more
cases. This also paves the way for exploring anti- and semi-lookup
joins on indexes with virtual columns and expression indexes.

Fixes #75872

Release note (performance improvement): The optimizer now plans inner
lookup joins using expression indexes in more cases, resulting in more
efficient query plans.

76193: optbuilder: do not create invalid casts when building CASE expressions r=mgartner a=mgartner

The optbuilder no longer creates invalid casts when building CASE
expressions that have branches of different types. CASE expressions that
previously caused internal errors now result in user-facing errors. A
similar change was made recently to UNION expressions in #75219.

Note that there is more work to be done to be fully consistent with
Postgres's CASE typing behavior, see #75365.

Fixes #75365

Release note (bug fix): CASE expressions with branches that result in
types that cannot be cast to a common type now result in a user-facing
error instead of an internal error.

76200: cloud: bump orchestrator to v21.2.5 r=celiala a=Azhng

Release note: None

76205: roachtest: update 22.1 version map to v21.2.5 r=celiala a=Azhng

Release note: None

Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Azhng <archer.xn@gmail.com>
RajivTS pushed a commit to RajivTS/cockroach that referenced this pull request Mar 6, 2022
…ncer`
