
kvserver: stop reconciling range count imbalance in the StoreRebalancer#64559

Merged
craig[bot] merged 1 commit into cockroachdb:master from aayushshah15:20210503_noRangeCountRebalanceByStoreRebalancer
May 4, 2021

Conversation

@aayushshah15
Contributor

@aayushshah15 aayushshah15 commented May 3, 2021

Previously, the StoreRebalancer would rebalance ranges off of a node if
it held more replicas than the cluster average. This was undesirable
because the StoreRebalancer did not respect the {over,under}fullness
thresholds that the replicateQueue does, which could lead to redundant
range relocations even in a well-balanced cluster.
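The difference between the two behaviors can be sketched as follows. This is an illustrative Go sketch, not CockroachDB's actual code: the function names, the 5% padding value, and the exact threshold formula are assumptions used only to show why a raw mean comparison causes thrashing that a padded comparison avoids.

```go
package main

import "fmt"

// shouldRebalanceByMean models the old StoreRebalancer behavior:
// any excess over the raw cluster mean triggers a relocation.
func shouldRebalanceByMean(rangeCount, clusterMean float64) bool {
	return rangeCount > clusterMean
}

// shouldRebalanceWithThreshold models the replicateQueue-style behavior:
// only act once the store is "overfull", i.e. above mean * (1 + threshold).
func shouldRebalanceWithThreshold(rangeCount, clusterMean, threshold float64) bool {
	return rangeCount > clusterMean*(1+threshold)
}

func main() {
	mean, threshold := 100.0, 0.05 // 5% overfullness padding (illustrative)
	// A store with 103 ranges is above the mean but within the padding:
	fmt.Println(shouldRebalanceByMean(103, mean))                   // true: would relocate redundantly
	fmt.Println(shouldRebalanceWithThreshold(103, mean, threshold)) // false: considered well balanced
}
```

With a padded check, a cluster whose stores all sit within the threshold band stays quiet, which is the delineation the commit aims for.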

This commit is an opinionated fix to prevent this behavior and an attempt to
more clearly delineate the responsibilities of the replica rebalancing aspects
of the StoreRebalancer and the replicateQueue.

Noticed while looking into #64064.

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@aayushshah15 aayushshah15 force-pushed the 20210503_noRangeCountRebalanceByStoreRebalancer branch from 28093c6 to ab47c93 on May 3, 2021 08:52
@aayushshah15 aayushshah15 requested a review from nvb May 4, 2021 00:30
Contributor

@nvb nvb left a comment


:lgtm: but I can't say that I fully understand the effect that this will have. The replicateQueue is already responsible for making rebalancing decisions based on range count imbalance? So this change removes any consideration of range count imbalance from the StoreRebalancer once the hottest range on a node is responsible for less than .1% of its load? I guess that all seems reasonable.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)

@aayushshah15
Contributor Author

So this change removes any consideration of range count imbalance from the StoreRebalancer once the hottest range on a node is responsible for less than .1% of its load?

yup, exactly.

Thanks for reviewing.

bors r+

@craig craig bot merged commit 181dd6f into cockroachdb:master May 4, 2021
@craig
Contributor

craig bot commented May 4, 2021

Build succeeded:

aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Feb 4, 2022
…ncer`

Generally, the `StoreRebalancer` doesn't attempt lease transfers for
replicas that account for less than 0.1% of the store's total QPS. However, it
allows such lease transfers if the store has a higher than average number of
leases. This is fraught for the following reasons:

1. It doesn't adhere to any padding: it will keep making these tiny lease
transfers until the store has less than or equal to the average number of leases.
2. "Average" here means the average across the entire cluster, not just across
the stores that hold replicas for the range, which translates poorly to
heterogeneously loaded clusters.
3. These lease transfers also don't adhere to
`kv.allocator.min_lease_transfer_interval`, so they are not rate
limited like the lease transfers done by the `replicateQueue`.

This patch fixes this behavior. This patch is similar to cockroachdb#64559.

Noticed while running a tpcc workload on a multi-region cluster, but on a
database constrained to a single region. This was done to simulate a
"heterogeneously loaded cluster".

Release note: None
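The gating condition described in that commit message can be sketched as follows. This is a hypothetical Go sketch, not CockroachDB's actual code: the function names and parameters are illustrative; only the 0.1% threshold and the lease-count escape hatch come from the commit message.

```go
package main

import "fmt"

const minQPSFraction = 0.001 // the 0.1% threshold from the commit message

// considerLeaseTransfer models the old behavior: a replica below the QPS
// threshold could still be transferred whenever the store held more than
// the cluster-wide average number of leases, with no padding or rate limit.
func considerLeaseTransfer(replicaQPS, storeQPS float64, storeLeases, clusterMeanLeases int) bool {
	if replicaQPS >= minQPSFraction*storeQPS {
		return true // significant replica: always a candidate
	}
	return storeLeases > clusterMeanLeases
}

// considerLeaseTransferPatched models the new behavior: tiny replicas are
// never considered here; lease-count imbalance is left to the replicateQueue.
func considerLeaseTransferPatched(replicaQPS, storeQPS float64) bool {
	return replicaQPS >= minQPSFraction*storeQPS
}

func main() {
	// A replica serving 0.5 QPS on a 1000-QPS store (0.05% of its load):
	fmt.Println(considerLeaseTransfer(0.5, 1000, 120, 100)) // true under the old logic
	fmt.Println(considerLeaseTransferPatched(0.5, 1000))    // false after the patch
}
```

The removed branch is exactly the second return in the old version: a cluster-wide mean comparison with no padding, which is what made the tiny transfers both redundant and unbounded.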
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request Feb 7, 2022
…ncer`

craig bot pushed a commit that referenced this pull request Feb 8, 2022
…76205

75541: rfc: kv lock observability via `crdb_locks` r=AlexTalks a=AlexTalks

_Link to [RFC text](https://github.com/AlexTalks/cockroach/blob/rfc_crdb_locks/docs/RFCS/20220104_crdb_locks.md)._

This RFC proposes a feature for observing locks (and contention on
locks) via a new virtual table called `crdb_internal.crdb_locks`.

Release note: None

75738: roachtest/follower_reads: unskip follower-reads/mixed-version/single-region r=andreimatei a=andreimatei

It should have been unskipped with the rest of the follower reads tests
when #69817 was closed.

Release note: None

75898: sql: consider virtual PK columns as stored in the optimizer r=mgartner a=mgartner

Allowing virtual primary key columns created a contradiction of two
hard-coded assumptions of the optimizer:

  1. Primary key columns are written to the primary index (and all
     secondary indexes).
  2. Virtual columns are not written to the primary index.

To prevent this contradiction from causing bugs while planning, the opt
catalog presents columns that are marked as virtual PK columns in the
descriptor as stored columns to the optimizer. This is valid because
virtual computed PK columns behave exactly like stored computed PK
columns: they are written to all indexes.

The test catalog has been updated to mimic this behavior.

Fixes #75147
Fixes #75874

Release note (bug fix): A bug has been fixed that caused internal errors
when querying tables with virtual columns in the primary key. This bug
was only present since version 22.1.0-alpha.1 and does not appear in any
production releases.

75912: util/tracing: don't return recordings for non-recording spans r=andreimatei a=andreimatei

This patch makes sp.GetRecording() return nil if the span was not
recording. This is already what the documentation on GetRecording() was
implying, but the actual behavior was different: a non-recording span
(that's not a no-op span) would return a non-nil recording consisting
only of the span's metadata. This behavior is not necessarily
unreasonable, but it turns out it's a bit dangerous from a performance
perspective - there's a cost (e.g. some allocations) for generating this
recording, and it was too easy for unsuspecting callers to pay that when
they didn't want to. In particular, DistSQL was indiscriminately calling
GetRecording() (through its GetTraceData* utilities), and paying this
measurable cost unintentionally.

Eliminating this cost is particularly important when the Tracer is
configured in the "active spans registry" tracing mode; when not in that
mode, all the spans would be no-ops and the GetRecording calls were
shortcircuited anyway. This is seen in BenchmarkTracing//trace=on,
particularly on "3node/scan", which was the benchmark with the biggest
difference between tracing on and off.

name                              old time/op    new time/op    delta
Tracing/1node/scan/trace=on-32       214µs ± 4%     206µs ± 5%    ~     (p=0.222 n=5+5)
Tracing/1node/insert/trace=on-32     338µs ± 5%     334µs ± 4%    ~     (p=0.690 n=5+5)
Tracing/3node/scan/trace=on-32       515µs ± 3%     484µs ± 3%  -6.11%  (p=0.008 n=5+5)
Tracing/3node/insert/trace=on-32     635µs ± 1%     627µs ± 1%    ~     (p=0.095 n=5+5)

name                              old alloc/op   new alloc/op   delta
Tracing/1node/scan/trace=on-32      27.2kB ± 2%    25.6kB ± 2%  -5.94%  (p=0.008 n=5+5)
Tracing/1node/insert/trace=on-32    45.2kB ± 2%    45.5kB ± 2%    ~     (p=0.690 n=5+5)
Tracing/3node/scan/trace=on-32      83.2kB ± 3%    79.7kB ± 5%    ~     (p=0.095 n=5+5)
Tracing/3node/insert/trace=on-32     102kB ± 2%     103kB ± 1%    ~     (p=0.548 n=5+5)

name                              old allocs/op  new allocs/op  delta
Tracing/1node/scan/trace=on-32         242 ± 1%       229 ± 0%  -5.37%  (p=0.000 n=5+4)
Tracing/1node/insert/trace=on-32       338 ± 1%       339 ± 1%    ~     (p=0.563 n=5+5)
Tracing/3node/scan/trace=on-32         819 ± 1%       788 ± 6%    ~     (p=0.135 n=5+5)
Tracing/3node/insert/trace=on-32       871 ± 0%       880 ± 0%  +0.99%  (p=0.016 n=4+5)
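The change described above amounts to an early-out guard. A minimal sketch, using simplified hypothetical types rather than the actual `util/tracing` API: a span that is not recording now returns nil instead of allocating a metadata-only recording.

```go
package main

import "fmt"

// RecordedSpan is a stand-in for a recorded span's payload.
type RecordedSpan struct{ Operation string }

// Span is a simplified stand-in for a tracing span.
type Span struct {
	operation string
	recording bool
}

// GetRecording returns the span's recording, or nil if the span was not
// recording, so callers pay no allocation cost for recordings they
// never asked for.
func (s *Span) GetRecording() []RecordedSpan {
	if !s.recording {
		// Previously: a non-nil, metadata-only recording was built here,
		// paying allocation costs even when nobody wanted a recording.
		return nil
	}
	return []RecordedSpan{{Operation: s.operation}}
}

func main() {
	s := &Span{operation: "scan", recording: false}
	fmt.Println(s.GetRecording() == nil) // true: non-recording spans cost nothing
}
```

The benchmark deltas above reflect this: the savings show up only where spans exist but are not recording, i.e. in the "active spans registry" mode.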

Release note: None

75981: kvserver: clean-up the interaction between the mergeQueue and `AdminRelocateRange` r=aayushshah15 a=aayushshah15

This commit makes it such that the `mergeQueue` no longer has to take special
care to avoid a lease transfer when calling into `AdminRelocateRange`. This was
enabled by #75676.

Release note: None


75999: kvserver: stop reconciling lease count imbalances in the `StoreRebalancer` r=aayushshah15 a=aayushshah15



76078: opt: plan inner lookup joins on virtual column indexes in more cases r=mgartner a=mgartner

ExtractJoinEqualities now reuses computed columns instead of
synthesizing new columns when it creates projections that exactly match
a computed column expression of a base table. This allows
GenerateLookupJoinsWithVirtualCols to generate lookup joins in more
cases. This also paves the way for exploring anti- and semi-lookup
joins on indexes with virtual columns and expression indexes.

Fixes #75872

Release note (performance improvement): The optimizer now plans inner
lookup joins using expression indexes in more cases, resulting in more
efficient query plans.

76193: optbuilder: do not create invalid casts when building CASE expressions r=mgartner a=mgartner

The optbuilder no longer creates invalid casts when building CASE
expressions that have branches of different types. CASE expressions that
previously caused internal errors now result in user-facing errors. A
similar change was made recently to UNION expressions in #75219.

Note that there is more work to be done to be fully consistent with
Postgres's CASE typing behavior, see #75365.

Fixes #75365

Release note (bug fix): CASE expressions with branches that result in
types that cannot be cast to a common type now result in a user-facing
error instead of an internal error.

76200: cloud: bump orchestrator to v21.2.5 r=celiala a=Azhng

Release note: None

76205: roachtest: update 22.1 version map to v21.2.5 r=celiala a=Azhng

Release note: None

Co-authored-by: Alex Sarkesian <sarkesian@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Azhng <archer.xn@gmail.com>
RajivTS pushed a commit to RajivTS/cockroach that referenced this pull request Mar 6, 2022
…ncer`
