
kvserver: StoreRebalancer is not constraints-aware when comparing stores' QPS #61883

@aayushshah15

Load-based range rebalancing in CRDB happens at the store level (StoreRebalancer, which rebalances based on QPS) as well as at the per-range level (replicateQueue, which rebalances based on range count). In broad strokes, at the per-range level, we do the following:

  1. For each of the current stores that have a replica for the range, we compute a list of “comparable” stores: stores that meet the replication constraints or are identical to the existing store in terms of their locality.
  2. We compute capacity statistics for each of these comparable subsets of stores.
  3. If the existing store is “overfull” relative to its subset of comparable stores, we try to transfer the existing replica away. Likewise, if any other store is “underfull” relative to the subset, we also try to relinquish our replica and move it to such an underfull store.
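
As a rough illustration of the three steps above, here is a minimal sketch. It is not the replicateQueue implementation: the `Store` type, the helper names, the single "required region" constraint, and the 5% threshold are all hypothetical simplifications chosen for the example.

```go
package main

import "fmt"

// Store is a hypothetical, simplified stand-in for a store descriptor.
type Store struct {
	ID         int
	Region     string
	RangeCount int
}

// comparableStores returns the stores that either satisfy the constraint
// (modeled here as a single required region) or share the existing
// store's locality (modeled here as its region).
func comparableStores(existing Store, all []Store, requiredRegion string) []Store {
	var out []Store
	for _, s := range all {
		if s.Region == requiredRegion || s.Region == existing.Region {
			out = append(out, s)
		}
	}
	return out
}

// meanRangeCount computes the average range count over a set of stores.
func meanRangeCount(stores []Store) float64 {
	if len(stores) == 0 {
		return 0
	}
	total := 0
	for _, s := range stores {
		total += s.RangeCount
	}
	return float64(total) / float64(len(stores))
}

// classify labels a store relative to the mean of its comparable peers.
// The threshold is an assumed fraction, not a CRDB default.
func classify(s Store, peers []Store, threshold float64) string {
	mean := meanRangeCount(peers)
	switch {
	case float64(s.RangeCount) > mean*(1+threshold):
		return "overfull"
	case float64(s.RangeCount) < mean*(1-threshold):
		return "underfull"
	default:
		return "balanced"
	}
}

func main() {
	stores := []Store{
		{1, "A", 130}, {2, "A", 100}, {3, "A", 70},
		{4, "B", 10}, {5, "B", 10},
	}
	// Stores in region B are excluded, so they do not drag the mean down.
	peers := comparableStores(stores[0], stores, "A")
	for _, s := range peers {
		fmt.Printf("s%d: %s\n", s.ID, classify(s, peers, 0.05))
	}
}
```

The point of the sketch is that the statistics are computed only over the comparable subset, so a store is judged against peers that could actually hold the replica.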

In contrast, the StoreRebalancer simply fetches the list of all stores in the cluster and computes QPS-based statistics across this entire set.

To see the hazard with this, consider a scenario where we have 5 regions with 3 stores each, but all tables are constrained to, say, some single region A. This will very likely result in all 3 stores in region A fielding higher QPS than the cluster average.

In such a scenario, the coarseness of the StoreRebalancer essentially deactivates it, because it computes the QPS average across the entire set of stores in the cluster. The StoreRebalancer will not try to achieve balance across the 3 stores that satisfy the constraints, but rather across the entire cluster. It will try to rebalance a replica away from these 3 stores but fail, because all the other (underfull) stores violate the constraints.
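
To make the failure mode concrete, the sketch below models the 5-region, 3-stores-per-region scenario with made-up QPS numbers (900, 600 and 300 for the region A stores, 10 for every other store). The `Store` type and all helper names are hypothetical and do not correspond to StoreRebalancer internals; the only thing being compared is which mean the candidate targets are measured against.

```go
package main

import "fmt"

// Store is a hypothetical, simplified stand-in for a store descriptor.
type Store struct {
	ID     int
	Region string
	QPS    float64
}

// meanQPS computes the average QPS over a set of stores.
func meanQPS(stores []Store) float64 {
	if len(stores) == 0 {
		return 0
	}
	total := 0.0
	for _, s := range stores {
		total += s.QPS
	}
	return total / float64(len(stores))
}

// inRegion filters stores down to a single region.
func inRegion(stores []Store, region string) []Store {
	var out []Store
	for _, s := range stores {
		if s.Region == region {
			out = append(out, s)
		}
	}
	return out
}

// rebalanceTargets returns stores whose QPS is below the given mean and
// that also satisfy the constraint (modeled as a single allowed region).
func rebalanceTargets(stores []Store, mean float64, allowedRegion string) []Store {
	var out []Store
	for _, s := range inRegion(stores, allowedRegion) {
		if s.QPS < mean {
			out = append(out, s)
		}
	}
	return out
}

// buildCluster creates 5 regions with 3 stores each. Only region A
// serves meaningful load; the QPS values are assumptions for illustration.
func buildCluster() []Store {
	hot := []float64{900, 600, 300}
	var stores []Store
	id := 1
	for _, region := range []string{"A", "B", "C", "D", "E"} {
		for i := 0; i < 3; i++ {
			qps := 10.0
			if region == "A" {
				qps = hot[i]
			}
			stores = append(stores, Store{ID: id, Region: region, QPS: qps})
			id++
		}
	}
	return stores
}

func main() {
	stores := buildCluster()
	clusterMean := meanQPS(stores)
	regionMean := meanQPS(inRegion(stores, "A"))

	fmt.Printf("cluster-wide mean QPS: %.0f\n", clusterMean)
	fmt.Printf("region A mean QPS:     %.0f\n", regionMean)
	fmt.Printf("valid targets vs. cluster-wide mean: %d\n",
		len(rebalanceTargets(stores, clusterMean, "A")))
	fmt.Printf("valid targets vs. region A mean:     %d\n",
		len(rebalanceTargets(stores, regionMean, "A")))
}
```

With these numbers the cluster-wide mean is 128 QPS, so every region A store looks overloaded and no constraint-satisfying store falls below the mean. Measured against the region A mean of 600 QPS instead, the store serving 300 QPS becomes a valid target for load from the store serving 900 QPS.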

This means that the hottest ranges in the system will not be actively rebalanced away based on QPS.
cc @nvanbenschoten

gz#7724

Epic CRDB-6437

Metadata

Labels

A-kv-distribution: Relating to rebalancing and leasing.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
GA-blocker
T-sql-foundations: SQL Foundations Team (formerly SQL Schema + SQL Sessions)
