kvserver: store-rebalancer can get blocked on load-based replica rebalances #79249
Description
The StoreRebalancer goroutine synchronously executes load-based lease transfers and load-based replica rebalances of the hottest ranges in a loop.
This means that, when a cluster is under duress and load-based replica rebalancing is taking a long time, the store rebalancer goroutine can be blocked (which also blocks cheaper actions like load-based lease transfers) for an inordinate amount of time, until the AdminRelocateRange call for each "hot range" being processed either fails or hits its timeout. In other words, if the StoreRebalancer tries to rebalance away 1 replica each for 100 ranges, and those rebalances are bound to hit their timeout, we won't see any load-based rebalancing on this store for ~100 minutes at a minimum.
We noticed this during an escalation where a single store on a hot node couldn't shed its load because of this. The logs indicated that the StoreRebalancer goroutine was simply blocked on a large number of AdminRelocateRange calls that eventually timed out:

Nodes 173 and 159 ^ were both nodes that had extremely high read amp during this incident.
@cockroachdb/kv-notifications
Jira issue: CRDB-14656