Recover replicas to a "good enough" store instead of the "best" store #86265
Description
To make decommissioning faster, we want the allocator to recover replicas to any store that is valid and has the same risk diversity as the best candidate store, even if it holds more ranges than other stores.
Today we try to pick one of the best stores, meaning one with a low range count. If, for example, one of the nodes is new, it attracts most of the recovered replicas and we see a thundering herd during decommissioning. By choosing a store that is good enough, we can use more stores as targets and complete the decommission process faster.
The same applies to recovering from a dead node, but not to upreplicating: if, for example, we change the default from 3 replicas to 5, the allocator will still try to place the 2 additional replicas on one of the best stores.
The downside is an increase in recovery cost and bandwidth: we may recover a replica to a store that is overloaded, and later rebalance that same replica to a store that is underfull.
This effort is part of #85445.
Jira issue: CRDB-18657