storage: Consider which replica will be removed when adding a replica to improve balance #17971
Description
In clusters with multiple localities, we try to maintain diversity by spreading the replicas for each range as evenly across localities as possible. This means that in a cluster with 3 localities and 3 replicas per range, we'll try to keep 1 replica in each locality for each range.
When in a rebalancing state where one locality has 2 replicas and the other localities each have 1, we'll always remove a replica from the locality that has 2. This is good.
What isn't as good is that we don't consider this when deciding whether to rebalance in the first place. Our rebalancing logic will kick in if a possible new destination is a better fit than any of the existing replicas, even if the new destination is in a different locality from the poorly fitting replica. In that case the replica we end up removing comes from the new destination's locality, so the addition doesn't actually replace the existing replica that isn't a great fit.
I haven't seen this cause massive problems by itself, but in combination with another problem (I've seen it flare up with both #17879 and #17970) it can make for rebalance thrashing, where we repeatedly add and remove a replica on the same 1 or 2 nodes.
Not all situations will be as straightforward as the 3-locality example above, so we'll have to make sure the fix for this is somewhat more general than just making sure that a potential replica to add is in the same locality as the worst existing replica.
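One way to get that generality, as a rough sketch rather than the actual allocator code: before adding a replica, simulate the add, ask which replica the removal logic would then pick, and only go ahead if the new replica is a better fit than that one. Everything named below (`candidate`, `score`, `removalTarget`, `shouldRebalance`, `naiveShouldRebalance`) is hypothetical and made up for illustration, and the removal rule is simplified to "most crowded locality first, then lowest score".

```go
package main

import "fmt"

// candidate is a hypothetical stand-in for a store. A higher score
// means a better fit for the range.
type candidate struct {
	storeID  int
	locality string
	score    float64
}

// naiveShouldRebalance mirrors the behavior described above: rebalance
// if the target beats any existing replica, regardless of locality.
func naiveShouldRebalance(existing []candidate, target candidate) bool {
	for _, r := range existing {
		if target.score > r.score {
			return true
		}
	}
	return false
}

// removalTarget picks the replica a simplified removal step would drop:
// one from the locality holding the most replicas, breaking ties by
// lowest score. It assumes replicas is non-empty.
func removalTarget(replicas []candidate) candidate {
	counts := map[string]int{}
	for _, r := range replicas {
		counts[r.locality]++
	}
	victim := replicas[0]
	for _, r := range replicas[1:] {
		more := counts[r.locality] > counts[victim.locality]
		tie := counts[r.locality] == counts[victim.locality]
		if more || (tie && r.score < victim.score) {
			victim = r
		}
	}
	return victim
}

// shouldRebalance simulates adding target, then checks that the replica
// that would be removed is not target itself and is a worse fit.
func shouldRebalance(existing []candidate, target candidate) bool {
	after := append(append([]candidate{}, existing...), target)
	victim := removalTarget(after)
	return victim.storeID != target.storeID && target.score > victim.score
}

func main() {
	existing := []candidate{
		{storeID: 1, locality: "a", score: 0.2}, // the poorly fitting replica
		{storeID: 2, locality: "b", score: 0.8},
		{storeID: 3, locality: "c", score: 0.7},
	}
	wrongLocality := candidate{storeID: 4, locality: "b", score: 0.5}
	sameLocality := candidate{storeID: 5, locality: "a", score: 0.5}

	fmt.Println("naive, target in b:    ", naiveShouldRebalance(existing, wrongLocality))
	fmt.Println("simulated, target in b:", shouldRebalance(existing, wrongLocality))
	fmt.Println("simulated, target in a:", shouldRebalance(existing, sameLocality))
}
```

In the 3-locality example, the naive check accepts store 4 because it beats store 1, but after the add the removal step would pick store 4 right back out of locality `b`, which is the add/remove thrashing described above. The simulated check rejects that move and still accepts store 5, which directly replaces store 1. Because it never compares localities explicitly, the same check should carry over to less symmetric layouts, provided `removalTarget` matches whatever the real removal logic does.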