storage: Consider which replica will be removed when adding a replica to improve balance #17971

@a-robinson

Description

In clusters with multiple localities, we try to maintain diversity by spreading the replicas for each range as evenly across localities as possible. This means that in a cluster with 3 localities and 3 replicas per range, we'll try to keep 1 replica in each locality for each range.

When in a rebalancing state where one locality has 2 replicas and the other localities each have 1, we'll always remove a replica from the locality that has 2. This is good.

What isn't as good is that we don't consider this when deciding whether to rebalance in the first place. Our rebalancing logic will kick in if a possible new destination is a better fit than any of the existing replicas, even if the new destination is in a different locality and thus won't actually be a direct replacement for the existing replica that isn't a great fit.
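
To make the mismatch concrete, here is a toy model of the two decisions. It is not the actual allocator code: `Store`, its `score` field (higher means a better fit), `should_rebalance_naive`, and `replica_to_remove` are all hypothetical names invented for this sketch, and the removal rule is simplified to "worst store in the most crowded locality".

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    locality: str
    score: float  # hypothetical fit score; higher means a better home for a replica


def should_rebalance_naive(existing, candidate):
    # Add the candidate if it beats ANY existing replica, regardless of locality.
    return candidate.score > min(s.score for s in existing)


def replica_to_remove(replicas):
    # Remove from the most crowded locality, taking its worst-scoring store.
    counts = Counter(s.locality for s in replicas)
    most = max(counts.values())
    crowded = [s for s in replicas if counts[s.locality] == most]
    return min(crowded, key=lambda s: s.score)


existing = [
    Store("s1", "us-east", 0.2),  # the poor fit that triggers rebalancing
    Store("s2", "us-west", 0.6),
    Store("s3", "eu", 0.7),
]
candidate = Store("s4", "us-west", 0.5)

added = should_rebalance_naive(existing, candidate)
removed = replica_to_remove(existing + [candidate])
print(added, removed.name)  # prints: True s4
```

In this model the add decision fires because `s4` beats `s1`, but the removal step then picks from `us-west`, so `s1` stays where it is and the replica that was just added is the one taken away.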

I haven't seen this cause massive problems by itself, but in combination with another problem (I've seen it flare up with both #17879 and #17970) it can make for rebalance thrashing, where we repeatedly add and remove a replica on the same 1 or 2 nodes.

Not all situations will be as straightforward as the 3-locality example above, so we'll have to make sure the fix for this is somewhat more general than just making sure that a potential replica to add is in the same locality as the worst existing replica.
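
One possible shape for a more general check, sketched under the same toy assumptions (hypothetical `Store` with a `score` where higher is better, and a simplified removal rule), is to simulate the add, ask which replica would then be removed, and only rebalance if that replica is not the candidate itself and is a worse fit than the candidate. This is an illustration of the idea, not a claim about how the fix should actually be implemented.

```python
from collections import Counter
from typing import NamedTuple


class Store(NamedTuple):
    name: str
    locality: str
    score: float  # hypothetical fit score; higher is better


def replica_to_remove(replicas):
    # Simplified removal rule: worst store in the most crowded locality.
    counts = Counter(s.locality for s in replicas)
    most = max(counts.values())
    crowded = [s for s in replicas if counts[s.locality] == most]
    return min(crowded, key=lambda s: s.score)


def should_rebalance(existing, candidate):
    # Simulate the add, then check what the removal step would actually do.
    removed = replica_to_remove(existing + [candidate])
    return removed != candidate and candidate.score > removed.score


# 3 localities, 1 replica each: only a same-locality candidate replaces s1.
three = [Store("s1", "us-east", 0.2), Store("s2", "us-west", 0.6), Store("s3", "eu", 0.7)]
print(should_rebalance(three, Store("s4", "us-west", 0.5)))  # prints: False
print(should_rebalance(three, Store("s5", "us-east", 0.5)))  # prints: True

# 2 localities, 3 replicas: a candidate in a different locality from the
# worst replica (a1) can still replace it, so a same-locality rule is too strict.
two = [Store("a1", "a", 0.2), Store("a2", "a", 0.8), Store("b1", "b", 0.6)]
print(should_rebalance(two, Store("b2", "b", 0.7)))  # prints: True
```

Checking the simulated removal, rather than comparing localities directly, lets the same test cover both the 3-locality case and less even layouts.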
