storage: allocator balance disrupted by splits #9435
Description
The current allocator heuristics reach steady state when no node is >5% above or <5% below the average number of replicas in the cluster. But consider what happens when a range splits. For example, let's say we have a 10 node cluster containing 999 replicas (333 ranges). Our target for the number of replicas per node is [95, 105]. Now, let's say the per-node replica counts are:
95 95 95 95 95 104 105 105 105 105
If a range that is present on the fuller nodes splits, we can transition to a state like:
95 95 95 95 95 104 105 106 106 106
The nodes with 106 replicas are now overfull per the heuristics, and we'll have to rebalance off of them. Thankfully there are 5 acceptable targets, which means we'll perform 3 concurrent rebalances on the cluster. I'm pretty sure I'm seeing exactly this scenario on delta right now.
Balancing purely on range count is a bit unfortunate in this regard. If we were balancing on storage usage, there likely wouldn't be an issue, since a split doesn't actually create any more data.
Cc @cockroachdb/stability