storage: allocator balance disrupted by splits #9435

@petermattis

Description

The current allocator heuristics reach steady state when no node is >5% above or <5% below the average number of replicas in the cluster. But consider what happens when a range splits. For example, let's say we have a 10 node cluster containing 999 replicas (333 ranges). Our target for the number of replicas per node is [95, 105]. Now, let's say the per-node replica counts are:

95 95 95 95 95 104 105 105 105 105

If a range that is present on the fuller nodes splits, we can transition to a state like:

95 95 95 95 95 104 105 106 106 106

The nodes with 106 replicas are now overfull per the heuristics, and we'll have to rebalance off of them. Thankfully there are 5 acceptable targets, which means we'll perform 3 concurrent rebalances on the cluster. I'm pretty sure I'm seeing exactly this scenario on delta right now.
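To make the scenario concrete, here is a minimal sketch of the ±5% heuristic applied to the counts above. The `classify` function is illustrative, not the actual allocator code, and rounding the thresholds to whole replicas is an assumption made so that the [95, 105] target in the example works out:

```go
package main

import (
	"fmt"
	"math"
)

// classify returns the indexes of overfull and underfull nodes under the
// ±5% heuristic described above. Illustrative sketch only; rounding the
// thresholds to whole replicas is an assumption.
func classify(counts []int) (overfull, underfull []int) {
	total := 0
	for _, c := range counts {
		total += c
	}
	mean := float64(total) / float64(len(counts))
	hi := math.Round(mean * 1.05) // overfull above this
	lo := math.Round(mean * 0.95) // underfull below this
	for i, c := range counts {
		switch {
		case float64(c) > hi:
			overfull = append(overfull, i)
		case float64(c) < lo:
			underfull = append(underfull, i)
		}
	}
	return overfull, underfull
}

func main() {
	// 999 replicas across 10 nodes: steady state.
	before := []int{95, 95, 95, 95, 95, 104, 105, 105, 105, 105}
	// After a split of a range present on three of the fuller nodes.
	after := []int{95, 95, 95, 95, 95, 104, 105, 106, 106, 106}
	o, _ := classify(before)
	fmt.Println("overfull before split:", o) // none
	o, _ = classify(after)
	fmt.Println("overfull after split:", o) // the three nodes at 106
}
```

Before the split no node exceeds the rounded upper bound of 105; after the split the three nodes at 106 do, which is what triggers the rebalances.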

Balancing purely on range count is a bit unfortunate in this regard. If we were balancing on storage used there likely wouldn't be an issue, since a split doesn't actually consume more space.
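The point about storage-based balancing can be sketched in a few lines: a split replaces one range with two halves, so the bytes held on each node carrying replicas is unchanged, and a bytes-based fullness metric wouldn't move. The `rangeInfo` type and `split` function here are hypothetical, not CockroachDB's actual types:

```go
package main

import "fmt"

// rangeInfo is an illustrative stand-in for a range descriptor.
type rangeInfo struct {
	sizeBytes int64
}

// split replaces one range with two halves. The replica count on each
// node holding the range goes up by one, but the total bytes does not.
func split(r rangeInfo) (left, right rangeInfo) {
	half := r.sizeBytes / 2
	return rangeInfo{half}, rangeInfo{r.sizeBytes - half}
}

func main() {
	r := rangeInfo{sizeBytes: 64 << 20} // a 64 MiB range
	l, rt := split(r)
	// Total bytes on the node is preserved across the split.
	fmt.Println(l.sizeBytes+rt.sizeBytes == r.sizeBytes)
}
```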

Cc @cockroachdb/stability
