Recover replicas to a "good enough" store instead of the "best" store #86265

@lidorcarmel

To make decommissioning faster, we want the allocator to recover replicas to any store that is valid and has the same risk diversity as the best candidate store, even if it holds more ranges than other stores.

Today we try to pick one of the best stores, meaning one with a low range count. If, for example, one of the nodes is new, most recovered replicas target that node and we see a thundering herd during decommissioning. By choosing a store that is merely good enough, we can use more stores as targets and complete the decommission process faster.
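To illustrate the difference between the two policies, here is a minimal sketch in Python. It is not the allocator's actual code: the `Store` type, its fields, and both function names are hypothetical, and "best" is simplified to "lowest range count among the most diverse valid stores", whereas the real allocator scores candidates in a more nuanced way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    # Hypothetical, simplified view of a candidate store.
    store_id: int
    valid: bool        # passes constraints, not full, etc.
    diversity: float   # higher means better risk diversity
    range_count: int


def _most_diverse(stores):
    """Valid stores that share the highest diversity score."""
    valid = [s for s in stores if s.valid]
    if not valid:
        return []
    top = max(s.diversity for s in valid)
    return [s for s in valid if s.diversity == top]


def best_candidates(stores):
    """Simplified 'best' policy: most diverse, then lowest range count."""
    diverse = _most_diverse(stores)
    if not diverse:
        return []
    fewest = min(s.range_count for s in diverse)
    return [s for s in diverse if s.range_count == fewest]


def good_enough_candidates(stores):
    """'Good enough' policy: most diverse, range count ignored."""
    return _most_diverse(stores)


stores = [
    Store(1, True, 1.0, 120),
    Store(2, True, 1.0, 115),
    Store(3, True, 0.5, 10),   # valid but less diverse
    Store(4, True, 1.0, 0),    # newly added node, no ranges yet
    Store(5, False, 1.0, 5),   # invalid, never a candidate
]

best_ids = [s.store_id for s in best_candidates(stores)]
good_ids = [s.store_id for s in good_enough_candidates(stores)]
print(best_ids)  # [4]
print(good_ids)  # [1, 2, 4]
```

With the simplified "best" policy every recovered replica is sent to the new store 4, which is the thundering herd described above. The "good enough" policy yields three targets, so recovery can proceed in parallel across them.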

The same applies to recovering from a dead node, but not to up-replicating. If, for example, we change the default from 3 replicas to 5, the allocator will still try to allocate the 2 additional replicas on one of the best stores.
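The rule for when each policy applies can be sketched as a small lookup. The enum names below are hypothetical and chosen only for this illustration; they are not the allocator's real action identifiers.

```python
from enum import Enum, auto


class Action(Enum):
    # Hypothetical action names for illustration only.
    REPLACE_DECOMMISSIONING = auto()  # replica lives on a decommissioning node
    REPLACE_DEAD = auto()             # replica lives on a dead node
    UPREPLICATE = auto()              # replication factor was raised


class Policy(Enum):
    BEST = auto()
    GOOD_ENOUGH = auto()


def selection_policy(action):
    """Recovery uses 'good enough'; everything else keeps 'best'."""
    if action in (Action.REPLACE_DECOMMISSIONING, Action.REPLACE_DEAD):
        return Policy.GOOD_ENOUGH
    return Policy.BEST


for action in Action:
    print(action.name, "->", selection_policy(action).name)
```

Under this sketch, raising the replica count from 3 to 5 maps to `UPREPLICATE`, so the 2 additional replicas are still placed using the "best" policy.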

The downside is an increase in recovery cost and bandwidth: we may recover a replica to a store that is overloaded and later rebalance that same replica to a store that is underfull, moving it twice.

This effort is part of #85445.

Jira issue: CRDB-18657

Labels

C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)