Recover replicas to a "good enough" store instead of the "best" store

To make decommissioning faster, we want the allocator to recover replicas to any store that is valid and has the same risk diversity as the best candidate store, even if it holds more ranges than other stores.

Today we try to pick one of the best stores, meaning one with a low range count. If, for example, one of the nodes is new, we see a thundering herd toward that node during decommissioning. By choosing a store that is good enough, we can use more stores as targets and complete the decommission process faster.
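To make the difference concrete, here is a minimal sketch of the two selection policies. It is not the actual allocator code: the `candidate` type, its fields, and the `goodEnoughStores` / `bestStores` functions are hypothetical names invented for illustration, and the diversity score is reduced to a single number.

```go
package main

import "fmt"

// candidate is a hypothetical, simplified view of a store as seen by the
// allocator. None of these names come from the real code base.
type candidate struct {
	storeID    int
	valid      bool    // satisfies constraints and is able to accept replicas
	diversity  float64 // higher means better risk diversity
	rangeCount int
}

// goodEnoughStores returns every valid candidate whose diversity equals the
// highest diversity among valid candidates, regardless of range count.
func goodEnoughStores(cands []candidate) []candidate {
	var out []candidate
	for _, c := range cands {
		if !c.valid {
			continue
		}
		switch {
		case len(out) == 0 || c.diversity > out[0].diversity:
			out = []candidate{c} // new top diversity: restart the set
		case c.diversity == out[0].diversity:
			out = append(out, c)
		}
	}
	return out
}

// bestStores narrows the good-enough set down to the candidates that also
// have the lowest range count.
func bestStores(cands []candidate) []candidate {
	var out []candidate
	for _, c := range goodEnoughStores(cands) {
		switch {
		case len(out) == 0 || c.rangeCount < out[0].rangeCount:
			out = []candidate{c}
		case c.rangeCount == out[0].rangeCount:
			out = append(out, c)
		}
	}
	return out
}

func main() {
	cands := []candidate{
		{storeID: 1, valid: true, diversity: 1.0, rangeCount: 900},
		{storeID: 2, valid: true, diversity: 1.0, rangeCount: 950},
		{storeID: 3, valid: true, diversity: 0.5, rangeCount: 100}, // worse diversity
		{storeID: 4, valid: true, diversity: 1.0, rangeCount: 10},  // newly added node
		{storeID: 5, valid: false, diversity: 1.0, rangeCount: 5},  // not a valid target
	}
	for _, c := range bestStores(cands) {
		fmt.Println("best target: store", c.storeID)
	}
	for _, c := range goodEnoughStores(cands) {
		fmt.Println("good enough target: store", c.storeID)
	}
}
```

In this made-up cluster the "best" policy funnels every recovered replica to the new store, while the "good enough" policy spreads them across all valid stores with equal diversity.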

The same applies to recovering from a dead node, but not to up-replicating. For example, if we change the default from 3 replicas to 5, the allocator will still try to place the 2 additional replicas on one of the best stores.

The downside is an increase in recovery cost and bandwidth: we may recover a replica to a store that is overloaded and later rebalance that same replica to a store that is underfull.

This effort is part of #85445.

Jira issue: CRDB-18657


Recover replicas to a "good enough" store instead of the "best" store #86265
