kvserver: provide a way for replicas to re-enter the quota pool #82403

@tbg

Describe the problem

Replicas can get "ignored" by the quota pool under certain conditions. This primarily happens when a node restarts: while the follower is catching up, we don't want it to stall foreground traffic, so the quota pool does not wait for it. However, there is no guarantee that the follower will ever catch up.

To Reproduce

I haven't actually done this, but if you take a write-heavy workload with a few hot ranges, take down a node for 1-2 minutes, then bring it up in a state in which it is slightly underprovisioned for the workload, it should forever lag behind, and the quota pool will not be helping it catch up.
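To make the scenario above concrete, here is a toy model (not CockroachDB code) of a follower that applies entries slightly slower than the leader appends them. All names and numbers are hypothetical, and the quota pool is simplified to "the leader may not get more than `quota` entries ahead of a tracked follower". When the follower is untracked (ignored), its lag grows without bound; when it is tracked, the leader is throttled and the lag stays bounded.

```go
package main

// LagAfter is a toy simulation, not the kvserver implementation.
// Each tick the leader tries to append appendRate entries and the
// follower applies up to applyRate entries. If tracked is true, the
// leader may only append while the follower's lag stays within quota
// (a simplified stand-in for quota pool back-pressure).
func LagAfter(initialLag, ticks, appendRate, applyRate, quota uint64, tracked bool) uint64 {
	lag := initialLag
	for i := uint64(0); i < ticks; i++ {
		admitted := appendRate
		if tracked {
			// Remaining room before the follower would exceed the quota.
			var room uint64
			if lag < quota {
				room = quota - lag
			}
			if admitted > room {
				admitted = room
			}
		}
		lag += admitted
		// The follower applies what it can this tick.
		if lag > applyRate {
			lag -= applyRate
		} else {
			lag = 0
		}
	}
	return lag
}
```

With an append rate of 10 and an apply rate of 9 entries per tick, `LagAfter(0, 100, 10, 9, 50, false)` keeps growing by one entry per tick, while the same call with `tracked` set to `true` never exceeds the assumed quota of 50.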

Expected behavior

Hard to formulate! There are different regimes. If a follower is behind and is "hopelessly slow", foreground traffic shouldn't slow down in response to it (see #79215). But if it's only marginally slower (making "good progress"), and perhaps slower only because it is a read-only satellite in a faraway region, etc, we need to slowly bring it back into circulation or AOST reads on this replica will fail forever (and, if it's a voter, availability will remain compromised forever since the replica has to catch up before it can make forward progress).
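One way to picture the regimes described above is a small classifier over periodic observations of a follower. This is only a sketch under assumptions: the `Sample` fields, the `maxLag` tolerance, and the `minPercent` threshold are all hypothetical and do not correspond to existing kvserver types or settings.

```go
package main

// Regime is a hypothetical classification of a follower's state.
type Regime int

const (
	CaughtUp       Regime = iota // close enough to the leader to be tracked normally
	GoodProgress                 // behind, but keeping up with most of the leader's rate
	HopelesslySlow               // too slow; foreground traffic should not wait for it
)

// Sample is an assumed observation over one measurement interval.
type Sample struct {
	LeaderAdvance   uint64 // entries the leader appended during the interval
	FollowerAdvance uint64 // entries the follower acknowledged during the interval
	Lag             uint64 // leader's last index minus follower's match index
}

// Classify returns CaughtUp when the lag is within maxLag. Otherwise it
// returns GoodProgress if the follower advanced at all and its advance
// is at least minPercent of the leader's advance, and HopelesslySlow if not.
func Classify(s Sample, maxLag, minPercent uint64) Regime {
	if s.Lag <= maxLag {
		return CaughtUp
	}
	// Integer comparison avoids floating point: follower/leader >= minPercent/100.
	if s.FollowerAdvance > 0 && s.FollowerAdvance*100 >= s.LeaderAdvance*minPercent {
		return GoodProgress
	}
	return HopelesslySlow
}
```

Under this sketch, a follower classified as `GoodProgress` would be the candidate for re-entering the quota pool, while a `HopelesslySlow` one would stay ignored. How re-entry should actually be paced is the open question of this issue.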

Additional context

I'm not sure we have struggled with this in practice, but it is a legitimate concern, and it becomes more important if, for #79215, we take an approach where the quota pool "temporarily" ignores followers that are overloaded (and stops sending appends to them). These nodes will "intentionally" fall behind, but nothing will ensure that they catch up once they have become healthy.

Jira issue: CRDB-16355

Labels

A-admission-control
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-kv: KV Team
