kvserver: allow quiescing with paused followers #84252
Closed
Labels
A-kv-replication: Relating to Raft, consensus, and coordination.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
E-starter: Might be suitable for a starter project for new employees or team members.
T-kv: KV Team
Is your feature request related to a problem? Please describe.
PR #83851 introduces the CRDB-level concept of "paused" followers, which are followers we're intentionally not replicating to. It mirrors the raft concept of the same name (followers that raft sends to only at a very low rate, for example during probing or when a follower doesn't respond to MsgApp).
At the time of writing, a leaseholder with a paused follower will not quiesce.
Describe the solution you'd like
Ranges should be able to quiesce, ignoring paused followers. When a store unpauses, all quiesced replicas that were treating a follower on that store as paused when they quiesced should unquiesce (to ensure the follower is promptly caught up).
This is very similar to what we already do for liveness; see cockroach/pkg/kv/kvserver/replica_raft_quiesce.go, lines 205 to 211 in 571bfa3.
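To make the proposal concrete, here is a minimal, self-contained sketch of the idea in Go. It is a toy model and not the kvserver code: the `replica` type, its fields, and the `maybeQuiesce` and `onStoreUnpaused` helpers are hypothetical names invented for illustration. The sketch assumes the leaseholder knows which stores are paused and whether each follower is caught up.

```go
package main

import "fmt"

// StoreID identifies a store. Hypothetical type for this sketch.
type StoreID int

// replica is a toy model of a leaseholder replica; it is not the real
// kvserver Replica struct.
type replica struct {
	followers []StoreID        // stores holding follower replicas
	caughtUp  map[StoreID]bool // whether each follower is fully caught up
	quiescent bool
	// pausedAtQuiesce remembers which followers were ignored because
	// they were paused at the moment the range quiesced.
	pausedAtQuiesce map[StoreID]struct{}
}

// maybeQuiesce quiesces the replica if every non-paused follower is
// caught up. Paused followers are skipped and remembered.
func (r *replica) maybeQuiesce(paused map[StoreID]bool) bool {
	ignored := make(map[StoreID]struct{})
	for _, f := range r.followers {
		if paused[f] {
			ignored[f] = struct{}{}
			continue
		}
		if !r.caughtUp[f] {
			return false
		}
	}
	r.quiescent = true
	r.pausedAtQuiesce = ignored
	return true
}

// onStoreUnpaused unquiesces the replica if the given store was one of
// the followers ignored at quiesce time, so it can be caught up.
func (r *replica) onStoreUnpaused(s StoreID) bool {
	if !r.quiescent {
		return false
	}
	if _, ok := r.pausedAtQuiesce[s]; !ok {
		return false
	}
	r.quiescent = false
	r.pausedAtQuiesce = nil
	return true
}

func main() {
	r := &replica{
		followers: []StoreID{2, 3},
		caughtUp:  map[StoreID]bool{2: true, 3: false},
	}
	paused := map[StoreID]bool{3: true}
	fmt.Println("quiesced:", r.maybeQuiesce(paused))
	fmt.Println("woken by store 2:", r.onStoreUnpaused(2))
	fmt.Println("woken by store 3:", r.onStoreUnpaused(3))
}
```

Recording the ignored followers at quiesce time means that only ranges affected by the unpausing store need to wake up, which is the property the proposal asks for.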
Describe alternatives you've considered
Additional context
Jira issue: CRDB-18409