kv: paused followers interact poorly with leader-not-leaseholder state #84884
Description
In a cluster that was overloaded due to an index backfill, I saw a large number of ranges serving foreground traffic begin hitting circuit breaker errors. Digging in deeper, it became clear that all ranges in this state had split leaders and leaseholders, and the leaseholders were on an overloaded node. Earlier in the test, I had set admission.kv.pause_replication_io_threshold to 0.8 to avoid replicating to overloaded followers.
The combination of paused replicas and the leader-not-leaseholder split effectively caused unavailability. Because the leaseholder's replica was a follower on the overloaded node, the leader had paused replication to it. So even though the leaseholder could propose writes, it would never hear about their results, and it would never acknowledge those writes to clients.
Should we allow the leaseholder to be paused?
On a related note, I noticed that while all nodes had a non-zero value for the admission.raft.paused_replicas metric, the overloaded node itself (purple) had a few spikes where it reported thousands of paused replicas. My understanding is that this metric is reported from the leader side, not the follower side, so I don't understand this. Is there any reason why an overloaded node would start pausing other replicas?
Also, the description of this metric says: "The count is emitted by the leaseholder of each range."
Should this instead say: "The count is emitted by the leader of each range."?
Jira issue: CRDB-17924
Epic CRDB-15069
