kvserver: allow reads on circuit-broken replicas #74799
Description
Touches #33007.
Is your feature request related to a problem? Please describe.
As of #71806, when a replica's circuit breaker trips, reads bounce on the breaker. This may be unnecessary in some cases: if there is a valid lease, the read might still go through.
It is not guaranteed to go through, though, because in-flight replication proposals hold latches that may block the read.
Describe the solution you'd like
Serve reads "when it is possible" (i.e. when they don't get stuck).
Two solutions present themselves:
use two circuit breakers
In addition to the replication circuit breaker (the one we have today), add a latching circuit breaker that trips when latch acquisition takes a long time. Alternatively, it could trip when a latch is held for a long time, which catches more problems but may also be more prone to false positives.
Both are checked on the write path, i.e. writes fail fast if either is tripped.
For reads, check only the latching breaker, i.e. if the replication circuit breaker is tripped, reads continue to be served (or at least attempted) until the latching breaker trips as well.
check transitive deps for writes in latching when breaker open
We stick with just one breaker. If a read is waiting for latches and the breaker trips (or is tripped to begin with), visit the transitive set of the read's latch dependencies and fail fast if it contains a mutation.
This is probably awkward to implement.
Describe alternatives you've considered
Additional context
Transitively computing latch dependencies also came up in the context of #65099 (comment).
Jira issue: CRDB-12271