
kvserver: allow reads on circuit-broken replicas #74799

@tbg


Touches #33007.

Is your feature request related to a problem? Please describe.
As of #71806, when a replica's circuit breaker trips, reads bounce on the breaker. This may be unnecessary in some cases: if there is a valid lease, there is a chance that the read would go through.

A read may still get stuck, though, since in-flight replication proposals hold latches that can block it.

Describe the solution you'd like

Serve reads "when it is possible" (i.e. when they don't get stuck).

Two solutions present themselves:

use two circuit breakers

In addition to the replication circuit breaker (the one we have today), add a latching circuit breaker that trips when latch acquisition takes a long time (or, alternatively, when a latch is held for a long time, which catches more problems but may also be more prone to false positives).

Both are checked on the write path, i.e. writes fail fast if either is tripped.
For reads, check only the latching breaker, i.e. if the replication circuit breaker is tripped, reads continue to be served (or at least attempted) until the latching breaker trips as well.
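
As a rough illustration of the check order described above, here is a minimal sketch. It is not the kvserver implementation: `breaker`, `replicaBreakers`, `checkWrite`, `checkRead` and the two error values are hypothetical names invented for this example, and how each breaker gets tripped (latch acquisition timeouts, stalled proposals) is left out entirely.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// Hypothetical error values, one per breaker.
var (
	errReplicationTripped = errors.New("replication circuit breaker tripped")
	errLatchingTripped    = errors.New("latching circuit breaker tripped")
)

// breaker is a deliberately tiny stand-in for a circuit breaker:
// it is either tripped or not.
type breaker struct {
	tripped atomic.Bool
}

func (b *breaker) Trip()         { b.tripped.Store(true) }
func (b *breaker) Reset()        { b.tripped.Store(false) }
func (b *breaker) Tripped() bool { return b.tripped.Load() }

// replicaBreakers bundles the two breakers proposed in the issue.
type replicaBreakers struct {
	replication breaker // the breaker that exists today
	latching    breaker // assumed to trip on slow latch acquisition
}

// checkWrite fails fast if either breaker is tripped.
func (r *replicaBreakers) checkWrite() error {
	if r.replication.Tripped() {
		return errReplicationTripped
	}
	if r.latching.Tripped() {
		return errLatchingTripped
	}
	return nil
}

// checkRead consults only the latching breaker, so reads are still
// attempted while only the replication breaker is tripped.
func (r *replicaBreakers) checkRead() error {
	if r.latching.Tripped() {
		return errLatchingTripped
	}
	return nil
}

func main() {
	var r replicaBreakers

	r.replication.Trip()
	fmt.Println("write:", r.checkWrite()) // write: replication circuit breaker tripped
	fmt.Println("read: ", r.checkRead())  // read:  <nil>

	r.latching.Trip()
	fmt.Println("read: ", r.checkRead()) // read:  latching circuit breaker tripped
}
```

The asymmetry is the whole point of the sketch: a tripped replication breaker alone never rejects a read, while a tripped latching breaker rejects both reads and writes.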

check transitive deps for writes in latching when breaker open

We stick with just one breaker. If a read is waiting for latches and the breaker trips (or is tripped to begin with), visit the transitive set of the read's latch dependencies and fail fast if it contains a mutation.

This is probably awkward to implement.
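
To make the idea concrete, the following is a minimal sketch of such a transitive check over a toy wait-for graph. It does not reflect the actual latch manager: the `request` type, its `isMutation` and `waitsOn` fields, and `dependsOnMutation` are all hypothetical, and the sketch assumes the dependency edges are readily available, which is likely where the real awkwardness lies.

```go
package main

import "fmt"

// request is a hypothetical node in a latch wait-for graph.
type request struct {
	name       string
	isMutation bool       // true if the request needs to replicate a write
	waitsOn    []*request // requests whose latches this one is waiting for
}

// dependsOnMutation walks the transitive set of dependencies of r and
// reports whether any of them is a mutation. With the breaker open, a
// read for which this returns true would fail fast instead of waiting.
func dependsOnMutation(r *request) bool {
	seen := map[*request]bool{r: true}
	stack := append([]*request(nil), r.waitsOn...)
	for len(stack) > 0 {
		cur := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[cur] {
			continue
		}
		seen[cur] = true
		if cur.isMutation {
			return true
		}
		stack = append(stack, cur.waitsOn...)
	}
	return false
}

func main() {
	put := &request{name: "put", isMutation: true}
	holder := &request{name: "holder", waitsOn: []*request{put}}
	blockedRead := &request{name: "read-1", waitsOn: []*request{holder}}

	idle := &request{name: "idle-holder"}
	freeRead := &request{name: "read-2", waitsOn: []*request{idle}}

	fmt.Println(dependsOnMutation(blockedRead)) // true: read-1 -> holder -> put
	fmt.Println(dependsOnMutation(freeRead))    // false: no mutation reachable
}
```

The `seen` set keeps the walk linear in the number of reachable requests and guards against revisiting shared dependencies.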

Describe alternatives you've considered

Additional context
Transitively computing latch dependencies also came up in the context of #65099 (comment).

Jira issue: CRDB-12271

Labels

C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
