
kvserver: circuit-break requests to unavailable ranges#71806

Merged
craig[bot] merged 2 commits into cockroachdb:master from
tbg:circ-break-unavailable-ranges
Jan 14, 2022

Conversation


@tbg tbg commented Oct 21, 2021

Fixes #33007.
Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address
issue #33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from in-flight replication
typically causes an AmbiguousResultError, and while we could avoid it
in some cases we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseholderError.

For any request that made it in, if it enters replication but does not
manage to succeed within base.SlowRequestThreshold (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the existing
"slowness" detection (the informational "have been waiting ... for
proposal" warning in executeWriteBatch) was moved deeper into
replication (refreshProposalsLocked), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through executeWriteBatch) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not) refreshed
(#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).
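The slowness detection described above can be sketched in miniature like so. This is a hypothetical reduction, not the real `refreshProposalsLocked`: each proposal remembers the tick of its first insertion (which survives reproposals), and a refresh pass trips the breaker once any proposal has been pending longer than the threshold.

```go
package main

import "fmt"

// proposal is a stand-in for an in-flight Raft proposal.
type proposal struct {
	id            string
	createdAtTick int // tick of first insertion; not reset on reproposal
}

// breaker is a stand-in for the replica circuit breaker.
type breaker struct{ tripped bool }

func (b *breaker) Trip() { b.tripped = true }

// refreshProposals sketches the role refreshProposalsLocked plays here:
// scan in-flight proposals and trip the breaker for any that have been
// pending longer than the slow-replication threshold.
func refreshProposals(nowTick, thresholdTicks int, props []proposal, br *breaker) {
	for _, p := range props {
		if nowTick-p.createdAtTick > thresholdTicks {
			br.Trip()
		}
	}
}

func main() {
	br := &breaker{}
	props := []proposal{
		{id: "stuck", createdAtTick: 0},  // pending for 100 ticks
		{id: "fresh", createdAtTick: 90}, // pending for 10 ticks
	}
	refreshProposals(100, 75, props, br)
	fmt.Println("breaker tripped:", br.tripped)
}
```

Because the check lives at the replication layer, it covers lease requests and pipelined writes uniformly, as the text explains.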

A tripped breaker deploys a background probe which sends a
ProbeRequest (introduced in #72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the
kv.replica_circuit_breakers.enabled cluster setting while we determine
which of the many follow-up items to target for 22.1 (see #74705). For
example, the breaker-sensitive context cancelation is a toy
implementation that comes with too much of a performance overhead (one
extra goroutine per request); #74707 will address that.

Other things not done here that we certainly want in the 22.1 release
are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None


@tbg tbg force-pushed the circ-break-unavailable-ranges branch from 8e40068 to e784823 on November 8, 2021 22:18

tbg commented Nov 8, 2021

@cockroachdb/repl-prs this is still a draft but it is ready for a high-level review. There is a lot going on and I can easily pull 5+ pieces of this into separate PRs and am planning to do so. However, before I start that I would like to get both of you on board with the basic approach.

I think we've sufficiently discussed the circuit breaker API and are happy with it; I will rewrite the internals to avoid all of the unsafe stuff, so ignore the internals but give the API another look and voice any larger concerns.

For the NoopWrite, there is an alternative where we don't bypass the lease. I think bypassing the lease has worked out nicely and keeps NoopWrite simple and possibly reusable, but on the other hand the probe may pass while, for some unknown reason, a lease may never be acquired. My preference is to keep NoopWrite as is but make the probe use two stages, where it issues the noop write first and a lease-requiring write second; but possibly this is a bad idea and I need to sleep over it and talk it through.

@tbg tbg force-pushed the circ-break-unavailable-ranges branch from e784823 to 8d559f0 on November 9, 2021 11:39

@erikgrinaker erikgrinaker left a comment


Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)


pkg/util/circuit/circuitbreaker.go, line 28 at r1 (raw file):

// from Breaker.Err(), i.e. `errors.Is(err, ErrBreakerOpen()) can be
// used to check whether an error originated from some Breaker.
func ErrBreakerOpen() error {

nit: why not simply export the variable?


pkg/util/circuit/circuitbreaker.go, line 56 at r1 (raw file):

	Name         string
	ShouldTrip   func(err error) error
	AsyncProbe   func(report func(error), done func())

Why does this need to take a done closure? Since the probe is expected to keep running, can't we just clean up when the closure returns?

Ah, I suppose it's because it's the probe's responsibility to spawn the async task. Is there a reason for that, i.e. why can't the prober spawn the task? The only reason I can see is for the caller to pass in its context, but that seems like an antipattern anyway?

@tbg tbg force-pushed the circ-break-unavailable-ranges branch 2 times, most recently from 0215e1b to dbd4ce3 on November 18, 2021 14:16
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing cockroachdb#71806.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
These are initially for use in cockroachdb#71806 but were originally conceived of
for cockroachdb#70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR cockroachdb#71806 and
PR cockroachdb#70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None
@tbg tbg force-pushed the circ-break-unavailable-ranges branch from dbd4ce3 to 85486f2 on December 9, 2021 17:12
craig bot pushed a commit that referenced this pull request Dec 10, 2021
73639: kvserver: introduce sCfg.SlowReplicationThreshold r=erikgrinaker a=tbg

This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing #71806.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
tbg added a commit to tbg/cockroach that referenced this pull request Dec 10, 2021
These are initially for use in cockroachdb#71806 but were originally conceived of
for cockroachdb#70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR cockroachdb#71806 and
PR cockroachdb#70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Touches cockroachdb#33007.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Dec 14, 2021
These are initially for use in cockroachdb#71806 but were originally conceived of
for cockroachdb#70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR cockroachdb#71806 and
PR cockroachdb#70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Touches cockroachdb#33007.

Release note: None
craig bot pushed a commit that referenced this pull request Dec 14, 2021
73362: kv: don't unquiesce uninitialized replicas r=tbg a=nvanbenschoten

In a [support issue](https://github.com/cockroachlabs/support/issues/1340), we saw that 10s of thousands of uninitialized replicas were being ticked regularly and creating a large amount of background work on a node, driving up CPU. This commit updates the Raft quiescence logic to disallow uninitialized replicas from being unquiesced and Tick()'ing themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic avoids wasted work. We could Tick() these replicas, but doing so is unnecessary because uninitialized replicas can never win elections, so there is no reason for them to ever call an election. In fact, uninitialized replicas do not even know who their peers are, so there would be no way for them to call an election or for them to send any other non-reactive message. As a result, all work performed by an uninitialized replica is reactive and in response to incoming messages (see `processRequestQueue`).

There are multiple ways for an uninitialized replica to be created and then abandoned, and we don't do a good job garbage collecting them at a later point (see #73424), so it is important that they are cheap. Keeping them quiesced instead of letting them unquiesce and tick every 200ms indefinitely avoids a meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an unsuccessful snapshot no longer perform periodic background work, so they no longer have a non-negligible cost.

73641: circuit: add probing-based circuit breaker r=erikgrinaker a=tbg

These are initially for use in #71806 but were originally conceived of
for #70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR #71806 and
PR #70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None


73718: kv: pass roachpb.Header by pointer to DeclareKeysFunc r=nvanbenschoten a=nvanbenschoten

The `roachpb.Header` struct is up to 160 bytes in size. That's a little too
large to be passing by value repeatedly when doing so is easy to avoid. This
commit switches to passing roachpb.Header structs by pointer through the
DeclareKeysFunc implementations.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@tbg tbg force-pushed the circ-break-unavailable-ranges branch from 85486f2 to ba48f8b on December 23, 2021 16:17

tbg commented Dec 23, 2021

Spent some time on this again today. While putting together an end-to-end test (lose quorum, try to write, and see the fail-fast) I realized that we do need to make all parts of the read/write pipeline aware of the circuit breaker to avoid getting stuck.

I knew that we needed lease proposals to be sensitive to the breaker (so that you wouldn't get stuck trying to acquire a lease over and over when there wasn't a valid lease to begin with), but I wasn't aware of a much more basic problem: when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips.

Of course we don't want to plumb the circuit breaker everywhere; you'd really want to rely on context.Context for that sort of thing. I've hacked something together in the current version of the PR (as pushed right now) which I can polish into something that should get the job done for v1. Down the road, it would be interesting to explore #66884 as an ingredient here, but I don't want this to become a prerequisite.

@tbg tbg force-pushed the circ-break-unavailable-ranges branch from ba48f8b to d6c7cc6 on January 3, 2022 16:21
tc.RequireIsNotLeaseholderError(t, tc.Write(n2))
})

runCircuitBreakerTest(t, "leaseholder-unavailable", func(t *testing.T, ctx context.Context, tc *circuitBreakerTest) {


This should probably go into the next commit, "wip: add leaseholder-unavailable test".

@erikgrinaker

when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips. Of course we don't want to plumb the circuit breaker everywhere, you'd really want to rely on context.Context for that sort of thing

Ugh, this is somewhat annoying. Conceptually, cancelling the context in Send() when the breaker trips makes sense -- it's essentially what we want to happen. But the bookkeeping gets a bit unwieldy, as you point out. It'd be nice if we could inject a cancel trigger into the context somehow, without needing to keep track of the individual cancel functions.

I'll have to remind myself of how the latch guards are managed, but if the latches turn out to be the only problem here, is it terrible to come up with a solution at/below the Raft level that causes us to abandon proposals/commands and release latches without having to do context management higher up? Although I realize that the context approach generalizes better across failure modes.

@tbg tbg force-pushed the circ-break-unavailable-ranges branch from d6c7cc6 to 1671b8d on January 4, 2022 16:29

tbg commented Jan 5, 2022

Discussed this with Erik yesterday. Despite the potential perf overhead we decided that it was an important property that requests get canceled reliably on breaker tripping. This means making all ctxs cancelable and stashing the cancels. If we don't do that, we can sort of reliably refuse requests once the breaker has solidly tripped, but while the tripping is being detected, some requests might sneak in and get stuck on the latch manager (or really anywhere else that doesn't have a dedicated breaker check).

We didn't want to plumb the breaker signal into the latch manager, as context cancellation is the "Go way" of doing what we're trying to do. As we have a few times before, we were dissatisfied with what the Go ecosystem provides in terms of joining multiple cancellation chains in a performant way; there are always more allocations involved than seems reasonable (related: we don't use Stopper.WithCancelOnQuiesce liberally because it's also expensive-ish).

We are abandoning our initial ambition to let reads that can be served succeed: this is just hard to build correctly. The main reason for reads to get stuck (no lease) is handled adequately by only circuit breaking on writes (lease acquisition is a write), but it's trickier if there is a stuck in-flight write that then prevents reads from getting their latches. We'd basically have to tell the latch manager to fail-fast a read but only if it is transitively waiting on a write's latch (write latches have to stick around on unavailable ranges, even if the client has long been fail-fasted). The "transitively" here is not something the latch manager is even concerned with today, see #73916 for a related discussion.

So:

  • early in Replica.Send, check the breaker, make context cancelable and stash the cancel func somewhere on Replica
  • on break, have the probe cancel everyone.

tbg added a commit to tbg/cockroach that referenced this pull request Jan 24, 2022
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of cockroachdb#71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR cockroachdb#74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with cockroachdb#74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches cockroachdb#68303.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 24, 2022
74810: kvserver: remove unnecessary special casing of lease for LAI r=nvanbenschoten a=tbg

Touches #33007 (this simplifies #71806).

We were previously "not" assigning a LeaseAppliedIndex to lease
request proposals. But we were doing so very half-heartedly: instead of
always assigning zero, we assigned "whatever was assigned to the
previous command". This meant that in practice, the first lease on a
range would get zero, and any subsequent leases would get some nonzero
numbers. This wasn't particularly principled and raises eyebrows. For
testing convenience and clarity it is helpful to assign a zero MLAI
to leases, which is what this commit does.

We then turn to the special-casing in refreshProposalsLocked.

`(*Replica).refreshProposalsLocked` previously refused to repropose
commands that had a zero `AppliedLeaseIndex` (LAI). This was done on
grounds of needing to provide replay protection for these requests.
Upon inspection it became clear that a zero MLAI could only ever apply
to lease requests; they are the only proposals with an exception, and as
we saw above it would not usually result in a zero, but could (and
especially in testing, where most leases are the first lease on their
respective range).

We [discussed] this internally and concluded that leases can in fact be
reproposed, since they have their own replay protection mediated through
the lease's sequence number. This is great since as stated above, we did
in fact repropose (some) leases and have for years.

This commit removes the special casing.

Fixes #74711.

As suggested by @nvanbenschoten I ran the `kvserver` tests with this
diff that reproposes every lease request:

```diff
diff --git a/pkg/kv/kvserver/replica_proposal_buf.go b/pkg/kv/kvserver/replica_proposal_buf.go
index 95765d8..1b66745866 100644
--- a/pkg/kv/kvserver/replica_proposal_buf.go
+++ b/pkg/kv/kvserver/replica_proposal_buf.go
@@ -576,6 +576,11 @@ func (b *propBuf) FlushLockedWithRaftGroup(
 			ents = append(ents, raftpb.Entry{
 				Data: p.encodedCommand,
 			})
+			if p.Request.IsLeaseRequest() {
+				ents = append(ents, raftpb.Entry{
+					Data: p.encodedCommand,
+				})
+			}
 		}
 	}
 	if firstErr != nil {

```

This inspired the subsequent commits.

[discussed]: https://cockroachlabs.slack.com/archives/G01G8LK77DK/p1641559185021100

Release note: None

74923: sql: pre-split hash sharded indexes ranges before DELETE_AND_WRITE_ONLY r=chengxiong-ruan a=chengxiong-ruan

Fixes #74558

Pre-split ranges on shard boundaries before SchemaChanger move
new hash sharded indexes from DELETE_ONLY to DELETE_AND_WRITE_ONLY
state.

Release note (bug fix): None

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com>
tbg added a commit to tbg/cockroach that referenced this pull request Jan 25, 2022
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of cockroachdb#71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR cockroachdb#74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with cockroachdb#74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches cockroachdb#68303.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 25, 2022
75448: kvserver: use Background() in computeChecksumPostApply goroutine r=erikgrinaker a=tbg

On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of #71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR #74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with #74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches #68303.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
nicktrav pushed a commit that referenced this pull request Jan 28, 2022
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of #71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR #74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with #74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches #68303.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 31, 2022
…egistry

PR cockroachdb#71806 originally landed with a very simple solution for the problem
of joining the breaker's cancellation chain with that of the incoming
context: spawning a goroutine. This was never meant to be the solution
in prod (spawning goroutines is not cheap) and this PR switches to a
real solution (which may still have some room for improvement).

Replica circuit breakers are still off (cockroachdb#74705) and so this change
should not change any behavior, unless breakers are turned on. The
provided benchmarks show both the "end-to-end" `Replica.Send` as well as
the micro-bench `breaker.{R,Un}egister` perf and contrast it with that
of having the breakers disabled.

This sets the stage for evaluating and, if necessary, minimizing the
overhead, which (along with some additional end-to-end testing) then
allows us to turn breakers on by default (cockroachdb#74705).

`BenchmarkReplicaCircuitBreakerSendOverhead` measures `Replica.Send` of
a no-result single-point read. We see that we lose a few percent,
though this benchmark is pretty micro already.

`BenchmarkReplicaCircuitBreaker_Register` is even more micro, measuring
the overhead of making a cancel, and calling `Register()` followed by
`Unregister()`. The "old" measurement is essentially just creating a
cancelable context (so the +XXXX% numbers don't mean a lot; we're doing
work for which there was previously no analog). Here we can clearly see
that sharding the mutex map can help, though note that this would
already only be visible at the `Replica.Send` level at much higher
concurrencies than in this test (16).

I also ran the kv95 roachtest with what corresponds to the `mutexmap-1`
strategy above (i.e. a single mutex); [raw data here]. In this
configuration (relative to disabling breakers) `kv95/enc=false/nodes=1`
and `kv95/enc=false/nodes=1/cpu=32` sustained a -1.62% hit (in qps,
median over five runs). Once we've chosen a suitable shard count, I
expect that to drop a bit as well, and think we can turn on breakers in
production, at least from a performance point of view.

[raw data here]: https://gist.github.com/tbg/44b53a10d18490a6dabd57e1d1d3225e

old = disabled, new = enabled:

```
name                                                old time/op    new time/op    delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.63µs ± 4%    1.77µs ± 5%     +8.38%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.66µs ± 4%    1.75µs ± 5%     +5.74%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.63µs ± 6%    1.75µs ± 4%     +7.48%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.58µs ± 7%    1.73µs ± 4%     +9.47%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.57µs ± 4%    1.62µs ± 6%     +3.19%  (p=0.046 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.56µs ± 5%    1.63µs ± 7%     +4.20%  (p=0.007 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.53µs ± 5%    1.62µs ± 8%     +6.01%  (p=0.001 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.55µs ± 7%    1.61µs ± 5%     +3.50%  (p=0.049 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.57µs ± 7%    1.62µs ± 6%     +3.53%  (p=0.042 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.53µs ± 7%    1.64µs ± 5%     +6.84%  (p=0.000 n=11+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.42kB ± 0%    1.44kB ± 0%     +1.25%  (p=0.000 n=7+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.42kB ± 0%    1.44kB ± 0%     +1.19%  (p=0.000 n=9+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.42kB ± 0%    1.43kB ± 0%     +1.24%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.42kB ± 0%    1.43kB ± 0%     +1.08%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=10+10)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.42kB ± 0%    1.43kB ± 0%     +1.12%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.42kB ± 0%    1.43kB ± 0%     +1.09%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.41kB ± 0%    1.43kB ± 0%     +1.13%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.41kB ± 0%    1.43kB ± 0%     +1.23%  (p=0.000 n=9+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
```

```
name                                                old time/op   new time/op    delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16        30.8ns ±21%   787.6ns ± 2%  +2457.74%  (p=0.000 n=11+11)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16        31.6ns ± 3%   782.0ns ± 2%  +2376.38%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16        31.0ns ± 0%   778.8ns ± 2%  +2409.61%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16        31.0ns ± 0%   775.8ns ± 1%  +2403.11%  (p=0.000 n=10+10)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16       31.0ns ± 1%   288.8ns ± 2%   +833.06%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16       31.1ns ± 1%   324.6ns ± 4%   +945.05%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16       31.1ns ± 0%   193.3ns ± 2%   +522.26%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16       31.0ns ± 0%   286.1ns ± 1%   +822.80%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16       31.0ns ± 0%   194.4ns ± 2%   +527.12%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16       31.0ns ± 1%   121.3ns ± 2%   +291.01%  (p=0.000 n=10+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
```

Created via:

```
$ ./dev bench ./pkg/kv/kvserver/ -f BenchmarkReplicaCircuitBreaker --count 10 | tee out.txt
$ for f in enabled=true enabled=false; do grep -F "${f}" out.txt | sed "s/${f}/X/" > "${f}"; done && benchstat 'enabled=false' 'enabled=true'
```

For the record, I also tried a `sync.Map`, but it was vastly inferior
for this use case as it is allocation-heavy and its writes are fairly
slow.

Touches cockroachdb#33007.
Fixes cockroachdb#74707.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Feb 4, 2022
…egistry

craig bot pushed a commit that referenced this pull request Feb 4, 2022
75224: kvserver: replace circuit breaker cancel goroutine with per-Replica registry r=erikgrinaker a=tbg


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>