Skip to content

kvserver: replace circuit breaker cancel goroutine with per-Replica registry#75224

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:repl-registry
Feb 4, 2022
Merged

kvserver: replace circuit breaker cancel goroutine with per-Replica registry#75224
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:repl-registry

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Jan 20, 2022

PR #71806 originally landed with a very simple solution for the problem
of joining the breaker's cancellation chain with that of the incoming
context: spawning a goroutine. This was never meant to be the solution
in prod (spawning goroutines is not cheap) and this PR switches to a
real solution (which may still have soom room for improvement).

Replica circuit breakers are still off (#74705) and so this change
should not change any behavior, unless breakers are turned on. The
provided benchmarks show both the "end-to-end" Replica.Send as well as
the micro-bench breaker.{R,Un}egister perf and contrast it with that
of having the breakers disabled.

This sets the stage for evaluating and, if necessary, minimizing the
overhead, which (along with some additional end-to-end testing) then
allows us to turn breakers on by default (#74705).

BenchmarkReplicaCircuitBreakerSendOverhead measures Replica.Send of
an no-result single-point read. We see that we lose a few percent,
though this benchmark is pretty micro already.

BenchmarkReplicaCircuitBreaker_Register is even more micro, measuring
the overhead of making a cancel, and calling Register() followed by
Unregister(). The "old" measurement is essentially just creating a
cancelable context (so the +XXXX% numbers don't mean a lot; we're doing
work for which there was previously no analog). Here we can clearly see
that sharding the mutex map can help, though note that this would
already only be visible at the Replica.Send level at much higher
concurrencies than in this test (16).

I also ran the kv95 roachtest with what corresponds to the mutexmap-1
strategy above (i.e. a single mutex); raw data here. In this
configuration (relative to disabling breakers) kv95/enc=false/nodes=1
and kv95/enc=false/nodes=1/cpu=32 sustained a -1.62% hit (in qps,
median over five runs). Once we've chosen a suitable shard count, I
expect that to drop a bit as well, and think we can turn on breakers in
production, at least from a performance point of view.

old = disabled, new = enabled:

ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.63µs ± 4%    1.77µs ± 5%     +8.38%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.66µs ± 4%    1.75µs ± 5%     +5.74%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.63µs ± 6%    1.75µs ± 4%     +7.48%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.58µs ± 7%    1.73µs ± 4%     +9.47%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.57µs ± 4%    1.62µs ± 6%     +3.19%  (p=0.046 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.56µs ± 5%    1.63µs ± 7%     +4.20%  (p=0.007 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.53µs ± 5%    1.62µs ± 8%     +6.01%  (p=0.001 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.55µs ± 7%    1.61µs ± 5%     +3.50%  (p=0.049 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.57µs ± 7%    1.62µs ± 6%     +3.53%  (p=0.042 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.53µs ± 7%    1.64µs ± 5%     +6.84%  (p=0.000 n=11+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.42kB ± 0%    1.44kB ± 0%     +1.25%  (p=0.000 n=7+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.42kB ± 0%    1.44kB ± 0%     +1.19%  (p=0.000 n=9+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.42kB ± 0%    1.43kB ± 0%     +1.24%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.42kB ± 0%    1.43kB ± 0%     +1.08%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=10+10)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.42kB ± 0%    1.43kB ± 0%     +1.12%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.42kB ± 0%    1.43kB ± 0%     +1.09%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.41kB ± 0%    1.43kB ± 0%     +1.13%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.41kB ± 0%    1.43kB ± 0%     +1.23%  (p=0.000 n=9+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-1-16        30.8ns ±21%   787.6ns ± 2%  +2457.74%  (p=0.000 n=11+11)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16        31.6ns ± 3%   782.0ns ± 2%  +2376.38%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16        31.0ns ± 0%   778.8ns ± 2%  +2409.61%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16        31.0ns ± 0%   775.8ns ± 1%  +2403.11%  (p=0.000 n=10+10)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16       31.0ns ± 1%   288.8ns ± 2%   +833.06%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16       31.1ns ± 1%   324.6ns ± 4%   +945.05%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16       31.1ns ± 0%   193.3ns ± 2%   +522.26%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16       31.0ns ± 0%   286.1ns ± 1%   +822.80%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16       31.0ns ± 0%   194.4ns ± 2%   +527.12%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16       31.0ns ± 1%   121.3ns ± 2%   +291.01%  (p=0.000 n=10+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)

Created via:

$ ./dev bench ./pkg/kv/kvserver/ -f BenchmarkReplicaCircuitBreaker --count 10 | tee out.txt
$ for f in enabled=true enabled=false; do grep -F "${f}" out.txt | sed "s/${f}/X/" > "${f}"; done && benchstat 'enabled=false' 'enabled=true'

For the record, I also tried a sync.Map, but it was vastly inferior
for this use case as it is allocation-heavy and its writes are fairly
slow.

Touches #33007.
Fixes #74707.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@tbg tbg force-pushed the repl-registry branch 5 times, most recently from 15b2d70 to 1c1f5d5 Compare January 27, 2022 14:58
@tbg tbg force-pushed the repl-registry branch 5 times, most recently from 2c7d1cd to 8c06818 Compare January 31, 2022 13:04
@tbg tbg marked this pull request as ready for review January 31, 2022 13:16
@tbg tbg requested a review from a team as a code owner January 31, 2022 13:16
@tbg tbg requested a review from erikgrinaker January 31, 2022 13:16
@tbg tbg force-pushed the repl-registry branch 2 times, most recently from b3602a6 to 412a65a Compare January 31, 2022 13:24
Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, only minor comments. Performance-wise, I think this is about as good as one can expect for an initial implementation, although I am a bit surprised that it shows up that prominently in kv95.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Feb 3, 2022

No need to re-review just yet, I'll ping you.

…egistry

PR cockroachdb#71806 originally landed with a very simple solution for the problem
of joining the breaker's cancellation chain with that of the incoming
context: spawning a goroutine. This was never meant to be the solution
in prod (spawning goroutines is not cheap) and this PR switches to a
real solution (which may still have soom room for improvement).

Replica circuit breakers are still off (cockroachdb#74705) and so this change
should not change any behavior, unless breakers are turned on. The
provided benchmarks show both the "end-to-end" `Replica.Send` as well as
the micro-bench `breaker.{R,Un}egister` perf and contrast it with that
of having the breakers disabled.

This sets the stage for evaluating and, if necessary, minimizing the
overhead, which (along with some additional end-to-end testing) then
allows us to turn breakers on by default (cockroachdb#74705).

`BenchmarkReplicaCircuitBreakerSendOverhead` measures `Replica.Send` of
an no-result single-point read. We see that we lose a few percent,
though this benchmark is pretty micro already.

`BenchmarkReplicaCircuitBreaker_Register` is even more micro, measuring
the overhead of making a cancel, and calling `Register()` followed by
`Unregister()`. The "old" measurement is essentially just creating a
cancelable context (so the +XXXX% numbers don't mean a lot; we're doing
work for which there was previously no analog). Here we can clearly see
that sharding the mutex map can help, though note that this would
already only be visible at the `Replica.Send` level at much higher
concurrencies than in this test (16).

I also ran the kv95 roachtest with what corresponds to the `mutexmap-1`
strategy above (i.e. a single mutex); [raw data here]. In this
configuration (relative to disabling breakers) `kv95/enc=false/nodes=1`
and `kv95/enc=false/nodes=1/cpu=32` sustained a -1.62% hit (in qps,
median over five runs). Once we've chosen a suitable shard count, I
expect that to drop a bit as well, and think we can turn on breakers in
production, at least from a performance point of view.

[raw data here]: https://gist.github.com/tbg/44b53a10d18490a6dabd57e1d1d3225e

old = disabled, new = enabled:

```
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.63µs ± 4%    1.77µs ± 5%     +8.38%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.66µs ± 4%    1.75µs ± 5%     +5.74%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.63µs ± 6%    1.75µs ± 4%     +7.48%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.58µs ± 7%    1.73µs ± 4%     +9.47%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.57µs ± 4%    1.62µs ± 6%     +3.19%  (p=0.046 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.56µs ± 5%    1.63µs ± 7%     +4.20%  (p=0.007 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.53µs ± 5%    1.62µs ± 8%     +6.01%  (p=0.001 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.55µs ± 7%    1.61µs ± 5%     +3.50%  (p=0.049 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.57µs ± 7%    1.62µs ± 6%     +3.53%  (p=0.042 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.53µs ± 7%    1.64µs ± 5%     +6.84%  (p=0.000 n=11+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.42kB ± 0%    1.44kB ± 0%     +1.25%  (p=0.000 n=7+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.42kB ± 0%    1.44kB ± 0%     +1.19%  (p=0.000 n=9+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.42kB ± 0%    1.43kB ± 0%     +1.24%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.42kB ± 0%    1.43kB ± 0%     +1.08%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=10+10)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.42kB ± 0%    1.43kB ± 0%     +1.12%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.42kB ± 0%    1.43kB ± 0%     +1.09%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.41kB ± 0%    1.43kB ± 0%     +1.13%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.41kB ± 0%    1.43kB ± 0%     +1.23%  (p=0.000 n=9+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
```

```
ReplicaCircuitBreaker_Register/X/mutexmap-1-16        30.8ns ±21%   787.6ns ± 2%  +2457.74%  (p=0.000 n=11+11)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16        31.6ns ± 3%   782.0ns ± 2%  +2376.38%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16        31.0ns ± 0%   778.8ns ± 2%  +2409.61%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16        31.0ns ± 0%   775.8ns ± 1%  +2403.11%  (p=0.000 n=10+10)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16       31.0ns ± 1%   288.8ns ± 2%   +833.06%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16       31.1ns ± 1%   324.6ns ± 4%   +945.05%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16       31.1ns ± 0%   193.3ns ± 2%   +522.26%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16       31.0ns ± 0%   286.1ns ± 1%   +822.80%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16       31.0ns ± 0%   194.4ns ± 2%   +527.12%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16       31.0ns ± 1%   121.3ns ± 2%   +291.01%  (p=0.000 n=10+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
```

Created via:

```
$ ./dev bench ./pkg/kv/kvserver/ -f BenchmarkReplicaCircuitBreaker --count 10 | tee out.txt
$ for f in enabled=true enabled=false; do grep -F "${f}" out.txt | sed "s/${f}/X/" > "${f}"; done && benchstat 'enabled=false' 'enabled=true'
```

For the record, I also tried a `sync.Map`, but it was vastly inferior
for this use case as it is allocation-heavy and its writes are fairly
slow.

Touches cockroachdb#33007.
Fixes cockroachdb#74707.

Release note: None
@tbg
Copy link
Copy Markdown
Member Author

tbg commented Feb 4, 2022

RFAL.

Copy link
Copy Markdown
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Feb 4, 2022

TFTR!

bors r=erikgrinaker

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Feb 4, 2022

Build succeeded:

@craig craig bot merged commit fdf9a8f into cockroachdb:master Feb 4, 2022
@tbg tbg deleted the repl-registry branch February 4, 2022 12:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kvserver: replace circuit breaker cancel goroutine with per-Replica registry

3 participants