sql: introduce BenchmarkWriter benchmark #76244
Conversation
```go
for _, sinkType := range []string{"blackHoleSink", "realSink"} {
	b.Run(fmt.Sprintf("sinkType=%s", sinkType), func(b *testing.B) {
		for _, numOfConcurrentWriter := range []int{1, 24, 48, 64, 92, 128} {
			b.Run(fmt.Sprintf("concurrentWriter=%d", numOfConcurrentWriter), func(b *testing.B) {
				starter := make(chan struct{})
```
nit: I think you can clean it up a bit:

```go
run := func(b *testing.B, sink messageSink) {
	w := newWriter(sink, blockPool)
	starter := make(chan struct{})
	b.SetBytes(messageBlockSize * entrySize)
	var wg sync.WaitGroup
	for writerIdx := 0; writerIdx < numOfConcurrentWriter; writerIdx++ {
		// ...
	}
	b.ResetTimer()
	close(starter)
	wg.Wait()
}
for _, sinkType := range []struct {
	name string
	new  func() (_ messageSink, cleanup func())
}{
	{
		name: "blackHole",
		new: func() (messageSink, func()) {
			blackHole := newBlackHoleSink(blockPool, channelSize)
			blackHole.start()
			return blackHole, blackHole.stop
		},
	},
	// ...
} {
	// ...
}
```
```go
<-starter
// ...
for i := 0; i < b.N; i++ {
```
It's probably interesting to divide `b.N` by `numOfConcurrentWriter`.
Done.
The updated benchmark numbers:

```
$ benchstat ~/tmp/blackhole ~/tmp/real
name                            old time/op    new time/op    delta
Writer/concurrentWriter=1-24      72.2ns ± 0%   788.1ns ± 6%  +991.61%  (p=0.000 n=9+10)
Writer/concurrentWriter=24-24      117ns ±11%     227ns ± 9%   +94.19%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24      123ns ±16%     176ns ± 9%   +43.10%  (p=0.000 n=10+9)
Writer/concurrentWriter=64-24      122ns ± 2%     181ns ± 2%   +47.90%  (p=0.000 n=8+8)
Writer/concurrentWriter=92-24      142ns ± 8%     164ns ± 4%   +15.06%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24     160ns ± 7%     174ns ± 5%    +8.74%  (p=0.019 n=10+10)

name                            old speed      new speed      delta
Writer/concurrentWriter=1-24     340GB/s ± 0%    31GB/s ± 6%   -90.83%  (p=0.000 n=9+10)
Writer/concurrentWriter=24-24    211GB/s ±10%   108GB/s ± 9%   -48.61%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24    201GB/s ±14%   140GB/s ± 9%   -30.47%  (p=0.000 n=10+9)
Writer/concurrentWriter=64-24    201GB/s ± 2%   136GB/s ± 2%   -32.39%  (p=0.000 n=8+8)
Writer/concurrentWriter=92-24    173GB/s ± 8%   150GB/s ± 4%   -13.38%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24   154GB/s ± 7%   142GB/s ± 5%    -8.15%  (p=0.019 n=10+10)

name                            old alloc/op   new alloc/op   delta
Writer/concurrentWriter=1-24       0.00B       219.00B ± 3%     +Inf%  (p=0.000 n=10+9)
Writer/concurrentWriter=24-24      0.00B        22.60B ± 3%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24      0.00B        19.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=64-24      0.00B        19.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=92-24      0.00B        17.60B ± 3%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24     0.00B        17.30B ± 4%     +Inf%  (p=0.000 n=10+10)

name                            old allocs/op  new allocs/op  delta
Writer/concurrentWriter=1-24        0.00           1.00 ± 0%    +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=24-24       0.00           1.00 ± 0%    +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24       0.00           1.00 ± 0%    +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=64-24       0.00           0.00         ~      (all equal)
Writer/concurrentWriter=92-24       0.00           0.00         ~      (all equal)
Writer/concurrentWriter=128-24      0.00           0.00         ~      (all equal)
```
I feel like this benchmark is not capturing the reality of what's going on in the kv benchmark. 1 extra microsecond of computation time would not lead to the 8% drop in kv0 throughput we're seeing. I think it's worth keeping in mind the ways in which microbenchmarks don't tell the full picture.
Did you try running the kv0 benchmark on a cluster and looking at profiles?
- CPU Profile
- Memory Profile
- Heap Profile
I'm not too sure how to interpret the graphs here. It seems like a lot of time is spent in `cache.UnorderedCache` performing eviction, and there are some allocations performed in `cache.UnorderedCache`.
I'm not sure whether all the small allocations made by `cache.UnorderedCache` can cause more GC pressure. My reasoning here is that the kv workload is pushing nearly 20k QPS, and since we perform 2 writes per operation, we are performing 40k writes per second. That would be a lot of small objects allocated on the heap (used for the linked list that performs FIFO eviction).
I did another kv95 run by turning txnidcache.Cache into effectively a no-op sink. It actually performs on par with before TxnIDCache was introduced.
| Branch | elapsed | errors | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms) |
|---|---|---|---|---|---|---|---|---|---|
| baseline | 2700.0s | 0 | 55785736 | 20661.4 | 3.1 | 2.4 | 8.4 | 13.1 | 75.5 |
| cache-normal | 2700.0s | 0 | 49867068 | 18469.3 | 3.5 | 2.5 | 10.0 | 17.8 | 671.1 |
| cache-noop-sink | 2700.0s | 0 | 56440227 | 20903.8 | 3.1 | 2.4 | 8.1 | 13.1 | 71.3 |
I think at this point it's reasonable to believe that the slowdown comes from the sink, which couldn't keep up with the incoming writes. This caused the channel to fill up, which resulted in back pressure and caused the concurrentWriteBuffer to wait.
I think at this point there are a few options:
- have multiple `cache.UnorderedCache`s, where each cache has its own goroutine / channel.
- try out using the ring buffer. This might introduce slightly slower reads, but since reads don't happen during the query execution hot path, I think it's an acceptable trade-off.
- maybe both?
I will try out these two ideas tomorrow.
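The ring-buffer option could look something like this minimal sketch. The `entry` and `ringFIFO` types and the capacity are illustrative stand-ins, not the eventual implementation in #76350:

```go
package main

import "fmt"

// entry is a stand-in for one txnID -> fingerprint record.
type entry struct {
	txnID         uint64
	fingerprintID uint64
}

// ringFIFO is a fixed-capacity FIFO that evicts the oldest entry by
// overwriting it in place, so steady-state writes allocate nothing.
type ringFIFO struct {
	buf  []entry
	head int // next slot to overwrite
	size int
}

func newRingFIFO(capacity int) *ringFIFO {
	return &ringFIFO{buf: make([]entry, capacity)}
}

func (r *ringFIFO) add(e entry) {
	r.buf[r.head] = e // overwrite the oldest slot once full
	r.head = (r.head + 1) % len(r.buf)
	if r.size < len(r.buf) {
		r.size++
	}
}

// lookup is O(n); acceptable here because reads are off the hot path.
func (r *ringFIFO) lookup(txnID uint64) (uint64, bool) {
	for i := 0; i < r.size; i++ {
		if r.buf[i].txnID == txnID {
			return r.buf[i].fingerprintID, true
		}
	}
	return 0, false
}

func main() {
	r := newRingFIFO(2)
	r.add(entry{1, 10})
	r.add(entry{2, 20})
	r.add(entry{3, 30}) // overwrites the slot that held txnID 1
	_, ok := r.lookup(1)
	fmt.Println(ok) // false: the oldest entry was evicted in place
}
```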
Here is the diff I introduced:

```diff
diff --git a/pkg/sql/contention/txnidcache/txn_id_cache.go b/pkg/sql/contention/txnidcache/txn_id_cache.go
index 9384ed7044..e335ec6e39 100644
--- a/pkg/sql/contention/txnidcache/txn_id_cache.go
+++ b/pkg/sql/contention/txnidcache/txn_id_cache.go
@@ -171,14 +171,15 @@ func NewTxnIDCache(st *cluster.Settings, metrics *Metrics) *Cache {
 // Start implements the Provider interface.
 func (t *Cache) Start(ctx context.Context, stopper *stop.Stopper) {
 	addBlockToStore := func(msgBlock *messageBlock) {
-		t.mu.Lock()
-		defer t.mu.Unlock()
-		for blockIdx := range msgBlock {
-			if !msgBlock[blockIdx].valid() {
-				break
-			}
-			t.mu.store.Add(msgBlock[blockIdx].TxnID, msgBlock[blockIdx].TxnFingerprintID)
-		}
+		// TODO(azhng): wip: imagine if we are using a blackhoel sink
+		// t.mu.Lock()
+		// defer t.mu.Unlock()
+		// for blockIdx := range msgBlock {
+		// 	if !msgBlock[blockIdx].valid() {
+		// 		break
+		// 	}
+		// 	t.mu.store.Add(msgBlock[blockIdx].TxnID, msgBlock[blockIdx].TxnFingerprintID)
+		// }
 	}
 	consumeBlock := func(b *messageBlock) {
```
Force-pushed from 9a32847 to 193c423.
```go
name: "blackHoleSink",
new: func() (_ messageSink, _ *sync.Pool, cleanup func()) {
	blockPool := &sync.Pool{
		New: func() interface{} {
			return &messageBlock{}
		},
	}

	blackHole := newBlackHoleSink(blockPool, channelSize)
	blackHole.start()

	return blackHole, blockPool, blackHole.stop
},
},
{
	name: "realSink",
```
nit: you don't need the word "Sink" in these names, you already have `sink=`
maryliag left a comment
Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/contention/txnidcache/writer_test.go, line 30 at r2 (raw file):

```go
pool *sync.Pool // Simulate a real sink
```

nit: period
Force-pushed from 7857ad0 to edbbefb.
Chatted with @ajwerner offline, updated the benchmark to populate […]. The new benchmark yields a significantly bigger drop in throughput when the real sink is used.
This commit introduces the new BenchmarkWriter benchmark to measure the
write performance for TxnID Cache.

Informs cockroachdb#76013

Release note: None
Force-pushed from edbbefb to 00fd06a.
76350: sql: introduce new FIFO cache for txnID cache r=Azhng a=Azhng

Reviewer note: first commit belongs to #76244

---

Previously, the txnID cache relied on `cache.UnorderedCache` for storage and FIFO eviction behavior. However, due to unintended allocation behavior within `cache.UnorderedCache`, this yielded about a 20% throughput drop in the kv95 benchmark.

This commit introduces a new FIFO cache, built to remove all allocation during write operations. This commit removes all of the performance hit caused by the previous use of `cache.UnorderedCache`.

Resolves #76013

Release note: None

76590: sql/catalog,migrations: propagate HasPostDeserializationChanges, simplify migrations r=ajwerner a=ajwerner

This is an imperfect change that feels a bit ad hoc for my liking, but it does improve the situation. The problem is that in recent changes, we no longer just retrieve mutable descriptors and then sometimes clone Immutable descriptors. Instead we create the more general Immutable descriptors and clone out mutables. The problem is that when we did this, we threw away information regarding whether the descriptor itself had been modified by post-deserialization.

In general, this is mostly an exercise in plumbing. The key feature is that when we retrieve a Mutable descriptor and it was changed, we need to know that. This change tests that. There is some ad-hoc code to propagate isUncommitedVersion in various places which I don't feel great about, but it also felt wrong not to. A follow-up should come through to really nail down the properties here. The existence of NewBuilder and the fact that it's called in various places is another mechanism by which this information could be erased, but that's less of a concern for me.

This change makes it possible to simplify migrations around descriptor upgrades.

### migrations: simplify descriptor migrations

We have two migrations whose goal is to iterate through the descriptors and re-write them if there are any changes due to the post-deserialization logic. They were needlessly complex and low-level. Probably part of the reason was that we were swallowing this information. This commit reworks them to use shared library code. Probably one of them could be eliminated altogether.

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
This commit introduces the new BenchmarkWriter benchmark to measure the
write performance for TxnID Cache.

Informs #76013

Release note: None