
sql: introduce BenchmarkWriter benchmark#76244

Merged
craig[bot] merged 1 commit into cockroachdb:master from Azhng:txn-id-writer-bench
Feb 16, 2022

Conversation

Contributor

@Azhng Azhng commented Feb 8, 2022

This commit introduces the new BenchmarkWriter benchmark to measure the
write performance for TxnID Cache.

Informs #76013

Release note: None

@Azhng Azhng requested review from a team and ajwerner February 8, 2022 17:49
@cockroach-teamcity
Member

This change is Reviewable

Comment on lines +75 to +79
for _, sinkType := range []string{"blackHoleSink", "realSink"} {
b.Run(fmt.Sprintf("sinkType=%s", sinkType), func(b *testing.B) {
for _, numOfConcurrentWriter := range []int{1, 24, 48, 64, 92, 128} {
b.Run(fmt.Sprintf("concurrentWriter=%d", numOfConcurrentWriter), func(b *testing.B) {
starter := make(chan struct{})
Contributor

@ajwerner ajwerner Feb 8, 2022

nit: I think you can clean it up a bit

run := func(b *testing.B, sink messageSink) {
    w := newWriter(sink, blockPool)
    starter := make(chan struct{})
    b.SetBytes(messageBlockSize * entrySize)
    var wg sync.WaitGroup
    for writerIdx := 0; writerIdx < numOfConcurrentWriter; writerIdx++ {
        // ...
    }
    b.ResetTimer()
    close(starter)
    wg.Wait()
}
for _, sinkType := range []struct {
    name string
    new  func() (_ messageSink, cleanup func())
}{
    {
        name: "blackHole",
        new: func() (messageSink, func()) {
            blackHole := newBlackHoleSink(blockPool, channelSize)
            blackHole.start()
            return blackHole, blackHole.stop
        },
    },
    // ...
} {
    // ...
}

Contributor Author

Done.


<-starter

for i := 0; i < b.N; i++ {
Contributor

it's probably interesting to divide b.N by numOfConcurrentWriter
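The suggestion can be sketched as follows; `runConcurrent` and its `write` callback are hypothetical stand-ins for illustration, not the benchmark's actual code. Dividing `b.N` across the writers keeps the total work at roughly `b.N` operations instead of `b.N` per writer:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// runConcurrent divides b.N across the writers so the benchmark still
// performs roughly b.N writes in total, rather than b.N per writer.
func runConcurrent(b *testing.B, numWriters int, write func()) {
	starter := make(chan struct{})
	opsPerWriter := b.N / numWriters
	var wg sync.WaitGroup
	for i := 0; i < numWriters; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-starter // hold every writer until all goroutines are spawned
			for j := 0; j < opsPerWriter; j++ {
				write()
			}
		}()
	}
	b.ResetTimer() // exclude goroutine setup from the measurement
	close(starter) // release all writers at once
	wg.Wait()
}

func main() {
	// testing.Benchmark lets the sketch run outside `go test`.
	var ops int64
	res := testing.Benchmark(func(b *testing.B) {
		runConcurrent(b, 4, func() { atomic.AddInt64(&ops, 1) })
	})
	fmt.Println("benchmark iterations:", res.N)
}
```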

Contributor Author

Done.

The updated benchmark number

$ benchstat ~/tmp/blackhole ~/tmp/real
name                            old time/op    new time/op    delta
Writer/concurrentWriter=1-24      72.2ns ± 0%   788.1ns ± 6%  +991.61%  (p=0.000 n=9+10)
Writer/concurrentWriter=24-24      117ns ±11%     227ns ± 9%   +94.19%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24      123ns ±16%     176ns ± 9%   +43.10%  (p=0.000 n=10+9)
Writer/concurrentWriter=64-24      122ns ± 2%     181ns ± 2%   +47.90%  (p=0.000 n=8+8)
Writer/concurrentWriter=92-24      142ns ± 8%     164ns ± 4%   +15.06%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24     160ns ± 7%     174ns ± 5%    +8.74%  (p=0.019 n=10+10)

name                            old speed      new speed      delta
Writer/concurrentWriter=1-24     340GB/s ± 0%    31GB/s ± 6%   -90.83%  (p=0.000 n=9+10)
Writer/concurrentWriter=24-24    211GB/s ±10%   108GB/s ± 9%   -48.61%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24    201GB/s ±14%   140GB/s ± 9%   -30.47%  (p=0.000 n=10+9)
Writer/concurrentWriter=64-24    201GB/s ± 2%   136GB/s ± 2%   -32.39%  (p=0.000 n=8+8)
Writer/concurrentWriter=92-24    173GB/s ± 8%   150GB/s ± 4%   -13.38%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24   154GB/s ± 7%   142GB/s ± 5%    -8.15%  (p=0.019 n=10+10)

name                            old alloc/op   new alloc/op   delta
Writer/concurrentWriter=1-24       0.00B        219.00B ± 3%     +Inf%  (p=0.000 n=10+9)
Writer/concurrentWriter=24-24      0.00B         22.60B ± 3%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24      0.00B         19.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=64-24      0.00B         19.00B ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=92-24      0.00B         17.60B ± 3%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24     0.00B         17.30B ± 4%     +Inf%  (p=0.000 n=10+10)

name                            old allocs/op  new allocs/op  delta
Writer/concurrentWriter=1-24        0.00           1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=24-24       0.00           1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24       0.00           1.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=64-24       0.00           0.00           ~     (all equal)
Writer/concurrentWriter=92-24       0.00           0.00           ~     (all equal)
Writer/concurrentWriter=128-24      0.00           0.00           ~     (all equal)

Contributor

I feel like this benchmark is not capturing the reality of what's going on in the kv benchmark. 1 extra microsecond of computation time would not lead to the 8% drop in kv0 throughput we're seeing. I think it's worth keeping in mind the ways in which microbenchmarks don't tell the full picture.

Did you try running the kv0 benchmark on a cluster and looking at profiles?

Contributor Author

CPU Profile

(screenshot)

Memory Profile

(screenshot)

Heap Profile

(screenshot)

I'm not too sure how to interpret the graphs here. It seems like a lot of time is spent in cache.UnorderedCache performing eviction, and there are some allocations performed in cache.UnorderedCache.

I'm not sure whether all the small allocations made by cache.UnorderedCache cause extra GC pressure. My reasoning here is that the kv workload is pushing nearly 20k QPS, and since we perform 2 writes per operation, we are performing 40k writes per second. That is a lot of small objects allocated on the heap (for the linked list used to perform FIFO eviction).

Contributor Author

I did another kv95 run after turning txnidcache.Cache into effectively a no-op sink. It performs on par with the baseline from before the TxnID cache was introduced.

Branch           elapsed   errors  ops(total)  ops/sec(cum)  avg(ms)  p50(ms)  p95(ms)  p99(ms)  pMax(ms)
baseline         2700.0s   0       55785736    20661.4       3.1      2.4      8.4      13.1     75.5
cache-normal     2700.0s   0       49867068    18469.3       3.5      2.5      10.0     17.8     671.1
cache-noop-sink  2700.0s   0       56440227    20903.8       3.1      2.4      8.1      13.1     71.3

I think at this point it's reasonable to believe that the slowdown comes from the sink, which couldn't keep up with the incoming writes, causing the channel to fill up, which resulted in back pressure and caused the concurrentWriteBuffer to wait.
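The back pressure described here is just Go's buffered-channel semantics; a minimal demonstration (the channel size and `tryWrite` helper are hypothetical, not the real sink's):

```go
package main

import "fmt"

// tryWrite models a writer flushing a block to the sink's channel. A
// plain `ch <- blk` would block once the buffer fills, which is the
// back pressure described above; select/default makes the stall visible.
func tryWrite(ch chan int, blk int) bool {
	select {
	case ch <- blk:
		return true
	default:
		return false // channel full: an unconditional send would stall here
	}
}

func main() {
	// A slow sink: nothing drains the channel in this sketch.
	sink := make(chan int, 2)
	for i := 0; i < 4; i++ {
		fmt.Printf("write %d accepted: %v\n", i, tryWrite(sink, i))
	}
}
```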

I think at this point there are a few options:

  1. have multiple cache.UnorderedCache instances, each with its own goroutine / channel.
  2. try out using a ring buffer. This might introduce slightly slower reads, but since reads don't happen during the query execution hot path, I think it's an acceptable trade-off.
  3. maybe both?

I will try out these ideas tomorrow.
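Option 2 could look roughly like the following; this is a hypothetical sketch of a fixed-capacity, allocation-free FIFO, not the implementation that eventually landed:

```go
package main

import "fmt"

// record is a stand-in for a (TxnID, TxnFingerprintID) pair.
type record struct {
	txnID         uint64
	fingerprintID uint64
}

// ringCache is a fixed-capacity FIFO: once full, each write overwrites
// the oldest entry in place, so the steady-state write path performs
// zero heap allocations.
type ringCache struct {
	buf  []record
	head int // index of the next slot to overwrite
	full bool
}

func newRingCache(capacity int) *ringCache {
	return &ringCache{buf: make([]record, capacity)}
}

func (c *ringCache) add(r record) {
	c.buf[c.head] = r
	c.head++
	if c.head == len(c.buf) {
		c.head = 0
		c.full = true
	}
}

// lookup scans the buffer; reads are O(n), which is the trade-off noted
// above: acceptable because reads are off the query execution hot path.
func (c *ringCache) lookup(txnID uint64) (uint64, bool) {
	n := c.head
	if c.full {
		n = len(c.buf)
	}
	for i := 0; i < n; i++ {
		if c.buf[i].txnID == txnID {
			return c.buf[i].fingerprintID, true
		}
	}
	return 0, false
}

func main() {
	c := newRingCache(2)
	c.add(record{txnID: 1, fingerprintID: 10})
	c.add(record{txnID: 2, fingerprintID: 20})
	c.add(record{txnID: 3, fingerprintID: 30}) // evicts txnID=1
	_, ok := c.lookup(1)
	fmt.Println("txnID 1 evicted:", !ok)
}
```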

Contributor Author

Here is the diff I introduced:

diff --git a/pkg/sql/contention/txnidcache/txn_id_cache.go b/pkg/sql/contention/txnidcache/txn_id_cache.go
index 9384ed7044..e335ec6e39 100644
--- a/pkg/sql/contention/txnidcache/txn_id_cache.go
+++ b/pkg/sql/contention/txnidcache/txn_id_cache.go
@@ -171,14 +171,15 @@ func NewTxnIDCache(st *cluster.Settings, metrics *Metrics) *Cache {
 // Start implements the Provider interface.
 func (t *Cache) Start(ctx context.Context, stopper *stop.Stopper) {
 	addBlockToStore := func(msgBlock *messageBlock) {
-		t.mu.Lock()
-		defer t.mu.Unlock()
-		for blockIdx := range msgBlock {
-			if !msgBlock[blockIdx].valid() {
-				break
-			}
-			t.mu.store.Add(msgBlock[blockIdx].TxnID, msgBlock[blockIdx].TxnFingerprintID)
-		}
+		// TODO(azhng): wip: imagine if we are using a blackhoel sink
+		// t.mu.Lock()
+		// defer t.mu.Unlock()
+		// for blockIdx := range msgBlock {
+		// 	if !msgBlock[blockIdx].valid() {
+		// 		break
+		// 	}
+		// 	t.mu.store.Add(msgBlock[blockIdx].TxnID, msgBlock[blockIdx].TxnFingerprintID)
+		// }
 	}
 
 	consumeBlock := func(b *messageBlock) {

@Azhng Azhng force-pushed the txn-id-writer-bench branch from 9a32847 to 193c423 on February 8, 2022 19:59
Contributor

@ajwerner ajwerner left a comment

LGTM

Comment on lines +111 to +126
name: "blackHoleSink",
new: func() (_ messageSink, _ *sync.Pool, cleanup func()) {
blockPool := &sync.Pool{
New: func() interface{} {
return &messageBlock{}
},
}

blackHole := newBlackHoleSink(blockPool, channelSize)
blackHole.start()

return blackHole, blockPool, blackHole.stop
},
},
{
name: "realSink",
Contributor

nit: you don't need the word Sink in these names, you already have sink=

Contributor

@maryliag maryliag left a comment

just one tiny nit
:lgtm:

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)


pkg/sql/contention/txnidcache/writer_test.go, line 30 at r2 (raw file):

	pool *sync.Pool

	// Simulate a real sink

nit: period

@Azhng Azhng force-pushed the txn-id-writer-bench branch 2 times, most recently from 7857ad0 to edbbefb on February 9, 2022 21:54
Contributor Author

Azhng commented Feb 9, 2022

Chatted with @ajwerner offline; updated the benchmark to populate the TxnFingerprintID field in order to defeat Golang's small-integer optimization. However, instead of pre-populating a slice of randomly generated inputs, I just use the index value to generate unique input. This is because I noticed that, when the same value is inserted more than once, subsequent insertions might not create an allocation (which is unrealistic in the actual workload).
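The small-integer behavior referred to here can be demonstrated directly; a minimal illustration (`boxAllocs` is a hypothetical helper) of the Go runtime's static table for boxed integer values below 256:

```go
package main

import (
	"fmt"
	"testing"
)

var sink interface{} // package-level so the conversion must escape

// boxAllocs reports how many heap allocations it takes to box v into
// an interface. The Go runtime keeps a static table for values 0-255,
// so boxing small integers allocates nothing; this is how repeated
// inserts of the same small value can hide allocations in a benchmark.
func boxAllocs(v uint64) float64 {
	return testing.AllocsPerRun(1000, func() {
		sink = v
	})
}

func main() {
	fmt.Println("small value allocs:", boxAllocs(7))
	fmt.Println("large value allocs:", boxAllocs(1<<40))
}
```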

The new benchmark yields a significantly bigger drop in throughput when the real sink is used.

$ benchstat ~/tmp/blackhole ~/tmp/real
name                            old time/op    new time/op    delta
Writer/concurrentWriter=1-24      76.7ns ± 2%   786.8ns ± 7%  +925.35%  (p=0.000 n=10+10)
Writer/concurrentWriter=24-24      109ns ± 1%     760ns ±15%  +596.05%  (p=0.000 n=8+10)
Writer/concurrentWriter=48-24      116ns ± 3%     735ns ±12%  +535.04%  (p=0.000 n=8+10)
Writer/concurrentWriter=64-24      121ns ± 3%     775ns ± 8%  +538.66%  (p=0.000 n=8+10)
Writer/concurrentWriter=92-24      136ns ± 5%     790ns ± 4%  +480.75%  (p=0.000 n=9+10)
Writer/concurrentWriter=128-24     161ns ±12%     782ns ± 6%  +384.94%  (p=0.000 n=10+10)

name                            old speed      new speed      delta
Writer/concurrentWriter=1-24     320GB/s ± 2%    31GB/s ± 7%   -90.23%  (p=0.000 n=10+10)
Writer/concurrentWriter=24-24    225GB/s ± 1%    33GB/s ±17%   -85.53%  (p=0.000 n=8+10)
Writer/concurrentWriter=48-24    212GB/s ± 3%    34GB/s ±13%   -84.18%  (p=0.000 n=8+10)
Writer/concurrentWriter=64-24    203GB/s ± 2%    32GB/s ± 7%   -84.32%  (p=0.000 n=8+10)
Writer/concurrentWriter=92-24    181GB/s ± 5%    31GB/s ± 3%   -82.78%  (p=0.000 n=9+10)
Writer/concurrentWriter=128-24   153GB/s ±11%    31GB/s ± 6%   -79.43%  (p=0.000 n=10+10)

name                            old alloc/op   new alloc/op   delta
Writer/concurrentWriter=1-24       0.00B        238.67B ± 2%     +Inf%  (p=0.000 n=10+9)
Writer/concurrentWriter=24-24      0.00B        239.12B ± 2%     +Inf%  (p=0.000 n=10+8)
Writer/concurrentWriter=48-24      0.00B        241.88B ± 2%     +Inf%  (p=0.000 n=10+8)
Writer/concurrentWriter=64-24      0.00B        239.20B ± 2%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=92-24      0.00B        238.00B ± 1%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24     0.00B        237.90B ± 2%     +Inf%  (p=0.000 n=10+10)

name                            old allocs/op  new allocs/op  delta
Writer/concurrentWriter=1-24        0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=24-24       0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=48-24       0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=64-24       0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=92-24       0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)
Writer/concurrentWriter=128-24      0.00           2.00 ± 0%     +Inf%  (p=0.000 n=10+10)

@Azhng Azhng force-pushed the txn-id-writer-bench branch from edbbefb to 00fd06a on February 14, 2022 16:39
craig bot pushed a commit that referenced this pull request Feb 16, 2022
76350: sql: introduce new FIFO cache for txnID cache r=Azhng a=Azhng

Reviewer note: first commit belongs to #76244

---

Previously, txnID cache relied on cache.UnorderedCache for storage and
FIFO eviction behavior. However, due to unintended allocation behavior
within cache.UnorderedCache, this yielded about 20% throughput drop in
kv95 benchmark.

This commit introduces a new FIFO cache, built to remove all allocation
during write operation. This commit removes all performance hit caused
by the previous use of cache.UnorderedCache.

Resolves #76013

Release note: None

76590: sql/catalog,migrations: propagate HasPostDeserializationChanges, simplify migrations r=ajwerner a=ajwerner

This is an imperfect change that feels a bit ad-hoc for my liking, but does
improve the situation. The problem is that in recent changes, we no longer
just retrieve mutable descriptors and then sometimes clone Immutable
descriptors. Instead we create the more general Immutable descriptors and
clone out mutables. The problem is that when we did this, we threw away
information regarding whether the descriptor itself had been modified
by post-deserialization. In general, this is mostly an exercise in plumbing.
The key feature is that when we retrieve a Mutable descriptor and it was
changed, we need to know that. This change tests that.

There is some ad-hoc code to propagate isUncommitedVersion in various places
which I don't feel great about, but it also felt wrong not to. A follow-up
should come through to really nail down the properties here.

The existence of NewBuilder and the fact that it's called in various places
is another mechanism by which this information could be erased, but that's
less of a concern for me. This change makes it possible to simplify migrations
around descriptor upgrades.

### migrations: simplify descriptor migrations

We have two migrations which have the goal of iterating through the descriptors
and re-writing them if there are any changes due to the post-deserialization
logic. They were needlessly complex and low-level. Probably part of the reason
was that we were swallowing this information. This commit reworks them to use
shared library code.

Probably one of them could be eliminated altogether.

Release note: None

Co-authored-by: Azhng <archer.xn@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
@craig craig bot merged commit 00fd06a into cockroachdb:master Feb 16, 2022