kvflowcontroller: eliminate mutex contention #109170
craig[bot] merged 1 commit into cockroachdb:master
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
aadityasondhi
left a comment
Looks good to me. So the "sharding" happens now since you only lock the individual bucket and not the entire controller?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)
sumeerbhola
left a comment
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 70 at r1 (raw file):
// streams get closed permanently (tenants get deleted, nodes removed)
// or when completely inactive (no tokens deducted/returned over 30+
// minutes), clear these out.
IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.
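To make the race being ruled out concrete, here is a purely hypothetical interleaving (the names are stand-ins; this PR never GCs buckets, so none of this can happen in the code as merged):

```go
// Hypothetical interleaving if buckets were ever GC'd and recreated:
//
//	goroutine A: b := c.mu.buckets[s]        // fetch bucket for stream s
//	GC:          delete(c.mu.buckets, s)     // bucket removed
//	goroutine B: c.mu.buckets[s] = &bucket{} // recreated for the same stream
//	goroutine A: b.mu.Lock(); ...            // adjusts tokens on the stale bucket
//
// Tokens deducted from or returned to the orphaned bucket would never be
// reflected in the stream's live bucket.
```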
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 162 at r1 (raw file):
for _, b := range c.mu.buckets {
	b.mu.Lock()
	sum += int64(b.tokensLocked(wc))
	b.mu.Unlock()
}
this can use tokens(wc)?
(force-pushed d1d8141 to d36d654)
(force-pushed d36d654 to 4d81146)
irfansharif
left a comment
So the "sharding" happens now since you only lock the individual bucket and not the entire controller?
Yes.
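Concretely, a minimal sketch of that per-bucket locking (Controller and bucket follow the names in pkg/kv/kvserver/kvflowcontrol/kvflowcontroller, but the fields and the adjust signature here are assumptions for illustration, not the PR's exact code):

```go
import "sync"

// Stream is a stand-in for kvflowcontrol.Stream (a tenant/store pair);
// the exact fields are assumptions for illustration.
type Stream struct{ TenantID, StoreID uint64 }

type bucket struct {
	mu struct {
		sync.Mutex // per-stream lock; token adjustments shard on this
		tokens     int64
	}
}

type Controller struct {
	mu struct {
		sync.Mutex // guards only the buckets map, not per-bucket token state
		buckets    map[Stream]*bucket
	}
}

func (c *Controller) adjust(s Stream, delta int64) {
	c.mu.Lock()
	b, ok := c.mu.buckets[s]
	if !ok {
		b = &bucket{}
		c.mu.buckets[s] = b
	}
	c.mu.Unlock() // short critical section: just the map access

	b.mu.Lock() // contention is now per-stream, not controller-wide
	b.mu.tokens += delta
	b.mu.Unlock()
}
```

Writers to different streams now contend only on the brief map lookup, not on each other's token adjustments.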
TFTR! bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 70 at r1 (raw file):
Previously, sumeerbhola wrote…
IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.
Yes. We can figure something out then - perhaps always grabbing a read lock when reading, or adding some synchronization state within each bucket that gets GC-ed along with it.
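A hypothetical sketch of that second option, reworking the earlier sketch (the gced flag and the retry loop are illustrations of the idea, not anything in this PR):

```go
type bucket struct {
	mu struct {
		sync.Mutex
		gced   bool // set under mu when a GC removes the bucket from the map
		tokens int64
	}
}

// adjust retries if it raced with a GC that removed and recreated the
// bucket for the same stream.
func (c *Controller) adjust(s Stream, delta int64) {
	for {
		c.mu.Lock()
		b, ok := c.mu.buckets[s]
		if !ok {
			b = &bucket{}
			c.mu.buckets[s] = b
		}
		c.mu.Unlock()

		b.mu.Lock()
		if b.mu.gced {
			b.mu.Unlock()
			continue // stale bucket; loop to pick up the replacement
		}
		b.mu.tokens += delta
		b.mu.Unlock()
		return
	}
}
```

The GC would set gced under the bucket's own mutex before deleting the map entry, so any adjustment that fetched the old pointer detects the removal and re-fetches.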
pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 162 at r1 (raw file):
Previously, sumeerbhola wrote…
this can use tokens(wc)?
Yes, done.
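Presumably the fix is a locking wrapper along these lines (a sketch; the real code uses admissionpb.WorkClass and kvflowcontrol.Tokens, and tracks tokens per work class):

```go
type WorkClass int8 // stand-in for admissionpb.WorkClass

// tokensLocked requires b.mu to be held; per-work-class state is elided
// in this sketch.
func (b *bucket) tokensLocked(wc WorkClass) int64 {
	return b.mu.tokens
}

// tokens is the locking variant: it acquires b.mu itself, so callers such
// as the metrics aggregation loop need not lock the bucket directly:
//
//	for _, b := range c.mu.buckets {
//		sum += b.tokens(wc)
//	}
func (b *bucket) tokens(wc WorkClass) int64 {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.tokensLocked(wc)
}
```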
bors r+
Build succeeded:
Fixes #105508.
Under kv0/enc=false/nodes=3/cpu=96 we observed significant mutex contention on kvflowcontroller.Controller.mu. We were using a single mutex to adjust flow tokens across all replication streams. There's a natural sharding available here - by replication stream - that eliminates the contention and fixes the throughput drop.
The kv0 test surfaced other opportunities for performance optimization (mutex contention, allocated objects, etc.) that we'll address in subsequent PRs.
Release note: None