
kvflowcontroller: eliminate mutex contention #109170

Merged
craig[bot] merged 1 commit into cockroachdb:master from irfansharif:230821.kvflowcontrol-mutex
Aug 23, 2023

Conversation

@irfansharif
Contributor

Fixes #105508.

Under kv0/enc=false/nodes=3/cpu=96 we observed significant mutex contention on kvflowcontroller.Controller.mu. We were using a single mutex to adjust flow tokens across all replication streams. There's a natural sharding available here - by replication stream - that eliminates the contention and fixes the throughput drop.

The kv0 test surfaced other performance optimizations (mutex contention, allocated objects, etc.) that we'll address in subsequent PRs.

Release note: None

@irfansharif requested a review from a team as a code owner on August 21, 2023 19:08
@blathers-crl

blathers-crl bot commented Aug 21, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

Contributor

@aadityasondhi left a comment


Looks good to me. So the "sharding" happens now since you only lock the individual bucket and not the entire controller?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

Collaborator

@sumeerbhola left a comment


:lgtm:

Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 70 at r1 (raw file):

		// streams get closed permanently (tenants get deleted, nodes removed)
		// or when completely inactive (no tokens deducted/returned over 30+
		// minutes), clear these out.

IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 162 at r1 (raw file):

				for _, b := range c.mu.buckets {
					b.mu.Lock()
					sum += int64(b.tokensLocked(wc))

this can use tokens(wc)?

@irfansharif force-pushed the 230821.kvflowcontrol-mutex branch from d1d8141 to d36d654 on August 23, 2023 13:44
@irfansharif force-pushed the 230821.kvflowcontrol-mutex branch from d36d654 to 4d81146 on August 23, 2023 13:44
Contributor Author

@irfansharif left a comment


So the "sharding" happens now since you only lock the individual bucket and not the entire controller?

Yes.

TFTR! bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller.go line 70 at r1 (raw file):

Previously, sumeerbhola wrote…

IIUC, this per bucket mutex works trivially now because there is no concern that a bucket will be GC'd and then recreated for the same stream (since buckets are never GC'd), resulting in a race where tokens are added/subtracted from the stale bucket.

Yes. We can figure something out then - perhaps always grabbing a read lock when reading, or adding some synchronization state within each bucket that gets GC-ed along with it.


pkg/kv/kvserver/kvflowcontrol/kvflowcontroller/kvflowcontroller_metrics.go line 162 at r1 (raw file):

Previously, sumeerbhola wrote…

this can use tokens(wc)?

Yes, done.

@irfansharif
Contributor Author

bors r+

@craig
Contributor

craig bot commented Aug 23, 2023

Build succeeded:

@craig (bot) merged commit 34fb6d7 into cockroachdb:master on Aug 23, 2023
@irfansharif deleted the 230821.kvflowcontrol-mutex branch on August 23, 2023 21:17


Development

Successfully merging this pull request may close these issues.

kvflowcontroller: mutex contention

4 participants