Skip to content

sql/contention: add contention registry for global contention observability#57736

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
asubiotto:cont-map
Dec 18, 2020
Merged

sql/contention: add contention registry for global contention observability#57736
craig[bot] merged 2 commits intocockroachdb:masterfrom
asubiotto:cont-map

Conversation

@asubiotto
Copy link
Copy Markdown
Contributor

This PR adds a contention registry that groups roachpb.ContentionEvents by
tableID/indexID pair and collects metadata about them. This data structure is
backed by an LRU cache.

Release note: None

Addresses #57114

My goal here is to discuss the basic structure of the registry and get this basic version in. Once that's done, I'll work on:

  1. Add a pointer to a global registry in the distsql receiver and add thread safety.
  2. Adding a serialized version so registries can be sent between nodes and merged as well as sent back to the admin UI.
  3. Implement a virtual table and status server endpoint that will merge and return these serialized versions.

cc @tbg

@asubiotto asubiotto requested review from a team, RaduBerinde and jordanlewis December 9, 2020 12:38
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@asubiotto
Copy link
Copy Markdown
Contributor Author

Friendly ping

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice code!

One thing is that we could make this more simple/efficient if we don't keep track of individual keys. If there's a chance we won't actually ever show the keys, we should decide now.. (CC @jordanlewis)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @RaduBerinde)


pkg/sql/contention/registry.go, line 89 at r2 (raw file):

	// this index have spent contended.
	cumulativeContentionTime time.Duration
	// orderedKeyMap is an LRU cache that keeps track of up to

LRU is questionable here.. Should we keep the "top" txns (in terms of the total duration), rather than LRU? (can be a TODO)


pkg/sql/contention/registry.go, line 168 at r2 (raw file):

}

// Registry is an object that keeps track of aggregated contention information.

[nit] consider moving the struct to the top of the file, to serve as a kind of file-level doc.


pkg/sql/contention/registry.go, line 227 at r2 (raw file):

		writeChild(prefix, fmt.Sprintf("cumulative contention time: %s\n", v.cumulativeContentionTime))
		writeChild(prefix, fmt.Sprintf("keys:\n"))
		keyPrefix := prefix + "\t"

[nit] Tabs can lead to inconsistent alignment, why not just spaces?


pkg/sql/contention/testdata/contention_registry, line 4 at r2 (raw file):

----

evcheck tableid=1 indexid=1 key=key id=a duration=1

Very cool tests!

@tbg
Copy link
Copy Markdown
Member

tbg commented Dec 15, 2020

Nice! I only skimmed this but it's great to see this come "together" (KV is still going to hold up their end of the bargain but we're not quite there yet).

Copy link
Copy Markdown
Contributor Author

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

One thing is that we could make this more simple/efficient if we don't keep track of individual keys. If there's a chance we won't actually ever show the keys, we should decide now..

Agreed cc @awoods187 @Annebirzin , what do you think? Should we keep track of individual keys for contention or do we prefer to just show the index for privacy?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/contention/registry.go, line 89 at r2 (raw file):

Previously, RaduBerinde wrote…

LRU is questionable here.. Should we keep the "top" txns (in terms of the total duration), rather than LRU? (can be a TODO)

I chose LRU because I think it's more important to show a current contention view rather than the largest contention problems since contention began. My thinking is that a user cares a lot more about what the current largest hotspot is. If we were to evict based on the shortest duration txns, we could get in a state where we had an index with a ton of contention, the contention problem was fixed, but the index still remains in the contention registry and won't be evicted until relatively more contention happens on many other keys.


pkg/sql/contention/registry.go, line 227 at r2 (raw file):

Previously, RaduBerinde wrote…

[nit] Tabs can lead to inconsistent alignment, why not just spaces?

Done.


pkg/sql/contention/testdata/contention_registry, line 4 at r2 (raw file):

Previously, RaduBerinde wrote…

Very cool tests!

Thanks!

@asubiotto
Copy link
Copy Markdown
Contributor Author

One other question, what do we think about the constant registry size values (i.e. max indexes/keys to keep track of)?

Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

The values seem reasonable. We will probably need to do some benchmarking at some point to understand impact.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)


pkg/sql/contention/registry.go, line 89 at r2 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

I chose LRU because I think it's more important to show a current contention view rather than the largest contention problems since contention began. My thinking is that a user cares a lot more about what the current largest hotspot is. If we were to evict based on the shortest duration txns, we could get in a state where we had an index with a ton of contention, the contention problem was fixed, but the index still remains in the contention registry and won't be evicted until relatively more contention happens on many other keys.

That makes sense, not sure what I was thinking (I might have thought about it as if it was a per-statement structure, but even there LRU makes more sense).

…bility

This commit adds a contention registry that groups roachpb.ContentionEvents by
tableID/indexID pair and collects metadata about them. This data structure is
backed by an LRU cache.

Release note: None
@asubiotto
Copy link
Copy Markdown
Contributor Author

TFTRs!

Will merge to avoid having this PR sit around for too long. We can always come back and simplify this code if we decide not to track individual keys. It'll be easier to remove this code than add it back in if we need it.

bors r=RaduBerinde

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 18, 2020

Build succeeded:

@craig craig bot merged commit 3239d37 into cockroachdb:master Dec 18, 2020
@asubiotto asubiotto deleted the cont-map branch January 11, 2021 13:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants