sql/contention: add contention registry for global contention observability#57736
sql/contention: add contention registry for global contention observability#57736craig[bot] merged 2 commits intocockroachdb:masterfrom asubiotto:cont-map
Conversation
|
Friendly ping |
RaduBerinde
left a comment
There was a problem hiding this comment.
Very nice code!
One thing is that we could make this more simple/efficient if we don't keep track of individual keys. If there's a chance we won't actually ever show the keys, we should decide now.. (CC @jordanlewis)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @jordanlewis, and @RaduBerinde)
pkg/sql/contention/registry.go, line 89 at r2 (raw file):
// this index have spent contended. cumulativeContentionTime time.Duration // orderedKeyMap is an LRU cache that keeps track of up to
LRU is questionable here.. Should we keep the "top" txns (in terms of the total duration), rather than LRU? (can be a TODO)
pkg/sql/contention/registry.go, line 168 at r2 (raw file):
} // Registry is an object that keeps track of aggregated contention information.
[nit] consider moving the struct to the top of the file, to serve as a kind of file-level doc.
pkg/sql/contention/registry.go, line 227 at r2 (raw file):
writeChild(prefix, fmt.Sprintf("cumulative contention time: %s\n", v.cumulativeContentionTime)) writeChild(prefix, fmt.Sprintf("keys:\n")) keyPrefix := prefix + "\t"
[nit] Tabs can lead to inconsistent alignment, why not just spaces?
pkg/sql/contention/testdata/contention_registry, line 4 at r2 (raw file):
---- evcheck tableid=1 indexid=1 key=key id=a duration=1
Very cool tests!
|
Nice! I only skimmed this but it's great to see this come "together" (KV is still going to hold up their end of the bargain but we're not quite there yet). |
asubiotto
left a comment
There was a problem hiding this comment.
TFTRs!
One thing is that we could make this more simple/efficient if we don't keep track of individual keys. If there's a chance we won't actually ever show the keys, we should decide now..
Agreed cc @awoods187 @Annebirzin , what do you think? Should we keep track of individual keys for contention or do we prefer to just show the index for privacy?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)
pkg/sql/contention/registry.go, line 89 at r2 (raw file):
Previously, RaduBerinde wrote…
LRU is questionable here.. Should we keep the "top" txns (in terms of the total duration), rather than LRU? (can be a TODO)
I chose LRU because I think it's more important to show a current contention view rather than the largest contention problems since contention began. My thinking is that a user cares a lot more about what the current largest hotspot is. If we were to evict based on the shortest duration txns, we could get in a state where we had an index with a ton of contention, the contention problem was fixed, but the index still remains in the contention registry and won't be evicted until relatively more contention happens on many other keys.
pkg/sql/contention/registry.go, line 227 at r2 (raw file):
Previously, RaduBerinde wrote…
[nit] Tabs can lead to inconsistent alignment, why not just spaces?
Done.
pkg/sql/contention/testdata/contention_registry, line 4 at r2 (raw file):
Previously, RaduBerinde wrote…
Very cool tests!
Thanks!
|
One other question, what do we think about the constant registry size values (i.e. max indexes/keys to keep track of)? |
RaduBerinde
left a comment
There was a problem hiding this comment.
The values seem reasonable. We will probably need to do some benchmarking at some point to understand impact.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @jordanlewis)
pkg/sql/contention/registry.go, line 89 at r2 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I chose LRU because I think it's more important to show a current contention view rather than the largest contention problems since contention began. My thinking is that a user cares a lot more about what the current largest hotspot is. If we were to evict based on the shortest duration txns, we could get in a state where we had an index with a ton of contention, the contention problem was fixed, but the index still remains in the contention registry and won't be evicted until relatively more contention happens on many other keys.
That makes sense, not sure what I was thinking (I might have thought about it as if it was a per-statement structure, but even there LRU makes more sense).
Release note: None
…bility This commit adds a contention registry that groups roachpb.ContentionEvents by tableID/indexID pair and collects metadata about them. This data structure is backed by an LRU cache. Release note: None
|
TFTRs! Will merge to avoid having this PR sit around for too long. We can always come back and simplify this code if we decide not to track individual keys. It'll be easier to remove this code than add it back in if we need it. bors r=RaduBerinde |
|
Build succeeded: |
This PR adds a contention registry that groups roachpb.ContentionEvents by
tableID/indexID pair and collects metadata about them. This data structure is
backed by an LRU cache.
Release note: None
Addresses #57114
My goal here is to discuss the basic structure of the registry and get this basic version in. Once that's done, I'll work on:
cc @tbg