kv/concurrency: expose lock table counters, introduce timeseries metrics#67350
craig[bot] merged 3 commits into cockroachdb:master
Conversation
Force-pushed 3834442 to 264adf1
dhartunian
left a comment
@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.
(we have a separate discussion ongoing w/ Tobias about how we can automate the dashboard gen for DB Console + Grafana via chart_catalog.go potentially but nothing is in place yet.)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @sumeerbhola)
andreimatei
left a comment
Reviewed 5 of 25 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):
```
// LockTableDebug returns a debug string representing the state of the
// lockTable.
LockTableDebug() string
```
You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?
pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):
```
// is comparable in system throughput, one can eliminate the above anomalies.
//
// TODO(nvanbenschoten): should we be Reset-ing these btree snapshots when we
```
i don't know what we're talking about here, but are we afraid of a bug or just a perf (or memory?) optimization? If it's the latter, I'd hint to it in the TODO, otherwise it sounds scary.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
```
// LockTableMetrics holds information about the state of a lockTable.
type LockTableMetrics struct {
	Locks int64
```
I think pretty much all the fields here would benefit from comments :)
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
```
	Unit: metric.Unit_COUNT,
}
metaFinalizedTxnCleanupFailed = metric.Metadata{
```
while you're here, mind putting a comment on this one with some words about what it means?
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
```
// Metrics holds information about the state of a Manager.
type Metrics struct {
```
I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why they're separate (perhaps something about MVCC access?) would be good.
pkg/server/serverpb/status.proto, line 226 at r2 (raw file):
```
int64 write_latches = 17;
int64 locks = 18;
int64 locks_with_wait_queues = 19;
```
I think the last 3 would benefit from comments
Consider adding longest-held lock/latch info.
sumeerbhola
left a comment
Consider adding longest-held lock/latch info.
+1. Fine to do some of this in followup PRs since it will require data-structure changes.
One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs which show the max value of these metrics across all nodes. When that is high, ungroups the metrics to find the node with the max. Goes to the node, and finds (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then looks at the top-k keys in the range with the longest queues or oldest waiters (how?).
Reviewed 12 of 12 files at r1, 25 of 25 files at r2, 3 of 3 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):
```
}
m.ReadLatches = latchMetrics.LocalReadCount + latchMetrics.GlobalReadCount
```
Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?
Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?
pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
i don't know what we're talking about here, but are we afraid of a bug or just a perf (or memory?) optimization? If it's the latter, I'd hint to it in the TODO, otherwise it sounds scary.
Memory optimization, but a pretty important one. Looks like we are not decrementing the refs in the btree except when we take another snapshot for the same guard. Which means more allocations for the copy-on-write approach.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think pretty much all the fields here would benefit from comments :)
+1
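For illustration, a commented version of the struct might look like the following sketch. Only `Locks` appears in the quoted diff; the remaining field names are assumptions based on the metric names discussed in this PR, not the actual code.

```go
package main

import "fmt"

// LockTableMetrics holds information about the state of a lockTable.
// (Field names other than Locks are hypothetical, inferred from the
// kv.concurrency.* metric names introduced in this PR.)
type LockTableMetrics struct {
	// Locks is the total number of locks tracked in the lock table.
	Locks int64
	// LocksWithWaitQueues is the number of locks with a non-empty
	// wait-queue of requests waiting on the lock.
	LocksWithWaitQueues int64
	// Waiters is the total number of requests waiting in lock
	// wait-queues, summed across all locks.
	Waiters int64
	// MaxWaitersForLock is the length of the longest wait-queue of
	// any single lock in the table.
	MaxWaitersForLock int64
}

func main() {
	m := LockTableMetrics{Locks: 4, LocksWithWaitQueues: 2, Waiters: 5, MaxWaitersForLock: 3}
	fmt.Println(m.Locks, m.Waiters)
}
```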
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
while you're here, mind putting a comment on this one with some words about what it means?
Also, it is hard for me to understand the difference between these 2 metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its intents, while the next one also includes intent resolution done by others.
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
```
// lock_table contains a textual representation of the contents of the lock
// table.
string lock_table = 17;
```
was this string showing up in the db console?
A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why there's separate (perhaps something about MVCC access?) would be good.
I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?
Force-pushed 264adf1 to 477375b
nvanbenschoten
left a comment
@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.
Good question. I think we could expose these metrics in a dashboard, but I'd want to defer to your team on the exact organization of them because these are all "contention" related and I know you have been thinking a lot about contention observability lately.
+1. Fine to do some of this in followup PRs since it will require data-structure changes.
👍 I opened #67619 to track this.
One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs which show the max value of these metrics across all nodes. When that is high, ungroups the metrics to find the node with the max. Goes to the node, and finds (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then looks at the top-k keys in the range with the longest queues or oldest waiters (how?).
Agreed, this is an important story that we should be able to tell. There has been discussion in the past about contention visualization and hotspot visualization, along the lines of https://cloud.google.com/bigtable/docs/keyvis-overview and https://docs.pingcap.com/tidb/stable/troubleshoot-hot-spot-issues. I don't know where those projects currently stand. cc. @thtruo.
With these new metrics, I think the process for a CRL engineer will be:
- Observe a high `kv.concurrency.max_lock_wait_queue_waiters_for_lock` metric.
- Drill down to find which node it's on using a custom chart.
- Look at the hot ranges page to find what range it's likely on, with the assumption that the hottest range is also the most contended. This isn't great because of the assumption it's making.
- Look at that range's status page at `/_status/range/[range_id]` and find `topKLocksByWaitQueueWaiters`.
Or, from a debug.zip, the process can be automated as:
```
➜ jq --slurp 'map(.top_k_locks_by_wait_queue_waiters) | flatten | max_by(.waiters)' nodes/*/ranges/*.json
{
  "pretty_key": "/Table/57/1/1/0",
  "key": "vYmJiA==",
  "held": true,
  "waiters": 18,
  "waiting_writers": 16
}
```
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @sumeerbhola)
pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):
Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?
No, not once. I removed the distinction.
Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?
Done.
pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?
I don't think we've ever found this useful and I'm also concerned with running this in production due to the efficiency of (*lockTable).String. It was added on a whim in 7c84afc because we didn't have anything better. I've added a top-k summary of locks based on wait-queue length to the range debug page that I think will be more useful.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
Previously, sumeerbhola wrote…
+1
Done. I also reworked a bit of this to support a top-k ordering of locks by wait-queue length and also eventually by duration (#67619).
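As a rough sketch of what a top-k ordering of locks by wait-queue length could look like (the `lockInfo` type and its field names here are assumptions for illustration, not the actual CockroachDB code, which could instead maintain a bounded heap during iteration):

```go
package main

import (
	"fmt"
	"sort"
)

// lockInfo is a hypothetical per-lock summary, loosely modeled on the
// fields shown in the jq output earlier in this thread.
type lockInfo struct {
	key     string
	held    bool
	waiters int
}

// topKByWaiters returns the k locks with the longest wait-queues,
// sorted descending by waiter count. A simple sort-based sketch.
func topKByWaiters(locks []lockInfo, k int) []lockInfo {
	sorted := append([]lockInfo(nil), locks...) // copy to avoid mutating input
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].waiters > sorted[j].waiters
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	locks := []lockInfo{
		{key: "/Table/57/1/1/0", held: true, waiters: 18},
		{key: "/Table/57/1/2/0", held: true, waiters: 3},
		{key: "/Table/57/1/3/0", held: false, waiters: 7},
	}
	for _, l := range topKByWaiters(locks, 2) {
		fmt.Println(l.key, l.waiters)
	}
}
```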
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
Previously, sumeerbhola wrote…
Also it is hard to for me to understand the difference between these 2 metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its intents, while the next one also includes intent resolution done by others.
Done.
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
Previously, sumeerbhola wrote…
was this string showing up in the db console?
A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.
Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.
What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
Previously, sumeerbhola wrote…
I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?
I collapsed these, as it didn't look like we ever cared about this distinction.
Force-pushed 477375b to 4d2dca1
These were added recently in 615ddd0. `intents.finalized_txns.timed_out` was redundant with `intentresolver.finalized_txns.failed` and `intents.resolve_conflicting.rejected` is more cleanly implemented inside of the IntentResolver instead of at each of the callers into the intent resolver.
Force-pushed 4d2dca1 to 538a0da
This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips. The change does not yet expose any new timeseries metrics using these counters, but a future commit will.

The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from `computePeriodicMetrics`). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in `updateReplicationGauges`. Even on a node with 100k ranges, this will only take about 5ms.

```
name                            time/op
LockTableMetrics/locks=0-16     43.0ns ± 4%
LockTableMetrics/locks=1-16     76.1ns ± 3%
LockTableMetrics/locks=16-16     389ns ± 3%
LockTableMetrics/locks=256-16   5.58µs ± 5%
LockTableMetrics/locks=4096-16  89.6µs ± 4%

name                            alloc/op
LockTableMetrics/locks=0-16      0.00B
LockTableMetrics/locks=1-16      0.00B
LockTableMetrics/locks=16-16     0.00B
LockTableMetrics/locks=256-16    0.00B
LockTableMetrics/locks=4096-16   0.00B

name                            allocs/op
LockTableMetrics/locks=0-16       0.00
LockTableMetrics/locks=1-16       0.00
LockTableMetrics/locks=16-16      0.00
LockTableMetrics/locks=256-16     0.00
LockTableMetrics/locks=4096-16    0.00
```
This commit introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using the lock table counters exposed in the previous commit.
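The described per-store aggregation can be sketched as follows: sums over ranges for the count-style metrics, and a max over ranges for the max-waiters metric. This is a toy sketch under assumed type and field names, not the actual CockroachDB implementation.

```go
package main

import "fmt"

// replicaLockMetrics is a stand-in for the per-range lock table
// counters (field names are assumptions based on the metric names).
type replicaLockMetrics struct {
	locks               int64
	locksWithWaitQueues int64
	waiters             int64
	maxWaitersForLock   int64
}

// aggregate combines per-range counters into store-level values:
// sums for the first three metrics, a max for the last.
func aggregate(ranges []replicaLockMetrics) replicaLockMetrics {
	var agg replicaLockMetrics
	for _, r := range ranges {
		agg.locks += r.locks
		agg.locksWithWaitQueues += r.locksWithWaitQueues
		agg.waiters += r.waiters
		if r.maxWaitersForLock > agg.maxWaitersForLock {
			agg.maxWaitersForLock = r.maxWaitersForLock
		}
	}
	return agg
}

func main() {
	agg := aggregate([]replicaLockMetrics{
		{locks: 2, locksWithWaitQueues: 1, waiters: 4, maxWaitersForLock: 4},
		{locks: 3, locksWithWaitQueues: 2, waiters: 5, maxWaitersForLock: 3},
	})
	fmt.Println(agg.locks, agg.waiters, agg.maxWaitersForLock)
}
```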
Force-pushed 538a0da to 650a201
sumeerbhola
left a comment
Reviewed 1 of 30 files at r4, 1 of 29 files at r7, 25 of 25 files at r8, 3 of 3 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @sumeerbhola)
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.
What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.
excellent
TFTRs! bors r=sumeerbhola,andreimatei

Build succeeded:
Replaces #66343.
This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips.
It then introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using the new lock table counters. They are collected periodically - every 10s.
The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from `computePeriodicMetrics`). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in `updateReplicationGauges`. Even on a node with 100k ranges, this will only take about 5ms.