kv: add new kv.lock_table.wait_queue.waiting metric#66343

Closed
nvb wants to merge 1 commit into cockroachdb:master from nvb:nvanbenschoten/waitQueueCountMetric

Conversation

Contributor

@nvb nvb commented Jun 11, 2021

This metric tracks the number of requests waiting in lock wait-queues in the
lock table. It would have been helpful to have during a recent customer
investigation, because not all requests that wait in lock wait-queues will
eventually push and end up in a txn wait-queue.

@nvb nvb requested a review from aayushshah15 June 11, 2021 00:10
@cockroach-teamcity
Member

This change is Reviewable

@nvb nvb force-pushed the nvanbenschoten/waitQueueCountMetric branch from 100db44 to 33518b7 Compare June 11, 2021 03:16
Collaborator

@sumeerbhola sumeerbhola left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 148 at r1 (raw file):

	w.lockWaitQueueWaiters.Inc(1)
	defer w.lockWaitQueueWaiters.Dec(1)

Would this lead to unnecessary worries if the total number was high but they were waiting on a large number of keys?
Would it be better to have metrics for the longest and mean queue lengths and the longest wait duration? Keeping these up to date incrementally would be costly, but since we sample metrics at a low frequency, we could compute them non-incrementally at that same low frequency. I suppose we'd also need to iterate across all replicas' lock tables -- we could also log information about the replicas whose wait durations or queue lengths exceed some threshold, to complement the aggregated-across-all-replicas information in the metrics.
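The non-incremental approach suggested here could look roughly like the following minimal Go sketch. The types and function names (`lockWaitQueue`, `queueMetrics`, `computeQueueMetrics`) are illustrative assumptions, not CockroachDB's actual identifiers:

```go
package main

import "fmt"

// lockWaitQueue is a hypothetical, simplified stand-in for a lock's
// wait-queue: we only track how many requests are currently waiting.
type lockWaitQueue struct {
	waiters int
}

// queueMetrics holds the aggregates discussed above: longest queue,
// mean queue length, and total waiters.
type queueMetrics struct {
	maxQueueLen  int
	meanQueueLen float64
	totalWaiters int
}

// computeQueueMetrics walks all wait-queues in one pass, as would
// happen at metric-scrape time, rather than maintaining the
// aggregates incrementally on every enqueue/dequeue.
func computeQueueMetrics(queues []lockWaitQueue) queueMetrics {
	var m queueMetrics
	for _, q := range queues {
		m.totalWaiters += q.waiters
		if q.waiters > m.maxQueueLen {
			m.maxQueueLen = q.waiters
		}
	}
	if len(queues) > 0 {
		m.meanQueueLen = float64(m.totalWaiters) / float64(len(queues))
	}
	return m
}

func main() {
	queues := []lockWaitQueue{{waiters: 1}, {waiters: 5}, {waiters: 0}}
	m := computeQueueMetrics(queues)
	fmt.Println(m.maxQueueLen, m.meanQueueLen, m.totalWaiters) // 5 2 6
}
```

Because the pass runs only at the sampling interval, its cost is amortized away even when the number of queues is large.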

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jul 8, 2021

These were good points, thanks for raising them. Adding just this single metric was useful, but also not nearly enough. The lockTable state has been lacking observability for a while. I opened #67350 to improve upon this PR and introduce more comprehensive metric tracking at this level to address the general observability issue.

@nvb nvb closed this Jul 8, 2021
@nvb nvb deleted the nvanbenschoten/waitQueueCountMetric branch July 8, 2021 20:01
craig bot pushed a commit that referenced this pull request Jul 16, 2021
67350: kv/concurrency: expose lock table counters, introduce timeseries metrics r=sumeerbhola,andreimatei,dhartunian a=nvanbenschoten

Replaces #66343.

This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips.

It then introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using the new lock table counters. They are collected periodically - every 10s.
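As a rough illustration of the aggregation described above (not the actual CockroachDB implementation), a periodic per-store pass over per-replica counters might look like the sketch below. The `lockTableCounters` fields are illustrative names chosen to mirror the four metrics:

```go
package main

import "fmt"

// lockTableCounters loosely mirrors the per-replica counters the
// change exposes; the field names here are illustrative, not the
// actual CockroachDB identifiers.
type lockTableCounters struct {
	locks                      int64
	locksWithWaitQueues        int64
	lockWaitQueueWaiters       int64
	maxWaitQueueWaitersForLock int64
}

// storeMetrics aggregates counters across all replicas on a store,
// the way a periodic (e.g. every-10s) metrics pass might: sums for
// the count metrics, a max for the per-lock waiter maximum.
func storeMetrics(replicas []lockTableCounters) lockTableCounters {
	var agg lockTableCounters
	for _, c := range replicas {
		agg.locks += c.locks
		agg.locksWithWaitQueues += c.locksWithWaitQueues
		agg.lockWaitQueueWaiters += c.lockWaitQueueWaiters
		if c.maxWaitQueueWaitersForLock > agg.maxWaitQueueWaitersForLock {
			agg.maxWaitQueueWaitersForLock = c.maxWaitQueueWaitersForLock
		}
	}
	return agg
}

func main() {
	replicas := []lockTableCounters{
		{locks: 2, locksWithWaitQueues: 1, lockWaitQueueWaiters: 3, maxWaitQueueWaitersForLock: 3},
		{}, // most replicas have empty lock tables
		{locks: 1, locksWithWaitQueues: 1, lockWaitQueueWaiters: 1, maxWaitQueueWaitersForLock: 1},
	}
	agg := storeMetrics(replicas)
	fmt.Println(agg.locks, agg.lockWaitQueueWaiters, agg.maxWaitQueueWaitersForLock) // 3 4 3
}
```

The empty-replica case matters for the cost argument: aggregating an all-zero counter struct is nearly free, which is why the per-replica overhead stays small on large clusters.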

The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from `computePeriodicMetrics`). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in `updateReplicationGauges`. Even on a node with 100k ranges, this will only take about 5ms.

```
name                            time/op
LockTableMetrics/locks=0-16     43.0ns ± 4%
LockTableMetrics/locks=1-16     76.1ns ± 3%
LockTableMetrics/locks=16-16     389ns ± 3%
LockTableMetrics/locks=256-16   5.58µs ± 5%
LockTableMetrics/locks=4096-16  89.6µs ± 4%

name                            alloc/op
LockTableMetrics/locks=0-16      0.00B
LockTableMetrics/locks=1-16      0.00B
LockTableMetrics/locks=16-16     0.00B
LockTableMetrics/locks=256-16    0.00B
LockTableMetrics/locks=4096-16   0.00B

name                            allocs/op
LockTableMetrics/locks=0-16       0.00
LockTableMetrics/locks=1-16       0.00
LockTableMetrics/locks=16-16      0.00
LockTableMetrics/locks=256-16     0.00
LockTableMetrics/locks=4096-16    0.00
```

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
