kv/concurrency: expose lock table counters, introduce timeseries metrics#67350
craig[bot] merged 3 commits into cockroachdb:master
Conversation
Force-pushed 3834442 to 264adf1
dhartunian
left a comment
@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.
(we have a separate discussion ongoing w/ Tobias about how we can automate the dashboard gen for DB Console + Grafana via chart_catalog.go potentially but nothing is in place yet.)
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @sumeerbhola)
andreimatei
left a comment
Reviewed 5 of 25 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):
```
// LockTableDebug returns a debug string representing the state of the
// lockTable.
LockTableDebug() string
```
You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?
pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):
```
// is comparable in system throughput, one can eliminate the above anomalies.
//
// TODO(nvanbenschoten): should we be Reset-ing these btree snapshots when we
```
i don't know what we're talking about here, but are we afraid of a bug or just a perf (or memory?) optimization? If it's the latter, I'd hint to it in the TODO, otherwise it sounds scary.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
```
// LockTableMetrics holds information about the state of a lockTable.
type LockTableMetrics struct {
	Locks int64
```
I think pretty much all the fields here would benefit from comments :)
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
```
	Unit: metric.Unit_COUNT,
}
metaFinalizedTxnCleanupFailed = metric.Metadata{
```
while you're here, mind putting a comment on this one with some words about what it means?
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
```
// Metrics holds information about the state of a Manager.
type Metrics struct {
```
I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why they're separate (perhaps something about MVCC access?) would be good.
pkg/server/serverpb/status.proto, line 226 at r2 (raw file):
```
int64 write_latches = 17;
int64 locks = 18;
int64 locks_with_wait_queues = 19;
```
I think the last 3 would benefit from comments
Consider adding longest-held lock/latch info.
sumeerbhola
left a comment
Consider adding longest-held lock/latch info.
+1. Fine to do some of this in followup PRs since it will require data-structure changes.
One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs which show the max value of these metrics across all nodes. When that is high, ungroups the metrics to find the node with the max. Goes to the node, and finds (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then looks at the top-k keys in the range with the longest queues or oldest waiters (how?).
Reviewed 12 of 12 files at r1, 25 of 25 files at r2, 3 of 3 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):
```
}
m.ReadLatches = latchMetrics.LocalReadCount + latchMetrics.GlobalReadCount
```
Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?
Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?
pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
i don't know what we're talking about here, but are we afraid of a bug or just a perf (or memory?) optimization? If it's the latter, I'd hint to it in the TODO, otherwise it sounds scary.
Memory optimization, but a pretty important one. Looks like we are not decrementing the refs in the btree except when we take another snapshot for the same guard. Which means more allocations for the copy-on-write approach.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I think pretty much all the fields here would benefit from comments :)
+1
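For illustration, a commented version of the struct might look like the following sketch. Only `Locks` appears in the quoted diff; the remaining field names are assumptions based on the metric names discussed in this PR, not the actual code.

```go
package main

import "fmt"

// LockTableMetrics holds information about the state of a lockTable.
// (Field names other than Locks are hypothetical, inferred from the
// kv.concurrency.* metric names introduced in this PR.)
type LockTableMetrics struct {
	// Locks is the total number of locks tracked in the lock table.
	Locks int64
	// LocksWithWaitQueues is the number of locks with a non-empty
	// wait-queue of requests waiting on the lock.
	LocksWithWaitQueues int64
	// Waiters is the total number of requests waiting in lock
	// wait-queues, summed across all locks.
	Waiters int64
	// MaxWaitersForLock is the length of the longest wait-queue of
	// any single lock in the table.
	MaxWaitersForLock int64
}

func main() {
	m := LockTableMetrics{Locks: 4, LocksWithWaitQueues: 2, Waiters: 5, MaxWaitersForLock: 3}
	fmt.Println(m.Locks, m.Waiters)
}
```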
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
while you're here, mind putting a comment on this one with some words about what it means?
Also, it is hard for me to understand the difference between these 2 metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its intents, while the next one also includes intent resolution done by others.
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
```
// lock_table contains a textual representation of the contents of the lock
// table.
string lock_table = 17;
```
was this string showing up in the db console?
A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why there's separate (perhaps something about MVCC access?) would be good.
I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?
Force-pushed 264adf1 to 477375b
nvanbenschoten
left a comment
@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.
Good question. I think we could expose these metrics in a dashboard, but I'd want to defer to your team on the exact organization of them because these are all "contention" related and I know you have been thinking a lot about contention observability lately.
+1. Fine to do some of this in followup PRs since it will require data-structure changes.
👍 I opened #67619 to track this.
One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs which show the max value of these metrics across all nodes. When that is high, ungroups the metrics to find the node with the max. Goes to the node, and finds (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then looks at the top-k keys in the range with the longest queues or oldest waiters (how?).
Agreed, this is an important story that we should be able to tell. There has been discussion in the past about contention visualization and hotspot visualization, along the lines of https://cloud.google.com/bigtable/docs/keyvis-overview and https://docs.pingcap.com/tidb/stable/troubleshoot-hot-spot-issues. I don't know where those projects currently stand. cc. @thtruo.
With these new metrics, I think the process for a CRL engineer will be:
- Observe a high `kv.concurrency.max_lock_wait_queue_waiters_for_lock` metric.
- Drill down to find which node it's on using a custom chart.
- Look at the hot ranges page to find what range it's likely on, with the assumption that the hottest range is also the most contended. This isn't great because of the assumption it's making.
- Look at that range's status page at `/_status/range/[range_id]` and find `topKLocksByWaitQueueWaiters`.
Or, from a debug.zip, the process can be automated as:
```
➜ jq --slurp 'map(.top_k_locks_by_wait_queue_waiters) | flatten | max_by(.waiters)' nodes/*/ranges/*.json
{
  "pretty_key": "/Table/57/1/1/0",
  "key": "vYmJiA==",
  "held": true,
  "waiters": 18,
  "waiting_writers": 16
}
```
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @sumeerbhola)
pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):
Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?
No, not once. I removed the distinction.
Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?
Done.
pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?
I don't think we've ever found this useful and I'm also concerned with running this in production due to the efficiency of (*lockTable).String. It was added on a whim in 7c84afc because we didn't have anything better. I've added a top-k summary of locks based on wait-queue length to the range debug page that I think will be more useful.
pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):
Previously, sumeerbhola wrote…
+1
Done. I also reworked a bit of this to support a top-k ordering of locks by wait-queue length and also eventually by duration (#67619).
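As a rough sketch of what a top-k ordering of locks by wait-queue length could look like (the `lockInfo` type and its field names here are assumptions for illustration, not the actual CockroachDB code, which could instead maintain a bounded heap during iteration):

```go
package main

import (
	"fmt"
	"sort"
)

// lockInfo is a hypothetical per-lock summary, loosely modeled on the
// fields shown in the jq output earlier in this thread.
type lockInfo struct {
	key     string
	held    bool
	waiters int
}

// topKByWaiters returns the k locks with the longest wait-queues,
// sorted descending by waiter count. A simple sort-based sketch.
func topKByWaiters(locks []lockInfo, k int) []lockInfo {
	sorted := append([]lockInfo(nil), locks...) // copy to avoid mutating input
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].waiters > sorted[j].waiters
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	locks := []lockInfo{
		{key: "/Table/57/1/1/0", held: true, waiters: 18},
		{key: "/Table/57/1/2/0", held: true, waiters: 3},
		{key: "/Table/57/1/3/0", held: false, waiters: 7},
	}
	for _, l := range topKByWaiters(locks, 2) {
		fmt.Println(l.key, l.waiters)
	}
}
```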
pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):
Previously, sumeerbhola wrote…
Also it is hard to for me to understand the difference between these 2 metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its intents, while the next one also includes intent resolution done by others.
Done.
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
Previously, sumeerbhola wrote…
was this string showing up in the db console?
A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.
Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.
What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.
pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):
Previously, sumeerbhola wrote…
I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?
I collapsed these, as it didn't look like we ever cared about this distinction.
Force-pushed 477375b to 4d2dca1
These were added recently in 615ddd0. `intents.finalized_txns.timed_out` was redundant with `intentresolver.finalized_txns.failed` and `intents.resolve_conflicting.rejected` is more cleanly implemented inside of the IntentResolver instead of at each of the callers into the intent resolver.
Force-pushed 4d2dca1 to 538a0da
This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips. The change does not yet expose any new timeseries metrics using these counters, but a future commit will.

The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from `computePeriodicMetrics`). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in `updateReplicationGauges`. Even on a node with 100k ranges, this will only take about 5ms.

```
name                            time/op
LockTableMetrics/locks=0-16     43.0ns ± 4%
LockTableMetrics/locks=1-16     76.1ns ± 3%
LockTableMetrics/locks=16-16     389ns ± 3%
LockTableMetrics/locks=256-16   5.58µs ± 5%
LockTableMetrics/locks=4096-16  89.6µs ± 4%

name                            alloc/op
LockTableMetrics/locks=0-16      0.00B
LockTableMetrics/locks=1-16      0.00B
LockTableMetrics/locks=16-16     0.00B
LockTableMetrics/locks=256-16    0.00B
LockTableMetrics/locks=4096-16   0.00B

name                            allocs/op
LockTableMetrics/locks=0-16       0.00
LockTableMetrics/locks=1-16       0.00
LockTableMetrics/locks=16-16      0.00
LockTableMetrics/locks=256-16     0.00
LockTableMetrics/locks=4096-16    0.00
```
This commit introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using the lock table counters exposed in the previous commit.
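The described per-store aggregation can be sketched as follows: sums over ranges for the count-style metrics, and a max over ranges for the max-waiters metric. This is a toy sketch under assumed type and field names, not the actual CockroachDB implementation.

```go
package main

import "fmt"

// replicaLockMetrics is a stand-in for the per-range lock table
// counters (field names are assumptions based on the metric names).
type replicaLockMetrics struct {
	locks               int64
	locksWithWaitQueues int64
	waiters             int64
	maxWaitersForLock   int64
}

// aggregate combines per-range counters into store-level values:
// sums for the first three metrics, a max for the last.
func aggregate(ranges []replicaLockMetrics) replicaLockMetrics {
	var agg replicaLockMetrics
	for _, r := range ranges {
		agg.locks += r.locks
		agg.locksWithWaitQueues += r.locksWithWaitQueues
		agg.waiters += r.waiters
		if r.maxWaitersForLock > agg.maxWaitersForLock {
			agg.maxWaitersForLock = r.maxWaitersForLock
		}
	}
	return agg
}

func main() {
	agg := aggregate([]replicaLockMetrics{
		{locks: 2, locksWithWaitQueues: 1, waiters: 4, maxWaitersForLock: 4},
		{locks: 3, locksWithWaitQueues: 2, waiters: 5, maxWaitersForLock: 3},
	})
	fmt.Println(agg.locks, agg.waiters, agg.maxWaitersForLock)
}
```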
Force-pushed 538a0da to 650a201
sumeerbhola
left a comment
Reviewed 1 of 30 files at r4, 1 of 29 files at r7, 25 of 25 files at r8, 3 of 3 files at r9.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @sumeerbhola)
pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.
What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.
excellent
TFTRs! bors r=sumeerbhola,andreimatei

Build succeeded:
Replaces #66343.
This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips.
It then introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using the new lock table counters. They are collected periodically - every 10s.
The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from `computePeriodicMetrics`). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in `updateReplicationGauges`. Even on a node with 100k ranges, this will only take about 5ms.