
kv/concurrency: expose lock table counters, introduce timeseries metrics #67350

Merged
craig[bot] merged 3 commits into cockroachdb:master from nvb:nvanbenschoten/concMetrics on Jul 16, 2021

Conversation

@nvb
Contributor

@nvb nvb commented Jul 8, 2021

Replaces #66343.

This change exposes access to a collection of counters that describe the state of a lockTable, including information on locks and on lock wait-queues. Some of these counters are plumbed up through debug endpoints, providing access to this information in the DB console and through debug.zips.

It then introduces the following four timeseries metrics:

  • kv.concurrency.locks
  • kv.concurrency.locks_with_wait_queues
  • kv.concurrency.lock_wait_queue_waiters
  • kv.concurrency.max_lock_wait_queue_waiters_for_lock

These are computed as aggregations over all of the ranges on a given store using the new lock table counters. They are collected periodically, every 10s.

The process of collecting these metrics is made efficient enough that I do not have major concerns about calling this once every 10s per range (from computePeriodicMetrics). The vast majority of ranges in large clusters will have empty lock tables, so this will only add about 50ns per replica in updateReplicationGauges. Even on a node with 100k ranges, this will only take about 5ms.
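The per-store aggregation described above can be sketched as follows. This is a minimal illustration, not the actual CockroachDB code: the field names on `LockTableMetrics` and the `aggregate` helper are simplified stand-ins for what lives in `pkg/kv/kvserver/concurrency/metrics.go` and `updateReplicationGauges`.

```go
package main

import "fmt"

// LockTableMetrics mirrors the per-range counters described in the PR.
// The exact field set here is an assumption for illustration.
type LockTableMetrics struct {
	Locks                          int64
	LocksWithWaitQueues            int64
	LockWaitQueueWaiters           int64
	MaxLockWaitQueueWaitersForLock int64
}

// StoreLockMetrics holds the store-level aggregates that back the four
// timeseries gauges.
type StoreLockMetrics struct {
	Locks, LocksWithWaitQueues, Waiters, MaxWaitersForLock int64
}

// aggregate folds per-replica counters into store-wide totals, summing the
// count-like fields and taking a max for the max-waiters gauge. This is the
// shape of the work done once every 10s per replica.
func aggregate(replicas []LockTableMetrics) StoreLockMetrics {
	var s StoreLockMetrics
	for _, m := range replicas {
		s.Locks += m.Locks
		s.LocksWithWaitQueues += m.LocksWithWaitQueues
		s.Waiters += m.LockWaitQueueWaiters
		if m.MaxLockWaitQueueWaitersForLock > s.MaxWaitersForLock {
			s.MaxWaitersForLock = m.MaxLockWaitQueueWaitersForLock
		}
	}
	return s
}

func main() {
	replicas := []LockTableMetrics{
		{Locks: 2, LocksWithWaitQueues: 1, LockWaitQueueWaiters: 3, MaxLockWaitQueueWaitersForLock: 3},
		{}, // most ranges have empty lock tables, so their contribution is free
		{Locks: 1, LocksWithWaitQueues: 1, LockWaitQueueWaiters: 5, MaxLockWaitQueueWaitersForLock: 5},
	}
	s := aggregate(replicas)
	fmt.Println(s.Locks, s.LocksWithWaitQueues, s.Waiters, s.MaxWaitersForLock)
	// prints: 3 2 8 5
}
```

The empty-struct replica in the example is the common case the benchmark's `locks=0` row measures: scanning a replica with an empty lock table costs only a few tens of nanoseconds and no allocations.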

name                            time/op
LockTableMetrics/locks=0-16     43.0ns ± 4%
LockTableMetrics/locks=1-16     76.1ns ± 3%
LockTableMetrics/locks=16-16     389ns ± 3%
LockTableMetrics/locks=256-16   5.58µs ± 5%
LockTableMetrics/locks=4096-16  89.6µs ± 4%

name                            alloc/op
LockTableMetrics/locks=0-16      0.00B
LockTableMetrics/locks=1-16      0.00B
LockTableMetrics/locks=16-16     0.00B
LockTableMetrics/locks=256-16    0.00B
LockTableMetrics/locks=4096-16   0.00B

name                            allocs/op
LockTableMetrics/locks=0-16       0.00
LockTableMetrics/locks=1-16       0.00
LockTableMetrics/locks=16-16      0.00
LockTableMetrics/locks=256-16     0.00
LockTableMetrics/locks=4096-16    0.00

@nvb nvb requested review from andreimatei and sumeerbhola July 8, 2021 03:32
@nvb nvb requested a review from a team as a code owner July 8, 2021 03:32
@cockroach-teamcity
Member

This change is Reviewable

@nvb nvb changed the title kv/concurrency: expose LockTableMetrics, introduce timeseries metrics kv/concurrency: expose lock table counters, introduce timeseries metrics Jul 8, 2021
@nvb nvb force-pushed the nvanbenschoten/concMetrics branch 2 times, most recently from 3834442 to 264adf1 Compare July 8, 2021 04:37
@dhartunian dhartunian requested a review from a team July 8, 2021 14:20
Collaborator

@dhartunian dhartunian left a comment


@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.

(we have a separate discussion ongoing w/ Tobias about how we can automate the dashboard gen for DB Console + Grafana via chart_catalog.go potentially but nothing is in place yet.)

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @sumeerbhola)

Contributor

@andreimatei andreimatei left a comment


:lgtm:

Reviewed 5 of 25 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):

	// LockTableDebug returns a debug string representing the state of the
	// lockTable.
	LockTableDebug() string

You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?


pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):

	// is comparable in system throughput, one can eliminate the above anomalies.
	//
	// TODO(nvanbenschoten): should we be Reset-ing these btree snapshots when we

I don't know what we're talking about here, but are we afraid of a bug, or is this just a perf (or memory?) optimization? If it's the latter, I'd hint at it in the TODO; otherwise it sounds scary.


pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):

// LockTableMetrics holds information about the state of a lockTable.
type LockTableMetrics struct {
	Locks                 int64

I think pretty much all the fields here would benefit from comments :)


pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):

		Unit:        metric.Unit_COUNT,
	}
	metaFinalizedTxnCleanupFailed = metric.Metadata{

while you're here, mind putting a comment on this one with some words about what it means?


pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):

// Metrics holds information about the state of a Manager.
type Metrics struct {

I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why they're separate (perhaps something about MVCC access?) would be good.


pkg/server/serverpb/status.proto, line 226 at r2 (raw file):

  int64 write_latches = 17;
  int64 locks = 18;
  int64 locks_with_wait_queues = 19;

I think the last 3 would benefit from comments

@andreimatei
Contributor

Consider adding longest-held lock/latch info.

Collaborator

@sumeerbhola sumeerbhola left a comment


Consider adding longest-held lock/latch info.

+1. Fine to do some of this in followup PRs since it will require data-structure changes.

One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs, which show the max value of these metrics across all nodes. When that is high, they ungroup the metrics to find the node with the max, go to that node, and find (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then they look at the top-k keys in the range with the longest queues or oldest waiters (how?).

Reviewed 12 of 12 files at r1, 25 of 25 files at r2, 3 of 3 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):

	}

	m.ReadLatches = latchMetrics.LocalReadCount + latchMetrics.GlobalReadCount

Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?

Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?


pkg/kv/kvserver/concurrency/lock_table.go, line 348 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I don't know what we're talking about here, but are we afraid of a bug, or is this just a perf (or memory?) optimization? If it's the latter, I'd hint at it in the TODO; otherwise it sounds scary.

Memory optimization, but a pretty important one. It looks like we are not decrementing the refs in the btree except when we take another snapshot for the same guard, which means more allocations for the copy-on-write approach.


pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think pretty much all the fields here would benefit from comments :)

+1


pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

while you're here, mind putting a comment on this one with some words about what it means?

Also, it is hard for me to understand the difference between these two metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its own intents, while the next one also includes intent resolution done by others.


pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):

  // lock_table contains a textual representation of the contents of the lock
  // table.
  string lock_table = 17;

was this string showing up in the db console?

A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.


pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I've clicked around a bit and generally failed to find info about the distinction between global and local latches. I think a comment somewhere spelling out why there's separate (perhaps something about MVCC access?) would be good.

I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?

Contributor Author

@nvb nvb left a comment


@nvanbenschoten should we expose these metrics in a dashboard? Are these metrics something a customer could look at and take action, or mostly for CRDB use? Happy to make a ticket for our team to pick up.

Good question. I think we could expose these metrics in a dashboard, but I'd want to defer to your team on the exact organization of them because these are all "contention" related and I know you have been thinking a lot about contention observability lately.

+1. Fine to do some of this in followup PRs since it will require data-structure changes.

👍 I opened #67619 to track this.

One thing to fully flesh out would be how we expect these to be used:
Someone looks at the cluster graphs, which show the max value of these metrics across all nodes. When that is high, they ungroup the metrics to find the node with the max, go to that node, and find (how? clicking on each range in the db console wouldn't be feasible) the range responsible for the max (or the top-k ranges). Then they look at the top-k keys in the range with the longest queues or oldest waiters (how?).

Agreed, this is an important story that we should be able to tell. There has been discussion in the past about contention visualization and hotspot visualization, along the lines of https://cloud.google.com/bigtable/docs/keyvis-overview and https://docs.pingcap.com/tidb/stable/troubleshoot-hot-spot-issues. I don't know where those projects currently stand. cc. @thtruo.

With these new metrics, I think the process for a CRL engineer will be:

  1. Observe high kv.concurrency.max_lock_wait_queue_waiters_for_lock metric.
  2. Drill down to find which node it's on using a custom chart.
  3. Look at the hot ranges page to find what range it's likely on, with the assumption that the hottest range is also the most contended. This isn't great because of the assumption it's making.
  4. Look at that range's status page /_status/range/[range_id] and find topKLocksByWaitQueueWaiters

Or, from a debug.zip, the process can be automated as:

➜ jq --slurp 'map(.top_k_locks_by_wait_queue_waiters) | flatten | max_by(.waiters)' nodes/*/ranges/*.json
{
  "pretty_key": "/Table/57/1/1/0",
  "key": "vYmJiA==",
  "held": true,
  "waiters": 18,
  "waiting_writers": 16
}

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @sumeerbhola)


pkg/kv/kvserver/replica_metrics.go, line 145 at r2 (raw file):

Was the distinction between local and global key latches, which we no longer have with this change, ever useful in troubleshooting?

No, not once. I removed the distinction.

Also, can we simplify this a bit by adding LatchMetrics and LockMetrics as fields in ReplicaMetrics instead of flattening the structs?

Done.


pkg/kv/kvserver/concurrency/concurrency_control.go, line 302 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

You sure you wanna take this away (and the corresponding field in the range state proto)?
In fact, instead of taking it away, how about including timing info in it so we can tell how long locks have been held for?

I don't think we've ever found this useful, and I'm also concerned about running this in production given the cost of (*lockTable).String. It was added on a whim in 7c84afc because we didn't have anything better. I've added a top-k summary of locks based on wait-queue length to the range debug page that I think will be more useful.


pkg/kv/kvserver/concurrency/metrics.go, line 23 at r2 (raw file):

Previously, sumeerbhola wrote…

+1

Done. I also reworked a bit of this to support a top-k ordering of locks by wait-queue length and also eventually by duration (#67619).
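The top-k ordering of locks by wait-queue length might look roughly like the following. This is a hedged sketch: `lockInfo`, its fields, and `topKByWaiters` are illustrative names, not the actual types in the PR, and a real implementation inside a metrics scan might maintain a bounded heap rather than sorting the full slice.

```go
package main

import (
	"fmt"
	"sort"
)

// lockInfo is a simplified stand-in for the per-lock summary exposed on the
// range status page (key, held flag, waiter counts). Field names here are
// assumptions for illustration.
type lockInfo struct {
	Key     string
	Held    bool
	Waiters int
}

// topKByWaiters returns the k locks with the longest wait-queues, sorted by
// descending waiter count. The input slice is copied so callers' data is not
// reordered.
func topKByWaiters(locks []lockInfo, k int) []lockInfo {
	sorted := append([]lockInfo(nil), locks...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Waiters > sorted[j].Waiters
	})
	if len(sorted) > k {
		sorted = sorted[:k]
	}
	return sorted
}

func main() {
	locks := []lockInfo{
		{Key: "/Table/57/1/1/0", Held: true, Waiters: 18},
		{Key: "/Table/57/1/2/0", Held: true, Waiters: 2},
		{Key: "/Table/57/1/3/0", Held: false, Waiters: 7},
	}
	for _, l := range topKByWaiters(locks, 2) {
		fmt.Println(l.Key, l.Waiters)
	}
	// prints:
	// /Table/57/1/1/0 18
	// /Table/57/1/3/0 7
}
```

Once timing information lands (#67619), the same shape extends naturally: the comparator could break ties by oldest waiter or longest hold duration.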


pkg/kv/kvserver/intentresolver/metrics.go, line 23 at r1 (raw file):

Previously, sumeerbhola wrote…

Also, it is hard for me to understand the difference between these two metrics given the current comments, so anything that helps with that would be good. And the units would be helpful -- I think this one is in number of txns, while the next one is in number of requests (single intent or intent range). And IIUC we'll count in both places when a txn is cleaning up its own intents, while the next one also includes intent resolution done by others.

Done.


pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):

Previously, sumeerbhola wrote…

was this string showing up in the db console?

A summary, such as the top-5 keys with the longest queues, and longest wait durations would be useful.

Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.

What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.


pkg/kv/kvserver/spanlatch/manager.go, line 562 at r2 (raw file):

Previously, sumeerbhola wrote…

I don't understand this either, since we are not making this distinction for the lock table metrics. Was it originally a concern about latching for some particular range local keys, like range descriptor keys?

I collapsed these, as it didn't look like we ever cared about this distinction.

@nvb nvb force-pushed the nvanbenschoten/concMetrics branch from 477375b to 4d2dca1 Compare July 15, 2021 04:56
These were added recently in 615ddd0. `intents.finalized_txns.timed_out` was redundant
with `intentresolver.finalized_txns.failed` and `intents.resolve_conflicting.rejected`
is more cleanly implemented inside of the IntentResolver instead of at each of the
callers into the intent resolver.
@nvb nvb force-pushed the nvanbenschoten/concMetrics branch from 4d2dca1 to 538a0da Compare July 15, 2021 06:08
nvb added 2 commits July 15, 2021 11:39
This change exposes access to a collection of counters that describe the state
of a lockTable, including information on locks and on lock wait-queues. Some of
these counters are plumbed up through debug endpoints, providing access to this
information in the DB console and through debug.zips.

The change does not yet expose any new timeseries metrics using these counters,
but a future commit will.

The process of collecting these metrics is made efficient enough that I do not
have major concerns about calling this once every 10s per range (from
`computePeriodicMetrics`). The vast majority of ranges in large clusters will
have empty lock tables, so this will only add about 50ns per replica in
`updateReplicationGauges`. Even on a node with 100k ranges, this will only take
about 5ms.

```
name                            time/op
LockTableMetrics/locks=0-16     43.0ns ± 4%
LockTableMetrics/locks=1-16     76.1ns ± 3%
LockTableMetrics/locks=16-16     389ns ± 3%
LockTableMetrics/locks=256-16   5.58µs ± 5%
LockTableMetrics/locks=4096-16  89.6µs ± 4%

name                            alloc/op
LockTableMetrics/locks=0-16      0.00B
LockTableMetrics/locks=1-16      0.00B
LockTableMetrics/locks=16-16     0.00B
LockTableMetrics/locks=256-16    0.00B
LockTableMetrics/locks=4096-16   0.00B

name                            allocs/op
LockTableMetrics/locks=0-16       0.00
LockTableMetrics/locks=1-16       0.00
LockTableMetrics/locks=16-16      0.00
LockTableMetrics/locks=256-16     0.00
LockTableMetrics/locks=4096-16    0.00
```
This commit introduces the following four timeseries metrics:
- `kv.concurrency.locks`
- `kv.concurrency.locks_with_wait_queues`
- `kv.concurrency.lock_wait_queue_waiters`
- `kv.concurrency.max_lock_wait_queue_waiters_for_lock`

These are computed as aggregations over all of the ranges on a given store using
the lock table counters exposed in the previous commit.
@nvb nvb force-pushed the nvanbenschoten/concMetrics branch from 538a0da to 650a201 Compare July 15, 2021 15:39
Collaborator

@sumeerbhola sumeerbhola left a comment


:lgtm:

Reviewed 1 of 30 files at r4, 1 of 29 files at r7, 25 of 25 files at r8, 3 of 3 files at r9.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @nvanbenschoten, and @sumeerbhola)


pkg/kv/kvserver/kvserverpb/state.proto, line 166 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Not in the db console, but it was possible to get to from a debug.zip, though I don't think we ever did. But I had concerns about printing the full debug string here, which can be very expensive and also disruptive.

What we're adding in this PR will be accessible through the db console's debug pages and through debug.zips. I added a top-k listing of locks, sorted by wait-queue length, and then included more information about each lock, including the key (pretty and raw), whether it is held or not, and the breakdown of its queues. Once we add timing information (#67619), that can be added here as well.

excellent

@nvb
Contributor Author

nvb commented Jul 16, 2021

TFTRs!

bors r=sumeerbhola,andreimatei,dhartunian

@craig
Contributor

craig bot commented Jul 16, 2021

Build succeeded:

@craig craig bot merged commit 4a1d6c7 into cockroachdb:master Jul 16, 2021
@nvb nvb deleted the nvanbenschoten/concMetrics branch July 19, 2021 18:51