kvserver/concurrency,tracing: add tracing tags while waiting for locks #74878

Merged
craig[bot] merged 6 commits into cockroachdb:master from
andreimatei:tracing.cleanup.kv-lock-contention
Mar 15, 2022

Conversation

@andreimatei
Contributor

@andreimatei andreimatei commented Jan 14, 2022

When the evaluation of a request blocks on a lock acquisition, this
patch annotates the tracing span representing the evaluation with
tags about the lock's key, holder, time when the waiting started, and
number of locks that the evaluation waited on previously.

To do this, the patch adds support in the tracing library for "lazy
tags": tags that are only stringified on demand, when a span's recording
is collected. This is useful if the stringification is expensive (so we
only want to pay for it if someone actually looks at the span), and also
for structured tags whose state can change over time. A lazy tag can
implement one of two interfaces: Stringer or LazyTag. The latter allows
the tag to be represented as a set of key-value pairs in a recording.

The point of the new contention tags is to be featured prominently in
the visualizer for the active spans registry that I'm working on.
Looking at queries/requests experiencing contention is likely to be a
very common scenario.

This patch spruces up the contentionEventHelper in the lock table, to
maintain the state of a waiter as a lazy tag.

Release note: None
Release justification: Low risk, high observability value.

@andreimatei andreimatei requested review from AlexTalks, nvb and tbg January 14, 2022 23:19
@andreimatei andreimatei requested a review from a team as a code owner January 14, 2022 23:19
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from d871ee2 to cfdb3e9 Compare January 14, 2022 23:21
@andreimatei
Contributor Author

@nvanbenschoten remind me pls - if a request is looking to acquire locks itself, then the newStateC in the lockTableWaiterImpl is only ever signaled with one lock to wait on; when that first lock becomes available, the request loops around in executeBatchWithConcurrencyRetries, takes the lock, and starts waiting on the next one, right? As opposed to a request that doesn't want to acquire anything (so it just waits for the conflicting locks to go away), for which the newStateC can be signaled with multiple locks, right? If so, who controls this distinction - signaling newStateC multiple times or not?

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch 2 times, most recently from 0835e50 to b32e587 Compare January 17, 2022 00:08
// 1m01s
// 1h05m01s
// 1d02h05m
func formatDuration(d time.Duration) string {
Member


// Duration formats a duration in a user-friendly way. The result is not exact
// and the granularity is no smaller than microseconds.
//
// Examples:
// 0 -> "0µs"
// 123456ns -> "123µs"
// 12345678ns -> "12ms"
// 12345678912ns -> "1.2s"
//
func Duration(val time.Duration) redact.SafeString {
?

h.tBegin = timeutil.Now()
}
case waitElsewhere, waitQueueMaxLengthExceeded, doneWaiting:
h.sp.SetStatusTag(tagWaitKey, attribute.StringValue(s.key.String()))
Member


The analog to #71622 could also be helpful here. I know, you're trying to get us to a place where we have no need for such stop-gaps, but still, it might get us places for the foreseeable future. I would certainly appreciate having this in the debug.zip (where the labels end up)

defer s.mu.Unlock()
k := attribute.Key(key)
for i := range s.mu.tags {
if s.mu.tags[i].Key == k {
Member


Should store them separately, otherwise you can remove regular tags, no?

@tbg
Member

tbg commented Jan 17, 2022

Can you have someone from @cockroachdb/obs-inf-prs on this PR since they own tracing? I think there are probably some finer points to discuss.

The overall idea LGTM.

Collaborator

@dhartunian dhartunian left a comment


Reviewed 1 of 1 files at r1, 1 of 5 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @dhartunian, and @nvanbenschoten)


-- commits, line 14 at r2:
nit: waited


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 855 at r2 (raw file):

// A contentionEventHelper should be created when a request starts waiting in
// the lock table (on what might turn out to be multiple locks). close() should
// be called when the waiting is over.

Just curious in terms of API design: should this mirror a lock, with a start method to match close? If I create an instance of this struct without a start time, it seems like that would break things.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 862 at r2 (raw file):

	// contentionStart marks the start of the contention tracked by this
	// contentionEventHelper.
	contentionStart time.Time

I'm a bit confused about the difference between tBegin below and contentionStart. Can you clarify the difference between "start of the contention" and "start of the contention event"? Also, is there a meaning to using both Begin and Start?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 944 at r2 (raw file):

func (h *contentionEventHelper) close() {
	if h.closed || h.sp == nil {

Do we ever set closed to true?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 948 at r2 (raw file):

	}
	// Clear most tags.
	for _, tag := range []string{"lock_wait_key", "lock_wait_start", "lock_holder_txn"} {

Just checking my understanding here: These tags will only be visible while the span is actually waiting on the locks, right? So you have to "catch" the span while it's stuck in order to see these tags?

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from b32e587 to 0347f21 Compare January 28, 2022 01:28
@andreimatei andreimatei requested a review from a team as a code owner January 28, 2022 01:28
@andreimatei andreimatei requested a review from a team January 28, 2022 01:28
@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch 2 times, most recently from 99cdca7 to fc92a65 Compare January 28, 2022 01:32
Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @dhartunian, @nvanbenschoten, and @tbg)


-- commits, line 14 at r2:

Previously, dhartunian (David Hartunian) wrote…

nit: waited

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 855 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Just curious in terms of API design: should this mirror a lock, with a start method to match close? If I create an instance of this struct without a start time, it seems like that would break things.

I've given it a constructor.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 862 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I'm a bit confused about the difference between tBegin below and contentionStart. Can you clarify the difference between "start of the contention" and "start of the contention event"? Also, is there a meaning to using both Begin and Start?

please see now; I've reworked this.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 927 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

The analog to #71622 could also be helpful here. I know, you're trying to get us to a place where we have no need for such stop-gaps, but still, it might get us places for the foreseeable future. I would certainly appreciate having this in the debug.zip (where the labels end up)

What if I promise to put trace registry dumps in the debug.zip? :)
Regardless, I'd rather not put that label in this patch. Let's discuss it separately.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 944 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Do we ever set closed to true?

we do now :)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 948 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Just checking my understanding here: These tags will only be visible while the span is actually waiting on the locks, right? So you have to "catch" the span while it's stuck in order to see these tags?

I've reworked this. Now, there is technically a single tag that stays with the span forever, and that tag renders differently depending on whether the span is currently blocked versus when it's no longer blocked. In the latter case, it just has some aggregate info about the total amount of time spent waiting on locks.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 963 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

// Duration formats a duration in a user-friendly way. The result is not exact
// and the granularity is no smaller than microseconds.
//
// Examples:
// 0 -> "0µs"
// 123456ns -> "123µs"
// 12345678ns -> "12ms"
// 12345678912ns -> "1.2s"
//
func Duration(val time.Duration) redact.SafeString {
?

thanks, used that one


pkg/util/tracing/crdbspan.go, line 403 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Should store them separately, otherwise you can remove regular tags, no?

yeah... I purposefully didn't go through that trouble. When no Otel tracer is attached, removing regular tags works just as well as removing "status tags", if one does it, and I think that's fine.

Contributor Author

@andreimatei andreimatei left a comment


@AlexTalks can I please ask you for a review for the kv parts, and @dhartunian for the tracing library changes?
And @tbg, @nvanbenschoten, for whatever you feel like :).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @dhartunian, @nvanbenschoten, and @tbg)

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from fc92a65 to f204bd2 Compare January 28, 2022 15:07
Contributor

@AlexTalks AlexTalks left a comment


Reviewed 9 of 9 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 5 of 7 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @dhartunian, @nvanbenschoten, and @tbg)


-- commits, line 22 at r9:
The changes make sense but this commit message may be a little confusing - as you mentioned, waitElsewhere is basically the same as waitFor, it's just that the lock table is under memory pressure (the number of locks tracked has exceeded configured limits for this replica). It might be that a slight rewording is necessary to make this clear here.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 896 at r10 (raw file):

		closed bool

		// lockWait represents the time the span spent waiting on locks. If closed

the cumulative time the *request* spent waiting on locks


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 906 at r10 (raw file):

		// closed or if the contentionEventTracer has not yet been notified of the
		// first lock to wait on.
		curState *waitingState

I'm not sure it makes sense to additionally track this state on the contentionEventHelper when it seems state change tracking logic could be accomplished using local variables in WaitOn(..), no?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 931 at r10 (raw file):

	t, ok := sp.GetLazyTag(tagContentionHelper)
	if ok {
		h = t.(*contentionEventTracer)

It may be worth discussing if having a per-span contentionEventTracer is the best way to do this, or if it would make sense to capture discrete events by contended txn/key and then be able to look at the individual contentions (and their durations).

You could potentially still have a per-span contentionEventTracer but collect each individual event in it (along with duration) so in ExpandToRecordingTags(..) you could still collect the cumulative duration of contentions (or, of course, render it a different way).

Collaborator

@dhartunian dhartunian left a comment


Reviewed 9 of 9 files at r7, 1 of 1 files at r8, 3 of 3 files at r9, 2 of 7 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @dhartunian, @nvanbenschoten, and @tbg)


-- commits, line 38 at r10:
I want to check my understanding of the data we're collecting and where/when it will get surfaced:

  • When the span is waiting on a lock and we capture it in-flight, we'll get a snapshot of the contentionEventTracer state within via the ExpandingTag mechanism.
  • If the span waited on locks in the past these each produce ContentionEvents via the emit() call that add structured data to the span.

Does the contentionEventTracer leave data in its wake besides the structured ContentionEvents?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 846 at r10 (raw file):

// tagContentionHelper is the tracing span tag that the *contentionEventHelper
// lives under.
const tagContentionHelper = "contention_helper"

this should be contention_tracer now?

Also a bunch of other cases of helper being used below that should be changed to tracer.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 865 at r10 (raw file):

// waiting on, if any, is not
// counted.
const tagNumLocks = "lock_num"

nit: can we name this to match the string?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 870 at r10 (raw file):

// waited on locks. If the span is currently waiting on a lock, the time it has
// already waited on this lock is not reflected.
const tagWaited = "lock_wait"

checking my understanding: this is recorded after the lock is released so that's why "current" wait time is not factored in?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 935 at r10 (raw file):

		defer h.mu.Unlock()
		if !h.mu.closed {
			log.Fatalf(ctx, "unexpectedly found non-closed contentionEventTracer")

does this really warrant Fatal? Or is this more meant for catching errors during test?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 1028 at r10 (raw file):

		// NB: we're guaranteed to have `s.{txn,key}` populated here.
		differentLock := false
		var firstLock bool

random q: why initialize these variables differently?


pkg/util/tracing/span_inner.go, line 168 at r10 (raw file):

}

func (s *spanInner) SetLazyTag(key string, value interface{}, statusTag bool) *spanInner {

The API here expands quite a bit, do you think it's worth consolidating a bit? can "lazy" be a bool like statusTag?

@otan otan removed the request for review from a team February 21, 2022 03:15
@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from f204bd2 to a70534c Compare March 3, 2022 00:26
@andreimatei
Contributor Author

Sorry to let this sit; please give it another look.
I've simplified the changes to the span api.

Collaborator

@dhartunian dhartunian left a comment


:lgtm: Much easier to follow. Thanks for making these changes. Just some nits re: unused methods being added.

Reviewed 9 of 9 files at r11, 1 of 1 files at r12, 3 of 3 files at r13, 3 of 7 files at r14, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei, @dhartunian, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 1004 at r10 (raw file):

// notify processes an event from the lock table.
// compares the waitingState's active txn (if any) against the current
// ContentionEvent (if any). If the they match, we are continuing to handle the

nit: the they


pkg/util/tracing/crdbspan.go, line 536 at r14 (raw file):

// clearTag removes a tag, if it exists.
func (s *crdbSpan) clearTag(key string) {

This seems unused. Only called by spanInner.ClearStatusTag which is also unused.


pkg/util/tracing/span_inner.go, line 154 at r14 (raw file):

}

func (s *spanInner) SetTag(key string, value attribute.Value, statusTag bool) *spanInner {

I don't see a usage that sets statusTag to true. Am I missing something?


pkg/util/tracing/span_inner.go, line 196 at r14 (raw file):

}

func (s *spanInner) ClearStatusTag(key string) *spanInner {

Unused. Same as clearTag().

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from a70534c to dbf2e89 Compare March 10, 2022 00:08
@andreimatei andreimatei requested a review from a team March 10, 2022 00:08
Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks, @dhartunian, @nvanbenschoten, and @tbg)


-- commits, line 22 at r9:

Previously, AlexTalks (Alex Sarkesian) wrote…

The changes make sense but this commit message may be a little confusing - as you mentioned, waitElsewhere is basically the same as waitFor, it's just that the lock table is under memory pressure (the number of locks tracked has exceeded configured limits for this replica). It might be that a slight rewording is necessary to make this clear here.

I added a few words, please see now


-- commits, line 38 at r10:

When the span is waiting on a lock and we capture it in-flight, we'll get a snapshot of the contentionEventTracer state within via the ExpandingTag mechanism.

Right. Except that you'll get some data from the ExpandingTag even if the span's recording is captured after the span finished waiting; see below.

If the span waited on locks in the past these each produce ContentionEvents via the emit() call that add structured data to the span.

Right. But note that, at least currently, the structured events are no-ops on non-recording spans, just like non-structured events.
Besides the structured events, the contentionEventTracer also counts the number of locks and the total time waited. See below.

Does the contentionEventTracer leave data in its wake besides the structured ContentionEvents?

It does. Or, rather, there's no "in its wake"; once created, the contentionEventTracer stays attached to the span as a LazyTag until span.Finish(). If/when the span's recording is collected, it will include the number of locks and the total time waited. If the recording is collected while the span is waiting on a lock, we get extra info about that lock.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 846 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

this should be contention_tracer now?

Also a bunch of other cases of helper being used below that should be changed to tracer.

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 865 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: can we name this to match the string?

I'd prefer not to, unless you feel strongly. I think it's useful for all these consts to have a common prefix, so that code completion groups them. When rendered as tags in a span, this consideration no longer applies.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 870 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

checking my understanding: this is recorded after the lock is released so that's why "current" wait time is not factored in?

that was the case, but I've now changed it to include the current lock (if any)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 896 at r10 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

the cumulative time the *request* spent waiting on locks

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 906 at r10 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

I'm not sure it makes sense to additionally track this state on the contentionEventHelper when it seems state change tracking logic could be accomplished using local variables in WaitOn(..), no?

well, curState is used in Render(), which is called async - there's no opportunity for the lockTableWaiterImpl to provide data in that case.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 931 at r10 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

It may be worth discussing if having a per-span contentionEventTracer is the best way to do this, or if it would make sense to capture discrete events by contended txn/key and then be able to look at the individual contentions (and their durations).

You could potentially still have a per-span contentionEventTracer but collect each individual event in it (along with duration) so in ExpandToRecordingTags(..) you could still collect the cumulative duration of contentions (or, of course, render it a different way).

The point of only storing aggregate information (in addition to extra info about the current lock being waited on, if any), is so that all this tracing takes a fixed memory budget when recording is not enabled. When recording is enabled, we get the structured events after each lock.
You don't buy it?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 935 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

does this really warrant Fatal? Or is this more meant for catching errors during test?

I personally am in the crash-happy camp for bugs, on the argument that it generally leads to better and tighter code. If somehow getOrCreateContentionEventHelper were called with a span with an open contentionEventHelper, then either we forgot to close() it (in which case we've been counting bogus wait time), or we're somehow using the span for two concurrent lock table interactions, in which case they would step over each other. Both are a form of data corruption. Unless there's good reason to relax this, I'd rather generally assert that my data is sane.
You object?


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 1004 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

nit: the they

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 1028 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

random q: why initialize these variables differently?

changed now


pkg/util/tracing/crdbspan.go, line 536 at r14 (raw file):

Previously, dhartunian (David Hartunian) wrote…

This seems unused. Only called by spanInner.ClearStatusTag which is also unused.

removed, thanks


pkg/util/tracing/span_inner.go, line 168 at r10 (raw file):

Previously, dhartunian (David Hartunian) wrote…

The API here expands quite a bit, do you think it's worth consolidating a bit? can "lazy" be a bool like statusTag?

please see now; I've simplified things


pkg/util/tracing/span_inner.go, line 154 at r14 (raw file):

Previously, dhartunian (David Hartunian) wrote…

I don't see a usage that sets statusTag to true. Am I missing something?

this was detritus; removed


pkg/util/tracing/span_inner.go, line 196 at r14 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Unused. Same as clearTag().

removed

Contributor

@AlexTalks AlexTalks left a comment


Reviewed 1 of 7 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @dhartunian, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 931 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

The point of only storing aggregate information (in addition to extra info about the current lock being waited on, if any), is so that all this tracing takes a fixed memory budget when recording is not enabled. When recording is enabled, we get the structured events after each lock.
You don't buy it?

I think that makes sense. I'm not entirely sure how to reason about what our memory budget is here, but I understand the thought process.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 882 at r18 (raw file):

// over.
//
// contentionEventTracer is thread-safe because ExpandToRecordingTags can be

s/ExpandToRecordingTags/Render


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 896 at r18 (raw file):

		// lockWait accumulates time waited for locks when the contentionEventTracer
		// is closed.
		lockWait time.Duration

Very useful!

It could potentially be even more useful to make this available to lockTableGuardImpl so that it could be used in metrics, though that may be out of scope here.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 913 at r18 (raw file):

		// not necessarily the last update - if two successive states refer to the
		// same	lock, curLockStart is not updated.
		curLockStart time.Time

This may duplicate lockTableGuardImpl.mu.requestWaitBegin, which I had initially included in waitingState (which would have made it available here). Perhaps this should be refactored to avoid duplication? @nvanbenschoten any thoughts?

(Also, that said, I don't think that allows us to account for waitElsewhere)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 940 at r18 (raw file):

		}
		h.mu.closed = false
		h.mu.waitStart = timeutil.Now()

Nit: If it does make sense to continue tracking this here, the lockTableImpl is created with a timeutil.TimeSource that helps with testing - it would be good to pass that in to lockTableWaiterImpl, and use that instead of timeutil.Now().


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 947 at r18 (raw file):

		onEvent: onEvent,
	}
	h.mu.waitStart = timeutil.Now()

Nit: Same as above.

Collaborator

@dhartunian dhartunian left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei, @dhartunian, @nvanbenschoten, and @tbg)


-- commits, line 38 at r10:

Previously, andreimatei (Andrei Matei) wrote…

When the span is waiting on a lock and we capture it in-flight, we'll get a snapshot of the contentionEventTracer state within via the ExpandingTag mechanism.

Right. Except that you'll get some data from the ExpandingTag even if the span's recording is captured after the span finished waiting; see below.

If the span waited on locks in the past these each produce ContentionEvents via the emit() call that add structured data to the span.

Right. But note that, at least currently, the structured events are no-ops on non-recording spans, just like non-structured events.
Besides the structured events, the contentionEventTracer also counts the number of locks and the total time waited. See below.

Does the contentionEventTracer leave data in its wake besides the structured ContentionEvents?

It does. Or, rather, there's no "in its wake"; once created, the contentionEventTracer stays attached to the span as a LazyTag until span.Finish(). If/when the span's recording is collected, it will include the number of locks and the total time waited. If the recording is collected while the span is waiting on a lock, we get extra info about that lock.

Ack. Thank you!


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 935 at r10 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I personally am in the crash-happy camp for bugs, on the argument that it generally leads to better and tighter code. If somehow getOrCreateContentionEventHelper were called with a span with an open contentionEventHelper, then either we forgot to close() it (in which case we've been counting bogus wait time), or we're somehow using the span for two concurrent lock table interactions, in which case they would step over each other. Both are a form of data corruption. Unless there's good reason to relax this, I'd rather generally assert that my data is sane.
You object?

Makes sense, thx for clarifying 👍

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from dbf2e89 to 8d9ba98 Compare March 10, 2022 23:56
Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks, @dhartunian, @nvanbenschoten, and @tbg)


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 882 at r18 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

s/ExpandToRecordingTags/Render

Done.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 896 at r18 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Very useful!

It could potentially be even more useful to make this available to lockTableGuardImpl so that it could be used in metrics, though that may be out of scope here.

yeah... I'll leave this for next time.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 913 at r18 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

This may duplicate lockTableGuardImpl.mu.requestWaitBegin, which I had initially included in waitingState (which would have made it available here). Perhaps this should be refactored to avoid duplication? @nvanbenschoten any thoughts?

(Also, that said, I don't think that allows us to account for waitElsewhere)

as discussed offline, I've plumbed requestWaitBegin through the waitState over from the lockTableGuard to the contentionEventTracer.

I've also prepended a commit to rename requestWaitBegin -> curLockWaitStart if you don't mind, to reflect the understanding that the field is scoped to a lock.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 940 at r18 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Nit: If it does make sense to continue tracking this here, the lockTableImpl is created with a timeutil.TimeSource that helps with testing - it would be good to pass that in to lockTableWaiterImpl, and use that instead of timeutil.Now().

removed


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 947 at r18 (raw file):

Previously, AlexTalks (Alex Sarkesian) wrote…

Nit: Same as above.

done

Contributor

@AlexTalks AlexTalks left a comment


:lgtm:

Reviewed 9 of 9 files at r19, 1 of 1 files at r20.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @AlexTalks, @nvanbenschoten, and @tbg)

Contributor

@nvb nvb left a comment


Reviewed 9 of 9 files at r19, 1 of 1 files at r20, 3 of 3 files at r21, 1 of 1 files at r22, 1 of 1 files at r23, 9 of 9 files at r24, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @tbg)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 719 at r24 (raw file):

	Metrics() LockTableMetrics

	// Clock returns the lockTable's time source.

Instead of adding to this interface, let's give lockTableWaiterImpl a new contentionEventClock field that gets set to the same time source on construction.


pkg/kv/kvserver/concurrency/lock_table.go, line 96 at r24 (raw file):

	queuedWriters int               // how many writers are waiting?
	queuedReaders int               // how many readers are waiting?
	// lockWaitStart represents the timestamp when the request started waiting on

nit: move below guardAccess. Otherwise though, I'm glad we decided to pass this information through the waitingState.


pkg/kv/kvserver/concurrency/lock_table.go, line 1562 at r24 (raw file):

	g.mu.Lock()
	lockWaitStart := g.mu.curLockWaitStart
	g.mu.Unlock()

Can we move this down to the existing critical section below? It looks like we only need it right after g.mu.curLockWaitStart = timeProvider.Now().


pkg/kv/kvserver/concurrency/lock_table.go, line 2046 at r24 (raw file):

		g.mu.Lock()
		g.mu.state = waitState
		g.mu.state.lockWaitStart = g.mu.curLockWaitStart

Assigning to g.mu.state.lockWaitStart is nice. It prevents lockWaitStart times from accidentally ending up in the wrong guard's state. We should do that everywhere else as well.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 887 at r21 (raw file):

	// Otherwise,
	switch s.kind {
	case waitFor, waitForDistinguished, waitSelf, waitElsewhere:

Should we write a test for this? Similar to pkg/kv/kvserver/concurrency/testdata/concurrency_manager/wait_self, which was added in dc689ae.


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 908 at r24 (raw file):

		// closed or if the contentionEventTracer has not yet been notified of the
		// first lock to wait on.
		curState *waitingState

Does this need to be a pointer? I assume it causes the argument to notify to escape to the heap and raises (unfounded) aliasing concerns.

@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from 8d9ba98 to 896e7b1 Compare March 14, 2022 22:52
Contributor Author

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)


pkg/kv/kvserver/concurrency/concurrency_control.go, line 719 at r24 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Instead of adding to this interface, let's give lockTableWaiterImpl a new contentionEventClock field that gets set to the same time source on construction.

done


pkg/kv/kvserver/concurrency/lock_table.go, line 96 at r24 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: move below guardAccess. Otherwise though, I'm glad we decided to pass this information through the waitingState.

done


pkg/kv/kvserver/concurrency/lock_table.go, line 1562 at r24 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we move this down to the existing critical section blow? It looks like. we only need it right after g.mu.curLockWaitStart = timeProvider.Now().

done. Please see now.


pkg/kv/kvserver/concurrency/lock_table.go, line 2046 at r24 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Assigning to g.mu.state.lockWaitStart is nice. It prevents lockWaitStart times from accidentally ending up in the wrong guard's state. We should do that everywhere else as well.

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 887 at r21 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we write a test for this? Similar to pkg/kv/kvserver/concurrency/testdata/concurrency_manager/wait_self, which was added in dc689ae.

done


pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 908 at r24 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be a pointer? I assume it causes the argument to notify to escape to the heap and raises (unfounded) aliasing concerns.

done

The comment was claiming that a span's tags are only maintained if the
span is recording, but that hasn't been the case in a while.

Release note: None
An inner function is no longer serving any purpose.

Release note: None
…ents

Before this patch, no contention event was produced when requests were
waiting for locks by pushing the txn that currently has a conflicting
lock (or a reservation on a conflicting lock). This corresponds to the
waitElsewhere signal from the lock table. (waitElsewhere is not the
common case; the common case is for a request to wait in the lock table
(through the waitFor signal). waitElsewhere happens instead of waitFor
when the lock table is under memory pressure and sheds its state.)

This patch makes the waitElsewhere state be handled like the other
waiting states for the purposes of generating contention events.

This patch also adds a data-driven test for the waitElsewhere
functionality.

Release note: None
Rename to curLockWaitStart to emphasize that it is set once per lock,
not once per request or once per wait state.

Release note: None
waitQueueMaxLengthExceeded is not really a wait state, so setting
curLockWaitStart for it doesn't make much sense. Other final states
also don't update curLockWaitStart.

Release note: None
@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from 896e7b1 to 2aa2f23 Compare March 15, 2022 00:29
Contributor

@nvb nvb left a comment


:lgtm:

Reviewed 4 of 7 files at r33, 9 of 9 files at r36, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 2 stale) (waiting on @andreimatei and @tbg)


pkg/kv/kvserver/concurrency/concurrency_manager_test.go, line 82 at r36 (raw file):

// debug-set-clock           ts=<secs>
// debug-set-discovered-locks-threshold-to-consult-finalized-txn-cache n=<count>
// debug-set-max-locks <limit>

n=<count>

When the evaluation of a request blocks on a lock acquisition, this
patch annotates the tracing span representing the evaluation with
tags about the lock's key, holder, time when the waiting started, and
number of locks that the evaluation waited on previously.

To do this, the patch adds support in the tracing library for "lazy
tags": tags that are only stringified on demand, when a span's recording
is collected. This is useful if the stringification is expensive (so we
only want to pay for it if someone actually looks at the span), and also
for structured tags whose state can change over time. A lazy tag can
implement one of two interfaces: Stringer or LazyTag. The latter allows
the tag to be represented as a set of key-value pairs in a recording.

The point of the new contention tags is to be featured prominently in
the visualizer for the active spans registry that I'm working on.
Looking at queries/requests experiencing contention is likely to be a
very common scenario.

This patch spruces up the contentionEventHelper in the lock table, to
maintain the state of a waiter as a lazy tag.

Release note: None
Release justification: Low risk, high observability value.
@andreimatei andreimatei force-pushed the tracing.cleanup.kv-lock-contention branch from 2aa2f23 to 70ef641 Compare March 15, 2022 01:47
Contributor Author

@andreimatei andreimatei left a comment


TFTRs!

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 3 stale) (waiting on @nvanbenschoten and @tbg)


pkg/kv/kvserver/concurrency/concurrency_manager_test.go, line 82 at r36 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

n=<count>

done

Code quote:

// debug-set-max-locks <limit>

@craig
Contributor

craig bot commented Mar 15, 2022

Build succeeded:

@craig craig bot merged commit 957dfa8 into cockroachdb:master Mar 15, 2022
@andreimatei andreimatei deleted the tracing.cleanup.kv-lock-contention branch March 18, 2022 15:23