Skip to content

tracing: link child into parent, even if not verbose#59103

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:trace-remember-children
Jan 21, 2021
Merged

tracing: link child into parent, even if not verbose#59103
craig[bot] merged 1 commit intocockroachdb:masterfrom
tbg:trace-remember-children

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Jan 18, 2021

Prior to this change, when a child was derived from a local parent, we
would not register the child with the parent. In effect, this meant that
any payloads in the child would not be collected.

Release note: None

Prior to this change, when a child was derived from a local parent, we
would not register the child with the parent. In effect, this meant that
any payloads in the child would not be collected.

Release note: None
@tbg tbg requested review from asubiotto, irfansharif and knz January 18, 2021 10:27
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)


pkg/util/tracing/crdbspan.go, line 105 at r1 (raw file):

	}
	if recType == RecordingOff {
		return

Is it ok to not clear previously recorded info in this case?

@tbg tbg requested a review from asubiotto January 18, 2021 11:10
Copy link
Copy Markdown
Member Author

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @irfansharif, and @knz)


pkg/util/tracing/crdbspan.go, line 105 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Is it ok to not clear previously recorded info in this case?

Hmm. You've alerted me to a real problem that will remain here whether we clear here or not. SQL uses (very) long-running spans that naively would accumulate more and more payloads. Since nobody ever "resets" the span (as is done when calling SetVerbose during verbose tracing), payloads that have long been handled will sit around forever. I think we need an explicit contract around marking payloads as consumed and around resetting spans. It's surprisingly tricky because:

  1. we want to be able to continue tracing multiple statements in one span, so we can't simply discard the old recording after each stmt (at least not when verbose)
  2. we definitely want to discard the payloads after each stmt, as they will have been handled and we don't want to have to book-keep which ones have been handled already in SQL

As an example of 2), if we verbose trace a stmt which creates a contention event, and then run five more statements that don't, we don't want the contention event to be fed into the SQL infra five times.

Looking at this all, my suggestion is:

  1. we change the Structured() visitor to take a consume bool. If that is set, all of the payloads are removed from the spans they are taken from (incl linked children). If it's not set, we're just looking at the payloads but not mutating anything. The idea is that the place where you put the payloads into your contention registry has true for consume.
  2. We make the three nil lines below explicit in a method (*Span).ResetRecording(). You call it immediately before SetVerbose(true) in the SQL code.

What do you think? Any concerns or ideas on how to do it differently?

@asubiotto
Copy link
Copy Markdown
Contributor

SGTM. Could you give me an example of a case when we will be looking at the structured payload but not consuming anything?

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jan 18, 2021

I personally don't know if there is one, but it seems "wrong" to not offer introspection without also clearing (unit tests if nothing else).
Perhaps the better API is to have ResetRecording() and ResetPayloads, but then for the latter we're traversing twice (since we also always collect the payloads).

Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR :lgtm:, but I don't yet have enough context on the discussion below re: long lived spans and perpetual collection of payloads.

Reviewed 3 of 3 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @knz)

@asubiotto
Copy link
Copy Markdown
Contributor

Perhaps the better API is to have ResetRecording() and ResetPayloads, but then for the latter we're traversing twice (since we also always collect the payloads).

IIUC, this could lead to cases in which callers forget to reset these recordings/payloads. I prefer forcing the caller to decide explicitly what to do.

This PR LGTM as well. We can probably get it in and focus on the clearing on consumption problem in a separate one, up to you.

@tbg
Copy link
Copy Markdown
Member Author

tbg commented Jan 21, 2021

bors r=irfansharif

Irfan will pick up #59145 separately.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 21, 2021

Build succeeded:

@craig craig bot merged commit c978f14 into cockroachdb:master Jan 21, 2021
craig bot pushed a commit that referenced this pull request Jan 28, 2021
59132: sql: introduce sql.statement_stats.sample_rate to sample execution stats r=RaduBerinde,dhartunian a=asubiotto

Depends on #58897 
Depends on #59103 

This PR puts the "always-on" into always-on EXPLAIN ANALYZE. Take a look at separate commits for details. What actually goes on is that we're taking the slightly safer route of introducing a cluster setting which defines a sample rate for execution stats. These execution stats are collected and propagated using background tracing, which is cheaper than verbose tracing. This allows us to power the new DB Console statement stats views. Currently we still need user input in order to turn up this sample rate.

The sample rate is 0 by default in this PR for safety reasons. I'd like to discuss the default value of this cluster setting or whether we need it at all separately before the 21.1 release, but this gives us a nice escape hatch if for whatever reason stats collection results in poor performance.

Closes #54556 

59536: colexec,bazel: pin the `types` dependency in generated files r=irfansharif a=irfansharif

This is a workaround for bazel auto-generated code. goimports does not
automatically pick up the right packages when run within the bazel
sandbox, so we have to pin it by hand.

Release note: None

Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants