tracing: link child into parent, even if not verbose#59103
tracing: link child into parent, even if not verbose#59103craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
Prior to this change, when a child was derived from a local parent, we would not register the child with the parent. In effect, this meant that any payloads in the child would not be collected. Release note: None
asubiotto
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @tbg)
pkg/util/tracing/crdbspan.go, line 105 at r1 (raw file):
} if recType == RecordingOff { return
Is it ok to not clear previously recorded info in this case?
tbg
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @irfansharif, and @knz)
pkg/util/tracing/crdbspan.go, line 105 at r1 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
Is it ok to not clear previously recorded info in this case?
Hmm. You've alerted me to a real problem that will remain here whether we clear here or not. SQL uses (very) long-running spans that naively would accumulate more and more payloads. Since nobody ever "resets" the span (as is done when calling SetVerbose during verbose tracing), payloads that have long been handled will sit around forever. I think we need an explicit contract around marking payloads as consumed and around resetting spans. It's surprisingly tricky because:
- we want to be able to continue tracing multiple statements in one span, so we can't simply discard the old recording after each stmt (at least not when verbose)
- we definitely want to discard the payloads after each stmt, as they will have been handled and we don't want to have to book-keep which ones have been handled already in SQL
As an example of 2), if we verbose trace a stmt which creates a contention event, and then run five more statements that don't, we don't want the contention event to be fed into the SQL infra five times.
Looking at this all, my suggestion is:
- we change the
Structured()visitor to take aconsume bool. If that is set, all of the payloads are removed from the spans they are taken from (incl linked children). If it's not set, we're just looking at the payloads but not mutating anything. The idea is that the place where you put the payloads into your contention registry hastruefor consume. - We make the three
nillines below explicit in a method(*Span).ResetRecording(). You call it immediately beforeSetVerbose(true)in the SQL code.
What do you think? Any concerns or ideas on how to do it differently?
|
SGTM. Could you give me an example of a case when we will be looking at the structured payload but not consuming anything? |
|
I personally don't know if there is one, but it seems "wrong" to not offer introspection without also clearing (unit tests if nothing else). |
irfansharif
left a comment
There was a problem hiding this comment.
This PR , but I don't yet have enough context on the discussion below re: long lived spans and perpetual collection of payloads.
Reviewed 3 of 3 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @knz)
IIUC, this could lead to cases in which callers forget to reset these recordings/payloads. I prefer forcing the caller to decide explicitly what to do. This PR LGTM as well. We can probably get it in and focus on the clearing on consumption problem in a separate one, up to you. |
|
bors r=irfansharif Irfan will pick up #59145 separately. |
|
Build succeeded: |
59132: sql: introduce sql.statement_stats.sample_rate to sample execution stats r=RaduBerinde,dhartunian a=asubiotto Depends on #58897 Depends on #59103 This PR puts the "always-on" into always-on EXPLAIN ANALYZE. Take a look at separate commits for details. What actually goes on is that we're taking the slightly safer route of introducing a cluster setting which defines a sample rate for execution stats. These execution stats are collected and propagated using background tracing, which is cheaper than verbose tracing. This allows us to power the new DB Console statement stats views. Currently we still need user input in order to turn up this sample rate. The sample rate is 0 by default in this PR for safety reasons. I'd like to discuss the default value of this cluster setting or whether we need it at all separately before the 21.1 release, but this gives us a nice escape hatch if for whatever reason stats collection results in poor performance. Closes #54556 59536: colexec,bazel: pin the `types` dependency in generated files r=irfansharif a=irfansharif This is a workaround for bazel auto-generated code. goimports does not automatically pick up the right packages when run within the bazel sandbox, so we have to pin it by hand. Release note: None Co-authored-by: Alfonso Subiotto Marques <alfonso@cockroachlabs.com> Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Prior to this change, when a child was derived from a local parent, we
would not register the child with the parent. In effect, this meant that
any payloads in the child would not be collected.
Release note: None