colflow: clean up vectorized stats propagation by yuzefovich · Pull Request #61380 · cockroachdb/cockroach

yuzefovich · 2021-03-03T00:03:45Z

Previously, in order to propagate execution statistics we were creating
temporary tracing spans, setting the stats on them, and finishing the
spans right away. This allowed for using (to be more precise, abusing)
the existing infrastructure. The root of the problem is that in the
vectorized engine we do not start per-operator span if stats collection
is enabled at the moment, so we had to get around that limitation.

However, this way is not how tracing spans are intended to be used and
creates some performance slowdown in the hot path, so this commit
refactors the situation. Now we are ensuring that there is always
a tracing span available at the "root" components (either root
materializer or an outbox), so when root components are finishing the
vectorized stats collectors for their subtree of operators, there is
a span to record the stats into.

This required the following minor adjustments:

in the materializer, we now delegate attachment of the stats to the
tracing span to the drainHelper (which does so on ConsumerDone). Note
that the drainHelper doesn't get the recording from the span and leaves
that to the materializer (this is needed in order to avoid collecting
duplicate trace data).
in the outbox, we now start a "remote child span" (if there is a span
in the parent context) in the beginning of Run method, and we attach
that stats in sendMetadata.

Addresses: #59379.
Fixes: #59555.

Release justification: low-risk update to existing functionality.

Release note: None

cockroach-teamcity · 2021-03-03T00:03:51Z

This change is

asubiotto

Reviewed 22 of 25 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @yuzefovich)

pkg/sql/colexec/materializer.go, line 223 at r1 (raw file):

	// trailing meta callback, so we tell the ProcessorBase to skip trace data.
	// If we don't do so, then we would get duplicate traces.
	m.SkipTraceData = true

I don't like this knob. The underlying problem is that we pass on the recording before adding the stats. I think there are two better ways to fix this:

Add stats in drainHelper's ConsumerDone method above. This will be called before the trailing meta stage.
Call StartInternalNoSpan on the ProcessorBase in Start and do manual span management in the materializer.

I think 1) will be cleaner, but I prefer either to having to add a new knob to ProcessorBase if we can avoid it, because I see this as a specific need rather than a more general one.

pkg/sql/colflow/vectorized_flow.go, line 365 at r1 (raw file):

	vectorizedStatsCollectors []vectorizedStatsCollector,
) []*execinfrapb.ComponentStats {
	result := make([]*execinfrapb.ComponentStats, 0, len(vectorizedStatsCollectors))

Should we pool this?

pkg/sql/colflow/vectorized_flow.go, line 992 at r1 (raw file):

			// appended to.
			vscq := append([]vectorizedStatsCollector(nil), s.vectorizedStatsCollectorsQueue...)
			getStats = func() []*execinfrapb.ComponentStats {

nit: I wonder if keeping this as a callback is the best way forward. It seems a bit hacky when we could just pass down the vectorized stats collectors and they could be finished by the outbox/materializer just like metadata sources are drained explicitly, instead of passing down a callback that will drain metadata sources. I don't feel strongly about this though, but it might be nice to keep "execution" code outside the planning part.

pkg/sql/colflow/colrpc/outbox.go, line 306 at r1 (raw file):

			}
		}
		o.span.Finish()

nit: seems like you could also remove these two lines and rely on the defer above since we don't need to Finish the span before grabbing the recording.

pkg/sql/logictest/testdata/logic_test/inverted_index_geospatial, line 64 at r1 (raw file):

            └── • scan
                  cluster nodes: <hidden>
                  actual row count: 4

🤔 is this right?

yuzefovich

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto and @RaduBerinde)

pkg/sql/colexec/materializer.go, line 223 at r1 (raw file):