sql: use real contention events instead of the mock ones #59785
craig[bot] merged 2 commits into cockroachdb:master
Conversation
Hm, I guess it might be possible to attribute each contention event from the trace to a particular processor if each processor would record into the trace the spans (as in …). Am I missing something? Do you see another way to keep the "contention time" info for EXPLAIN ANALYZE diagrams when the events are recorded into the trace?
9e400d9 to 3356f40
Do we have plans to use node-level statistics? They are not currently used, so I added another commit that removes them.
3356f40 to d547517
asubiotto
left a comment
Thanks for getting this out so quickly!
Am I missing something? Do you see another way to keep the "contention time" info for EXPLAIN ANALYZE diagrams when the events are recorded into the trace?
This is what I meant when talking about the execution leaves. Instead of calling GetCumulativeContentionTime in the leaves and populating metadata with it, my thinking is that the processors would use a helper to inspect their span's payload (from the context) and calculate cumulative contention time from those. I'm guessing that there might be some subtlety in doing this in cases where we have two input branches (e.g. a joinreader) and want to get the contention time for only one branch, but that might be limited to using a separate context for the lookups to get only spans related to those lookups.
Reviewed 20 of 20 files at r1, 4 of 4 files at r2, 1 of 1 files at r3, 9 of 9 files at r4, 1 of 33 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @RaduBerinde, and @yuzefovich)
pkg/sql/distsql_running_test.go, line 270 at r5 (raw file):
// additionally verifies that the metric stays at zero if there is no
// contention.
func TestDistSQLReceiverReportsContention(t *testing.T) {
cc @tbg
@yuzefovich could you add a check that the contention registry is updated properly?
pkg/sql/execstats/traceanalyzer.go, line 89 at r3 (raw file):
// TODO(asubiotto): Flatten this struct, we're currently allocating a map per
// stat.
type NodeLevelStats struct {
The question of whether we want to keep this is a good one. I think we do because it won't be long before our users start asking for a break down of higher-level stats by node. However, I do think that we could probably clean this up a bit and maybe offer an option to calculate query level stats from just the trace for performance reasons. In any case, this commit and discussion belongs in another PR.
pkg/sql/execstats/traceanalyzer.go, line 96 at r3 (raw file):
KVTimeGroupedByNode          map[roachpb.NodeID]time.Duration
NetworkMessagesGroupedByNode map[roachpb.NodeID]int64
ContentionTimeGroupedByNode  map[roachpb.NodeID]time.Duration
I'm confused why this commit removes this, what does it fix?
f021eec to 1cf00d8
yuzefovich
left a comment
Ok, I think it makes sense. IIUC, because all processors already start a child span in the ProcessorBase.StartInternal method, only the contention events for the appropriate processor will show up in its trace (i.e. if we have a join reader that takes its input from another table reader, the former will only see its own events). I added similar logic to the ColBatchScan.
Is my logic here correct? Any ideas on how to test it programmatically? I'm not sure how we can inject the contention events into the trace; I'm guessing we need to plumb a testing callback into the KV concurrency manager?
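The isolation property being relied on here can be illustrated with a toy span model (purely illustrative, not the real tracing package): events recorded while a child span is active land in that child only, so one processor's recording never includes a sibling's events.

```go
package main

import "fmt"

// Span is a toy model of a tracing span: each processor gets its own
// child, and a recording covers only that span and its descendants.
type Span struct {
	name     string
	events   []string
	children []*Span
}

func (s *Span) Child(name string) *Span {
	c := &Span{name: name}
	s.children = append(s.children, c)
	return c
}

func (s *Span) Record(ev string) { s.events = append(s.events, ev) }

// Recording returns the events of this span and its descendants.
func (s *Span) Recording() []string {
	out := append([]string{}, s.events...)
	for _, c := range s.children {
		out = append(out, c.Recording()...)
	}
	return out
}

func main() {
	root := &Span{name: "flow"}
	tableReader := root.Child("table-reader")
	joinReader := root.Child("join-reader")
	tableReader.Record("contention: key=/Table/52/1")
	joinReader.Record("contention: key=/Table/53/1")

	// Each processor sees only its own contention events; the root
	// (flow-level) recording sees both.
	fmt.Println(len(tableReader.Recording())) // 1
	fmt.Println(len(joinReader.Recording()))  // 1
	fmt.Println(len(root.Recording()))        // 2
}
```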
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @RaduBerinde, and @tbg)
pkg/sql/distsql_running_test.go, line 270 at r5 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
cc @tbg
@yuzefovich could you add a check that the contention registry is updated properly?
Done.
pkg/sql/execstats/traceanalyzer.go, line 89 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
The question of whether we want to keep this is a good one. I think we do because it won't be long before our users start asking for a break down of higher-level stats by node. However, I do think that we could probably clean this up a bit and maybe offer an option to calculate query level stats from just the trace for performance reasons. In any case, this commit and discussion belongs in another PR.
Ok, I removed the removal commit.
pkg/sql/execstats/traceanalyzer.go, line 96 at r3 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I'm confused why this commit removes this, what does it fix?
It wasn't being used (which is a bug being fixed in another PR).
f4570e2 to f51981a
508ca68 to 5e28964
Added another commit that adds or improves stats collection on miscellaneous joiners.
5e28964 to 745b562
tbg
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @asubiotto, @RaduBerinde, @tbg, and @yuzefovich)
pkg/sql/distsql_running_test.go, line 307 at r5 (raw file):
// Soft check to protect against flakiness where an internal query
// causes the contention metric to increment.
require.GreaterOrEqual(t, metrics.ContendedQueriesCount.Count(), int64(1))
I think it would be useful to have a testing knob that intercepts the contention events handled by DistSQL (i.e. whatever increments the metric). That way, you can explicitly verify that you got what you expected all the way to the collection. Just a suggestion.
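One possible shape for such a knob, following the testing-knobs pattern used elsewhere in the codebase (all names here are hypothetical): production code checks for a non-nil callback at the point where the metric is incremented, letting a test capture every contention event that reaches DistSQL.

```go
package main

import "fmt"

// ContentionEvent is a simplified stand-in for the real event type.
type ContentionEvent struct {
	Key string
}

// TestingKnobs is a hypothetical knobs struct: if OnContentionEvent is
// set, it is invoked for every contention event observed by the
// receiver before the metric is incremented.
type TestingKnobs struct {
	OnContentionEvent func(ev ContentionEvent)
}

type receiver struct {
	knobs          *TestingKnobs
	contendedCount int64 // stand-in for the ContendedQueriesCount metric
}

func (r *receiver) handleContentionEvent(ev ContentionEvent) {
	if r.knobs != nil && r.knobs.OnContentionEvent != nil {
		r.knobs.OnContentionEvent(ev)
	}
	r.contendedCount++
}

func main() {
	var seen []ContentionEvent
	r := &receiver{knobs: &TestingKnobs{
		OnContentionEvent: func(ev ContentionEvent) { seen = append(seen, ev) },
	}}
	r.handleContentionEvent(ContentionEvent{Key: "/Table/52/1/1"})
	fmt.Println(len(seen), r.contendedCount) // 1 1
}
```

With an interception point like this, the test can assert on the exact events delivered rather than on a `GreaterOrEqual` soft check against the metric.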
FYI once #59992 is in, the contention metadata should flow without turning tracing on.
Ah, found it. There's some short-circuiting in …
asubiotto
left a comment
Reviewed 11 of 33 files at r5, 41 of 41 files at r6, 8 of 8 files at r7.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto, @RaduBerinde, @tbg, and @yuzefovich)
pkg/sql/distsql_running_test.go, line 285 at r6 (raw file):
if contention {
	// Begin a contending transaction.
	conn, err := db.Conn(ctx)
nit: I think this and the conn below need to be closed
pkg/sql/colexecbase/operator.go, line 39 at r6 (raw file):
// happen when 'workmem' setting is too low - just bail, even if we have
// disk spilling for that operator.
// TODO(yuzefovich): we probably should move ctx argument from Next into
I don't think capturing the context is a good idea (it's an anti-pattern in go), but we should probably change Init to accept a context. The pattern would be that callers should use the same context for all calls.
745b562 to 80740e2
This commit switches to using the real contention events that the KV layer adds to the trace and starts adding them into the contention registry. Previously, we were using the payload that came as part of ProducerMetadata, and the corresponding field is now removed.

This required a change in how the cumulative contention time is calculated. Now all components that report it (colbatchscan, table reader, join reader, inverted joiner) examine the trace obtained from their context to find all contention events and accumulate the total time. This works because every processor already starts a child span in `StartInternal`, and this commit adds similar logic to the colbatchscan.

Release note: None
Previously, there was no stats collection in the zigzag joiner, so this commit adds it. Additionally, it adds reporting of the number of bytes read by the join reader and the inverted joiner, as well as making all of the mentioned processors implement the `execinfra.KVReader` interface (which will allow all of them to show up nicely on the EXPLAIN ANALYZE diagrams when run via the vectorized engine).

Release note: None
80740e2 to feb1523
yuzefovich
left a comment
TFTRs!
I'll go ahead and merge this, and we can address the TODOs later.
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @RaduBerinde, and @tbg)
pkg/sql/distsql_running_test.go, line 307 at r5 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I think it would be useful to have a testing knob that intercepts the contention events handled by DistSQL (i.e. whatever increments the metric). That way, you can explicitly verify that you got what you expected all the way to the collection. Just a suggestion.
Good point, left a TODO for this since it requires a bit of plumbing.
pkg/sql/distsql_running_test.go, line 285 at r6 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
nit: I think this and the conn below need to be closed
Done.
pkg/sql/colexecbase/operator.go, line 39 at r6 (raw file):
Previously, asubiotto (Alfonso Subiotto Marqués) wrote…
I don't think capturing the context is a good idea (it's an anti-pattern in go), but we should probably change `Init` to accept a context. The pattern would be that callers should use the same context for all calls.
But what if we derive a child context in Init? Should we return it then? What does the caller then do with that context? With the parent?
This might be an anti-pattern, but to me it makes a lot more sense. When initializing an operator, you pass in a context that can be used to signal for cancellation. At the same time, the operator is free to derive a child context from it and use that throughout its life.
So I think ideally I'd like to see the following:
// We have a root context (either of the root materializer, an outbox,
// or a parallel unordered synchronizer) which can be used to shut
// everything down.
_ = ctx
...
operator.Init(ctx)
...
// Here and below the operator itself is responsible for using the appropriate context.
for operator.Next() {}
...
operator.DrainMeta()
Build failed (retrying...)
Build succeeded
asubiotto
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto, @RaduBerinde, @tbg, and @yuzefovich)
pkg/sql/colexecbase/operator.go, line 39 at r6 (raw file):
But what if we derive a child context in Init? Should we return it then? What does the caller then do with that context? With the parent?
Exactly, just like a lot of other places (e.g. setting up and running flows). The pattern would be:
...
ctx = operator.Init(ctx)
...
// Here and below the operator itself is responsible for using the appropriate context.
for operator.Next(ctx) {}
...
operator.DrainMeta(ctx)
And we can probably enforce this with a testing-level assertion where Init adds a special value to the context, and Next and DrainMeta assert that it is present in the context they are given.

sql: use real contention events instead of the mock ones
This commit switches to using the real contention events that the KV layer
adds to the trace and starts adding them into the contention registry.
Previously, we were using the payload that came as part of
ProducerMetadata, and the corresponding field is now removed.
This required a change in how the cumulative contention time is
calculated. Now all components that report it (colbatchscan, table
reader, join reader, inverted joiner) examine the trace obtained from
their context to find all contention events and accumulate the total
time. This works because every processor already starts a child span in
`StartInternal`, and this commit adds similar logic to the colbatchscan.
Fixes: #59639.
Release note: None
rowexec: improve stats collection from miscellaneous joiners
Previously, there was no stats collection in the zigzag joiner, so this
commit adds it.
Additionally, it adds reporting of the number of bytes read by the join
reader and the inverted joiner, as well as making all of the mentioned
processors implement the `execinfra.KVReader` interface (which will
allow all of them to show up nicely on the EXPLAIN ANALYZE diagrams when
run via the vectorized engine).
Release note: None