colflow: fix premature vectorized stats collection#62179
colflow: fix premature vectorized stats collection#62179yuzefovich wants to merge 1 commit intocockroachdb:masterfrom
Conversation
f18f901 to
439092f
Compare
|
I briefly looked into why |
RaduBerinde
left a comment
There was a problem hiding this comment.
(but I would wait for Alfonso's review too)
Yeah, I think there are some fixes that are necessary in the new factory, safe to ignore for now.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @asubiotto and @yuzefovich)
pkg/sql/colflow/vectorized_flow.go, line 728 at r1 (raw file):
if s.recordingStats && len(s.vectorizedStatsCollectorsQueue) > 0 { vscs := append([]vectorizedStatsCollector(nil), s.vectorizedStatsCollectorsQueue...) s.vectorizedStatsCollectorsQueue = s.vectorizedStatsCollectorsQueue[:0]
[nit] we seem to repeat this code in quite a few places, would be helpful to move to a function. I don't know what to call this operation - "reset queue"?
Previously, we could call `finishVectorizedStatsCollectors` prematurely in some edge cases (e.g. when the right side of the hash join is empty, we could collect the stats from the left input tree too early). This was caused by the fact that we accumulate the vectorized stats collectors into a queue which is handled by the outboxes or the root materializer. This commit begins sharing the responsibility of the collecting stats with the hash router too. The reasoning for this change is that when the hash router exits, all stats collectors in its input tree should have the final info, and the hash router can collect correct stats. Those stats are added as another metadata object to be returned by the last to exit router output. Release note (bug fix): Previously, CockroachDB when collecting execution statistics (e.g. when running EXPLAIN ANALYZE) could collect them prematurely which would result in incorrect stats. This is now fixed.
439092f to
3249984
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @asubiotto)
pkg/sql/colflow/vectorized_flow.go, line 728 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] we seem to repeat this code in quite a few places, would be helpful to move to a function. I don't know what to call this operation - "reset queue"?
Done.
asubiotto
left a comment
There was a problem hiding this comment.
I think you should mention in your commit message that the reason this works is that currently the hash router is the only component to split an input stream. We should probably add a regression test that would fail when a new component that splits a stream is added but does not finish its input stat collectors.
Reviewed 1 of 6 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
|
I'll update the commit message, but what kind of unit test do you have in mind? It seems to me it is not possible to write such a unit test before we actually introduce that another component that would split streams. |
|
I was thinking more along the lines of checking invariants in logic tests. |
|
I guess I see an option of introducing some special checker operator that intercept calls to |
|
Just to make sure we're on the same page: the problem is not that we don't finish some of the stats collectors, but that we do it too early. I imagine it would be easy to add a check for the former, but I don't see how we would enforce the latter. |
asubiotto
left a comment
There was a problem hiding this comment.
Right. Can't we inject a dummy stat collector that is also a MetadataSource in each input tree and simply assert that the collector is not finished before DrainMeta is called?
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @yuzefovich)
|
That's an interesting idea, I'll need to think a bit more (I'm concerned that we might be entangling stats collection and metadata collection even more, but I guess it is acceptable), but I think it would work. |
|
Closing in favor of #62255 |
Previously, we could call
finishVectorizedStatsCollectorsprematurelyin some edge cases (e.g. when the right side of the hash join is empty,
we could collect the stats from the left input tree too early). This was
caused by the fact that we accumulate the vectorized stats collectors
into a queue which is handled by the outboxes or the root materializer.
This commit begins sharing the responsibility of the collecting stats
with the hash router too. The reasoning for this change is that when the
hash router exits, all stats collectors in its input tree should have
the final info, and the hash router can collect correct stats. Those
stats are added as another metadata object to be returned by the last to
exit router output.
Fixes: #56928.
Release note (bug fix): Previously, CockroachDB when collecting
execution statistics (e.g. when running EXPLAIN ANALYZE) could collect
them prematurely which would result in incorrect stats. This is now
fixed.