Skip to content

colflow: fix premature stats collection and improve tracking#62255

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:stats-fix
Mar 22, 2021
Merged

colflow: fix premature stats collection and improve tracking#62255
craig[bot] merged 1 commit intocockroachdb:masterfrom
yuzefovich:stats-fix

Conversation

@yuzefovich
Copy link
Copy Markdown
Member

This commit achieves two goals:

  1. previously, we could call finishVectorizedStatsCollectors prematurely
    in some edge cases (e.g. when the right side of the hash join is empty,
    we could collect the stats from the left input tree too early). This was
    caused by the fact that we accumulate the vectorized stats collectors
    into a queue which is handled by the outboxes or the root materializer.
    This commit begins sharing the responsibility of the collecting stats
    with the hash router too. Only the hash router needs updating to achieve
    the fix because it is currently the only component that splits an input
    stream. The change is also sound because when the hash router exits, all
    stats collectors in its input tree should have the final info, and the
    hash router can collect correct stats. Those stats are added as another
    metadata object to be returned by the last to exit router output.

  2. vectorizedStatsCollectorsQueue on the flow creator is removed in
    favor of more precise tracking of which stats collectors belong to which
    tree. The motivation behind this changes is that the follow-up commits
    will unify the retrieval of stats and draining of the metadata sources so
    that the former always occurred right before the latter.

Also this commits renames metadataSourcesQueue to metadataSources and
changes the order of arguments in a few functions.

Fixes: #56928.

Release note (bug fix): Previously, CockroachDB when collecting
execution statistics (e.g. when running EXPLAIN ANALYZE) could collect
them prematurely which would result in incorrect stats. This is now
fixed.

This commit achieves two goals:

1. previously, we could call `finishVectorizedStatsCollectors` prematurely
in some edge cases (e.g. when the right side of the hash join is empty,
we could collect the stats from the left input tree too early). This was
caused by the fact that we accumulate the vectorized stats collectors
into a queue which is handled by the outboxes or the root materializer.
This commit begins sharing the responsibility of the collecting stats
with the hash router too. Only the hash router needs updating to achieve
the fix because it is currently the only component that splits an input
stream. The change is also sound because when the hash router exits, all
stats collectors in its input tree should have the final info, and the
hash router can collect correct stats. Those stats are added as another
metadata object to be returned by the last to exit router output.

2. `vectorizedStatsCollectorsQueue` on the flow creator is removed in
favor of more precise tracking of which stats collectors belong to which
tree. The motivation behind this changes is that the follow-up commits
will unify the retrieval of stats and draining of the metadata sources so
that the former always occurred right before the latter.

Also this commits renames `metadataSourcesQueue` to `metadataSources` and
changes the order of arguments in a few functions.

Release note (bug fix): Previously, CockroachDB when collecting
execution statistics (e.g. when running EXPLAIN ANALYZE) could collect
them prematurely which would result in incorrect stats. This is now
fixed.
@yuzefovich yuzefovich requested review from a team and asubiotto March 19, 2021 17:10
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@yuzefovich
Copy link
Copy Markdown
Member Author

yuzefovich commented Mar 19, 2021

This PR is a slightly updated version of #62179 which additionally achieves the second goal mentioned in the commit message.

To provide more context why I decided to go down this route: I began implementing Alfonso's suggestion to introduce an invariant checker that all stats are collected before DrainMeta is called and ran into multiple issues due to the fact that we drain metadata sources in more places than in which we finish the stats collectors. I started putting some hacks in place and realized that hard-to-reason-about code was getting even worse, so I started thinking about refactoring the code (prototype in #62221) in which we carefully track stats collectors, metadata sources, and closers for each input tree, with all of those components have the same "lifecycle".

Essentially in order to get the stats invariants checker working, we need to have the same behavior for the stats collectors as we added in 2d4ad44 for the metadata sources (the materializers for wrapped processors acquiring the ownership), so I think it'll be beneficial to generalize that.

To that end #62221 will attempt the following:

  • a new struct is introduced (it is essentially the same as opDAGWithMetaSources and will also replace SynchronizerInput):
type OpWithMetaInfo struct {
  Root colexecop.Operator
  StatsCollectors []colexecop.VectorizedStatsCollector
  MetadataSources execinfrapb.MetadataSources
  ToClose colexecop.Closers
}

It will track a root of the tree along with "meta" objects within that tree the ownership of which hasn't been claimed yet

  • all components that currently drain metadata sources will also be responsible for collecting stats, right before draining
  • a stats invariants checker is introduced which asserts that the stats collectors are finished before DrainMeta. This will mean that synchronizers will also be responsible for collecting stats in the input goroutines.

I think such design will lead to much cleaner and more maintainable code overall. However, the changes are pretty invasive, so all of them wouldn't be backportable, so I extracted the first step together with the fix into this PR. Please let me know your thoughts.

Copy link
Copy Markdown
Contributor

@asubiotto asubiotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 25 of 25 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/colflow/vectorized_flow.go, line 1023 at r1 (raw file):

			return err
		}
		s.vectorizedStatsCollectorsQueue = s.vectorizedStatsCollectorsQueue[:0]

Can we remove this queue?

Copy link
Copy Markdown
Member Author

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @asubiotto)


pkg/sql/colflow/vectorized_flow.go, line 1023 at r1 (raw file):

Previously, asubiotto (Alfonso Subiotto Marqués) wrote…

Can we remove this queue?

It is removed in this commit, yes. If your question is whether it is safe to do so, then I believe it is because we're extracting the tracking of the stats collectors to be more explicit, and not as a "queue".

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 22, 2021

Build failed (retrying...):

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 22, 2021

Build succeeded:

@craig craig bot merged commit 27b4ab1 into cockroachdb:master Mar 22, 2021
@yuzefovich yuzefovich deleted the stats-fix branch March 22, 2021 20:09
@rafiss rafiss added this to the 21.1 milestone Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: vectorized stats collectors can finish prematurely

4 participants