sql: introduce hash group-join operator #93483
Conversation
f8aff46 force-pushed to 377ca0f
DrewKimball
left a comment
I'm excited to see how this turns out!
Reviewed 7 of 7 files at r1, 29 of 29 files at r2, 15 of 15 files at r3, 7 of 7 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
-- commits line 109 at r4:
Could we just reuse the same setting? Since this operator is only planned when a hash aggregate would previously have been planned, anyway.
pkg/sql/distsql_physical_planner.go line 1990 at r2 (raw file):
}
if len(groupCols) == len(orderedGroupCols) {
	// We will plan streaming aggregation.
[nit] We may want a TODO here to consider whether an ordered aggregation is always better when this is supported in the optimizer. Consider a case where the join produces a huge result set that gets grouped down to very few rows - it might be better to avoid materializing the join result, especially in a distributed setting.
pkg/sql/colexec/hash_group_joiner.go line 113 at r4 (raw file):
	return h.hj.ExportBuffered(input)
}
if !h.hjLeftSource.zeroBatchEnqueued {
[nit] not related to this PR, but should we make this check part of the spilling queue interface for convenience?
pkg/sql/colexec/colbuilder/execplan.go line 1360 at r4 (raw file):
// TODO(yuzefovich): think through whether the hash
// group-join needs to maintain the ordering.
execinfrapb.Ordering{}, /* outputOrdering */
Will order-sensitive aggregate functions run into problems here?
pkg/sql/colexec/colexecjoin/crossjoiner.go line 290 at r3 (raw file):
func (c *crossJoiner) Reset(ctx context.Context) {
	if r, ok := c.InputOne.(colexecop.Resetter); ok {
[nit] would it be more convenient to add this logic to the TwoInputInitHelper?
This commit introduces a specification of a hash group-join processor, which combines two operations (a hash join followed by a hash aggregation) into one when the join's equality columns are exactly the same as the aggregation's grouping columns. The hash join cannot have an ON expression, and the PostProcessSpec on top of the hash join is only allowed to have a projection set (i.e. renders, limits, and offsets are prohibited). Currently, there are some additional limitations (e.g. only inner and outer joins are allowed), but these will be lifted in the future. Similarly, support for cross and merge joins as the first part of this "composite" processor will be added later.

Furthermore, there is no optimizer support (neither for costing nor for exec building), so the plan tree is also oblivious of the hash group-join. In particular, the regular `EXPLAIN` now does not represent reality, and `EXPLAIN ANALYZE` now duplicates the same merged stats across two nodes in the tree. This work can (mostly) wait since converting a hash join followed by a hash aggregation into a hash group-join is always beneficial, and the feature is currently experimental and not documented.

The main contribution of this commit is in the physical planning. The planning code for the aggregators has been taught to replace the last stage of the hash joiners with the hash group-join stage. Effectively, we're stitching two processor specs into one while preserving the same distribution. This planning is hidden behind an experimental session variable.

Additionally, this commit introduces naive row-by-row execution support which simply uses the existing processors, with only minor adjustments for `EXPLAIN (VEC)` output and stats collection. Support in the vectorized engine will be added in a separate commit.

Epic: None

Release note: None
This commit extracts the `joinHelper` from the `colexecjoin` package into the `colexecop` package as a two-input init/reset helper. The new struct will be used by the upcoming hash group joiner.

Additionally, this commit extracts a couple of structs that hold arguments to the constructors of the hash joiner and the hash aggregator. It also extracts a couple of helper functions for the vectorized planning code that constructs these two operators. These changes make it easier to implement the naive hash group-join in the follow-up commit.

This commit is effectively a noop, with the only "meaningful" change being that we no longer copy the input types when constructing the join output types, because the function constructing the output types already makes a copy.

Epic: None

Release note: None
377ca0f force-pushed to 46421e8
yuzefovich
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)
Previously, DrewKimball (Drew Kimball) wrote…
Could we just reuse the same setting? Since this operator is only planned when a hash aggregate would previously have been planned, anyway.
Yes, I plan to reuse the same setting in the future, but for now it seems ok not to have the ability to disable the tracking of input tuples from the left input. Once the optimized implementation is fleshed out, we'll add the disabling mechanism - right now it seemed mostly like a waste given that it would have to be effectively rewritten for the optimized implementation anyway.
However, I think that we'll implement "streaming" merge group-join for this use case in the future.
pkg/sql/distsql_physical_planner.go line 1990 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] We may want a TODO here to consider whether an ordered aggregation is always better when this is supported in the optimizer. Consider a case where the join produces a huge result set that gets grouped down to very few rows - it might be better to avoid materializing the join result, especially in a distributed setting.
Hm, I believe the hash join is not expected to provide any ordering, so I don't think it is actually possible for us to plan the ordered aggregation on top of the hash join. I expanded the comment.
pkg/sql/colexec/hash_group_joiner.go line 113 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] not related to this PR, but should we make this check part of the spilling queue interface for convenience?
The colcontainer.Queue.Enqueue interface requires that the zero batch always be added at the end, and I think that's a fair requirement (so that the queue knows when to flush to disk). However, SpillingQueue is a utility wrapper that could easily do this itself, so I left a TODO.
pkg/sql/colexec/colbuilder/execplan.go line 1360 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Will order-sensitive aggregate functions run into problems here?
Are you thinking about aggregate functions that support the ORDER BY clause (i.e. something like array_agg(c1 ORDER BY c2))? That probably shouldn't be affected given that the input is the hash join with arbitrary order (plus such functions are handled internally as window functions).
When I left this comment, I was concerned about a bug that was fixed in #63372 where we had an issue where the external hash aggregator didn't maintain the ordering on the prefix of grouping columns that was already ordered by the input. I believe there shouldn't be a correctness issue on top of the hash join but wanted to remind myself to re-evaluate this later.
pkg/sql/colexec/colexecjoin/crossjoiner.go line 290 at r3 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] would it be more convenient to add this logic to the TwoInputInitHelper?
Done. I also removed now-redundant references to the input operators in the merge join.
This commit introduces the naive hash group-join implementation to the vectorized engine, which simply uses the existing hash joiner and hash aggregator directly (similar to what we have in the row-by-row engine). However, unlike in the row engine, a follow-up commit will introduce an optimized implementation of the vectorized hash group joiner.

The main contribution of this commit is support for disk spilling. The memory limit can be exceeded either during the "join" phase (i.e. building the hash table on the right side) or during the "aggregation" phase (i.e. while allocating a new grouping bucket), which means that the disk spilling infrastructure needs to be taught to handle the fallback in several cases. In particular, we need to correctly "export buffered" tuples from both inputs.

Exporting from the right side is easy - the hash joiner buffers all of the tuples from the right into the hash table and can easily export them on demand, so we just delegate the export. However, on the left side things are a bit more difficult. If the memory limit is reached during the "aggregation" phase, then we must have already read some tuples from the left input. The hash aggregator currently cannot spill its intermediate state and doesn't store all tuples in its hash table, so it handles this via a spilling queue which "tracks" all input tuples. We apply similar logic to the hash group-join operation - all tuples from the left input are put into the spilling queue before they are pushed into the hash join. Thus, when exporting from the left input, we can simply dequeue from the spilling queue. In the future, I plan to allow disabling this copying behavior via a cluster setting (similar to what we have for hash aggregation), but for now it is mandatory.

This commit also sets up some sanity unit tests. I additionally ran the CI with the experimental setting enabled by default, and there were no failures.

Epic: None

Release note: None
46421e8 force-pushed to 3fbab32
yuzefovich
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)
pkg/sql/colexec/hash_group_joiner.go line 108 at r7 (raw file):
// perform the export.
func (h *hashGroupJoiner) ExportBuffered(input colexecop.Operator) coldata.Batch {
	if h.ha.ht != nil {
Added minor improvement here to be in sync with hashAggregator.ExportBuffered.
DrewKimball
left a comment
Reviewed 34 of 34 files at r5, 32 of 32 files at r6, 13 of 13 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/colexec/colbuilder/execplan.go line 1360 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Are you thinking about aggregate functions that support the ORDER BY clause (i.e. something like array_agg(c1 ORDER BY c2))? That probably shouldn't be affected given that the input is the hash join with arbitrary order (plus such functions are handled internally as window functions).

When I left this comment, I was concerned about a bug that was fixed in #63372 where the external hash aggregator didn't maintain the ordering on the prefix of grouping columns that was already ordered by the input. I believe there shouldn't be a correctness issue on top of the hash join but wanted to remind myself to re-evaluate this later.
I think you're right about that, good point.
TFTR! bors r+
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
This PR introduces the hash group-join operator to the execution engines: it combines a hash join followed by a hash aggregation into a single operator when the join's equality columns are the same as the aggregation's grouping columns. The optimizer is currently unaware of this new operator; the changes are plumbed only through DistSQL physical planning. Naive implementations (which simply use a hash joiner followed by a hash aggregator) are introduced in both engines, with proper disk spilling. The usage of this new operator is gated behind an experimental session variable.
See each commit for details.
Addresses: #38307.
Epic: None