colexec: fix hash aggregator when spilling to disk #63372
craig[bot] merged 3 commits into cockroachdb:master on Apr 9, 2021
rytaft approved these changes on Apr 9, 2021
Reviewed 16 of 16 files at r1, 3 of 3 files at r2, 11 of 11 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
This commit introduces nicer aliases for the specification of the aggregate functions and uses the aliases throughout the code base. Release note: None
This commit is only a test change. It cleans up the aggregator test cases in the following ways:
- removing some of the defaults in favor of explicit settings (easier to read each test case in isolation)
- reordering the fields to have a uniform assignment order
- inserting any_not_null aggregates for the cases when the input is ordered (this will be needed by the follow-up commit that will enforce a particular order on the output). This change simulates how specs are created in production.
- removing a couple of test cases that are impossible in production (when some columns are unused).
Release note: None
In some cases the aggregation is expected to maintain the required ordering in order to eliminate an explicit sort afterwards. It is always the case that the required ordering is a prefix of ordered grouping columns. With the introduction of disk spilling for the vectorized hash aggregator in the 21.1 release, the ordering was no longer maintained when spilling occurred. In all previous cases (row-by-row processors and the in-memory columnar operator) the ordering was maintained by construction, but with `hashBasedPartitioner` the ordering can be arbitrary.
In order to fix this issue we now do what we did for the external distinct: we plan an external sort on top of the external hash aggregator to restore the required ordering. Note that this will only kick in if spilling to disk occurred. This required changes to the AggregatorSpec to propagate the required output ordering.
Release note (bug fix): In 21.1 alpha and beta releases CockroachDB could return the output in an incorrect order if the query containing hash aggregation was executed via the vectorized engine and spilling to temporary storage was required, in some cases.
Thanks for a quick review!
bors r+
This PR was included in a batch that was canceled; it will be automatically retried
Build succeeded:
The third commit is responsible for a noticeable regression in a micro-benchmark:
execinfrapb: introduce aliases for agg funcs and use everywhere
This commit introduces nicer aliases for the specification of the
aggregate functions and uses the aliases throughout the code base.
Release note: None
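As a rough illustration of the aliasing pattern this commit describes, the sketch below re-exports long generated-style constant names under shorter ones. Everything in it is an assumption made for illustration: the type `AggregatorSpecFunc`, the constant names, and their numeric values are hypothetical stand-ins and are not taken from the actual `execinfrapb` package.

```go
package main

import "fmt"

// AggregatorSpecFunc is a hypothetical stand-in for a protobuf-generated
// enum type. The names and values below are illustrative only.
type AggregatorSpecFunc int32

// Long names in the style of protobuf-generated Go constants.
const (
	AggregatorSpec_ANY_NOT_NULL AggregatorSpecFunc = 0
	AggregatorSpec_SUM          AggregatorSpecFunc = 1
	AggregatorSpec_COUNT_ROWS   AggregatorSpecFunc = 2
)

// Shorter aliases (hypothetical). Each alias is a typed constant with the
// same type and value as the constant it refers to.
const (
	AnyNotNull = AggregatorSpec_ANY_NOT_NULL
	Sum        = AggregatorSpec_SUM
	CountRows  = AggregatorSpec_COUNT_ROWS
)

// String returns a readable name for the aggregate function.
func (f AggregatorSpecFunc) String() string {
	switch f {
	case AggregatorSpec_ANY_NOT_NULL:
		return "ANY_NOT_NULL"
	case AggregatorSpec_SUM:
		return "SUM"
	case AggregatorSpec_COUNT_ROWS:
		return "COUNT_ROWS"
	default:
		return "UNKNOWN"
	}
}

func main() {
	// Call sites can use the short aliases instead of the long names.
	funcs := []AggregatorSpecFunc{AnyNotNull, Sum, CountRows}
	fmt.Println(funcs)
}
```

Because the aliases are plain constants, existing code that uses the long names keeps compiling, and the two spellings compare equal.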
colexec: clean up aggregator test cases
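This commit is only a test change. It cleans up the aggregator test
cases in the following ways:
- removing some of the defaults in favor of explicit settings (easier to
read each test case in isolation)
- reordering the fields to have a uniform assignment order
- inserting any_not_null aggregates for the cases when the input is
ordered (this will be needed by the follow-up commit that will enforce
a particular order on the output). This change simulates how specs are
created in production.
- removing a couple of test cases that are impossible in production
(when some columns are unused).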
Release note: None
colexec: fix hash aggregator when spilling to disk
In some cases the aggregation is expected to maintain the required
ordering in order to eliminate an explicit sort afterwards. It is always
the case that the required ordering is a prefix of ordered grouping
columns. With the introduction of disk spilling for the vectorized hash
aggregator in the 21.1 release, the ordering was no longer maintained
when spilling occurred. In all previous cases (row-by-row processors and
the in-memory columnar operator) the ordering was maintained by
construction, but with `hashBasedPartitioner` the ordering can be
arbitrary.
In order to fix this issue we now do what we did for the external
distinct - we plan an external sort on top of the external hash
aggregator to restore the required ordering. Note that this will only
kick in if the spilling to disk occurred. This required changes to the
AggregatorSpec to propagate the required output ordering.
Fixes: #63159.
Release note (bug fix): In 21.1 alpha and beta releases CockroachDB
could return the output in an incorrect order if the query containing
hash aggregation was executed via the vectorized engine and spilling to
temporary storage was required, in some cases.
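A minimal, self-contained sketch of the problem and the fix described in the third commit, not the actual colexec code. It assumes a toy model in which "spilling" means splitting rows into partitions by a hash of the grouping column and aggregating each partition separately. The types `row` and `aggRow` and the functions `partitionedHashAgg`, `restoreOrdering`, and `isOrdered` are hypothetical names introduced here; an in-memory sort stands in for the external sort.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// row is a hypothetical input tuple: group is the grouping column and val
// is the value being summed.
type row struct {
	group string
	val   int
}

// aggRow is one output row of the aggregation.
type aggRow struct {
	group string
	sum   int
}

// partitionedHashAgg mimics a hash aggregation that has spilled: rows are
// split into partitions by a hash of the grouping column, and each
// partition is aggregated on its own. The output follows partition order,
// so an ordering present in the input is not guaranteed to survive.
func partitionedHashAgg(input []row, numPartitions int) []aggRow {
	partitions := make([][]row, numPartitions)
	for _, r := range input {
		h := fnv.New32a()
		h.Write([]byte(r.group))
		p := int(h.Sum32() % uint32(numPartitions))
		partitions[p] = append(partitions[p], r)
	}
	var out []aggRow
	for _, part := range partitions {
		idx := make(map[string]int) // group -> index into out
		for _, r := range part {
			i, ok := idx[r.group]
			if !ok {
				i = len(out)
				idx[r.group] = i
				out = append(out, aggRow{group: r.group})
			}
			out[i].sum += r.val
		}
	}
	return out
}

// restoreOrdering stands in for the sort planned on top of the aggregator:
// it re-establishes the required ordering on the grouping column.
func restoreOrdering(rows []aggRow) {
	sort.SliceStable(rows, func(i, j int) bool { return rows[i].group < rows[j].group })
}

// isOrdered reports whether rows are sorted by the grouping column.
func isOrdered(rows []aggRow) bool {
	return sort.SliceIsSorted(rows, func(i, j int) bool { return rows[i].group < rows[j].group })
}

func main() {
	// The input is already ordered on the grouping column.
	input := []row{{"a", 1}, {"a", 2}, {"b", 3}, {"c", 4}, {"c", 5}, {"d", 6}}

	out := partitionedHashAgg(input, 3)
	fmt.Println("after partitioned aggregation:", out, "ordered:", isOrdered(out))

	restoreOrdering(out)
	fmt.Println("after sort:", out, "ordered:", isOrdered(out))
}
```

Whether the first line reports an ordered result depends on how the hash happens to distribute the groups, which is the point: the ordering is arbitrary until the sort runs. In the change described above, the sort only does work when spilling to disk has occurred.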