physicalplan: add support for multi-stage execution of aggregate functions #58347
Description
Most aggregate functions support multi-stage distributed execution: every node performs a local aggregation, the intermediate results are routed between nodes according to the grouping columns (so that all partial results for a given group end up on the same node), and a final aggregation produces the result. For some aggregate functions the decomposition into local and final stages is trivial, e.g. to calculate sum we perform a local sum and then a final sum over the partial sums. For other functions it is a bit more involved, e.g. to compute avg we calculate a partial sum and count locally, combine them into final_sum and final_count at the final stage, and render the result as final_sum / final_count.
(I believe Postgres calls the multi-stage execution a partial mode.)
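To make the avg decomposition described above concrete, here is a minimal sketch in Go. The names `partialAvg`, `localAvg`, and `finalAvg` are hypothetical and exist only for illustration; this is not how the decomposition is expressed in `DistAggregationTable`, just the arithmetic behind it.

```go
package main

import "fmt"

// partialAvg is the intermediate result a node emits in the local stage.
// Hypothetical type for illustration only.
type partialAvg struct {
	sum   float64
	count int64
}

// localAvg runs on each node over the rows that node holds.
func localAvg(rows []float64) partialAvg {
	var p partialAvg
	for _, v := range rows {
		p.sum += v
		p.count++
	}
	return p
}

// finalAvg merges the partial results and renders final_sum / final_count.
// The second return value is false when no rows were seen at all.
func finalAvg(partials []partialAvg) (float64, bool) {
	var sum float64
	var count int64
	for _, p := range partials {
		sum += p.sum
		count += p.count
	}
	if count == 0 {
		return 0, false
	}
	return sum / float64(count), true
}

func main() {
	node1 := localAvg([]float64{1, 2, 3})
	node2 := localAvg([]float64{4, 5})
	avg, ok := finalAvg([]partialAvg{node1, node2})
	fmt.Println(avg, ok) // 3 true
}
```

Note that averaging the per-node averages (2 and 4.5 here) would give the wrong answer, which is why the local stage has to ship both sum and count.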
The decomposition of supported functions is defined here:
`var DistAggregationTable = map[execinfrapb.AggregatorSpec_Func]DistAggregationInfo{`
We are currently missing the following functions:
- sqrdiff (I think it should be possible to decompose it)
- corr
- var_pop
- stddev_pop
- covar_pop
- covar_samp
- regr_intercept
- regr_r2
- regr_slope
- regr_sxx
- regr_syy
- regr_sxy
- regr_count
- regr_avgx
- regr_avgy
avg and variance are good examples of how more complicated decompositions are handled.
cc @mneverov in case you're interested
Jira issue: CRDB-3407