Most aggregate functions support multi-stage distributed execution in which on every node we perform a local aggregation, then we route all the intermediate results to all of the nodes (according to the grouping columns), and then we perform a final aggregation that gives us the result. For some aggregate functions the decomposition into local and final stages is trivial - e.g. in order to calculate sum we need to perform local sum and then the final sum; for other functions it is a bit more involved - e.g. in order to compute avg we calculate partial sum and count locally, then we finalize sum and count at the final stage and we render the result as final_sum / final_count.
(I believe Postgres calls the multi-stage execution a partial mode.)
The decomposition of supported functions is defined here
|
var DistAggregationTable = map[execinfrapb.AggregatorSpec_Func]DistAggregationInfo{ |
We are currently missing the following functions:
avg and variance are good examples of how more complicated decompositions are handled.
cc @mneverov in case you're interested
Jira issue: CRDB-3407
Most aggregate functions support multi-stage distributed execution in which on every node we perform a local aggregation, then we route all the intermediate results to all of the nodes (according to the grouping columns), and then we perform a final aggregation that gives us the result. For some aggregate functions the decomposition into local and final stages is trivial - e.g. in order to calculate
sumwe need to perform localsumand then the finalsum; for other functions it is a bit more involved - e.g. in order to computeavgwe calculate partialsumandcountlocally, then we finalizesumandcountat the final stage and we render the result asfinal_sum / final_count.(I believe Postgres calls the multi-stage execution a partial mode.)
The decomposition of supported functions is defined here
cockroach/pkg/sql/physicalplan/aggregator_funcs.go
Line 87 in 9f678e7
We are currently missing the following functions:
sqrdiff(I think it should be possible to decompose it)corrvar_popstddev_popcovar_popcovar_sampregr_interceptregr_r2regr_sloperegr_sxxregr_syyregr_sxyregr_countregr_avgxregr_avgyavgandvarianceare good examples of how more complicated decompositions are handled.cc @mneverov in case you're interested
Jira issue: CRDB-3407