Skip to content

physicalplan: add support for multi-stage execution of aggregate functions #58347

@yuzefovich

Description

@yuzefovich

Most aggregate functions support multi-stage distributed execution in which on every node we perform a local aggregation, then we route all the intermediate results to all of the nodes (according to the grouping columns), and then we perform a final aggregation that gives us the result. For some aggregate functions the decomposition into local and final stages is trivial - e.g. in order to calculate sum we need to perform local sum and then the final sum; for other functions it is a bit more involved - e.g. in order to compute avg we calculate partial sum and count locally, then we finalize sum and count at the final stage and we render the result as final_sum / final_count.

(I believe Postgres calls the multi-stage execution a partial mode.)

The decomposition of supported functions is defined here

var DistAggregationTable = map[execinfrapb.AggregatorSpec_Func]DistAggregationInfo{

We are currently missing the following functions:

  • sqrdiff (I think it should be possible to decompose it)
  • corr
  • var_pop
  • stddev_pop
  • covar_pop
  • covar_samp
  • regr_intercept
  • regr_r2
  • regr_slope
  • regr_sxx
  • regr_syy
  • regr_sxy
  • regr_count
  • regr_avgx
  • regr_avgy

avg and variance are good examples of how more complicated decompositions are handled.

cc @mneverov in case you're interested

Jira issue: CRDB-3407

Metadata

Metadata

Assignees

No one assigned

    Labels

    C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)T-sql-queriesSQL Queries Teamgood first issuemeta-issueContains a list of several other issues.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions