Implement shuffle-based-groupby for simpler aggregations #9487

@ian-r-rose

Description

In #9406 @rjzamora did some nice work implementing a shuffle-based groupby/agg operation for dask dataframe. However, it is currently only implemented for `groupby(...).agg({...})`, and some of the simpler aggregation functions which might benefit from it are unimplemented. This includes `groupby(...).min`, `groupby(...).sum`, and similar:

dask/dask/dataframe/groupby.py

Lines 1239 to 1250 in 024df34

def _aca_agg(
    self,
    token,
    func,
    aggfunc=None,
    meta=None,
    split_every=None,
    split_out=1,
    chunk_kwargs=None,
    aggregate_kwargs=None,
):
    if aggfunc is None:
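For context, `_aca_agg` builds an apply-concat-apply (tree-reduction) graph: each partition is aggregated, then the per-partition results are combined in rounds of `split_every` until a single output remains. Here is a rough, illustrative pure-Python sketch of that reduction pattern for `min` (dask builds a lazy task graph rather than looping eagerly; `combine_min` and `tree_reduce_min` are hypothetical names, not dask API):

```python
def combine_min(dicts):
    # Combine several per-partition {key: min} results into one.
    out = {}
    for d in dicts:
        for k, v in d.items():
            out[k] = min(v, out[k]) if k in out else v
    return out

def tree_reduce_min(parts, split_every=2):
    # Repeatedly combine groups of `split_every` partial results
    # until only a single result dict remains.
    while len(parts) > 1:
        parts = [
            combine_min(parts[i : i + split_every])
            for i in range(0, len(parts), split_every)
        ]
    return parts[0]

# Per-partition minima, as the chunk step might produce them.
parts = [{"a": 3, "b": 5}, {"a": 1}, {"b": 2, "c": 9}]
reduced = tree_reduce_min(parts)
```

Note that this tree reduction always produces a single output partition, which is exactly why large-cardinality groupbys want the shuffle-based path with `split_out > 1` instead.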

So, the following snippet would use a shuffle-based approach:

import dask.datasets

ddf = dask.datasets.timeseries()
ddf.groupby("id").agg({"x": "mean"}, split_out=2)

While the following, semantically identical, snippet would not:

import dask.datasets

ddf = dask.datasets.timeseries()
ddf.groupby("id").x.mean(split_out=2)

It should be relatively straightforward to retrofit the simpler aggregation functions to also take advantage of the new shuffle-based approach where appropriate.
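The shuffle-based approach avoids the tree reduction entirely: rows are first routed to one of `split_out` partitions by hashing the groupby key, so every group lands in exactly one output partition, and each partition can then be aggregated independently. A hypothetical pure-Python sketch of that idea (`shuffle_groupby_min` and the toy records are illustrative only, not dask API):

```python
from collections import defaultdict

def shuffle_groupby_min(records, key, value, split_out=2):
    # Shuffle step: route each record to a partition by hashing its
    # group key, so all rows for a given key land in exactly one
    # output partition.
    partitions = [defaultdict(list) for _ in range(split_out)]
    for rec in records:
        p = hash(rec[key]) % split_out
        partitions[p][rec[key]].append(rec[value])
    # Aggregation step: each partition is reduced independently; no
    # cross-partition combine is needed, and the output naturally has
    # `split_out` partitions.
    return [{k: min(vs) for k, vs in part.items()} for part in partitions]

records = [
    {"id": "a", "x": 3},
    {"id": "b", "x": 5},
    {"id": "a", "x": 1},
    {"id": "b", "x": 7},
]
result = shuffle_groupby_min(records, "id", "x", split_out=2)
```

Each key appears in exactly one of the returned partitions, which is what makes `split_out > 1` cheap here compared to the tree-reduction path.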
