
[DOC] Add docstring for split_out and split_every in dask groupby-aggregate API #6386

@VibhuJawa

Description


I think it would be helpful to add docstrings for `split_out` and `split_every` in the dask groupby-aggregate API.

We could add something like this for `split_out`:

`split_out`: Number of output partitions for group-by-like aggregations (defaults to 1).
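To make the proposed `split_out` wording concrete, here is a toy pure-Python model (not dask's actual implementation) of a group-by sum whose result is spread over `split_out` output partitions. The helper name `groupby_sum` and the CRC32-based routing of keys are assumptions for illustration only. What matters is that every group lands in exactly one of the `split_out` outputs, so with the default of 1 the whole result sits in a single partition.

```python
import zlib
from collections import defaultdict


def groupby_sum(rows, split_out=1):
    """Toy group-by sum whose result is spread over `split_out` partitions.

    Hypothetical helper for illustration; dask's real code path differs.
    """
    if split_out < 1:
        raise ValueError("split_out must be >= 1")
    outputs = [defaultdict(int) for _ in range(split_out)]
    for key, value in rows:
        # Route each group by a stable hash of its key, so all rows of one
        # group end up in the same output partition.
        idx = zlib.crc32(str(key).encode("utf-8")) % split_out
        outputs[idx][key] += value
    return [dict(part) for part in outputs]


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parts = groupby_sum(rows, split_out=2)
# Concatenating the partitions recovers the full group-by result.
merged = {k: v for part in parts for k, v in part.items()}
print(len(parts), merged)
```

In dask itself the parameter is passed as a keyword argument, e.g. `ddf.groupby("key").agg({"x": "sum"}, split_out=4)`, assuming a dask version whose groupby `agg` accepts it.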

And use the following for `split_every`:

split_every: int >= 2 or dict(axis: int), optional
Determines the depth of the recursive aggregation. If set to a value greater
than or equal to the number of input chunks, the aggregation will be performed
in two steps: one ``chunk`` function per input chunk and a single
``aggregate`` function at the end. If set to less than that, an
intermediate ``combine`` function will be used, so that any one
``combine`` or ``aggregate`` function has no more than ``split_every``
inputs. The depth of the aggregation graph will be
:math:`\lceil \log_{\mathrm{split\_every}}(\text{input chunks along reduced axes}) \rceil`.
Setting it to a low value can reduce cache size and network transfers, at the
cost of more CPU and a larger dask graph.
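The recursive aggregation described above is a tree reduction. The sketch below is again a toy model with a hypothetical `tree_reduce` helper, not dask's graph-building code. It merges per-chunk group counts at most `split_every` at a time and reports the depth and the number of `combine`/`aggregate` tasks, which shows why a low `split_every` gives a deeper, larger graph.

```python
from collections import Counter


def tree_reduce(partials, split_every, combine):
    """Toy tree reduction returning (result, depth, n_tasks).

    `partials` are the outputs of the per-chunk "chunk" step. Each level
    feeds at most `split_every` inputs into one `combine` call; the last
    level is the single "aggregate" call. Hypothetical helper.
    """
    if split_every < 2:
        raise ValueError("split_every must be >= 2")
    if not partials:
        raise ValueError("need at least one input chunk")
    level = list(partials)
    depth = 0
    n_tasks = 0
    while True:
        # One reduction level: combine consecutive batches of <= split_every inputs.
        level = [combine(level[i:i + split_every])
                 for i in range(0, len(level), split_every)]
        depth += 1
        n_tasks += len(level)
        if len(level) == 1:
            return level[0], depth, n_tasks


def merge_counts(parts):
    # combine/aggregate step for a group-by count: add per-group counts.
    return sum(parts, Counter())


chunks = [["a", "b"], ["a"], ["c", "a"], ["b"], ["a", "c"]]
partials = [Counter(chunk) for chunk in chunks]  # "chunk" step: per-chunk counts

result, depth, n_tasks = tree_reduce(partials, split_every=2, combine=merge_counts)
print(dict(result), depth, n_tasks)
```

With 5 chunks, `split_every=2` needs 3 levels (5 → 3 → 2 → 1), matching the rounded-up base-2 logarithm of 5, while `split_every=8` collapses to the two-step case with one `aggregate` task.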

Happy to open a PR if the `split_every` docstring above is correct for group-by, and if we agree it's fine to add both docstrings wherever `split_every`/`split_out` are accepted.

Metadata


Assignees: No one assigned

Labels: dataframe, documentation ("Improve or add to documentation"), good first issue ("Clearly described and easy to accomplish. Good for beginners to the project.")


Development: No branches or pull requests
