Closed
Labels: dataframe, documentation (Improve or add to documentation), good first issue (Clearly described and easy to accomplish. Good for beginners to the project.)
Description
I think it would be helpful to add docstrings for `split_out` and `split_every` in the Dask groupby-aggregate API.
We can probably add something like:
`split_out`: Number of output results in group-by-like aggregations (defaults to 1)
And use the text below for `split_every`:
Lines 82 to 93 in 9968a49
    split_every: int >= 2 or dict(axis: int), optional
        Determines the depth of the recursive aggregation. If set to or more
        than the number of input chunks, the aggregation will be performed in
        two steps, one ``chunk`` function per input chunk and a single
        ``aggregate`` function at the end. If set to less than that, an
        intermediate ``combine`` function will be used, so that any one
        ``combine`` or ``aggregate`` function has no more than ``split_every``
        inputs. The depth of the aggregation graph will be
        :math:`log_{split_every}(input chunks along reduced axes)`. Setting to
        a low value can reduce cache size and network transfers, at the cost of
        more CPU and a larger dask graph.
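To make the quoted description concrete, here is a minimal pure-Python sketch of a tree reduction governed by `split_every`. It is only an illustration of the behavior the docstring describes, not Dask's actual implementation; the `tree_reduce` helper and its return value are hypothetical names introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple


def tree_reduce(
    chunks: Sequence[List[int]],
    chunk: Callable[[List[int]], int],
    combine: Callable[[List[int]], int],
    aggregate: Callable[[List[int]], int],
    split_every: int,
) -> Tuple[int, int]:
    """Hypothetical helper: reduce `chunks` so that no step sees more than
    `split_every` inputs. Returns (result, number_of_combine_rounds)."""
    if split_every < 2:
        raise ValueError("split_every must be >= 2")

    # Step 1: one `chunk` call per input chunk.
    partials = [chunk(c) for c in chunks]

    # Step 2: intermediate `combine` rounds, only needed while there are
    # more partial results than a single `aggregate` call may take.
    combine_rounds = 0
    while len(partials) > split_every:
        partials = [
            combine(partials[i : i + split_every])
            for i in range(0, len(partials), split_every)
        ]
        combine_rounds += 1

    # Step 3: a single final `aggregate` over at most `split_every` inputs.
    return aggregate(partials), combine_rounds


# 16 input chunks of two integers each.
chunks = [[i, i + 1] for i in range(16)]

for split_every in (16, 4, 2):
    total, rounds = tree_reduce(chunks, sum, sum, sum, split_every)
    print(f"split_every={split_every}: total={total}, combine rounds={rounds}")
```

With 16 chunks, `split_every=16` needs no `combine` round (the two-step case), while smaller values add rounds, so the number of reduction levels (combine rounds plus the final aggregate) grows roughly as the logarithm of the chunk count in base `split_every`.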
Happy to do a PR if the `split_every` docstring above is correct for group-by and we feel it's fine to add both wherever `split_every`/`split_out` is present.