Labels: core, discussion, needs attention
Description
The split_every parameter is currently inconsistent across the various dask modules:
dask.array
- domain: None or Integral ≥ 2 or mapping {axis: (Integral ≥ 2)}
- If None, fall back to dask.config.get("split_every").
- The key appears neither in dask/dask.yaml nor in dask/dask-schema.yaml.
- If the key does not appear in the dask config, fall back to 4 (hardcoded).
- False is interpreted the same as None.
- float (e.g. 1e3) and np.float64 are rejected, inconsistently with the shape and chunks parameters.
- The docstring in dask.array.reductions.reduction is very clear about what split_every does - except that it appears nowhere in the rendered documentation; https://docs.dask.org/en/latest/array-api.html almost never mentions split_every due to the docs being auto-generated from numpy.
- The same docstring says "Omit to let dask heuristically decide a good default". This is incorrect; there is no heuristic, just a hard default.
dask.dataframe
- domain: None or False (which means no recursion) or int ≥ 2.
- If None, fall back to 8 (hardcoded). The dask config is ignored.
- float and np.float64 are rejected. npartitions accepts them, but chunksize doesn't.
dask.bag
Same as dask.dataframe, except that npartitions does not accept float / np.float64.
dask.graph_manipulation
(new in #7282)
Same as dask.dataframe, except that float and np.float64 are accepted by split_every and rounded down to the nearest int.
Proposed design
- dask.array to interpret False as no recursion, consistently with the other modules
- all modules to read the default from dask config, which will be set in dask/dask.yaml as either 4 or 8 (please discuss)
- no hardcoded defaults (consistently with the design of dask.optimize)
- dask/dask-schema.yaml to define the domain as False or int/float ≥ 2. Floats will be rounded down to the nearest int. All modules to accept as function parameter None, False, or Number ≥ 2. Additionally, dask.array will accept, exclusively as a function parameter, {axis: (Number ≥ 2)}.
- review sphinx documentation
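The proposed domain could be enforced by a single shared helper along the following lines. This is only a sketch under stated assumptions: `normalize_split_every`, `_to_int`, and the `allow_mapping` flag are hypothetical names, and `config_default` stands for whatever value the dask config would supply.

```python
import math
from numbers import Real


def _to_int(value):
    """Validate a single Number >= 2 and round it down to an int."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(f"split_every must be a number >= 2, got {value!r}")
    if value < 2:
        raise ValueError(f"split_every must be >= 2, got {value!r}")
    return math.floor(value)


def normalize_split_every(value, config_default, allow_mapping=False):
    """Hypothetical helper for the proposed design.

    Returns False (no recursion), an int >= 2, or, when allow_mapping is
    True (intended for dask.array only), a dict {axis: int >= 2}.
    """
    if value is None:
        # The default always comes from the config, never from a literal.
        value = config_default
    if value is False:
        return False
    if isinstance(value, dict):
        if not allow_mapping:
            raise TypeError("a mapping is only accepted by dask.array")
        return {axis: _to_int(v) for axis, v in value.items()}
    return _to_int(value)
```

With such a helper, every module would share one definition of the domain, and only dask.array would pass `allow_mapping=True`.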
Alternate design (not recommended)
- deprecate the top-level split_every key in dask config
- new config keys array.split_every, dataframe.split_every, bag.split_every and graph_manipulation.split_every which reflect the current mismatch in defaults and domain
- no hardcoded defaults
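For comparison, the alternate design might look roughly like this. The sketch uses a flat dict as a stand-in for the dask config. `SHIPPED_DEFAULTS` and `get_split_every` are hypothetical names, and the values simply mirror the per-module defaults described earlier in this issue. The proposal does not say whether the deprecated top-level key should still be honoured, so the sketch only warns about it.

```python
import warnings

# Assumed shipped config values, mirroring the current per-module defaults:
# 4 for dask.array, 8 for the other modules.
SHIPPED_DEFAULTS = {
    "array.split_every": 4,
    "dataframe.split_every": 8,
    "bag.split_every": 8,
    "graph_manipulation.split_every": 8,
}


def get_split_every(module, config):
    """Look up the per-module key in a flat config mapping (stand-in)."""
    if "split_every" in config:
        # Only warn; what to do with the old value is left open here.
        warnings.warn(
            f"the top-level 'split_every' key is deprecated; "
            f"use '{module}.split_every' instead",
            FutureWarning,
        )
    # No literal fallback: a missing key is a configuration error.
    return config[f"{module}.split_every"]
```

A user override would then be merged on top of the shipped values, for example `get_split_every("array", {**SHIPPED_DEFAULTS, "array.split_every": 16})`.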