Description
Currently the optimization pipeline for each collection is hardcoded into functions like dask.array.optimization.optimize. In some applications people want to turn off various parts of this pipeline. For example, in napari/napari#718 the Napari folks need to turn off task fusion so that they can get opportunistic caching to work well. This has come up a few times in other cases as well.
It would be nice for users to be able to turn various parts of the optimization pipeline on and off (or even add parts) with some sort of configuration. This might be some YAML config:

    array:
      optimization:
        - optimize_blockwise
        - fuse_roots
        - cull
        - hold_keys
        - fuse
        - inline
        - optimize_slices

Or it might be by registering some custom module with hand-specified Python code:

    array:
      optimization: my_module.optimize

or in Python:

    dask.config.set(array__optimization=my_optimization_func)

My guess is that the choice here depends on how easy it is to create a generic optimization pipeline and still keep things efficient. Currently we're careful to avoid recomputing dependencies (this can slow things down), but the optimization functions themselves aren't that uniform: some modify dependencies and some don't. So maybe we need to establish a more uniform interface so that things can be chained together more robustly.
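
To make the "uniform interface" idea concrete, here is a minimal sketch of what a chainable pipeline could look like. It is not Dask's API: `PASSES`, `register`, `get_deps`, `inline_literals`, and this version of `optimize` are all hypothetical names, and the graph is a simplified dict of `(func, *args)` tuples. The assumption is that every pass takes `(dsk, keys, deps)` and returns `(dsk, deps)`, so dependencies are computed once and then carried through the chain rather than recomputed by each step.

```python
from operator import add

# Hypothetical registry mapping a pass name (as it might appear in config)
# to a function with the uniform signature (dsk, keys, deps) -> (dsk, deps).
PASSES = {}


def register(name):
    """Register an optimization pass under a configurable name."""
    def decorator(func):
        PASSES[name] = func
        return func
    return decorator


def _is_key(obj, collection):
    """Membership test that tolerates unhashable task arguments."""
    try:
        return obj in collection
    except TypeError:
        return False


def get_deps(dsk):
    """Compute dependencies once: task arguments that are keys of the graph."""
    deps = {}
    for key, task in dsk.items():
        args = task[1:] if isinstance(task, tuple) else ()
        deps[key] = {a for a in args if _is_key(a, dsk)}
    return deps


@register("cull")
def cull(dsk, keys, deps):
    """Keep only the tasks reachable from the requested keys."""
    keep, stack = set(), list(keys)
    while stack:
        k = stack.pop()
        if k not in keep:
            keep.add(k)
            stack.extend(deps[k])
    return {k: dsk[k] for k in keep}, {k: deps[k] for k in keep}


@register("inline_literals")
def inline_literals(dsk, keys, deps):
    """Substitute literal (non-task) values into the tasks that use them.

    This pass changes dependencies, so it returns the updated mapping
    instead of leaving the next pass to recompute it.
    """
    literals = {k for k, v in dsk.items() if not isinstance(v, tuple)}
    new_dsk, new_deps = {}, {}
    for k, task in dsk.items():
        if isinstance(task, tuple):
            task = (task[0],) + tuple(
                dsk[a] if _is_key(a, literals) else a for a in task[1:]
            )
        new_dsk[k] = task
        new_deps[k] = deps[k] - literals
    return new_dsk, new_deps


def optimize(dsk, keys, pipeline):
    """Run the named passes in order, threading dependencies through."""
    deps = get_deps(dsk)  # computed once, up front
    for name in pipeline:
        dsk, deps = PASSES[name](dsk, keys, deps)
    return dsk


# Example graph: "unused" is not needed to compute "c".
dsk = {"a": 1, "b": (add, "a", 10), "c": (add, "b", 1), "unused": (add, "a", 2)}
full = optimize(dsk, ["c"], ["inline_literals", "cull"])
cull_only = optimize(dsk, ["c"], ["cull"])
```

With a shape like this, the list of pass names could come straight from configuration, and turning off a step (for example fusion) would just mean leaving its name out of the list. Whether Dask's real passes can be made to fit one signature without losing efficiency is exactly the open question above.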