Configure Optimization Pipeline #6083

@mrocklin

Currently the optimization pipeline for each collection is hardcoded into functions like dask.array.optimization.optimize. In some applications people want to turn off parts of this pipeline. For example, in napari/napari#718 the Napari folks need to turn off task fusion so that opportunistic caching works well. This has come up a few times in other cases as well.

It would be nice for users to be able to turn parts of the optimization pipeline on and off (or even add parts) through some sort of configuration. This might be some YAML config:

array:
  optimization:
  - optimize_blockwise
  - fuse_roots
  - cull
  - hold_keys
  - fuse
  - inline
  - optimize_slices
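As a rough sketch of how a list like the one above might be consumed: look up each name in a registry and apply the passes in order. Everything here is hypothetical and for illustration only. `PASSES`, `register`, and `run_pipeline` are not Dask APIs, and the toy `cull` is a stand-in that only shares its name with the real pass.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping config names to optimization passes.
PASSES: Dict[str, Callable] = {}


def register(name: str):
    """Register a pass under the name that would appear in the config."""
    def decorator(func):
        PASSES[name] = func
        return func
    return decorator


@register("cull")
def cull(graph: dict, keys: List[str]) -> dict:
    """Toy cull: keep only the tasks needed to compute `keys`."""
    needed, stack = set(), list(keys)
    while stack:
        key = stack.pop()
        if key in needed:
            continue
        needed.add(key)
        task = graph[key]
        if isinstance(task, tuple):
            # In this toy graph, string arguments that name keys are dependencies.
            stack.extend(a for a in task[1:] if isinstance(a, str) and a in graph)
    return {k: v for k, v in graph.items() if k in needed}


def run_pipeline(graph: dict, keys: List[str], names: List[str]) -> dict:
    """Apply the configured passes in the order given."""
    for name in names:
        graph = PASSES[name](graph, keys)
    return graph


def inc(x):
    return x + 1


graph = {"a": 1, "b": (inc, "a"), "c": (inc, "b"), "unused": (inc, "a")}
optimized = run_pipeline(graph, ["c"], ["cull"])
print(sorted(optimized))  # ['a', 'b', 'c']
```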

Or it might be by registering some custom module with hand-written Python code:

array:
  optimization: my_module.optimize

or in Python:

dask.config.set(array__optimization=my_optimization_func)
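Both forms (a dotted path in YAML, or a function object set from Python) could be handled by one small resolver. This is a minimal sketch using only the standard library; the name `resolve_optimizer` is an assumption, not an existing Dask function.

```python
import importlib
from typing import Callable, Union


def resolve_optimizer(spec: Union[str, Callable]) -> Callable:
    """Return a callable from either a callable or a 'module.attr' string."""
    if callable(spec):
        # Already a function, e.g. set directly from Python.
        return spec
    module_name, _, attr = spec.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'my_module.optimize', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

With this, a config value of `my_module.optimize` would import `my_module` and return its `optimize` attribute, while a function passed in from Python is returned unchanged.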

My guess is that the choice here depends on how easy it is to create a generic optimization pipeline while still keeping things efficient. We're currently careful to avoid recomputing dependencies (this can slow things down), but the optimization functions themselves aren't that uniform (some modify dependencies and some don't). Maybe we need to establish a more uniform interface so that things can be chained together more robustly.
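One possible shape for such a uniform interface, sketched under assumptions: every pass takes and returns both the graph and its dependencies, so dependencies are computed once and then threaded through the chain. The signature and the toy passes below are hypothetical and do not describe how Dask's existing functions behave.

```python
from operator import add
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, object]
Deps = Dict[str, Set[str]]
Pass = Callable[[Graph, Deps, List[str]], Tuple[Graph, Deps]]


def get_dependencies(graph: Graph) -> Deps:
    """Toy dependency scan: string arguments that name keys are dependencies."""
    return {
        key: {a for a in task[1:] if isinstance(a, str) and a in graph}
        if isinstance(task, tuple) else set()
        for key, task in graph.items()
    }


def inline_constants(graph: Graph, deps: Deps, keys: List[str]) -> Tuple[Graph, Deps]:
    """Substitute literal values into tasks and update dependencies to match."""
    constants = {k for k, t in graph.items() if not isinstance(t, tuple)}
    new_graph, new_deps = {}, {}
    for key, task in graph.items():
        if isinstance(task, tuple):
            args = tuple(
                graph[a] if isinstance(a, str) and a in constants else a
                for a in task[1:]
            )
            task = (task[0],) + args
        new_graph[key] = task
        new_deps[key] = deps[key] - constants
    return new_graph, new_deps


def cull(graph: Graph, deps: Deps, keys: List[str]) -> Tuple[Graph, Deps]:
    """Keep only tasks reachable from `keys`, reusing the dependencies passed in."""
    needed, stack = set(), list(keys)
    while stack:
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(deps[key])
    return {k: graph[k] for k in needed}, {k: deps[k] for k in needed}


def chain(graph: Graph, keys: List[str], passes: List[Pass]) -> Tuple[Graph, Deps]:
    deps = get_dependencies(graph)  # computed once, then handed from pass to pass
    for optimization in passes:
        graph, deps = optimization(graph, deps, keys)
    return graph, deps


graph = {"x": 1, "y": (add, "x", 10), "z": (add, "y", "x"), "dead": (add, "x", 1)}
optimized, dependencies = chain(graph, ["z"], [inline_constants, cull])
```

Because each pass returns dependencies that match the graph it returns, the next pass never has to rescan the graph, whichever passes the user enables.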

cc @jcrist and @eriknw for feedback
