Configure Optimization Pipeline

Currently the optimization pipeline for each collection is hardcoded into functions like `dask.array.optimization.optimize`.  Some applications need to turn off parts of this pipeline.  For example, in https://github.com/napari/napari/issues/718 the napari folks need to turn off task fusion so that opportunistic caching works well.  This has come up a few times in other cases as well.

It would be nice for users to be able to turn parts of the optimization pipeline on and off (or even add parts) through some sort of configuration.  This might be a YAML config that lists the passes to run, in order:

```yaml
array:
  optimization:
  - optimize_blockwise
  - fuse_roots
  - cull
  - hold_keys
  - fuse
  - inline
  - optimize_slices
```
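To make the list-based idea concrete, here is a minimal sketch of how a list of names could be turned into a pipeline. Everything in it is hypothetical: the `PASSES` registry, `register`, `build_pipeline`, and the toy `cull` and `inline` passes are stand-ins written for illustration, not dask's real optimization functions, and the graphs are plain dicts with string keys.

```python
from operator import add

# Hypothetical registry mapping config names to optimization passes.
# None of these names exist in dask; they only illustrate the idea.
PASSES = {}


def register(name):
    """Register a pass under the name that would appear in the config."""
    def decorator(func):
        PASSES[name] = func
        return func
    return decorator


@register("cull")
def cull(dsk, keys):
    """Toy cull: keep only tasks reachable from the requested keys."""
    seen, stack = set(), list(keys)
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        task = dsk[key]
        if isinstance(task, tuple):
            # In this toy graph, string arguments that name a task are dependencies.
            stack.extend(a for a in task[1:] if isinstance(a, str) and a in dsk)
    return {k: dsk[k] for k in seen}


@register("inline")
def inline(dsk, keys):
    """Toy inline: substitute literal values into the tasks that use them."""
    constants = {k: v for k, v in dsk.items() if not isinstance(v, tuple)}
    out = {}
    for k, v in dsk.items():
        if isinstance(v, tuple):
            args = tuple(constants.get(a, a) if isinstance(a, str) else a for a in v[1:])
            out[k] = (v[0],) + args
        elif k in keys:
            out[k] = v  # keep literals that were explicitly requested
    return out


def build_pipeline(names):
    """Chain the registered passes in the order given by the config list."""
    steps = [PASSES[name] for name in names]  # KeyError on an unknown name

    def optimize(dsk, keys):
        for step in steps:
            dsk = step(dsk, keys)
        return dsk

    return optimize


dsk = {"a": 1, "b": (add, "a", 2), "unused": 3}
full = build_pipeline(["cull", "inline"])(dsk, ["b"])
cull_only = build_pipeline(["cull"])(dsk, ["b"])  # "inline" switched off
```

Under this scheme, turning a pass off is just a matter of leaving its name out of the list, which is the behaviour the napari use case asks for.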

Or it might be done by registering a custom module containing hand-written Python code:

```yaml
array:
  optimization: my_module.optimize
```
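If the config value is a dotted path like the one above, something has to turn that string into a callable. A rough sketch of that step, assuming a hypothetical helper named `resolve_optimizer` (not an existing dask function) that also accepts an already-callable value:

```python
import importlib


def resolve_optimizer(spec):
    """Turn a config value into a callable.

    Hypothetical helper for illustration only. ``spec`` is either a callable
    or a dotted path such as "my_module.optimize".
    """
    if callable(spec):
        return spec
    module_name, _, attr = spec.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path like 'module.func', got {spec!r}")
    func = getattr(importlib.import_module(module_name), attr)
    if not callable(func):
        raise TypeError(f"{spec!r} does not refer to a callable")
    return func


# Standard-library functions stand in for a user's module here.
join = resolve_optimizer("os.path.join")
same = resolve_optimizer(len)
```

Accepting both forms would let the same config key be set from a YAML file (as a string) or from Python (as a function object).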

Or in Python:

```python
dask.config.set(array__optimization=my_optimization_func)
```

My guess is that the choice here depends on how easy it is to create a generic optimization pipeline while still keeping things efficient.  Currently we're careful to avoid recomputing dependencies, since this can slow things down.  However, the optimization functions themselves aren't very uniform: some accept or return updated dependencies and some don't.  We may need to establish a more uniform interface so that passes can be chained together more robustly.
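One possible shape for such a uniform interface, sketched here purely as an assumption: every pass takes `(dsk, keys, dependencies)` and returns `(dsk, dependencies)`, so the dependency map is computed once and then threaded through the chain. The functions below are toy stand-ins operating on plain dicts with string keys, not dask's actual passes.

```python
from operator import add


def get_dependencies(dsk):
    """Compute, once, the set of keys each task refers to (toy version)."""
    return {
        k: {a for a in v[1:] if isinstance(a, str) and a in dsk}
        if isinstance(v, tuple) else set()
        for k, v in dsk.items()
    }


def cull(dsk, keys, dependencies):
    """Drop unreachable tasks, reusing the dependencies that were passed in."""
    seen, stack = set(), list(keys)
    while stack:
        key = stack.pop()
        if key not in seen:
            seen.add(key)
            stack.extend(dependencies[key])
    return {k: dsk[k] for k in seen}, {k: dependencies[k] for k in seen}


def inline_constants(dsk, keys, dependencies):
    """Substitute literals into their consumers and update dependencies to match."""
    constants = {k for k, v in dsk.items() if not isinstance(v, tuple) and k not in keys}
    new_dsk, new_deps = {}, {}
    for k, v in dsk.items():
        if k in constants:
            continue
        if isinstance(v, tuple):
            args = tuple(dsk[a] if isinstance(a, str) and a in constants else a for a in v[1:])
            v = (v[0],) + args
        new_dsk[k] = v
        new_deps[k] = dependencies[k] - constants
    return new_dsk, new_deps


def run_pipeline(passes, dsk, keys):
    """Run passes in order; dependencies are computed once, up front."""
    dependencies = get_dependencies(dsk)
    for optimization_pass in passes:
        dsk, dependencies = optimization_pass(dsk, keys, dependencies)
    return dsk, dependencies


dsk = {"a": 1, "b": (add, "a", 2), "c": (add, "b", "a"), "unused": 3}
optimized, deps = run_pipeline([cull, inline_constants], dsk, ["c"])
```

The cost of this design is that every pass, even one that never touches the graph structure, has to accept and return the dependency map; the benefit is that any ordering of passes can be chained without recomputing it.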

cc @jcrist and @eriknw for feedback

Configure Optimization Pipeline #6083
