Perform all graph optimizations for all collection types

Discussion originally started here: https://dask.discourse.group/t/optimal-graph-optimization-when-mixing-dask-objects/3867

In short, a single graph can contain tasks from different collection types, and in those cases the graph may not be fully optimized. For example, a series of Array operations ending in a Delayed task is optimized as a Delayed graph and misses all of the available Array graph optimizations. This gets even more complicated when `da.store` is involved.

I don't have the larger context from the discussion in issues like #11458 to fully understand it, and the help I got on Discourse led to filing this issue to get more maintainers' eyes on it. The main question is: can the optimizations for all collection types always be performed during graph optimization? In my real-world use case, in a simple data case, I see a 35+% improvement in execution time by forcing Array optimizations on a graph that ends in a Delayed object.

Here is a basic example showing the issue:

```python
import dask
import dask.array as da

@dask.delayed(pure=True)
def delayed_func(arr1):
    return arr1 + 1

start = da.zeros((2, 2), chunks=1)
subarr1 = start[:1]
subarr2 = subarr1[:, :1]
delay_result = delayed_func(subarr2)

assert len(delay_result.dask.keys()) == 9  # zeros * 4 -> getitem * 3 -> finalize -> delayed_func
if True:
    # current dask
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 5  # zeros * 1 -> getitem * 2 -> finalize -> delayed_func
else:
    # if Arrays were detected in Delayed graphs:
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 2  # finalize-getitem-zeros -> delayed_func
# use array optimize instead of delayed
with dask.config.set(delayed_optimize=da.optimize):
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 2  # finalize-getitem-zeros -> delayed_func
```
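As a rough sketch of the workaround, the same `delayed_optimize=da.optimize` override from the example can also be applied around `compute()`. The helper name `optimized_key_count` and the function `add_one` are made up for illustration, and the exact key counts will likely vary between dask versions, so none are asserted here:

```python
import dask
import dask.array as da


@dask.delayed(pure=True)
def add_one(arr):
    # Stand-in for the real Delayed step at the end of the Array pipeline.
    return arr + 1


def optimized_key_count(collection, **overrides):
    """Hypothetical helper: number of keys left after dask.optimize
    when the given config overrides are active."""
    with dask.config.set(**overrides):
        (optimized,) = dask.optimize(collection)
    return len(optimized.dask.keys())


arr = da.zeros((2, 2), chunks=1)[:1][:, :1]
result = add_one(arr)

# Compare graph sizes under the default Delayed optimizer and the Array one.
default_count = optimized_key_count(result)
array_count = optimized_key_count(result, delayed_optimize=da.optimize)
print(f"default: {default_count} keys, array optimizer: {array_count} keys")

# Force Array optimizations for the actual computation as well.
with dask.config.set(delayed_optimize=da.optimize):
    value = result.compute()
```

This only swaps the optimizer used for Delayed collections. It assumes every Delayed graph computed inside the `with` block is safe to run through the Array optimizer, which may not hold in general.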

Related:

* https://github.com/dask/dask/issues/8380
* https://github.com/dask/dask/pull/9732
* Maybe https://github.com/dask/dask/issues/6732?
* https://github.com/dask/dask/issues/11458
