Skip to content

Perform all graph optimizations for all collection types #11854

@djhoese

Description

@djhoese

Discussion originally started here: https://dask.discourse.group/t/optimal-graph-optimization-when-mixing-dask-objects/3867

Short summary is that there are cases where a single graph consists of tasks from different types and they may not be optimally optimized. For example, a series of Array operations ending in a Delayed task will be optimized as a Delayed task and miss all the possible Array graph optimizations. This gets even more complicated when da.store is involved.

I don't have the larger context for the discussion in issues like #11458 to understand it and the help I got on discourse lead to filing this issue for more maintainers eyes. The main question is, can all collection optimizations always be performed during optimizations? In my real world use case, in a simple data case, I see a 35+% improvement in execution time by forcing Array optimizations on a graph that ends in a Delayed object.

Here is a basic example showing the issue:

import dask
import dask.array as da

@dask.delayed(pure=True)
def delayed_func(arr1):
    return arr1 + 1

start = da.zeros((2, 2), chunks=1)
subarr1 = start[:1]
subarr2 = subarr1[:, :1]
delay_result = delayed_func(subarr2)

assert len(delay_result.dask.keys()) == 9  # zeros * 4 -> getitem * 3 -> finalize -> delayed_func
if True:
    # current dask
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 5  # zeros * 1 -> getitem * 2 -> finalize -> delayed_func
else:
    # if Arrays were detected in Delayed graphs:
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 2  # finalize-getitem-zeros -> delayed_func
# use array optimize instead of delayed
with dask.config.set(delayed_optimize=da.optimize):
    assert len(dask.optimize(delay_result)[0].dask.keys()) == 2  # finalize-getitem-zeros -> delayed_func

Related:

Metadata

Metadata

Assignees

No one assigned

    Labels

    needs attentionIt's been a while since this was pushed on. Needs attention from the owner or a maintainer.needs triageNeeds a response from a contributor

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions