Dask array with non-highlevelgraph leads to KeyError #5850

@bmerry

I have some older code that assigns a plain dict directly into array.dask, as a workaround for an old dask bug. If that's no longer supported then feel free to close this bug (although it would be nice to add one or two sanity checks). Since dask 2.8.0 it raises a KeyError inside the optimizer.

Here's a minimal reproducing example:

#!/usr/bin/env python3
import dask
import dask.array as da
import numpy as np

a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
a.dask = dict(a.dask)
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))

x = a + b
da.compute(x)

Traceback (most recent call last):
  File "./bw.py", line 11, in <module>
    da.compute(x)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 433, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in collections_to_dsk
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in <listcomp>
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/array/optimization.py", line 43, in optimize
    dsk = fuse_roots(dsk, keys=keys)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in fuse_roots
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in <genexpr>
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
KeyError: 'array-d6b7f7b33e2b14cd720de87a517147e8'

(run against dask 2.10.0, Python 3.6)

From what I can tell, the trouble starts when HighLevelGraph.from_collections is given a dependency whose graph is not a HighLevelGraph: it names the resulting layer using just the id of that graph. However, the Blockwise layer for the addition keys its indices by array name, and optimize_blockwise then uses those index keys to construct the new dependencies. As a result, the new dependencies don't line up with the layers. The final explosion happens in fuse_roots, but it could presumably happen anywhere that expects the dependencies to be consistent.
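To make the mismatch concrete, here is a toy model using only plain dicts. It does not call dask; the helper names (register_plain_dependency, rebuild_from_indices, dangling) and the keys "add-x" and "array-a" are made up for illustration, and the real logic in dask may differ in detail.

```python
# Toy model of the naming mismatch described above. Nothing here is dask code;
# all function names and keys are hypothetical stand-ins.

def register_plain_dependency(name, layer, dep_graph):
    """Register a plain-dict dependency under id(dep_graph), not its array name."""
    key = id(dep_graph)
    layers = {name: layer, key: dep_graph}
    dependencies = {name: {key}, key: set()}
    return layers, dependencies


def rebuild_from_indices(dependencies, name, index_names):
    """Rebuild one layer's dependencies from the array names in its indices."""
    rebuilt = dict(dependencies)
    rebuilt[name] = set(index_names)
    return rebuilt


def dangling(dependencies):
    """Return dependency names that have no entry of their own in the mapping."""
    return {
        dep
        for deps in dependencies.values()
        for dep in deps
        if dep not in dependencies
    }


# A plain-dict graph for an array named "array-a".
legacy_graph = {("array-a", 0, 0): "chunk"}
layers, deps = register_plain_dependency(
    "add-x", {("add-x", 0, 0): "task"}, legacy_graph
)

# The rebuilt mapping refers to "array-a", but the layer lives under id(legacy_graph).
rebuilt = rebuild_from_indices(deps, "add-x", ["array-a"])

# Same shape of lookup as the failing line in the traceback.
try:
    any(rebuilt[dep] for dep in rebuilt["add-x"])
    lookup_error = None
except KeyError as exc:
    lookup_error = exc.args[0]
```

A check along the lines of dangling() is one possible form for the sanity checks suggested above: it would flag the inconsistent mapping as soon as it is built, rather than failing later with a bare KeyError.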
