Dask array with non-highlevelgraph leads to KeyError #5850

I have some [older code](https://github.com/ska-sa/katsdpcal/blob/5caa806649de33324977727a5ca1b5b76d61ba08/katsdpcal/scan.py#L86-L92) that assigns a plain dict directly to `array.dask`, as a workaround for an old dask bug. Since dask 2.8.0 this raises a KeyError inside the optimizer. If assigning a plain dict is no longer supported then feel free to close this bug (although it would be nice to add one or two sanity checks).

Here's a minimal reproducing example:
```python
#!/usr/bin/env python3
import dask
import dask.array as da
import numpy as np

a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
a.dask = dict(a.dask)  # replace the HighLevelGraph with a plain dict
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))

x = a + b
da.compute(x)
```

```
Traceback (most recent call last):
  File "./bw.py", line 11, in <module>
    da.compute(x)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 433, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in collections_to_dsk
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in <listcomp>
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/array/optimization.py", line 43, in optimize
    dsk = fuse_roots(dsk, keys=keys)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in fuse_roots
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in <genexpr>
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
KeyError: 'array-d6b7f7b33e2b14cd720de87a517147e8'
```
(run against dask 2.10.0, Python 3.6)

From what I can tell, the trouble starts when `HighLevelGraph.from_collections` is given a dependency whose graph is not a `HighLevelGraph`: it names the resulting layer using only the collection's `id`, [here](https://github.com/dask/dask/blob/0274a5c6db618e7af8444562149ecc5774ae22a3/dask/highlevelgraph.py#L134). However, the `Blockwise` layer for the addition keys its `indices` by the array names. `optimize_blockwise` then uses those `indices` keys to construct the new dependencies, [here](https://github.com/dask/dask/blob/0274a5c6db618e7af8444562149ecc5774ae22a3/dask/blockwise.py#L552), so the new dependencies no longer line up with the layers. The final explosion happens in `fuse_roots`, but it could presumably happen anywhere that expects the dependencies to be consistent with the layers.
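To make the mismatch concrete, here is a toy model of what I think is going on. It uses plain dicts only and is not dask's actual code: `layer_key`, the `inputs` records, and the names and ids in them are all made up for illustration, and the naming rule is my reading of the behaviour described above.

```python
def layer_key(name, uses_high_level_graph, obj_id):
    """Hypothetical stand-in for how a dependency's layer gets its key.

    Assumption: a HighLevelGraph-backed collection is recorded under its
    array name, while a plain-dict-backed one is recorded under str(id).
    """
    return name if uses_high_level_graph else str(obj_id)


# Two inputs to the sum: "a" had its graph replaced by a plain dict, "b" did not.
inputs = [
    {"name": "array-a", "hlg": False, "id": 140001},
    {"name": "array-b", "hlg": True, "id": 140002},
]

# Dependencies as recorded when the graph for the sum is built.
dependencies = {layer_key(i["name"], i["hlg"], i["id"]): set() for i in inputs}

# The blockwise layer refers to its inputs by array name.
blockwise_inputs = [i["name"] for i in inputs]

try:
    # Mirrors the shape of the failing lookup: dependencies[dep] for dep in deps
    found = [dependencies[dep] for dep in blockwise_inputs]
    error = None
except KeyError as exc:
    error = exc.args[0]

print(sorted(dependencies))  # keys under which the layers were recorded
print(error)                 # the name that could not be found
```

In this model the lookup fails on the name of the array whose graph was replaced, which matches the shape of the KeyError in the traceback.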
