Description
I have some older code that assigns a plain dict directly into array.dask, as a workaround for an old dask bug. If that's no longer supported then feel free to close this bug (although it would be nice to add one or two sanity checks). Since dask 2.8.0 it raises a KeyError inside the optimizer.
Here's a minimal reproducing example:
#!/usr/bin/env python3
import dask
import dask.array as da
import numpy as np
a = da.from_array(np.ones((4, 4), np.float64), chunks=(2, 4))
a.dask = dict(a.dask)
b = da.from_array(np.zeros((4, 4), np.float64), chunks=(2, 4))
x = a + b
da.compute(x)

Traceback (most recent call last):
  File "./bw.py", line 11, in <module>
    da.compute(x)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 433, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in collections_to_dsk
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/base.py", line 219, in <listcomp>
    [opt(dsk, keys, **kwargs) for opt, (dsk, keys) in groups.items()],
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/array/optimization.py", line 43, in optimize
    dsk = fuse_roots(dsk, keys=keys)
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in fuse_roots
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
  File "/home/bmerry/work/sdp/env3/lib/python3.6/site-packages/dask/blockwise.py", line 822, in <genexpr>
    and not any(dependencies[dep] for dep in deps)  # no need to fuse if 0 or 1
KeyError: 'array-d6b7f7b33e2b14cd720de87a517147e8'
(run against dask 2.10.0, Python 3.6)
From what I can tell, the trouble starts when HighLevelGraph.from_collections is given a dependency whose graph isn't a HighLevelGraph: it names the resulting layer using only id(). However, the Blockwise layer for the addition keys its indices by array name, and optimize_blockwise then uses those index keys to construct the new dependencies. As a result, the new dependencies don't line up with the layers. The final explosion happens in fuse_roots, but it could presumably happen anywhere that expects the dependencies to be consistent.
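To make the mismatch concrete, here is a rough sketch of the kind of sanity check I have in mind. It uses plain dicts as a toy stand-in for the layers and dependencies mappings rather than dask's real classes, and find_dangling_dependencies is a hypothetical helper name, not an existing dask function. The layer names are made up for illustration.

```python
def find_dangling_dependencies(layers, dependencies):
    """Return {layer_name: missing} for dependency names that have no layer.

    Hypothetical helper operating on plain dicts; not part of dask.
    """
    dangling = {}
    for name, deps in dependencies.items():
        # A dependency is dangling if it is missing from either mapping.
        missing = {d for d in deps if d not in layers or d not in dependencies}
        if missing:
            dangling[name] = missing
    return dangling


# Toy stand-in for the situation described above (made-up names).
plain_graph = {("array-a", 0, 0): 1}
layers = {
    id(plain_graph): plain_graph,        # layer keyed by id(), not by array name
    "array-b": {("array-b", 0, 0): 2},
    "add-x": {},
}
dependencies = {
    id(plain_graph): set(),
    "array-b": set(),
    "add-x": {"array-a", "array-b"},     # rebuilt from array names
}

result = find_dangling_dependencies(layers, dependencies)
print(result)  # {'add-x': {'array-a'}}
```

A check along these lines, run before optimization, would presumably report the inconsistency directly instead of letting it surface later as a bare KeyError.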