Description
Okay! This is a weird one, and I'm still investigating what is going on, so bear with me. I was working on #7417 and ran into some surprising behavior with blockwise optimization. It appears to happen (at least) when:
- a collection with the same name is used more than once in a high-level graph,
- there are at least two blockwise layers that might be combined, and
- there are multiple partitions.
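To separate the three conditions above, a sketch like the following toggles each one independently and compares the optimized and unoptimized results against an eager NumPy reference. It assumes that `chunks=2` (a single block for this 2x2 input) is a fair stand-in for "one partition" and that `da.from_array(..., name=False)`, which assigns a random key name instead of a content hash, is a fair stand-in for "a second array with a different name". The `build` helper is hypothetical, written only for this check. Depending on the dask version, `from_array` may itself produce a layer that takes part in fusion, so the `two_layers=False` cells may not be a perfectly clean control.

```python
import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])
expected = (2 * x) @ (2 * x)  # eager NumPy reference: [[12, 12], [24, 24]]


def build(chunks, same_name, two_layers):
    """Hypothetical helper: build z = matmul(a, b) for one combination of conditions."""
    # Without the extra elementwise layer, fold the "* 2" into the input data
    data = x if two_layers else 2 * x
    a = da.from_array(data, chunks=chunks)
    # name=False asks from_array for a random key name instead of a content hash,
    # so b holds the same values as a but under a different name
    b = a if same_name else da.from_array(data, chunks=chunks, name=False)
    if two_layers:
        a = a * 2  # elementwise blockwise layer feeding the matmul
        b = a if same_name else b * 2
    return da.matmul(a, b)


results = {}
for chunks in (1, 2):  # 1 -> a 2x2 grid of blocks, 2 -> a single block
    for same_name in (True, False):
        for two_layers in (True, False):
            z = build(chunks, same_name, two_layers)
            ok_plain = np.array_equal(z.compute(optimize_graph=False), expected)
            ok_opt = np.array_equal(z.compute(optimize_graph=True), expected)
            results[(chunks, same_name, two_layers)] = (ok_plain, ok_opt)

for (chunks, same_name, two_layers), (ok_plain, ok_opt) in results.items():
    print(f"chunks={chunks} same_name={same_name} two_layers={two_layers}: "
          f"unoptimized ok={ok_plain}, optimized ok={ok_opt}")
```

Going by the description above, only the `chunks=1, same_name=True, two_layers=True` combination would be expected to report `optimized ok=False` on an affected version; that is a prediction from the listed conditions, not an observed result.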
A minimal example: the following succeeds:
```python
import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])
xx = da.from_array(x, chunks=1)  # create a multi-partition dask array
# Two blockwise operations in a row!
xx = xx * 2
z = da.matmul(xx, xx)
# optimize_graph=False turns off all graph optimizations, not just blockwise fusion,
# but I've manually verified that optimize_blockwise is the culprit
z.compute(optimize_graph=False)
# produces the right answer
# [[12, 12], [24, 24]]
```
But if I turn on graph optimization (and optimize_blockwise does seem to be the important one), I get the wrong answer:
```python
import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])
xx = da.from_array(x, chunks=1)  # create a multi-partition dask array
# Two blockwise operations in a row!
xx = xx * 2
z = da.matmul(xx, xx)
z.compute(optimize_graph=True)
# produces the wrong answer
# [[8, 8], [32, 32]]
```
It does seem to be related to `xx` appearing twice: if I create a second array with a different name, but which is otherwise identical, the snippet succeeds. I'm a bit at a loss as to what might be happening right now, so this is an in-progress investigation that I'm putting out there in case someone with fresher eyes has an idea. @gjoseph92 especially might be interested.
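For reference, both the correct and the reported wrong answers can be reproduced eagerly in NumPy. The second computation is only a hypothesis that happens to match the numbers: it assumes the second use of `xx` is read with the first operand's `(i, k)` block index instead of its own `(k, j)` index. With `chunks=1` each block holds a single element, so that mis-indexing would put each row's sum of squares in every column. The name `mis_indexed` is illustrative only and does not correspond to anything in dask.

```python
import numpy as np

# Values of `xx` after `xx = xx * 2` in the snippets above
xx = 2 * np.array([[1, 1], [2, 2]])

# Correct matmul: z[i, j] = sum_k xx[i, k] * xx[k, j]
correct = np.einsum("ik,kj->ij", xx, xx)

# Hypothetical mis-indexing: both operands read with the (i, k) index,
# so z[i, j] = sum_k xx[i, k] * xx[i, k] for every column j
row_sums = np.einsum("ik,ik->i", xx, xx)
mis_indexed = np.broadcast_to(row_sums[:, None], xx.shape)

print(correct.tolist())      # [[12, 12], [24, 24]]
print(mis_indexed.tolist())  # [[8, 8], [32, 32]]
```

If this hypothesis holds, a reasonable place to look would be wherever blockwise fusion maps input names to their index patterns, since the two uses of `xx` share a name but need different indices.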
Environment:
- Dask version: main
- Python version: 3.93
- Operating System: Ubuntu
- Install method (conda, pip, source): source