Blockwise optimization bug when collections appear multiple times #8535

@ian-r-rose

Okay! This is a weird one, and I'm still investigating what is going on, so bear with me. I was working on #7417 and ran into some surprising behavior with blockwise optimization. It appears to happen (at least) when

  1. a collection with the same name is used more than once in a high level graph,
  2. there are at least two blockwise layers that might be combined, and
  3. there are multiple partitions.

Here is a minimal example. The following succeeds:

import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])
xx = da.from_array(x, chunks=1)  # create a multi-partition dask array

# Two blockwise operations in a row!
xx = xx * 2
z = da.matmul(xx, xx)

# optimize_graph=False turns off other optimizations too, but I've manually verified
# that optimize_blockwise is the culprit
z.compute(optimize_graph=False)

# produces the right answer
# [[12,12], [24,24]]

But if I turn on graph optimization (and optimize_blockwise does seem to be the important one), I get the wrong answer:

import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])
xx = da.from_array(x, chunks=1)  # create a multi-partition dask array

# Two blockwise operations in a row!
xx = xx * 2
z = da.matmul(xx, xx)

z.compute(optimize_graph=True)

# produces the wrong answer
# [[8,8], [32,32]]
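
For reference, the expected value can be checked with plain NumPy, without dask in the loop. The sketch below also shows that the incorrect output happens to equal each row's sum of squares repeated across the columns. That would be consistent with both matmul operands being read with the same block indices, but it is only an observation about the numbers, not a confirmed diagnosis: the columns of this input are identical, so other index mix-ups would produce the same values. The names `y`, `row_sq`, and `observed_pattern` are illustrative only.

```python
import numpy as np

x = np.array([[1, 1], [2, 2]])
y = x * 2  # same values as the dask array after `xx * 2`

# Reference result: matches the unoptimized output [[12, 12], [24, 24]].
expected = y @ y
print(expected)

# Sum over k of y[i, k] * y[i, k], repeated across the columns.
# This reproduces the optimized (wrong) output [[8, 8], [32, 32]].
row_sq = np.einsum("ik,ik->i", y, y)
observed_pattern = np.repeat(row_sq[:, None], y.shape[1], axis=1)
print(observed_pattern)
```

An input whose columns differ would probably narrow down which operand is being indexed incorrectly.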

It does seem to be related to xx appearing twice. If I create a second array with a different name, but which is otherwise identical, the snippet succeeds. I'm a bit at a loss as to what might be happening right now, so this is a bit of an in-progress investigation that I'm putting out in case someone with fresher eyes has an idea. @gjoseph92 especially might be interested.
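
A minimal sketch of the "different name" variant described above, assuming the second array is given its own key name through the `name` argument of `da.from_array`. The name `"x-copy"` and the variable `yy` are hypothetical choices for illustration and may not match how the variant was originally built.

```python
import dask.array as da
import numpy as np

x = np.array([[1, 1], [2, 2]])

# First operand: name derived from a hash of the input, as in the snippets above.
xx = da.from_array(x, chunks=1) * 2

# Second operand: identical data and chunks, but an explicit (hypothetical) name,
# so the downstream `* 2` layer also gets a different name.
yy = da.from_array(x, chunks=1, name="x-copy") * 2

names_differ = xx.name != yy.name
result = da.matmul(xx, yy).compute()  # graph optimization left on
print(names_differ)
print(result)
```

Because the two operands no longer share a layer name, the first condition in the list above (the same collection appearing more than once) does not apply here.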

Environment:

  • Dask version: main
  • Python version: 3.93
  • Operating System: Ubuntu
  • Install method (conda, pip, source): source
