
Duplicated computations with mixed array/dataframe delayed output #7545

@chrisroat


What happened:

A delayed call that returns two outputs is computed twice when one output is an array and the other is a dataframe. Interestingly, if both outputs are arrays, or both outputs are dataframes, the duplication does not occur.

If the two outputs (one array, one dataframe) are fed to another delayed call, computing the output of that second delayed call does not trigger the extra computation.

In the code below, compute is printed twice under 'Individually'.

What you expected to happen:

compute should be printed once under 'Individually'.

Minimal Complete Verifiable Example:

import dask
import dask.array as da
import dask.dataframe as dd
import numpy as np
import pandas as pd

def calc():
    print('compute')
    return np.zeros(1), pd.DataFrame({'z': [1]})

res = dask.delayed(calc, nout=2)()
res0 = da.from_delayed(res[0], shape=(1,), dtype=np.float64)
res1 = dd.from_delayed(res[1], meta=[('z', np.int64)])

print('Individually')
_ = dask.compute(res0, res1, scheduler='synchronous')

def comb(a, b):
    return a, b
res_comb = dask.delayed(comb)(res0, res1)

print()
print('Combined')
_ = dask.compute(res_comb, scheduler='synchronous')

Output:

Individually
compute
compute

Combined
compute

Anything else we need to know?:

I do not think this is a meta/dtype/shape inference issue, as duplication isn't happening with the "Combined" computation.

Environment:

  • Dask version: 2021.04.0+16.ge83379d5
  • Python version: 3.8.8
  • Operating System: MacOS 11.2.3
  • Install method (conda, pip, source): conda


Labels: array, dataframe, needs attention
