
apply_gufunc bugs with meta inference #7668

@gjoseph92

Description


What happened:
apply_gufunc returns a tuple with the wrong number of outputs for multi-output signatures.

What you expected to happen:
apply_gufunc would return the same number of items as the signature implied.

Minimal Complete Verifiable Example:

```python
import dask.array as da
from dask.array import apply_gufunc

result = apply_gufunc(
    lambda a, b: (a.max(axis=1), b.max()),
    "(i,I),(I)->(i),()",
    da.ones((10, 10), chunks=-1),
    da.ones(10, chunks=-1),
    output_dtypes=["float64", "float64"],
)
print(len(result))  # prints 1 on ac1bd05
```

This should return a tuple of 2 items, as the signature implies: a 1-D array of shape (10,) and a 0-d (scalar) array.
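For comparison, calling the same function eagerly on plain NumPy arrays (no dask involved) shows the two outputs that the signature `(i,I),(I)->(i),()` describes. This is only a reference point for the expected count and shapes, not a workaround:

```python
import numpy as np


def func(a, b):
    # Same function as in the example above: one reduction per output.
    return a.max(axis=1), b.max()


result = func(np.ones((10, 10)), np.ones(10))
print(len(result))                    # 2
print([np.shape(r) for r in result])  # [(10,), ()]
```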

The issue is that `meta_from_array` hits a "zero-size array to reduction operation" error. This error used to propagate (which would have taken us down a more useful codepath in `apply_gufunc`), but since #6736 it is caught and the first element of `args_meta` is used instead:

From `dask/array/utils.py`, lines 158 to 161 at commit ac1bd05:

```python
except ValueError as e:
    # min/max functions have no identity, attempt to use the first meta
    if "zero-size array to reduction operation" in str(e):
        meta = args_meta[0]
```
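For context, a NumPy-only sketch of why this branch fires, assuming (as the snippet suggests) that meta is computed by calling the function on zero-size stand-ins for its inputs: `max` has no identity, so reducing an empty array raises the error that the `except` clause matches on. The name `b_meta` is illustrative only:

```python
import numpy as np

# Zero-size stand-in for the 1-D argument ``b`` (shape (0,) instead of (10,)).
b_meta = np.ones(0)

message = None
try:
    b_meta.max()  # max has no identity, so reducing an empty array is an error
except ValueError as e:
    message = str(e)

print(message)
```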

@pentschev I wonder if this `meta = args_meta[0]` should be conditioned on `len(args_meta) == 1`? When there are multiple arguments, I think the only sensible behavior is to return None, since we don't know how the arguments will be combined. Even with a single argument, I personally think it's incorrect to assume in general that the output type will match the input type; an arbitrary user-defined function could do all sorts of other things before or after calling `np.min`.
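A minimal sketch of the suggested condition, written as a standalone function. `fallback_meta` is a hypothetical name, not an existing dask function, and the string metas in the usage lines are placeholders for real meta arrays:

```python
def fallback_meta(args_meta, error):
    """Hypothetical fallback when calling the function on meta arrays fails.

    Reuse the input meta only when there is exactly one argument and the
    failure is the known "no identity" reduction error; otherwise return
    None, because we cannot know how several inputs are combined.
    """
    if len(args_meta) == 1 and "zero-size array to reduction operation" in str(error):
        return args_meta[0]
    return None


err = ValueError("zero-size array to reduction operation maximum which has no identity")
print(fallback_meta(["a_meta"], err))            # a_meta
print(fallback_meta(["a_meta", "b_meta"], err))  # None
```

The stricter position (never assuming the output type matches the input type) would amount to always returning `None`.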

Even if `meta_from_array` had raised an error, this still wouldn't have worked. The following logic for generating meta from `output_dtypes` will rarely, if ever, be reached, because `output_dtypes` is typically a list, not a tuple.

From `dask/array/gufunc.py`, lines 436 to 440 at commit ac1bd05:

```python
if isinstance(output_dtypes, tuple):
    meta = tuple(
        meta_from_array(sample, dtype=odt)
        for ocd, odt in zip(output_coredimss, output_dtypes)
    )
```
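A sketch of the list-versus-tuple point: a branch that checks only for `tuple` lets the common list form fall through. `metas_from_output_dtypes` is a hypothetical stand-in for the `meta_from_array(sample, dtype=odt)` loop; it builds plain NumPy placeholders with every dimension of length 0 and ignores loop dimensions for simplicity:

```python
import numpy as np


def metas_from_output_dtypes(output_dtypes, output_ndims):
    """Hypothetical helper: one placeholder meta array per output.

    ``output_ndims`` gives the number of core dimensions of each output;
    every dimension of the placeholder has length 0.
    """
    # Accept both list and tuple; checking only for tuple skips the common list case.
    if not isinstance(output_dtypes, (list, tuple)):
        raise TypeError("expected a list or tuple with one dtype per output")
    if len(output_dtypes) != len(output_ndims):
        raise ValueError("need exactly one dtype per output")
    return tuple(
        np.empty((0,) * ndim, dtype=odt)
        for ndim, odt in zip(output_ndims, output_dtypes)
    )


dtypes = ["float64", "float64"]  # the usual way callers pass output_dtypes
print(isinstance(dtypes, tuple))  # False, so an isinstance(..., tuple) branch is skipped
metas = metas_from_output_dtypes(dtypes, [1, 0])  # outputs "(i)" and "()"
print([m.shape for m in metas])  # [(0,), ()]
```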

Overall, `apply_gufunc` is inconsistent about what determines how many items to return: `nout`, the type and length of `output_dtypes`, or the type and length of `metas`.
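One way to remove the ambiguity, sketched under the assumption that the signature is treated as the single source of truth: derive the output count from the signature and validate `output_dtypes` and `meta` against it. `count_outputs` and `check_output_count` are hypothetical helpers, not dask functions, and the regex-based parsing handles only well-formed signatures:

```python
import re


def count_outputs(signature):
    """Hypothetical helper: number of outputs in a gufunc signature string.

    Counts the parenthesised core-dimension groups after "->", so
    "(i,I),(I)->(i),()" has two outputs: "(i)" and "()".
    """
    _, outputs = signature.split("->")
    return len(re.findall(r"\([^()]*\)", outputs))


def check_output_count(signature, output_dtypes=None, meta=None):
    """Hypothetical check that output_dtypes and meta agree with the signature."""
    nout = count_outputs(signature)
    for name, value in (("output_dtypes", output_dtypes), ("meta", meta)):
        if value is None:
            continue
        # A bare dtype or a single array counts as one output.
        n = len(value) if isinstance(value, (list, tuple)) else 1
        if n != nout:
            raise ValueError(
                f"signature {signature!r} implies {nout} outputs, but {name} describes {n}"
            )
    return nout


sig = "(i,I),(I)->(i),()"
print(check_output_count(sig, ["float64", "float64"]))  # 2

# A single meta object for a two-output signature (the situation in this issue)
# is rejected instead of silently producing one output.
try:
    check_output_count(sig, meta=object())
    error_message = None
except ValueError as e:
    error_message = str(e)
print(error_message)
```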

Anything else we need to know?:

Environment:

  • Dask version: ac1bd05
  • Python version: 3.8.8
  • Operating System: macOS
  • Install method (conda, pip, source): source
