Description
What happened:
apply_gufunc returns a tuple with the wrong number of outputs for multi-output signatures.
What you expected to happen:
apply_gufunc would return the same number of items as the signature implied.
Minimal Complete Verifiable Example:
import dask.array as da
from dask.array import apply_gufunc
import numpy
>>> len(apply_gufunc(lambda a, b: (a.max(axis=1), b.max()), '(i,I),(I)->(i),()', da.ones((10, 10), chunks=-1), da.ones(10, chunks=-1), output_dtypes=['float64', 'float64']))
1
This should return a tuple of 2 items (one array and one scalar), as the signature implies.
The issue is that meta_from_array encounters a "zero-size array to reduction operation" error. This error used to propagate (which would have taken us down a more useful codepath in apply_gufunc), but since #6736 it is caught and the first element of args_meta is used instead:
Lines 158 to 161 in ac1bd05
    except ValueError as e:
        # min/max functions have no identity, attempt to use the first meta
        if "zero-size array to reduction operation" in str(e):
            meta = args_meta[0]
@pentschev I wonder if this meta = args_meta[0] should be conditioned on len(args_meta) == 1? When there are multiple arguments, I think the only sensible behavior is to return None, since we don't know how the arguments will be combined. Even when there's only a single argument, I personally think it's incorrect to assume in general that the output type will be the same as the input type; an arbitrary user-defined function could do all sorts of other things before/after calling np.min.
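To make the suggestion concrete, here is a rough sketch of the conditional fallback. It is not dask's code: `fallback_meta` is a hypothetical standalone helper, and plain NumPy arrays stand in for the entries of args_meta.

```python
import numpy as np


def fallback_meta(args_meta, error):
    """Hypothetical helper: pick a meta after a reduction failed on empty metas."""
    if "zero-size array to reduction operation" not in str(error):
        # Not the error we know how to handle, so let it propagate.
        raise error
    if len(args_meta) == 1:
        # Single argument: reuse its meta (still only a guess at the output type).
        return args_meta[0]
    # Several arguments: we cannot know how they are combined.
    return None


# Reproduce the error that a min/max reduction raises on an empty array.
try:
    np.empty((0,)).max()
except ValueError as e:
    err = e

single = fallback_meta([np.empty((0, 0))], err)
multi = fallback_meta([np.empty((0, 0)), np.empty((0,))], err)
```

With one argument the helper returns that argument's meta; with two it returns None, leaving the caller free to take another codepath.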
Even if meta_from_array had raised an error, this still wouldn't have worked. The logic for generating meta from output_dtypes only handles the case where output_dtypes is a tuple:
Lines 436 to 440 in ac1bd05
    if isinstance(output_dtypes, tuple):
        meta = tuple(
            meta_from_array(sample, dtype=odt)
            for ocd, odt in zip(output_coredimss, output_dtypes)
        )
But output_dtypes is typically a list, not a tuple, so the isinstance check fails and this branch is skipped.
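One possible way to accept both forms is sketched below. This is an illustration under assumptions, not a patch against dask: `metas_from_output_dtypes` is a hypothetical helper, `np.empty` stands in for meta_from_array, and giving each meta one zero-length axis per core dimension is a simplification.

```python
import numpy as np


def metas_from_output_dtypes(output_dtypes, output_coredimss):
    """Hypothetical helper: build one empty meta per output dtype."""
    # Treat lists and tuples the same way, since callers commonly pass a list.
    if isinstance(output_dtypes, (tuple, list)):
        return tuple(
            np.empty((0,) * len(ocd), dtype=odt)
            for ocd, odt in zip(output_coredimss, output_dtypes)
        )
    # Single output: output_dtypes is one dtype, output_coredimss one tuple of dims.
    return np.empty((0,) * len(output_coredimss), dtype=output_dtypes)


# Core dimensions of the two outputs in '(i,I),(I)->(i),()'.
metas = metas_from_output_dtypes(["float64", "float64"], [("i",), ()])
```

Here a list of two dtypes yields a tuple of two metas, matching the two outputs in the signature.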
Overall, there is inconsistency throughout apply_gufunc as to whether nout, the type+length of output_dtypes, or the type+length of metas is the source of truth for how many items to return.
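As an illustration of treating the signature as the source of truth, the expected number of outputs can be read directly from the signature string. This is a minimal sketch: `count_outputs` is a hypothetical helper written for this issue, not an existing dask function, and it assumes a well-formed signature.

```python
import re


def count_outputs(signature):
    """Hypothetical helper: count the output groups in a gufunc signature."""
    # Everything after '->' describes the outputs, one parenthesised group each.
    _, _, outputs = signature.partition("->")
    return len(re.findall(r"\([^()]*\)", outputs))


n_expected = count_outputs("(i,I),(I)->(i),()")  # two output groups: (i) and ()
```

A check such as comparing this count against len(output_dtypes) would surface a mismatch early rather than silently returning the wrong number of items.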
Anything else we need to know?:
Environment:
- Dask version: ac1bd05
- Python version: 3.8.8
- Operating System: macOS
- Install method (conda, pip, source): source