ENH: da.ufuncs now supports DataFrame/Series#1669
Conversation
dask/array/ufunc.py
Outdated
| def wrapped(x, *args, **kwargs): | ||
| if hasattr(x, '_elemwise_'): | ||
| return x._elemwise_(numpy_ufunc, x, *args, **kwargs) | ||
| # return x._elemwise_(numpy_ufunc, x, *args, **kwargs, |
There was a problem hiding this comment.
Current test all succeeded even if it ignores wrap_elemwise args (like logical_and = wrap_elemwise(np.logical_and, dtype='bool')). Can I remove it?
There was a problem hiding this comment.
Ah, perhaps we've gotten better at determining dtypes since when this code was originally written.
|
cc @shoyer this seems like something you might find interesting |
|
This all looks pretty good to me. +1 |
73799c4 to
dc93c1c
Compare
|
This has fallen behind master. I would like to make sure that this gets in, even if it does not support all of the ufuncs. @sinhrks if I were to rebase and merge this in is there anything that would be seriously lacking? While I would like to have |
6e1e617 to
24e320b
Compare
|
Two questions:
|
|
|
Merging soon if no further comments |
dask/array/core.py
Outdated
| # We don't inspect the values of 0d dask arrays, because these could | ||
| # hold potentially very expensive calculations. | ||
| vals = [np.empty((1,) * a.ndim, dtype=a.dtype) | ||
| vals = [np.empty((1, ) * a.ndim, dtype=a.dtype) |
There was a problem hiding this comment.
nit: I don't think adding this extra white-space is an improvement
| return x.__array_wrap__(numpy_ufunc(x, *args, **kwargs)) | ||
|
|
||
|
|
||
| def wrap_elemwise(numpy_ufunc, array_wrap=False): |
There was a problem hiding this comment.
I don't think this will pickle properly, which will likely be an annoyance later (see pydata/xarray#901).
There was a problem hiding this comment.
What the soln should be, make a callable class which wraps ufuncs like methodcaller?
There was a problem hiding this comment.
We do a fair amount of this. I agree that it's not optimal but at least these are dask.array functions, rather than functions that we're trying to send over the network. Actually, I appreciate that wrapped calls Array._elemwise with the only well defined easily pickleable functions (__array_wrap__ and the provided numpy_ufunc).
I think that it's nice to avoid dynamic functions when convenient but that, in this case, it seems like the right choice. Other alternatives seem more complex. If we really wanted to ensure picklability across the project then we might consider something like toolz.curry.
Also, for what it's worth, I've started to think of cloudpickle as the default serialization solution for Python functions.
| def wrapped(x, *args, **kwargs): | ||
| if hasattr(x, '_elemwise'): | ||
| if array_wrap: | ||
| return x._elemwise(__array_wrap__, numpy_ufunc, |
There was a problem hiding this comment.
is __array_wrap__ defined for dask.array? Why exactly do you need this, only for some ufuncs?
There was a problem hiding this comment.
This is for letting some ufuncs which not calls __array_wraps__ handles DataFrame.
np.isreal(pd.Series([1, 2, 3]))
# array([ True, True, True], dtype=bool)
|
I'm going to try to release in the next two days. I may take on this PR tomorrow if you don't mind @sinhrks . |
|
@mrocklin Thx for the support. Pls fix it if you already have an idea in your mind:) |
|
To me it looks like all raised issues are handled. The serialization issue is long-standing and something we would need to fix throughout the codebase in many places. The |
On current master,
da.ufuncsdoesn't work withdask.dataframeinstances.On current master:
After the PR:
Now these return dask instances.
To make impl simple, I'm adding
_elemwiseproperty toArray/_Framewhich returns function used for element-wise op.__array__interface toDataFrame__array_wrap__with non-defaultIndex