dask.dataframe.Series.__array_*__ for better use with ufuncs. #582
mrocklin merged 4 commits into dask:master
Note that this does not do "the right thing", which would be to make np.ufunc(dask.dataframe) return a dask.dataframe instead of forcing computation and returning an in-memory result.
Interesting: my tests pass on pandas 0.16.1 but not on 0.16.2.
dask/dataframe/core.py
I don't think you need __array_prepare__ in this case
We could also raise an exception when people call ufuncs on dask.arrays/dataframes. Would that be preferable to silently forcing computation?
Where is the failure in pandas 0.16.1? I would be for raising a warning.
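To make the warning-versus-exception trade-off concrete, here is a minimal, standard-library-only sketch. Everything in it is hypothetical: `WarnOnComputeSeries`, `implicit_compute`, and the `strict` flag are illustration names, not dask API. In a real collection the same logic would presumably sit in `__array__`, which NumPy calls when a ufunc needs concrete data.

```python
import warnings


class WarnOnComputeSeries:
    """Hypothetical stand-in for a lazy series; not the dask implementation."""

    def __init__(self, values):
        self._values = list(values)

    def compute(self):
        # Explicit computation: the caller knowingly asked for concrete data.
        return list(self._values)

    def implicit_compute(self, strict=False):
        # Hypothetical hook for the implicit path (e.g. called from __array__).
        if strict:
            # Option 1: refuse to compute implicitly.
            raise TypeError("implicit computation is disabled; call .compute()")
        # Option 2: compute, but make it visible to the user.
        warnings.warn(
            "implicitly computing a lazy series; call .compute() to silence this",
            UserWarning,
            stacklevel=2,
        )
        return self.compute()
```

The warning keeps existing code working while flagging the hidden cost; the exception is stricter and would break callers that currently rely on the implicit conversion.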
Note that the underlying problem of #580 is that we don't have a way to say for any function |
OK. Merging this. If you feel like some adventure, it'd still be good to get the dask.array ufuncs to work on dask.dataframe objects. I think the best way to do this is likely some sort of protocol, so that dask.dataframe (or dask.bag) logic doesn't find its way into dask.array.
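One way such a protocol might look, as a rough pure-Python sketch rather than a proposal for the actual implementation: the ufunc wrapper checks its argument for an opt-in hook and delegates to it, so the wrapper never needs to know about dataframes or bags. The hook name `_elemwise`, the `wrap_elemwise` helper, and the toy `LazySeries` class are all assumptions made for illustration.

```python
import math


def wrap_elemwise(func):
    """Wrap a scalar function so that collections can opt in via a hook."""

    def wrapper(x):
        # Hypothetical protocol: a collection exposes `_elemwise(func)`.
        hook = getattr(x, "_elemwise", None)
        if hook is not None:
            # The collection decides how to apply func and what to return.
            return hook(func)
        # No hook: fall back to applying the function directly.
        return func(x)

    return wrapper


class LazySeries:
    """Toy lazy collection that records operations until compute()."""

    def __init__(self, values, ops=()):
        self._values = list(values)
        self._ops = tuple(ops)

    def _elemwise(self, func):
        # Stay lazy: record the operation and return a new LazySeries.
        return LazySeries(self._values, self._ops + (func,))

    def compute(self):
        out = self._values
        for op in self._ops:
            out = [op(v) for v in out]
        return out


sqrt = wrap_elemwise(math.sqrt)
```

With this in place, `sqrt(LazySeries([4.0, 9.0]))` returns another `LazySeries` and nothing is evaluated until `.compute()` is called, while `sqrt(16.0)` still behaves like the plain function.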
dask.dataframe.Series.__array_*__ for better use with ufuncs.
This "fixes" #580
However, it does not do the right thing. The right thing would be for
np.ufunc(dask.Series) to return a dask.Series; right now it forces computation and returns a pandas dataframe. We could eventually do that when
__numpy_ufunc__ is included in NumPy, which might happen in numpy 1.10 or 1.11.