
dask.dataframe.Series.__array_*_ for better use with ufuncs. #582

Merged
mrocklin merged 4 commits into dask:master from cowlicks:gh-580 on Aug 14, 2015


@cowlicks
Member

This "fixes" #580

However, it does not do the right thing. The right thing would be for np.ufunc(dask.Series) to return a dask.Series; right now it forces computation and returns a pandas dataframe.
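For context, the mechanism this PR relies on can be sketched with a toy class. This is a minimal sketch under stated assumptions, not dask's implementation: `LazySeries`, its `compute_fn` argument, and the `compute_calls` counter are hypothetical names. NumPy calls `__array__` to get concrete input data (which is where computation is forced) and then passes the ufunc's ndarray result through `__array_wrap__`, which here rebuilds a pandas Series with the original name.

```python
import numpy as np
import pandas as pd


class LazySeries:
    """Hypothetical stand-in for a lazy series (not dask's actual class).

    It holds a zero-argument callable that produces a pandas Series on demand.
    """

    def __init__(self, compute_fn, name=None):
        self._compute_fn = compute_fn
        self.name = name
        self.compute_calls = 0  # counts how often computation is forced

    def compute(self):
        self.compute_calls += 1
        return self._compute_fn()

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this to obtain concrete input data for a ufunc,
        # so the lazy value is computed right here.
        return np.asarray(self.compute(), dtype=dtype)

    def __array_wrap__(self, array, context=None, return_scalar=False):
        # NumPy hands the ufunc's ndarray result back through this hook;
        # passing name= keeps the Series name intact.
        return pd.Series(array, name=self.name)


s = LazySeries(lambda: pd.Series([1.0, np.e, np.e ** 2]), name="x")
result = np.log(s)  # eager: computes s, then wraps the result
```

The result is a concrete pandas object rather than a lazy one, which is exactly the limitation described above.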

We could eventually do that once __numpy_ufunc__ is included in NumPy, which might happen in NumPy 1.10 or 1.11.
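NumPy ultimately shipped this override hook under the name `__array_ufunc__` (NumPy 1.13, NEP 13) rather than `__numpy_ufunc__`. The sketch below shows how such a hook could keep the result lazy; it is an illustration under stated assumptions, not dask's code. `LazySeries` and `load` are hypothetical, and only plain calls like `np.log(x)` without `out=` are handled.

```python
import numpy as np
import pandas as pd


class LazySeries:
    """Hypothetical lazy series that defers ufuncs (not dask's class)."""

    def __init__(self, compute_fn, name=None):
        self._compute_fn = compute_fn
        self.name = name

    def compute(self):
        return self._compute_fn()

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Handle only plain elementwise calls such as np.log(x);
        # returning NotImplemented makes NumPy raise a TypeError otherwise.
        if method != "__call__" or "out" in kwargs:
            return NotImplemented

        def run():
            # Materialize lazy inputs only when the result is computed.
            args = [x.compute() if isinstance(x, LazySeries) else x for x in inputs]
            return ufunc(*args, **kwargs)

        return LazySeries(run, name=self.name)


calls = []


def load():
    calls.append(1)  # record each materialization
    return pd.Series([1.0, np.e], name="x")


s = LazySeries(load, name="x")
lazy = np.log(s)  # returns a LazySeries; load() has not run yet
```

Calling `lazy.compute()` is what finally runs `load()` and applies `np.log` to the resulting pandas Series.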

Note that this does not do "the right thing," which would be to make
np.ufunc(dask.dataframe) return a dask.dataframe instead of forcing
computation.
@cowlicks cowlicks changed the title from "Gh 580" to "dask.dataframe.Series.__array_*_ for better use with ufuncs." on Aug 13, 2015
@cowlicks
Member Author

Interesting: my tests pass on pandas 0.16.1 but not on 0.16.2.

Member


I don't think you need __array_prepare__ in this case.

Member Author


I didn't. Thanks.

@cowlicks
Member Author

We could also raise an exception when people call ufuncs on dask.arrays/dataframes. Would that be preferable to silently forcing computation?

@mrocklin
Member

Where is the failure in pandas 0.16.1?

I would be in favor of raising a warning.
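A warning-based variant could look like the sketch below. It assumes the same hypothetical `LazySeries` shape as a toy stand-in (not dask's class), and the warning text and `UserWarning` category are illustrative choices. Replacing the `warnings.warn(...)` call with a `raise TypeError(...)` would give the exception-raising alternative suggested earlier in the thread.

```python
import warnings

import numpy as np
import pandas as pd


class LazySeries:
    """Hypothetical lazy series that warns before NumPy forces computation."""

    def __init__(self, compute_fn, name=None):
        self._compute_fn = compute_fn
        self.name = name

    def compute(self):
        return self._compute_fn()

    def __array__(self, dtype=None, copy=None):
        # Warn instead of silently computing; stacklevel=2 reports the
        # warning at the Python line that called the NumPy function.
        warnings.warn(
            "Calling a NumPy function on a lazy series forces computation",
            UserWarning,
            stacklevel=2,
        )
        return np.asarray(self.compute(), dtype=dtype)


s = LazySeries(lambda: pd.Series([1.0, 4.0, 9.0]), name="x")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = np.sqrt(s)  # still computes, but now the user is told about it
```

Without an `__array_wrap__` hook, `out` is a plain ndarray.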

@mrocklin
Member

Note that the underlying problem of #580 is that we don't have a way to say

log(myseries)

for any function log. Arguably the dask.array "ufuncs" should operate smoothly on dask.dataframe Series objects. That's probably the full solution in the near term.

@cowlicks
Member Author

@mrocklin In 0.16.1, the way I was creating the output series here worked without the name kwarg because assert_series_equal did not check the name. In 0.16.2, assert_series_equal started to check names.
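The name check can be illustrated with current pandas via `pandas.testing.assert_series_equal` and its `check_names` parameter; the exact 0.16.x behavior is as described in the comment above, not verified here. `series_match` is a hypothetical helper that turns the assertion into a boolean.

```python
import pandas as pd
import pandas.testing as tm


def series_match(left, right, check_names=True):
    """Hypothetical helper: True if assert_series_equal accepts the pair."""
    try:
        tm.assert_series_equal(left, right, check_names=check_names)
    except AssertionError:
        return False
    return True


expected = pd.Series([0.0, 1.0], name="x")
built_without_name = pd.Series([0.0, 1.0])         # name is None
built_with_name = pd.Series([0.0, 1.0], name="x")  # name kwarg passed
```

With names checked, only `built_with_name` matches `expected`; passing `check_names=False` relaxes the comparison so the unnamed series matches too.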

@mrocklin
Member

OK. Merging this. If you feel like some adventure, it'd still be good to get the dask.array ufuncs to work on dask.dataframe objects. I think the best way to do this is likely some sort of protocol, so that dask.dataframe (or dask.bag) logic doesn't find its way into dask.array.
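One way such a protocol could work is sketched below: the array-side dispatcher only looks for a hook on the argument's type and never imports or checks for dataframe classes. Everything here is hypothetical and for illustration only: the `_elemwise_` hook name, `elemwise`, `log`, and `LazySeries` are not dask APIs.

```python
import numpy as np


def elemwise(func, x):
    """Hypothetical array-side dispatcher.

    It knows nothing about series types: any object whose type implements
    the (hypothetical) _elemwise_ hook handles the call itself.
    """
    hook = getattr(type(x), "_elemwise_", None)
    if hook is not None:
        return hook(x, func)
    return func(np.asarray(x))


def log(x):
    # A "ufunc" in the array module that works for any collection
    # implementing the hook, and eagerly for array-likes otherwise.
    return elemwise(np.log, x)


class LazySeries:
    """Hypothetical dataframe-side collection living in its own module."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn

    def compute(self):
        return self._compute_fn()

    def _elemwise_(self, func):
        # Defer the function: return a new lazy object, compute nothing yet.
        return LazySeries(lambda: func(self.compute()))


s = LazySeries(lambda: np.array([1.0, np.e]))
lazy = log(s)  # stays lazy; the array module never saw a dataframe type
```

The design choice is that the dependency points one way: the collection opts in by implementing the hook, so the array module stays free of dataframe or bag logic.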

mrocklin added a commit that referenced this pull request Aug 14, 2015
dask.dataframe.Series.__array_*_ for better use with ufuncs.
@mrocklin mrocklin merged commit 3cf5053 into dask:master Aug 14, 2015
@cowlicks cowlicks deleted the gh-580 branch August 14, 2015 15:07
@ryan-williams ryan-williams mentioned this pull request Feb 16, 2021

3 participants