Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series#8558
Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series#8558
da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series#8558Conversation
|
@jorisvandenbossche or @rjzamora would either of you be able to sanity check this? |
|
Have we tested this on older versions of NumPy (like 1.18)? |
That's the version of numpy installed on the python 3.7 tests https://github.com/dask/dask/runs/4781480587?check_suite_focus=true for instance |
Do you mean that |
|
In fact this is how it did work in the past ( #1669 ), which interestingly is the same PR that adds Given how old that PR is and how little that code has changed, I wonder if something else is causing issues here. |
That is what I mean, it seems like with this PR we come a lot closer to matching the numpy + pandas object behavior: import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd
s = pd.Series(np.random.randn(100))
ds = dd.from_pandas(s, 3)
da.sin(ds) # dd.Series
np.sin(s) # pd.Series
da.real(ds) # da.Array
np.real(s) # np.Array |
|
The
Isn't it the other way around, i.e. further from numpy/pandas' behaviour? With the numpy / pandas combo, calling a numpy ufunc on a pandas Series returns a pandas Series ( While you mention that with this PR it will start returning a |
|
Sorry @jorisvandenbossche I think I'm missing something, can you provide an example that gives surprising behavior? The example that I gave above seems to match the numpy on pandas behavior. |
|
Ah, sorry, I misinterpreted your code snippet in #8558 (comment). I wasn't looking carefully enough to notice the difference between So just to be very explicit, the following wasn't actually a correct statement:
So for ufuncs the output did not change (according to the code snippet in #8558 (comment), the ufunc So yes, that indeed seems to bring the behaviour closer to numpy/pandas. |
|
Ah sorry for the confusion @jorisvandenbossche I don't know why I thought those were all ufuncs 🙃 |
|
@jsignell Does this PR need a review? Do you have a suggestion for someone to do that? |
|
This PR needs a decision more than it needs review. I'll add a deprecation cycle and then it'll be ready for review. |
48150cd to
bb40230
Compare
|
grrr it's hard to deprecate this in a way that is at all usable. I'll think some more about it unless anyone else has ideas. |
|
@ncclementi in case you are looking into "test upstream" failures this resolves 12 or so, but it is kind of hard to implement in a backwards compatible way. |
|
@jsignell Thanks for the heads up, it looks like there are not many left. I think if we find a solution to #8722 (comment) that should solve all upstream problems. |
bb40230 to
bb3bd23
Compare
|
I talked with @jcrist and we agreed that since the previous behavior was really more of a bug than anything else, we think it's ok to merge this without a deprecation. |
__array_wrap__da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series
|
I'll try kicking off ci later when github is healthier |
|
I'm planning on merging this when tests pass. |
|
Thanks @jsignell |
This seems to work fine, but it does change the output type of ufuncs. I think this is better than it was originally and this code is very old, but I'd appreciate a second set of eyes. Maybe someone from @dask/array