Support new axis=None behavior in pandas 2.0 for certain reductions#9867
Support new axis=None behavior in pandas 2.0 for certain reductions#9867jrbourbeau merged 10 commits intodask:mainfrom
axis=None behavior in pandas 2.0 for certain reductions#9867Conversation
…ith_reduction-fixup
axis=None behavior in pandas 2.0 for certain reductionsaxis=None behavior in pandas 2.0 for certain reductions
|
Alright, I've decided to punt on |
and |
|
We don't actually have a |
j-bennet
left a comment
There was a problem hiding this comment.
Looks good! Left some minor comments.
dask/dataframe/core.py
Outdated
| enforce_metadata=False, | ||
| parent_meta=self._meta, | ||
| ) | ||
| if isinstance(result, DataFrame): |
There was a problem hiding this comment.
In _reduction_agg, you do this for both DataFrame and Series, but here just DataFrame. Is this enough?
There was a problem hiding this comment.
It should be okay as the axis=None case (where a Scalar is returned, not a Series) is handled in the if-block above. _reduction_agg is handling both cases, so needs to be a bit more flexible. That said, I included a commit that should make the logic (hopefully) easier to reason about
| Further, this method currently does not support filtering out NaN | ||
| values, which is again a difference to Pandas. | ||
| """ | ||
| if PANDAS_GT_200 and axis is None: |
There was a problem hiding this comment.
Could be a decorator, since you do this in two places.
There was a problem hiding this comment.
Agreed. Though in this particular case I think the cognitive load on readers is actually smaller if we just duplicate this simple logic twice. If the logic was much more involved, or if it was used in lots of places, totally agree using a decorator (or some other isolated utility) would be a good practice.
| assert_eq(dds.skew(), pds.skew() / bias_factor) | ||
|
|
||
| if PANDAS_GT_200: | ||
| # TODO: Remove this `if`-block once `axis=None` support is added |
There was a problem hiding this comment.
Might be helpful to open an issue and mention the issue here, so the person implementing it could grep the codebase for issue number.
…ith_reduction-fixup
|
Thanks for reviewing @j-bennet |
In
pandas=2.0certainDataFramereductions (e.g.DataFrame.min(),DataFrame.max()) will now be performed over both axes and a scalar returned whenaxis=None. This PR implements the same thing here.xref pandas-dev/pandas#50593
DataFrame.minDataFrame.maxDataFrame.meanDataFrame.medianDataFrame.skewDataFrame.kurtEDIT: Note that this change is currently showing up in the test suite as
dask/dataframe/tests/test_ufunc.py::test_ufunc_with_reductionfailing with tracebacks like the followingDetails: