Skip to content

Update numeric_only default in quantile for pandas 2.0#9854

Merged
jrbourbeau merged 7 commits intodask:mainfrom
j-bennet:j-bennet/9736-upstream-fix-quantile
Feb 10, 2023
Merged

Update numeric_only default in quantile for pandas 2.0#9854
jrbourbeau merged 7 commits intodask:mainfrom
j-bennet:j-bennet/9736-upstream-fix-quantile

Conversation

@j-bennet
Copy link
Copy Markdown
Contributor

@j-bennet j-bennet commented Jan 19, 2023

Fix for upstream failure: dask/dataframe/tests/test_dataframe.py::test_dataframe_quantile.

Previously, numeric_only was true by default in quantile, but now we need to set it explicitly.

Related: #9736.

  • Tests added / passed
  • Passes pre-commit run --all-files

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @j-bennet!

Is this part of the larger change in the default of numeric_only (xref #9471)? If so, while I agree this change gets this test passing, I think we'll also want to add numeric_only= support to quantile. Said another way, with pandas=2.0, dask's quantile and pandas' quantile don't have the same behavior.

@j-bennet j-bennet marked this pull request as draft January 20, 2023 17:46
@j-bennet
Copy link
Copy Markdown
Contributor Author

@jrbourbeau

while I agree this change gets this test passing, I think we'll also want to add numeric_only= support to quantile. Said another way, with pandas=2.0, dask's quantile and pandas' quantile don't have the same behavior.

quantile in Dask already supports numeric_only, but our default is True, and the new default in Pandas is False.

If we want to mimic Pandas behavior, and have a different default depending on Pandas version, then yes, this needs more changes. I'll get on it.

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quantile in Dask already supports numeric_only

Ah, great!

If we want to mimic Pandas behavior, and have a different default depending on Pandas version, then yes, this needs more changes

👍

@j-bennet j-bennet marked this pull request as ready for review January 20, 2023 20:52
@j-bennet j-bennet marked this pull request as draft January 20, 2023 20:53
@j-bennet j-bennet marked this pull request as ready for review January 20, 2023 20:53
assert result.name == 0.5
tm.assert_index_equal(result.index, pd.Index(["A", "X", "B"]))
assert (result == expected[0]).all()
if numeric_only is False or (PANDAS_GT_200 and numeric_only is None):
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opted for an explicit fail branch to state exactly what errors I'm expecting (TypeError in Pandas, but NotImplementedError in Dask), but this can be an xfail instead, what do you think @jrbourbeau ?

@j-bennet
Copy link
Copy Markdown
Contributor Author

@jrbourbeau Please take another look at this one when you have the chance.

@jrbourbeau jrbourbeau changed the title Fix quantile failure in upstream, pandas 2.0 compatibility Update numeric_only default in quantile for pandas 2.0 Feb 9, 2023
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @j-bennet! This should be good to go -- will merge after CI finishes

@jrbourbeau jrbourbeau merged commit 1708577 into dask:main Feb 10, 2023
@j-bennet j-bennet deleted the j-bennet/9736-upstream-fix-quantile branch February 10, 2023 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants