Pandas 1.5.0 compatibility by ian-r-rose · Pull Request #8961 · dask/dask

ian-r-rose · 2022-04-20T21:46:06Z

Closes ⚠️ Upstream CI failed ⚠️ #8776
Tests added / passed
Passes pre-commit run --all-files

I've filtered a number of warnings, with links to the relevant new issues, since this has been dragging on for a while and we should really get our upstream CI green.
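For reference, a targeted warning filter of the kind described here could look like the sketch below. It uses only the standard-library `warnings` module. `noisy_upstream_call` and `call_quietly` are hypothetical names for illustration, and the message text is an assumption, not the actual pandas warning.

```python
import warnings


def noisy_upstream_call():
    # Hypothetical stand-in for an upstream function that emits a FutureWarning.
    warnings.warn("group_keys default will change", FutureWarning)
    return 42


def call_quietly():
    # Suppress only the matching FutureWarning, and only inside this block.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=".*group_keys.*", category=FutureWarning
        )
        return noisy_upstream_call()
```

Keeping the filter narrow (one category, one message pattern) means unrelated warnings still surface.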


ian-r-rose · 2022-04-22T17:09:42Z

@rjzamora @madsbk I'm not sure who maintains the gpuCI check, but just flagging that the upstream changes to the group_keys defaults seem to affect cudf as well.

jrbourbeau

Thanks for handling all of this @ian-r-rose

jrbourbeau · 2022-04-26T15:21:34Z

dask/dataframe/groupby.py


 def _groupby_slice_apply(
-    df, grouper, key, func, *args, group_keys=True, dropna=None, observed=None, **kwargs
+    df, grouper, key, func, *args, group_keys=None, dropna=None, observed=None, **kwargs


Should these also be using GROUP_KEYS_DEFAULT?

ian-r-rose · Apr 26, 2022

Sure. I'm not sure it matters, since this is only invoked from the top-level functions that already have the default set, but I don't see the harm.

jrbourbeau · 2022-04-26T15:22:13Z

dask/dataframe/tests/test_groupby.py



-def test_groupby_group_keys():
+@pytest.mark.parametrize("group_keys", [True, False, None])


Nice -- thank you for expanding this test

rjzamora · 2022-04-26T15:52:51Z

Thanks for the heads up @ian-r-rose - Sorry I missed this.

It looks like cudf does not support anything other than group_keys=True. Can you explain the change to group_keys=None for newer Pandas versions? Does it make sense for cudf to simply treat None as True?

jrbourbeau · 2022-04-26T15:56:15Z

There's a nice explanation of the change here https://pandas.pydata.org/docs/dev/whatsnew/v1.5.0.html#using-group-keys-with-transformers-in-groupby-apply
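A small pandas example may help show what `group_keys` controls when the applied function is a transformer, meaning its output has the same index as its input. This is a sketch that assumes pandas 1.5 or later, where an explicit `group_keys` value is respected. The data and column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [10, 20, 30]})


def demean(s):
    # A transformer: the result keeps the same index as the input group.
    return s - s.mean()


# group_keys=True: the group key "a" is added as an outer index level.
with_keys = df.groupby("a", group_keys=True)["b"].apply(demean)

# group_keys=False: the result keeps only the original index.
without_keys = df.groupby("a", group_keys=False)["b"].apply(demean)

print(with_keys.index.nlevels, without_keys.index.nlevels)
```

The values are the same in both results; only the index differs.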

ian-r-rose · 2022-04-26T15:58:25Z

Unfortunately, the changelog entry is not fully consistent with the actual code, which changed the defaults to NoDefault.no_default. The precedent in dask seems to be to default to None for that case.
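One way to let `None` stand in for a backend's "no default" sentinel is to forward the keyword only when the caller set it. The sketch below uses hypothetical names throughout (`_NO_DEFAULT`, `backend_groupby`, `wrapper_groupby`) and does not reproduce the actual pandas or dask code.

```python
_NO_DEFAULT = object()  # stand-in for a sentinel such as NoDefault.no_default


def backend_groupby(by, group_keys=_NO_DEFAULT):
    """Hypothetical backend that distinguishes "unset" from True/False."""
    if group_keys is _NO_DEFAULT:
        return "backend default"
    return f"group_keys={group_keys}"


def wrapper_groupby(by, group_keys=None):
    """Wrapper whose public default is None, meaning "let the backend decide"."""
    # Only pass group_keys through when the caller chose a value explicitly.
    kwargs = {} if group_keys is None else {"group_keys": group_keys}
    return backend_groupby(by, **kwargs)
```

With this pattern the wrapper never needs to import the backend's sentinel, and the backend's own default (and any warning attached to it) still applies when the user passes nothing.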

rjzamora · 2022-04-26T16:01:42Z

@shwina @galipremsagar - Is this group_keys change already on our radar?

jsignell · 2022-04-27T16:11:35Z

Thanks for taking this on @ian-r-rose! If we want to pass cudf tests we could xfail or skip the ones where group_keys != True
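The xfail suggestion could be sketched with `pytest.param` marks, as below. The helper `group_keys_params` and its `backend_supports_only_true` flag are hypothetical and chosen for illustration; this is not the test code from the PR.

```python
import pytest


def group_keys_params(backend_supports_only_true: bool):
    """Build parametrize values, marking unsupported ones as xfail."""
    params = []
    for value in (True, False, None):
        marks = []
        if backend_supports_only_true and value is not True:
            marks.append(
                pytest.mark.xfail(reason="backend only supports group_keys=True")
            )
        params.append(pytest.param(value, marks=marks))
    return params


@pytest.mark.parametrize("group_keys", group_keys_params(backend_supports_only_true=True))
def test_groupby_group_keys_sketch(group_keys):
    # Placeholder body: a real test would run a groupby with this group_keys value.
    assert group_keys in (True, False, None)
```

Swapping `pytest.mark.xfail` for `pytest.mark.skip` would skip those cases instead of running them and expecting a failure.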

jrbourbeau

Thanks @ian-r-rose! This should be good to go after CI finishes

EDIT: Just merged main to include #8986 which should fix some unrelated CI failures


ian-r-rose · 2022-04-27T21:04:34Z

Woo!

github-actions bot added the dataframe label Apr 20, 2022

ian-r-rose force-pushed the group-keys branch from c6d8e66 to 4e53e04 Compare April 20, 2022 23:21

Ian Rose added 6 commits April 22, 2022 08:42

group_keys got a new default value in pandas>1.5.0

7c538e0

Workaround for groupby/shift shuffle bug.

a5522ea

Add warning filter with link to upstream pandas issue.

5c8207e

Filter FutureWarning and link to relevant feature request.

9fe123b

Filter group_keys warning at the top level. We should allow this FutureWarning to propagate to the user, but it doesn't require any additional action on the Dask side.

11b1796

Relax check_freq a bit more

6bc3fe7

ian-r-rose force-pushed the group-keys branch from 4e53e04 to 6bc3fe7 Compare April 22, 2022 15:43

ian-r-rose added tests Unit tests and/or continuous integration upstream labels Apr 22, 2022

jrbourbeau reviewed Apr 26, 2022


Also use GROUP_KEYS_DEFAULT for internal functions.

9ad1e58

Parametrize groupby_cudf across group_keys

6b47bb3

ian-r-rose force-pushed the group-keys branch from 376d813 to 6b47bb3 Compare April 27, 2022 16:37

jrbourbeau approved these changes Apr 27, 2022


Merge branch 'main' of https://github.com/dask/dask into group-keys […

e990d6d

…test-upstream]

jrbourbeau merged commit 8e44bfc into dask:main Apr 27, 2022

jrbourbeau mentioned this pull request Apr 28, 2022

Release 2022.4.2 dask/community#240

Closed

pavithraes mentioned this pull request Jul 1, 2022

⚠️ Upstream CI failed ⚠️ #9204

Closed


jrbourbeau merged 9 commits into dask:main from ian-r-rose:group-keys
