Skip to content

Pandas 1.5.0 compatibility#8961

Merged
jrbourbeau merged 9 commits intodask:mainfrom
ian-r-rose:group-keys
Apr 27, 2022
Merged

Pandas 1.5.0 compatibility#8961
jrbourbeau merged 9 commits intodask:mainfrom
ian-r-rose:group-keys

Conversation

@ian-r-rose
Copy link
Copy Markdown
Collaborator

I've filtered a number of warnings with links to some new relevant issues as this has been dragging on for a while, and we should really get our upstream CI green.

@ian-r-rose
Copy link
Copy Markdown
Collaborator Author

@rjzamora @madsbk I'm not sure who maintains the gpuCI check, but just flagging that the upstream changes to the group_keys defaults seem to affect cudf as well.

@ian-r-rose ian-r-rose added tests Unit tests and/or continuous integration upstream labels Apr 22, 2022
Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for handling all of this @ian-r-rose


def _groupby_slice_apply(
df, grouper, key, func, *args, group_keys=True, dropna=None, observed=None, **kwargs
df, grouper, key, func, *args, group_keys=None, dropna=None, observed=None, **kwargs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these also be using GROUP_KEYS_DEFAULT?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I'm not sure it matters since this is only invoked from the top-level functions that already have the default set, but I don't see the harm.



def test_groupby_group_keys():
@pytest.mark.parametrize("group_keys", [True, False, None])
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice -- thank you for expanding this test

@rjzamora
Copy link
Copy Markdown
Member

Thanks for the heads up @ian-r-rose - Sorry I missed this.

It looks like cudf does not support anything other than group_keys=True. Can you explain the change to group_keys=None for newer Pandas versions? Does it make sense for cudf to simply treat None as True?

@jrbourbeau
Copy link
Copy Markdown
Member

There's a nice explanation of the change here https://pandas.pydata.org/docs/dev/whatsnew/v1.5.0.html#using-group-keys-with-transformers-in-groupby-apply

@ian-r-rose
Copy link
Copy Markdown
Collaborator Author

Unfortunately, the changelog entry is not fully consistent with the actual code, which changed the defaults to NoDefault.no_default. The precedent in dask seems to be to default to None for that case.

@rjzamora
Copy link
Copy Markdown
Member

@shwina @galipremsagar - Is this group_keys change already on our radar?

@jsignell
Copy link
Copy Markdown
Member

Thanks for taking this on @ian-r-rose! If we want to pass cudf tests we could xfail or skip the ones where groupby_keys != True

Copy link
Copy Markdown
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ian-r-rose! This should be good to go after CI finishes

EDIT: Just merged main to include #8986 which should fix some unrelated CI failures

@jrbourbeau jrbourbeau merged commit 8e44bfc into dask:main Apr 27, 2022
@ian-r-rose
Copy link
Copy Markdown
Collaborator Author

Woo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dataframe tests Unit tests and/or continuous integration upstream

Projects

None yet

Development

Successfully merging this pull request may close these issues.

⚠️ Upstream CI failed ⚠️

4 participants