Skip to content

[REVIEW] Use ignore_index for pandas' group_split_dispatch#6251

Merged
TomAugspurger merged 3 commits intodask:masterfrom
rjzamora:ignore-index-bugfix
Jun 23, 2020
Merged

[REVIEW] Use ignore_index for pandas' group_split_dispatch#6251
TomAugspurger merged 3 commits intodask:masterfrom
rjzamora:ignore-index-bugfix

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented May 28, 2020

After thinking about this comment from @jcrist in #6247 , I decided to put some effort into making the pandas handling of ignore_index in group_split_dispatch more consistent with that of cudf (allowing us to test that the ignore_index argument is actually handled). With the changes in this PR, passing ignore_index=True will ensure that the original index will be replaced with a default RangeIndex.

If others feel this is not actually the behavior we want, I'll be happy to close :)

  • Tests added / passed
  • Passes black dask / flake8 dask

@rjzamora rjzamora changed the title Use ignore_index for pandas' group_split_dispatch [REVIEW] Use ignore_index for pandas' group_split_dispatch May 28, 2020
@jcrist
Copy link
Member

jcrist commented May 29, 2020

I think this makes sense, but would want to check with someone else (cc @TomAugspurger) first before merging.

Copy link
Member

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one question.

Comment on lines +354 to +356
if ignore_index:
df2._meta = df2._meta.reset_index(drop=True)
return df2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to do this here rather than in rearrange_column_by_tasks?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question -It seemed to me that I would need to add this in two places in rearrange_by_column_task (since the _simple_rearrange_by_column_tasks code path could be used), but only once in rearrange_by_column.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this reasoning sufficient, or should I move the changes down into rearrange_by_column_task?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger -- just a gentle ping on this

@TomAugspurger
Copy link
Member

Sorry for the delay here, thanks!

@TomAugspurger TomAugspurger merged commit 5195e5b into dask:master Jun 23, 2020
kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020
@rjzamora rjzamora deleted the ignore-index-bugfix branch May 21, 2024 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants