Skip to content

Make repartition a no-op when divisions match#9924

Merged
jrbourbeau merged 1 commit intodask:mainfrom
jrbourbeau:repartition-fastpath
Feb 9, 2023
Merged

Make repartition a no-op when divisions match#9924
jrbourbeau merged 1 commit intodask:mainfrom
jrbourbeau:repartition-fastpath

Conversation

@jrbourbeau
Copy link
Copy Markdown
Member

No need to actually repartition anything if the input divisions are already equal to the existing divisions

@j-bennet
Copy link
Copy Markdown
Contributor

j-bennet commented Feb 6, 2023

Does this close #9922?

@jrbourbeau
Copy link
Copy Markdown
Member Author

No, this is separate from that issue. Though that issue was what made me think about this use case.

@jrbourbeau
Copy link
Copy Markdown
Member Author

Will plan to merge this in a few hours if not further comment. I don't think the changes here are particularly controversial.

Copy link
Copy Markdown
Contributor

@j-bennet j-bennet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 🚀

@jrbourbeau jrbourbeau merged commit 0058050 into dask:main Feb 9, 2023
@jrbourbeau jrbourbeau deleted the repartition-fastpath branch February 9, 2023 23:10
@epizut
Copy link
Copy Markdown
Contributor

epizut commented Feb 10, 2023

No, this is separate from that issue. Though that issue was what made me think about this use case.

I am the author of #9922, my complaint was to avoid adding unnecessary repartition nodes when old and new divisions are equal. So to me, this PR solves my issue.

I mentioned force in the title, but it was not related to the force parameter of repartition()

"""

# no-op fastpath for when we already have matching divisions
if is_dask_collection(df) and df.divisions == divisions:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is failing if divisions is a list, it only works with tuples

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an issue because if we concat a few Dask DataFrames with identical divisions then it will still repartition every dds:
dask.dataframe.multi.concat() calls align_partitions() that calls df.repartition() with a list divisions

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I open a separate issue?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate issue would be good, thanks @epizut

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants