Make repartition a no-op when divisions match #9924
Does this close #9922?
No, this is separate from that issue. Though that issue was what made me think about this use case.
I plan to merge this in a few hours if there are no further comments. I don't think the changes here are particularly controversial.
I am the author of #9922. My request there was to avoid adding unnecessary repartition nodes when the old and new divisions are equal, so to me this PR solves my issue. I mentioned
    """

    # no-op fastpath for when we already have matching divisions
    if is_dask_collection(df) and df.divisions == divisions:
This check fails if divisions is a list; it only works with tuples, because df.divisions is a tuple and a tuple never compares equal to a list.
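To illustrate the comparison problem without Dask itself, here is a minimal plain-Python sketch. The `divisions_match` helper is hypothetical (it is not part of Dask or of this PR); it only shows one way the check could be made insensitive to the container type.

```python
def divisions_match(existing, requested):
    """Hypothetical helper: compare divisions regardless of list/tuple type."""
    return tuple(existing) == tuple(requested)


existing = (0, 5, 10)   # stand-in for df.divisions, which is a tuple
requested = [0, 5, 10]  # same boundaries, but passed as a list

# A tuple and a list are never equal in Python, even with equal elements,
# so a fast path guarded by `df.divisions == divisions` is skipped here.
plain_equal = existing == requested

# Normalizing both sides to tuples makes the comparison succeed.
normalized_equal = divisions_match(existing, requested)

print(plain_equal, normalized_equal)
```

Whether to normalize inside the fast path or at the call sites is a design choice for the maintainers; the sketch takes no position on that.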
It's an issue because if we concat a few Dask DataFrames with identical divisions, it will still repartition every one of them:
dask.dataframe.multi.concat() calls align_partitions(), which calls df.repartition() with divisions passed as a list.
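The effect of that call chain can be sketched with a toy model. `ToyFrame` and `toy_align` below are illustrative stand-ins, not Dask's classes or its actual `align_partitions()` implementation; the only behavior assumed is what this thread describes (tuple-valued `divisions`, an equality-guarded fast path, and an align step that passes a list).

```python
class ToyFrame:
    """Stand-in for a Dask DataFrame: tuple divisions plus a node counter."""

    def __init__(self, divisions, extra_nodes=0):
        self.divisions = tuple(divisions)
        self.extra_nodes = extra_nodes  # repartition nodes added so far

    def repartition(self, divisions):
        # Fast path modeled on the diff: return self when divisions match.
        if self.divisions == divisions:
            return self
        # Otherwise a new frame with one more repartition node is produced.
        return ToyFrame(divisions, self.extra_nodes + 1)


def toy_align(frames, as_tuple):
    """Repartition every frame to the merged divisions (list or tuple)."""
    merged = sorted(set().union(*(f.divisions for f in frames)))
    target = tuple(merged) if as_tuple else merged
    return [f.repartition(target) for f in frames]


frames = [ToyFrame((0, 5, 10)) for _ in range(3)]

with_list = toy_align(frames, as_tuple=False)
with_tuple = toy_align(frames, as_tuple=True)

# Count the repartition nodes added under each calling convention.
nodes_with_list = sum(f.extra_nodes for f in with_list)
nodes_with_tuple = sum(f.extra_nodes for f in with_tuple)
print(nodes_with_list, nodes_with_tuple)
```

In this toy model, passing a list adds one node per input frame even though nothing needs to move, while passing a tuple returns the original objects unchanged.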
Shall I open a separate issue?
There is no need to actually repartition anything if the input divisions are already equal to the existing divisions.