
Satisfy broadcast argument in DataFrame.merge #9852

Merged
rjzamora merged 1 commit into dask:main from rjzamora:fix-broadcast on Jan 25, 2023


@rjzamora (Member) commented Jan 19, 2023

This is the minimal change needed to ensure that broadcast=True is honored whenever a broadcast-based algorithm is not prohibited by the how or shuffle arguments.
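The rule stated above can be made concrete with a minimal pure-Python sketch of merge-algorithm selection. This is not dask's implementation. The names `choose_merge_algorithm`, `broadcastable_sides`, `BROADCAST_COMPATIBLE_SHUFFLES` and `MAX_AUTO_BROADCAST_PARTITIONS` are hypothetical, and both the set of broadcast-compatible `shuffle` values and the `broadcast=None` threshold are assumptions. The sketch is meant to show one property: `broadcast=True` always wins unless `how` or `shuffle` rules a broadcast join out, and in that case it falls back to a hash join without telling the caller.

```python
"""Hypothetical sketch of merge-algorithm selection; not dask's actual code."""
from typing import Optional, Tuple

# Assumption: shuffle settings that allow a broadcast join (illustrative only).
BROADCAST_COMPATIBLE_SHUFFLES = (None, "tasks")

# Assumption: stand-in threshold used only when broadcast=None.
MAX_AUTO_BROADCAST_PARTITIONS = 4


def broadcastable_sides(how: str) -> Tuple[str, ...]:
    """Sides that can be broadcast without duplicating unmatched rows."""
    return {
        "inner": ("left", "right"),
        "left": ("right",),   # unmatched left rows must survive, so left stays partitioned
        "right": ("left",),   # unmatched right rows must survive, so right stays partitioned
    }.get(how, ())            # e.g. "outer": neither side can be broadcast


def choose_merge_algorithm(
    how: str,
    shuffle: Optional[str],
    broadcast: Optional[bool],
    left_npartitions: int,
    right_npartitions: int,
) -> Tuple[str, Optional[str]]:
    """Return ("broadcast", side_to_broadcast) or ("hash", None)."""
    sides = broadcastable_sides(how)
    if broadcast is False or not sides or shuffle not in BROADCAST_COMPATIBLE_SHUFFLES:
        # Broadcast declined or prohibited: silently fall back to a hash join.
        return ("hash", None)
    nparts = {"left": left_npartitions, "right": right_npartitions}
    # Among the sides that may legally be broadcast, pick the one with fewer partitions.
    side = min(sides, key=lambda s: nparts[s])
    # broadcast=True is always honored here; broadcast=None uses the size threshold.
    if broadcast is True or nparts[side] <= MAX_AUTO_BROADCAST_PARTITIONS:
        return ("broadcast", side)
    return ("hash", None)
```

Consider `how="left"`. Only the right side may be broadcast, so `choose_merge_algorithm("left", None, True, 2, 10)` returns `("broadcast", "right")` even though the left side is smaller. With `broadcast=None` the same call falls back to `("hash", None)`. For `how="outer"`, every call returns `("hash", None)`.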

@rjzamora added the dataframe and bug (Something is broken) labels Jan 19, 2023
@rjzamora (Member, Author)

This fix is relatively minor. Planning to merge by the end of the day.

@rjzamora merged commit 2a2b9d3 into dask:main Jan 25, 2023
@rjzamora changed the title from "Satisfy broadcast argument in DataFrame merge" to "Satisfy broadcast argument in DataFrame.merge" Jan 25, 2023
@rjzamora deleted the fix-broadcast branch January 25, 2023 14:27
@jrbourbeau (Member) left a comment

Thanks for moving forward with this @rjzamora. Question: based on this comment in the original issue:

> The dask.dataframe.merge API should use a broadcast-based merge when broadcast=True. The only exception should be when a broadcast-based algorithm is prohibited for the specified how and/or shuffle arguments.

Could we just raise an error if the user specifies values for how/shuffle that are inconsistent with broadcast=True? That seems like a better UX than silently not using broadcast.
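The restriction on `how` in the quoted issue comment follows from how a broadcast join works. The toy model below illustrates this. It is pure Python and not dask's implementation; `local_merge` and `broadcast_merge` are hypothetical helpers. The model splits the left table into partitions and sends the whole right table to every partition. Broadcasting the right side reproduces a single inner or left join. For right and outer joins it does not, because each partition emits the right rows that it did not match itself. As a result, rows are duplicated or wrongly reported as unmatched.

```python
"""Toy model of a broadcast join on (key, value) rows; not dask's implementation."""
from collections import Counter


def local_merge(left_rows, right_rows, how):
    """Naive nested-loop join of (key, value) rows; missing sides become None."""
    out, matched_right = [], set()
    for lk, lv in left_rows:
        hits = [(i, rv) for i, (rk, rv) in enumerate(right_rows) if rk == lk]
        for i, rv in hits:
            out.append((lk, lv, rv))
            matched_right.add(i)
        if not hits and how in ("left", "outer"):
            out.append((lk, lv, None))  # keep unmatched left row
    if how in ("right", "outer"):
        for i, (rk, rv) in enumerate(right_rows):
            if i not in matched_right:
                out.append((rk, None, rv))  # keep unmatched right row
    return out


def broadcast_merge(left_parts, right_rows, how):
    """Merge every left partition against the whole (broadcast) right table."""
    return [row for part in left_parts for row in local_merge(part, right_rows, how)]


left_parts = [[(1, "a"), (2, "b")], [(3, "c")]]  # left table split into two partitions
right = [(1, "x"), (4, "y")]                      # small table sent to every partition
left_all = [row for part in left_parts for row in part]

for how in ("inner", "left", "right", "outer"):
    same = Counter(broadcast_merge(left_parts, right, how)) == Counter(
        local_merge(left_all, right, how)
    )
    print(how, same)
# prints (one per line): inner True, left True, right False, outer False
```

This is why a broadcast join can only ship the side whose unmatched rows never need to be kept. That means the right side for `how="left"`, the left side for `how="right"`, either side for `how="inner"`, and neither side for `how="outer"`.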

@rjzamora (Member, Author)

> Could we just raise an error if the user specifies values for how/shuffle that are inconsistent with broadcast=True? That seems like a better UX than silently not using broadcast.

Yes, it definitely makes sense to raise an error any time broadcast=True and such behavior is not supported. Good call.
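The behavior agreed on above can be sketched as an up-front check that raises instead of silently ignoring `broadcast=True`. This is a hypothetical sketch, not dask's code. `validate_broadcast`, `BROADCAST_COMPATIBLE_HOWS` and `BROADCAST_COMPATIBLE_SHUFFLES` are illustrative names, and the exact set of compatible `shuffle` values is an assumption.

```python
"""Hypothetical sketch of the proposed validation; not dask's actual code."""
from typing import Optional

# Assumptions (illustrative only): join types and shuffle settings that
# admit a broadcast join.
BROADCAST_COMPATIBLE_HOWS = ("inner", "left", "right")
BROADCAST_COMPATIBLE_SHUFFLES = (None, "tasks")


def validate_broadcast(how: str, shuffle: Optional[str], broadcast: Optional[bool]) -> None:
    """Raise ValueError when broadcast=True cannot be honored."""
    if broadcast is not True:
        return  # None/False: the merge is free to choose any algorithm
    if how not in BROADCAST_COMPATIBLE_HOWS:
        raise ValueError(
            f"broadcast=True is not supported for how={how!r}; "
            f"use one of {BROADCAST_COMPATIBLE_HOWS}"
        )
    if shuffle not in BROADCAST_COMPATIBLE_SHUFFLES:
        raise ValueError(f"broadcast=True is not supported with shuffle={shuffle!r}")


validate_broadcast("left", None, True)  # compatible: returns None
try:
    validate_broadcast("outer", None, True)
except ValueError as err:
    print(err)  # explains why broadcast=True was rejected
```

Running this check before any algorithm selection means an explicit `broadcast=True` either gets a broadcast join or fails loudly. It never quietly gets a hash join.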


Labels: bug (Something is broken), dataframe


Development: successfully merging this pull request may close the issue "Dataframe merge does not always satisfy broadcast argument".

2 participants