Use Blockwise/map_partitions for more tasks #8831

Merged: jcrist merged 4 commits into dask:main from bryanwweber:use-blockwise-for-joins on Mar 23, 2022

bryanwweber commented Mar 22, 2022

Complete the other two items from #8306

  • hash_join's merge_chunk
  • stack_partitions should use HighLevelGraph.from_collections instead of merging all of the input graphs
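
To make the second item concrete, here is a toy sketch in plain Python of the difference between flattening every input graph and adding a single new layer on top of the inputs. It does not use Dask: `ToyGraph`, `from_inputs`, and `stack` are hypothetical names invented for this illustration, and the real `HighLevelGraph.from_collections` and `stack_partitions` are more involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGraph:
    """Hypothetical layered graph: named layers plus layer-to-layer edges."""

    top: str  # name of the output layer
    layers: dict = field(default_factory=dict)  # layer name -> {key: task}
    dependencies: dict = field(default_factory=dict)  # layer name -> set of names

    @classmethod
    def from_inputs(cls, name, layer, inputs=()):
        """Add one new layer on top of the inputs without copying their tasks."""
        layers = {name: layer}
        dependencies = {name: set()}
        for graph in inputs:
            layers.update(graph.layers)  # input layers are reused as-is
            for dep_name, deps in graph.dependencies.items():
                dependencies.setdefault(dep_name, set()).update(deps)
            dependencies[name].add(graph.top)
        return cls(name, layers, dependencies)


def stack(name, *inputs):
    """Number the partitions of all inputs consecutively under a new name."""
    layer = {}
    for graph in inputs:
        for key in graph.layers[graph.top]:
            layer[(name, len(layer))] = key  # alias to the input partition
    return ToyGraph.from_inputs(name, layer, inputs)


a = ToyGraph.from_inputs("a", {("a", 0): "part-a0", ("a", 1): "part-a1"})
b = ToyGraph.from_inputs("b", {("b", 0): "part-b0"})
stacked = stack("stack", a, b)

print(sorted(stacked.layers))         # ['a', 'b', 'stack']
print(stacked.dependencies["stack"])  # set containing 'a' and 'b'
```

In this model the stacked result only records aliases to the input partitions and the names of the layers it depends on, so the input layers stay intact rather than being merged into one flat dict.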

This replaces the low-level graph generation with map_partitions to
generate the high-level graph. Discussed in GH8306.

Bryan Weber added 3 commits March 22, 2022 17:06

Creates a high-level graph wrapper for the partition stacking operation.
Removing the forced materialization of the incoming DataFrame means that
any blockwise operations on the DataFrame will remain HLG layers at this
point.

Several tests appeared to be duplicated from the first for-loop in
test_concat. These are removed. The remaining tests are moved to a new
function since they test Series concat. These tests are also
de-duplicated.

With the changes here, blockwise operations should remain blockwise for
concat operations.
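
As a rough illustration of why avoiding forced materialization matters, the following toy model (plain Python, not Dask) keeps a blockwise step as a symbolic description and only expands it into explicit tasks on request. `ToyBlockwise` and `fuse` are hypothetical names for this sketch; they are assumptions made for illustration and do not reproduce Dask's `Blockwise` layer or its optimizer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyBlockwise:
    """Hypothetical symbolic layer: output partition i depends only on input partition i."""

    name: str
    func: Callable
    source: str  # name of the input layer
    npartitions: int

    def materialize(self):
        """Expand into an explicit {key: (func, input_key)} task dict."""
        return {
            (self.name, i): (self.func, (self.source, i))
            for i in range(self.npartitions)
        }


def fuse(first, second):
    """Collapse two chained symbolic layers into one layer with a composed function."""
    if second.source != first.name:
        raise ValueError("second must consume the output of first")
    f, g = first.func, second.func
    return ToyBlockwise(
        second.name, lambda part: g(f(part)), first.source, first.npartitions
    )


inc = ToyBlockwise("inc", lambda part: [x + 1 for x in part], "src", 3)
dbl = ToyBlockwise("dbl", lambda part: [x * 2 for x in part], "inc", 3)

fused = fuse(inc, dbl)
tasks = fused.materialize()
func, input_key = tasks[("dbl", 1)]
print(len(tasks), input_key, func([1, 2]))  # 3 ('src', 1) [4, 6]
```

While both steps are still symbolic they can be combined into three tasks instead of six. Once a layer has been expanded into a plain task dict, that per-partition structure is no longer explicit, which is the situation the commits above aim to avoid.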
bryanwweber changed the title from "[WIP] Use Blockwise/map_partitions for more tasks" to "Use Blockwise/map_partitions for more tasks" on Mar 23, 2022

jcrist left a comment

This looks good to me, thanks @bryanwweber!

jcrist merged commit 68cce79 into dask:main Mar 23, 2022
bryanwweber deleted the use-blockwise-for-joins branch March 23, 2022 16:31

Successfully merging this pull request may close these issues.

Use Blockwise/map_partitions in various DataFrame join methods
