Use Blockwise/map_partitions for more tasks #8831
Merged
This replaces the low-level graph generation with map_partitions to generate the high-level graph. Discussed in GH8306.
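As a rough illustration of the distinction drawn above, here is a toy sketch in plain Python. It is not Dask's code: `low_level_graph`, `ToyBlockwiseLayer`, and `add_one` are hypothetical names made up for this example. The first function builds one task per partition eagerly, the way a hand-written low-level graph would; the second object stores only the function and the input name and produces the same tasks on demand, which is the property that lets later optimizations treat the operation as a single unit.

```python
from dataclasses import dataclass
from typing import Callable


def low_level_graph(func: Callable, in_name: str, out_name: str, npartitions: int) -> dict:
    """Eagerly build one task per partition, as a hand-written graph would."""
    return {(out_name, i): (func, (in_name, i)) for i in range(npartitions)}


@dataclass(frozen=True)
class ToyBlockwiseLayer:
    """Hypothetical stand-in for a blockwise layer: a compact description only."""

    func: Callable
    in_name: str
    out_name: str
    npartitions: int

    def materialize(self) -> dict:
        # Tasks are generated only when something asks for them.
        return low_level_graph(self.func, self.in_name, self.out_name, self.npartitions)


def add_one(x):
    return x + 1


eager = low_level_graph(add_one, "df", "df-plus-one", 3)
lazy = ToyBlockwiseLayer(add_one, "df", "df-plus-one", 3)
```

Both forms describe the same work; the lazy one simply defers writing out the per-partition tasks until `materialize()` is called.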
added 3 commits
March 22, 2022 17:06
Creates a high-level graph wrapper for the partition stacking operation. Removing the forced materialization of the incoming DataFrame means that any blockwise operations on the DataFrame will remain HLG layers at this point.
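The idea in this commit message can be sketched with a toy model in plain Python. This is an assumption-laden illustration, not Dask's implementation: `ToyHighLevelGraph`, `top_layers`, and the layer contents are hypothetical, and this toy `from_collections` accepts graphs directly, whereas Dask's `HighLevelGraph.from_collections` accepts collections. The point it shows is that the new layer is added next to the input layers, which are reused untouched, rather than merging every input graph into one flat dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHighLevelGraph:
    """Hypothetical stand-in: named layers plus which layer depends on which."""

    layers: dict = field(default_factory=dict)
    dependencies: dict = field(default_factory=dict)

    @classmethod
    def from_collections(cls, name, layer, dependencies=()):
        # Reuse each input's layers as-is instead of flattening them into one dict.
        layers = {name: layer}
        deps = {name: set()}
        for graph in dependencies:
            layers.update(graph.layers)
            deps.update(graph.dependencies)
            deps[name].update(top_layers(graph))
        return cls(layers, deps)


def top_layers(graph):
    """Names of layers that no other layer in the graph depends on."""
    used = set().union(*graph.dependencies.values())
    return set(graph.layers) - used


# Two single-layer inputs and a stacking layer that only refers to their keys.
a = ToyHighLevelGraph.from_collections("a", {("a", 0): 1, ("a", 1): 2})
b = ToyHighLevelGraph.from_collections("b", {("b", 0): 3})
stack_layer = {("stack", 0): ("a", 0), ("stack", 1): ("a", 1), ("stack", 2): ("b", 0)}
stacked = ToyHighLevelGraph.from_collections("stack", stack_layer, dependencies=[a, b])
```

Because the input layers are carried over by reference, anything special about them (for example, being a compact blockwise description) is still available after stacking.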
Several tests appeared to be duplicated from the first for-loop in test_concat. These are removed. The remaining tests are moved to a new function since they test Series concat. These tests are also de-duplicated.
With the changes here, blockwise operations should remain blockwise for concat operations.
jcrist
approved these changes
Mar 23, 2022
Member
jcrist
left a comment
This looks good to me, thanks @bryanwweber!
Complete the other two items from #8306
hash_join's merge_chunk
stack_partitions should use HighLevelGraph.from_collections instead of merging all of the input graphs
map_partitions in various DataFrame join methods #8306
pre-commit run --all-files