Skip to content

Use blockwise in single_partition_join#8341

Merged
jsignell merged 8 commits intodask:mainfrom
gjoseph92:join-blockwise
Nov 15, 2021
Merged

Use blockwise in single_partition_join#8341
jsignell merged 8 commits intodask:mainfrom
gjoseph92:join-blockwise

Conversation

@gjoseph92
Copy link
Collaborator

@gjoseph92 gjoseph92 commented Nov 3, 2021

Blocked by #8338

@gjoseph92 gjoseph92 changed the title WIP use blockwise in single_partition_join Use blockwise in single_partition_join Nov 5, 2021
@gjoseph92 gjoseph92 requested a review from jsignell November 5, 2021 17:05
@gjoseph92 gjoseph92 marked this pull request as ready for review November 5, 2021 17:05
@gjoseph92
Copy link
Collaborator Author

@jsignell this looks green besides #7918

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gjoseph92. I left a few small comments, but overall this looks good and I'm looking forward to seeing it merged in

return None


def hlg_layer(hlg: HighLevelGraph, prefix: str) -> Layer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could very well be missing something, but it appears this utility isn't used anywhere (outside of the unit test making sure this behaves as expected). I'd prefer to not include it if it's not used anywhere

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's not used in this PR. I was going to use it here once this is merged: #8310 (comment). Just felt like a generally useful utility for testing things about HLGs.

raise KeyError(f"No layer starts with {prefix!r}: {list(hlg.layers)}")


def hlg_layer_topological(hlg: HighLevelGraph, i: int) -> Layer:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it makes sense to just return the i=-1 layer since It looks like we're always calling this with i=-1 (outside of test_hlg_layer_topological)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that. It would probably be nice if i=-1 was the default, since I think that'll be the most common use. I just couldn't think of a name that made it obvious it returned the last layer by default, but where that could also be changed (last_hlg_layer(hlg, i=-4) sounds weird when i isn't -1). I just wanted someone reading a test to not have to go look at the implementation for this function to know what the default was. So since I couldn't come up with a good name, I just didn't give a default.

Comment on lines +1403 to +1404
@pytest.mark.parametrize("flip", [False, True])
def test_cheap_single_partition_merge_divisions(flip):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're not using flip in this test

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, thanks! The test already tried both orders, so the flip was unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use Blockwise/map_partitions in various DataFrame join methods

3 participants