Use blockwise in single_partition_join#8341
Conversation
fdab199 to
568538c
Compare
single_partition_joinsingle_partition_join
Co-authored-by: Julia Signell <jsignell@gmail.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @gjoseph92. I left a few small comments, but overall this looks good and I'm looking forward to seeing it merged in
| return None | ||
|
|
||
|
|
||
| def hlg_layer(hlg: HighLevelGraph, prefix: str) -> Layer: |
There was a problem hiding this comment.
I could very well be missing something, but it appears this utility isn't used anywhere (outside of the unit test making sure this behaves as expected). I'd prefer to not include it if it's not used anywhere
There was a problem hiding this comment.
You're right, it's not used in this PR. I was going to use it here once this is merged: #8310 (comment). Just felt like a generally useful utility for testing things about HLGs.
| raise KeyError(f"No layer starts with {prefix!r}: {list(hlg.layers)}") | ||
|
|
||
|
|
||
| def hlg_layer_topological(hlg: HighLevelGraph, i: int) -> Layer: |
There was a problem hiding this comment.
I'm wondering if it makes sense to just return the i=-1 layer since It looks like we're always calling this with i=-1 (outside of test_hlg_layer_topological)
There was a problem hiding this comment.
I thought about that. It would probably be nice if i=-1 was the default, since I think that'll be the most common use. I just couldn't think of a name that made it obvious it returned the last layer by default, but where that could also be changed (last_hlg_layer(hlg, i=-4) sounds weird when i isn't -1). I just wanted someone reading a test to not have to go look at the implementation for this function to know what the default was. So since I couldn't come up with a good name, I just didn't give a default.
dask/dataframe/tests/test_multi.py
Outdated
| @pytest.mark.parametrize("flip", [False, True]) | ||
| def test_cheap_single_partition_merge_divisions(flip): |
There was a problem hiding this comment.
It looks like we're not using flip in this test
There was a problem hiding this comment.
Good catch, thanks! The test already tried both orders, so the flip was unnecessary.
Blocked by #8338map_partitionsin various DataFrame join methods #8306pre-commit run --all-files