Avoid graph materialization during Blockwise culling#6815
Avoid graph materialization during Blockwise culling#6815jrbourbeau merged 23 commits intodask:masterfrom
Conversation
|
cc @madsbk (for visibility) |
|
@rjzamora this look promising, please ping when it is ready for review. |
Great - Will do! I should be able to take care of this today.
Sounds good. In that case, I won't include any attempts at serialization here. |
|
@madsbk - I think this is ready for an initial review. Note that I eventually decided to remove the specialized code path for "flat" layers, because the distinction was starting to get a bit ugly, and I'm not convinced there is a clear performance motivation (yet). My intuition is that we should add a separate code path after profiling shows that culling and/or graph materialization is slower than it needs to be for these special cases. |
madsbk
left a comment
There was a problem hiding this comment.
@rjzamora this is great work, awesome that you support all kinds of blockwise!
I suggest that you remove key_deps and non_blockwise_keys and remove the use of BasicLayer so that the cached dict just becomes:
self._cached_dict = {
"dsk": dsk,
}| return ret | ||
|
|
||
|
|
||
| def _get_coord_mapping( |
There was a problem hiding this comment.
Great idea of separating the abstract coordinates from the task generation. Since _get_coord_mapping() implements the bulk of the blockwise complexity, I would be great if you could add some doc describing the input/output arguments and how the coordinates are calculated.
There was a problem hiding this comment.
Right - That makes sense :)
I added a simple docstring with more info, but I may need to expand a bit further.
|
Thanks for the review @madsbk !
Good suggestion - For now, I actually removed the use of |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks for working on this @rjzamora. I've left a few small comments, but overall this LGTM
dask/blockwise.py
Outdated
| self._dims = broadcast_dimensions(self.indices, self.numblocks) | ||
| for k, v in self.new_axes.items(): | ||
| self._dims[k] = len(v) if isinstance(v, tuple) else 1 |
There was a problem hiding this comment.
This logic now exists in make_blockwise_graph too, do you think it's worth moving to a separate function?
There was a problem hiding this comment.
Okay - I added a _make_dims function to be used in both these cases. How does that sound?
Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @rjzamora, I'll merge once CI builds finish
Awesome, I am working on serializing |
Thanks Mads! |
Closes #6792
Follow up to #6715 - Originally intended to avoid graph materialization for "flat" blockwise operations, but now attempts all Blockwise layers.
TODO: