Skip to content

Avoid graph materialization during Blockwise culling#6815

Merged
jrbourbeau merged 23 commits intodask:masterfrom
rjzamora:flat-blockwise
Nov 13, 2020
Merged

Avoid graph materialization during Blockwise culling#6815
jrbourbeau merged 23 commits intodask:masterfrom
rjzamora:flat-blockwise

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented Nov 7, 2020

Closes #6792

Follow up to #6715 - Originally intended to avoid graph materialization for "flat" blockwise operations, but now attempts all Blockwise layers.

TODO:

  • General clean up (v1)
  • Validate "high-level" culling (test that graph is not materialized)
  • Address serialization (may address in a separate PR)

@rjzamora
Copy link
Member Author

rjzamora commented Nov 9, 2020

cc @madsbk (for visibility)

@madsbk
Copy link
Contributor

madsbk commented Nov 10, 2020

@rjzamora this look promising, please ping when it is ready for review.
When this is in, I will be happy to implement its serialization.

@rjzamora
Copy link
Member Author

@rjzamora this look promising, please ping when it is ready for review.

Great - Will do! I should be able to take care of this today.

When this is in, I will be happy to implement its serialization.

Sounds good. In that case, I won't include any attempts at serialization here.

@rjzamora rjzamora marked this pull request as ready for review November 10, 2020 23:54
@rjzamora
Copy link
Member Author

@madsbk - I think this is ready for an initial review.

Note that I eventually decided to remove the specialized code path for "flat" layers, because the distinction was starting to get a bit ugly, and I'm not convinced there is a clear performance motivation (yet). My intuition is that we should add a separate code path after profiling shows that culling and/or graph materialization is slower than it needs to be for these special cases.

Copy link
Contributor

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjzamora this is great work, awesome that you support all kinds of blockwise!

I suggest that you remove key_deps and non_blockwise_keys and remove the use of BasicLayer so that the cached dict just becomes:

            self._cached_dict = {
                "dsk": dsk,
            }

return ret


def _get_coord_mapping(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea of separating the abstract coordinates from the task generation. Since _get_coord_mapping() implements the bulk of the blockwise complexity, I would be great if you could add some doc describing the input/output arguments and how the coordinates are calculated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - That makes sense :)

I added a simple docstring with more info, but I may need to expand a bit further.

@rjzamora
Copy link
Member Author

Thanks for the review @madsbk !

I suggest that you remove key_deps and non_blockwise_keys and remove the use of BasicLayer

Good suggestion - For now, I actually removed the use of key_deps and non_blockwise_keys altogether from Blockwise. Let me know if we still need any of that logic for any other Blockwise functionality that I'm not thinking of.

@rjzamora rjzamora changed the title [WIP] Avoid graph materialization during Blockwise culling Avoid graph materialization during Blockwise culling Nov 11, 2020
Copy link
Contributor

@madsbk madsbk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nice work!

Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @rjzamora. I've left a few small comments, but overall this LGTM

Comment on lines +272 to +274
self._dims = broadcast_dimensions(self.indices, self.numblocks)
for k, v in self.new_axes.items():
self._dims[k] = len(v) if isinstance(v, tuple) else 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic now exists in make_blockwise_graph too, do you think it's worth moving to a separate function?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay - I added a _make_dims function to be used in both these cases. How does that sound?

rjzamora and others added 3 commits November 12, 2020 23:10
Copy link
Member

@jrbourbeau jrbourbeau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rjzamora, I'll merge once CI builds finish

@jrbourbeau jrbourbeau merged commit fe3d402 into dask:master Nov 13, 2020
@rjzamora rjzamora deleted the flat-blockwise branch November 16, 2020 15:58
@madsbk
Copy link
Contributor

madsbk commented Nov 16, 2020

Thanks @rjzamora, I'll merge once CI builds finish

Awesome, I am working on serializing Blockwise without materializing.

@rjzamora
Copy link
Member Author

Awesome, I am working on serializing Blockwise without materializing.

Thanks Mads!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add comprehensive get_output_keys defintition to Blockwise

3 participants