Avoid graph materialization during Blockwise culling by rjzamora · Pull Request #6815 · dask/dask

rjzamora · 2020-11-07T05:52:01Z

Follow-up to #6715. Originally intended to avoid graph materialization for "flat" blockwise operations, but it now attempts this for all Blockwise layers.

TODO:

General clean up (v1)
Validate "high-level" culling (test that graph is not materialized)
Address serialization (may address in a separate PR)
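
As a rough illustration of what "high-level" culling means here, the sketch below uses a toy, elementwise-only layer. `ToyBlockwise`, `output_blocks`, and `materialize_count` are hypothetical names for this example and are not Dask's actual implementation. The idea is only that culling narrows the set of requested output blocks and defers building the task dictionary.

```python
from itertools import product


class ToyBlockwise:
    """Toy stand-in for an elementwise blockwise layer (hypothetical, not Dask's class)."""

    def __init__(self, output, func, input_name, numblocks, output_blocks=None):
        self.output = output
        self.func = func
        self.input_name = input_name
        self.numblocks = numblocks          # blocks per dimension, e.g. (2, 3)
        self.output_blocks = output_blocks  # None means "all blocks"
        self.materialize_count = 0          # how often the task dict was built

    def get_output_keys(self):
        # Enumerate keys from coordinates alone; no tasks are constructed.
        if self.output_blocks is not None:
            coords = self.output_blocks
        else:
            coords = product(*(range(n) for n in self.numblocks))
        return {(self.output, *c) for c in coords}

    def cull(self, keys):
        # Keep only the requested coordinates that belong to this layer.
        wanted = {k[1:] for k in keys if k[0] == self.output}
        culled = ToyBlockwise(
            self.output, self.func, self.input_name, self.numblocks, wanted
        )
        # Elementwise: each output block depends on the matching input block.
        deps = {(self.output, *c): {(self.input_name, *c)} for c in wanted}
        return culled, deps

    def materialize(self):
        # Only here is the low-level task dictionary actually built.
        self.materialize_count += 1
        return {
            key: (self.func, (self.input_name, *key[1:]))
            for key in self.get_output_keys()
        }


layer = ToyBlockwise("y", abs, "x", numblocks=(2, 3))
culled, deps = layer.cull({("y", 0, 1), ("y", 1, 2)})
```

A test in the spirit of the second TODO item would then check that `materialize_count` is still zero on both layers after `cull` returns.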

rjzamora · 2020-11-09T15:26:45Z

cc @madsbk (for visibility)

madsbk · 2020-11-10T14:00:47Z

@rjzamora this looks promising, please ping me when it is ready for review.
When this is in, I will be happy to implement its serialization.

rjzamora · 2020-11-10T14:50:40Z

> @rjzamora this looks promising, please ping me when it is ready for review.

Great - Will do! I should be able to take care of this today.

> When this is in, I will be happy to implement its serialization.

Sounds good. In that case, I won't include any attempts at serialization here.

rjzamora · 2020-11-11T03:22:21Z

@madsbk - I think this is ready for an initial review.

Note that I eventually decided to remove the specialized code path for "flat" layers, because the distinction was starting to get a bit ugly, and I'm not convinced there is a clear performance motivation (yet). My intuition is that we should add a separate code path after profiling shows that culling and/or graph materialization is slower than it needs to be for these special cases.

madsbk

@rjzamora this is great work, awesome that you support all kinds of blockwise!

I suggest that you remove key_deps and non_blockwise_keys and remove the use of BasicLayer so that the cached dict just becomes:

            self._cached_dict = {
                "dsk": dsk,
            }

madsbk · 2020-11-11T08:32:21Z

dask/blockwise.py

        return ret


+def _get_coord_mapping(


Great idea to separate the abstract coordinates from the task generation. Since _get_coord_mapping() implements the bulk of the blockwise complexity, it would be great if you could add some documentation describing the input/output arguments and how the coordinates are calculated.

Right - That makes sense :)

I added a simple docstring with more info, but I may need to expand a bit further.
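
To make the split between abstract coordinates and task generation concrete, here is a minimal sketch. `toy_coord_mapping` and `toy_make_tasks` are hypothetical helpers, not the `_get_coord_mapping` function added in this PR. The sketch assumes each input is described by a (name, index-labels) pair, that no index is contracted, and that an input with a single block along an axis is broadcast.

```python
from itertools import product


def toy_coord_mapping(out_indices, arg_pairs, numblocks, dims):
    """Map each output block coordinate to the input block keys it needs.

    out_indices: tuple of index labels for the output, e.g. ("i", "j")
    arg_pairs:   list of (input_name, index_labels) pairs
    numblocks:   dict mapping input_name -> blocks per axis
    dims:        dict mapping index label -> number of output blocks
    """
    mapping = {}
    for out_coord in product(*(range(dims[i]) for i in out_indices)):
        position = dict(zip(out_indices, out_coord))
        needed = []
        for name, labels in arg_pairs:
            coord = tuple(
                # A single-block axis is broadcast, so it always maps to block 0.
                0 if nb == 1 else position[label]
                for label, nb in zip(labels, numblocks[name])
            )
            needed.append((name, *coord))
        mapping[out_coord] = needed
    return mapping


def toy_make_tasks(func, output, mapping):
    # Task generation is a separate, mechanical step over the mapping.
    return {(output, *coord): (func, *deps) for coord, deps in mapping.items()}


mapping = toy_coord_mapping(
    out_indices=("i", "j"),
    arg_pairs=[("a", ("i", "j")), ("b", ("j",))],
    numblocks={"a": (2, 2), "b": (1,)},
    dims={"i": 2, "j": 2},
)
```

With this split, culling only has to restrict which output coordinates are enumerated; the task-building step does not need to change.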

dask/blockwise.py

rjzamora · 2020-11-11T19:34:38Z

Thanks for the review @madsbk !

> I suggest that you remove key_deps and non_blockwise_keys and remove the use of BasicLayer

Good suggestion - For now, I actually removed the use of key_deps and non_blockwise_keys altogether from Blockwise. Let me know if we still need any of that logic for any other Blockwise functionality that I'm not thinking of.

madsbk

LGTM, nice work!

jrbourbeau

Thanks for working on this @rjzamora. I've left a few small comments, but overall this LGTM

dask/blockwise.py

dask/tests/test_highgraph.py

jrbourbeau · 2020-11-13T04:52:52Z

dask/blockwise.py

+            self._dims = broadcast_dimensions(self.indices, self.numblocks)
+            for k, v in self.new_axes.items():
+                self._dims[k] = len(v) if isinstance(v, tuple) else 1


This logic now exists in make_blockwise_graph too. Do you think it's worth moving to a separate function?

Okay - I added a _make_dims function to be used in both these cases. How does that sound?

jrbourbeau

Thanks @rjzamora, I'll merge once CI builds finish

madsbk · 2020-11-16T16:14:16Z

> Thanks @rjzamora, I'll merge once CI builds finish

Awesome, I am working on serializing Blockwise without materializing.

rjzamora · 2020-11-16T16:17:53Z

> Awesome, I am working on serializing Blockwise without materializing.

Thanks Mads!

rjzamora added 11 commits November 5, 2020 20:38

improve flat-blockwise handling in get_output_keys

df92dbe

starting to attempt culling

be976a7

avoid graph materialization during culling

e57932c

fix typo

665ef65

fix io bug

98a1eed

revert debug statements

70a0f45

update docstring with output_blocks

175922e

slight improvement in get_output_keys

8146e7b

capture culling in get_output_keys

aea5a27

start to handle general culling in blockwise

139d280

add default value to output_blocks pop

6c1b7a5

rjzamora added 3 commits November 10, 2020 12:13

add simple blockwise-culling test

640cda4

remove extra add

46e0950

avoid code duplication - adding _get_coord_mapping

375b644

rjzamora marked this pull request as ready for review November 10, 2020 23:54

rjzamora added 2 commits November 10, 2020 19:00

moving away from specialized code path for flat layers

5f39da7

fix tuple bug

c0f29e2

add constant deps

d8ad27a

madsbk suggested changes Nov 11, 2020

rjzamora added 3 commits November 11, 2020 10:15

remove use of BasicLayer

2cf40bc

remove use of key_deps and non_blockwise_keys in make_blockwise_graph

c93fc7c

improve doc strings

d453252

rjzamora changed the title ~~[WIP] Avoid graph materialization during Blockwise culling~~ Avoid graph materialization during Blockwise culling Nov 11, 2020

madsbk approved these changes Nov 12, 2020

jrbourbeau reviewed Nov 13, 2020

rjzamora and others added 3 commits November 12, 2020 23:10

Apply suggestions from code review

754a465

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

assert that blockwise layers remain Blockwise in test

c63d8b8

add _make_dims

b90c82a

jrbourbeau approved these changes Nov 13, 2020

jrbourbeau merged commit fe3d402 into dask:master Nov 13, 2020

rjzamora deleted the flat-blockwise branch November 16, 2020 15:58
