Add split_every to graph_manipulation by crusaderky · Pull Request #7282 · dask/dask

crusaderky · 2021-02-26T18:40:21Z

Although the map stage of dask.graph_manipulation.checkpoint outputs nothing but None values, the distributed scheduler has been observed to incorrectly concentrate the pre-map data (potentially gigabytes) onto the worker that computes the final node of checkpoint. This will eventually need to be fixed in distributed, but it won't be an easy fix.

This PR sidesteps the problem, removing the nexus node and replacing it with a recursive aggregation equivalent to the one already implemented in dask.array, dask.bag, and dask.dataframe.
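As a rough illustration of what recursive aggregation means here, the pure-Python sketch below builds a reduction tree in which no node depends on more than split_every inputs. It is not the code from this PR: build_tree_reduction and the "agg-..." key names are hypothetical, and the graph is a plain dict of dependencies rather than a real Dask graph.

```python
from typing import Dict, List, Sequence, Tuple


def build_tree_reduction(
    keys: Sequence[str], split_every: int = 8
) -> Tuple[Dict[str, List[str]], str]:
    """Group keys into a tree so that no node has more than split_every inputs.

    Returns (graph, final_key), where graph maps each hypothetical
    aggregation node to the list of keys it depends on.
    """
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    if not keys:
        raise ValueError("at least one key is required")

    graph: Dict[str, List[str]] = {}
    level = list(keys)
    depth = 0
    # Keep collapsing the current level until a single node remains.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), split_every):
            name = f"agg-{depth}-{i // split_every}"
            graph[name] = level[i : i + split_every]
            next_level.append(name)
        level = next_level
        depth += 1
    return graph, level[0]


# 20 map outputs reduced with a fan-in of at most 4
chunk_keys = [f"chunk-{i}" for i in range(20)]
graph, final_key = build_tree_reduction(chunk_keys, split_every=4)
max_fan_in = max(len(deps) for deps in graph.values())
```

With a single nexus node, the final task would depend directly on all 20 map outputs. In the tree, every intermediate node sees at most four inputs, which is the property the PR relies on to avoid pulling everything onto one worker.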

Follow-up: #7283


crusaderky · 2021-03-02T21:43:35Z

dask/graph_manipulation.py

+
+    SplitEvery = Union[Number, Literal[False], None]
+except ImportError:
+    SplitEvery = Union[Number, bool, None]  # type: ignore


The alternative to this ugly thing was to add a dependency to typing_extensions. Which, if we are going to be serious with type annotations, we'll eventually have to do.

Another alternative was to wrap the definition in if TYPE_CHECKING. As long as you have a recent version of mypy, you don't need to maintain backwards compatibility with older versions of Python.
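For reference, the if TYPE_CHECKING alternative mentioned above could look roughly like the sketch below. This is an assumption about the shape of that alternative, not code from the PR. It relies on typing.Literal, which is available in the standard library from Python 3.8, and on the type checker evaluating only the first branch.

```python
from numbers import Number
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only evaluated by static type checkers such as mypy, never at runtime,
    # so this import cannot fail on interpreters that lack typing.Literal.
    from typing import Literal

    SplitEvery = Union[Number, Literal[False], None]
else:
    # Runtime fallback: slightly looser, since it also admits True.
    SplitEvery = Union[Number, bool, None]
```

The trade-off is that the precise alias exists only for the type checker, while runtime introspection sees the looser one.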

I think this is fine for now. Since Dask development doesn't use type annotations I'm slightly against adding infrastructure to support them.

That said, if you're passionate about type annotations then I'd encourage you to engage on dask/distributed#2803. I recall previous discussions where type annotations were brought up with mixed reviews, but that was a while ago, so it could be that people's opinions on them have changed since then.

crusaderky · 2021-03-02T21:50:52Z

dask/graph_manipulation.py

-    try:
-        layers_to_clone = set(child.__dask_layers__())
-    except AttributeError:
-        layers_to_clone = prev_coll_names.copy()


These two hunks fix a bug where a collection defines __dask_layers__ which returns 2+ layers, but __dask_graph__ returns a plain dict. Unit tests have been updated to trigger the issue.

crusaderky · 2021-03-02T21:51:46Z

dask/graph_manipulation.py

        if is_bound:
-            new_deps[new_layer_name].add(blocker_key)
+            new_dep.add(blocker_key)
+        new_deps[new_layer_name] = new_dep


This change is to make mypy happy, since new_deps is (correctly) a dict of read-only AbstractSets.
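A minimal sketch of the pattern, outside of Dask: when a dict's values are annotated as AbstractSet, mypy rejects calling .add() on a value read back from the dict, so the set is built as a local mutable Set first and stored once it is complete. The record_deps helper and its arguments are hypothetical and only mirror the names in the hunk.

```python
from typing import AbstractSet, Dict, Iterable, Set


def record_deps(
    new_deps: Dict[str, AbstractSet[str]],
    new_layer_name: str,
    base_deps: Iterable[str],
    blocker_key: str,
    is_bound: bool,
) -> None:
    # Build the set locally, where it is typed as a mutable Set.
    new_dep: Set[str] = set(base_deps)
    if is_bound:
        new_dep.add(blocker_key)
    # Store it only once complete; readers of new_deps see a read-only view.
    # new_deps[new_layer_name].add(...) would be flagged by mypy because
    # AbstractSet has no "add" method.
    new_deps[new_layer_name] = new_dep


deps: Dict[str, AbstractSet[str]] = {}
record_deps(deps, "layer-1", {"layer-0"}, "blocker", is_bound=True)
record_deps(deps, "layer-2", {"layer-1"}, "blocker", is_bound=False)
```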

crusaderky · 2021-03-02T21:52:27Z

@jrbourbeau ready for review

jrbourbeau

Thanks for your work on this @crusaderky!

Also, I appreciate you raising #7283 to improve split_every consistency across the project

* upstream/master: (43 commits)
  bump version to 2021.03.0
  Bump minimum version of distributed (dask#7328)
  Fix `percentiles_summary` with `dask_cudf` (dask#7325)
  Temporarily revert recent Array.__setitem__ updates (dask#7326)
  Blockwise.clone (dask#7312)
  NEP-35 duck array update (dask#7321)
  Don't allow setting `.name` for array (dask#7222)
  Use nearest interpolation for creating percentiles of integer input (dask#7305)
  Test `exp` with CuPy arrays (dask#7322)
  Check that computed chunks have right size and dtype (dask#7277)
  pytest.mark.flaky (dask#7319)
  Contributing docs: add note to pull the latest git tags before pip installing Dask (dask#7308)
  Support for Python 3.9 (dask#7289)
  Add broadcast-based merge implementation (dask#7143)
  Add split_every to graph_manipulation (dask#7282)
  Typo in optimize docs (dask#7306)
  dask.graph_manipulation support for xarray.Dataset (dask#7276)
  Add plot width and height support for Bokeh 2.3.0 (dask#7297)
  Add numpy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (dask#6997)
  Remove "cleanup" task in dataframe on-disk shuffle. The partd directory (dask#7260)
  ...

crusaderky added 6 commits February 24, 2021 15:25

Change __dask_postpersist__ to accommodate xarray.Dataset

9677453

rename

Merge remote-tracking branch 'upstream/master' into rename

ae062b0

split_every POC

d3a06c7

cull_layers

ba41d9c

Merge remote-tracking branch 'upstream/master' into rename

375693e

Merge branch 'rename' into split_every

6177e22

crusaderky mentioned this pull request Feb 26, 2021

Harmonize split_every across modules #7283

Open

graph_manipulation split_every parameter

5c909b2

test test test invalid

crusaderky force-pushed the split_every branch from 13e7881 to 5c909b2 Compare March 1, 2021 11:09

crusaderky added 2 commits March 2, 2021 21:38

Merge branch 'master' into rename

d5676a9

Merge branch 'rename' into split_every

31f5cc2

crusaderky changed the title ~~WIP add split_every to graph_manipulation~~ Add split_every to graph_manipulation Mar 2, 2021

crusaderky marked this pull request as ready for review March 2, 2021 21:42

crusaderky commented Mar 2, 2021


crusaderky closed this Mar 2, 2021

crusaderky reopened this Mar 2, 2021

jrbourbeau approved these changes Mar 3, 2021


jrbourbeau merged commit 293e6c1 into dask:master Mar 3, 2021

crusaderky deleted the split_every branch March 3, 2021 11:19

jrbourbeau merged 9 commits into dask:master from crusaderky:split_every
