Map on HighLevelGraph Layers by madsbk · Pull Request #6689 · dask/dask

madsbk · 2020-09-30T14:33:37Z

This PR introduces two HighLevelGraph methods:

    def map_basic_layers(
        self, func: Callable[[BasicLayer], Mapping]
    ) -> "HighLevelGraph":
        """Map `func` over each basic layer and return a new high level graph.
        `func` should take a BasicLayer as input and return a new Mapping as output
        and **cannot** change the dependencies between Layers.
        If `func` returns a non-BasicLayer type, it will be wrapped in a `BasicLayer`
        object automatically.

        Parameters
        ----------
        func : callable
            The function to call on each BasicLayer

        Returns
        -------
        hlg : HighLevelGraph
            A high level graph containing the transformed BasicLayers and the other
            Layers untouched
        """

    def map_tasks(self, func: Callable[[Iterable], Iterable]) -> "HighLevelGraph":
        """Map `func` over all tasks and return a new high level graph.
        `func` should take an iterable of the tasks as input and return a new
        iterable as output and **cannot** change the dependencies between Layers.

        Warning
        -------
        A layer may skip applying `func` to tasks that are part of its internals.
        For instance, Blockwise will only invoke `func` on the input literals.

        Parameters
        ----------
        func : callable
            The function to call on tasks

        Returns
        -------
        hlg : HighLevelGraph
            A high level graph containing the transformed tasks
        """

The motivation is to avoid materialization of high level graphs in Distributed: dask/distributed#4119
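To make the semantics described in the two docstrings concrete, here is a minimal, self-contained sketch in plain Python. It does not use dask: `Layer`, `BasicLayer`, `OpaqueLayer` and `ToyHLG` are hypothetical stand-ins that only model the documented contract (func is applied per layer, non-basic layers are left untouched, plain mappings are wrapped, and layer dependencies are reused unchanged).

```python
from typing import Callable, Dict, Iterable, Mapping, Set


class Layer(dict):
    """Hypothetical base class: a layer is a mapping of key -> task."""


class BasicLayer(Layer):
    """Hypothetical stand-in for a materialized, low-level layer."""


class OpaqueLayer(Layer):
    """Hypothetical stand-in for a specialized layer such as Blockwise."""


class ToyHLG:
    """Simplified model of a HighLevelGraph; not dask's implementation."""

    def __init__(self, layers: Dict[str, Mapping], dependencies: Dict[str, Set[str]]):
        # Wrap any plain mapping in a BasicLayer so every value is a Layer
        self.layers = {
            name: lyr if isinstance(lyr, Layer) else BasicLayer(lyr)
            for name, lyr in layers.items()
        }
        self.dependencies = dependencies

    def map_basic_layers(self, func: Callable[[BasicLayer], Mapping]) -> "ToyHLG":
        # Transform only BasicLayers; other layer types pass through untouched
        new_layers = {
            name: func(lyr) if isinstance(lyr, BasicLayer) else lyr
            for name, lyr in self.layers.items()
        }
        # func must not change inter-layer dependencies, so they are reused as-is
        return ToyHLG(new_layers, self.dependencies)

    def map_tasks(self, func: Callable[[Iterable], Iterable]) -> "ToyHLG":
        # Generic behaviour: pass each layer's tasks to func and keep the keys
        new_layers = {}
        for name, lyr in self.layers.items():
            new_tasks = list(func(lyr.values()))
            new_layers[name] = type(lyr)(zip(lyr.keys(), new_tasks))
        return ToyHLG(new_layers, self.dependencies)


hlg = ToyHLG(
    {"a": {"x": 1}, "b": OpaqueLayer({"y": ("inc", "x")})},
    {"a": set(), "b": {"a"}},
)
doubled = hlg.map_basic_layers(lambda lyr: {k: v * 2 for k, v in lyr.items()})
bumped = hlg.map_tasks(lambda tasks: [10 if t == 1 else t for t in tasks])
```

In `doubled`, layer `"b"` is the very same object and `hlg.dependencies` is reused, matching the "cannot change the dependencies between Layers" contract. A real specialized layer may also restrict which tasks `map_tasks` touches, as the Warning in the docstring notes.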

Tests added / passed
Passes black dask / flake8 dask

mrocklin · 2020-10-02T14:16:32Z

In principle this seems fine to me. At first I was thinking that maybe we should make a map_layers method instead, or make these private initially. Instead, I'm inclined to mark layers generally as experimental and go ahead with things as they are.

jrbourbeau

Thanks for your work here @madsbk! Overall this looks good, I've left a few comments below


jrbourbeau · 2020-10-08T18:57:07Z

dask/highlevelgraph.py

+                layers[k] = layer
+            else:
+                layers[k] = v
+        return HighLevelGraph(layers, self.dependencies)


Should we also pass self.key_dependencies through here?

In map_basic_layers() we cannot assume that the key dependencies don't change, since func is allowed to modify both keys and tasks.
As an optimization, we could allow the user to provide new key_dependencies, but let's save that for another PR.
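A small illustration of why reusing `self.key_dependencies` would be unsafe, assuming `func` renames keys. The helpers `key_dependencies` and `rename_keys` are hypothetical and written only for this sketch; they work on plain dicts using a `(func, *args)` tuple task convention, not on dask objects.

```python
def key_dependencies(layer):
    """Hypothetical helper: map each key to the keys its task refers to."""
    deps = {}
    for key, task in layer.items():
        # Tuple tasks are (func, *args); here func is just a placeholder string.
        # Non-tuple values are literals and have no dependencies.
        args = task[1:] if isinstance(task, tuple) else ()
        deps[key] = {a for a in args if isinstance(a, str) and a in layer}
    return deps


def rename_keys(layer, suffix="-v2"):
    """Hypothetical `func` for map_basic_layers that renames every key."""
    renamed = {}
    for key, task in layer.items():
        if isinstance(task, tuple):
            # Rewrite references to other keys of this layer as well
            task = (task[0],) + tuple(
                a + suffix if isinstance(a, str) and a in layer else a for a in task[1:]
            )
        renamed[key + suffix] = task
    return renamed


layer = {"x": 1, "y": ("inc", "x")}
cached = key_dependencies(layer)
new_layer = rename_keys(layer)
```

After the rename, none of the cached keys exist in the new layer, so the key dependencies must be recomputed (or, per the optimization mentioned above, supplied by the caller).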


jrbourbeau · 2020-10-08T19:42:18Z

dask/tests/test_highgraph.py

+    if use_layer_map_task:
+        # Overwrite Blockwise.map_tasks with Layer.map_tasks
+        blockwise_layer = list(dsk.layers.values())[1]
+        blockwise_layer.map_tasks = partial(Layer.map_tasks, blockwise_layer)


I'm a little confused as to what we're trying to test here

Updated the test's comment to be more descriptive; hopefully it makes more sense now:

+    if use_layer_map_task:
+        # In order to test the default map_tasks() implementation on a Blockwise Layer,
+        # we overwrite Blockwise.map_tasks with Layer.map_tasks
+        blockwise_layer = list(dsk.layers.values())[1]
+        blockwise_layer.map_tasks = partial(Layer.map_tasks, blockwise_layer)
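The test relies on ordinary Python attribute lookup: assigning to `blockwise_layer.map_tasks` creates an instance attribute that shadows the method defined on the class, and `partial(Layer.map_tasks, blockwise_layer)` binds the instance as `self` for the base-class implementation. A toy sketch of the same trick, with hypothetical `Layer` and `Blockwise` classes (not dask's):

```python
from functools import partial


class Layer:
    """Hypothetical base layer with a generic map_tasks implementation."""

    def __init__(self, tasks):
        self.tasks = dict(tasks)

    def map_tasks(self, func):
        # Generic version: every task in the layer is passed to func
        return dict(zip(self.tasks, func(self.tasks.values())))


class Blockwise(Layer):
    """Hypothetical specialized layer that only maps over its input literals."""

    def __init__(self, tasks, literals):
        super().__init__(tasks)
        self.literals = set(literals)

    def map_tasks(self, func):
        # Specialized version: internal tasks are left alone
        keys = [k for k in self.tasks if k in self.literals]
        mapped = dict(zip(keys, func(self.tasks[k] for k in keys)))
        return {k: mapped.get(k, v) for k, v in self.tasks.items()}


def double(tasks):
    return [t * 2 for t in tasks]


layer = Blockwise({"lit": 1, "internal": 2}, literals=["lit"])
specialized = layer.map_tasks(double)

# The instance attribute shadows Blockwise.map_tasks; partial supplies `self`
layer.map_tasks = partial(Layer.map_tasks, layer)
default = layer.map_tasks(double)
```

With the override in place, the generic path also doubles the internal task, which is the behavioural difference the test exercises.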

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

jrbourbeau

Thanks for the updates @madsbk! Overall this looks good to me. I've left one small comment about a simplification we can make and another about a potential API change.


jrbourbeau · 2020-10-13T18:20:06Z

dask/highlevelgraph.py


        return HighLevelGraph(ret_layers, ret_dependencies, ret_key_deps)

+    def map_basic_layers(


API question: What do you think about having a more generic map_layers method where the func being mapped is responsible for checking the layer type? This would be to prevent us from needing to add map_*_layers methods as the number of layer types grows.

Apologies for not bringing this point up earlier in the process.

My sense here is that this probably makes sense, but that it also probably doesn't need to be handled right now. The API choices here don't preclude us adding intermediate methods like map_layers later.

Sounds good, we can revisit this point later on as needed 👍


Makes sense. My thinking behind map_basic_layers() was to have a map for the layers that are materialized low-level graphs. If the user wants to apply something to all layers, they can always access the layers of an HLG directly.

But I agree, let's wait a bit before we settle on the API.
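A sketch of the more generic `map_layers` idea from the API question, assuming the mapped function receives every layer and does its own type check. All names here (`map_layers`, `only_basic`, and the layer classes) are hypothetical and not part of dask's API.

```python
from typing import Callable, Dict


class Layer(dict):
    """Hypothetical base layer type (a mapping of key -> task)."""


class BasicLayer(Layer):
    """Hypothetical materialized layer."""


class Blockwise(Layer):
    """Hypothetical specialized layer."""


def map_layers(layers: Dict[str, Layer], func: Callable[[Layer], Layer]) -> Dict[str, Layer]:
    """Hypothetical generic API: func sees every layer and decides what to do."""
    return {name: func(lyr) for name, lyr in layers.items()}


def only_basic(lyr: Layer) -> Layer:
    # The type check lives in func, so no per-type map_*_layers method is needed
    if isinstance(lyr, BasicLayer):
        return BasicLayer({k: v + 1 for k, v in lyr.items()})
    return lyr


layers = {"a": BasicLayer({"x": 1}), "b": Blockwise({"y": 2})}
result = map_layers(layers, only_basic)
```

As madsbk points out, the same effect is available without a new method by looping over the graph's layers directly; `map_layers` above is essentially that loop.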

jrbourbeau

LGTM, thanks for all your work here @madsbk! Planning to merge this once CI passes

madsbk added 7 commits September 30, 2020 16:24

HLG: updated doc

47baeb0

Makes sure that all layers are Layer when creating HLGs

03dee1c

cull: added return type hint

78d4f51

HLG: renamed __keys to _keys to enable HACK in distributed

11ff43c

Implemented map_basic_layers

0f95bce

Implement map_tasks()

719f0da

fixed typo

139d3f0

madsbk marked this pull request as ready for review September 30, 2020 18:48

madsbk mentioned this pull request Oct 1, 2020

Implement pass HighLevelGraphs through _graph_to_futures dask/distributed#4139

Merged

madsbk changed the title ~~Map on HighLevelGraph Layers~~ [REVIEW] Map on HighLevelGraph Layers Oct 1, 2020

This was referenced Oct 1, 2020

HighLevelGraphs to the Scheduler dask/distributed#4140

Merged

2.28.0 performance related issues #6694

Closed

madsbk mentioned this pull request Oct 8, 2020

Revert "Revert "Use HighLevelGraph layers everywhere in collections (… #6707

Merged

jrbourbeau reviewed Oct 8, 2020


jrbourbeau mentioned this pull request Oct 8, 2020

Ensure HighLevelGraph layers always contain Layer instances #6716

Merged


madsbk and others added 6 commits October 9, 2020 08:45

Merge branch 'master' of github.com:dask/dask into hlg_map_on_layers

915be49

Using assert_eq() in tests

4b45c8d

map_tasks(): pass along key_dependencies

c043395

map_basic_layers(): allow func to return any kind of Layer

9e512f2

Updated comment

648f08e

Using isinstance() instead of type() in test

7c7cdad

Co-authored-by: James Bourbeau <jrbourbeau@users.noreply.github.com>

jrbourbeau reviewed Oct 13, 2020


Simplify map_basic_layers

9686d37

jrbourbeau approved these changes Oct 13, 2020


jrbourbeau changed the title ~~[REVIEW] Map on HighLevelGraph Layers~~ Map on HighLevelGraph Layers Oct 13, 2020

jrbourbeau merged commit 6c71a42 into dask:master Oct 13, 2020

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

Map on HighLevelGraph Layers (dask#6689)

06680f4

madsbk deleted the hlg_map_on_layers branch February 16, 2021 09:01


		return HighLevelGraph(ret_layers, ret_dependencies, ret_key_deps)

		def map_basic_layers(

Uh oh!

Conversation

madsbk commented Sep 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mrocklin commented Oct 2, 2020

Uh oh!

jrbourbeau left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jrbourbeau left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jrbourbeau left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants
