
Implement pass HighLevelGraphs through _graph_to_futures #4139

Merged
jrbourbeau merged 8 commits into dask:master from madsbk:hlg_through_graph_to_futures
Oct 20, 2020

Conversation

@madsbk (Contributor) commented Oct 1, 2020

This PR implements #4119 by using the map methods introduced in dask/dask#6689
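For readers unfamiliar with the feature: a HighLevelGraph stores a task graph as named layers plus the dependencies between layers, and passing it through `_graph_to_futures` avoids materializing every individual task on the client before submission. A dependency-free sketch of the idea, using plain dicts as stand-ins for the real `dask.highlevelgraph.HighLevelGraph` class (names and shapes here are illustrative, not this PR's code):

```python
# Simplified stand-in for a HighLevelGraph: layers of {key: task},
# plus which layers depend on which.

def inc(x):
    return x + 1

layers = {
    "ones": {("x", i): 1 for i in range(3)},
    "inc":  {("y", i): (inc, ("x", i)) for i in range(3)},
}
layer_deps = {"ones": set(), "inc": {"ones"}}

def flatten(layers):
    # Materializing every layer into one flat dict is what the old
    # code path effectively required; keeping layers intact lets the
    # client send a much more compact representation.
    flat = {}
    for tasks in layers.values():
        flat.update(tasks)
    return flat

flat = flatten(layers)
print(len(flat))  # 6 keys once materialized
```

The compact layered form only pays off if the scheduler-facing path can consume it directly, which is what routing HighLevelGraphs through `_graph_to_futures` enables.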

[Question]

What is the policy when a Distributed PR depends on a new Dask feature? Should the PR include a fallback implementation for when the installed Dask is older than Distributed? Or do we assume that Dask and Distributed are the same version?

Notice

CI will fail until dask/dask#6689 has been merged.

@mrocklin (Member) commented Oct 2, 2020

> What is the policy when a Distributed PR depends on a new Dask feature? Should the PR include a fallback implementation for when the installed Dask is older than Distributed? Or do we assume that Dask and Distributed are the same version?

If we can build in some flexibility that's nice, but there have also been releases where we have strict pinnings. It's ok for us to do that if necessary.
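The flexibility mentioned here usually takes the form of a version gate: use the new code path when the installed Dask is new enough, otherwise fall back. A hedged sketch of that pattern with a dependency-free version comparison (the function names and the `"2.30.0"` floor are illustrative, not what this PR or the release actually used):

```python
# Illustrative version-gate pattern for "new Dask feature, old Dask
# fallback". Not this PR's code; names and the floor version are
# assumptions for the sake of the example.

def version_tuple(v):
    # "2.30.0" -> (2, 30, 0); stops at the first non-numeric part,
    # so pre-release suffixes are ignored rather than crashing.
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)

def supports_hlg(dask_version, minimum="2.30.0"):
    # In real code dask_version would come from dask.__version__.
    return version_tuple(dask_version) >= version_tuple(minimum)

print(supports_hlg("2.30.0"))  # True
print(supports_hlg("2.25.1"))  # False
```

In practice a strict pin in the package metadata, as mentioned above, is the simpler alternative when a fallback path would be costly to maintain.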

@mrocklin (Member) commented Oct 2, 2020

Thanks for writing this up. Two pieces of follow-on work seem evident:

  1. We should think about computing dependencies on the scheduler
  2. We should think about computing order / priorities on the scheduler
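Both follow-on items are about moving work off the client. For context, a minimal sketch of what each computation involves on a flat task graph; `dask.core.get_dependencies` and `dask.order.order` do this for real, and the simplified versions below are illustrative only:

```python
# Illustrative only: simplified dependency extraction and priority
# assignment for a dict-of-tasks graph. The real implementations live
# in dask.core.get_dependencies and dask.order.order.

def inc(x):
    return x + 1

dsk = {
    "a": 1,
    "b": (inc, "a"),
    "c": (inc, "b"),
}

def get_deps(dsk, task):
    # Recursively collect graph keys referenced inside a task tuple.
    if isinstance(task, tuple):
        deps = set()
        for part in task[1:]:
            deps |= get_deps(dsk, part)
        return deps
    if task in dsk:  # a bare key reference
        return {task}
    return set()

deps = {k: get_deps(dsk, v) for k, v in dsk.items()}
print(deps)  # {'a': set(), 'b': {'a'}, 'c': {'b'}}

def priorities(deps):
    # Depth-first topological numbering: lower number = run earlier.
    order, seen = {}, set()
    def visit(k):
        if k in seen:
            return
        seen.add(k)
        for d in deps[k]:
            visit(d)
        order[k] = len(order)
    for k in deps:
        visit(k)
    return order

print(priorities(deps))  # {'a': 0, 'b': 1, 'c': 2}
```

Doing these on the scheduler rather than the client would remove a per-submission cost that grows with graph size.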

@jrbourbeau (Member)

Restarting CI here since dask/dask#6689 has been merged

@jrbourbeau (Member)

Looking into CI failures now. Interestingly, some tests (like distributed/tests/test_client.py::test_get_sync) don't fail when run individually but do fail when running the full test suite. Other tests like distributed/dashboard/tests/test_scheduler_bokeh.py::test_compute_per_key consistently fail individually with

E           RuntimeError: Cycle detected between the following keys:
E             ->
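The error above comes from dask's cycle check on the task graph. For context, a minimal sketch of how such a cycle can be detected in a key-to-dependencies mapping using depth-first search with three-color marking (illustrative, not dask's actual implementation):

```python
# Illustrative cycle detection over {key: [dependency keys]}.
# Not dask's implementation; a standard DFS back-edge check.

def find_cycle(deps):
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / in progress / done
    color = {k: WHITE for k in deps}

    def visit(k, path):
        color[k] = GREY
        for d in deps.get(k, ()):
            if color.get(d) == GREY:
                return path + [d]  # back edge: cycle found
            if color.get(d) == WHITE:
                found = visit(d, path + [d])
                if found:
                    return found
        color[k] = BLACK
        return None

    for k in deps:
        if color[k] == WHITE:
            cycle = visit(k, [k])
            if cycle:
                return cycle
    return None

print(find_cycle({"a": ["b"], "b": ["a"]}))  # ['a', 'b', 'a']
print(find_cycle({"a": ["b"], "b": []}))     # None
```

A spurious cycle report like the one above typically indicates that graph construction produced keys that reference each other, which is why the fix landed on the dask side (dask/dask#6747 below).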

@madsbk (Contributor, Author) commented Oct 19, 2020

dask/dask#6747 hopefully fixes this issue.

@jrbourbeau (Member)

Restarting CI here since dask/dask#6747 has been merged

@madsbk (Contributor, Author) commented Oct 20, 2020

I cannot reproduce the CI errors (test_broken_worker_during_computation) on my own machine, and I don't think they have anything to do with this PR.
The PR doesn't seem to introduce any significant overhead either when running the following benchmark:

from distributed import Client, LocalCluster
from dask.dataframe.shuffle import shuffle
from dask.distributed import wait
import dask.dataframe as dd
import dask
import time
import argparse
import pandas as pd


def create_empty_df(npartitions):
    # Build an empty single-column dask DataFrame with the requested
    # number of partitions.
    d = dask.delayed(pd.DataFrame)
    return dd.from_delayed([d({"a": []}) for _ in range(npartitions)])


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("npartitions", type=int)
    parser.add_argument("backend", choices=["tasks", "dynamic-tasks"])
    # --no-optimize disables graph optimization before submission
    parser.add_argument("--no-optimize", dest="optimize", action="store_false")
    parser.add_argument("--max-branch", type=int, default=None)
    args = parser.parse_args()

    cluster = LocalCluster(n_workers=1, threads_per_worker=1)
    client = Client(cluster)

    df = create_empty_df(args.npartitions)
    t1 = time.time()
    df = shuffle(df, "a", shuffle=args.backend, max_branch=args.max_branch)
    df = df.persist(optimize_graph=args.optimize)
    t2 = time.time()
    print(f"persist: {t2 - t1}")
    wait(df)
    t3 = time.time()
    print(f"persist: {t2 - t1}, compute: {t3 - t2}")

    client.close()

@jrbourbeau, any thoughts?

@jrbourbeau (Member)

Yeah, I think the test_broken_worker_during_computation failure is unrelated to the changes here, as I've seen it pop up in other PRs. I've opened up #4173 to track the failure.

This PR also relies on recent changes in Dask (e.g. HighLevelGraph.map_tasks), so I'm going to bump the minimum supported Dask version. Ping @dask/maintenance if there are any objections to bumping our Dask version here (this was the PR I brought up on our call earlier today).
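In practice, bumping the minimum supported version means tightening the pin in the package metadata. A sketch of what that looks like in a setup.py; the floor version shown is a placeholder, not the value chosen for this PR:

```python
# setup.py fragment (illustrative only; "2.X.0" is a placeholder,
# not the actual minimum chosen here)
setup(
    ...,
    install_requires=[
        "dask >= 2.X.0",  # needs HighLevelGraph.map_tasks and friends
        ...,
    ],
)
```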

@jrbourbeau (Member) left a review comment


Thanks @madsbk! This is in

@jrbourbeau jrbourbeau merged commit dd28f7a into dask:master Oct 20, 2020
@madsbk madsbk deleted the hlg_through_graph_to_futures branch October 21, 2020 07:50
pentschev added a commit to pentschev/distributed that referenced this pull request Oct 23, 2020
sonicxml added a commit to sonicxml/distributed that referenced this pull request Dec 3, 2020
