Improve ordering for specific workloads #6779

Merged: mrocklin merged 7 commits into dask:master from TomAugspurger:6745-order on Nov 19, 2020

Conversation

@TomAugspurger (Member) commented Oct 29, 2020

The current ordering algorithm struggles with workloads where there's
a single, common root node that's downstream of tasks with some shared
dependencies.

            s
    /   /       \   \
   /   /         \   \
 s00  s10       s01  s11
  |    |         |    |
 b00  b10       b01  b11
 / \  / \       / \ / \
a0  a1  a2    a3  a4  a5

Previously, we started at a0, finished s00, and then jumped down to a3
after deciding that we should work on s01 based on a tiebreaker. We'd
instead like to work on s10, thanks to the shared dependency a1.

The ordering algorithm does just fine in the absence of the common root
node s. So this PR proposes to just ignore it, at least in the part of
the ordering where we made the wrong choice previously.

cc @eriknw and @tomwhite.

Closes #6745
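For concreteness, the diagram above can be written out as a plain dask-style graph dict (a sketch with hypothetical key names; `f` stands in for the real tasks), which makes it easy to check that `s` is the graph's only root node:

```python
# Sketch of the diagram above as a plain dask-style graph dict.
# Key names are hypothetical; f is a no-op stand-in for the real tasks.
f = lambda *args: None

dsk = {
    "a0": (f,), "a1": (f,), "a2": (f,), "a3": (f,), "a4": (f,), "a5": (f,),
    "b00": (f, "a0", "a1"), "b10": (f, "a1", "a2"),
    "b01": (f, "a3", "a4"), "b11": (f, "a4", "a5"),
    "s00": (f, "b00"), "s10": (f, "b10"),
    "s01": (f, "b01"), "s11": (f, "b11"),
    "s": (f, "s00", "s10", "s01", "s11"),
}

# A key is a root node in this sense if nothing else depends on it.
referenced = {dep for task in dsk.values() for dep in task[1:]}
root_nodes = set(dsk) - referenced
print(root_nodes)  # {'s'} -- the single common root this PR proposes to skip
```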

@TomAugspurger (Member Author) commented Oct 29, 2020

Running the original example from #6745 (comment), the memory usage stayed below ~2.5GB the whole time. Finished in 3.5 minutes.

And here's the ordering:

(image: min-example-order — the example graph colored by the new ordering)

dask/order.py (outdated):

    item = inner_stack_pop()
    if item in result:
        continue
    if skip_root_node and item in root_nodes:
Member:

I'm curious, how will this item get processed if we always skip it? Actually, I think I know the answer: it will get processed when we handle `finish_now` down below.

Member Author:

I guess the answer is sometimes. As you may have seen, I had to cancel the CI since we got caught in some infinite loops. Hopefully I've caught all of those through trial and error.

Member:

Interesting. In that case, this check could probably go a couple lines lower, inside the `if num_needed[item]:` branch.

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
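To make the mechanics discussed in this thread concrete, here is a toy sketch (hypothetical and heavily simplified; dask's real `order()` is far more involved) of deferring designated root nodes until everything else has been assigned a priority:

```python
# Hypothetical, heavily simplified sketch (NOT dask's actual order.py):
# a depth-first ordering that defers designated root nodes, assigning them
# priorities only after every other task has been ordered.
def toy_order(deps, skip=()):
    """``deps`` maps each key to the set of keys it depends on."""
    result = {}
    counter = 0

    def visit(key):
        nonlocal counter
        if key in result:
            return
        for dep in sorted(deps[key]):
            visit(dep)
        result[key] = counter
        counter += 1

    for key in sorted(deps):       # first pass: everything but the roots
        if key not in skip:
            visit(key)
    for key in sorted(skip):       # second pass: deferred roots come last
        visit(key)
    return result

# The left half of the graph from the PR description, with "s" as the root.
deps = {
    "a0": set(), "a1": set(), "a2": set(),
    "b00": {"a0", "a1"}, "b10": {"a1", "a2"},
    "s00": {"b00"}, "s10": {"b10"},
    "s": {"s00", "s10"},
}
order = toy_order(deps, skip={"s"})
print(order["s"])  # 7 -- the deferred root gets the last (highest) priority
```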
@eriknw (Member) commented Oct 29, 2020

Seems like a pragmatic approach. It would be nice to be able to generalize it somewhat to still work if there are multiple subgraphs that aren't connected. I'll mull this over.

@TomAugspurger (Member Author):

Thanks, no rush at all on this.

I am a bit hesitant to add this kind of ad-hoc special case to order, since there are potentially so many of them (we've had similar issues with leaf nodes that are common to all the tasks, for example). So my feelings won't be hurt if you think this isn't appropriate for inclusion. But this one had a pretty high payoff in memory usage saved per line of code, so it seemed worth proposing.

@TomAugspurger (Member Author):

Just noting that some pangeo users are running into this issue as well (as in the sgkit example, they're using `da.store`).

We might want to change `da.store` to avoid this, but I think we'll want to fix the ordering issue regardless (though I'm unsure whether this is the best fix for it).

@TomAugspurger (Member Author):

And FWIW, I ran the order benchmarks in dask-benchmarks on this. They showed

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@TomAugspurger (Member Author):

Just confirming that when there are multiple output nodes, we do revert to the bad ordering:

import dask
f = lambda *args: None

kwargs = {
    "node_attr": {"penwidth": "4"},
    "cmap": "autumn",
    "color": "order",
}


dsk = {}
for i in range(2):
    dsk[(f"a-{i}", 0)] = (0,)
    dsk[(f"a-{i}", 1)] = (1,)
    dsk[(f"a-{i}", 2)] = (2,)
    dsk[(f"b-{i}", 0)] = (f, (f"a-{i}", 0), (f"a-{i}", 1))
    dsk[(f"b-{i}", 1)] = (f, (f"a-{i}", 1), (f"a-{i}", 2))
    dsk[(f"store-{i}", 0, 0)] = (f"b-{i}", 0)
    dsk[(f"store-{i}", 1, 0)] = (f"b-{i}", 1)

    # right half
    dsk[(f"a-{i}", 3)] = (3,)
    dsk[(f"a-{i}", 4)] = (4,)
    dsk[(f"a-{i}", 5)] = (5,)
    dsk[(f"b-{i}", 2)] = (f, (f"a-{i}", 3), (f"a-{i}", 4))
    dsk[(f"b-{i}", 3)] = (f, (f"a-{i}", 4), (f"a-{i}", 5))
    dsk[(f"store-{i}", 0, 1)] = (f"b-{i}", 2)
    dsk[(f"store-{i}", 1, 1)] = (f"b-{i}", 3)
    dsk[f"store-{i}"] = (
        f,
        (f"store-{i}", 0, 0),
        (f"store-{i}", 1, 0),
        (f"store-{i}", 0, 1),
        (f"store-{i}", 1, 1),
    )

dask.visualize(dsk, filename="multi-out", **kwargs)

(image: multi-out — the multi-output graph colored by its ordering)

This is identical to the ordering on master. We'd like for the 9 and 6 in the second row to swap (and the 24 and 21).

I'll look a bit more at this today, time permitting, but I'm a bit pessimistic that we'll be able to solve it. Simply expanding the check to skip all the root nodes causes other ordering problems.
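To see why the single-root special case doesn't fire here, one can recompute the root nodes of the graph above (a sketch: the construction loop mirrors the snippet in this comment, and `find_keys` is a simplified stand-in for dask's own dependency extraction):

```python
# Sketch: rebuild the multi-output graph from the snippet above and count
# its root nodes (keys nothing depends on). find_keys is a simplified
# stand-in for dask's dependency extraction, not the real implementation.
f = lambda *args: None

dsk = {}
for i in range(2):
    for j in range(6):
        dsk[(f"a-{i}", j)] = (j,)
    dsk[(f"b-{i}", 0)] = (f, (f"a-{i}", 0), (f"a-{i}", 1))
    dsk[(f"b-{i}", 1)] = (f, (f"a-{i}", 1), (f"a-{i}", 2))
    dsk[(f"b-{i}", 2)] = (f, (f"a-{i}", 3), (f"a-{i}", 4))
    dsk[(f"b-{i}", 3)] = (f, (f"a-{i}", 4), (f"a-{i}", 5))
    for k in range(4):
        dsk[(f"store-{i}", k % 2, k // 2)] = (f"b-{i}", k)
    dsk[f"store-{i}"] = (f,) + tuple(
        (f"store-{i}", k % 2, k // 2) for k in range(4)
    )

def find_keys(value, keyset):
    """Collect every graph key referenced inside a task value."""
    if isinstance(value, tuple):
        if value in keyset:
            return {value}
        return set().union(*(find_keys(v, keyset) for v in value), set())
    return set()

keyset = set(dsk)
referenced = set().union(*(find_keys(v, keyset) for v in dsk.values()), set())
root_nodes = {k for k in dsk if k not in referenced}
print(sorted(root_nodes))  # ['store-0', 'store-1'] -- two roots, not one
```

With two root nodes there is no single common root for the check in this PR to skip, which is consistent with the ordering shown above being identical to master.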

@eriknw (Member) commented Nov 9, 2020

I've looked at this again, and agree that doing better than what's currently in this PR would be difficult. I don't see any "easy wins" to be gained from minor tweaks. I'm okay if we merge this PR as is.

@TomAugspurger (Member Author):

Thanks Erik. cc @mrocklin or @jcrist if you have concerns / want to merge.

@mrocklin (Member):

Let us not let better be the enemy of the good.

Merging. Thanks @TomAugspurger, @eriknw, and others


Development

Successfully merging this pull request may close these issues:

Concatenating then rechunking zarr files uses lots of memory