Improve ordering for specific workloads #6779

Merged: mrocklin merged 7 commits into dask:master from TomAugspurger:6745-order on Nov 19, 2020

Conversation

@TomAugspurger (Member) commented Oct 29, 2020

The current ordering algorithm struggles with workloads where there's
a single, common root node that's downstream of tasks with some shared
dependencies.

            s
    /   /       \   \
   /   /         \   \
 s00  s10       s01  s11
  |    |         |    |
 b00  b10       b01  b11
 / \  / \       / \ / \
a0  a1  a2    a3  a4  a5

Previously, we started at a0, finished s00, and then jumped down to a3
after deciding that we should work on s01 based on a tiebreaker. We'd
instead like to work on s10, thanks to the shared dependency a1.

The ordering algorithm does just fine in the absence of the common root
node s. So this PR proposes to just ignore it, at least in the part of
the ordering where we made the wrong choice previously.

cc @eriknw and @tomwhite.

Closes #6745
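For concreteness, the diagram above can be written out as a plain dask-style graph dict (a sketch with hypothetical key names; `f` stands in for the real tasks), which makes it easy to check that `s` is the graph's only root node:

```python
# Sketch of the diagram above as a plain dask-style graph dict.
# Key names are hypothetical; f is a no-op stand-in for the real tasks.
f = lambda *args: None

dsk = {
    "a0": (f,), "a1": (f,), "a2": (f,), "a3": (f,), "a4": (f,), "a5": (f,),
    "b00": (f, "a0", "a1"), "b10": (f, "a1", "a2"),
    "b01": (f, "a3", "a4"), "b11": (f, "a4", "a5"),
    "s00": (f, "b00"), "s10": (f, "b10"),
    "s01": (f, "b01"), "s11": (f, "b11"),
    "s": (f, "s00", "s10", "s01", "s11"),
}

# A key is a root node in this sense if nothing else depends on it.
referenced = {dep for task in dsk.values() for dep in task[1:]}
root_nodes = set(dsk) - referenced
print(root_nodes)  # {'s'} -- the single common root this PR proposes to skip
```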

@TomAugspurger (Member Author) commented Oct 29, 2020

Running the original example from #6745 (comment), the memory usage stayed below ~2.5GB the whole time. Finished in 3.5 minutes.

And here's the ordering:

(image: min-example-order — the example graph colored by the new ordering)

dask/order.py (outdated):

    item = inner_stack_pop()
    if item in result:
        continue
    if skip_root_node and item in root_nodes:
Member:

I'm curious, how will this item get processed if we always skip it? Actually, I think I know the answer: it will get processed when we handle `finish_now` down below.

Member Author:

I guess the answer is sometimes. As you may have seen, I had to cancel the CI since we got caught in some infinite loops. Hopefully I've caught all of those through trial and error.

Member:

Interesting. In that case, this check could probably go a couple lines lower, inside the `if num_needed[item]:` branch.

Co-authored-by: Erik Welch <erik.n.welch@gmail.com>
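To make the mechanics discussed in this thread concrete, here is a toy sketch (hypothetical and heavily simplified; dask's real `order()` is far more involved) of deferring designated root nodes until everything else has been assigned a priority:

```python
# Hypothetical, heavily simplified sketch (NOT dask's actual order.py):
# a depth-first ordering that defers designated root nodes, assigning them
# priorities only after every other task has been ordered.
def toy_order(deps, skip=()):
    """``deps`` maps each key to the set of keys it depends on."""
    result = {}
    counter = 0

    def visit(key):
        nonlocal counter
        if key in result:
            return
        for dep in sorted(deps[key]):
            visit(dep)
        result[key] = counter
        counter += 1

    for key in sorted(deps):       # first pass: everything but the roots
        if key not in skip:
            visit(key)
    for key in sorted(skip):       # second pass: deferred roots come last
        visit(key)
    return result

# The left half of the graph from the PR description, with "s" as the root.
deps = {
    "a0": set(), "a1": set(), "a2": set(),
    "b00": {"a0", "a1"}, "b10": {"a1", "a2"},
    "s00": {"b00"}, "s10": {"b10"},
    "s": {"s00", "s10"},
}
order = toy_order(deps, skip={"s"})
print(order["s"])  # 7 -- the deferred root gets the last (highest) priority
```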
@eriknw (Member) commented Oct 29, 2020

Seems like a pragmatic approach. It would be nice to be able to generalize it somewhat to still work if there are multiple subgraphs that aren't connected. I'll mull this over.

@TomAugspurger (Member Author):

Thanks, no rush at all on this.

I am a bit hesitant to add this kind of ad-hoc special case to order, since there are potentially so many of them (we've had similar issues with leaf nodes that are common to all the tasks, for example). So my feelings won't be hurt if you think this isn't appropriate for inclusion. But this one had a pretty high payoff in memory usage saved per line of code, so it seemed worth proposing.

@TomAugspurger (Member Author):

Just noting that some pangeo users are running into this issue as well (as in the sgkit example, they're using `da.store`).

We might want to change `da.store` to avoid this, but I think we'll want to fix the ordering issue regardless (though I'm unsure whether this is the best fix for it).

@TomAugspurger (Member Author):

And FWIW, I ran the order benchmarks in dask-benchmarks on this. They showed

BENCHMARKS NOT SIGNIFICANTLY CHANGED.

@TomAugspurger (Member Author):

Just confirming that when there are multiple output nodes, we do revert to the bad ordering:

import dask
f = lambda *args: None

kwargs = {
    "node_attr": {"penwidth": "4"},
    "cmap": "autumn",
    "color": "order",
}


dsk = {}
for i in range(2):
    dsk[(f"a-{i}", 0)] = (0,)
    dsk[(f"a-{i}", 1)] = (1,)
    dsk[(f"a-{i}", 2)] = (2,)
    dsk[(f"b-{i}", 0)] = (f, (f"a-{i}", 0), (f"a-{i}", 1))
    dsk[(f"b-{i}", 1)] = (f, (f"a-{i}", 1), (f"a-{i}", 2))
    dsk[(f"store-{i}", 0, 0)] = (f"b-{i}", 0)
    dsk[(f"store-{i}", 1, 0)] = (f"b-{i}", 1)

    # right half
    dsk[(f"a-{i}", 3)] = (3,)
    dsk[(f"a-{i}", 4)] = (4,)
    dsk[(f"a-{i}", 5)] = (5,)
    dsk[(f"b-{i}", 2)] = (f, (f"a-{i}", 3), (f"a-{i}", 4))
    dsk[(f"b-{i}", 3)] = (f, (f"a-{i}", 4), (f"a-{i}", 5))
    dsk[(f"store-{i}", 0, 1)] = (f"b-{i}", 2)
    dsk[(f"store-{i}", 1, 1)] = (f"b-{i}", 3)
    dsk[f"store-{i}"] = (
        f,
        (f"store-{i}", 0, 0),
        (f"store-{i}", 1, 0),
        (f"store-{i}", 0, 1),
        (f"store-{i}", 1, 1),
    )

dask.visualize(dsk, filename="multi-out", **kwargs)

(image: multi-out — the multi-output graph colored by its ordering)

This is identical to the ordering on master. We'd like for the 9 and 6 in the second row to swap (and the 24 and 21).

I'll look a bit more at this today, time permitting, but I'm a bit pessimistic that we'll be able to solve it. Simply expanding the check to skip all the root nodes causes other ordering problems.
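To see why the single-root special case doesn't fire here, one can recompute the root nodes of the graph above (a sketch: the construction loop mirrors the snippet in this comment, and `find_keys` is a simplified stand-in for dask's own dependency extraction):

```python
# Sketch: rebuild the multi-output graph from the snippet above and count
# its root nodes (keys nothing depends on). find_keys is a simplified
# stand-in for dask's dependency extraction, not the real implementation.
f = lambda *args: None

dsk = {}
for i in range(2):
    for j in range(6):
        dsk[(f"a-{i}", j)] = (j,)
    dsk[(f"b-{i}", 0)] = (f, (f"a-{i}", 0), (f"a-{i}", 1))
    dsk[(f"b-{i}", 1)] = (f, (f"a-{i}", 1), (f"a-{i}", 2))
    dsk[(f"b-{i}", 2)] = (f, (f"a-{i}", 3), (f"a-{i}", 4))
    dsk[(f"b-{i}", 3)] = (f, (f"a-{i}", 4), (f"a-{i}", 5))
    for k in range(4):
        dsk[(f"store-{i}", k % 2, k // 2)] = (f"b-{i}", k)
    dsk[f"store-{i}"] = (f,) + tuple(
        (f"store-{i}", k % 2, k // 2) for k in range(4)
    )

def find_keys(value, keyset):
    """Collect every graph key referenced inside a task value."""
    if isinstance(value, tuple):
        if value in keyset:
            return {value}
        return set().union(*(find_keys(v, keyset) for v in value), set())
    return set()

keyset = set(dsk)
referenced = set().union(*(find_keys(v, keyset) for v in dsk.values()), set())
root_nodes = {k for k in dsk if k not in referenced}
print(sorted(root_nodes))  # ['store-0', 'store-1'] -- two roots, not one
```

With two root nodes there is no single common root for the check in this PR to skip, which is consistent with the ordering shown above being identical to master.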

@eriknw (Member) commented Nov 9, 2020

I've looked at this again, and agree that doing better than what's currently in this PR would be difficult. I don't see any "easy wins" to be gained from minor tweaks. I'm okay if we merge this PR as is.

@TomAugspurger (Member Author):

Thanks Erik. cc @mrocklin or @jcrist if you have concerns / want to merge.

@mrocklin (Member):

Let us not let better be the enemy of the good.

Merging. Thanks @TomAugspurger, @eriknw, and others


Development

Successfully merging this pull request may close these issues:

Concatenating then rechunking zarr files uses lots of memory