
Redo dask.order.order. Fix #5584. Use structural info, not key names #5646

Merged

jcrist merged 15 commits into dask:master from eriknw:better_order_gh5584 on Jan 14, 2020

Conversation

@eriknw
Member

@eriknw eriknw commented Nov 27, 2019

Closes #5584

This is a rewrite of dask.order.order, but the goals remain the same and many previous lessons were taken into consideration. The new version relies less on the key name by using more metrics and using a different strategy for walking up and down the DAG.

Performance appears to be about the same (often a little faster).

@TomAugspurger also suggested I add some benchmarks to dask-benchmarks.

  • Tests added / passed
  • Passes black dask / flake8 dask

Redo dask.order.order. Fix #5584. Use structural info, not key names.

This is a substantial rewrite of `dask.order.order`, but the goals remain the
same and many previous lessons were taken into consideration.  The new version
relies less on the key name by using more metrics and using a different
strategy for walking up and down the DAG.

Performance appears to be about the same (sometimes a little faster, sometimes
a little slower, and never a lot slower).

I still need to wrap up some cosmetics (doc strings, code comments, etc).

@TomAugspurger also suggested I add some benchmarks to dask-benchmarks.
@mrocklin
Member

Thanks @eriknw . When you get back from the holidays, I would be curious about the following:

  1. What were the workloads that motivated this change?
  2. At a high level, how does the proposed change alter behavior?

The ordering code is fairly core to Dask's functioning today, so changes here should be scrutinized. It would be good if someone else close to this code (@jcrist @jrbourbeau ?) could review as well.

@eriknw
Member Author

eriknw commented Nov 27, 2019

Certainly. I expect this to be thoroughly scrutinized and will provide much more supporting material.

  1. I began by trying to understand what was happening in dask.order.order here: Tasks order and scheduling #5584 (comment). The old order would alternate up and down the DAG in a sometimes self-avoiding way, which resulted in poor ordering. This behavior may not show up in other well-patterned DAGs due to good ordering of similar keys. In the linked issue, the keys were hashes from dask.delayed, which revealed this problem that could easily happen elsewhere.

  2. At a high level, the proposed change doesn't change any previously expected or established behavior (or at least my understanding of it), but it does the ordering more robustly. It relies less on using the key and more on using the structure. The ordering looks good to me in all examples that I looked at from previous issues (but I'm sure I didn't look at or fully understand everything).

Here's a simple example from the tests where the old order failed:

def test_avoid_broker_nodes(abcde):
    r"""

    b0    b1  b2
    |      \  /
    a0      a1

    a0 should be run before a1
    """
    a, b, c, d, e = abcde

    # ... (setup elided from the test)

    # Switch name of 0, 1 for "b"s too
    dsk = {
        (a, 0): (f,),
        (a, 1): (f,),
        (b, 1): (f, (a, 0)),
        (b, 0): (f, (a, 1)),
        (b, 2): (f, (a, 1)),
    }
    o = order(dsk)
    assert o[(a, 0)] < o[(a, 1)]
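To see the structural metric at work outside the test suite, here is a minimal, self-contained sketch (plain Python; `f` is a hypothetical no-op standing in for real task functions) that counts dependents directly from the graph, showing why a0 should run before a1:

```python
def f(*args):
    """Hypothetical no-op standing in for a real task function."""

a, b = "a", "b"
dsk = {
    (a, 0): (f,),
    (a, 1): (f,),
    (b, 1): (f, (a, 0)),
    (b, 0): (f, (a, 1)),
    (b, 2): (f, (a, 1)),
}

# A key's dependents are the tasks whose tuples reference that key.
dependents = {k: 0 for k in dsk}
for task in dsk.values():
    for arg in task[1:]:
        if arg in dsk:
            dependents[arg] += 1

assert dependents[(a, 0)] == 1  # only b1 needs a0
assert dependents[(a, 1)] == 2  # b0 and b2 both need a1
```

Since a1 is held in memory longer (two dependents must consume it), running a0 first is the memory-friendly order, regardless of what the key names happen to be.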

I found the old approach--although clever and short--to be irredeemable when trying to address these issues. I don't mean to sound too critical (the previous approach and all the discussions around it were incredibly helpful); my not improving the old way is probably a lack of imagination on my part, along with my preference for trying something new.

The old expectation:

o = order(dsk)
expected = {"y": 0}
expected.update({k: i + 1 for i, k in enumerate(x_keys)})

was changed to expect:

expected = {"y": 10}
Member

Can you share why this changed?

Member Author

It looked to me like the old code here was wrong: "y" cannot be 0, since it has 10 dependencies.

@eriknw
Member Author

eriknw commented Dec 3, 2019

Here is a notebook showing a few examples:

https://gist.github.com/eriknw/0ab7cf29a4f4f6037a22ede14b7f58b8

@eriknw
Member Author

eriknw commented Dec 3, 2019

This is awaiting further review and scrutiny.

As it turns out, I don't think this PR is all that different in spirit and practice from the original. It makes a few changes to how we traverse the graph:

  1. We introduce two stacks--an inner and an outer--in order to complete the current "tactical target" (an easy-to-compute root) before moving on to the next tactical target. (There are actually three stacks if you count the pool from which to choose the initial node of a weakly connected subgraph.)
  2. We begin at a leaf node, not a root node. Although different, this isn't a significant change in and of itself.
  3. We use three different key functions--where to start, how to choose a dependency, and how to choose a dependent--and these are designed to try to work together. I can discuss these keys in more detail upon request.

This PR adds a lot more structure. Although more complicated, I think this is also more maintainable and modifiable, and I expect to be around a while to address issues that may arise.
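The inner/outer stack interplay in (1) can be sketched in miniature. This is a toy illustration under simplified assumptions (the name `toy_order` and its tie-breaking are hypothetical, not the PR's actual code): the inner stack finishes one tactical target via DFS along dependencies before the outer stack supplies the next entry point.

```python
def toy_order(dsk):
    """Toy priority assignment sketching the inner/outer stack idea.

    A task is a tuple like (func, dep_key, ...); keys appearing in a
    task tuple that are also in dsk are that task's dependencies.
    """
    dependencies = {k: [d for d in t[1:] if d in dsk] for k, t in dsk.items()}
    result = {}
    i = 0
    outer = list(dsk)           # deferred entry points ("tactical targets")
    while outer:
        inner = [outer.pop()]   # finish this target before the next one
        while inner:
            key = inner[-1]
            if key in result:   # already ordered on an earlier pass
                inner.pop()
                continue
            unfinished = [d for d in dependencies[key] if d not in result]
            if unfinished:
                inner.extend(unfinished)  # DFS: descend into dependencies
            else:
                inner.pop()
                result[key] = i           # all dependencies done
                i += 1
    return result

dsk = {"a": ("f",), "b": ("f", "a"), "c": ("f", "a", "b")}
o = toy_order(dsk)
assert o["a"] < o["b"] < o["c"]  # each dependency runs before its dependents
```

The real implementation differs in choosing starting points and in the key functions used at each decision, but the shape--an inner DFS stack nested inside an outer stack of deferred targets--is the same.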

In my benchmarks on my machine, this PR is about the same speed or a little faster than the original. About two thirds of the time is spent in get_dependencies, reverse_dict, ndependencies, and graph_metrics (formerly ndependents).
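For readers unfamiliar with those helpers, here is a rough sketch of the kind of metrics they compute (simplified approximations, not dask's actual implementations):

```python
def reverse_dict(d):
    """Invert key -> dependencies into key -> dependents (a sketch)."""
    result = {k: set() for k in d}
    for k, vals in d.items():
        for v in vals:
            result[v].add(k)
    return result

def ndependencies(dependencies):
    """Number of transitive dependencies of each key, counting itself (a sketch)."""
    result = {}
    for k in dependencies:
        seen = {k}
        stack = [k]
        while stack:
            for dep in dependencies[stack.pop()]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        result[k] = len(seen)
    return result

deps = {"x": [], "y": ["x"], "z": ["x", "y"]}
assert reverse_dict(deps)["x"] == {"y", "z"}
assert ndependencies(deps) == {"x": 1, "y": 2, "z": 3}
```

Metrics like these are computed once up front, which is why they dominate the runtime profile.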

@eriknw
Member Author

eriknw commented Dec 4, 2019

Here is the notebook above run using master. There is at least one serious regression!

https://gist.github.com/eriknw/b8478260d6c883cc3c80a1816a993442

Here's the serious regression:

[image: bad_master]

We should not be calculating all the dependents at the top at the beginning. I'd like to add this as a test.

Avoid sorting or taking the min when there are many, many edges.
This respects the use case here: dask#3298

Minor performance improvements.
@eriknw
Member Author

eriknw commented Dec 6, 2019

I think the people able to review this are pretty busy this week, so here are relevant previous issues and PRs:
#5584
#4310
#4098
#3652
#3588
dask/dask-ml#206
#3554
#3524
#3303
#3298
#3271
#3066
#3056
#3055
#3017
#3013

Also, here are some benchmarks.

https://gist.github.com/eriknw/82c204b7923f079e267ab245f7228f0a

For these on my machine, the current PR is always faster than master, and sometimes significantly (3x-10x) faster (perhaps when there are many more edges than nodes?). To be fair, some of the performance improvements in this PR could also be applied to master.

To reiterate, master has had regressions based on previously established and expected behavior. I believe this PR fixes these. Okay, I'll try to not revisit this issue until it's been reviewed!

@TomAugspurger
Member

TomAugspurger commented Dec 6, 2019

Are you able to reproduce the failing test locally?

=================================== FAILURES ===================================

_____________________________ test_describe_empty ______________________________

[gw2] linux -- Python 3.6.7 /home/travis/miniconda/envs/test-environment/bin/python

I don't really see how it would be related.

Also, if you're able to, adding benchmarks to https://github.com/dask/dask-benchmarks/ can be done before this is merged. Those should be relatively easy to review.

@eriknw
Member Author

eriknw commented Dec 6, 2019

Aha, I can reproduce the failing test:

import dask
import pandas as pd
import dask.dataframe as dd
df_len0 = pd.DataFrame({"A": [], "B": []})
ddf_len0 = dd.from_pandas(df_len0, 2)
raise_when_computed = ddf_len0.describe(percentiles_method="dask")

We expect raise_when_computed to, uh, raise when computed. However, this PR modifies the order in which the tasks are run, so a different exception is raised.

Here is the DAG on master:
[image: graph-master]

and here is the DAG on this PR:
[image: graph-pr]

I don't have a strong opinion which DAG to prefer. I can modify dependencies_key to once again include -total_dependencies[x] (which is used in master) to go where the work is. Actually, yeah, let me do that and add this DAG to the order tests.

@eriknw
Member Author

eriknw commented Dec 10, 2019

Okay, so I just tried (and decided against) two things that are in master that aren't part of this PR:

  1. When performing DFS along dependencies, master may decide to proceed to a different dependent. This is sometimes desirable, but this behavior is also what causes many of the problematic behaviors in master.
    - In this PR, we can do this by creating a new inner_stack and appending the old inner_stack to a list when we decide there's a better path. However, I found it was difficult to accurately determine when it's desirable to do this. Adding this behavior may result in better ordering without significantly impacting the performance, but, for now, I think it's not worth the complexity.
  2. When deciding where to start the next DFS along dependencies, master considers dependents of finished nodes in a LIFO manner.
    - This PR does this in a FIFO manner. I experimented with deciding between FIFO and LIFO based on some criteria. This didn't work well.

I just added a new behavior to this PR. When we encounter a node that has dangling dependents--that is, nodes that are able to be computed and have no dependents--we always compute all of these nodes. We may then proceed to a different dependent. This appears to improve the ordering of graphs I've been looking at, and it seems to largely overcome the lack of having the above two behaviors.

I don't think FIFO instead of LIFO behavior will be a significant shortcoming in practice. In fact, I think FIFO is useful given the lack of (1) above--FIFO gives us a chance to go back and visit branches that we maybe should have gone down instead. I also think FIFO will work well with the distributed and multithreaded scheduler.
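The FIFO-vs-LIFO distinction boils down to which end of the pool of deferred dependents gets popped next (node names below are hypothetical; this is an illustration, not the PR's code):

```python
from collections import deque

# As nodes finish, their ready-but-deferred dependents are appended.
pool = deque(["d1", "d2", "d3"])

# LIFO (master's behavior): keep drilling into the most recently deferred branch.
lifo_next = pool[-1]
# FIFO (this PR): revisit the oldest deferred branch first, giving earlier
# branches a chance to be completed before newer ones.
fifo_next = pool[0]

assert (lifo_next, fifo_next) == ("d3", "d1")
```

A deque makes both policies O(1), so the choice is purely about ordering quality, not speed.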

@jcrist I know this is a lot to digest. Let me know if you want to meet to go over this (and no rush).

eriknw added a commit to eriknw/dask-benchmarks that referenced this pull request Dec 13, 2019
Member

@jcrist jcrist left a comment

Erik and I talked this through IRL, and I'm quite happy with these changes. The new ordering is both better for these example problems, and faster in most cases (sometimes significantly). I have one comment on clarifying a docstring, but otherwise this LGTM.

3. The maximum value of the maximum number of dependencies of
all final dependencies (see module-level comment for more)
4. The minimum height from a root node
5. The maximum height from a root node
Member

I find the description of 2 and 3 difficult to understand, perhaps an ascii diagram here would be useful to clarify these metrics?

Member Author

docstring updated. Hopefully better.

Member

@jcrist does this look OK now?

@jcrist
Member

jcrist commented Jan 14, 2020

Thanks @eriknw, this looks good to me. Merging!

@jcrist jcrist merged commit c63dc14 into dask:master Jan 14, 2020
TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Mar 26, 2020