
Redo dask.order.order. Fix #5584. Use structural info, not key names #5646

Merged

jcrist merged 15 commits into dask:master from eriknw:better_order_gh5584 on Jan 14, 2020

Conversation

@eriknw
Member

@eriknw eriknw commented Nov 27, 2019

Closes #5584

This is a rewrite of dask.order.order, but the goals remain the same and many previous lessons were taken into consideration. The new version relies less on the key name by using more metrics and using a different strategy for walking up and down the DAG.

Performance appears to be about the same (often a little faster).

@TomAugspurger also suggested I add some benchmarks to dask-benchmarks.

  • Tests added / passed
  • Passes black dask / flake8 dask

Redo dask.order.order. Fix #5584. Use structural info, not key names.

This is a substantial rewrite of `dask.order.order`, but the goals remain the
same and many previous lessons were taken into consideration.  The new version
relies less on the key name by using more metrics and using a different
strategy for walking up and down the DAG.

Performance appears to be about the same (sometimes a little faster, sometimes
a little slower, and never a lot slower).

I still need to wrap up some cosmetics (doc strings, code comments, etc).

@TomAugspurger also suggested I add some benchmarks to dask-benchmarks.
@mrocklin
Member

Thanks @eriknw . When you get back from the holidays, I would be curious about the following:

  1. What were the workloads that motivated this change?
  2. At a high level, how does the proposed change alter behavior?

The ordering code is fairly core to Dask's functioning today, so changes here should be scrutinized. It would be good if someone else close to this code (@jcrist @jrbourbeau ?) could review as well.

@eriknw
Member Author

eriknw commented Nov 27, 2019

Certainly. I expect this to be thoroughly scrutinized and will provide much more supporting material.

  1. I began by trying to understand what was happening in dask.order.order here: Tasks order and scheduling #5584 (comment). The old order would alternate up and down the DAG in a sometimes self-avoiding way, which resulted in poor ordering. This behavior may not show up in other well-patterned DAGs due to good ordering of similar keys. In the linked issue, the keys were hashes from dask.delayed, which revealed this problem that could easily happen elsewhere.

  2. At a high level, the proposed change doesn't change any previously expected or established behavior (or at least my understanding of it), but it does the ordering more robustly. It relies less on using the key and more on using the structure. The ordering looks good to me in all examples that I looked at from previous issues (but I'm sure I didn't look at or fully understand everything).

Here's a simple example from the tests where the old order failed:

def test_avoid_broker_nodes(abcde):
    r"""

    b0    b1  b2
    |      \  /
    a0      a1

    a0 should be run before a1
    """
    a, b, c, d, e = abcde

    # ... (setup elided from the test)

    # Switch name of 0, 1 for "b"s too
    dsk = {
        (a, 0): (f,),
        (a, 1): (f,),
        (b, 1): (f, (a, 0)),
        (b, 0): (f, (a, 1)),
        (b, 2): (f, (a, 1)),
    }
    o = order(dsk)
    assert o[(a, 0)] < o[(a, 1)]
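To see the structural metric at work outside the test suite, here is a minimal, self-contained sketch (plain Python; `f` is a hypothetical no-op standing in for real task functions) that counts dependents directly from the graph, showing why a0 should run before a1:

```python
def f(*args):
    """Hypothetical no-op standing in for a real task function."""

a, b = "a", "b"
dsk = {
    (a, 0): (f,),
    (a, 1): (f,),
    (b, 1): (f, (a, 0)),
    (b, 0): (f, (a, 1)),
    (b, 2): (f, (a, 1)),
}

# A key's dependents are the tasks whose tuples reference that key.
dependents = {k: 0 for k in dsk}
for task in dsk.values():
    for arg in task[1:]:
        if arg in dsk:
            dependents[arg] += 1

assert dependents[(a, 0)] == 1  # only b1 needs a0
assert dependents[(a, 1)] == 2  # b0 and b2 both need a1
```

Since a1 is held in memory longer (two dependents must consume it), running a0 first is the memory-friendly order, regardless of what the key names happen to be.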

I found the old approach--although clever and short--to be irredeemable when trying to address these issues. I don't mean to sound too critical (the previous approach and all the discussions around it were incredibly helpful); my not improving the old way is probably a lack of imagination on my part, along with my preference for trying something new.

The old expectation:

o = order(dsk)
expected = {"y": 0}
expected.update({k: i + 1 for i, k in enumerate(x_keys)})

was changed to expect:

expected = {"y": 10}
Member

Can you share why this changed?

Member Author

It looked to me like the old code here was wrong: "y" cannot be 0, since it has 10 dependencies.

@eriknw
Member Author

eriknw commented Dec 3, 2019

Here is a notebook showing a few examples:

https://gist.github.com/eriknw/0ab7cf29a4f4f6037a22ede14b7f58b8

@eriknw
Member Author

eriknw commented Dec 3, 2019

This is awaiting further review and scrutiny.

As it turns out, I don't think this PR is all that different in spirit and practice from the original. It makes a few changes to how we traverse the graph:

  1. We introduce two stacks--an inner and an outer--in order to complete the current "tactical target" (an easy-to-compute root) before moving on to the next tactical target. (There are actually three stacks if you count the pool from which to choose the initial node of a weakly connected subgraph.)
  2. We begin at a leaf node, not a root node. Although different, this isn't a significant change in and of itself.
  3. We use three different key functions--where to start, how to choose a dependency, and how to choose a dependent--and these are designed to try to work together. I can discuss these keys in more detail upon request.

This PR adds a lot more structure. Although more complicated, I think this is also more maintainable and modifiable, and I expect to be around a while to address issues that may arise.
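The inner/outer stack interplay in (1) can be sketched in miniature. This is a toy illustration under simplified assumptions (the name `toy_order` and its tie-breaking are hypothetical, not the PR's actual code): the inner stack finishes one tactical target via DFS along dependencies before the outer stack supplies the next entry point.

```python
def toy_order(dsk):
    """Toy priority assignment sketching the inner/outer stack idea.

    A task is a tuple like (func, dep_key, ...); keys appearing in a
    task tuple that are also in dsk are that task's dependencies.
    """
    dependencies = {k: [d for d in t[1:] if d in dsk] for k, t in dsk.items()}
    result = {}
    i = 0
    outer = list(dsk)           # deferred entry points ("tactical targets")
    while outer:
        inner = [outer.pop()]   # finish this target before the next one
        while inner:
            key = inner[-1]
            if key in result:   # already ordered on an earlier pass
                inner.pop()
                continue
            unfinished = [d for d in dependencies[key] if d not in result]
            if unfinished:
                inner.extend(unfinished)  # DFS: descend into dependencies
            else:
                inner.pop()
                result[key] = i           # all dependencies done
                i += 1
    return result

dsk = {"a": ("f",), "b": ("f", "a"), "c": ("f", "a", "b")}
o = toy_order(dsk)
assert o["a"] < o["b"] < o["c"]  # each dependency runs before its dependents
```

The real implementation differs in choosing starting points and in the key functions used at each decision, but the shape--an inner DFS stack nested inside an outer stack of deferred targets--is the same.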

In my benchmarks on my machine, this PR is about the same speed or a little faster than the original. About two thirds of the time is spent in get_dependencies, reverse_dict, ndependencies, and graph_metrics (formerly ndependents).
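For readers unfamiliar with those helpers, here is a rough sketch of the kind of metrics they compute (simplified approximations, not dask's actual implementations):

```python
def reverse_dict(d):
    """Invert key -> dependencies into key -> dependents (a sketch)."""
    result = {k: set() for k in d}
    for k, vals in d.items():
        for v in vals:
            result[v].add(k)
    return result

def ndependencies(dependencies):
    """Number of transitive dependencies of each key, counting itself (a sketch)."""
    result = {}
    for k in dependencies:
        seen = {k}
        stack = [k]
        while stack:
            for dep in dependencies[stack.pop()]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        result[k] = len(seen)
    return result

deps = {"x": [], "y": ["x"], "z": ["x", "y"]}
assert reverse_dict(deps)["x"] == {"y", "z"}
assert ndependencies(deps) == {"x": 1, "y": 2, "z": 3}
```

Metrics like these are computed once up front, which is why they dominate the runtime profile.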

@eriknw
Member Author

eriknw commented Dec 4, 2019

Here is the notebook above run using master. There is at least one serious regression!

https://gist.github.com/eriknw/b8478260d6c883cc3c80a1816a993442

Here's the serious regression:

[image: bad_master]

We should not be calculating all the dependents at the top at the beginning. I'd like to add this as a test.

Avoid sorting or taking the min when there are many, many edges.
This respects the use case here: dask#3298

Minor performance improvements.
@eriknw
Member Author

eriknw commented Dec 6, 2019

I think the people able to review this are pretty busy this week, so here are relevant previous issues and PRs:
#5584
#4310
#4098
#3652
#3588
dask/dask-ml#206
#3554
#3524
#3303
#3298
#3271
#3066
#3056
#3055
#3017
#3013

Also, here are some benchmarks.

https://gist.github.com/eriknw/82c204b7923f079e267ab245f7228f0a

For these on my machine, the current PR is always faster than master, and sometimes significantly (3x-10x) faster (perhaps when there are many more edges than nodes?). To be fair, some of the performance improvements in this PR could also be applied to master.

To reiterate, master has had regressions based on previously established and expected behavior. I believe this PR fixes these. Okay, I'll try to not revisit this issue until it's been reviewed!

@TomAugspurger
Member

TomAugspurger commented Dec 6, 2019

Are you able to reproduce the failing test locally?

=================================== FAILURES ===================================

_____________________________ test_describe_empty ______________________________

[gw2] linux -- Python 3.6.7 /home/travis/miniconda/envs/test-environment/bin/python

I don't really see how it would be related.

Also, if you're able to, adding benchmarks to https://github.com/dask/dask-benchmarks/ can be done before this is merged. Those should be relatively easy to review.

@eriknw
Member Author

eriknw commented Dec 6, 2019

Aha, I can reproduce the failing test:

import dask
import pandas as pd
import dask.dataframe as dd
df_len0 = pd.DataFrame({"A": [], "B": []})
ddf_len0 = dd.from_pandas(df_len0, 2)
raise_when_computed = ddf_len0.describe(percentiles_method="dask")

We expect raise_when_computed to, uh, raise when computed. However, this PR modifies the order in which the tasks are run, so a different exception is raised.

Here is the DAG on master:
[image: graph-master]

and here is the DAG on this PR:
[image: graph-pr]

I don't have a strong opinion which DAG to prefer. I can modify dependencies_key to once again include -total_dependencies[x] (which is used in master) to go where the work is. Actually, yeah, let me do that and add this DAG to the order tests.

@eriknw
Member Author

eriknw commented Dec 10, 2019

Okay, so I just tried (and decided against) two things that are in master that aren't part of this PR:

  1. When performing DFS along dependencies, master may decide to proceed to a different dependent. This is sometimes desirable, but this behavior is also what causes many of the problematic behaviors in master.
    - In this PR, we can do this by creating a new inner_stack and appending the old inner_stack to a list when we decide there's a better path. However, I found it was difficult to accurately determine when it's desirable to do this. Adding this behavior may result in better ordering without significantly impacting the performance, but, for now, I think it's not worth the complexity.
  2. When deciding where to start the next DFS along dependencies, master considers dependents of finished nodes in a LIFO manner.
    - This PR does this in a FIFO manner. I experimented with deciding between FIFO and LIFO based on some criteria. This didn't work well.

I just added a new behavior to this PR. When we encounter a node that has dangling dependents--that is, nodes that are able to be computed and have no dependents--we always compute all of these nodes. We may then proceed to a different dependent. This appears to improve the ordering of graphs I've been looking at, and it seems to largely overcome the lack of having the above two behaviors.

I don't think FIFO instead of LIFO behavior will be a significant shortcoming in practice. In fact, I think FIFO is useful given the lack of (1) above--FIFO gives us a chance to go back and visit branches that we maybe should have gone down instead. I also think FIFO will work well with the distributed and multithreaded scheduler.
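The FIFO-vs-LIFO distinction boils down to which end of the pool of deferred dependents gets popped next (node names below are hypothetical; this is an illustration, not the PR's code):

```python
from collections import deque

# As nodes finish, their ready-but-deferred dependents are appended.
pool = deque(["d1", "d2", "d3"])

# LIFO (master's behavior): keep drilling into the most recently deferred branch.
lifo_next = pool[-1]
# FIFO (this PR): revisit the oldest deferred branch first, giving earlier
# branches a chance to be completed before newer ones.
fifo_next = pool[0]

assert (lifo_next, fifo_next) == ("d3", "d1")
```

A deque makes both policies O(1), so the choice is purely about ordering quality, not speed.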

@jcrist I know this is a lot to digest. Let me know if you want to meet to go over this (and no rush).

eriknw added a commit to eriknw/dask-benchmarks that referenced this pull request Dec 13, 2019
Member

@jcrist jcrist left a comment

Erik and I talked this through IRL, and I'm quite happy with these changes. The new ordering is both better for these example problems, and faster in most cases (sometimes significantly). I have one comment on clarifying a docstring, but otherwise this LGTM.

3. The maximum value of the maximum number of dependencies of
all final dependencies (see module-level comment for more)
4. The minimum height from a root node
5. The maximum height from a root node
Member

I find the description of 2 and 3 difficult to understand, perhaps an ascii diagram here would be useful to clarify these metrics?

Member Author

docstring updated. Hopefully better.

Member

@jcrist does this look OK now?

@jcrist
Member

jcrist commented Jan 14, 2020

Thanks @eriknw, this looks good to me. Merging!

@jcrist jcrist merged commit c63dc14 into dask:master Jan 14, 2020
TomAugspurger added a commit to TomAugspurger/dask that referenced this pull request Mar 26, 2020