Conversation
|
So this test seems to conflict with the following situation:

```python
def test_avoid_upwards_branching(abcde):
    """
      a1
      |
      a2
      |
      a3    d1
     /  \  /
    b1    c1
    |     |
    b2    c2
          |
          c3

    Prefer b1 over c1 because it won't stick around waiting for d1 to complete
    """
    a, b, c, d, e = abcde
    dsk = {(a, 1): (f, (a, 2)),
           (a, 2): (f, (a, 3)),
           (a, 3): (f, (b, 1), (c, 1)),
           (b, 1): (f, (b, 2)),
           (c, 1): (f, (c, 2)),
           (c, 2): (f, (c, 3)),
           (d, 1): (f, (c, 1))}

    o = order(dsk)
```

In this test we want to select b1 before c1. This brings up the question of not doing a top-down depth-first traversal, but instead something that wanders around a bit more, or that swaps values during traversal.
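The "top-down depth-first" traversal mentioned above can be sketched in plain Python. This toy prioritizer is an illustration only, not dask.order's actual algorithm; the dependency dict and key names are hypothetical stand-ins for the test graph:

```python
# Toy depth-first prioritizer: walk from the output key toward its
# dependencies, assigning increasing priority numbers as we go.
# This is a simplified illustration, not dask.order's real algorithm.
def dfs_order(dependencies, start):
    result = {}
    stack = [start]
    seen = set()
    while stack:
        key = stack.pop()
        if key in seen:
            continue
        seen.add(key)
        result[key] = len(result)
        # The last dependency pushed is visited first.
        stack.extend(dependencies.get(key, ()))
    return result

# The graph from the test above, written as {key: list_of_dependencies}
deps = {
    'a1': ['a2'], 'a2': ['a3'], 'a3': ['b1', 'c1'],
    'b1': ['b2'], 'c1': ['c2'], 'c2': ['c3'],
    'd1': ['c1'],
}
o = dfs_order(deps, 'a1')
# A plain depth-first walk dives down one branch (here c1's, since it was
# pushed last) before ever touching b1 -- the behavior the test questions.
```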
|
OK, there is a solution here that solves a number of problems modestly well. It sacrifices perfection in a few cases for "good enough" in more. The traversal is more complex now and may be more expensive; I'll have to do some benchmarking. Here are the visualizations for a couple of troublesome graphs:

```python
import matplotlib.pyplot as plt
import dask.array as da

n = 10
x = da.random.normal(size=(n, 100), chunks=(1, 100))
y = da.random.normal(size=(n,), chunks=(1,))
xy = (x * y[:, None]).cumsum(axis=0)
xx = (x[:, None, :] * x[:, :, None]).cumsum(axis=0)
beta = da.stack([da.linalg.solve(xx[i], xy[i]) for i in range(xx.shape[0])],
                axis=0)
ey = (x * beta).sum(axis=1)
ey.visualize('dask.png', color='order', node_attr=dict(penwidth='8'),
             optimize_graph=True, cmap=plt.cm.RdBu)
```

```python
A, B = 10, 99
x = da.random.normal(size=(A, B), chunks=(1, None))
for _ in range(2):
    y = (x[:, None, :] * x[:, :, None]).cumsum(axis=0)
    x = x.cumsum(axis=0)
w = (y * x[:, None]).sum(axis=(1, 2))
w.visualize('gh3055.png', color='order', node_attr=dict(penwidth='8'),
            cmap=plt.cm.RdBu)
```
|
Anecdotal timing information:

```python
%%time
n = 100
x = da.random.normal(size=(n, 100), chunks=(1, 100))
y = da.random.normal(size=(n,), chunks=(1,))
xy = (x * y[:, None]).cumsum(axis=0)
xx = (x[:, None, :] * x[:, :, None]).cumsum(axis=0)
beta = da.stack([da.linalg.solve(xx[i], xy[i]) for i in range(xx.shape[0])],
                axis=0)
ey = (x * beta).sum(axis=1)
```

```
CPU times: user 239 ms, sys: 4.21 ms, total: 243 ms
Wall time: 233 ms
```

```python
%time dsk = dict(ey.__dask_graph__())
```

```
CPU times: user 74.1 ms, sys: 0 ns, total: 74.1 ms
Wall time: 73.2 ms
```

Master (25 ms):

```python
%timeit _ = order(dsk)
```

```
10 loops, best of 3: 25.9 ms per loop
```

This branch (39 ms):

```python
%timeit _ = order(dsk)
```

```
10 loops, best of 3: 39.3 ms per loop
```

This is non-trivial, but possibly still worth it.
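The `%timeit` numbers above come from IPython; the same measurement can be made outside a notebook with the stdlib `timeit` module. A sketch of the pattern, using a hypothetical stand-in function (substitute `dask.order.order` and the real `dict(ey.__dask_graph__())` when actually benchmarking):

```python
import timeit

def order_stub(dsk):
    # Stand-in for dask.order.order: maps each key to a priority number.
    return {key: i for i, key in enumerate(dsk)}

# A toy graph dict standing in for dict(ey.__dask_graph__())
dsk = {('x', i): (sum, [i, i + 1]) for i in range(1000)}

# number=10 mirrors "10 loops"; take the best of 3 repeats, like %timeit
per_call = min(timeit.repeat(lambda: order_stub(dsk), number=10, repeat=3)) / 10
print(f"{per_call * 1000:.3f} ms per call")
```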
Force-pushed from 5fc13fa to fcc0af0.
This depends on dask/dask#3066 to pass
|
Merging this soon if there are no further comments. Alternatively @jcrist, you're probably the most likely to review this. Have you had a chance to look it over? |
|
Looking now. |
jcrist
left a comment
Overall this looks good to me. I can't comment on whether the algorithm change is a benefit overall, but the new tests look good and the implementation seems clean.
```python
ndependencies(dependencies, dependents)
```

```python
@pytest.mark.xfail(reason="Can't please 'em all")
```

dask/tests/test_order.py (Outdated):

```diff
  o = order(dsk)
  L = [o[k] for k in w.__dask_keys__()]
- assert sorted(L) == L[::-1]
+ assert sorted(L) == L[::-1] or sorted(L) == L
```
Is this non-determinism necessary?
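For context, the assertion under discussion checks that the priorities along `w`'s output keys come out monotone. A small illustration with hypothetical priority lists:

```python
# sorted(L) == L[::-1] holds exactly when L is sorted in decreasing order;
# the relaxed form also accepts increasing order. Values are illustrative.
L_desc = [5, 3, 1]
L_asc = [1, 3, 5]

assert sorted(L_desc) == L_desc[::-1]                          # decreasing: passes
assert sorted(L_asc) == L_asc[::-1] or sorted(L_asc) == L_asc  # relaxed form: passes
assert not (sorted(L_asc) == L_asc[::-1])                      # strict form rejects increasing
```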
|
@jcrist ok to merge on passed tests? |
|
Fine by me. |
* Remove worker prioritization. We now trust the scheduler's priority entirely.
* Keep previously xfailed test. This depends on dask/dask#3066 to pass.
* Use defaultdicts in GroupProgress.


- [ ] Passes `flake8 dask`
- [ ] `docs/source/changelog.rst` for all changes and one of the `docs/source/*-api.rst` files for new API