Graph prioritization problem #3013

@mrocklin

I'm trying to track down a graph prioritization/ordering problem: tasks are being run in an order that is not conducive to low memory use. Here is a minimal example provided by a collaborator:

import dask.array as da

# n, k = 250, 10000  # big enough to fill memory
n, k = 5, 100  # for experimentation
x  = da.random.normal(size=(n, k), chunks=(1, k))
y  = da.random.normal(size=(n,), chunks=(1,))
xy = (x * y[:, None]).cumsum(axis=0)
xx = (x[:, None, :] * x[:, :, None]).cumsum(axis=0)
beta = da.stack([da.linalg.solve(xx[i], xy[i]) for i in range(xx.shape[0])],
        axis=0)
ey = (x * beta).sum(axis=1)
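
To make "not conducive to low memory use" concrete, here is a rough, dask-free sketch. It replays a toy cumulative-sum graph in two different orders and counts how many results are held at once. The graph, the task names, and the `peak_live_results` helper are made up for illustration; this is not how the dask scheduler tracks memory.

```python
def peak_live_results(dependencies, order):
    """Return the peak number of task results held at once.

    dependencies: dict mapping task -> set of tasks it needs (toy graph)
    order: list of tasks in execution order (assumed topological)
    """
    # Count how many dependents still need each result.
    remaining = {k: 0 for k in dependencies}
    for deps in dependencies.values():
        for d in deps:
            remaining[d] += 1

    live = set()
    peak = 0
    for task in order:
        live.add(task)               # the new result is now in memory
        peak = max(peak, len(live))  # inputs and output coexist while running
        for d in dependencies[task]:
            remaining[d] -= 1
            if remaining[d] == 0:    # nobody else needs it, so release it
                live.discard(d)
    return peak


# Hypothetical cumsum-like graph: c1 = a0 + a1, c2 = c1 + a2, c3 = c2 + a3
deps = {
    "a0": set(), "a1": set(), "a2": set(), "a3": set(),
    "c1": {"a0", "a1"},
    "c2": {"c1", "a2"},
    "c3": {"c2", "a3"},
}

smooth = ["a0", "a1", "c1", "a2", "c2", "a3", "c3"]  # load just before use
jumpy = ["a0", "a1", "a2", "a3", "c1", "c2", "c3"]   # load everything first

peak_smooth = peak_live_results(deps, smooth)  # 3
peak_jumpy = peak_live_results(deps, jumpy)    # 5
```

In this toy model the smooth order stays at three live results no matter how many chunks are added, while the load-everything-first order grows with the number of chunks.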

Trace

This results in the following trace:

Created by

from trace import Track
Track(save_every=1, rankdir='LR', path='every-svg', format='png',
      node_attr={'penwidth': '6'}).register()

import dask
ey.compute(get=dask.get)

Trace provided here: https://gist.github.com/5b7bf78621496697fa3001462e1910ae

Looking at order

If we look just at the prioritization created by dask.order.order, we find that there are some unfortunate jumps around the graph. We would want the ordering to be fairly smooth.

Created by

import matplotlib.pyplot as plt
import dask.order
from dask.dot import dot_graph
from dask.core import get_dependencies, reverse_dict

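def colorize(t):
    # Convert an RGB(A) tuple of 0-255 values to a "#rrggbb" hex string.
    # Zero-pad each channel so small values still give six hex digits.
    r, g, b = (int(v) for v in t[:3])
    return "#{:02x}{:02x}{:02x}".format(r, g, b)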


def color_dict(o, cmap=plt.cm.viridis):
    mx = max(o.values())
    colors = {k: colorize(cmap(v / mx, bytes=True)) for k, v in o.items()}
    return colors


def visualize_colors(dsk, filename='dask.pdf', cmap=plt.cm.viridis, **kwargs):
    # Color each node by its dask.order.order priority; label functions with it
    o = dask.order.order(dsk)
    colors = color_dict(o, cmap=cmap)

    dot_graph(dsk, filename=filename,
              function_attributes={k: {'color': v, 'label': str(o[k])}
                                   for k, v in colors.items()},
              data_attributes={k: {'color': v} for k, v in colors.items()},
              node_attr={'penwidth': '6'}, **kwargs)

(depends on #2987)
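
Besides coloring the graph, one rough way to put a number on the "jumps" is to take a priority dict of the same shape that dask.order.order returns (key to integer priority) and measure how long each result has to wait for its last dependent. This is only a sketch: the `result_lifetimes` helper, the toy graph, and the metric itself are assumptions for illustration and are not part of dask.

```python
def result_lifetimes(dependencies, order):
    """Map each task to the number of priority steps its result is kept.

    dependencies: dict mapping task -> set of tasks it needs (toy graph)
    order: dict mapping task -> integer priority (lower runs first)
    """
    # A task with no dependents counts as "last used" when it runs.
    last_use = dict(order)
    for task, deps in dependencies.items():
        for d in deps:
            last_use[d] = max(last_use[d], order[task])
    return {k: last_use[k] - order[k] for k in order}


# Hypothetical cumsum-like graph: c1 = a0 + a1, c2 = c1 + a2, c3 = c2 + a3
deps = {
    "a0": set(), "a1": set(), "a2": set(), "a3": set(),
    "c1": {"a0", "a1"},
    "c2": {"c1", "a2"},
    "c3": {"c2", "a3"},
}

smooth = {"a0": 0, "a1": 1, "c1": 2, "a2": 3, "c2": 4, "a3": 5, "c3": 6}
jumpy = {"a0": 0, "a1": 1, "a2": 2, "a3": 3, "c1": 4, "c2": 5, "c3": 6}

worst_smooth = max(result_lifetimes(deps, smooth).values())  # 2
worst_jumpy = max(result_lifetimes(deps, jumpy).values())    # 4
```

A large lifetime means a result sits in memory while unrelated tasks run, which is the kind of jump the colored graph is meant to make visible.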
