Closed
Description
I'm trying to track down a graph prioritization/ordering problem: tasks are being run in an order that is not conducive to low memory use. Here is a minimal example provided by a collaborator:
import dask.array as da
# n, k = 250, 10000 # big enough to fill memory
n, k = 5, 100 # for experimentation
x = da.random.normal(size=(n, k), chunks=(1, k))
y = da.random.normal(size=(n,), chunks=(1,))
xy = (x * y[:, None]).cumsum(axis=0)
xx = (x[:, None, :] * x[:, :, None]).cumsum(axis=0)
beta = da.stack([da.linalg.solve(xx[i], xy[i]) for i in range(xx.shape[0])],
                axis=0)
ey = (x * beta).sum(axis=1)
Trace
This results in the following trace:
Created by
from trace import Track
Track(save_every=1, rankdir='LR', path='every-svg', format='png', node_attr={'penwidth': '6'}).register()
import dask
ey.compute(get=dask.get)
Trace provided here: https://gist.github.com/5b7bf78621496697fa3001462e1910ae
Looking at order
If we look just at the prioritization created by dask.order.order, we find some unfortunate jumps around the graph. Ideally the ordering would progress smoothly through the graph.
Created by
import matplotlib.pyplot as plt
import dask.order
from dask.dot import dot_graph
from dask.core import get_dependencies, reverse_dict
def colorize(t):
    # Pack an (r, g, b[, a]) byte tuple into a zero-padded "#rrggbb" string.
    r, g, b = (int(v) for v in t[:3])
    return "#{:02x}{:02x}{:02x}".format(r, g, b)


def color_dict(o, cmap=plt.cm.viridis):
    # Map each key to a color according to its priority relative to the maximum.
    mx = max(o.values())
    colors = {k: colorize(cmap(v / mx, bytes=True)) for k, v in o.items()}
    return colors


def visualize_colors(dsk, filename='dask.pdf', cmap=plt.cm.viridis, **kwargs):
    # Color and label every node of the graph by its dask.order.order priority.
    o = dask.order.order(dsk)
    colors = color_dict(o, cmap=cmap)
    dot_graph(dsk, filename=filename,
              function_attributes={k: {'color': v, 'label': str(o[k])} for k, v in colors.items()},
              data_attributes={k: {'color': v} for k, v in colors.items()},
              node_attr={'penwidth': '6'}, **kwargs)
(depends on #2987)
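Besides eyeballing the colored graph, one rough way to put a number on how much an ordering "jumps around" is to simulate running the tasks in priority order and count how many results are held at once. The sketch below is pure Python and is not part of dask: `max_live_results`, the toy `deps` graph, and the two orderings are all hypothetical names made up for illustration. It assumes every result costs one unit of memory, that a result is released as soon as all of its dependents have run, and that final outputs are kept.

```python
def max_live_results(dependencies, order):
    """Peak number of results held at once when tasks run by increasing priority.

    dependencies: dict mapping key -> set of keys it depends on
    order: dict mapping key -> integer priority (lower runs first)
    """
    # Invert the mapping: key -> set of tasks that consume its result.
    dependents = {k: set() for k in dependencies}
    for key, deps in dependencies.items():
        for dep in deps:
            dependents[dep].add(key)

    # Number of consumers of each key that have not run yet.
    waiting = {k: len(v) for k, v in dependents.items()}
    live = set()
    peak = 0
    for key in sorted(dependencies, key=order.__getitem__):
        missing = dependencies[key] - live
        if missing:
            raise ValueError("%r runs before its inputs %r" % (key, missing))
        live.add(key)
        peak = max(peak, len(live))
        # Release inputs that no remaining task needs.
        for dep in dependencies[key]:
            waiting[dep] -= 1
            if waiting[dep] == 0:
                live.discard(dep)
    return peak


# Toy cumsum-like graph: each acc_i combines the previous result with a new load.
deps = {
    "l0": set(), "l1": set(), "l2": set(), "l3": set(),
    "acc1": {"l0", "l1"},
    "acc2": {"acc1", "l2"},
    "acc3": {"acc2", "l3"},
}
# Ordering that consumes each load right after producing it.
smooth = {k: i for i, k in enumerate(["l0", "l1", "acc1", "l2", "acc2", "l3", "acc3"])}
# Ordering that produces every load before consuming any of them.
jumpy = {k: i for i, k in enumerate(["l0", "l1", "l2", "l3", "acc1", "acc2", "acc3"])}

smooth_peak = max_live_results(deps, smooth)
jumpy_peak = max_live_results(deps, jumpy)
print(smooth_peak, jumpy_peak)
```

On this toy graph the ordering that loads everything up front holds more results at its peak than the one that consumes each load immediately. For a real graph, the same function could presumably be fed the output of dask.order.order together with a dependency mapping built with get_dependencies, though equal-sized results are a crude stand-in for actual chunk sizes.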

