Add more ordering diagnostics to dask.visualize #7992
I added these to help investigate dask#7929. The names could probably be better. This also needs to be documented and tested, but I thought I'd share, because, you know, pretty plots! I think all options here are potentially useful. Anything else we might want to show? Speaking of pretty plots, I'll share some tomorrow. Good night!
Also, update `visualize` so that `color="pressure"` includes memory usage both when the task is run (shown on the function) and when the data is released (shown on the data).
|
As promised, here are a few graphs. These show memory pressure: the number of dependencies that are held in memory when a task is run (shown in the function circle) and when the data of a task is released (shown in the data rectangle).
Here's a simple example to get you started:
Note that all these are created with
Another one that is slightly more complicated:
And another:
And a more complicated one:
And the example from #7929 (main branch):
And the same one using PR 7929:
To compare the two graphs above more directly, it would be nice to have them on the same scale. So, we can now pass
I think this is pretty nice. I don't know why I didn't whip this up long ago! |
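For anyone reading along without the plots, here is a minimal, dask-free sketch of what "memory pressure" means for a given ordering. It is a simplified model, not the code in this PR: the function name `memory_pressure`, the `deps` and `order` arguments, and the toy task names are all hypothetical. It assumes every result counts as one unit of memory and that a result with no dependents is released as soon as it is produced.

```python
def memory_pressure(deps, order):
    """Return, for each task in `order`, how many results are in memory when it starts.

    deps  : dict mapping task name -> iterable of task names it depends on
    order : list of all task names in the order they are executed
    (Hypothetical helper for illustration; not part of dask.)
    """
    # Number of dependents that still need each task's result.
    waiting = {task: 0 for task in deps}
    for task, task_deps in deps.items():
        for dep in set(task_deps):
            waiting[dep] += 1

    in_memory = set()
    pressure = []
    for task in order:
        # Pressure is measured just before the task runs.
        pressure.append(len(in_memory))
        in_memory.add(task)
        # A dependency is released once its last dependent has run.
        for dep in set(deps[task]):
            waiting[dep] -= 1
            if waiting[dep] == 0:
                in_memory.discard(dep)
        # Assumption: a result nobody depends on is released immediately.
        if waiting[task] == 0:
            in_memory.discard(task)
    return pressure


# Three independent load -> process chains (toy graph).
deps = {
    "a0": [], "a1": [], "a2": [],
    "b0": ["a0"], "b1": ["a1"], "b2": ["a2"],
}
breadth_first = ["a0", "a1", "a2", "b0", "b1", "b2"]
depth_first = ["a0", "b0", "a1", "b1", "a2", "b2"]

print(memory_pressure(deps, breadth_first))  # [0, 1, 2, 3, 2, 1]
print(memory_pressure(deps, depth_first))    # [0, 1, 0, 1, 0, 1]
```

Same graph, two orderings: running all the loads first peaks at three results in memory, while finishing each chain before starting the next never holds more than one. That difference is the kind of thing the colored graphs are meant to make visible.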
|
@eriknw Checking in here, is this PR still WIP, or is it in a state ready for review? |
|
Thanks for checking! Still WIP, and not forgotten. Feature-wise, it could be reviewed. TODO:
|
|
I forgot a plot in the example investigation in the previous post. The second value returned by
import dask
import dask.array as da
import pandas as pd
import hvplot.pandas
A = da.arange(33, chunks=1).cumsum(0, method='blelloch')
info, num_in_memory = dask.order.diagnostics(dict(dask.base.collections_to_dsk([A])))
df = pd.DataFrame(
    {
        'time': list(range(len(num_in_memory))),
        'Num in memory': num_in_memory
    }
)
# rasterize=True is nice for very large graphs
df.hvplot(x='time', y='Num in memory', rasterize=False)
assert num_in_memory == [
    val.num_data_when_run for val in sorted((val for val in info.values()), key=lambda x: x.order)
]
I still like it as a separate return value to make it easier to use. |
|
This is a big diagnostic improvement and I love the narrative that you go through in #7992 (comment). Can we make that into a blog post or a how-to? |
|
+1 to writing up as a blogpost |
|
Aw, thanks for the kind words! Yeah, I can write a blog post from this, but you must know how much it pains me to show off I'm about to go on vacation (hooray!), so it'll be at least a few weeks before I can get around to it. |
|
no worries, anytime is a good time :) it can be framed as how to understand or debug. |















black dask/flake8 dask/isort dask