Description
I propose that we make it possible for algorithm implementations such as rearrange_by_column_tasks()
to annotate the task graph such that the static graph optimizers and prioritizers can do a better job.
Motivation
Dask's static scheduling policy is implemented by dask.order.order(), which prioritizes tasks in a roughly depth-first order based on the structure of the task graph. In particular, dask.order.order() knows nothing about what each task does, its compute time, or its memory use, so its prioritization can in some cases be far from optimal. For example, in the case of #6051, we know that it is always an advantage to prioritize shuffle-split.
Approaches
Magic key names
The easiest, but probably also the ugliest, approach is to introduce magic keys such as "dsk-prioritize:42" to indicate that this task should have priority 42.
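To make the idea concrete, here is a rough sketch of how a prioritizer could recover the priority from such a key. The `<name>-prioritize:<int>` convention is generalized from the single example above, and `split_priority` is a hypothetical helper, not an existing Dask function.

```python
import re

# Assumed convention (not part of Dask): "<name>-prioritize:<int>"
_PRIORITY_RE = re.compile(r"^(?P<name>.+)-prioritize:(?P<priority>-?\d+)$")


def split_priority(key):
    """Return (base_name, priority); priority is None if the key has no suffix."""
    if isinstance(key, str):
        match = _PRIORITY_RE.match(key)
        if match:
            return match.group("name"), int(match.group("priority"))
    return key, None


# A toy task graph in the usual dict-of-tuples form
dsk = {
    "load": (list, "abc"),
    "shuffle-split-prioritize:42": (len, "load"),
}

# Priorities a prioritizer could read off the key names
priorities = {key: split_priority(key)[1] for key in dsk}
```

The drawback is visible even in this sketch: every consumer of the graph has to know the naming convention, and keys that are tuples rather than strings cannot carry the annotation at all.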
Single magic key
Introduce a special key in the graph that holds a dict of annotations for the whole graph. This key could then be consumed by the graph optimizer and prioritizer.
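A minimal sketch of what this could look like, assuming a reserved key name and an annotation layout of `{task_key: {"priority": int}}`. Both the name `__dask_annotations__` and the helper `split_annotations` are made up for illustration.

```python
from operator import neg

# Hypothetical reserved key; Dask does not define this name
ANNOTATIONS_KEY = "__dask_annotations__"


def split_annotations(dsk):
    """Separate the real tasks from the graph-wide annotation dict."""
    tasks = {k: v for k, v in dsk.items() if k != ANNOTATIONS_KEY}
    annotations = dsk.get(ANNOTATIONS_KEY, {})
    return tasks, annotations


dsk = {
    "a": 1,
    "b": (neg, "a"),
    # Annotations for the whole graph live under one key
    ANNOTATIONS_KEY: {"b": {"priority": 42}},
}

tasks, annotations = split_annotations(dsk)
```

An optimizer or prioritizer would call something like `split_annotations` first and then work on `tasks` as usual, so graphs without the special key behave exactly as before.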
Extra task argument
Extend #3783 to also let optimizers and prioritizers consume the extra task argument.
Custom task graph class
Similar to #2299, represent a task graph using a custom mapping class with added annotation information. This requires that all transformations of the task graph maintain its custom type and do not copy it to a regular dict.
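As a sketch of the idea, and of the pitfall, consider the following mapping class. `AnnotatedGraph` and its `select` method are hypothetical names and do not reflect the design in #2299.

```python
from collections.abc import Mapping


class AnnotatedGraph(Mapping):
    """A read-only task graph that carries per-key annotations (illustrative only)."""

    def __init__(self, tasks, annotations=None):
        self._tasks = dict(tasks)
        self.annotations = dict(annotations or {})

    def __getitem__(self, key):
        return self._tasks[key]

    def __iter__(self):
        return iter(self._tasks)

    def __len__(self):
        return len(self._tasks)

    def select(self, keys):
        """Example transformation that keeps the custom type and the annotations."""
        keys = set(keys)
        return AnnotatedGraph(
            {k: v for k, v in self._tasks.items() if k in keys},
            {k: v for k, v in self.annotations.items() if k in keys},
        )


graph = AnnotatedGraph(
    {"a": 1, "b": (abs, "a")},
    annotations={"b": {"priority": 42}},
)

kept = graph.select(["b"])  # still an AnnotatedGraph, annotations intact
lost = dict(graph)          # a plain dict: the annotations are silently dropped
```

The last line shows why this approach is invasive: any code path that does `dict(dsk)` or builds a new dict from the items loses the annotations without raising an error.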
What are your thoughts? Is there a better solution?