[FEA] Task Graph Annotation #6054

@madsbk

I propose that we make it possible for algorithm implementations such as rearrange_by_column_tasks()
to annotate the task graph such that the static graph optimizers and prioritizers can do a better job.

Motivation

Dask's static scheduling policy is implemented by dask.order.order(), which prioritizes tasks based on the structure of the task graph, generally in depth-first order. In particular, dask.order.order() knows nothing about what each task does, its compute time, or its memory use, so its prioritization can be very sub-optimal in some cases. For example, in the case of #6051, we know that it is always advantageous to prioritize shuffle-split.

Approaches

Magic key names

The easiest, but probably also the ugliest, approach is to introduce magic key names such as "dsk-prioritize:42" to indicate that the task should have priority 42.
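As a rough sketch of what a prioritizer would have to do under this approach, the snippet below extracts a priority from a key name. The exact placement of the "dsk-prioritize:N" marker inside the key and the helper `priority_from_key` are assumptions made for illustration; nothing like this exists in Dask.

```python
import re

# Hypothetical convention: the marker "dsk-prioritize:<int>" may appear
# anywhere in the key name. The issue only gives "dsk-prioritize:42" as an example.
_PRIORITY_RE = re.compile(r"dsk-prioritize:(-?\d+)")


def priority_from_key(key, default=0):
    """Return the priority encoded in a task key, or `default` if there is none."""
    # Task keys are commonly strings or tuples whose first element is a string.
    name = key[0] if isinstance(key, tuple) else key
    match = _PRIORITY_RE.search(str(name))
    return int(match.group(1)) if match else default


dsk = {
    ("shuffle-split-dsk-prioritize:42", 0): (sum, [1, 2]),
    ("shuffle-join", 0): (sum, [3, 4]),
}
priorities = {key: priority_from_key(key) for key in dsk}
```

The drawback is visible here: the annotation leaks into every key name, so any code that matches on key prefixes has to know about the marker.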

Single magic key

Introduce a special key in the graph that contains a dict of annotations of the whole graph. This key could then be consumed by the graph optimizer and prioritizer.
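A minimal sketch of this idea, assuming a reserved key name `"__annotations__"` and a helper `split_annotations` (both hypothetical), could look like this:

```python
# Hypothetical reserved key; the proposal does not fix a name.
ANNOTATIONS_KEY = "__annotations__"


def split_annotations(dsk):
    """Separate the annotation dict from the real tasks in a graph."""
    annotations = dsk.get(ANNOTATIONS_KEY, {})
    tasks = {k: v for k, v in dsk.items() if k != ANNOTATIONS_KEY}
    return tasks, annotations


dsk = {
    "split": (len, "abc"),
    "join": (abs, "split"),
    # Annotations for the whole graph live under one special key.
    ANNOTATIONS_KEY: {"priority": {"split": 42}},
}
tasks, annotations = split_annotations(dsk)
```

An optimizer or prioritizer would call something like `split_annotations` first and consult `annotations` while processing `tasks`. Any pass that is unaware of the special key would treat it as an ordinary graph entry, so it would have to be handled consistently.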

Extra task argument

Extend #3783 to also let optimizers and prioritizers consume the extra task argument.

Custom task graph class

Similar to #2299, represent a task graph using a custom mapping class with added annotation information. This requires that all transformations of the task graph maintain its custom type and do not copy it to a regular dict.
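To illustrate the constraint, here is a minimal sketch of such a mapping. The class `AnnotatedGraph` and its `select` method are hypothetical names for this example and are not taken from #2299 or from Dask.

```python
from collections.abc import Mapping


class AnnotatedGraph(Mapping):
    """A read-only task graph that carries per-key annotations (hypothetical)."""

    def __init__(self, tasks, annotations=None):
        self._tasks = dict(tasks)
        self.annotations = dict(annotations or {})

    def __getitem__(self, key):
        return self._tasks[key]

    def __iter__(self):
        return iter(self._tasks)

    def __len__(self):
        return len(self._tasks)

    def select(self, keep):
        """Example transformation that preserves the type and the annotations."""
        keep = set(keep)
        return AnnotatedGraph(
            {k: v for k, v in self._tasks.items() if k in keep},
            {k: v for k, v in self.annotations.items() if k in keep},
        )


graph = AnnotatedGraph(
    {"split": (len, "abc"), "join": (abs, "split")},
    {"split": {"priority": 42}},
)
selected = graph.select(["split"])  # still an AnnotatedGraph
plain = dict(graph)  # a regular dict: the annotations are silently lost
```

The last line shows the failure mode: any transformation that rebuilds the graph as a plain dict drops the annotations without raising an error.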


What are your thoughts? Any better solutions?

@rjzamora, @mrocklin, @eriknw