Layer Annotations #6701

@mrocklin

There are many reasons to annotate tasks: resources like GPUs, memory constraints, retries, worker restrictions, and so on. Annotations can already be specified separately from tasks, for example with compute(..., retries=...), but this ends up being awkward.

There have been multiple requests to annotate the tasks themselves, which would make it a bit easier to track annotations and apply them at the point of graph creation. There are at least two issues about this, #3783 and #6054, and an implementation at #6217.

Unfortunately this is hard because our task type, tuple, isn't well set up for extension. Changing to a Task type is possible, but it has some performance implications and would be a large change at the core of the project, so it would need to be done with some care.

Annotated Layers

An alternative approach would be to annotate high level graph layers, which are easy to modify and are currently in flux, so their design is still easy to change. Layers may also sidestep some of the performance concerns.

This also has some limitations, but I think that most people asking for this feature might be ok with layer-based annotations.

Current work with layers

We're currently working to include all graph layers in Layer(Mapping) subclasses, and communicate these layers directly to the scheduler. This gives us a nice conduit of potentially richer information. These will be applied universally across all major Dask collections maintained within the dask/dask repository.
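To make the "Layer(Mapping) subclass" idea concrete, here is a rough sketch of what such a layer could look like. This is only an illustration: `SimpleLayer` and its `annotations` attribute are hypothetical names, not the actual classes in dask/dask.

```python
from collections.abc import Mapping
from operator import add


class SimpleLayer(Mapping):
    """Hypothetical stand-in for a high level graph layer.

    It behaves like a read-only dict of tasks, and it has room for
    extra per-layer information such as annotations.
    """

    def __init__(self, tasks, annotations=None):
        self._tasks = dict(tasks)
        # Assumed attribute: one dict of annotations for the whole layer.
        self.annotations = dict(annotations or {})

    def __getitem__(self, key):
        return self._tasks[key]

    def __iter__(self):
        return iter(self._tasks)

    def __len__(self):
        return len(self._tasks)


# One task in the usual tuple form: (callable, *argument keys).
layer = SimpleLayer(
    {("z", 0): (add, ("x", 0), ("y", 0))},
    annotations={"retries": 2},
)
```

Because the layer is a Mapping, code that expects a plain task dict keeps working, while the scheduler could read the extra attribute when it receives the layer.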

API

I'm going to suggest that we recommend using context managers for annotations like the following:

import dask
import dask.array as da

x = da.ones(10)
y = da.ones(10)

with dask.annotate(priority=1, retries=2):
    z = x + y

The Layer.__init__ method would look at some global state for annotations, and apply those onto the layer on construction. Any layer made within the context block would be affected.

Limitations

I think that it's not yet clear what we would do with Delayed. Delayed does currently use HighLevelGraphs, but we're a bit sensitive to performance here: there would be a separate layer per task, so overheads might creep up a little.

cc @sjperkins @jcrist
