Layer Annotations

There are a variety of reasons to annotate tasks, including resources like GPUs, memory constraints, retries, worker restrictions, and so on. There are some ways to specify annotations separately from tasks, such as with `compute(..., retries=...)`, but this ends up being awkward because the annotation is given far from the code that creates the tasks.

There have been multiple requests to annotate the tasks themselves, which would make it a bit easier to track annotations and apply them at the point of graph creation. There are at least two issues about this, https://github.com/dask/dask/issues/3783 and https://github.com/dask/dask/issues/6054, and an implementation at https://github.com/dask/dask/pull/6217.

Unfortunately this is hard because our task type, `tuple`, isn't well set up for extension. Changing to a `Task` type is possible, but it has some performance implications and would be a large change at the core of the project, so it would need to be done with some care.

## Annotated Layers

An alternative approach would be to annotate high level graph layers. These are easy to modify and, because their design is in flux now, easy to change. Layers may also side-step some of the performance concerns.

This also has some limitations, but I think that most people asking for this feature might be ok with layer-based annotations.

## Current work with layers

We're currently working to include all graph layers in `Layer(Mapping)` subclasses, and to communicate these layers directly to the scheduler. This gives us a nice conduit for potentially richer information. These layer classes will be applied universally across all major Dask collections maintained within the dask/dask repository.

## API

I'm going to suggest that we recommend using context managers for annotations, like the following:

```python
import dask
import dask.array as da

x = da.ones(10)
y = da.ones(10)

# Layers created inside this block pick up the annotations
with dask.annotate(priority=1, retries=2):
    z = x + y
```

The `Layer.__init__` method would look at some global state for annotations, and apply those onto the layer on construction.  Any layer made within the context block would be affected.
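As a rough sketch of that mechanism, here is one way it could work, assuming the global state is a thread-local dictionary. The `annotate` and `Layer` below are simplified stand-ins written for illustration, not dask's actual classes, and `current_annotations` is a hypothetical helper.

```python
import threading
from collections.abc import Mapping
from contextlib import contextmanager

# Hypothetical global state (assumption: a thread-local dict;
# a real implementation might store this elsewhere, e.g. in config).
_state = threading.local()


def current_annotations():
    """Return a copy of the annotations active in this thread."""
    return dict(getattr(_state, "annotations", {}))


@contextmanager
def annotate(**annotations):
    """Make ``annotations`` visible to every Layer built inside the block."""
    previous = current_annotations()
    # Inner blocks extend outer ones and override duplicate keys.
    _state.annotations = {**previous, **annotations}
    try:
        yield
    finally:
        # Restore the outer annotations, even if the block raised.
        _state.annotations = previous


class Layer(Mapping):
    """Toy stand-in for a high level graph layer."""

    def __init__(self, tasks):
        self._tasks = dict(tasks)
        # Snapshot the global annotations at construction time.
        self.annotations = current_annotations()

    def __getitem__(self, key):
        return self._tasks[key]

    def __iter__(self):
        return iter(self._tasks)

    def __len__(self):
        return len(self._tasks)


with annotate(priority=1, retries=2):
    inside = Layer({"z": (sum, [1, 2])})
    with annotate(retries=5):
        nested = Layer({"w": (sum, [3, 4])})
outside = Layer({"x": 1})

print(inside.annotations)
print(nested.annotations)
print(outside.annotations)
```

Because the annotations are captured in `__init__`, only the moment of layer construction matters. A layer built outside the block stays unannotated even if it is later computed inside one.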

## Limitations

I think that it's not yet clear what we would do with Delayed. Delayed does currently use HighLevelGraphs, but we're a bit sensitive to performance here: there would be a separate layer per task, so overheads might creep up a little.

cc @sjperkins @jcrist 
