
[FEA] Dynamic Task Graph / Task Checkpointing #3811

@madsbk

Description


TL;DR: By introducing task checkpointing, where a running task can update its state on the scheduler, it is possible to reduce scheduler overhead, support long-running tasks, and use explicit worker-to-worker communication while maintaining resilience.

Motivation

As discussed in many issues and PRs (e.g. #3783, #854, #3139, dask/dask#6163), the scheduler overhead of Dask/Distributed can be a problem as the number of tasks increases. Many proposals involve optimizing the Python code through PyPy, Cython, Rust, or some other tool/language.

This PR proposes an orthogonal approach that reduces the number of tasks and makes it possible to encapsulate domain knowledge of specific operations into tasks -- such as minimizing memory use, overlapping computation and communication, etc.

Related Approaches

Current Task Workflow

All tasks go through the following flow:

**Client**  
  1. Graph creation  
  2. Graph optimization 
  3. Serialize graph 
  4. Send graph to scheduler 
**Scheduler** 
  5. Update graph 
  6. Send tasks, one at a time, to workers 
**Worker**  
  7. Execute tasks
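The flow above can be sketched with a dask-style graph (a dict mapping keys to task tuples). This is only an illustration of the pipeline, not the actual distributed implementation; the `execute` helper is a stand-in for the worker.

```python
# Minimal sketch of the client -> scheduler -> worker flow.
import pickle
from operator import add, mul

# 1. Graph creation (client)
graph = {
    "x": 1,
    "y": (add, "x", 10),  # y = x + 10
    "z": (mul, "y", 2),   # z = y * 2
}

# 3./4. Serialize the graph and "send" it to the scheduler
payload = pickle.dumps(graph)

def execute(dsk, key):
    """7. Worker-side execution: recursively resolve a key."""
    task = dsk[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        return func(*(execute(dsk, a) if a in dsk else a for a in args))
    return task

# 5./6. The scheduler walks the graph and hands tasks to workers;
# here we simply execute the final key.
result = execute(pickle.loads(payload), "z")
print(result)  # 22
```

Note that every key in the graph passes through every step, which is exactly the cost the proposals below try to avoid.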

Task Fusion

All tasks go through steps 1 to 4, but by fusing tasks (potentially into a SubgraphCallable) only a reduced graph goes through steps 5 and 6, which can significantly ease the load on the scheduler. However, fusing tasks also limits the available parallelism, so the technique has its limits.
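As a toy illustration, a linear chain of tasks can be collapsed into a single callable so the scheduler sees one task instead of three. The `fuse_chain` helper below is hypothetical and only mimics the idea behind SubgraphCallable:

```python
from operator import add, mul

graph = {
    "a": 1,
    "b": (add, "a", 1),
    "c": (mul, "b", 2),
    "d": (add, "c", 3),
}

def fuse_chain(dsk, keys):
    """Fuse the linear chain `keys` into one task keyed by its last element."""
    tasks = [dsk[k] for k in keys[1:]]          # keys[0] is the input datum
    def fused(value):
        for func, _prev_key, literal in tasks:  # each task: (func, prev_key, literal)
            value = func(value, literal)
        return value
    return {keys[0]: dsk[keys[0]], keys[-1]: (fused, keys[0])}

fused_graph = fuse_chain(graph, ["a", "b", "c", "d"])
# The scheduler now sees 2 tasks instead of 4.
func, dep = fused_graph["d"]
result = func(fused_graph[dep])
print(result)  # ((1 + 1) * 2) + 3 = 7
```

This works well for chains, which have no parallelism to lose; fusing across branches is where the parallelism trade-off bites.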

Task Generation

At graph creation, we use task generators to reduce the size of the graph, particularly in operations such as shuffle() that can consist of up to n**2 tasks. This means that only steps 3 to 7 encounter all tasks. And if we allow the scheduler to execute Python code, we can extend this to steps 5 to 7.
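A hypothetical sketch of such a generator: instead of materializing all n**2 shuffle transfer tasks up front, a generator yields them on demand, so the earlier pipeline steps never hold the full graph in memory. The `transfer_split` name is an illustrative placeholder:

```python
# Lazily yield one (key, task) pair per source/destination partition pair.
def shuffle_tasks(n):
    for src in range(n):
        for dst in range(n):
            key = f"shuffle-split-{src}-{dst}"
            yield key, ("transfer_split", src, dst)

gen = shuffle_tasks(4)
first = next(gen)                 # nothing else has been materialized yet
total = 1 + sum(1 for _ in gen)   # n**2 = 16 tasks in total
print(first[0], total)  # shuffle-split-0-0 16
```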

Submit Tasks from Tasks

Instead of implementing expensive operations such as shuffle() in a task graph, we can use a few long-running jobs that use direct worker-to-worker communication to bypass the scheduler altogether. This approach is very efficient but has two major drawbacks:

  • It provides no resilience: if a worker disconnects unexpectedly, the state of the long-running jobs is lost.
  • In cases such as shuffle(), this approach requires extra memory because the inputs to the long-running jobs must stay in memory until the jobs complete -- something that can be an absolute deal breaker ([HACK] Ordering to priorities "shuffle-split" dask#6051).
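The control flow can be simulated with the stdlib: one long-running job fans out sub-work directly on an executor, so no per-sub-task bookkeeping ever reaches a central scheduler. In distributed this role is played by worker_client()/get_client(); the code below only illustrates the pattern:

```python
from concurrent.futures import ThreadPoolExecutor

def long_running_shuffle(pool, partitions):
    # The parent task submits its own sub-tasks and gathers their results
    # directly, bypassing any central scheduler. Note that `partitions`
    # must stay alive until all sub-tasks complete (drawback two above).
    futures = [pool.submit(sorted, p) for p in partitions]
    return [f.result() for f in futures]

with ThreadPoolExecutor(max_workers=4) as pool:
    out = long_running_shuffle(pool, [[3, 1], [2, 0]])
print(out)  # [[1, 3], [0, 2]]
```

If a worker dies mid-way, nothing outside `long_running_shuffle` knows how to resume it (drawback one above).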

Proposed Approach

Dynamic Task Graph / Task Checkpointing

At graph creation, we use dynamic tasks to reduce the size of the graph and encapsulate domain knowledge of specific operations. This means that only step 7 encounters all tasks.

Dynamic tasks are regular tasks: they are optimized, scheduled, and executed on workers like any other task. They only differ once they use checkpointing. The following is the logic flow when a running task calls checkpointing:

  1. A task running on a worker sends a task update to the scheduler that contains:
    • New keys that are now in memory on the worker
    • New keys that the task now depends on
    • Existing keys that the task no longer depends on
    • A new task (function & key/literal arguments) that replaces the existing task.
  2. The scheduler updates the relevant TaskStates and releases keys that no one depends on anymore.
  3. If all dependencies are satisfied, the task can now be rescheduled from its new state. If not, the task transitions to the waiting state.
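The steps above could be sketched as a message plus a scheduler-side handler. All names and fields here are illustrative assumptions about what the protocol might look like, not a proposed implementation:

```python
from dataclasses import dataclass

@dataclass
class TaskUpdate:
    key: str                    # the checkpointing task
    new_in_memory: dict         # 1a. new keys now held on the worker
    new_dependencies: list      # 1b. keys the task now depends on
    dropped_dependencies: list  # 1c. keys it no longer depends on
    replacement_task: tuple     # 1d. (function, *args) replacing the task

def apply_update(deps, in_memory, update):
    """Step 2: update scheduler state and release unreferenced keys."""
    in_memory.update(update.new_in_memory)
    deps[update.key] = [
        d for d in deps.get(update.key, [])
        if d not in update.dropped_dependencies
    ] + update.new_dependencies
    for key in update.dropped_dependencies:
        if not any(key in d for d in deps.values()):
            in_memory.pop(key, None)  # released: no one depends on it
    # Step 3: reschedule if runnable, otherwise wait
    ready = all(d in in_memory for d in deps[update.key])
    return "processing" if ready else "waiting"

deps = {"shuffle": ["a", "b"]}
in_memory = {"a": 1, "b": 2}
state = apply_update(
    deps, in_memory,
    TaskUpdate("shuffle", {"part-0": 9}, ["part-0"], ["a"], ("next_phase",)),
)
print(state, sorted(in_memory))  # processing ['b', 'part-0']
```

Key "a" is released because no remaining task depends on it, which is how checkpointing avoids the extra-memory problem of the submit-tasks-from-tasks approach.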

Any thoughts? Is it something I should begin implementing?

cc. @mrocklin, @quasiben, @rjzamora, @jakirkham
