[Discussion] Client/Scheduler Performance

Many of us are experimenting with scheduler changes in the hope of improving performance. As graph size increases, the scheduler and its processing of the graph can become a bottleneck. However, we should not limit our attention to the scheduler. Graph construction in the client can also be improved, since graph creation also becomes slow when the graph grows very large.

We've also seen some experiments/discussions around scheduler performance, notably:
- [Rust Scheduler Experiments](https://github.com/dask/distributed/issues/3139)
- [Cython/PyPy/C Discussion](https://github.com/dask/distributed/issues/854)

In thinking about changes to the scheduler and client, we should develop some workflow-based benchmarks that can be executed in CI (fast execution) but can also be tuned for something more realistic.
 
## Benchmarks 
- [ ] tunable dataframe benchmark
  - [ ] [a shuffle](https://github.com/dask/dask/pull/6137/#issuecomment-620097990) 
  - [ ] a task which only targets `update_graph`
  - [ ] slow client graph creation
  - [ ] full dataframe workflow (filter/aggregation/merge -- something representative of common work)
  - [ ] Dask Array workflow
  - [ ] Dask Bag workflow
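
As a starting point for the "slow client graph creation" item, here is a minimal sketch of a size-tunable micro-benchmark. It does not import Dask; it only builds a plain dict in the shape the Dask graph specification describes (keys mapped to `(callable, *args)` tuples). The names `build_chain_graph` and `time_graph_construction` are hypothetical, and a real benchmark would build the graph through the collections API and submit it to a scheduler.

```python
import time
from operator import add


def build_chain_graph(n_tasks):
    """Build a Dask-style task graph: a dict mapping keys to tasks.

    Each task is a tuple ``(callable, *args)``. An argument that is
    itself a key in the graph refers to that task's result.
    """
    graph = {("x", 0): 1}  # a literal value, not a task
    for i in range(1, n_tasks):
        graph[("x", i)] = (add, ("x", i - 1), 1)
    return graph


def time_graph_construction(n_tasks, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` builds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        build_chain_graph(n_tasks)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Small size for CI, larger size for something more realistic.
    for n in (1_000, 100_000):
        print(f"{n:>9} tasks: {time_graph_construction(n):.4f} s")
```

The single `n_tasks` parameter is what makes the same benchmark usable both in CI and at a more realistic scale.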

We also need to better understand the scheduler/client/graph internals and document them. I don't yet know where this documentation should live or how to organize it, but I think we need the following:

## Documentation:
- [ ] Document the Scheduler
-  [ ] Document protocol for messages
  - [Background on Communication Protocol](https://github.com/dask/distributed/issues/3357)
  - [Message details (from the Rust folks)](https://github.com/spirali/rsds/blob/0513f5c83d42d34cfda01febf7b87482ec250d0f/dask/message-gallery.ts)
- [ ] Document Graph specification
  - Developing a better understanding of the graph spec might also allow us to rewrite parts of the client in native languages to increase performance.
- [ ] In doing the above, we should be able to outline how to separate the scheduler into two pieces. Currently, the scheduler is a mix of comms and state machine. Separating them would let us more easily swap experimental schedulers in and out of Dask workloads while also minimizing the requirements that new schedulers must meet.
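
To make the comms/state-machine split concrete, here is a rough sketch of what the state-machine half might look like once isolated. Everything in it is an assumption for illustration: the class name `SchedulerState`, the message shapes, and the `op` strings are hypothetical and are not taken from the actual scheduler or protocol. The point is only that the state machine takes one message in and returns messages out, with no sockets or event loop involved.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerState:
    """Pure state machine: messages in, messages out, no I/O."""

    dependencies: dict = field(default_factory=dict)  # key -> set of keys it needs
    waiting: set = field(default_factory=set)
    processing: set = field(default_factory=set)
    finished: set = field(default_factory=set)

    def handle(self, msg):
        """Apply one incoming message and return the outgoing messages."""
        if msg["op"] == "update-graph":
            for key, deps in msg["dependencies"].items():
                self.dependencies[key] = set(deps)
                self.waiting.add(key)
        elif msg["op"] == "task-finished":
            self.processing.discard(msg["key"])
            self.finished.add(msg["key"])
        else:
            raise ValueError(f"unknown op: {msg['op']!r}")
        return self._schedule_ready()

    def _schedule_ready(self):
        # A task is ready once all of its dependencies have finished.
        ready = sorted(
            key for key in self.waiting
            if self.dependencies[key] <= self.finished
        )
        for key in ready:
            self.waiting.remove(key)
            self.processing.add(key)
        return [{"op": "compute-task", "key": key} for key in ready]


if __name__ == "__main__":
    state = SchedulerState()
    print(state.handle({"op": "update-graph",
                        "dependencies": {"a": [], "b": ["a"]}}))
    print(state.handle({"op": "task-finished", "key": "a"}))
```

With this shape, a comms layer would only need to deserialize incoming messages, call `handle`, and send whatever comes back, so an alternative scheduler would have to implement just that one interface.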
 
## Evaluate Schedulers
- [ ] Run the Rust scheduler on a reasonable workflow and document breakages/performance
- [ ] Document the Rust scheduler

This list is probably far from complete, and I'm happy to amend/change/update it as we proceed.
