Description
Many of us are experimenting with scheduler changes in the hope of improving performance. As graph size increases, the scheduler and its processing of the graph can become a bottleneck. However, we should not limit our attention to the scheduler alone. Graph construction in the client can also become slow for very large graphs, and it could be improved as well.
We've also seen some experiments/discussions around scheduler performance, notably:
In thinking about changes to the scheduler and client, we should develop some workflow-based benchmarks that can be executed in CI (fast execution) but can also be tuned for something more realistic.
Benchmarks
- tunable dataframe benchmark
- a shuffle
- task which only targets update_graph
- slow client graph creation
- full data frame workflow (filter/aggregation/merge -- something representative of common work)
- Dask Array workflow
- Dask Bag workflow
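As a rough illustration of what "tunable" could mean for the graph-creation items above, here is a minimal, standard-library-only sketch. It is not an existing benchmark: `build_graph` and `time_graph_creation` are hypothetical names, and the toy graph is a plain dict of task tuples in the style of the Dask graph spec rather than a real DataFrame workload. A single size parameter lets the same code run quickly in CI and at larger scale elsewhere.

```python
import time


def build_graph(n_partitions):
    """Build a toy map-then-reduce graph as a plain dict (hypothetical helper).

    Each key maps to a task tuple of the form (callable, *args), where an
    argument may be the key of another task.
    """
    graph = {}
    for i in range(n_partitions):
        graph[f"load-{i}"] = (range, i + 1)
        graph[f"sum-{i}"] = (sum, f"load-{i}")
    # One final task that depends on every per-partition sum.
    graph["total"] = (sum, [f"sum-{i}" for i in range(n_partitions)])
    return graph


def time_graph_creation(n_partitions, repeats=3):
    """Return (number of tasks, best wall-clock seconds) for building the graph."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        graph = build_graph(n_partitions)
        best = min(best, time.perf_counter() - start)
    return len(graph), best


if __name__ == "__main__":
    # Small sizes suit CI; larger sizes approximate a more realistic load.
    for n in (10, 1_000, 10_000):
        n_tasks, seconds = time_graph_creation(n)
        print(f"{n_tasks} tasks built in {seconds:.6f}s")
```

A real benchmark would presumably swap `build_graph` for an actual Dask collection workflow and keep the same size knob.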
We also need to better understand the scheduler/client/graph internals, and we should document them (though I don't yet know where this documentation should live or how to organize it). I think we need the following:
Documentation:
- Document the Scheduler
- Document protocol for messages
- Background on Communication Protocol
- Detailing messages (from the Rust folks)
- Document Graph specification
- Developing a better understanding of the graph spec might also allow us to rewrite parts of the client in native languages to increase performance
- In doing the above, we should be able to outline how we separate the scheduler into two pieces. Currently, the scheduler is a mix of comms and state machine. This separation would allow us to more easily swap scheduler experiments in and out of dask workloads while also minimizing the requirements new schedulers must adhere to.
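To make the comms/state-machine split a bit more concrete, here is a minimal sketch under stated assumptions. Nothing here is the current scheduler's API: `SchedulerState`, `InProcessComm`, `handle`, and the message shapes are all hypothetical, and the op names are illustrative rather than the real wire protocol. The idea is that the state machine only consumes events and returns messages, so an experimental scheduler would only need to replace that one class.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerState:
    """Pure state machine: no sockets, no event loop (hypothetical sketch)."""

    waiting_on: dict = field(default_factory=dict)  # key -> unfinished dependencies
    dependents: dict = field(default_factory=dict)  # key -> tasks waiting on it
    status: dict = field(default_factory=dict)      # key -> "waiting" | "ready" | "memory"

    def update_graph(self, dependencies):
        """Register tasks; return messages for tasks that can run immediately."""
        messages = []
        for key, deps in dependencies.items():
            pending = {d for d in deps if self.status.get(d) != "memory"}
            self.waiting_on[key] = pending
            for dep in pending:
                self.dependents.setdefault(dep, set()).add(key)
            if pending:
                self.status[key] = "waiting"
            else:
                self.status[key] = "ready"
                messages.append({"op": "compute-task", "key": key})
        return messages

    def task_finished(self, key):
        """Mark a task as done; return messages for tasks this unblocks."""
        self.status[key] = "memory"
        messages = []
        for dependent in sorted(self.dependents.pop(key, ())):
            self.waiting_on[dependent].discard(key)
            if not self.waiting_on[dependent]:
                self.status[dependent] = "ready"
                messages.append({"op": "compute-task", "key": dependent})
        return messages


class InProcessComm:
    """Stand-in transport that just records outgoing messages."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


def handle(state, comm, event):
    """Glue layer: route an incoming event to the state machine, forward replies."""
    if event["op"] == "update-graph":
        out = state.update_graph(event["dependencies"])
    elif event["op"] == "task-finished":
        out = state.task_finished(event["key"])
    else:
        raise ValueError(f"unknown op: {event['op']!r}")
    for message in out:
        comm.send(message)
```

With this shape, the state machine can be exercised in a unit test or benchmark without any networking, and the comms layer can be tested against a trivial state object.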
Evaluate Schedulers
- Run Rust scheduler on reasonable workflow and document breakages/performance
- Document Rust Scheduler
This list is probably far from complete, and I'm happy to amend/change/update it as we proceed.