
[Discussion] Client/Scheduler Performance #3783

@quasiben

Many of us are experimenting with scheduler changes in the hope of improving performance. As graph size increases, the scheduler and its processing of the graph can become a bottleneck. However, we should not limit our attention to the scheduler. Graph construction in the client can also be improved, since graph creation also becomes slow when the graph grows very large.

We've also seen some experiments/discussions around scheduler performance, notably:

In thinking about changes to the scheduler and client, we should develop some workflow-based benchmarks that can be executed in CI (fast execution) but can also be tuned for something more realistic.

Benchmarks

  • tunable dataframe benchmark
    • a shuffle
    • task which only targets update_graph
    • slow client graph creation
    • full data frame workflow (filter/aggregation/merge -- something representative of common work)
    • Dask Array workflow
    • Dask Bag workflow
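As a rough illustration of what "tunable" could mean for the slow client graph creation item, here is a minimal sketch that times graph construction at a small size (suitable for CI) and a larger one. It deliberately does not import Dask: it builds a plain dict in the dict-of-tasks style, where a task is a tuple whose first element is a callable. The function names `build_chain_graph` and `time_graph_creation` and the chosen sizes are hypothetical, picked only for this example.

```python
import time
from operator import add


def build_chain_graph(n_tasks):
    """Build a linear chain of n_tasks (>= 1) additions as a plain dict graph."""
    graph = {("x", 0): 1}
    for i in range(1, n_tasks):
        # Each task is a tuple: (callable, *arguments); an argument that is
        # itself a key in the graph refers to that key's result.
        graph[("x", i)] = (add, ("x", i - 1), 1)
    return graph


def time_graph_creation(n_tasks, repeats=3):
    """Return the best wall-clock time in seconds to build the graph."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        graph = build_chain_graph(n_tasks)
        best = min(best, time.perf_counter() - start)
    assert len(graph) == n_tasks
    return best


if __name__ == "__main__":
    # Small size for CI, larger size for something closer to a real workload.
    for n_tasks in (1_000, 100_000):
        print(f"{n_tasks:>7} tasks: {time_graph_creation(n_tasks):.4f} s")
```

A real benchmark would swap `build_chain_graph` for an actual dataframe, array, or bag workflow; the point here is only that a single size parameter lets the same code serve both the fast CI run and a more realistic run.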

We also need to better understand the scheduler/client/graph internals, and we should document them (though I don't yet know where this documentation should live or how to organize it). I think we need the following:

Documentation:

  • Document the Scheduler
  • Document protocol for messages
  • Background on Communication Protocol
  • Detailing Message (from Rust folks)
  • Document Graph specification
    • Developing a better understanding of the graph spec might also allow us to rewrite parts of the client in native languages to increase performance
  • In doing the above, we should be able to outline how to separate the scheduler into two pieces. Currently, the scheduler is a mix of comms and state machine. Separating them would let us swap scheduler experiments in and out of Dask workloads more easily, while minimizing the requirements that new schedulers must satisfy.
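To make the comms/state machine split a bit more concrete, here is a toy sketch under heavy assumptions. It is not how the current scheduler is structured. `SchedulerState`, `InProcessComms`, `run`, and the `("compute", key)` message shape are all hypothetical names invented for this example. The state machine only consumes events and returns messages; the transport is a separate object that could, in principle, be replaced without touching the scheduling logic.

```python
import queue


class SchedulerState:
    """Pure state machine: no sockets, no event loop, no I/O."""

    def __init__(self, dependencies):
        # dependencies: {key: set of keys it depends on}; every dependency
        # must itself appear as a key in the mapping.
        self.waiting = {k: set(deps) for k, deps in dependencies.items()}
        self.dependents = {k: set() for k in dependencies}
        for key, deps in dependencies.items():
            for dep in deps:
                self.dependents[dep].add(key)
        self.finished = set()

    def start(self):
        """Return messages for every task with no dependencies."""
        return [("compute", k) for k in sorted(self.waiting) if not self.waiting[k]]

    def task_finished(self, key):
        """Record a completed task and return messages for newly runnable tasks."""
        self.finished.add(key)
        messages = []
        for dependent in sorted(self.dependents[key]):
            self.waiting[dependent].discard(key)
            if not self.waiting[dependent]:
                messages.append(("compute", dependent))
        return messages


class InProcessComms:
    """Stand-in transport: a local queue instead of a network connection."""

    def __init__(self):
        self.outbox = queue.Queue()

    def send(self, message):
        self.outbox.put(message)

    def recv(self):
        return self.outbox.get_nowait()


def run(state, comms):
    """Drive the state machine, pretending each task completes instantly."""
    order = []
    for message in state.start():
        comms.send(message)
    while not comms.outbox.empty():
        _, key = comms.recv()
        order.append(key)
        for message in state.task_finished(key):
            comms.send(message)
    return order


if __name__ == "__main__":
    deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(run(SchedulerState(deps), InProcessComms()))
```

With a boundary like this, an experimental scheduler would only need to implement the event-in, messages-out interface, which is one possible reading of "minimizing the requirements" above.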

Evaluate Schedulers

  • Run Rust scheduler on reasonable workflow and document breakages/performance
  • Document Rust Scheduler

This list is probably far from complete, and I'm happy to amend/change/update it as we proceed.
