Description
Over the past ~2 years we have run many manual benchmarks to confirm improvements to our scheduling and work-stealing heuristics.
Recently, we started working on H2O benchmarks to have something to compare against. However, the H2O benchmarks are heavily biased towards shuffle operations because they are motivated by database benchmarks. While this use case is certainly relevant, there are many dask-specific workloads that these benchmarks do not cover properly.
Acceptance criteria
- All benchmarks listed in dask/distributed#6560 (comment), "Design and prototype for root-ish task deprioritization by withholding tasks on the scheduler", are part of the coiled-runtime benchmark suite
- [Timebox ~2 days] Collect and convert suitable user-provided examples that can be run on coiled-runtime; see the list below for examples
We should significantly increase our benchmark coverage to include dask-specific workloads.
The following issues are examples where this kind of automation would help future work or would have helped in the past. Some of them contain actual reproducers that we could simply extract and adapt to Coiled:
- Improve work stealing for scaling situations dask/distributed#4920
- Poor work scheduling when cluster adapts size dask/distributed#4471
- Ease memory pressure by deprioritizing root tasks? dask/distributed#6360
- Co-assign neighboring tasks to neighboring workers dask/distributed#4892
- AutoRestrictor scheduler plugin dask/distributed#4864
- an example that shows the need for memory backpressure dask/distributed#2602 (comment)
- an example that shows the need for memory backpressure dask/distributed#2602 (comment)
- Geospatial-type workload showing two common scheduler failures at once dask/distributed#6571
- Design and prototype for root-ish task deprioritization by withholding tasks on the scheduler dask/distributed#6560 (comment)
This would build on the infrastructure created by #148.