Labels
discussion, enhancement, memory, performance, stability
Description
Root task overproduction is a significant source of memory pressure and increased runtime.
#6360 has shown that we can deal with the problem in isolation in the scheduler (e.g. without full STA) by limiting the number of tasks assigned to a worker at any given time.
Implementing this logic might have a significant impact on our scheduling policies, so we want to review the design and implementation thoroughly and run the necessary performance benchmarks before committing to it.
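To make the idea of limiting the number of tasks assigned to a worker concrete, here is a minimal toy sketch. It is not the code from #6360 or any existing `distributed` API: the names `ToyScheduler`, `Worker`, `saturation`, and `queued` are hypothetical, and the plain FIFO queue corresponds to the simpler variant that does not preserve root-task co-assignment.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical stand-in for the scheduler's view of a worker.
    name: str
    nthreads: int
    processing: set = field(default_factory=set)


class ToyScheduler:
    """Toy model: withhold root tasks on the scheduler instead of
    sending all of them to workers up front."""

    def __init__(self, workers, saturation=1.0):
        self.workers = {w.name: w for w in workers}
        # Assumed tuning knob: tasks allowed per thread on each worker.
        self.saturation = saturation
        # Root tasks held back on the scheduler, in submission order.
        self.queued = deque()

    def _limit(self, worker):
        # Always allow at least one task so every worker can make progress.
        return max(1, math.ceil(worker.nthreads * self.saturation))

    def _fill(self, worker):
        # Hand out queued tasks only while the worker is below its limit.
        while self.queued and len(worker.processing) < self._limit(worker):
            worker.processing.add(self.queued.popleft())

    def submit_root_tasks(self, keys):
        self.queued.extend(keys)
        for worker in self.workers.values():
            self._fill(worker)

    def task_finished(self, worker_name, key):
        # A freed slot is the only trigger for releasing another root task.
        worker = self.workers[worker_name]
        worker.processing.remove(key)
        self._fill(worker)
```

With two 2-thread workers and ten root tasks, only four tasks are assigned at first; the remaining six stay queued on the scheduler until a worker reports a finished task.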
Expectations
- A draft PR is created that implements a version of the algorithm proposed in "Ease memory pressure by deprioritizing root tasks?" (#6360) that does not preserve root-task co-assignment
- (Best effort) A draft PR is created that implements a version of the algorithm proposed in "Ease memory pressure by deprioritizing root tasks?" (#6360) that does preserve root-task co-assignment
- The algorithm should be able to run on large graphs
- Implementation can be tested by OSS users (e.g. pangeo)
- There is consensus about how to withhold tasks on the scheduler by Friday June 24th, 2022
- PRs are merged only if thorough performance benchmarks are available confirming that this also works for non-root-task cases, e.g. traditional shuffle, map overlap, other-evil-graph-problem
Out of scope / Follow up
- Thorough performance benchmarks exist (as part of coiled-runtime)