Conversation
With this change, workers are not sent more tasks than they have threads. This eliminates root task overproduction and removes the need for work stealing. This is optimized for a minimal diff. There is _lots_ we might change/rip out if we went forward with this. Performance will likely be poor on some workloads without speculative task assignment and root task co-assignment.
| "distributed.scheduler.work-stealing": False, | ||
| }, | ||
| ) | ||
| async def test_root_task_overproduction(c, s, *nannies): |
There was a problem hiding this comment.
This simple test fails on main with a KilledWorkerError after the root tasks run a worker out of memory and restart it 3 times.
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 9h 37m 35s ⏱️ + 3h 6m 21s For more details on these failures and errors, see this check. Results for commit d93fd4b. ± Comparison against base commit cb88e3b. |
|
Principally, removing the need for work stealing is very exciting since we're having a bunch of problems with it. However, just assigning as many tasks to workers as there are threads is something I am skeptical of. Assigning more tasks to the workers ahead of time has a couple of benefits, for instance
I could also see the scheduler and network becoming more of a bottleneck for a couple of workloads |
|
Closing in favor of #6614, which only applies to root tasks. |
With this change, workers are not sent more tasks than they have threads. This eliminates root task overproduction and removes the need for work stealing.
I didn't originally intend to make this change (originally, I wanted to remove queuing of root tasks only, not all tasks). I just discovered that it was the natural outcome of #6560 without co-assignment. I thought that was interesting, so I'm opening this for discussion.
This is the minimal diff to enact this change. There is lots we would change/rip out if we went forward with this. The purpose of this PR is just to try the change on some common workloads and see how performance is affected.
Performance will likely be poor on some workloads without speculative task assignment and root task co-assignment.
I believe there's a relatively simple optimization we could make here that would bring performance about in line with current scheduling (minus co-assignment): do allow queuing extra tasks onto workers when those tasks meet specific criteria. The criteria would match the fan-out style tasks you see in a task-based shuffle,
rechunk, ormap_overlap. But that's an optimization to play with later.Why?
I was working on the "ignore root task co-assignment" side of withholding root tasks #6560. Initially I was going to try withholding just root tasks.
ws.nthreadstasks per worker. All other tasks would end up assigned via the normal logic. So I just removed the co-assign code for simplicity. Now root tasks aren't special anymore.decide_worker, but if that worker is full, put the task inno-workerinstead". But that raises an obvious question: what if there was a different, not-full worker available we could have picked instead?decide_worker.decide_workerto only theidleworkers set, you get this PR.See the test added for an example of an embarrassingly-parallel workload that's currently unrunnable on main (workers repeatedly run out of memory and die), which runs smoothly in a constant amount of memory with this PR.
pre-commit run --all-files