While stress-testing #7062, test_RetireWorker_stress, which gracefully retires most of the cluster's workers while a very heavy computation is running, failed once out of 162 runs:
https://github.com/crusaderky/distributed/actions/runs/3114670981/jobs/5050785452#step:18:1674
```
2022-09-23 18:56:03,193 - distributed.scheduler - ERROR - (<WorkerState 'tcp://127.0.0.1:63881', name: 6, status: closing_gracefully, memory: 21, processing: 27>, {<WorkerState 'tcp://127.0.0.1:63869', name: 0, status: running, memory: 61, processing: 6>, <WorkerState 'tcp://127.0.0.1:63879', name: 5, status: running, memory: 59, processing: 14>, <WorkerState 'tcp://127.0.0.1:63885', name: 8, status: running, memory: 59, processing: 17>, <WorkerState 'tcp://127.0.0.1:63877', name: 4, status: running, memory: 58, processing: 5>, <WorkerState 'tcp://127.0.0.1:63887', name: 9, status: running, memory: 59, processing: 6>})
Traceback (most recent call last):
  File "d:\a\distributed\distributed\distributed\scheduler.py", line 2040, in transition_waiting_processing
    if not (ws := self.decide_worker_rootish_queuing_disabled(ts)):
  File "d:\a\distributed\distributed\distributed\scheduler.py", line 1901, in decide_worker_rootish_queuing_disabled
    assert ws in self.running, (ws, self.running)
```
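Note that the worker in the error is in status `closing_gracefully`, while `self.running` only contains workers in status `running`. A minimal, hypothetical sketch of that race (not the actual `distributed.scheduler` code; all names below are illustrative): a worker that is being retired drops out of the `running` set, but is still picked as a candidate, so the `assert ws in running` invariant trips:

```python
# Hypothetical minimal model of the failing invariant: the chosen worker
# must still be in the scheduler's `running` set when the decision is made.

class WorkerState:
    def __init__(self, address: str, status: str):
        self.address = address
        self.status = status


def decide_worker(candidates: set, running: set):
    # Stand-in for decide_worker_rootish_queuing_disabled: pick any
    # candidate (the real code picks by occupancy), then check the invariant.
    ws = next(iter(candidates), None)
    if ws is None:
        return None
    # The assertion that fails in the traceback above.
    assert ws in running, (ws, running)
    return ws


a = WorkerState("tcp://127.0.0.1:63881", "running")
b = WorkerState("tcp://127.0.0.1:63869", "running")
running = {a, b}

# Graceful retirement flips `a` to closing_gracefully and removes it from
# `running`, but a stale candidate set can still contain it.
a.status = "closing_gracefully"
running.discard(a)

try:
    decide_worker({a}, running)
    tripped = False
except AssertionError:
    tripped = True

print(tripped)  # True: the invariant trips, matching the traceback
```

This only models the symptom; the open question in the real scheduler is how a `closing_gracefully` worker ends up in the candidate pool at all.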