-
-
Notifications
You must be signed in to change notification settings - Fork 757
Description
remove is supposed to do this:
distributed/distributed/scheduler.py
Lines 5971 to 5973 in 198522b
| remove: bool (defaults to True) | |
| Whether or not to remove the worker metadata immediately or else | |
| wait for the worker to contact us |
In this block, we always call close_worker, regardless of remove:
distributed/distributed/scheduler.py
Lines 6093 to 6096 in 198522b
| if close_workers and ws.address in self.workers: | |
| await self.close_worker(worker=ws.address, safe=True) | |
| if remove: | |
| await self.remove_worker(address=ws.address, safe=True) |
But close_worker calls remove_worker internally:
distributed/distributed/scheduler.py
Lines 3482 to 3494 in 198522b
| @log_errors | |
| async def close_worker(self, worker: str, safe: bool = False): | |
| """Remove a worker from the cluster | |
| This both removes the worker from our local state and also sends a | |
| signal to the worker to shut down. This works regardless of whether or | |
| not the worker has a nanny process restarting it | |
| """ | |
| logger.info("Closing worker %s", worker) | |
| self.log_event(worker, {"action": "close-worker"}) | |
| # FIXME: This does not handle nannies | |
| self.worker_send(worker, {"op": "close", "report": False}) | |
| await self.remove_worker(address=worker, safe=safe) |
So if you pass close_worker=True, remove=False, the worker will still be immediately removed I think.
For example, this fails:
diff --git a/distributed/tests/test_active_memory_manager.py b/distributed/tests/test_active_memory_manager.py
index 8fcc3f31..594faa6b 100644
--- a/distributed/tests/test_active_memory_manager.py
+++ b/distributed/tests/test_active_memory_manager.py
@@ -765,7 +765,7 @@ async def test_RetireWorker_no_remove(c, s, a, b):
"""Test RetireWorker behaviour on retire_workers(..., remove=False)"""
x = await c.scatter({"x": "x"}, workers=[a.address])
- await c.retire_workers([a.address], close_workers=False, remove=False)
+ await c.retire_workers([a.address], close_workers=True, remove=False)
# Wait 2 AMM iterations
# retire_workers may return before all keys have been dropped from a
while s.tasks["x"].who_has != {s.workers[b.address]}:Not a very big deal, since this is probably only used internally/in tests. But I'm not sure what the use-case is for close_worker=True, remove=False. Maybe we should just get rid of the remove= kwarg?
cc @crusaderky