Skip to content

retire_workers(remove=False) has no effect? #6227

@gjoseph92

Description

@gjoseph92

remove is supposed to do this:

remove: bool (defaults to True)
Whether or not to remove the worker metadata immediately or else
wait for the worker to contact us

In this block, we always call close_worker, regardless of remove:

if close_workers and ws.address in self.workers:
await self.close_worker(worker=ws.address, safe=True)
if remove:
await self.remove_worker(address=ws.address, safe=True)

But close_worker calls remove_worker internally:

@log_errors
async def close_worker(self, worker: str, safe: bool = False):
"""Remove a worker from the cluster
This both removes the worker from our local state and also sends a
signal to the worker to shut down. This works regardless of whether or
not the worker has a nanny process restarting it
"""
logger.info("Closing worker %s", worker)
self.log_event(worker, {"action": "close-worker"})
# FIXME: This does not handle nannies
self.worker_send(worker, {"op": "close", "report": False})
await self.remove_worker(address=worker, safe=safe)

So if you pass close_worker=True, remove=False, the worker will still be immediately removed I think.

For example, this fails:

diff --git a/distributed/tests/test_active_memory_manager.py b/distributed/tests/test_active_memory_manager.py
index 8fcc3f31..594faa6b 100644
--- a/distributed/tests/test_active_memory_manager.py
+++ b/distributed/tests/test_active_memory_manager.py
@@ -765,7 +765,7 @@ async def test_RetireWorker_no_remove(c, s, a, b):
     """Test RetireWorker behaviour on retire_workers(..., remove=False)"""
 
     x = await c.scatter({"x": "x"}, workers=[a.address])
-    await c.retire_workers([a.address], close_workers=False, remove=False)
+    await c.retire_workers([a.address], close_workers=True, remove=False)
     # Wait 2 AMM iterations
     # retire_workers may return before all keys have been dropped from a
     while s.tasks["x"].who_has != {s.workers[b.address]}:

Not a very big deal, since this is probably only used internally/in tests. But I'm not sure what the use-case is for close_worker=True, remove=False. Maybe we should just get rid of the remove= kwarg?

cc @crusaderky

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething is broken

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions