Don't connect to cluster subprocesses at shutdown#6829
Conversation
| worker_kwargs=None, | ||
| active_rpc_timeout=10, | ||
| disconnect_timeout=20, | ||
| shutdown_timeout=20, |
There was a problem hiding this comment.
This argument name was unused across the codebase, so changing it seems fine.
| async def close(): | ||
| logger.debug("Closing out test cluster") | ||
| alive_workers = [ | ||
| w["address"] | ||
| for w in workers_by_pid.values() | ||
| if w["proc"].is_alive() | ||
| ] | ||
| await disconnect_all( | ||
| alive_workers, | ||
| timeout=disconnect_timeout, | ||
| rpc_kwargs=rpc_kwargs, | ||
| ) | ||
| if scheduler.is_alive(): | ||
| await disconnect( | ||
| saddr, timeout=disconnect_timeout, rpc_kwargs=rpc_kwargs | ||
| ) |
There was a problem hiding this comment.
The change is that I just deleted this entire finally section (and therefore removed the try block). The stack.callback(_kill_join, scheduler, shutdown_timeout) will accomplish an equivalent thing without RPCs.
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ±0 15 suites ±0 6h 53m 55s ⏱️ + 13m 1s For more details on these failures and errors, see this check. Results for commit 993f420. ± Comparison against base commit 4af2d0a. |
|
Failures look unrelated:
|
| proc.join() | ||
| def _kill_join(proc, timeout): | ||
| proc.kill() | ||
| proc.join(timeout) |
There was a problem hiding this comment.
Maybe rely on the pytest timeout?
There was a problem hiding this comment.
Also maybe kill them all at the same time then join them all
Closes #6828. See that issue for explanation.
The one thing we lose with this over the RPC is a clean
closecall—killis more abrupt. If we'd like to maintain the cleanclose, I can easily do that by sending aSIGINT, waiting, thenSIGKILLonly if necessary.pre-commit run --all-files