fix(gateway): kickstart launchd after graceful restart #34366
Detect launchd-managed gateway processes during /restart and route restarts through the service manager instead of detached helpers. Bootstrap unloaded launchd jobs before self-signalling so macOS restarts do not strand the gateway.
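The "bootstrap unloaded jobs" step can be sketched as below. This is a minimal sketch, not the PR's code. It assumes the job lives in the per-user `gui/<uid>` domain and that `launchctl print <target>` exits non-zero when the job is not loaded. The helper names (`service_target`, `bootstrap_argv_if_unloaded`) and the injected `run` callable are hypothetical. Injecting `run` lets the check be exercised off macOS.

```python
import os
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], int]  # hypothetical: takes argv, returns exit status


def service_target(label: str, uid: Optional[int] = None) -> str:
    """Build a launchctl service target such as 'gui/501/ai.hermes.gateway'."""
    if uid is None:
        uid = os.getuid()
    return f"gui/{uid}/{label}"


def default_run(argv: List[str]) -> int:
    # Real runner: execute launchctl and return its exit status.
    return subprocess.run(argv, capture_output=True, timeout=30).returncode


def bootstrap_argv_if_unloaded(
    label: str, plist_path: str, uid: int, run: Runner = default_run
) -> Optional[List[str]]:
    """Return a bootstrap command if the job is not loaded, else None.

    Assumption: `launchctl print <target>` exits 0 only when the job is loaded.
    """
    if run(["launchctl", "print", service_target(label, uid)]) == 0:
        return None  # already loaded; nothing to bootstrap
    # Not loaded: bootstrap the plist into the user's GUI domain.
    return ["launchctl", "bootstrap", f"gui/{uid}", plist_path]
```

Checking "is it loaded" first matters because sending SIGUSR1 to a gateway whose job launchd no longer knows about would leave nothing to restart it.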
I found one issue that looks worth fixing before merge.
The test … asserts:

```python
assert calls[0][1][:2] == ["launchctl", "bootstrap"]  # 1. bootstrap the service
assert calls[1] == ("sigusr1_restart", 4242, 35.0)  # 2. graceful drain via SIGUSR1
assert calls[2][1] == ["launchctl", "kickstart", "-k", "gui/501/ai.hermes.gateway"]  # 3. kickstart
```

But the production code in … reads:

```python
if _graceful_restart_via_sigusr1(pid, drain_timeout + 5):
    # ... comment about kickstarting ...
    print("↻ Graceful drain complete; kickstarting launchd job")
```

There is no actual … The test will fail at …

Suggested fix: after the graceful drain print, add:

```python
import subprocess

# kickstart takes a service target like "gui/501/ai.hermes.gateway" (what the
# test asserts); this assumes get_launchd_label() returns that full
# gui/<uid>/<label> form, since a bare label would be rejected by launchctl.
subprocess.run(
    ["launchctl", "kickstart", "-k", get_launchd_label()],
    check=True,
    timeout=30,
)
```

This also applies to the …
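The call order the test expects (bootstrap, SIGUSR1 drain, then `kickstart -k`) can be sketched with injected callables so it is checkable without macOS. `restart_launchd_gateway`, `graceful` and `run` are hypothetical names. `graceful` stands in for `_graceful_restart_via_sigusr1`, whose signature is only inferred from the review snippet.

```python
from typing import Callable, List, Optional


def restart_launchd_gateway(
    pid: int,
    drain_timeout: float,
    target: str,
    bootstrap_argv: Optional[List[str]],
    graceful: Callable[[int, float], bool],
    run: Callable[[List[str]], None],
) -> bool:
    """Bootstrap (if needed), drain via SIGUSR1, then kickstart the job.

    Returns True if the drain succeeded and the kickstart was issued.
    """
    if bootstrap_argv is not None:
        run(bootstrap_argv)  # 1. make sure launchd knows about the job
    if not graceful(pid, drain_timeout + 5):  # 2. graceful drain via SIGUSR1
        return False  # drain failed: do not kickstart
    # 3. launchd may leave the job loaded but not running after exit 75,
    #    so restart it explicitly; -k kills any running instance first.
    run(["launchctl", "kickstart", "-k", target])
    return True
```

Recording every call into one list, as the test does, is what makes a missing step 3 show up as an `IndexError` or mismatch on `calls[2]`.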
Thanks, addressed in 4908cd7. The production path already fell through to a shared …

Changes made:
Verification:
Summary
Run `launchctl kickstart -k` after the graceful drain, because macOS launchd can leave the job loaded but in `state = not running` after it exits with code 75 (`EX_TEMPFAIL`).

Local verification on affected machine
Before the extra kickstart, this machine reproduced the failure after `hermes gateway restart`: `launchctl print gui/501/ai.hermes.gateway` showed `state = not running` and `last exit code = 75: EX_TEMPFAIL`.

After this change:
`hermes gateway restart` exits successfully, and `launchctl print gui/501/ai.hermes.gateway` shows `state = running`.

Test Plan
```bash
venv/bin/python -m py_compile hermes_cli/gateway.py gateway/run.py
venv/bin/python -m pytest tests/gateway/test_restart_notification.py tests/hermes_cli/test_gateway.py -q -o 'addopts='
```
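The `state = not running` / `last exit code = 75` symptom from the local verification could also be checked in code. This is a sketch under assumptions. The field names match what the summary quotes, but `launchctl print` output is not a documented, stable format. The function names and the "loaded but not running means kickstart" rule are illustrative only.

```python
import re
from typing import Dict, Optional


def parse_launchctl_print(output: str) -> Dict[str, str]:
    """Extract 'key = value' pairs from launchctl print (first occurrence wins)."""
    fields: Dict[str, str] = {}
    for line in output.splitlines():
        m = re.match(r"\s*([a-z][a-z ]*?)\s*=\s*(.+?)\s*$", line)
        if m and m.group(1) not in fields:
            fields[m.group(1)] = m.group(2)
    return fields


def needs_kickstart(output: str) -> bool:
    """True when the job is not running.

    Assumes `output` came from a successful `launchctl print`, i.e. the
    job is loaded; an unloaded job needs bootstrap, not kickstart.
    """
    return parse_launchctl_print(output).get("state") == "not running"


def last_exit_code(output: str) -> Optional[int]:
    """Numeric part of 'last exit code', e.g. 75 (EX_TEMPFAIL in sysexits.h)."""
    value = parse_launchctl_print(output).get("last exit code")
    if value is None:
        return None
    m = re.match(r"-?\d+", value)
    return int(m.group()) if m else None
```

Keeping the first occurrence of each key is a simplification: nested sections in real output may repeat keys, and the top-level `state` line is assumed to come first.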