Summary
On macOS, when hermes update is triggered from within the gateway process tree (e.g., the agent runs it via its terminal tool), launchd_restart() sends SIGUSR1 and returns immediately, without waiting for the gateway to exit and without issuing launchctl kickstart. The gateway exits with code 75, but launchd does not restart it, so the service stays down until someone restarts it manually.
Root Cause
In hermes_cli/gateway.py, launchd_restart() has two code paths:
Path A (SIGUSR1): Triggered when the gateway PID is an ancestor of the current process. Sends SIGUSR1, prints "Service restart requested", then returns immediately — no wait for exit, no kickstart.
Path B (SIGTERM + kickstart): Triggered when the gateway PID is NOT an ancestor. Sends SIGTERM, waits for exit, then runs launchctl kickstart -k.
When Path A is taken, the gateway receives SIGUSR1 and begins a graceful shutdown (drain, then exit with code 75). However, because launchd_restart() has already returned, nothing is left to restart the service. macOS launchd does not automatically restart the job after exit(75) in this configuration: system logs show "pending spawn, domain in on-demand-only mode" with no follow-up WILL_SPAWN.
Reproduction
- Have the Hermes gateway running on macOS under launchd.
- From a Telegram conversation, ask the agent to run hermes update directly via its terminal tool (NOT using the /update slash command).
- The agent process is a child of the gateway, so _is_pid_ancestor_of_current_process() returns True → Path A is taken.
- Gateway exits with code 75 → launchd does not restart → service stays dead.
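The ancestor check in the third step is what selects Path A. The real _is_pid_ancestor_of_current_process() is not quoted in this issue, so the following is only a minimal sketch of the idea: walk the parent chain of the current process and see whether the gateway PID appears in it. The function name is_pid_ancestor, the get_parent callback, and the PIDs are all hypothetical and chosen for illustration.

```python
from typing import Callable, Optional


def is_pid_ancestor(
    target_pid: int,
    start_pid: int,
    get_parent: Callable[[int], Optional[int]],
    max_depth: int = 64,
) -> bool:
    """Return True if target_pid appears in start_pid's parent chain.

    get_parent is a hypothetical lookup (PID -> parent PID or None); a real
    implementation would have to query the operating system instead.
    """
    pid = get_parent(start_pid)
    for _ in range(max_depth):
        if pid is None or pid <= 0:
            return False  # chain ended without finding the target
        if pid == target_pid:
            return True
        if pid == 1:
            return False  # reached launchd/init, nothing above it
        pid = get_parent(pid)
    return False


# Hypothetical process tree: gateway (100) -> agent (200) -> hermes update (300)
parents = {300: 200, 200: 100, 100: 1}
inside_tree = is_pid_ancestor(100, 300, parents.get)   # update run by the agent
outside_tree = is_pid_ancestor(100, 900, {900: 1}.get)  # update run elsewhere
```

In this model, inside_tree corresponds to the reproduction above (Path A) and outside_tree to an update started outside the gateway's process tree (Path B).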
Note: The normal /update command avoids this by spawning hermes update --gateway via setsid + start_new_session=True, which detaches from the gateway process tree and takes Path B. This bug only manifests when the update command runs inside the gateway process tree.
Contrast with Linux
PR #9850 (merged) fixed a similar issue for Linux by adding systemctl is-active health checks and retry logic after systemctl restart. The macOS launchd path was completely omitted from that fix.
Evidence
macOS system logs consistently show exit(75) followed by no restart:
- "exited due to exit(75)"
- "pending spawn, domain in on-demand-only mode: ai.hermes.gateway"
- No WILL_SPAWN entry follows
In contrast, when the gateway is killed by an external signal (SIGTERM/SIGKILL from outside the process tree), launchd immediately issues WILL_SPAWN and the service recovers within seconds.
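To check a captured log for this pattern, a small scan along the following lines may help. It is a sketch only: it matches on the two substrings quoted above ("exited due to exit(75)" and "WILL_SPAWN") and assumes the relevant entries arrive as one string per line in chronological order; the full launchd log line format is not reproduced here.

```python
from typing import Iterable

# Substrings taken from the log excerpts quoted in this issue.
EXIT_MARKER = "exited due to exit(75)"
SPAWN_MARKER = "WILL_SPAWN"


def exit_without_respawn(lines: Iterable[str]) -> bool:
    """Return True if the last exit(75) entry has no WILL_SPAWN entry after it."""
    awaiting_spawn = False
    for line in lines:
        if EXIT_MARKER in line:
            awaiting_spawn = True
        elif SPAWN_MARKER in line:
            awaiting_spawn = False
    return awaiting_spawn


stuck = exit_without_respawn([
    "exited due to exit(75)",
    "pending spawn, domain in on-demand-only mode: ai.hermes.gateway",
])
```

Here stuck is True because no WILL_SPAWN entry follows the exit, which is the failure signature described above.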
Suggested Fix
Remove the early return in Path A and let both paths converge on _wait_for_gateway_exit() + launchctl kickstart -k. This ensures the gateway is always restarted regardless of how the update was triggered.
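A rough sketch of the converged control flow follows. It is not the actual code in hermes_cli/gateway.py: the function launchd_restart_sketch, its injected callables, and kickstart_command are hypothetical stand-ins so the flow can be exercised without touching a real service. The gui/<uid> domain target is also an assumption; only the ai.hermes.gateway label comes from the logs above. The sketch further assumes that the calling process survives the gateway's shutdown long enough to issue the kickstart, which this issue does not confirm.

```python
import signal
from typing import Callable, List


def kickstart_command(uid: int, label: str = "ai.hermes.gateway") -> List[str]:
    """Build the kickstart command; the gui/<uid> domain is an assumption."""
    return ["launchctl", "kickstart", "-k", f"gui/{uid}/{label}"]


def launchd_restart_sketch(
    gateway_pid: int,
    is_ancestor: bool,
    send_signal: Callable[[int, int], None],
    wait_for_exit: Callable[[int], bool],
    kickstart: Callable[[], None],
) -> bool:
    """Signal the gateway, wait for it to exit, then always kickstart.

    Returns whatever wait_for_exit reports (True if the gateway exited).
    """
    # Path A keeps the graceful SIGUSR1; Path B keeps SIGTERM.
    sig = signal.SIGUSR1 if is_ancestor else signal.SIGTERM
    send_signal(gateway_pid, sig)
    # No early return for Path A: both paths wait, then kickstart.
    exited = wait_for_exit(gateway_pid)
    kickstart()
    return exited


# Exercise Path A with recording stubs instead of real signals.
calls = []
launchd_restart_sketch(
    gateway_pid=4242,
    is_ancestor=True,
    send_signal=lambda pid, sig: calls.append(("signal", pid, sig)),
    wait_for_exit=lambda pid: calls.append(("wait", pid)) is None,
    kickstart=lambda: calls.append(("kickstart",)),
)
```

After the call, calls holds the signal, the wait, and the kickstart in that order, so Path A no longer ends at "Service restart requested".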