Bug Description
On macOS, the gateway's /restart command (and any other code path that asks the gateway to relaunch via the service manager) does not actually trigger a launchd-driven restart. The gateway exits with code 0, launchd's KeepAlive { SuccessfulExit: false } policy treats that as "stopped successfully", and the gateway stays down until the user manually re-bootstraps it.
Same code path works correctly on Linux/systemd.
Steps to Reproduce
- Install the gateway as a launchd service (the standard macOS deployment via hermes gateway install).
- Confirm it's running: launchctl list ai.hermes.gateway shows a PID.
- Send /restart to the bot (or trigger any code path that calls _handle_restart_command).
- The gateway gracefully drains and exits.
- Wait, and observe that launchd does not relaunch it: launchctl list ai.hermes.gateway still references the previous (now-dead) PID and no new process spawns. Telegram / Discord / Feishu adapters all stay disconnected.
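The "confirm it's running" and "observe" steps compare launchctl output by eye. Below is a minimal sketch for scripting that comparison. It assumes that launchctl list <label> prints a plist-style dictionary containing a line such as "PID" = 12345; while the job is running. That output is not a stable interface, so the regex is best-effort. parse_pid and relaunched are hypothetical helper names, not part of Hermes.

```python
import re
from typing import Optional

# Assumption: a running job shows up as a line like `"PID" = 12345;` in the
# output of `launchctl list <label>`. This is not a documented stable format.
_PID_RE = re.compile(r'"PID"\s*=\s*(\d+);')


def parse_pid(output: str) -> Optional[int]:
    """Return the PID reported by launchctl, or None if no PID line is present."""
    match = _PID_RE.search(output)
    return int(match.group(1)) if match else None


def relaunched(before: str, after: str) -> bool:
    """True only if a PID is present after the restart and it differs from the old one."""
    old_pid = parse_pid(before)
    new_pid = parse_pid(after)
    return new_pid is not None and new_pid != old_pid
```

Capture the output before and after /restart, for example with subprocess.run(["launchctl", "list", "ai.hermes.gateway"], capture_output=True, text=True).stdout. In the failing case described here, the PID does not change, so relaunched(...) returns False.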
Expected vs Actual
Expected: After /restart, the gateway exits and launchd brings it right back up — same behaviour as systemd on Linux.
Actual: The gateway exits cleanly (code 0) and stays down. launchctl list shows the stale PID and no relaunch happens.
Operating System
macOS 15.4 (Darwin 25.4.0)
Python Version
3.11.14
Hermes Version
Working off main (HEAD 6a6766fb8).
Additional Logs / Traceback (optional)
~/.hermes/logs/gateway-exit-diag.log for a working systemd-style restart shows:
{"tag": "asyncio.run.SystemExit", "code": 75}
{"tag": "gateway.start", "pid": <new>}
For the failing macOS launchd /restart, the SystemExit-75 line is missing entirely — the gateway falls through to return True → sys.exit(0), and the next gateway.start entry only shows up much later when the user manually runs launchctl kickstart -k.
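To spot the missing SystemExit-75 entry without reading the log by hand, each gateway.start entry can be paired with the exit code logged just before it. This is a sketch that assumes one JSON object per line with a "tag" key, as in the excerpt above. The "<new>" placeholder in that excerpt is not valid JSON, so real entries are assumed to carry a numeric pid. restart_exits is a hypothetical helper name.

```python
import json
from typing import Iterable, List


def restart_exits(lines: Iterable[str]) -> List[dict]:
    """Pair each gateway.start entry with the SystemExit code logged just before it.

    A start whose preceded_by_exit_code is None had no logged SystemExit, which is
    what the failing launchd /restart looks like (the very first start in a log
    will also show None).
    """
    results = []
    pending_exit_code = None  # exit code seen since the last gateway.start
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines
        entry = json.loads(raw)
        tag = entry.get("tag")
        if tag == "asyncio.run.SystemExit":
            pending_exit_code = entry.get("code")
        elif tag == "gateway.start":
            results.append({"pid": entry.get("pid"), "preceded_by_exit_code": pending_exit_code})
            pending_exit_code = None
    return results
```

Usage: restart_exits(open(os.path.expanduser("~/.hermes/logs/gateway-exit-diag.log"))). A working systemd-style restart yields preceded_by_exit_code == 75, and the launchd case yields None.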
Root Cause Analysis
In gateway/run.py (~line 9720), the gateway decides between two restart strategies:
_under_service = bool(os.environ.get("INVOCATION_ID"))  # systemd sets this
_in_container = os.path.exists("/.dockerenv") or os.path.exists("/run/.containerenv")
if _under_service or _in_container:
    self.request_restart(detached=False, via_service=True)
else:
    self.request_restart(detached=True, via_service=False)
INVOCATION_ID is set only by systemd. macOS launchd uses a different convention — it injects XPC_SERVICE_NAME and XPC_FLAGS into the environment of managed jobs but does not set INVOCATION_ID.
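To make the misdetection concrete, the probe quoted above can be rewritten as a pure function that takes the environment and a path check as inputs. choose_restart_strategy is a hypothetical name, and it returns the keyword arguments that would be passed to request_restart. It is a sketch of the quoted logic, not the code in gateway/run.py.

```python
import os
from typing import Callable, Mapping


def choose_restart_strategy(
    env: Mapping[str, str],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> dict:
    """Mirror of the current probe: only INVOCATION_ID or a container marks a service."""
    under_service = bool(env.get("INVOCATION_ID"))  # systemd only
    in_container = path_exists("/.dockerenv") or path_exists("/run/.containerenv")
    if under_service or in_container:
        return {"detached": False, "via_service": True}
    return {"detached": True, "via_service": False}


# A launchd-like environment (XPC_SERVICE_NAME set, no INVOCATION_ID) outside a
# container falls through to the detached branch.
launchd_env = {"XPC_SERVICE_NAME": "ai.hermes.gateway"}
print(choose_restart_strategy(launchd_env, path_exists=lambda p: False))
```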
So under launchd, _under_service is False, the code takes the detached-subprocess branch, and request_restart(via_service=False) flows through to the exit path:
# gateway/run.py ~line 18162
if runner._restart_via_service:
    raise SystemExit(75)
return True
Because _restart_via_service is False, the SystemExit(75) branch is skipped and the function returns True, which ends in sys.exit(0). launchd's KeepAlive { SuccessfulExit: false } policy then refuses to relaunch a "successful" exit.
The detached-subprocess fallback (the branch the code does take) doesn't start a replacement process under launchd either: launchd reparents the spawned subprocess and tears it down when the parent exits. This is the same mechanism the _under_service block already documents for systemd KillMode=mixed.
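The interaction between the exit path and the plist policy can be modelled in a few lines. This is a simplified sketch. It follows the launchd.plist meaning of KeepAlive SuccessfulExit: with false, launchd restarts the job only after a non-zero exit. It ignores signals, ThrottleInterval and other KeepAlive keys. gateway_exit_code and launchd_relaunches are hypothetical names. The value 75 is EX_TEMPFAIL from sysexits.h, which Python exposes as os.EX_TEMPFAIL on Unix.

```python
import os

# 75 == EX_TEMPFAIL in <sysexits.h>; fall back to the literal on non-Unix builds.
EX_TEMPFAIL = getattr(os, "EX_TEMPFAIL", 75)


def gateway_exit_code(restart_via_service: bool) -> int:
    """Exit status the gateway ends with, per the exit path quoted above."""
    if restart_via_service:
        return EX_TEMPFAIL  # raise SystemExit(75)
    return 0  # return True -> sys.exit(0)


def launchd_relaunches(exit_code: int, successful_exit: bool = False) -> bool:
    """Simplified KeepAlive { SuccessfulExit: <successful_exit> } decision.

    successful_exit=True keeps the job alive after exit 0; False keeps it alive
    only after a non-zero exit (the gateway's plist uses False).
    """
    exited_successfully = exit_code == 0
    return exited_successfully == successful_exit


for via_service in (True, False):
    code = gateway_exit_code(via_service)
    print(via_service, code, launchd_relaunches(code))
```

With via_service=False the model yields exit code 0 and no relaunch, which matches the behaviour reported above.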
Proposed Fix
Extend the probe to recognise launchd:
_under_service = bool(
    os.environ.get("INVOCATION_ID")  # systemd (Linux) sets this
    or os.environ.get("XPC_SERVICE_NAME")  # launchd (macOS) sets this
)
XPC_SERVICE_NAME is set by launchd for every managed job (LimitLoadToSessionType does not affect this). I've verified it is present in the live gateway process on macOS 15.4. The variable is launchd-specific, so it can't false-positive on a Linux box.
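One caveat is worth checking before merging. It is stated as an assumption, not something verified in this report: shells started from Terminal.app commonly inherit XPC_SERVICE_NAME with the value "0". If so, a gateway launched by hand from a terminal could be misdetected as a launchd job. The sketch below is a slightly more defensive variant of the proposed probe. It treats "0" as absent and can optionally require an exact match against the plist Label. That option assumes launchd sets XPC_SERVICE_NAME to the job's Label, for example ai.hermes.gateway. under_service_manager is a hypothetical name.

```python
from typing import Mapping, Optional


def under_service_manager(env: Mapping[str, str], launchd_label: Optional[str] = None) -> bool:
    """Proposed probe: systemd via INVOCATION_ID, launchd via XPC_SERVICE_NAME.

    Assumption: interactive macOS shells may carry XPC_SERVICE_NAME="0", so that
    value is not treated as a launchd job. If launchd_label is given, only an
    exact match counts (assumes launchd sets the variable to the plist Label).
    """
    if env.get("INVOCATION_ID"):  # systemd (Linux)
        return True
    xpc_name = env.get("XPC_SERVICE_NAME", "")
    if launchd_label is not None:
        return xpc_name == launchd_label
    return xpc_name not in ("", "0")
```

Comparing against the label is stricter but couples the probe to the plist. The "0" check keeps the probe generic at the cost of relying on an observed convention.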
PR with the fix and a regression test: see linked PR.
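For reference, a regression test for this kind of probe might look like the sketch below. _under_service here is a hypothetical stand-in that copies the proposed expression; it is not the function from the linked PR. The detail worth keeping is clear=True. A CI runner may itself run under systemd and leak INVOCATION_ID into the test process, and without clearing the environment that would mask a launchd regression.

```python
import os
import unittest
from unittest import mock


def _under_service() -> bool:
    """Hypothetical stand-in for the proposed probe (not the real gateway code)."""
    return bool(os.environ.get("INVOCATION_ID") or os.environ.get("XPC_SERVICE_NAME"))


class UnderServiceProbeTest(unittest.TestCase):
    # clear=True removes any INVOCATION_ID inherited from the CI host.
    @mock.patch.dict(os.environ, {"XPC_SERVICE_NAME": "ai.hermes.gateway"}, clear=True)
    def test_launchd_counts_as_service(self):
        self.assertTrue(_under_service())

    @mock.patch.dict(os.environ, {"INVOCATION_ID": "0123abcd"}, clear=True)
    def test_systemd_counts_as_service(self):
        self.assertTrue(_under_service())

    @mock.patch.dict(os.environ, {}, clear=True)
    def test_plain_process_is_not_a_service(self):
        self.assertFalse(_under_service())
```

Run it with python -m unittest. mock.patch.dict restores the original environment after each test.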
Are you willing to submit a PR for this?