Summary
A gateway started under systemd (<name> gateway install → Restart=always) exits with code 1 on a plain systemctl --user stop, leaving the unit in failed state. A planned operator stop should exit 0 and leave the unit inactive. The non-zero exit pollutes systemctl is-active/is-failed, requires systemctl reset-failed before a clean start, and misleads any health monitoring that reads unit state.
Reproduce
<name> gateway install # creates hermes-gateway-<name>.service (Restart=always)
<name> gateway start
systemctl --user stop hermes-gateway-<name>
systemctl --user is-active hermes-gateway-<name> # → "failed" (expected: "inactive")
systemctl --user status hermes-gateway-<name> # Main process exited, code=exited, status=1/FAILURE; Result: exit-code
Journal on stop:
Stopping hermes-gateway-<name>.service...
WARNING gateway.run: Shutdown context: signal=SIGTERM under_systemd=yes parent_name=systemd ...
INFO gateway.run: Exiting with code 1 (signal-initiated shutdown without restart request) so systemd Restart=on-failure can revive the gateway.
hermes-gateway-<name>.service: Main process exited, code=exited, status=1/FAILURE
hermes-gateway-<name>.service: Failed with result 'exit-code'.
Root cause
gateway/run.py, end of the gateway-run coroutine:
if _signal_initiated_shutdown and not runner._restart_requested:
    logger.info("Exiting with code 1 (signal-initiated shutdown without restart "
                "request) so systemd Restart=on-failure can revive the gateway.")
    return False  # → sys.exit(1)
Any SIGTERM that isn't a /restart, an /update, or a CLI gateway stop (all of which use the planned-stop marker) lands here and exits 1. That includes systemctl stop, which is a deliberate, planned operator stop and should be a clean exit.
Two issues compound it:
- The exit-1 rationale is self-defeating under the unit Hermes itself generates. The comment says exit-1 is "so systemd Restart=on-failure can revive", but the installed unit uses Restart=always (see hermes_cli/gateway.py), under which exit 0 is also restarted. So exiting non-zero buys nothing for revival; it only converts a clean stop into a failed unit.
- systemctl stop isn't distinguished from an unexpected external kill. Both arrive as SIGTERM. But when systemd is stopping the unit it will not restart it regardless of exit code, so there's no need to exit non-zero; the non-zero exit just leaves a spurious failed. Hermes already detects INVOCATION_ID (knows it's under systemd) and has a planned-stop marker mechanism, so a systemd-initiated stop could be treated as planned → exit 0.
Expected
systemctl stop (and any systemd-initiated stop of the unit) → exit 0 → unit inactive, not failed. Reserve the non-zero exit for the genuine "process got SIGTERM but the service manager is NOT stopping the unit" case (e.g. an external kill, OOM, container signal) where a restart is actually wanted.
(Note: a unit-file SuccessExitStatus=1 is not the fix — it would mask genuine exit-1 crashes as success, defeating failure detection. The distinction needs to be made in the gateway based on whether the SIGTERM is a systemd stop job vs. an external signal.)
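One possible way to make that distinction, as a rough sketch and not the actual gateway code: when SIGTERM arrives, ask systemd whether the unit is being stopped, and exit non-zero only if it is not. Both function names below are hypothetical. The sketch assumes the gateway knows its own unit name and that systemd reports ActiveState=deactivating while a stop job is running, which would need to be verified against the systemd versions Hermes supports.

```python
import subprocess


def unit_is_deactivating(unit: str, user: bool = True) -> bool:
    """Hypothetical helper: True if systemd reports a stop in progress for `unit`."""
    cmd = ["systemctl"]
    if user:
        cmd.append("--user")
    cmd += ["show", unit, "--property=ActiveState", "--value"]
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=2, check=False
        )
    except (OSError, subprocess.SubprocessError):
        # systemctl missing or hung: we cannot tell, so assume this is
        # not a systemd stop and keep the existing non-zero behaviour.
        return False
    return result.stdout.strip() == "deactivating"


def shutdown_exit_code(signal_initiated: bool, restart_requested: bool,
                       systemd_stopping: bool) -> int:
    """Return 1 only for a signal-initiated shutdown that nobody planned."""
    if signal_initiated and not restart_requested and not systemd_stopping:
        return 1  # unexpected external signal: keep the non-zero exit
    return 0  # planned stop: marker, restart request, or systemd stop job
```

With this shape, the existing condition in gateway/run.py would gain one extra term, evaluated only when running under systemd. An external kill while the unit is still active would continue to exit 1.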
Impact
- failed state requires systemctl reset-failed before a clean restart, breaking simple stop→edit→start operator workflows.
- Any monitoring keying on is-active/is-failed to detect crashes can't distinguish a deliberate stop from a real failure.