Related
Problem
Both PRs attempt to solve #41631 by inferring whether a SIGTERM came from systemctl stop — using INVOCATION_ID / _shutdown_ctx["under_systemd"] as the signal source indicator. This approach is fundamentally unreliable.
Why inference breaks
From inside the process, a SIGTERM is a SIGTERM — there is no reliable way to distinguish:
systemctl stop (intentional admin stop → should exit 0)
kill -SIGTERM $PID from an external process (unintended → should exit 1 so Restart=always can revive)
- OOM killer sending SIGTERM within a systemd cgroup (unintended → should exit 1)
All three look identical inside shutdown_signal_handler(). Checking INVOCATION_ID or under_systemd only tells you "I am running under systemd" — not "systemd is the one who sent this SIGTERM."
Concrete failure cases
| Scenario |
INVOCATION_ID |
Expected exit |
Both PRs exit |
systemctl stop |
set |
0 ✅ |
0 ✅ |
kill -SIGTERM from external process, under systemd |
set |
1 (restart wanted) |
0 ❌ |
| OOM killer SIGTERM in systemd cgroup |
set |
1 (restart wanted) |
0 ❌ |
Docker container with systemd, kill -SIGTERM |
set |
1 (restart wanted) |
0 ❌ |
In all the ❌ cases, the gateway exits 0, systemd marks the unit as inactive instead of failed, and no automatic restart happens — defeating the purpose of Restart=on-failure / Restart=always.
Additional issues with #41642
- The test file (
test_systemd_stop_exit_code.py) tests a simulated version of the condition logic rather than the actual shutdown_signal_handler() code path. The _simulate_planned_stop_detection() function duplicates the branching logic instead of invoking the real handler, so passing tests do not verify real behavior.
- Relies on
snapshot_shutdown_context() which performs I/O (/proc reads) — this runs synchronously inside the asyncio signal handler. If /proc is slow or unavailable, the detection silently degrades.
Proposed approach
The existing planned_stop marker mechanism is the correct design — hermes gateway stop already writes a marker file before sending SIGTERM. The problem is that systemctl stop bypasses this mechanism by sending SIGTERM directly.
Fix the systemd unit, not the signal handler:
[Service]
ExecStop=/usr/bin/hermes gateway stop
# OR: ExecStop=/bin/sh -c 'hermes gateway stop || true'
This way:
systemctl stop → calls hermes gateway stop → writes planned-stop marker → sends SIGTERM → signal handler finds marker → exit 0 ✅
- External
kill -SIGTERM → no marker → signal handler treats as unexpected → exit 1 → Restart=always revives ✅
- OOM killer → no marker → exit 1 → restart ✅
No inference, no guessing, no new edge cases.
If ExecStop change is not possible
If changing the installed unit is undesirable (e.g., upgrade path concerns), a lighter alternative is to add an ExecStop helper that writes the marker and then sends SIGTERM, without going through the full CLI:
#!/bin/sh
# hermes-systemd-stop-helper
MARKER_DIR=$(hermes config get marker_dir 2>/dev/null)
[ -n "$MARKER_DIR" ] && touch "$MARKER_DIR/planned-stop-$(cat $MAINPID)"
kill -SIGTERM "$MAINPID"
This still avoids the inference problem entirely.
Summary
Both PRs implement the right goal (exit 0 on systemctl stop) but use an unreliable mechanism (environment-based inference) that breaks the restart guarantee for unintended SIGTERMs. The correct fix is to make systemctl stop go through the existing planned-stop marker path by changing ExecStop, rather than teaching the signal handler to guess who sent the signal.
Related
Problem
Both PRs attempt to solve #41631 by inferring whether a SIGTERM came from
systemctl stop— usingINVOCATION_ID/_shutdown_ctx["under_systemd"]as the signal source indicator. This approach is fundamentally unreliable.Why inference breaks
From inside the process, a SIGTERM is a SIGTERM — there is no reliable way to distinguish:
systemctl stop(intentional admin stop → should exit 0)kill -SIGTERM $PIDfrom an external process (unintended → should exit 1 so Restart=always can revive)All three look identical inside
shutdown_signal_handler(). CheckingINVOCATION_IDorunder_systemdonly tells you "I am running under systemd" — not "systemd is the one who sent this SIGTERM."Concrete failure cases
systemctl stopkill -SIGTERMfrom external process, under systemdkill -SIGTERMIn all the ❌ cases, the gateway exits 0, systemd marks the unit as
inactiveinstead offailed, and no automatic restart happens — defeating the purpose ofRestart=on-failure/Restart=always.Additional issues with #41642
test_systemd_stop_exit_code.py) tests a simulated version of the condition logic rather than the actualshutdown_signal_handler()code path. The_simulate_planned_stop_detection()function duplicates the branching logic instead of invoking the real handler, so passing tests do not verify real behavior.snapshot_shutdown_context()which performs I/O (/procreads) — this runs synchronously inside the asyncio signal handler. If/procis slow or unavailable, the detection silently degrades.Proposed approach
The existing
planned_stopmarker mechanism is the correct design —hermes gateway stopalready writes a marker file before sending SIGTERM. The problem is thatsystemctl stopbypasses this mechanism by sending SIGTERM directly.Fix the systemd unit, not the signal handler:
This way:
systemctl stop→ callshermes gateway stop→ writes planned-stop marker → sends SIGTERM → signal handler finds marker → exit 0 ✅kill -SIGTERM→ no marker → signal handler treats as unexpected → exit 1 → Restart=always revives ✅No inference, no guessing, no new edge cases.
If ExecStop change is not possible
If changing the installed unit is undesirable (e.g., upgrade path concerns), a lighter alternative is to add an ExecStop helper that writes the marker and then sends SIGTERM, without going through the full CLI:
This still avoids the inference problem entirely.
Summary
Both PRs implement the right goal (exit 0 on
systemctl stop) but use an unreliable mechanism (environment-based inference) that breaks the restart guarantee for unintended SIGTERMs. The correct fix is to makesystemctl stopgo through the existing planned-stop marker path by changingExecStop, rather than teaching the signal handler to guess who sent the signal.