fix(gateway): exit 0 when systemd sends SIGTERM via systemctl stop #41690
Open
dcain2336 wants to merge 1 commit into
When running under systemd (INVOCATION_ID set), treat SIGTERM as a planned stop so the unit exits cleanly (code 0) instead of code 1. Only signals from outside the service manager (external kill, OOM, container signal) exit non-zero so Restart=on-failure can revive. Previously, systemctl stop caused the gateway to exit 1, which put the unit in a failed state despite systemd having intentionally stopped it. Closes NousResearch#41631
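The decision described above can be sketched as a small pure function. This is only an illustration: `resolve_exit_code` and its `planned_stop_marker` parameter are hypothetical names, and treating every unmarked non-SIGTERM signal as exit 1 is an assumption. The real logic lives in the gateway's signal handler.

```python
import os
import signal


def resolve_exit_code(received_signal, environ=None, planned_stop_marker=False):
    """Return the exit code for a shutdown triggered by `received_signal`.

    Hypothetical helper for illustration; not the gateway's actual code.
    """
    env = os.environ if environ is None else environ

    # An explicit planned-stop marker always means a clean exit.
    planned_stop = planned_stop_marker

    # systemd sets INVOCATION_ID for the services it starts, so SIGTERM
    # with that variable present is treated as a planned stop.
    if received_signal == signal.SIGTERM and env.get("INVOCATION_ID"):
        planned_stop = True

    # Planned stops exit 0; anything else exits 1 so that
    # Restart=on-failure can revive the service.
    return 0 if planned_stop else 1


print(resolve_exit_code(signal.SIGTERM, {"INVOCATION_ID": "abc123"}))  # 0
print(resolve_exit_code(signal.SIGTERM, {}))  # 1
```

Passing the environment as a parameter keeps the check easy to test without mutating `os.environ`.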
Contributor
Code Review: Positive Verification
Reviewed the full diff (211 lines:
Correctness:
Edge cases handled:
Test coverage: 163-line test file covers the key scenarios.
LGTM.
Contributor
✅ Verified: systemd SIGTERM detection via INVOCATION_ID
Reviewed the diff in
LGTM, clean gateway stability fix.
Summary
When running under systemd (detected via `INVOCATION_ID`), treat SIGTERM as a planned stop so the unit exits cleanly (code 0) instead of code 1.

Only signals from outside the service manager (external kill, OOM, container signal) exit non-zero so `Restart=on-failure` can revive the gateway.

Problem
`systemctl stop hermes-gateway.service` sends SIGTERM to the gateway. The gateway didn't have a "planned stop marker" (only `hermes gateway stop` creates one), so it treated the signal as unexpected and exited 1, putting the unit in a failed state despite systemd having intentionally stopped it.

Fix
In `shutdown_signal_handler`, after checking for takeover/planned-stop markers, we now check: if `received_signal == SIGTERM` and `INVOCATION_ID` is set in the environment, set `planned_stop = True` so `_signal_initiated_shutdown` stays `False` and the gateway exits 0.

This is safe because:
- `INVOCATION_ID` is only set by systemd (not external kill, not container signal)
- A SIGTERM received without `INVOCATION_ID` set still exits 1 for `Restart=on-failure`

Changes
- `gateway/run.py`: Added systemd-initiated SIGTERM detection in signal handler
- `tests/test_issue_41631_fix.py`: 9 regression tests covering all scenarios

Test Plan
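As a hedged illustration of how the exit-code behavior could be checked end to end on a POSIX system, the sketch below starts a stand-in child process, sends it SIGTERM, and returns its exit code. It is not taken from `tests/test_issue_41631_fix.py`; the `CHILD` script and `run_and_terminate` helper are hypothetical stand-ins for the gateway and its tests.

```python
import os
import signal
import subprocess
import sys

# Stand-in for the gateway: exits 0 on SIGTERM only when INVOCATION_ID is set.
CHILD = r"""
import os, signal, sys, time

def handler(signum, frame):
    planned = signum == signal.SIGTERM and bool(os.environ.get("INVOCATION_ID"))
    sys.exit(0 if planned else 1)

signal.signal(signal.SIGTERM, handler)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def run_and_terminate(extra_env):
    """Start the stand-in child, send SIGTERM, and return its exit code."""
    # Drop any inherited INVOCATION_ID so the caller controls it explicitly.
    env = {k: v for k, v in os.environ.items() if k != "INVOCATION_ID"}
    env.update(extra_env)
    with subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        proc.stdout.readline()  # wait until the handler is installed
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=10)
```

Calling `run_and_terminate({"INVOCATION_ID": "test"})` simulates a stop under systemd, while `run_and_terminate({})` simulates a SIGTERM received with no service manager involved.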
Closes #41631