fix(gateway): treat SIGTERM under systemd as planned stop (exit 0) #41639

Closed
liuhao1024 wants to merge 1 commit into NousResearch:main from liuhao1024:fix/gateway-systemd-sigterm-exit-code

Conversation

@liuhao1024

Contributor

What does this PR do?

Treats SIGTERM under systemd as a planned stop (exit 0) instead of an unexpected kill (exit 1). When systemctl stop sends SIGTERM, the gateway now exits cleanly so the unit reports "inactive" instead of "failed".

Related Issue

Fixes #41631

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • gateway/run.py: Added systemd context check (INVOCATION_ID env var) in shutdown_signal_handler — when running under systemd, SIGTERM is treated as a planned stop (exit 0). The unit uses Restart=always, so exit 0 is safe and avoids the spurious "failed" state.
  • tests/gateway/test_systemd_sigterm_exit.py: 4 tests covering SIGTERM under systemd (planned), SIGTERM without systemd (unplanned), SIGINT always planned, and marker+systemd interaction.
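
The exit-code decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in gateway/run.py: the helper names `under_systemd` and `exit_code_for_signal` and the `planned_marker` parameter are hypothetical, and the assumption that a planned-stop marker always yields exit 0 is inferred from the test list rather than confirmed by the diff.

```python
import os
import signal


def under_systemd(env=None):
    """Return True when the process appears to run inside a systemd unit."""
    env = os.environ if env is None else env
    # systemd sets INVOCATION_ID in the environment of each unit invocation.
    return bool(env.get("INVOCATION_ID"))


def exit_code_for_signal(signum, env=None, planned_marker=False):
    """Hypothetical helper: map a shutdown signal to a process exit code."""
    if planned_marker:
        # Assumption: an explicit planned-stop marker is always a clean exit.
        return 0
    if signum == signal.SIGINT:
        # SIGINT (Ctrl+C) is treated as planned in every context.
        return 0
    if signum == signal.SIGTERM and under_systemd(env):
        # Under systemd, SIGTERM is treated as a planned stop.
        return 0
    # Any other case, including SIGTERM outside systemd, counts as unplanned.
    return 1
```

Passing `env` explicitly keeps the check testable without mutating `os.environ`; the real handler may read the environment directly.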

How to Test

  1. Install a gateway under systemd: hermes gateway install && hermes gateway start
  2. Stop it: systemctl --user stop hermes-gateway-<name>
  3. Check status: systemctl --user is-active hermes-gateway-<name> — should show "inactive" (not "failed")
  4. Run the new tests: pytest tests/gateway/test_systemd_sigterm_exit.py -v
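
On a POSIX host without systemd, the exit-code behavior can be approximated by signalling a child process with and without INVOCATION_ID set. The sketch below uses a stand-in handler rather than the gateway itself; the `CHILD` script, the `sigterm_exit_code` helper, and the INVOCATION_ID value are made up for illustration.

```python
import os
import signal
import subprocess
import sys

# Stand-in for the gateway: exits 0 on SIGTERM only when INVOCATION_ID is set.
CHILD = r"""
import os, signal, sys, time

def handler(signum, frame):
    sys.exit(0 if os.environ.get("INVOCATION_ID") else 1)

signal.signal(signal.SIGTERM, handler)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def sigterm_exit_code(simulate_systemd):
    """Start the stand-in child, send SIGTERM, and return its exit code."""
    env = {k: v for k, v in os.environ.items() if k != "INVOCATION_ID"}
    if simulate_systemd:
        env["INVOCATION_ID"] = "0123456789abcdef"  # made-up value
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # Block until the child reports that its handler is installed.
        proc.stdout.readline()
        proc.send_signal(signal.SIGTERM)
        return proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
            proc.wait()
        proc.stdout.close()
```

This only exercises the environment check; it does not confirm how a real systemd unit reports its state, which still needs steps 1 to 3 on a Linux machine.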

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A (Linux-only systemd check, guarded by INVOCATION_ID env var)
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Code Intelligence

  • Analyzed: gateway/run.py:shutdown_signal_handler (callers: signal handler registration via loop.add_signal_handler)
  • Blast radius: LOW — only affects exit code under systemd SIGTERM, no behavioral change for non-systemd or planned-stop-marker paths
  • Related patterns: shutdown_forensics.snapshot_shutdown_context already detects under_systemd via INVOCATION_ID; hermes_cli/gateway.py installs units with Restart=always

When the gateway runs under a systemd unit, `systemctl stop` sends
SIGTERM which the signal handler treated as an unexpected kill — exiting
with code 1 and leaving the unit in "failed" state.

Under systemd, SIGTERM is always intentional (systemctl stop, system
shutdown, etc.).  Treat it as a planned stop so the unit reports
"inactive" instead of "failed".  systemd's Restart=always (which the
installed unit uses) restarts on any exit, so exit 0 is safe.

Fixes NousResearch#41631
@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels on Jun 8, 2026
@liuhao1024

Contributor Author

Thanks for flagging @alt-glitch. Comparing the two PRs:

The production code changes are equivalent, but #41642 has more comprehensive test coverage and integrates with the existing shutdown forensics system. Keeping #41642 open — happy to consolidate if #41639 is preferred by maintainers.

@liuhao1024

Contributor Author

Closing in favor of #41642, which uses the existing snapshot_shutdown_context() infrastructure and has more comprehensive test coverage (8 tests vs 4). Both fix #41631 equivalently.

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: gateway exits code 1 (→ unit 'failed') on systemctl stop; planned stops should exit 0

2 participants