
fix(gateway): handle planned service stops (salvage of #19876)#19936

Merged
teknium1 merged 1 commit into main from hermes/hermes-b755c148
May 4, 2026

teknium1 (Contributor) commented May 4, 2026

Salvage of #19876 onto current main.

Adds a short-lived .gateway-planned-stop.json marker so that deliberate stop paths (hermes gateway stop, systemd stop, launchd stop, profile-scoped stop) and a foreground Ctrl+C can be distinguished from an unexpected external SIGTERM. On a planned stop the gateway exits with code 0, so that service managers (most notably launchd with KeepAlive) and --replace takeover races don't immediately revive it with stale platform identities (Feishu app_id, Telegram getUpdates, etc.).
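The exit-code rule in the paragraph above can be sketched as a small decision function. This is an illustration, not the gateway's actual signal handler: the name exit_code_for_signal, the injected consume_planned_stop callback, and the non-zero code used for an unplanned stop are all hypothetical, since the PR only states that a planned stop exits with code 0.

```python
import signal
from typing import Callable

# Hypothetical: the PR only says a planned stop exits 0; the code used for an
# unexpected stop is an assumption made for this sketch.
UNPLANNED_EXIT_CODE = 1


def exit_code_for_signal(signum: int, consume_planned_stop: Callable[[], bool]) -> int:
    """Choose the process exit code when a stop signal arrives."""
    if signum == signal.SIGINT:
        # Foreground Ctrl+C counts as a deliberate stop; no marker is needed.
        return 0
    if signum == signal.SIGTERM and consume_planned_stop():
        # A stop path wrote a marker addressed to this PID before signalling.
        return 0
    # Unexpected external SIGTERM (or any other signal): exit non-zero so the
    # outcome differs from a planned stop.
    return UNPLANNED_EXIT_CODE
```

Passing the marker check in as a callback keeps the decision testable without delivering real signals; a handler registered with signal.signal() would call this and hand the result to its shutdown path.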

Follows the existing --replace takeover-marker pattern; the common logic is factored into a shared _consume_pid_marker_for_self() helper in gateway/status.py.
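A minimal sketch of the shared PID-marker idea, assuming a JSON payload of {"pid": ..., "written_at": ...}, a 60-second freshness window, and a state directory passed in explicitly. The public names match those listed in this PR, but their signatures, the marker schema, the TTL, and the _write_pid_marker helper are assumptions, not the code in gateway/status.py.

```python
import json
import os
import time
from pathlib import Path

# Marker filename is from this PR; the freshness window is hypothetical
# (the PR only calls the marker "short-lived").
PLANNED_STOP_MARKER = ".gateway-planned-stop.json"
MARKER_TTL_SECONDS = 60.0


def _write_pid_marker(path: Path, pid: int) -> None:
    """Record which PID should treat the next stop signal as expected."""
    payload = {"pid": pid, "written_at": time.time()}
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)  # atomic rename: readers never see a half-written file


def _consume_pid_marker_for_self(path: Path, ttl: float = MARKER_TTL_SECONDS) -> bool:
    """Return True, and delete the marker, only if it is fresh and names this PID."""
    try:
        data = json.loads(path.read_text())
        pid = int(data["pid"])
        written_at = float(data["written_at"])
    except (OSError, ValueError, KeyError, TypeError):
        return False  # missing or malformed marker: treat the stop as unplanned
    if time.time() - written_at > ttl:
        path.unlink(missing_ok=True)  # stale leftover from an earlier stop
        return False
    if pid != os.getpid():
        return False  # fresh marker meant for another process; leave it alone
    path.unlink(missing_ok=True)  # consume it so it cannot be honored twice
    return True


def write_planned_stop_marker(state_dir: Path, pid: int) -> None:
    _write_pid_marker(state_dir / PLANNED_STOP_MARKER, pid)


def consume_planned_stop_marker_for_self(state_dir: Path) -> bool:
    return _consume_pid_marker_for_self(state_dir / PLANNED_STOP_MARKER)
```

Comparing the recorded PID with os.getpid() means a marker written for one process is never honored by another, and the TTL bounds how long a leftover marker can linger. Under this sketch, a --replace takeover marker would be another thin wrapper around the same helper with a different filename.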

Note: this closes the revival / identity-conflict class of bugs, but does NOT fix slow drains on wedged adapter sockets (e.g. WSL with a hung Feishu websocket). That drain-hang issue is separate and tracked in a follow-up.

Changes

  • gateway/status.py: write_planned_stop_marker(), consume_planned_stop_marker_for_self(), clear_planned_stop_marker(), plus a shared _consume_pid_marker_for_self() helper
  • hermes_cli/gateway.py: stop_profile_gateway(), systemd_stop(), launchd_stop() write the marker before SIGTERM
  • gateway/run.py: the signal handler consumes the marker, treats SIGINT as a planned foreground stop, and exits cleanly
  • tests/gateway/test_status.py + tests/hermes_cli/test_gateway_service.py: marker + systemd_stop wiring coverage, deterministic TimeoutStopSec assertions
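The "write the marker before SIGTERM" ordering in hermes_cli/gateway.py could look roughly like this for a direct-signal path such as stop_profile_gateway(). This is a sketch under assumptions: stop_gateway_pid and its injectable send_signal parameter are hypothetical, the marker schema is assumed, and clear_planned_stop_marker here is only a guess at what the PR's function of that name does. The systemd and launchd paths would hand the stop to the service manager instead of calling os.kill directly.

```python
import json
import os
import signal
import time
from pathlib import Path
from typing import Callable

PLANNED_STOP_MARKER = ".gateway-planned-stop.json"


def clear_planned_stop_marker(state_dir: Path) -> None:
    """Remove the marker if present (assumed behavior)."""
    (state_dir / PLANNED_STOP_MARKER).unlink(missing_ok=True)


def stop_gateway_pid(
    pid: int,
    state_dir: Path,
    send_signal: Callable[[int, int], None] = os.kill,
) -> bool:
    """Mark the stop as planned, then SIGTERM the gateway.

    Returns False if the process was already gone.
    """
    # Order matters: the marker must exist before the signal is delivered,
    # otherwise the gateway's handler may run first and see an unplanned stop.
    marker = {"pid": pid, "written_at": time.time()}
    (state_dir / PLANNED_STOP_MARKER).write_text(json.dumps(marker))
    try:
        send_signal(pid, signal.SIGTERM)
    except ProcessLookupError:
        # No process will consume the marker; drop it so a later process that
        # reuses this PID cannot mistake it for its own planned stop.
        clear_planned_stop_marker(state_dir)
        return False
    return True
```

Injecting send_signal lets the ordering and the cleanup path be tested without signalling a real process.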

Validation

  • py_compile on all three modified modules: clean
  • scripts/run_tests.sh tests/gateway/test_status.py tests/hermes_cli/test_gateway_service.py tests/gateway/test_clean_shutdown_marker.py tests/gateway/test_runner_startup_failures.py: 171 passed

Credit to @helix4u — cherry-picked with authorship preserved. Closes #19876.

teknium1 merged commit b632290 into main on May 4, 2026
7 of 10 checks passed
teknium1 deleted the hermes/hermes-b755c148 branch on May 4, 2026 at 23:00
alt-glitch added the labels comp/cli (CLI entry point, hermes_cli/, setup wizard), comp/gateway (Gateway runner, session dispatch, delivery), P2 (Medium: degraded but workaround exists) and type/bug (Something isn't working) on May 4, 2026