fix(gateway): double TimeoutStopSec to prevent SIGKILL before exit code 75 #11325

Open
june8572-design wants to merge 1 commit into NousResearch:main from june8572-design:fix/telegram-restart-timeout

@june8572-design


Problem

The /restart command (especially on Telegram) fails silently because systemd SIGKILLs the gateway before it can exit with code 75.

The root cause: TimeoutStopSec in the generated systemd unit was max(60, drain_timeout), which equals the drain timeout whenever that timeout is 60s or longer. If shutdown then takes even slightly longer than the drain itself (network delays, active sessions), systemd sends SIGKILL before the gateway finishes its clean shutdown sequence.

Fix

  • Change TimeoutStopSec from max(60, drain_timeout) to max(120, drain_timeout * 2)
  • This gives the gateway enough headroom to drain connections AND emit the exit code 75 that triggers the service-restart recovery path
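
The change in headroom can be sketched as follows. This is only an illustration of the two formulas quoted above, not the code in hermes_cli/gateway.py: the function names `old_stop_timeout`, `new_stop_timeout`, and `headroom` are hypothetical, and drain timeouts are assumed to be whole seconds.

```python
# Illustrative sketch only; helper names are hypothetical, not from the PR.

def old_stop_timeout(drain_timeout: int) -> int:
    """Previous formula: stop timeout equals the drain timeout once it reaches 60s."""
    return max(60, drain_timeout)


def new_stop_timeout(drain_timeout: int) -> int:
    """Proposed formula: double the drain timeout, with a 120s floor."""
    return max(120, drain_timeout * 2)


def headroom(stop_timeout: int, drain_timeout: int) -> int:
    """Seconds left after a full-length drain before systemd would send SIGKILL."""
    return stop_timeout - drain_timeout


if __name__ == "__main__":
    # Compare the spare time each formula leaves for a few drain timeouts.
    for drain in (0, 30, 60, 90, 300):
        before = headroom(old_stop_timeout(drain), drain)
        after = headroom(new_stop_timeout(drain), drain)
        print(f"drain={drain}s  old headroom={before}s  new headroom={after}s")
```

Under the old formula the headroom is zero for any drain timeout of 60s or more, which is the window in which systemd can kill the gateway before it exits with code 75. Under the new formula the headroom is never smaller than 60s and is at least as large as the drain timeout itself.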

Changes

  • hermes_cli/gateway.py: Updated generate_systemd_unit() to double the stop timeout with a 120s minimum

…de 75

TimeoutStopSec was set equal to drain timeout, causing systemd to
SIGKILL the gateway before it could emit exit code 75 on /restart.
This broke the Telegram /restart → service-restart recovery path.

Now sets TimeoutStopSec to drain*2 (min 120s), giving the gateway
enough time to drain connections and signal clean shutdown.

@mxnstrexgl left a comment


🤖 Hermes Agent Review

Verdict: ✅ APPROVE — Clean surgical fix

Correctness

  • ✅ Math verified: max(120, drain*2) gives proper headroom
  • ✅ Edge cases handled: 0 drain → 120s, large drain → doubled
  • ✅ Comment explains WHY (systemd SIGKILL before exit code 75)

Risk: Low

1 file changed, +5/-1 lines. Only affects systemd unit generation.


Reviewed by Hermes Agent (automated)

@RuckVibeCodes left a comment


[gus-first-pass] PR addresses a critical issue with the Telegram /restart command. No inline comments required. Overall looks good.

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P2: Medium — degraded but workaround exists
  • type/bug: Something isn't working

4 participants