gateway: systemctl restart from inside terminal tool deadlocks — kills itself mid-execution #37453

@brian-doherty

What happened

When the agent runs systemctl --user restart hermes-gateway via the terminal tool, the command runs as a child process of the gateway itself. When systemd stops the service and delivers SIGTERM to the gateway's main PID, the systemctl restart command is killed along with it, mid-execution. The command that was supposed to restart the gateway is killed by the very restart it initiated.
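One way to check whether a process would be caught up in the gateway's own shutdown is to look at its cgroup. The following is a minimal sketch, assuming a cgroup v2 (unified hierarchy) host and assuming the unit is named hermes-gateway.service; the helper names are hypothetical and not part of Hermes:

```python
from pathlib import Path
from typing import Optional


def parse_cgroup_path(cgroup_text: str) -> Optional[str]:
    """Return the cgroup v2 path from the contents of /proc/<pid>/cgroup."""
    for line in cgroup_text.splitlines():
        # A cgroup v2 entry has the form "0::/some/path".
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return path
    return None


def in_unit_cgroup(cgroup_text: str, unit: str = "hermes-gateway.service") -> bool:
    """True if the cgroup path contains the given unit (unit name is an assumption)."""
    path = parse_cgroup_path(cgroup_text)
    return path is not None and unit in path.split("/")


def current_process_in_gateway_cgroup() -> bool:
    """Check the calling process; only meaningful on Linux with cgroup v2."""
    return in_unit_cgroup(Path("/proc/self/cgroup").read_text())
```

A command spawned by the terminal tool would normally inherit the gateway's cgroup, so a check like this returning True suggests the command will not outlive a stop of the unit.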

Why this matters

The gateway has a guard for hermes gateway restart ("Refusing to restart the gateway from inside the gateway process"), but it only blocks the Hermes CLI command. A raw systemctl --user restart bypasses the guard and creates a deadlock.
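To illustrate the gap, here is a rough sketch of a check the terminal tool could run on a command string before executing it. The function name, the set of verbs, and the accepted unit spellings are assumptions for illustration, not the existing guard's implementation:

```python
import shlex

# Assumed spellings of the unit; the issue only names "hermes-gateway".
GATEWAY_UNITS = {"hermes-gateway", "hermes-gateway.service"}
# systemctl verbs that restart a unit (assumed scope for this sketch).
RESTART_VERBS = {"restart", "try-restart", "reload-or-restart"}


def targets_gateway_unit(command: str) -> bool:
    """Heuristically detect 'systemctl ... restart hermes-gateway' in a shell command."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: this sketch does not attempt to parse further.
        return False
    for start, token in enumerate(tokens):
        # Match both "systemctl" and an absolute path such as /usr/bin/systemctl.
        if token.rsplit("/", 1)[-1] != "systemctl":
            continue
        # Drop flags such as --user; flags that take a separate value are not handled.
        args = [t for t in tokens[start + 1:] if not t.startswith("-")]
        if not args:
            continue
        verb, units = args[0], args[1:]
        if verb in RESTART_VERBS and GATEWAY_UNITS.intersection(units):
            return True
    return False
```

The check is deliberately conservative: it scans everything after systemctl up to the end of the string, so it can produce false positives on compound commands, and it does not catch indirection through scripts or aliases.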

Reproduction

  1. Start a session with the gateway running as a systemd service
  2. Have the agent run: systemctl --user restart hermes-gateway
  3. The gateway receives SIGTERM, starts draining active sessions
  4. The systemctl restart command (a child of the gateway) gets killed before it can complete
  5. With the restart command killed mid-execution, systemd may not restart the service at all, or may do so only after a long delay

Observed behavior

09:27:29  agent runs "hermes gateway restart" → blocked by guard (correct)
09:27:33  agent runs "systemctl --user restart hermes-gateway" → bypasses guard
09:27:33  gateway receives SIGTERM, begins draining
09:28:34  drain times out after 60s — agent session still active
09:28:36  gateway exits with code 1
09:28:37  systemd journal: "Failed to kill control group ... Invalid argument"
09:34:48  gateway finally restarts (6+ minute gap!)

The 6-minute gap between stop and restart left Telegram, Discord, WhatsApp, and all cron jobs completely dead during that window.
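For reference, the durations can be computed directly from the timestamps in the log above. This is only a small arithmetic sketch; the helper name is made up:

```python
from datetime import datetime


def gap_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# SIGTERM received -> drain timeout
drain = gap_seconds("09:27:33", "09:28:34")
# gateway exit -> gateway restarted
outage = gap_seconds("09:28:36", "09:34:48")

minutes, seconds = divmod(outage, 60)
print(f"drain={drain}s outage={outage}s ({minutes}m{seconds}s)")
# prints "drain=61s outage=372s (6m12s)"
```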

Environment

  • Hermes Agent v0.15.1
  • systemd user service with KillMode=mixed, Restart=always
  • Linux (kernel 6.8.0-124-generic)

Suggested fix

Any of the following:

  1. Block raw systemctl restart via the same guard that blocks hermes gateway restart
  2. Detect that the restart command is a child of the gateway and defer it (e.g., schedule it via systemd-run --user --on-active=10 so that it runs from a separate cgroup)
  3. Extend the existing guard to detect systemctl commands targeting the hermes-gateway unit
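The deferral option could look roughly like the sketch below, which hands the restart to a transient systemd timer so it runs outside the gateway's cgroup. The function names are hypothetical, and the 10-second delay simply mirrors the value suggested above:

```python
import subprocess
from typing import List


def build_deferred_restart(unit: str = "hermes-gateway", delay_seconds: int = 10) -> List[str]:
    """Build a systemd-run command that restarts the unit from a transient timer."""
    return [
        "systemd-run",
        "--user",
        f"--on-active={delay_seconds}",  # fire once, this many seconds from now
        "systemctl",
        "--user",
        "restart",
        unit,
    ]


def schedule_deferred_restart(unit: str = "hermes-gateway", delay_seconds: int = 10) -> None:
    """Schedule the restart; raises CalledProcessError if systemd-run fails."""
    subprocess.run(build_deferred_restart(unit, delay_seconds), check=True)
```

Because systemd-run returns as soon as the timer is registered, the calling command finishes before the gateway is stopped, and the actual restart is performed by a unit that the gateway's shutdown does not kill.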

Metadata

Assignees

No one assigned

    Labels

    P1: High, major feature broken, no workaround
    comp/gateway: Gateway runner, session dispatch, delivery
    tool/terminal: Terminal execution and process management
    type/bug: Something isn't working
