[Bug]: Gateway restart from Telegram session causes process death #6666

@autumnlong

Bug Description

Hermes Version: 0.8.0
Environment: WSL2 Linux, systemd user service, HTTP proxy required for external access

When the agent running inside a Telegram gateway session executes a gateway restart, the gateway process dies and never comes back up. The restart mechanism does not properly use systemd and instead tries to manage the process directly, causing a PID race condition.

Steps to Reproduce

  1. Start gateway as systemd user service: systemctl --user start hermes-gateway
  2. Send a message to the bot via Telegram asking it to restart the gateway
  3. The agent executes a restart command (likely "hermes gateway restart" or "hermes gateway run --replace")
  4. Gateway stops and never recovers

LOG EVIDENCE:

[Event 1 - 21:14:45] Agent triggered restart from Telegram session:
gateway.platforms.telegram: Telegram button resolved 1 approval(s) for session agent:main:telegram:dm:5004002140
gateway.run: Stopping gateway...
gateway.run: ✓ discord disconnected
gateway.run: ✓ telegram disconnected
gateway.run: Gateway stopped
gateway.run: Cron ticker stopped

[After restart attempt]:
❌ Gateway already running (PID 18664).
Use 'hermes gateway restart' to replace it,
or 'hermes gateway stop' to kill it first.
Or use 'hermes gateway run --replace' to auto-replace.

Expected Behavior

Gateway restart triggered from any platform (Telegram, Discord, CLI) should use "systemctl --user restart hermes-gateway" to properly manage the lifecycle, ensuring:

  • No PID race condition
  • Service file customizations (like proxy Environment overrides) are preserved
  • The gateway reliably comes back up after restart
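A minimal sketch of what delegating the restart to systemd could look like. This is not Hermes code: the function names are hypothetical, and the detection relies on the assumption that the gateway can check for the INVOCATION_ID variable that systemd sets for services it starts. The unit name is the one from this report.

```python
import os
import subprocess
from typing import Mapping, Optional

UNIT_NAME = "hermes-gateway"  # unit name taken from this report


def running_under_systemd(env: Mapping[str, str]) -> bool:
    # systemd exports INVOCATION_ID to the processes of each service it starts.
    return bool(env.get("INVOCATION_ID"))


def restart_command(env: Mapping[str, str]) -> Optional[list[str]]:
    """Return the systemctl command to run, or None when not under systemd."""
    if not running_under_systemd(env):
        return None
    # --no-block queues the restart job and returns immediately, so the caller
    # (which lives inside the unit being restarted) is not left waiting on a
    # job that is about to stop it.
    return ["systemctl", "--user", "--no-block", "restart", UNIT_NAME]


def request_restart(env: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical entry point: True if the restart was handed to systemd."""
    cmd = restart_command(env)
    if cmd is None:
        return False  # caller falls back to its existing restart path
    subprocess.run(cmd, check=True)
    return True
```

With this shape, systemd owns both the stop and the start, so the drop-in overrides stay in effect and no second process has to compete with the old PID.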

Actual Behavior

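The gateway shuts down and does not come back up. The restart attempt fails with "Gateway already running (PID 18664)", after which no gateway process is left running.
To recover, manually run: systemctl --user start hermes-gateway
A systemd override file must also be maintained at:
~/.config/systemd/user/hermes-gateway.service.d/override.conf
to preserve proxy environment variables across restarts.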

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

Telegram, Discord

Operating System

WSL2 Linux (systemd user service, HTTP proxy required for external access)

Python Version

3.11.9

Hermes Version

0.8.0

Relevant Logs / Traceback

Root Cause Analysis (optional)

LOG EVIDENCE:

[Event 2 - 22:16:18] Same issue repeated:
gateway.run: Stopping gateway...
gateway.run: ✓ telegram disconnected
gateway.run: Gateway stopped
(service exits, never restarts)

ROOT CAUSE ANALYSIS:

  1. When restart is triggered from within the gateway (via Telegram agent session), the agent runs a command like "hermes gateway restart" or "hermes gateway run --replace"
  2. This kills the current gateway process (which is also hosting the agent session)
  3. The restart command tries to start a new process using nohup/direct execution instead of systemd
  4. PID race condition: the new process detects the old PID still exists and exits
  5. The old process then exits too, leaving no gateway running
  6. When the gateway runs as a systemd service, "hermes gateway restart" regenerates the service file, which can also strip custom Environment overrides (e.g., HTTP proxy settings)
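The race in steps 4 and 5 can be illustrated with a small sketch: rather than checking the old PID once and giving up, a replacement process could wait for it to exit, up to a timeout. The helper names below are hypothetical and this is not how Hermes currently behaves; it only shows the general technique on POSIX systems.

```python
import os
import time
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(
    pid: int,
    timeout: float = 10.0,
    poll: float = 0.1,
    is_alive: Callable[[int], bool] = pid_alive,
) -> bool:
    """Poll until the process is gone; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

A new gateway process would call wait_for_exit(old_pid) before deciding that another instance is "already running". This narrows the race but does not remove it, which is why handing the restart to systemd is the more robust option.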

WORKAROUND:
Manually run: systemctl --user start hermes-gateway
Must also maintain a systemd override file at:
~/.config/systemd/user/hermes-gateway.service.d/override.conf
to preserve proxy environment variables across restarts.
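For anyone applying the workaround, here is a rough sketch of generating the override file. The path is the one given above; the helper names and the proxy URL in the usage note are placeholders, not values from my setup. Values containing quotes or spaces are not escaped here.

```python
from pathlib import Path

# Drop-in path from the workaround above.
OVERRIDE_PATH = (
    Path.home() / ".config/systemd/user/hermes-gateway.service.d/override.conf"
)


def render_override(env_vars: dict[str, str]) -> str:
    """Build a systemd drop-in that adds Environment= lines to the unit."""
    lines = ["[Service]"]
    for name, value in env_vars.items():
        lines.append(f'Environment="{name}={value}"')
    return "\n".join(lines) + "\n"


def write_override(env_vars: dict[str, str], path: Path = OVERRIDE_PATH) -> Path:
    """Write the drop-in, creating the .service.d directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_override(env_vars))
    return path
```

Example: write_override({"HTTPS_PROXY": "http://127.0.0.1:7890"}) with a placeholder proxy address, followed by systemctl --user daemon-reload so systemd picks up the change. Because the drop-in lives outside the main unit file, it should survive the unit file being regenerated.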

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
