
[Gateway] Stuck session resumes on restart — creates unrecoverable loop #7536

@SHL0MS

Description


Problem

When a gateway session gets stuck (hung terminal command, runaway tool loop), restarting the gateway re-enters the stuck state. The user cannot get a clean start.

What happens:

  1. Agent enters a stuck state (hung terminal command, tool loop, 30+ minute hang)
  2. User restarts the gateway (hermes gateway stop && hermes gateway)
  3. Telegram redelivers the last unacknowledged message
  4. Agent loads the session from the DB, sees the conversation context, and resumes the stuck task
  5. Back to step 1
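Step 3 is what turns a single hang into a loop. The sketch below is a toy, network-free model of Telegram's long-polling confirmation rule. It assumes the gateway receives messages via getUpdates; the issue does not say whether Hermes uses polling or a webhook. Per the Bot API docs, an update stays pending until a later getUpdates call passes an offset higher than its update_id. If the gateway is killed while handling a message, that offset is never sent, so the message comes back on every restart. `FakeUpdateQueue` and `run_gateway_once` are hypothetical names, not Hermes code.

```python
from __future__ import annotations


class FakeUpdateQueue:
    """Toy model of Telegram Bot API long polling (getUpdates).

    An update is confirmed once getUpdates is called with an offset higher
    than its update_id; until then it is returned again on every call.
    """

    def __init__(self, updates: list[tuple[int, str]]) -> None:
        self.pending = list(updates)

    def get_updates(self, offset: int | None = None) -> list[tuple[int, str]]:
        if offset is not None:
            # Confirm (drop) every update whose update_id is below the offset.
            self.pending = [u for u in self.pending if u[0] >= offset]
        return list(self.pending)


def run_gateway_once(queue: FakeUpdateQueue, log: list[str], hangs_on: str) -> None:
    """One gateway lifetime: fetch updates, handle them, then confirm.

    A handler that hangs is modelled by returning early, i.e. the user kills
    the gateway before the offset for that update is ever sent back.
    """
    offset = None
    for update_id, text in queue.get_updates(offset):
        log.append(text)
        if text == hangs_on:
            return  # killed while stuck: offset never advanced
        offset = update_id + 1
    if offset is not None:
        queue.get_updates(offset)  # confirm everything handled


queue = FakeUpdateQueue([(100, "run the long task")])
log: list[str] = []
for _ in range(3):  # three gateway restarts
    run_gateway_once(queue, log, hangs_on="run the long task")
print(log)            # the same message, handled once per restart
print(queue.pending)  # still pending after all three restarts
```

Nothing in this model is Hermes-specific: any consumer that only advances the offset after a task finishes will replay a task that never finishes.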

User report: "It keeps trying to resume the previous task every time I reboot the gateway and it ends up dying. It's unusable."

The user's only escape was /stop (force-stop), which eventually worked but required the agent to be responsive enough to process the command.

Current workaround

hermes gateway stop
hermes sessions delete <stuck-session-id>
hermes gateway

Or the nuclear option: rm -f ~/.hermes/hermes_state.db (the DB is regenerated on start, but all session history is lost).

Suggested fixes

  1. /stop should mark the session as "don't resume" — set a flag in the session DB so the gateway skips it on restart
  2. hermes gateway --clean — start with no pending sessions, ignore Telegram's update backlog
  3. Detect stuck resume loops — if the same session enters the error/timeout state within N seconds of gateway start, abandon it instead of retrying
  4. Consume stale Telegram updates on startup — on gateway start, read and discard any Telegram updates older than N minutes
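Fixes 1, 3 and 4 can be sketched together as a startup resume policy. This is a minimal sketch under stated assumptions: the `sessions` table, its `no_resume` column, the helper names, and both thresholds are hypothetical stand-ins, since the issue does not show the Hermes schema or pick values for N. Updates use the Bot API shape (`update_id`, `message.date` as Unix time). Stale updates are still confirmed through the returned offset, so Telegram stops redelivering them instead of them merely being ignored locally. A `--clean` start (fix 2) would roughly amount to skipping every pending session and treating the whole backlog as stale.

```python
import sqlite3

STALE_AFTER_S = 10 * 60  # hypothetical "N minutes" for fix 4
EARLY_FAILURE_S = 60     # hypothetical "N seconds" for fix 3


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # Hypothetical schema; the real Hermes session table is not shown in the issue.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions ("
        " id TEXT PRIMARY KEY,"
        " no_resume INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def mark_no_resume(conn, session_id):
    """Fix 1: /stop sets a persistent flag so restarts skip this session."""
    conn.execute("UPDATE sessions SET no_resume = 1 WHERE id = ?", (session_id,))
    conn.commit()


def on_session_failure(conn, session_id, gateway_started_at, now):
    """Fix 3: a session that errors or times out soon after startup is abandoned."""
    if now - gateway_started_at <= EARLY_FAILURE_S:
        mark_no_resume(conn, session_id)


def resumable_sessions(conn):
    """Sessions the gateway may resume on startup."""
    rows = conn.execute("SELECT id FROM sessions WHERE no_resume = 0 ORDER BY id")
    return [row[0] for row in rows]


def split_stale_updates(updates, now):
    """Fix 4: drop backlog updates older than STALE_AFTER_S.

    Returns (fresh_updates, offset). Passing `offset` to the next getUpdates
    call confirms every update seen here, stale ones included.
    Updates without a message (e.g. callback queries) are kept as fresh.
    """
    fresh = [
        u for u in updates
        if now - u.get("message", {}).get("date", now) <= STALE_AFTER_S
    ]
    last_id = max((u["update_id"] for u in updates), default=None)
    return fresh, (last_id + 1 if last_id is not None else None)


conn = open_db()
conn.executemany("INSERT INTO sessions (id) VALUES (?)", [("a",), ("b",), ("c",)])
started = 1_000.0
mark_no_resume(conn, "a")                                   # user ran /stop
on_session_failure(conn, "b", started, now=started + 30)    # died 30 s after start
on_session_failure(conn, "c", started, now=started + 3600)  # unrelated later failure
print(resumable_sessions(conn))  # ['c']

updates = [
    {"update_id": 7, "message": {"date": 0, "text": "old"}},
    {"update_id": 8, "message": {"date": 1_000, "text": "new"}},
]
fresh, offset = split_stale_updates(updates, now=1_200)
print([u["message"]["text"] for u in fresh], offset)  # ['new'] 9
```

Keeping the flag in the session DB, rather than in memory, is what lets it survive the restart that currently re-triggers the loop.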

Metadata

Assignees: none
Labels: type/bug (Something isn't working)
