fix(gateway): retry startup auto-resume when a failed platform reconnects by Frowtek · Pull Request #37669 · NousResearch/hermes-agent

Frowtek · 2026-06-02T21:52:34Z

What does this PR do?

The gateway's documented startup auto-resume skips sessions whose platform
adapter isn't connected yet, and it never retries them. If a platform connects
shortly after startup, its interrupted sessions are left behind and recover only
when the user sends a new message. This PR wires the auto-resume into the
reconnect path so that a recovered platform retries the continuation for its own
sessions.

Related Issue

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

gateway/run.py — _schedule_resume_pending_sessions() gains an optional
platform filter + an in-flight guard against double resumes.
gateway/run.py — the reconnect-watcher success path re-runs the auto-resume
scoped to the reconnected platform.
tests/gateway/test_restart_resume_pending.py,
tests/gateway/test_platform_reconnect.py — regression coverage.
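
The two gateway/run.py changes listed above can be pictured with a small, self-contained sketch. It is not the code from this PR: only _schedule_resume_pending_sessions, _running_agents, session_key and resume_pending are names taken from this page. SessionEntry, GatewaySketch, _resume_in_flight and on_platform_reconnected are hypothetical stand-ins, and the real gateway almost certainly differs in detail.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's session record."""
    session_key: str
    platform: str
    resume_pending: bool = True


class GatewaySketch:
    """Toy model of the runner; not the real gateway/run.py."""

    def __init__(self, sessions, connected):
        self.sessions = sessions
        self.connected = set(connected)   # platforms with a live adapter
        self._running_agents = {}         # session_key -> agent handle
        self._resume_in_flight = set()    # assumed shape of the in-flight guard
        self.resumed = []                 # bookkeeping for the demo only

    def _schedule_resume_pending_sessions(self, platform=None):
        tasks = []
        for entry in self.sessions:
            if not entry.resume_pending:
                continue
            # Optional platform filter: a reconnect only retries its own sessions.
            if platform is not None and entry.platform != platform:
                continue
            # Adapter still offline: skip now, retry when it reconnects.
            if entry.platform not in self.connected:
                continue
            key = entry.session_key
            if key in self._running_agents or key in self._resume_in_flight:
                continue
            # Claim the key synchronously, before any await can interleave.
            self._resume_in_flight.add(key)
            tasks.append(asyncio.create_task(self._resume(entry)))
        return tasks

    async def _resume(self, entry):
        try:
            await asyncio.sleep(0)  # stands in for the real continuation work
            self._running_agents[entry.session_key] = object()
            entry.resume_pending = False
            self.resumed.append(entry.session_key)
        finally:
            self._resume_in_flight.discard(entry.session_key)

    def on_platform_reconnected(self, platform):
        """Hypothetical hook for the reconnect-watcher success path."""
        self.connected.add(platform)
        return self._schedule_resume_pending_sessions(platform=platform)


async def demo():
    gw = GatewaySketch(
        [SessionEntry("tg:1", "telegram"), SessionEntry("dc:1", "discord")],
        connected=["telegram"],
    )
    await asyncio.gather(*gw._schedule_resume_pending_sessions())  # startup pass
    after_startup = list(gw.resumed)
    first = gw.on_platform_reconnected("discord")
    second = gw.on_platform_reconnected("discord")  # duplicate reconnect event
    await asyncio.gather(*first, *second)
    return after_startup, gw.resumed, len(second)
```

In this sketch the startup pass resumes only the Telegram session, the first Discord reconnect schedules the Discord session, and a duplicate reconnect event schedules nothing because the key is already claimed.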

How to Test

Mark a session resume_pending while its adapter is offline at startup → skipped.
Bring the platform online → reconnect schedules the resume for that session.
scripts/run_tests.sh tests/gateway/test_restart_resume_pending.py tests/gateway/test_platform_reconnect.py -q — all pass.

Checklist

Read the Contributing Guide
Commit messages follow Conventional Commits (fix(gateway): ...)
Searched for duplicate PRs
Only changes related to this fix
Tests pass and I've added regression tests
Cross-platform impact — N/A (pure async control-flow)

mohamedorigami-jpg · 2026-06-02T22:01:32Z

Clean separation of startup resume vs reconnect resume. The platform scoping is the right approach — reconnects should only retry their own sessions.

One thing worth checking: the entry.session_key in self._running_agents guard on the reconnect path assumes the original agent has already been launched and stored in _running_agents before the reconnect fires. If a platform is flapping (disconnect→reconnect in rapid succession), there is a timing window: the first reconnect triggers the resume, the agent fires, the platform drops again, and the second reconnect fires before the first agent is stored in _running_agents. The result would be duplicate resumes for the same session. A transition counter (disconnect_count incremented on each drop, matched on reconnect) would close that gap if flappy platforms are a real concern for your deployment.
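
One possible reading of the transition-counter suggestion, sketched as a standalone asyncio example. Only disconnect_count comes from the comment above. FlapGuard, resume_after_reconnect and the sleep-based timing are hypothetical and exist only to reproduce the flapping window in miniature. Nothing here is taken from the merged change.

```python
import asyncio
from collections import defaultdict


class FlapGuard:
    """Hypothetical helper that counts disconnect transitions per platform."""

    def __init__(self):
        self.disconnect_count = defaultdict(int)

    def on_disconnect(self, platform):
        self.disconnect_count[platform] += 1

    def snapshot(self, platform):
        return self.disconnect_count[platform]

    def is_current(self, platform, seen):
        return self.disconnect_count[platform] == seen


async def resume_after_reconnect(guard, platform, session_key, launched, delay):
    # Remember which disconnect this reconnect belongs to.
    seen = guard.snapshot(platform)
    # Stands in for the async work that happens before the agent is stored.
    await asyncio.sleep(delay)
    if not guard.is_current(platform, seen):
        # The platform dropped again in the meantime; a later reconnect
        # owns the resume, so this one backs off.
        return False
    launched.append(session_key)
    return True


async def demo():
    guard = FlapGuard()
    launched = []
    guard.on_disconnect("discord")  # adapter fails at startup
    first = asyncio.create_task(
        resume_after_reconnect(guard, "discord", "s1", launched, 0.05)
    )
    await asyncio.sleep(0)          # let the first reconnect take its snapshot
    guard.on_disconnect("discord")  # flap: the platform drops again
    second = asyncio.create_task(
        resume_after_reconnect(guard, "discord", "s1", launched, 0.01)
    )
    results = await asyncio.gather(first, second)
    return results, launched
```

Here the first reconnect's resume notices that the count moved while it was waiting and backs off, so only the second reconnect launches the session. A counter of this kind would complement a synchronous in-flight claim rather than replace it.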

teknium1 · 2026-06-04T12:57:09Z

Merged via PR #39018 — your commit was cherry-picked onto current main with your authorship preserved in git log (rebase-merge, commit 71a9f44). Thanks @Frowtek!

xiaoyaner0201 · 2026-06-04T15:33:08Z

Cherry-picked this onto a v0.15.2-based fork (Discord gateway) alongside #30030. Resolves a restart-resume amnesia we were hitting — a session whose platform reconnected slightly after startup never got its interrupted turn resumed until a fresh user message arrived. The test_restart_resume_pending + test_platform_reconnect suites pass. Thanks for the fix!

fix(gateway): retry startup auto-resume when a failed platform reconn…

2f8d5fe

…ects

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels Jun 2, 2026

teknium1 mentioned this pull request Jun 4, 2026

fix(gateway): retry startup auto-resume when a failed platform reconnects #39018

Merged

teknium1 closed this in #39018 Jun 4, 2026

xiaoyaner0201 mentioned this pull request Jun 4, 2026

Gateway restart resume can lose immediate pre-restart context (possible JSONL vs SQLite transcript mismatch) #13121

Open

xiaoyaner0201 mentioned this pull request Jun 4, 2026

fix(gateway): harden restart resume recovery #30030

Open

Frowtek wants to merge 1 commit into NousResearch:main from Frowtek:fix/late-reconnect-auto-resume

5 participants
