
fix(gateway): harden restart resume recovery #30030

Open
Qwinty wants to merge 2 commits into NousResearch:main from Qwinty:fix/restart-resume-reply-anchor

Conversation

Qwinty commented May 21, 2026

Contributor

Summary

  • skip stale restart auto-resumes when the transcript already has a terminal assistant response or is outside the freshness window
  • resolve Telegram DM topic bindings before startup auto-resume loads transcript history, so recovery uses the session the topic will actually route to
  • skip empty bound Telegram DM topic sessions instead of synthesizing a blank internal turn, and clear their stale resume_pending marker
  • preserve resume_pending metadata when Telegram topic binding resolution switches the session entry before the recovery turn
  • preserve the latest Telegram DM topic reply anchor for startup auto-resume and shutdown notices
  • avoid running post-turn goal continuations after provider/auth/rate-limit failure responses
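For readers skimming the summary, the skip conditions in the first and third bullets can be sketched roughly as below. This is an illustration only, not the code in gateway/run.py: the `Message` shape, the `skip_reason` helper, the reason strings, and the 30-minute `FRESHNESS_WINDOW` are all assumptions made for the example, since the PR description does not state the actual window or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Hypothetical value: the PR description does not state the real window.
FRESHNESS_WINDOW = timedelta(minutes=30)


@dataclass
class Message:
    """Hypothetical transcript entry used only for this sketch."""
    role: str                    # "user", "assistant", or "tool"
    content: str
    has_tool_calls: bool = False


def skip_reason(
    transcript: List[Message],
    last_activity: datetime,
    now: datetime,
    window: timedelta = FRESHNESS_WINDOW,
) -> Optional[str]:
    """Return why a restart auto-resume should be skipped, or None to resume."""
    # An empty session has nothing to continue; resuming would only
    # synthesize a blank internal turn.
    if not transcript:
        return "empty_transcript"
    # Outside the freshness window the interrupted turn is considered stale.
    if now - last_activity > window:
        return "stale"
    # A final assistant message with text and no pending tool calls means
    # the turn already finished before the restart.
    last = transcript[-1]
    if last.role == "assistant" and not last.has_tool_calls and last.content.strip():
        return "terminal_assistant_response"
    return None


now = datetime(2026, 5, 21, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=5)
pending = [Message("user", "deploy the fix")]
finished = pending + [Message("assistant", "Deployed.")]
```

In this sketch a caller would also clear the session's resume_pending marker whenever a reason is returned, so the same stale entry is not reconsidered on the next startup.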

Related Issue

Fixes #10163.

Supersedes the broader #29188 with a narrower restart-resume recovery fix. Related restart-resume and topic-binding work: #23314, #29713.

Tests

  • /usr/local/lib/hermes-agent/venv/bin/python -m pytest -q -o addopts='' tests/gateway/test_restart_resume_pending.py tests/gateway/test_telegram_topic_mode.py tests/gateway/test_goal_verdict_send.py — 132 passed, 1 warning
  • /usr/local/lib/hermes-agent/venv/bin/python -m py_compile gateway/run.py tests/gateway/test_restart_resume_pending.py tests/gateway/test_telegram_topic_mode.py
  • /usr/local/lib/hermes-agent/venv/bin/python -m ruff check gateway/run.py tests/gateway/test_restart_resume_pending.py tests/gateway/test_telegram_topic_mode.py
  • git diff --check

alt-glitch added labels type/bug (Something isn't working), P2 (Medium, degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) on May 21, 2026
Qwinty force-pushed the fix/restart-resume-reply-anchor branch 2 times, most recently from 387ec85 to a7fa549 on May 26, 2026 08:47

Qwinty commented May 26, 2026

Contributor Author

Rebased on current upstream/main; PR is mergeable again.

Local verification:

  • python -m pytest -q -o addopts='' tests/gateway/test_goal_verdict_send.py tests/gateway/test_restart_resume_pending.py - 86 passed
  • python -m ruff check gateway/run.py tests/gateway/test_goal_verdict_send.py tests/gateway/test_restart_resume_pending.py - passed
  • git diff --check upstream/main...HEAD - passed

CI has been retriggered on a7fa549d3.

xiaoyaner0201 commented

Cherry-picked onto our v0.15.2-based fork (Discord gateway) together with #37669. The two compose cleanly — only a small benign overlap in _schedule_resume_pending_sessions, where #37669's in-flight guard and this PR's freshness / empty-bound-topic skips sit next to each other (both kept, sequentially). Restart-resume regression suite is green. Thanks for hardening this path!
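The "both kept, sequentially" composition described above can be pictured as a chain of independent guards, where the first guard that returns a reason skips the session. The sketch below is only an illustration of that pattern: `first_skip_reason`, the guard lambdas, and the session keys are hypothetical and do not reproduce the real body of _schedule_resume_pending_sessions.

```python
from typing import Callable, Iterable, Optional

# A guard inspects a session key and returns a skip reason, or None to pass.
Guard = Callable[[str], Optional[str]]


def first_skip_reason(session_key: str, guards: Iterable[Guard]) -> Optional[str]:
    """Run guards in order and return the first skip reason, if any."""
    for guard in guards:
        reason = guard(session_key)
        if reason is not None:
            return reason
    return None


# Hypothetical state standing in for what each change would check.
in_flight = {"s1"}           # sessions with a resume already running
stale = {"s2"}               # sessions outside the freshness window
empty_bound_topic = {"s3"}   # bound topic sessions with no transcript

guards = [
    lambda key: "in_flight" if key in in_flight else None,
    lambda key: "stale" if key in stale else None,
    lambda key: "empty_bound_topic" if key in empty_bound_topic else None,
]
```

Because each guard only reads its own state, the checks stay order-independent in outcome (a session is skipped if any guard objects), which is why the overlap between the two changes is benign.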

Qwinty force-pushed the fix/restart-resume-reply-anchor branch from e5069a6 to a4e81bd on June 9, 2026 12:48

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P2: Medium, degraded but workaround exists
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

[Bug]: Telegram topic session can behave like /new after gateway restart/update despite persistent-session design

3 participants