fix(gateway): retry startup auto-resume when a failed platform reconnects#39018
Merged
Conversation
Contributor
🔎 Lint report:
|
7 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
A messaging platform that was offline at gateway startup now gets its restart-interrupted sessions auto-resumed the moment it reconnects, instead of waiting for a manual user message.
Root cause: the documented startup auto-resume (
_schedule_resume_pending_sessions()) runs once at startup and silently skips any session whose adapter isn't connected yet. Those sessions were never rescheduled, so a late-connecting platform's interrupted sessions only recovered when the user sent a fresh message.Changes
gateway/run.py—_schedule_resume_pending_sessions()gains an optionalplatformfilter and an in-flight guard (session_key in self._running_agents) so a session can't be resumed twice.gateway/run.py— the reconnect-watcher success path re-runs the auto-resume scoped to the reconnected platform (wrapped in try/except so a reschedule failure can't break reconnect).tests/gateway/— 3 new regression tests (late reconnect reschedules, platform-scoping, running-agent skip).Validation
tests/gateway/test_restart_resume_pending.py,tests/gateway/test_platform_reconnect.py).Salvage of #37669 by @Frowtek — cherry-picked onto current main, authorship preserved. Closes #37669.
Infographic