
fix(gateway): batch critical fixes — session resume, /new race, HA WebSocket scheme #19182

Merged
kshitijk4poor merged 6 commits into main from fix/gateway-critical-batch on May 3, 2026

@kshitijk4poor

Collaborator

Summary

Batch salvage of 3 critical gateway fixes from independent contributors. Each addresses a confirmed bug on current main.

PR #19033 by @millerc79 — Session history destroyed on gateway restart

At startup, suspend_recently_active() blanket-set suspended=True on every recently active session. The next message then reached get_or_create_session(), which saw suspended=True and auto-reset the session: a new session_id with an empty history. Every fast restart or crash therefore wiped ALL active conversations.

Fix: Set resume_pending=True instead of suspended=True. The resume path in get_or_create_session preserves the existing session_id and transcript. Tests updated to reflect the new semantics (4 tests updated across 2 files).
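To make the difference between the two flags concrete, here is a minimal, self-contained model of the startup hook and the lookup path. It is a simplified sketch, not the gateway's actual code: the `SessionEntry` and `SessionStore` classes, the 300-second recency window, and clearing `resume_pending` on resume are hypothetical. Only the names `suspend_recently_active`, `get_or_create_session`, `suspended`, `resume_pending`, and `session_id` come from this PR, and the stuck-loop escalation mentioned in the commit message is not modeled.

```python
import time
import uuid
from dataclasses import dataclass, field


@dataclass
class SessionEntry:
    # Hypothetical, simplified model of a gateway session record.
    session_id: str
    transcript: list[str] = field(default_factory=list)
    updated_at: float = field(default_factory=time.time)
    suspended: bool = False
    resume_pending: bool = False


class SessionStore:
    def __init__(self) -> None:
        self._sessions: dict[str, SessionEntry] = {}

    def suspend_recently_active(self, window_s: float = 300.0) -> None:
        """Startup hook: flag recently active sessions for resume (fixed behavior)."""
        cutoff = time.time() - window_s
        for entry in self._sessions.values():
            if entry.updated_at >= cutoff:
                # Before the fix this was `entry.suspended = True`, which made
                # the next lookup discard the session and its history.
                entry.resume_pending = True

    def get_or_create_session(self, key: str) -> SessionEntry:
        entry = self._sessions.get(key)
        if entry is None or entry.suspended:
            # Missing or suspended: replace with a new id and an empty history.
            entry = SessionEntry(session_id=uuid.uuid4().hex)
            self._sessions[key] = entry
        elif entry.resume_pending:
            # Resume path: keep the existing session_id and transcript.
            entry.resume_pending = False
        entry.updated_at = time.time()
        return entry
```

In this model a suspended session is still replaced on the next lookup; that is exactly what the old startup hook triggered for every recent session after a restart.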

PR #18915 by @shellybotmoyer — /new response silently dropped (race condition)

In _dispatch_active_session_command(), the response to /new was sent AFTER cancel_session_processing(). The cancellation's 5-second wait_for and its finally blocks could interfere with the send and silently drop the confirmation, so the user saw an endless "typing..." indicator and never got a reply.

Fix: Send the response BEFORE cancelling the old task. Fixes #18912.
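The ordering fix can be illustrated with a small asyncio sketch. `send_with_retry`, `cancel_session_processing`, and `handle_new_command` below are hypothetical stand-ins for `_send_with_retry()`, `cancel_session_processing()`, and `_dispatch_active_session_command()`, not the gateway's code. The 5-second timeout mirrors the `wait_for` described above, but the sketch uses `asyncio.wait`, which does not raise when the awaited task is cancelled.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def send_with_retry(events: list[str], text: str) -> None:
    # Hypothetical stand-in for _send_with_retry(); a real one would call the
    # messaging platform's API and retry on failure.
    await asyncio.sleep(0)
    events.append(f"sent: {text}")


async def cancel_session_processing(task: asyncio.Task, timeout: float = 5.0) -> None:
    # Hypothetical stand-in: cancel the agent task, then give its finally
    # blocks up to `timeout` seconds to run. asyncio.wait does not raise for
    # the awaited task, so a cancellation of this coroutine is not swallowed.
    task.cancel()
    await asyncio.wait({task}, timeout=timeout)


async def handle_new_command(events: list[str], running_task: asyncio.Task) -> None:
    # 1. Send the confirmation first, while nothing is being torn down.
    try:
        await send_with_retry(events, "Started a new session.")
    except Exception:
        # Log at the send point so a failed reply is visible, not silent.
        logger.exception("failed to send /new confirmation")
    # 2. Only then cancel the old agent task.
    await cancel_session_processing(running_task)


async def demo() -> list[str]:
    events: list[str] = []

    async def agent() -> None:
        try:
            await asyncio.sleep(3600)  # simulates a long-running agent turn
        finally:
            events.append("agent cleanup")

    task = asyncio.create_task(agent())
    await asyncio.sleep(0)  # let the agent task start
    await handle_new_command(events, task)
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Running `demo()` records the confirmation before the agent's cleanup, so the reply no longer depends on how the cancelled task unwinds.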

PR #18984 by @CharlieKerfoot — Home Assistant HTTPS WebSocket broken

The old code did .replace("http://", "ws://").replace("https://", "wss://"). For https:// URLs, the first replace matches the http prefix of https, producing ws://s://..., so the second replace has nothing left to match. This completely breaks the HA integration over TLS.

Fix: Swap the replace order: https:// → wss:// first, then http:// → ws://.
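Order-dependent string replacement is fragile, and `str.replace` also rewrites matches anywhere in the string, not only in the scheme. Below is a sketch of an alternative that parses the URL and maps the scheme explicitly, alongside the merged swap-order approach for comparison. `to_websocket_url` and `to_websocket_url_replace` are hypothetical helper names, not the integration's actual code, and neither appends a WebSocket API path.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from HTTP schemes to their WebSocket counterparts.
_WS_SCHEMES = {"http": "ws", "https": "wss"}


def to_websocket_url(base_url: str) -> str:
    """Convert an http(s):// base URL to the matching ws(s):// URL."""
    parts = urlsplit(base_url)
    try:
        ws_scheme = _WS_SCHEMES[parts.scheme.lower()]
    except KeyError:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}") from None
    # SplitResult is a namedtuple; only the scheme component is replaced.
    return urlunsplit(parts._replace(scheme=ws_scheme))


def to_websocket_url_replace(base_url: str) -> str:
    """Swap-order approach: the more specific https:// prefix goes first."""
    return base_url.replace("https://", "wss://").replace("http://", "ws://")
```

The parsing version only touches the scheme, so a URL that happens to contain `http://` elsewhere (for example in a query parameter) is left intact, whereas the chained replace rewrites every occurrence.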

Test results

  • 63 passed in the restart_resume_pending and clean_shutdown_marker tests (0 failures after the test updates)
  • All affected tests updated to reflect resume_pending semantics

millerc79 and others added 6 commits May 3, 2026 16:18
…suspend

suspend_recently_active() was unconditionally setting suspended=True on
startup, causing get_or_create_session() to wipe conversation history on
every restart. Change to set resume_pending=True instead, so sessions
auto-resume while still allowing stuck-loop escalation after 3 failures.

…avoid race (#18912)

When /new is issued while an agent is actively processing, the confirmation response was never sent to the user because cancel_session_processing() was called before _send_with_retry(). Task cancellation side effects could silently drop the response.

Fix: reorder to send the response BEFORE cancelling the old task. Add logging at the send point (matching the pattern at line 2800 in _process_message_background) so future failures are visible.

Closes: #18912

Tests updated to reflect suspend_recently_active now setting
resume_pending=True (preserves session) instead of suspended=True
(wipes session history).

AUTHOR_MAP entries: millerc79 (#19033), shellybotmoyer (#18915)
@kshitijk4poor merged commit 6f2dab2 into main on May 3, 2026
9 of 10 checks passed
@kshitijk4poor deleted the fix/gateway-critical-batch branch on May 3, 2026 10:54
@alt-glitch added the type/bug, P1, and comp/gateway labels on May 3, 2026

Labels

  • comp/gateway: Gateway runner, session dispatch, delivery
  • P1: High (major feature broken, no workaround)
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

Race condition: /new during active agent session never sends response (Telegram gateway)

4 participants