fix(agent,gateway): voice interrupts + cascading interrupt hang #6600
kristianvast wants to merge 2 commits
When a voice/audio message arrived while an agent was running, it hit the interrupt path with event.text == "" because STT only happened at line ~2601, *after* the running-agent guard. The voice was silently dropped: the agent saw an empty interrupt and the user never heard back.

Fix in two places:

1. Fresh-message path (_enrich_message_with_transcription, line 5659): now returns a (text, transcripts) tuple so callers can echo raw transcripts back to the user before the agent loop starts. The fresh-message dispatch at line 2614 echoes each transcript as 🎙️ "..." immediately, so the user sees STT quality in real time.
2. Interrupt path (gateway's monitor_for_interrupt async task, line ~6710, and the post-agent drain via the new _dequeue_pending_with_transcription helper at line 5750): transcribe audio media BEFORE calling agent.interrupt(), so the running agent gets the real transcript instead of an empty string or a file-path placeholder. Same 🎙️ echo format as fresh voice messages, so voice interrupts now feel identical to text interrupts.

Tests updated to match the new tuple return type of _enrich_message_with_transcription.
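The ordering change described above can be sketched as follows. This is a minimal illustration, not the gateway's code: `Event`, `fake_stt`, `enrich_with_transcription`, and `handle_event` are hypothetical names, and the STT call is a placeholder. The point it shows is that transcription runs before the running-agent branch, and that the enrich step returns both the combined text and the raw transcripts so the caller can echo them.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a gateway message event."""
    text: str = ""
    audio_paths: list = field(default_factory=list)


async def fake_stt(path: str) -> str:
    """Placeholder for a real speech-to-text call (assumption)."""
    return f"transcript of {path}"


async def enrich_with_transcription(event: Event):
    """Return (text, transcripts) so the caller can echo raw transcripts."""
    transcripts = [await fake_stt(p) for p in event.audio_paths]
    parts = [event.text, *transcripts]
    text = "\n".join(p for p in parts if p)
    return text, transcripts


async def handle_event(event, agent_running, send, interrupt, start_turn):
    # Transcribe BEFORE branching on the running-agent guard, so the
    # interrupt path never receives an empty string for a voice message.
    text, transcripts = await enrich_with_transcription(event)
    for transcript in transcripts:
        await send(f'🎙️ "{transcript}"')  # echo raw STT output to the user
    if agent_running:
        interrupt(text)
    else:
        await start_turn(text)
```

With `agent_running=True`, a voice-only event reaches `interrupt` with its transcript rather than `""`, and the same echo line is sent on both branches.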
…duced closes

## The bug

When agent.interrupt() fires during an active LLM API call, the main thread intentionally force-closes the worker-local httpx client to stop token generation (comment at _interruptible_api_call line 4128 and _interruptible_streaming_api_call line 4662). This raises RemoteProtocolError on the daemon worker thread's chat.completions.create() call, which is the expected consequence, NOT a network bug.

The streaming retry loop (_call() at line ~4486) treated this as a transient connection error and retried it, logging "⚠️ Connection to provider dropped (RemoteProtocolError). Reconnecting… (attempt N/M)". Each doomed retry stalled for the full HERMES_STREAM_STALE_TIMEOUT (default 180s, scaled up to 300s for large contexts), producing a multi-minute hang of the "musing..." spinner after rapid interrupts.

Worse, because the gateway caches AIAgent instances per session (gateway/run.py:6327-6371), the stale daemon worker outlives the interrupted turn. When the cached agent starts its next turn, the stale worker from the previous turn is still running (retrying, falling back, emitting status messages) and races the new turn on shared client state. This was the root cause of the 7-minute cascading interrupt hang observed in the wild (13:00:12 → 13:07:54 on 2026-04-09).

## The fix

Add a request-local cancellation token (_request_cancelled dict) inside both _interruptible_api_call and _interruptible_streaming_api_call. The main thread sets it to True when it observes self._interrupt_requested and is about to force-close the client. The worker's retry loop checks the token at four decision points and exits cleanly on cancellation:

1. Top of the retry loop (before each new _stream_attempt), so rapid cascading interrupts don't waste a fresh request.
2. Immediately inside the "except Exception as e" block, so the forced RemoteProtocolError is recognized as a cancel, not a transient error.
3. Before emitting the "Reconnecting…" status and retrying, which prevents user-facing noise that implies a real network outage.
4. Before falling back to the non-streaming _interruptible_api_call, which prevents a second doomed request.

Same pattern applied to the non-streaming path: the worker's exception handler checks the cancel token and returns without surfacing the caught error, so the main thread's InterruptedError is the only thing callers see.

Why a request-local token instead of self._interrupt_requested: self._interrupt_requested is cleared at run_conversation() turn boundaries (lines 6899 and 8883), so a stale daemon worker from the previous turn can't reliably observe it. By the time the worker checks, the flag may already be False. A token scoped to the specific request survives the turn boundary and unambiguously marks THIS request as cancelled regardless of what happens to the agent's global state.

Also adds an explicit "Force-closing httpx client due to interrupt (not a network error)" debug log so future debuggers don't misread the subsequent RemoteProtocolError cascade as a provider outage.

## Tests

New tests/test_cascading_interrupt.py with 5 regression tests:

- test_interrupt_during_stream_does_not_retry: interrupt mid-stream, assert create() called exactly once, no "Reconnecting…" status, <2s.
- test_cached_agent_after_interrupt_second_turn_clean: interrupt turn A, immediately start turn B on the SAME agent, assert turn B succeeds quickly with no stale-worker contamination.
- test_interrupt_during_non_streaming_does_not_leak_error: same guarantee for the non-streaming path.
- test_logged_as_cancellation_not_reconnect: verify the cancel-path debug log fires and the "Streaming attempt N/M failed" log does NOT.
- test_normal_transient_error_still_retries: regression guard; a real RemoteProtocolError (no interrupt) still triggers the retry path.

All 5 pass in 3.08s. No regressions in the existing 257 interrupt / streaming / run_agent / gateway tests.

Design validated by oracle consultation (session ses_28e0015f0ffeDet3wmOv9s550z).
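The argument for a request-local token over the agent-wide flag can be seen in a small, single-threaded sketch. The `Agent` class and the two helper functions here are hypothetical simplifications; only the idea (a flag cleared at the turn boundary versus a dict owned by one request) comes from the commit message.

```python
class Agent:
    """Hypothetical agent with a turn-scoped interrupt flag."""

    def __init__(self):
        self._interrupt_requested = False

    def interrupt(self):
        self._interrupt_requested = True

    def clear_interrupt(self):
        # In the scenario described above, this runs at turn boundaries.
        self._interrupt_requested = False


def worker_sees_cancel_via_flag(agent):
    return agent._interrupt_requested


def worker_sees_cancel_via_token(token):
    return token["value"]


agent = Agent()
request_cancelled = {"value": False}  # created once per request

# Turn A is interrupted: the poll loop marks both signals.
agent.interrupt()
request_cancelled["value"] = True

# Turn B begins before the stale worker from turn A handles its exception.
agent.clear_interrupt()

flag_view = worker_sees_cancel_via_flag(agent)
token_view = worker_sees_cancel_via_token(request_cancelled)
print(flag_view, token_view)  # False True
```

The stale worker that consults the shared flag concludes it was not cancelled and would go on to retry, while the worker holding its own token still sees the cancellation.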
Related to #8434 (voice messages interrupting with empty text). This PR provides a more comprehensive fix covering both the empty-text and cascading hang issues.
…-interrupt hang)

When agent.interrupt() fires during an active LLM call, the main poll loop force-closes the worker-local httpx client to stop token generation. That raises a transport error (RemoteProtocolError) on the worker thread, which is the EXPECTED consequence of our own close, not a network bug. The streaming retry loop misclassified it as a transient connection error and retried; each doomed retry stalled for the full stream-stale timeout (up to 300s). Because the gateway caches AIAgent instances per session, the stale worker outlived the interrupted turn and raced the next turn's request on shared client state. That was the root of the multi-minute cascading-interrupt hang reported in the wild.

Fix: a request-local _request_cancelled token set by the poll loop right before the force-close, in both interruptible_api_call (non-streaming) and interruptible_streaming_api_call. The worker's exception handler checks the token and exits cleanly (no retry, no fallback, no 'reconnecting' status) instead of treating the forced error as transient. The token is request-local (not agent._interrupt_requested, which is cleared at turn boundaries) so a stale worker outliving its turn still recognizes its own forced close.

Original diagnosis and fix by @kristianvast (PR #6600), against the then-inline methods in run_agent.py. Those were since extracted into agent/chat_completion_helpers.py, so the fix is reapplied there.

Co-authored-by: Kristian Vastveit <kristianvast@users.noreply.github.com>
Your diagnosis and fix were spot-on. Since this PR was opened, the …

The voice-transcription-during-active-run portion of this PR (commit 2) was left out of this salvage to keep it focused on the cascading-interrupt fix; that gateway-side change is being tracked separately. Thanks for the thorough root-cause writeup.
Salvaged from NousResearch#6600 (@kristianvast), re-scoped to the voice half only and rebased onto current main. The cascading-interrupt hang half of the original PR landed independently in dd0d122, so this carries ONLY Problem 1.

When a voice/audio message arrives while the agent is busy on the same session, it hit the interrupt path with empty text because STT only ran after the running-agent guard, so the voice was effectively lost. Now we transcribe audio BEFORE signaling the agent (and on the fresh-message path), echo the raw transcript back to the user (🎙️), and _enrich_message_with_transcription returns (text, transcripts) so callers can echo. A new _dequeue_pending_with_transcription drives the post-agent drain the same way.

Reapplied onto _prepare_inbound_message_text (inbound enrichment was extracted from the inline dispatch block since the original PR).

Co-authored-by: Kristian Vastveit <kristian@agrointel.no>
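The post-agent drain mentioned above might look roughly like this. It is a sketch under assumptions: the function name only echoes the helper named in the commit, the pending queue is modelled as a `deque` of dicts, and `fake_stt` is a placeholder for the real STT call. The actual helper's signature and data structures are not shown in this thread.

```python
import asyncio
from collections import deque


async def fake_stt(path: str) -> str:
    """Placeholder for a real speech-to-text call (assumption)."""
    return f"transcript of {path}"


async def dequeue_pending_with_transcription(pending, send):
    """Pop the next queued event and return its text, transcribing audio first.

    Returns None when nothing is pending.
    """
    if not pending:
        return None
    event = pending.popleft()
    transcripts = [await fake_stt(p) for p in event.get("audio", [])]
    for transcript in transcripts:
        await send(f'🎙️ "{transcript}"')  # same echo format as fresh messages
    parts = [event.get("text", ""), *transcripts]
    return "\n".join(p for p in parts if p)
```

A voice message queued while the agent was busy then enters the next turn as text, instead of as an empty string or a file-path placeholder.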
…ming-worker fix(gateway): transcribe voice messages during active agent runs (salvage #6600, voice half)
… fork consolidation; finish fork-feature ports

Per-cluster restoration with the test suite as the oracle, after comparing the merged tree's failures against a pristine-upstream run in the same environment (14 file-level deltas, now zero):

- gateway/run.py: upstream wholesale (fork's monolith had undone the mixin decomposition; both real fork deltas re-applied: voice_ack_callback **kwargs; the custom-providers context-length fix exists upstream).
- agent/conversation_loop.py + turn_context.py: upstream structure with the fork features regrafted at their new homes: sender_device attribution (#131), preflight token-usage emission + compression-complete status and live-estimate snapshots (#126).
- agent/chat_completion_helpers.py: upstream wholesale (brings the second partial-stream-stub routing site and the NousResearch#6600 cancellation fix).
- agent/tool_executor.py: usage= kwarg on tool start/complete callbacks now falls back to the bare 3-arg form for legacy receivers.
- tools/approval.py: upstream's resolved-HERMES_HOME rewrite + normalize steps restored alongside the fork's self-host kill guard (#128).
- hermes_cli/main.py: desktop install-identity stale-build cluster and the post-subcommand global-flag hoister ported from fork main.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Salvaged from #6600 (@kristianvast) — re-scoped to the voice half only and rebased onto current main. The cascading-interrupt hang half of the original PR landed independently in 6370360, so this carries ONLY Problem 1. When a voice/audio message arrives while the agent is busy on the same session, it hit the interrupt path with empty text because STT only ran after the running-agent guard — the voice was effectively lost. Now we transcribe audio BEFORE signaling the agent (and on the fresh-message path), echo the raw transcript back to the user (🎙️), and _enrich_message_with_transcription returns (text, transcripts) so callers can echo. A new _dequeue_pending_with_transcription drives the post-agent drain the same way. Reapplied onto _prepare_inbound_message_text (inbound enrichment was extracted from the inline dispatch block since the original PR). Co-authored-by: Kristian Vastveit <kristian@agrointel.no>
Summary
Two related bug fixes for voice-message and streaming-interrupt handling. They're complementary — together they make mid-task voice messages behave identically to mid-task text messages, and eliminate a multi-minute hang that could follow rapid consecutive interrupts.
Problem 1: Voice messages during active agent runs are silently dropped
When a voice message arrives while the agent is busy on the same session, it hits the interrupt path with `event.text == ""` because STT only happens at `_enrich_message_with_transcription`, after the running-agent guard. The voice is effectively lost: the agent sees an empty interrupt, and nothing gets echoed back to the user. This makes mid-task voice messaging unusable in the gateway.

Related upstream context: #6548 (Discord transcription), 2508098 (Discord placeholder strip), 6e02fa7 (Discord empty placeholder). All of these improve fresh-voice paths, but none touch the running-agent interrupt path.
Fix
Two touch points in `gateway/run.py`:

1. `_enrich_message_with_transcription` now returns a `(text, transcripts)` tuple so callers can echo raw transcripts back to the user before feeding them to the agent. The fresh-message dispatch at `_handle_message` echoes each transcript as `🎙️ "..."` immediately, giving the user visible confirmation of STT quality in real time (matches how Heimdal already feels on fresh messages for users who had this as a local patch).
2. The `monitor_for_interrupt` task now transcribes audio media before calling `agent.interrupt()`, and a new `_dequeue_pending_with_transcription` helper drives the post-agent drain the same way. Result: voice interrupts reach the running agent with the real transcript, not a placeholder or empty string.

Same `🎙️` echo format for both fresh and interrupt paths, so voice interrupts now feel identical to text interrupts from the user's side.

Problem 2: Cascading interrupt hang (7+ min "musing…" after rapid interrupts)
When `agent.interrupt()` fires during an active LLM streaming call, the main thread intentionally force-closes the worker-local httpx client to stop token generation. The comment at `_interruptible_streaming_api_call` spells this out: "Force-close the in-flight worker-local HTTP connection to stop token generation without poisoning the shared client used to seed future retries." The resulting `RemoteProtocolError` on the daemon worker thread is the expected consequence, not a network bug.

But the streaming retry loop inside `_call()` treated this as a transient connection error and retried it, with each retry stalling for the full `HERMES_STREAM_STALE_TIMEOUT` (180s base, up to 300s for large contexts per the scaling at line ~4650). With 2 retries, that's ~6 minutes of "Reconnecting… (attempt N/M)" status spam, then a fallback to the non-streaming path that also fails the same way, then eventually a legit `InterruptedError` delivered to the caller.

The cached-agent twist: the gateway caches `AIAgent` instances per session (`_agent_cache` at `gateway/run.py` line ~6327). When the main thread raises `InterruptedError` and the turn ends, the daemon worker from the interrupted turn is not joined, so it keeps running in the background. When the next turn starts on the same cached agent, the stale worker is still retrying, still emitting "Reconnecting…" status, still touching shared client state, and races the new turn.

Observed in production on 2026-04-09: a voice interrupt at `13:00:12`, then "musing…" for 7 minutes 42 seconds before another user-supplied interrupt at `13:07:54` finally unstuck the agent. Cascading rapid interrupts reproduced the pattern reliably.

Fix: request-local cancellation token
Add a `_request_cancelled = {"value": False}` dict scoped to each call of `_interruptible_api_call` and `_interruptible_streaming_api_call`. The outer poll loop sets it to `True` before force-closing the httpx client on interrupt. The worker's retry loop checks it at four decision points and exits cleanly if set:

1. At the top of the retry loop, before each new `_stream_attempt`, so rapid cascading interrupts don't waste a fresh request.
2. Inside the `except Exception as e:` handler, before classifying the error as transient, so the forced `RemoteProtocolError` is recognized as a cancel, not retried.
3. Before emitting the "Reconnecting…" status and retrying, which prevents user-facing noise that implies a real network outage.
4. Before falling back to the non-streaming `_interruptible_api_call`, which prevents a second doomed request.

Same pattern applied to the non-streaming `_interruptible_api_call`: the worker's exception handler checks the cancel token and returns without surfacing the caught error, so the main thread's `InterruptedError` is the only thing callers see.

Why a request-local token instead of `self._interrupt_requested`
`self._interrupt_requested` is cleared at `run_conversation()` turn boundaries (`clear_interrupt()` calls at line ~6899 and ~8883). A stale daemon worker from the previous turn can't reliably observe it: by the time the worker checks, the flag may already be `False` for the next turn. A token scoped to the specific request survives the turn boundary and unambiguously marks this request as cancelled regardless of what happens to the agent's global state.

Explicitly not done (deliberate)

- Adding `self.clear_interrupt()` to the `InterruptedError` handler at line ~7529. It's already cleared at `run_conversation` entry/exit, and clearing it early risks wiping `_interrupt_message` before result assembly at lines ~8878-8880.
- Calling `self._replace_primary_openai_client()` in any interrupt path. That would break the worker-local client isolation design intent ("without poisoning the shared client used to seed future retries") and add churn on cross-turn races on the cached agent.
- `HERMES_AGENT_TIMEOUT` already bounds the whole turn; adding a narrower retry-budget deserves its own PR if desired.

Design validated by oracle consultation before implementation.
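The four checkpoints can be sketched in a self-contained form. This is an illustration under assumptions, not the code in this PR: `FakeClient`, `ForcedClose`, and `streaming_call` are hypothetical stand-ins, the real error classification and timeouts are omitted, and the worker thread name is only there so callers can find it.

```python
import threading


class ForcedClose(Exception):
    """Stand-in for the transport error seen after a forced client close."""


class FakeClient:
    """Hypothetical client: create() blocks until close(), then raises."""

    def __init__(self):
        self.calls = 0
        self.started = threading.Event()
        self._closed = threading.Event()

    def create(self):
        self.calls += 1
        self.started.set()
        self._closed.wait(timeout=5)  # simulate an in-flight stream
        raise ForcedClose("connection closed")

    def close(self):
        self._closed.set()


def streaming_call(client, fallback, interrupt_event, status, max_retries=2):
    cancelled = {"value": False}  # request-local token, one per call
    result = {}

    def worker():
        for attempt in range(1, max_retries + 2):
            if cancelled["value"]:  # 1. before each new attempt
                return
            try:
                result["response"] = client.create()
                return
            except Exception:
                if cancelled["value"]:  # 2. forced close, not a transient error
                    return
                if attempt > max_retries:
                    break  # retries exhausted, try the fallback
                if cancelled["value"]:  # 3. before the status message and retry
                    return
                status(f"Reconnecting... (attempt {attempt}/{max_retries})")
        if cancelled["value"]:  # 4. before the non-streaming fallback
            return
        result["response"] = fallback()

    thread = threading.Thread(target=worker, name="stream-worker", daemon=True)
    thread.start()
    while thread.is_alive():
        if interrupt_event.is_set():
            cancelled["value"] = True  # mark the request BEFORE closing
            client.close()
            raise InterruptedError("interrupted by user")
        thread.join(timeout=0.01)
    return result.get("response")
```

Setting the token before calling `close()` is the ordering that matters: by the time the worker's `except` block runs, the token is already `True`, so the worker returns without retrying, without emitting a status line, and without touching the fallback.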
How to test
Automated
New regression tests in `tests/test_cascading_interrupt.py` with 5 scenarios:

- `test_interrupt_during_stream_does_not_retry`: interrupt mid-stream; assert `create()` called exactly once, no "Reconnecting…" status emitted, elapsed < 2s.
- `test_cached_agent_after_interrupt_second_turn_clean`: interrupt turn A, immediately start turn B on the same agent instance; assert turn B succeeds quickly with no stale-worker contamination.
- `test_interrupt_during_non_streaming_does_not_leak_error`: same guarantee for the non-streaming path.
- `test_logged_as_cancellation_not_reconnect`: verify the cancel-path debug log fires and no "Streaming attempt N/M failed" log appears after the interrupt.
- `test_normal_transient_error_still_retries`: regression guard; a genuine `RemoteProtocolError` (no interrupt) still triggers the retry path so real network errors continue to recover.

Run:

All 5 pass in ~3s. The broader related suite (`tests/test_interrupt_propagation.py`, `tests/test_interactive_interrupt.py`, `tests/test_streaming.py`, `tests/test_run_agent.py`, `tests/tools/test_interrupt.py`, `tests/gateway/test_stt_config.py`), 104 tests in total, also passes with no regressions after this change.

Manual (gateway)

1. … `"count to 300 slowly"` or similar.
2. … `🎙️ "…"` echo within ~1 second, followed by the agent pivoting to the transcribed instruction.
3. … `🎙️` echo and a clean handoff: no "Reconnecting… (attempt N/M)" messages, no multi-minute hang.
4. … `"Force-closing httpx client due to interrupt (not a network error)"` when interrupts fire (diagnostic log, debug level).

Platforms tested
File surface
- `gateway/run.py`
- `run_agent.py`
- `tests/test_cascading_interrupt.py`
- `tests/gateway/test_stt_config.py`

All changes are additive where possible. No refactoring, no reformatting, no behavior change to paths that don't involve cancellation.
Related upstream work
- `25080986` fix(gateway): discard empty placeholder when voice transcription succeeds (Discord). Complementary; targets `_enrich_message_with_transcription` for a different duplication issue on fresh messages.
- `ae4a884e` fix(agent): disable stale stream timeout for local providers (#6368). Adjacent to my changes in `_interruptible_streaming_api_call` but outside my modified region. Rebased cleanly.
- `1a3ae6ac` feat: structured API error classification for smart failover (#6514). Separate retry-logic improvement; does not overlap.

License
By submitting this PR I agree my contributions are licensed under MIT per `CONTRIBUTING.md`.