fix(streaming): surface dropped tool-call on mid-stream stall #12072
Merged
When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of `_interruptible_streaming_api_call` silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal.

Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a `write_file` tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown.

Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content, and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning).

Validation:

| Scenario | Before | After |
|---|---|---|
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream | Bare recovered text | Unchanged |
| `tests/run_agent/test_streaming.py` | 24 passed | 26 passed (2 new) |
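The fix described above can be illustrated with a minimal, self-contained sketch. It is not the code in `run_agent.py`: the chunk shape (plain dicts with `content` and `tool_name` keys), the helper names `accumulate`, `build_partial_stub` and `stalled_stream`, and the `text` key are assumptions made for illustration. Only `partial_tool_names`, the stub fields and the warning wording come from this PR.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Warning wording taken from the PR description; the leading blank line is an assumption.
WARNING_TEMPLATE = (
    "\n\n⚠ Stream stalled mid tool-call ({names}); the action was not executed. "
    "Ask me to retry if you want to continue."
)


def accumulate(chunks: Iterable[Dict[str, Any]],
               result: Dict[str, Any],
               on_delta: Callable[[str], None]) -> None:
    """Consume simplified chunks, recording text and tool names in the shared dict."""
    result.setdefault("text", "")
    result.setdefault("partial_tool_names", [])
    result.setdefault("error", None)
    try:
        for chunk in chunks:
            if chunk.get("content"):
                result["text"] += chunk["content"]
                on_delta(chunk["content"])  # text already shown to the user
            name = chunk.get("tool_name")
            if name and name not in result["partial_tool_names"]:
                # Record the name as soon as it is known, before arguments finish.
                result["partial_tool_names"].append(name)
    except Exception as exc:  # the stream died mid-flight
        result["error"] = exc


def build_partial_stub(result: Dict[str, Any],
                       on_delta: Callable[[str], None]) -> Dict[str, Any]:
    """Build the recovery stub; warn only when a tool call was in flight."""
    content = result["text"]
    names = result["partial_tool_names"]
    if names:
        warning = WARNING_TEMPLATE.format(names=", ".join(names))
        content += warning
        on_delta(warning)  # also surface it live, not only in the transcript
    return {"content": content, "tool_calls": None, "finish_reason": "stop"}


def stalled_stream() -> Iterator[Dict[str, Any]]:
    """Hypothetical stream: text, then a tool name, then a simulated stall."""
    yield {"content": "Let me write the audit: "}
    yield {"tool_name": "write_file"}
    raise TimeoutError("simulated stale stream")


if __name__ == "__main__":
    shown: list = []
    shared: Dict[str, Any] = {}
    accumulate(stalled_stream(), shared, shown.append)
    print(build_partial_stub(shared, shown.append)["content"])
```

Because the names live in the shared `result` dict rather than in a local accumulator, the stub builder can see them even though it runs outside the function that consumed the stream. A text-only stream leaves `partial_tool_names` empty, so the stub content is the bare recovered text.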
teknium1 added a commit that referenced this pull request on Apr 18, 2026
… endpoint

The direct Minimax OpenRouter endpoint silently drops tool-call streams on tool-calling workflows (MiniMax-M2#109, reproduced 4/4 times on 2026-04-18: zero content, no finish_reason, silent close at ~40s). PR #12072 surfaced the failure to the user; this PR avoids it entirely by routing `minimax/*` requests to Fireworks / NovitaAI / Google-Vertex / AtlasCloud / Together by default.

New module `agent/provider_tweaks.py` centralizes known-broken-endpoint avoidance with a single registry entry per upstream bug. User-supplied provider preferences (provider_sort, providers_allowed/ignored/order) always win: tweaks only fill in defaults where absent, and a user who sets 'only' is fully opted out.

Wired into both provider_preferences build sites in `run_agent.py` (main chat loop + iteration-summary call). Only applies when base_url targets openrouter.ai.

Validation

| Scenario | Before | After |
|---|---|---|
| minimax/minimax-m2.7 tool-call stream on OR (direct endpoint) | 0/4 success | 4/4 on Fireworks |
| extra_body.provider injected for `minimax/*` on OpenRouter | no | ignore=[minimax] order=[fireworks,novitaai,google-vertex,atlascloud,together] |
| extra_body.provider for `anthropic/*` on OpenRouter | unchanged | unchanged |
| extra_body.provider for `minimax/*` on api.minimax.io | unchanged | unchanged |
| User-supplied {only:[minimax]} | unchanged | unchanged (explicit opt-in honoured) |
| `tests/agent/test_provider_tweaks.py` | n/a | 23 passed |
| `tests/run_agent/test_streaming.py` | 26 passed | 26 passed |

Live e2e sanity (real OpenRouter call): 89.6s clean response via Fireworks, with `extra_body.provider={'ignore': ['minimax'], 'order': ['fireworks',...]}` confirmed in the outgoing request.
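The precedence rules in this commit message (user preferences win, tweaks only fill absent keys, `only` opts out, OpenRouter-only) can be sketched as below. This is an illustration under stated assumptions, not the contents of `agent/provider_tweaks.py`: the registry layout, the function names `targets_openrouter` and `apply_provider_tweaks`, and the host check are hypothetical. The `ignore` and `order` values are the ones quoted in the validation table.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical registry: one entry per known upstream bug.
TWEAKS: List[Dict[str, Any]] = [
    {
        "model_glob": "minimax/*",
        "defaults": {
            "ignore": ["minimax"],
            "order": ["fireworks", "novitaai", "google-vertex",
                      "atlascloud", "together"],
        },
    },
]


def targets_openrouter(base_url: str) -> bool:
    """True when the base URL's host is openrouter.ai (assumed matching rule)."""
    host = urlparse(base_url).hostname or ""
    return host == "openrouter.ai" or host.endswith(".openrouter.ai")


def apply_provider_tweaks(model: str,
                          base_url: str,
                          user_prefs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return provider preferences with registry defaults filled in where absent."""
    prefs = dict(user_prefs or {})
    # Tweaks never apply off OpenRouter, and an explicit 'only' is a full opt-out.
    if not targets_openrouter(base_url) or "only" in prefs:
        return prefs
    for tweak in TWEAKS:
        if fnmatchcase(model, tweak["model_glob"]):
            for key, value in tweak["defaults"].items():
                # setdefault keeps any value the user already supplied.
                prefs.setdefault(key, list(value))
    return prefs


if __name__ == "__main__":
    print(apply_provider_tweaks("minimax/minimax-m2.7",
                                "https://openrouter.ai/api/v1"))
```

Using `setdefault` per key means a user who sets only `order` still gets the default `ignore`, while their own ordering is left untouched.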
Summary
Users no longer get a silent failure when a streaming model stalls mid tool-call — the dropped tool is now named in a user-visible warning appended to the assistant message.
Root cause
When a stream dies after text was already delivered, the partial-stream recovery path at the end of `_interruptible_streaming_api_call` returns a stub with the recovered text and `tool_calls=None`, `finish_reason=stop`. That's correct for text-only stalls (retrying would duplicate the text), but when a tool call was in flight, the attempted action is lost with no indication: the agent treats the turn as complete, the session exits cleanly, and nothing happened.
Changes
- `run_agent.py`: streaming accumulator records each tool-call name into `result['partial_tool_names']` as soon as the name is known.
- `run_agent.py`: stub builder checks that list and, when non-empty, appends `⚠ Stream stalled mid tool-call (<names>); the action was not executed. Ask me to retry if you want to continue.` to `content`, and also fires it as a live stream delta so the user sees it before the turn closes, not just in the persisted transcript.
- `run_agent.py`: text-only partial streams keep the original bare-recovery behaviour (no warning noise for a scenario the previous design already handled correctly).
- `tests/run_agent/test_streaming.py`: two new regression tests. One asserts the warning fires, names the dropped tool, and reaches the live delta callback; the other asserts text-only partial streams are unchanged.
Validation
`tests/run_agent/test_streaming.py`: 24 passed before this change, 26 passed after (2 new).
Reproduction methodology (for future re-verification)
1. Environment setup
2. The trigger prompt
Substantive enough that MiniMax M2.7 emits leading commentary, starts a `write_file` tool call, and needs to generate a large JSON arguments blob. That's when the stall hits the 240 s stale-stream detector.
3. Run the session (non-interactive, so `-q` captures the full lifecycle cleanly)

    HERMES_HOME=/tmp/hermes-minimax-eval \
    HERMES_DEBUG_INTERRUPT=1 \
    PYTHONPATH=/home/teknium/.hermes/hermes-agent \
    /home/teknium/.hermes/hermes-agent/venv/bin/python3 -m hermes_cli.main chat \
      --provider openrouter \
      --model minimax/minimax-m2.7 \
      --yolo \
      -q "$(cat /tmp/hermes-prompt.txt)" \
      2>&1 | tee /tmp/hermes-minimax-repro/session.log

(Note the slug form: `--provider openrouter --model minimax/minimax-m2.7`. Using `--model openrouter/minimax/minimax-m2.7` fails with HTTP 400 "not a valid model ID".)
4. The observed failure (on `main` before this PR)
- … `search_files`, `read_file` on each backend module, visible as `🔎 grep`, `📖 read` lines
- … `"Now I have all the information needed. Let me write the comprehensive audit:"` as text content, then starts emitting a `write_file` tool call
- … `_stream_stale_timeout` in `run_agent.py`), hermes emits `⚠️ No response from provider for 240s ... Reconnecting...`
- … `deltas_were_sent["yes"] = True` path at line ~6085 takes over → returns a stub with `tool_calls=None`, `finish_reason="stop"`
- … `Duration: Xm Ys`, `Messages: N (M user, K tool calls)`)
- … `/tmp/minimax-audit.md` is written. No error is surfaced to the user. Exit code is 0.
5. Root cause confirmation
Read `run_agent.py` lines 6039-6072 (the `if result["error"] is not None` branch after the streaming while loop):
`tool_calls=None` hard-codes the loss of any in-progress tool call. `tool_calls_acc` (the dict accumulating partial tool-call deltas) is in scope inside `_call_chat_completions` only, so the stub builder can't see it. The fix threads the partial tool names through the shared `result` dict.
6. Deterministic repro without a live model (what the new unit test does)
`tests/run_agent/test_streaming.py::TestPartialToolCallWarning::test_partial_tool_call_surfaces_warning` constructs a fake stream as a generator that yields:
- … `content="Let me write the audit: "`), which flips `deltas_were_sent["yes"]`
- … `name="write_file"`, which populates `tool_calls_acc` + `result["partial_tool_names"]`
- … `tool_calls_acc`
- `raise _StallError(...)`, which simulates the mid-stream connection death

It patches `_create_request_openai_client` so that the mock's `chat.completions.create` returns this generator, constructs an `AIAgent` with `api_mode="chat_completions"`, and calls `_interruptible_streaming_api_call({})` directly. Runs in ~1 second with no network, no real model. Asserts the stub content contains both the recovered text AND `"Stream stalled mid tool-call"` AND the tool name `"write_file"` AND that the warning was also fired as a live delta.
The paired test `test_partial_text_only_no_warning` uses the same pattern minus the tool-call deltas and asserts the pre-fix bare-recovery behaviour is unchanged for text-only partial streams.
7. Re-running just this PR's tests

    cd /home/teknium/.hermes/hermes-agent
    scripts/run_tests.sh tests/run_agent/test_streaming.py::TestPartialToolCallWarning -v

Expected: 2 passed in <2 s.
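For readers without the repository at hand, the fake-stream pattern from step 6 can be approximated as follows. This is a sketch under assumptions, not the test in `tests/run_agent/test_streaming.py`: `StallError`, `_chunk`, `make_mock_client` and `drain` are hypothetical stand-ins, the chunk objects only mimic the chat-completions streaming shape (`choices[0].delta`), and the partial-arguments delta is a guess at what the third yielded item contains. The real test drives `AIAgent._interruptible_streaming_api_call` instead of the `drain` helper used here.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


class StallError(Exception):
    """Stand-in for the test's _StallError (hypothetical definition)."""


def _chunk(content=None, tool_name=None, tool_args=None):
    """Build an object shaped like a streamed chat-completions chunk."""
    tool_calls = None
    if tool_name is not None or tool_args is not None:
        fn = SimpleNamespace(name=tool_name, arguments=tool_args)
        tool_calls = [SimpleNamespace(index=0, id="call_0", function=fn)]
    delta = SimpleNamespace(content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta, finish_reason=None)])


def fake_stalling_stream():
    """Text delta, tool name, partial arguments, then a simulated stall."""
    yield _chunk(content="Let me write the audit: ")
    yield _chunk(tool_name="write_file")
    yield _chunk(tool_args='{"path": "/tmp/')  # assumed partial-arguments delta
    raise StallError("simulated mid-stream connection death")


def make_mock_client():
    """Mock client whose chat.completions.create returns the fake stream."""
    client = MagicMock()
    client.chat.completions.create.return_value = fake_stalling_stream()
    return client


def drain(client):
    """Hypothetical stand-in for the code under test: consume until the stall."""
    text, names, error = "", [], None
    try:
        for chunk in client.chat.completions.create(stream=True):
            delta = chunk.choices[0].delta
            if delta.content:
                text += delta.content
            for tc in delta.tool_calls or []:
                if tc.function.name and tc.function.name not in names:
                    names.append(tc.function.name)
    except StallError as exc:
        error = exc
    return text, names, error


if __name__ == "__main__":
    print(drain(make_mock_client()))
```

Raising from inside the generator is what makes the repro deterministic: the consumer sees the exception at exactly the same point on every run, with no network and no timing dependence.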
8. Related environmental gotchas encountered during the hunt
- `tools.*` INFO logging is suppressed under `quiet_mode` via `run_agent.py:923`. Set `HERMES_DEBUG_INTERRUPT=1` to see trace from `tools/environments/base.py` + `tools/interrupt.py`.
- … `Duration:` footer. On MiniMax M2.7 that extends wall-clock uptime by minutes. Not a hang.
- `~/.hermes/hermes-agent/` standalone clone lags behind `origin/main`; set `PYTHONPATH` to a worktree for live-test of just-merged fixes.