fix(streaming): surface dropped tool-call on mid-stream stall by teknium1 · Pull Request #12072 · NousResearch/hermes-agent

teknium1 · 2026-04-18T08:51:32Z

Summary

Users no longer get a silent failure when a streaming model stalls mid tool-call — the dropped tool is now named in a user-visible warning appended to the assistant message.

Root cause

When a stream dies after text was already delivered, the partial-stream recovery path at the end of _interruptible_streaming_api_call returns a stub with the recovered text, tool_calls=None, and finish_reason=stop. That is correct for text-only stalls (retrying would duplicate the text), but when a tool call was in flight, the attempted action is lost with no indication: the agent treats the turn as complete, the session exits cleanly, and nothing was executed.

Changes

run_agent.py: streaming accumulator records each tool-call name into result['partial_tool_names'] as soon as the name is known.
run_agent.py: stub builder checks that list and, when non-empty, appends `⚠ Stream stalled mid tool-call (<names>); the action was not executed. Ask me to retry if you want to continue.` to content, and also fires it as a live stream delta so the user sees it before the turn closes, not just in the persisted transcript.
run_agent.py: text-only partial streams keep the original bare-recovery behaviour (no warning noise for a scenario the previous design already handled correctly).
tests/run_agent/test_streaming.py: two new regression tests — one asserts the warning fires + names the dropped tool + reaches the live delta callback, one asserts text-only partial streams are unchanged.
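
The two run_agent.py changes above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `record_tool_name`, `build_stub`, the `on_delta` callback, and the blank-line separator before the warning are all assumptions made for the example. Only the `partial_tool_names` key, the stub shape, and the warning wording come from the description.

```python
from types import SimpleNamespace

# Warning wording taken from the PR description; the leading blank line is an assumption.
WARNING_TEMPLATE = (
    "\n\n⚠ Stream stalled mid tool-call ({names}); the action was not executed. "
    "Ask me to retry if you want to continue."
)


def record_tool_name(result, name):
    """Hypothetical accumulator hook: remember a tool name as soon as it is known."""
    names = result.setdefault("partial_tool_names", [])
    if name and name not in names:
        names.append(name)


def build_stub(result, partial_text, on_delta=None):
    """Hypothetical stub builder: warn only when a tool call was in flight."""
    content = partial_text
    names = result.get("partial_tool_names") or []
    if names:
        warning = WARNING_TEMPLATE.format(names=", ".join(names))
        content = (partial_text or "") + warning
        if on_delta is not None:
            on_delta(warning)  # also surface the warning as a live stream delta
    message = SimpleNamespace(role="assistant", content=content, tool_calls=None)
    return SimpleNamespace(
        id="partial-stream-stub",
        choices=[SimpleNamespace(index=0, message=message, finish_reason="stop")],
    )
```

With an empty `partial_tool_names` list the stub content is the bare recovered text, which matches the text-only behaviour the third change preserves.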

Validation

Scenario	Before	After
Stream dies mid tool-call, text already delivered	Silent exit code 0, no indication	Warning in content + live delta, user sees which tool was dropped
Text-only partial stream	Bare recovered text	Unchanged
`tests/run_agent/test_streaming.py`	24 passed	26 passed (2 new)

Reproduction methodology (for future re-verification)

1. Environment setup

# Isolated HERMES_HOME so prior state doesn't leak in
mkdir -p /tmp/hermes-minimax-eval && cp ~/.hermes/.env /tmp/hermes-minimax-eval/.env
cat > /tmp/hermes-minimax-eval/config.yaml <<'EOF'
_config_version: 5
display:
  streaming: true
agent:
  max_turns: 50
EOF

2. The trigger prompt

The prompt is substantive enough that MiniMax M2.7 emits leading commentary, starts a write_file tool call, and has to generate a large JSON arguments blob. That is when the stall trips the 240 s stale-stream detector:

cat > /tmp/hermes-prompt.txt <<'EOF'
You have access to the hermes-agent repo at /home/teknium/.hermes/hermes-agent.
Audit how interrupts and subprocess cleanup work across ALL execution backends:

1. Read tools/environments/base.py, local.py, docker.py, ssh_env.py (if exists,
   else list what's there), modal_utils.py, and any daytona/singularity files.
2. For each backend identify: how _wait_for_process polls, how _kill_process
   works, what happens if the agent is interrupted mid-execution, what happens
   if the agent process crashes.
3. Check interrupt propagation to in-flight commands per backend.
4. Write a thorough markdown audit to /tmp/minimax-audit.md with one section
   per backend, including line-number references and concrete code excerpts.
5. At the end, list inconsistencies or potential bugs across backends.

Take your time, read actual code, don't speculate. I want real evidence.
EOF

3. Run the session (non-interactive, so `-q` captures the full lifecycle cleanly)

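# Step 1 does not create the log directory, so create it before tee writes to it
mkdir -p /tmp/hermes-minimax-repro
HERMES_HOME=/tmp/hermes-minimax-eval \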
HERMES_DEBUG_INTERRUPT=1 \
PYTHONPATH=/home/teknium/.hermes/hermes-agent \
/home/teknium/.hermes/hermes-agent/venv/bin/python3 -m hermes_cli.main chat \
    --provider openrouter \
    --model minimax/minimax-m2.7 \
    --yolo \
    -q "$(cat /tmp/hermes-prompt.txt)" \
    2>&1 | tee /tmp/hermes-minimax-repro/session.log

(Note the slug form: --provider openrouter --model minimax/minimax-m2.7. Using --model openrouter/minimax/minimax-m2.7 fails with HTTP 400 "not a valid model ID".)

4. The observed failure (on `main` before this PR)

Agent streams reasoning + commentary for ~4-6 minutes
Eventually emits tool calls: search_files, read_file on each backend module — visible as 🔎 grep, 📖 read lines
Reaches the summary step: streams "Now I have all the information needed. Let me write the comprehensive audit:" as text content, then starts emitting a write_file tool call
MiniMax goes silent mid-JSON-arguments
After 240 s (scaled stale-stream threshold at this context size, _stream_stale_timeout in run_agent.py), hermes emits ⚠️ No response from provider for 240s ... Reconnecting...
The connection is closed and a retry runs, which fails the same way
Final: deltas_were_sent["yes"] = True path at line ~6085 takes over → returns a stub with tool_calls=None, finish_reason="stop"
Agent treats the turn as complete, writes session footer (Duration: Xm Ys, Messages: N (M user, K tool calls))
No /tmp/minimax-audit.md is written. No error is surfaced to the user. Exit code is 0.

5. Root cause confirmation

Read run_agent.py lines 6039-6072 (the if result["error"] is not None branch after the streaming while loop):

if deltas_were_sent["yes"]:
    _partial_text = (getattr(self, "_current_streamed_assistant_text", "") or "").strip() or None
    logger.warning("Partial stream delivered before error; returning stub ...")
    _stub_msg = SimpleNamespace(
        role="assistant", content=_partial_text, tool_calls=None,  # ← !!!
        reasoning_content=None,
    )
    return SimpleNamespace(
        id="partial-stream-stub",
        choices=[SimpleNamespace(index=0, message=_stub_msg, finish_reason="stop")],
        ...
    )

tool_calls=None hard-codes the loss of any in-progress tool call. tool_calls_acc (the dict accumulating partial tool-call deltas) is in scope inside _call_chat_completions only, so the stub builder can't see it. The fix threads the partial tool names through the shared result dict.
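
The scoping problem can be illustrated with a small stand-alone sketch. The names `call_with_recovery` and `_worker` are hypothetical and the chunk format is simplified to `(index, name)` tuples; the point is only that a local accumulator is gone once the inner call raises, while entries written to a shared `result` dict remain visible to the recovery code.

```python
def call_with_recovery(chunks):
    """Hypothetical outer call: owns the shared result dict and the recovery path."""
    result = {"error": None, "partial_tool_names": []}

    def _worker():
        tool_calls_acc = {}  # local to the worker: unreachable after it raises
        for chunk in chunks:
            if isinstance(chunk, Exception):
                raise chunk  # simulate the stream dying mid tool-call
            index, name = chunk
            tool_calls_acc[index] = name
            # Shared dict: still readable by the caller after the exception.
            result["partial_tool_names"].append(name)

    try:
        _worker()
    except Exception as exc:
        result["error"] = exc
    return result
```

After a simulated stall, `result["partial_tool_names"]` still holds the tool name even though `tool_calls_acc` no longer exists.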

6. Deterministic repro without a live model (what the new unit test does)

tests/run_agent/test_streaming.py::TestPartialToolCallWarning::test_partial_tool_call_surfaces_warning constructs a fake stream as a generator that yields:

A text chunk (content="Let me write the audit: ") — flips deltas_were_sent["yes"]
A tool_call delta with name="write_file" — populates tool_calls_acc + result["partial_tool_names"]
A tool_call delta with partial args — extends tool_calls_acc
Then raise _StallError(...) — simulates the mid-stream connection death

It patches _create_request_openai_client so that the mock's chat.completions.create returns this generator, constructs an AIAgent with api_mode="chat_completions", and calls _interruptible_streaming_api_call({}) directly. Runs in ~1 second with no network, no real model. Asserts the stub content contains both the recovered text AND "Stream stalled mid tool-call" AND the tool name "write_file" AND that the warning was also fired as a live delta.

The paired test test_partial_text_only_no_warning uses the same pattern minus the tool-call deltas and asserts the pre-fix bare-recovery behaviour is unchanged for text-only partial streams.
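
The fake stream described in this section can be sketched as a plain generator. The chunk layout below follows the usual OpenAI-style chat-completions streaming shape (`choices[0].delta.content` and `delta.tool_calls[i].function.name`); `StallError`, `_chunk`, `_tool_delta`, and `consume` are hypothetical stand-ins and not the helpers used in tests/run_agent/test_streaming.py.

```python
from types import SimpleNamespace


class StallError(Exception):
    """Stand-in for the test's _StallError (the real class is not shown in the PR text)."""


def _chunk(content=None, tool_calls=None):
    delta = SimpleNamespace(content=content, tool_calls=tool_calls)
    choice = SimpleNamespace(index=0, delta=delta, finish_reason=None)
    return SimpleNamespace(choices=[choice])


def _tool_delta(index, name=None, arguments=""):
    function = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, id=None, type="function", function=function)


def stalled_stream():
    """Yield text, a tool-call name, partial arguments, then die mid-stream."""
    yield _chunk(content="Let me write the audit: ")
    yield _chunk(tool_calls=[_tool_delta(0, name="write_file")])
    yield _chunk(tool_calls=[_tool_delta(0, arguments='{"path": "/tmp/minimax-')])
    raise StallError("simulated mid-stream connection death")


def consume(stream):
    """Drain the stream and return (text, tool names seen, error)."""
    text, names, error = "", [], None
    try:
        for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.content:
                text += delta.content
            for tool_call in delta.tool_calls or []:
                name = tool_call.function.name
                if name and name not in names:
                    names.append(name)
    except StallError as exc:
        error = exc
    return text, names, error
```

Dropping the two tool-call chunks from `stalled_stream` gives the text-only variant that the paired test exercises.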

7. Re-running just this PR's tests

cd /home/teknium/.hermes/hermes-agent
scripts/run_tests.sh tests/run_agent/test_streaming.py::TestPartialToolCallWarning -v

Expected: 2 passed in <2 s.

8. Related environmental gotchas encountered during the hunt

tools.* INFO logging is suppressed under quiet_mode via run_agent.py:923. Set HERMES_DEBUG_INTERRUPT=1 to see trace from tools/environments/base.py + tools/interrupt.py.
Post-session cleanup (memory flush, title gen, any aux tasks) runs synchronously in-process after the Duration: footer. On MiniMax M2.7 that extends wall-clock uptime by minutes. Not a hang.
~/.hermes/hermes-agent/ standalone clone lags behind origin/main — set PYTHONPATH to a worktree for live-test of just-merged fixes.

When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal.

Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown.

Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content, and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own.

Text-only partial streams keep the original bare-recovery behaviour (no warning).

Validation:

| Scenario | Before | After |
|---|---|---|
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream | Bare recovered text | Unchanged |
| tests/run_agent/test_streaming.py | 24 passed | 26 passed (2 new) |

… endpoint

The direct Minimax OpenRouter endpoint silently drops tool-call streams on tool-calling workflows (MiniMax-M2#109, reproduced 4/4 times on 2026-04-18: zero content, no finish_reason, silent close at ~40s). PR #12072 surfaced the failure to the user; this PR avoids it entirely by routing minimax/* requests to Fireworks / NovitaAI / Google-Vertex / AtlasCloud / Together by default.

New module agent/provider_tweaks.py centralizes known-broken-endpoint avoidance with a single registry entry per upstream bug. User-supplied provider preferences (provider_sort, providers_allowed/ignored/order) always win; tweaks only fill in defaults where absent, and a user who sets 'only' is fully opted out.

Wired into both provider_preferences build sites in run_agent.py (main chat loop + iteration-summary call). Only applies when base_url targets openrouter.ai.

Validation

| | Before | After |
|---|---|---|
| minimax/minimax-m2.7 tool-call stream on OR (direct endpoint) | 0/4 success | 4/4 on Fireworks |
| extra_body.provider injected for minimax/* on OpenRouter | no | ignore=[minimax] order=[fireworks,novitaai,google-vertex,atlascloud,together] |
| extra_body.provider for anthropic/* on OpenRouter | unchanged | unchanged |
| extra_body.provider for minimax/* on api.minimax.io | unchanged | unchanged |
| User-supplied {only:[minimax]} | unchanged | unchanged (explicit opt-in honoured) |
| tests/agent/test_provider_tweaks.py | n/a | 23 passed |
| tests/run_agent/test_streaming.py | 26 passed | 26 passed |

Live e2e sanity (real OpenRouter call): 89.6s clean response via Fireworks, with `extra_body.provider={'ignore': ['minimax'], 'order': ['fireworks',...]}` confirmed in the outgoing request.
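
The precedence rules in this commit message (user preferences win, defaults only fill gaps, 'only' opts out, OpenRouter only) might look something like the sketch below. `apply_tweaks` and `MINIMAX_DEFAULTS` are hypothetical names, and the substring check on `base_url` is an assumption; the real agent/provider_tweaks.py registry is not shown here.

```python
# Default routing taken from the validation table above.
MINIMAX_DEFAULTS = {
    "ignore": ["minimax"],
    "order": ["fireworks", "novitaai", "google-vertex", "atlascloud", "together"],
}


def apply_tweaks(model, base_url, user_prefs=None):
    """Hypothetical helper: return provider preferences with defaults filled in."""
    prefs = dict(user_prefs or {})
    if "openrouter.ai" not in (base_url or ""):
        return prefs  # tweak only applies to OpenRouter
    if not model.startswith("minimax/"):
        return prefs  # other model families are left untouched
    if prefs.get("only"):
        return prefs  # an explicit 'only' list is a full opt-out
    for key, value in MINIMAX_DEFAULTS.items():
        prefs.setdefault(key, list(value))  # never overwrite a user-supplied key
    return prefs
```

Using `setdefault` per key means a user who supplies only `order` keeps that order and still gets the default `ignore` entry, which is one plausible reading of "fill in defaults where absent".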


teknium1 merged commit 8322b42 into main Apr 18, 2026
4 of 7 checks passed

teknium1 deleted the hermes/hermes-3a6370f8 branch April 18, 2026 08:52

teknium1 mentioned this pull request Apr 18, 2026

fix(streaming): route minimax/* on OpenRouter away from broken direct endpoint #12087

Closed

WwNeXst mentioned this pull request Apr 18, 2026

[Bug]: Context compression failure uses static placeholder instead of preserved message tail — context permanently lost #12131

Open

alt-glitch mentioned this pull request Apr 26, 2026

Bug: Writing long documents frequently triggers the 'Stream stalled mid tool-call' error #15886

Open
