fix(streaming): surface dropped tool-call on mid-stream stall #12072

Merged: teknium1 merged 1 commit into main from hermes/hermes-3a6370f8 on Apr 18, 2026
@teknium1 teknium1 commented Apr 18, 2026

Summary

Users no longer get a silent failure when a streaming model stalls mid tool-call — the dropped tool is now named in a user-visible warning appended to the assistant message.

Root cause

When a stream dies after text was already delivered, the partial-stream recovery path at the end of _interruptible_streaming_api_call returns a stub with the recovered text and tool_calls=None, finish_reason=stop. That's correct for text-only stalls (retrying would duplicate the text), but when a tool call was in flight, the attempted action is lost with no indication: the agent treats the turn as complete and the session exits cleanly, yet nothing was executed.

Changes

  • run_agent.py: streaming accumulator records each tool-call name into result['partial_tool_names'] as soon as the name is known.
  • run_agent.py: stub builder checks that list and, when non-empty, appends the warning "⚠ Stream stalled mid tool-call (<names>); the action was not executed. Ask me to retry if you want to continue." to content, and also fires it as a live stream delta so the user sees it before the turn closes, not just in the persisted transcript.
  • run_agent.py: text-only partial streams keep the original bare-recovery behaviour (no warning noise for a scenario the previous design already handled correctly).
  • tests/run_agent/test_streaming.py: two new regression tests — one asserts the warning fires + names the dropped tool + reaches the live delta callback, one asserts text-only partial streams are unchanged.
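The stub-builder change in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in run_agent.py: `build_partial_stub` and `on_delta` are hypothetical names, and the blank-line separator before the warning is an assumption. Only the warning text, the `partial_tool_names` key, and the stub fields come from this PR.

```python
from types import SimpleNamespace

WARNING_TEMPLATE = (
    "⚠ Stream stalled mid tool-call ({names}); the action was not executed. "
    "Ask me to retry if you want to continue."
)


def build_partial_stub(partial_text, result, on_delta=None):
    """Build a stub response for a stream that died after text was delivered.

    Hypothetical helper: the real logic lives inside
    _interruptible_streaming_api_call in run_agent.py.
    """
    content = partial_text
    names = result.get("partial_tool_names") or []
    if names:
        warning = WARNING_TEMPLATE.format(names=", ".join(names))
        # Persist the warning in the transcript (separator is an assumption).
        content = f"{content}\n\n{warning}" if content else warning
        # Also push it to the live stream so the user sees it immediately.
        if on_delta is not None:
            on_delta(f"\n\n{warning}")
    message = SimpleNamespace(
        role="assistant", content=content, tool_calls=None, reasoning_content=None
    )
    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
    return SimpleNamespace(id="partial-stream-stub", choices=[choice])
```

With an empty `partial_tool_names` list the function returns the recovered text untouched and never calls `on_delta`, which mirrors the text-only behaviour the PR keeps unchanged.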

Validation

| Scenario | Before | After |
|---|---|---|
| Stream dies mid tool-call, text already delivered | Silent exit code 0, no indication | Warning in content + live delta, user sees which tool was dropped |
| Text-only partial stream | Bare recovered text | Unchanged |
| tests/run_agent/test_streaming.py | 24 passed | 26 passed (2 new) |

Reproduction methodology (for future re-verification)

1. Environment setup

# Isolated HERMES_HOME so prior state doesn't leak in
mkdir -p /tmp/hermes-minimax-eval && cp ~/.hermes/.env /tmp/hermes-minimax-eval/.env
cat > /tmp/hermes-minimax-eval/config.yaml <<'EOF'
_config_version: 5
display:
  streaming: true
agent:
  max_turns: 50
EOF

2. The trigger prompt

The prompt is substantive enough that MiniMax M2.7 emits leading commentary, starts a write_file tool call, and needs to generate a large JSON arguments blob. That is when the stall hits the 240 s stale-stream detector:

cat > /tmp/hermes-prompt.txt <<'EOF'
You have access to the hermes-agent repo at /home/teknium/.hermes/hermes-agent.
Audit how interrupts and subprocess cleanup work across ALL execution backends:

1. Read tools/environments/base.py, local.py, docker.py, ssh_env.py (if exists,
   else list what's there), modal_utils.py, and any daytona/singularity files.
2. For each backend identify: how _wait_for_process polls, how _kill_process
   works, what happens if the agent is interrupted mid-execution, what happens
   if the agent process crashes.
3. Check interrupt propagation to in-flight commands per backend.
4. Write a thorough markdown audit to /tmp/minimax-audit.md with one section
   per backend, including line-number references and concrete code excerpts.
5. At the end, list inconsistencies or potential bugs across backends.

Take your time, read actual code, don't speculate. I want real evidence.
EOF

3. Run the session (non-interactive, so -q captures the full lifecycle cleanly)

mkdir -p /tmp/hermes-minimax-repro  # tee needs the log directory to exist
HERMES_HOME=/tmp/hermes-minimax-eval \
HERMES_DEBUG_INTERRUPT=1 \
PYTHONPATH=/home/teknium/.hermes/hermes-agent \
/home/teknium/.hermes/hermes-agent/venv/bin/python3 -m hermes_cli.main chat \
    --provider openrouter \
    --model minimax/minimax-m2.7 \
    --yolo \
    -q "$(cat /tmp/hermes-prompt.txt)" \
    2>&1 | tee /tmp/hermes-minimax-repro/session.log

(Note the slug form: --provider openrouter --model minimax/minimax-m2.7. Using --model openrouter/minimax/minimax-m2.7 fails with HTTP 400 "not a valid model ID".)

4. The observed failure (on main before this PR)

  • Agent streams reasoning + commentary for ~4-6 minutes
  • Eventually emits tool calls: search_files, read_file on each backend module — visible as 🔎 grep, 📖 read lines
  • Reaches the summary step: streams "Now I have all the information needed. Let me write the comprehensive audit:" as text content, then starts emitting a write_file tool call
  • MiniMax goes silent mid-JSON-arguments
  • After 240 s (scaled stale-stream threshold at this context size, _stream_stale_timeout in run_agent.py), hermes emits ⚠️ No response from provider for 240s ... Reconnecting...
  • Connection is closed, retry runs, also fails the same way
  • Final: deltas_were_sent["yes"] = True path at line ~6085 takes over → returns a stub with tool_calls=None, finish_reason="stop"
  • Agent treats the turn as complete, writes session footer (Duration: Xm Ys, Messages: N (M user, K tool calls))
  • No /tmp/minimax-audit.md is written. No error is surfaced to the user. Exit code is 0.
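For readers unfamiliar with the stale-stream detection mentioned in the steps above, the idea can be sketched as a small watchdog that remembers when the last chunk arrived. This is an illustrative assumption, not the implementation in run_agent.py: the class name, the injectable clock, and the fixed timeout are hypothetical, and the context-size scaling done by `_stream_stale_timeout` is not shown.

```python
import time


class StaleStreamDetector:
    """Flags a stream as stale when no chunk arrived within `timeout` seconds.

    Hypothetical sketch; the 240 s default matches the threshold observed
    in this repro, not a documented constant.
    """

    def __init__(self, timeout=240.0, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_chunk_at = clock()

    def mark_chunk(self):
        # Call this every time the provider delivers a chunk.
        self._last_chunk_at = self._clock()

    def is_stale(self):
        # True once the silence has lasted at least `timeout` seconds.
        return self._clock() - self._last_chunk_at >= self.timeout
```

Passing the clock in as a parameter keeps the detector testable without sleeping for minutes.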

5. Root cause confirmation

Read run_agent.py lines 6039-6072 (the if result["error"] is not None branch after the streaming while loop):

if deltas_were_sent["yes"]:
    _partial_text = (getattr(self, "_current_streamed_assistant_text", "") or "").strip() or None
    logger.warning("Partial stream delivered before error; returning stub ...")
    _stub_msg = SimpleNamespace(
        role="assistant", content=_partial_text, tool_calls=None,  # ← !!!
        reasoning_content=None,
    )
    return SimpleNamespace(
        id="partial-stream-stub",
        choices=[SimpleNamespace(index=0, message=_stub_msg, finish_reason="stop")],
        ...
    )

tool_calls=None hard-codes the loss of any in-progress tool call. tool_calls_acc (the dict accumulating partial tool-call deltas) is in scope inside _call_chat_completions only, so the stub builder can't see it. The fix threads the partial tool names through the shared result dict.
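A rough sketch of the threading described above, assuming simplified dict-shaped deltas rather than the real chunk objects. `accumulate_tool_call_delta` is a hypothetical name; only `tool_calls_acc`, `result`, and the `partial_tool_names` key are taken from this PR.

```python
def accumulate_tool_call_delta(tool_calls_acc, result, delta):
    """Merge one streamed tool-call delta into the accumulator.

    `delta` is assumed to look like
    {"index": int, "name": str (optional), "arguments": str (optional)}.
    """
    entry = tool_calls_acc.setdefault(delta["index"], {"name": "", "arguments": ""})
    name = delta.get("name")
    if name and not entry["name"]:
        entry["name"] = name
        # Record the name in the shared dict so code outside this scope
        # (the stub builder) can still see it after the stream dies.
        result.setdefault("partial_tool_names", []).append(name)
    entry["arguments"] += delta.get("arguments") or ""


# Example: a name delta followed by a partial-arguments delta.
tool_calls_acc, result = {}, {"error": None}
accumulate_tool_call_delta(tool_calls_acc, result, {"index": 0, "name": "write_file"})
accumulate_tool_call_delta(tool_calls_acc, result, {"index": 0, "arguments": '{"path": "/tmp/'})
```

The accumulator itself can stay local to the streaming function; only the list of names needs to cross the scope boundary, which is why the shared `result` dict is enough.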

6. Deterministic repro without a live model (what the new unit test does)

tests/run_agent/test_streaming.py::TestPartialToolCallWarning::test_partial_tool_call_surfaces_warning constructs a fake stream as a generator that yields:

  1. A text chunk (content="Let me write the audit: ") — flips deltas_were_sent["yes"]
  2. A tool_call delta with name="write_file" — populates tool_calls_acc + result["partial_tool_names"]
  3. A tool_call delta with partial args — extends tool_calls_acc
  4. Then raise _StallError(...) — simulates the mid-stream connection death

It patches _create_request_openai_client so that the mock's chat.completions.create returns this generator, constructs an AIAgent with api_mode="chat_completions", and calls _interruptible_streaming_api_call({}) directly. Runs in ~1 second with no network, no real model. Asserts the stub content contains both the recovered text AND "Stream stalled mid tool-call" AND the tool name "write_file" AND that the warning was also fired as a live delta.

The paired test test_partial_text_only_no_warning uses the same pattern minus the tool-call deltas and asserts the pre-fix bare-recovery behaviour is unchanged for text-only partial streams.
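The fake-stream pattern used by both tests can be illustrated in isolation. The sketch below is a simplification under stated assumptions: chunks are plain dicts instead of SDK chunk objects, `StallError` stands in for the test's `_StallError`, and `consume` is a hypothetical stand-in for the agent's streaming loop.

```python
class StallError(Exception):
    """Stand-in for the test's _StallError."""


def fake_stalling_stream():
    """Yield a text chunk and two tool-call deltas, then die mid-stream."""
    yield {"content": "Let me write the audit: "}
    yield {"tool_call": {"index": 0, "name": "write_file"}}
    yield {"tool_call": {"index": 0, "arguments": '{"path": "/tmp/minimax'}}
    raise StallError("simulated mid-stream connection death")


def consume(stream):
    """Drain a stream, returning (text, tool_names, error)."""
    text, names, error = "", [], None
    try:
        for chunk in stream:
            text += chunk.get("content", "")
            name = chunk.get("tool_call", {}).get("name")
            if name:
                names.append(name)
    except StallError as exc:
        # The generator raised after partial delivery; keep what we have.
        error = exc
    return text, names, error


text, names, error = consume(fake_stalling_stream())
```

Because the generator raises only after yielding, the consumer ends up with both the delivered text and the tool name, which is exactly the state the warning path needs.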

7. Re-running just this PR's tests

cd /home/teknium/.hermes/hermes-agent
scripts/run_tests.sh tests/run_agent/test_streaming.py::TestPartialToolCallWarning -v

Expected: 2 passed in <2 s.

8. Related environmental gotchas encountered during the hunt

  • tools.* INFO logging is suppressed under quiet_mode via run_agent.py:923. Set HERMES_DEBUG_INTERRUPT=1 to see trace from tools/environments/base.py + tools/interrupt.py.
  • Post-session cleanup (memory flush, title gen, any aux tasks) runs synchronously in-process after the Duration: footer. On MiniMax M2.7 that extends wall-clock uptime by minutes. Not a hang.
  • ~/.hermes/hermes-agent/ standalone clone lags behind origin/main — set PYTHONPATH to a worktree for live-test of just-merged fixes.

When streaming died after text was already delivered to the user but
before a tool-call's arguments finished streaming, the partial-stream
stub at the end of _interruptible_streaming_api_call silently set
`tool_calls=None` on the returned message and kept `finish_reason=stop`.
The agent treated the turn as complete, the session exited cleanly with
code 0, and the attempted action was lost with zero user-facing signal.

Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task:
agent streamed 'Let me write the audit:', started emitting a write_file
tool call, MiniMax stalled for 240s mid-arguments, the stale-stream
detector killed the connection, the stub fired, session ended, no file
written, no error shown.

Fix: the streaming accumulator now records each tool-call's name into
`result['partial_tool_names']` as soon as the name is known. When the
stub builder fires after a partial delivery and finds any recorded tool
names, it appends a human-visible warning to the stub's content — and
also fires it as a live stream delta so the user sees it immediately,
not only in the persisted transcript. The next turn's model also sees
the warning in conversation history and can retry on its own. Text-only
partial streams keep the original bare-recovery behaviour (no warning).

Validation:
| Scenario                                    | Before                    | After                                       |
|---------------------------------------------|---------------------------|---------------------------------------------|
| Stream dies mid tool-call, text already sent | Silent exit, no indication | User sees ⚠ warning naming the dropped tool |
| Text-only partial stream                     | Bare recovered text       | Unchanged                                   |
| tests/run_agent/test_streaming.py            | 24 passed                 | 26 passed (2 new)                           |
@teknium1 teknium1 merged commit 8322b42 into main Apr 18, 2026
4 of 7 checks passed
@teknium1 teknium1 deleted the hermes/hermes-3a6370f8 branch April 18, 2026 08:52
teknium1 added a commit that referenced this pull request Apr 18, 2026
… endpoint

The direct Minimax OpenRouter endpoint silently drops tool-call streams on
tool-calling workflows (MiniMax-M2#109, reproduced 4/4 times on 2026-04-18:
zero content, no finish_reason, silent close at ~40s). PR #12072 surfaced
the failure to the user; this PR avoids it entirely by routing minimax/*
requests to Fireworks / NovitaAI / Google-Vertex / AtlasCloud / Together
by default.

New module agent/provider_tweaks.py centralizes known-broken-endpoint
avoidance with a single registry entry per upstream bug. User-supplied
provider preferences (provider_sort, providers_allowed/ignored/order)
always win — tweaks only fill in defaults where absent, and a user who
sets 'only' is fully opted out.

Wired into both provider_preferences build sites in run_agent.py (main
chat loop + iteration-summary call). Only applies when base_url targets
openrouter.ai.

Validation
| | Before | After |
|---|---|---|
| minimax/minimax-m2.7 tool-call stream on OR (direct endpoint) | 0/4 success | 4/4 on Fireworks |
| extra_body.provider injected for minimax/* on OpenRouter | no | ignore=[minimax] order=[fireworks,novitaai,google-vertex,atlascloud,together] |
| extra_body.provider for anthropic/* on OpenRouter | unchanged | unchanged |
| extra_body.provider for minimax/* on api.minimax.io | unchanged | unchanged |
| User-supplied {only:[minimax]} | unchanged | unchanged (explicit opt-in honoured) |
| tests/agent/test_provider_tweaks.py | n/a | 23 passed |
| tests/run_agent/test_streaming.py | 26 passed | 26 passed |

Live e2e sanity (real OpenRouter call): 89.6s clean response via Fireworks,
with `extra_body.provider={'ignore': ['minimax'], 'order': ['fireworks',...]}`
confirmed in the outgoing request.
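The precedence rules in the commit message above (user preferences win, `only` opts out, OpenRouter only) can be sketched as follows. This is an illustration under assumptions, not the contents of agent/provider_tweaks.py: the function name, the registry shape, and the host check are hypothetical, while the default `ignore` and `order` values are copied from the validation table.

```python
from urllib.parse import urlparse

# Defaults copied from the validation table; the dict shape is an assumption.
MINIMAX_DEFAULTS = {
    "ignore": ["minimax"],
    "order": ["fireworks", "novitaai", "google-vertex", "atlascloud", "together"],
}


def apply_provider_tweaks(model, base_url, user_prefs=None):
    """Return provider preferences with defaults filled in where absent."""
    prefs = dict(user_prefs or {})
    host = urlparse(base_url).hostname or ""
    targets_openrouter = host == "openrouter.ai" or host.endswith(".openrouter.ai")
    if not targets_openrouter or not model.startswith("minimax/"):
        return prefs  # tweak does not apply to this request
    if "only" in prefs:
        return prefs  # explicit user opt-in: leave untouched
    for key, value in MINIMAX_DEFAULTS.items():
        # setdefault keeps any value the user already supplied.
        prefs.setdefault(key, list(value))
    return prefs
```

Using `setdefault` per key is one simple way to guarantee that a user-supplied `order` or `ignore` is never overwritten.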
cg2aigc pushed a commit to cg2aigc/hermes-agent that referenced this pull request Apr 18, 2026
…search#12072)

pljeroen pushed a commit to pljeroen/hermes-agent that referenced this pull request Apr 19, 2026
… endpoint

0xK8oX pushed a commit to 0xK8oX/hermes-agent that referenced this pull request Apr 24, 2026
…search#12072)


# Conflicts:
#	run_agent.py
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…search#12072)

aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…search#12072)

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…search#12072)

gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…search#12072)

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…search#12072)
