Symptom
Per gist sandbox-finding.md: sandbox dispatch on canonical (post-v0.18.6) loops between codestral and mistral-large on the synthetic "Your previous response was empty" recovery prompt, even though codestral emits tool_calls_count=1. v0.18.6's tool-execution dispatch diagnostics ([tool-dispatch] etc., closes #130) do NOT fire, which means the empty-content gate intercepts the response BEFORE tool execution. The file is never written.
Code path
agent/conversation_loop.py ~ line 3782-3870 (read on v0.18.6 HEAD: 38bcb1e82):
# Line 3782-3784
_truly_empty = not agent._strip_think_blocks(
    final_response
).strip()
_truly_empty is computed from final_response (the assistant message's content string only). It does not check assistant_message.tool_calls. A response with content="" (or empty after <think> stripping) + populated tool_calls evaluates to _truly_empty = True.
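A minimal sketch of that content-only check, to make the failure mode concrete. `strip_think_blocks` and `FakeAssistantMessage` are hypothetical stand-ins written for this illustration (the real `agent._strip_think_blocks` may behave differently); only the shape of the check follows the snippet above.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for agent._strip_think_blocks; the real helper may differ.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> spans from a content string."""
    return _THINK_RE.sub("", text or "")


@dataclass
class FakeAssistantMessage:
    """Illustrative message shape: a content string plus optional tool calls."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def truly_empty_content_only(msg: FakeAssistantMessage) -> bool:
    # Same shape as the quoted check: only the content string is inspected,
    # msg.tool_calls is never consulted.
    return not strip_think_blocks(msg.content).strip()


tool_only = FakeAssistantMessage(content="", tool_calls=[{"name": "write_file"}])
think_then_tool = FakeAssistantMessage(
    content="<think>plan the write</think>",
    tool_calls=[{"name": "write_file"}],
)
plain_text = FakeAssistantMessage(content="Done.")

print(truly_empty_content_only(tool_only))        # True, despite a tool call
print(truly_empty_content_only(think_then_tool))  # True, despite a tool call
print(truly_empty_content_only(plain_text))       # False
```

Both tool-call responses are classified as empty, which is the condition the rest of this issue traces through the retry path.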
# Line 3842: _structural_empty branch
_structural_empty = (
    _truly_empty
    and not _has_structured
    and finish_reason == "stop"  # ← correctly gated against tool_calls responses
    and not _prior_was_tool
    and _tools_attached
    and not getattr(agent, "_tools_empty_terminal_handled", False)
)
_structural_empty correctly gates on finish_reason == "stop" — so this branch does NOT fire for finish_reason="tool_calls" responses. ✓
# Line 3870-ish: broader empty-content retry path
if _truly_empty and (not _has_structured or _prefill_exhausted) and agent._empty_content_retries < 3:
    agent._empty_content_retries += 1
    ...
This branch does fire for tool_calls responses because it has no tool_calls gate: _truly_empty is True (content is empty after the strip), not _has_structured is True (no reasoning fields), so the retry counter increments. The synthetic recovery prompt on the ~line 3856 path then emits "Your previous response was empty".
The user sees the recovery loop: codestral → empty content + tool_calls → hermes treats as empty → re-prompts → mistral-large emits prose ("Creating /tmp/random_test.py...") + tool_call → hermes treats as empty again → loop until the 3 retries are exhausted.
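The difference between the two gates can be sketched as pure functions over the locals they read. `GateInputs` and both function names are hypothetical, introduced only for this illustration; the boolean expressions are copied from the snippets quoted in this issue, and the real loop does more than this.

```python
from dataclasses import dataclass


@dataclass
class GateInputs:
    """Hypothetical bundle of the locals the two gates read."""
    truly_empty: bool
    has_structured: bool
    finish_reason: str
    prior_was_tool: bool = False
    tools_attached: bool = True
    terminal_handled: bool = False
    prefill_exhausted: bool = False
    empty_content_retries: int = 0


def structural_empty(g: GateInputs) -> bool:
    # Mirrors the _structural_empty expression, including the finish_reason check.
    return (
        g.truly_empty
        and not g.has_structured
        and g.finish_reason == "stop"
        and not g.prior_was_tool
        and g.tools_attached
        and not g.terminal_handled
    )


def retry_gate(g: GateInputs) -> bool:
    # Mirrors the broader retry condition: nothing here looks at tool calls.
    return (
        g.truly_empty
        and (not g.has_structured or g.prefill_exhausted)
        and g.empty_content_retries < 3
    )


# A tool-call turn with empty content, as codestral is reported to produce.
tool_call_turn = GateInputs(
    truly_empty=True, has_structured=False, finish_reason="tool_calls"
)
print(structural_empty(tool_call_turn))  # False: blocked by finish_reason
print(retry_gate(tool_call_turn))        # True: the retry path fires

# Count how many times the retry gate fires before the counter stops it.
retries = 0
while retry_gate(
    GateInputs(True, False, "tool_calls", empty_content_retries=retries)
):
    retries += 1
print(retries)  # 3
```

Under these assumptions the structural gate stays closed while the retry gate opens three times, matching the loop described above.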
Why v0.18.3's recovery (#122) doesn't help
agent/transports/chat_completions.py::normalize_response correctly recovers SDK-dropped tool_calls and populates NormalizedResponse.tool_calls. That's verified. But _truly_empty in conversation_loop is computed from final_response (content string), not from assistant_message.tool_calls. Even with tool_calls fully populated, the gate doesn't know.
Proposed fix (sketch)
Either of:
(a) Tighten _truly_empty to include tool_calls absence:
_truly_empty = (
    not agent._strip_think_blocks(final_response).strip()
    and not getattr(assistant_message, "tool_calls", None)
)
(b) Tighten the retry gate to skip when tool_calls are present:
if (
    _truly_empty
    and not getattr(assistant_message, "tool_calls", None)
    and (not _has_structured or _prefill_exhausted)
    and agent._empty_content_retries < 3
):
    ...
Either fix makes the broader empty-content retry path consistent with the existing _structural_empty gate (which already correctly excludes tool_calls responses via finish_reason == "stop").
(a) is cleaner — _truly_empty becomes the single source of truth for "the response carries nothing useful". (b) is narrower — only the retry path changes.
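A sketch of option (a) as a standalone helper with regression-style assertions, assuming the think-block stripping can be approximated by a simple regex. `is_truly_empty` and the `SimpleNamespace` messages are hypothetical names for illustration; the actual change would be made inline in conversation_loop.py.

```python
import re
from types import SimpleNamespace

# Assumed approximation of agent._strip_think_blocks; the real helper may differ.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def is_truly_empty(final_response, assistant_message) -> bool:
    """Option (a): empty visible content AND no tool calls on the message."""
    visible = _THINK_RE.sub("", final_response or "").strip()
    return not visible and not getattr(assistant_message, "tool_calls", None)


# Empty content but a populated tool call: must NOT count as empty.
tool_only = SimpleNamespace(content="", tool_calls=[{"name": "write_file"}])
# Empty content and no tool calls: still counts as empty.
nothing = SimpleNamespace(content="", tool_calls=None)
# Think-only content on a message with no tool_calls attribute at all.
no_attr = SimpleNamespace(content="<think>hmm</think>")

assert not is_truly_empty(tool_only.content, tool_only)
assert is_truly_empty(nothing.content, nothing)
assert is_truly_empty(no_attr.content, no_attr)
```

The `getattr(..., None)` default keeps the helper safe for message objects that lack a `tool_calls` attribute, which is the same defensive form both proposed fixes use.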
Verified end-to-end
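Earlier diagnostic + my in-process probes confirmed:
- normalize_response populates tool_calls correctly
- write_file handler accepts mistral's args + writes the file (verified by direct invocation in ssh dev)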
Companion devagentic-side observability
devagentic#337 / PR devagentic#338 adds a [debug-response-shape] log at the OAI shim exit that dumps the full wire shape (finish_reason, content type/nullness/length, tool_call_0 keys + type field). It helps hermes-maint compare what hermes RECEIVES against what the parser expects.
Repro (from the gist)
docker exec -u duplex devagentic-duplex-claude tmux send-keys -t sandbox \
  "Write /tmp/random_test.py with: import random; print(random.randint(1,100)) - then execute it." Enter
sleep 30
docker exec -u duplex devagentic-duplex-claude tmux capture-pane -t sandbox -p -S -40
ssh dev "grep cascade-entry /tmp/service.log | tail -5"
Expected post-fix: [tool-dispatch] lines from v0.18.6 fire, file gets written.
Related
- gist sandbox-finding.md (full diagnostic)
- hermes-agent#122 (v0.18.3 SDK recovery — works correctly)
- hermes-agent#125 (v0.18.4 drop finish_reason gate — broadened recovery)
- hermes-agent#128 (v0.18.5 loud invalid-tool diagnostic — confirmed names valid)
- hermes-agent#131 (v0.18.6 tool-execution diagnostic — doesn't fire because gate intercepts before)
- devagentic#337 / PR devagentic#338 — companion wire-shape log
🤖 Generated with Claude Code