Bug Description
When an assistant turn contains both text content and tool calls, the content is captured in _last_content_with_tools (L8874). If the next model response after tool execution returns empty content with no tool calls, the fallback at L8997-9014 immediately uses the captured content as the final response and breaks out of the loop — bypassing the standard 3x empty-response retry mechanism (L9048-9058) and the 2x thinking-prefill retry (L9028-9041).
This causes the agent to silently terminate mid-task when the model happens to return empty content at an intermediate point. The user receives the captured text (which is often a progress update like "Task 4 done! Now starting task 5:") as the final response, with no indication that the agent loop has stopped.
Reproduction Scenario
- Give the agent a long multi-step task (e.g., 5 subtasks dispatched via tool calls)
- After completing subtask 4, the model generates: "Tests passed! Now dispatching task 5 (UI):" + [tool_call: terminal(read sidebar.tsx)]
- The content is saved to _last_content_with_tools
- Tool results return successfully
- The model is called again but returns empty content, no tool calls (transient, would succeed on retry)
- Instead of retrying (as would happen without _last_content_with_tools), the fallback triggers immediately → loop exits → user receives "Tests passed! Now dispatching task 5:" as final response
- Task 5 is never executed. No warning or error is emitted.
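The sequence above can be modeled with a small stand-alone simulation. This is only a sketch of the behavior described in this report, not the code in run_agent.py: run_loop, the scripted response dictionaries, and MAX_EMPTY_RETRIES are hypothetical names introduced for illustration.

```python
from __future__ import annotations

MAX_EMPTY_RETRIES = 3  # assumption: mirrors the 3x empty-response retry


def run_loop(responses: list[dict]) -> tuple[str, int]:
    """Simulate the agent loop over scripted model responses.

    Each response is {"content": str, "tool_calls": list[str]}.
    Returns (final_response, number_of_model_calls_made).
    """
    last_content_with_tools: str | None = None
    empty_retries = 0
    calls = 0
    for response in responses:
        calls += 1
        content = response["content"].strip()
        if response["tool_calls"]:
            # Text that accompanies tool calls is remembered for later.
            last_content_with_tools = content or None
            continue  # tools would run here, then the model is called again
        if not content:
            if last_content_with_tools:
                # Behavior described in this report: exit at once, no retry.
                return last_content_with_tools, calls
            empty_retries += 1
            if empty_retries <= MAX_EMPTY_RETRIES:
                continue  # ordinary path: ask the model again
            return "(empty)", calls
        return content, calls
    return "(script exhausted)", calls


script = [
    {"content": "Tests passed! Now dispatching task 5 (UI):", "tool_calls": ["terminal"]},
    {"content": "", "tool_calls": []},              # transient empty response
    {"content": "Task 5 done.", "tool_calls": []},  # never reached
]
final, calls = run_loop(script)
print(final, calls)
```

In this model the third scripted response is never requested: the loop returns the progress update after two model calls, which matches the symptom reported above.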
Real-world impact
In a gateway (Discord) session, this caused a 5.5-hour silent stall. The agent completed 4/5 subtasks in 19 minutes, then the fallback triggered at iteration 52. The user received a message saying "now dispatching task 5" and assumed work was in progress. The agent was actually idle. The user only discovered the stall 5.5 hours later.
Root Cause
run_agent.py L8992-9014:
    if not self._has_content_after_think_block(final_response):
        fallback = getattr(self, '_last_content_with_tools', None)
        if fallback:
            logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
            self._last_content_with_tools = None
            self._empty_content_retries = 0  # ← resets retry counter!
            # ... rewrites the original assistant msg content ...
            final_response = self._strip_think_blocks(fallback).strip()
            self._response_was_previewed = True
            break  # ← exits loop immediately
The fallback:
- Bypasses the 3x empty-response retry (L9048-9058) that would otherwise trigger
- Resets _empty_content_retries to 0, ensuring retries can never accumulate
- Breaks immediately, leaving no chance for the model to recover with a retry
- Emits only a logger.debug(), which is invisible to users and gateway logs
The design intent ("model already said what it needed to say, e.g. 'You're welcome!' + memory save") is valid for housekeeping tool calls, but does not account for substantive tool calls (terminal, read_file, search_files, etc.) where the content is a progress update, not a final answer.
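The visibility point in the list above can be illustrated with the standard logging module. This is a minimal sketch: the logger name and the INFO threshold are assumptions chosen for the demonstration, not the gateway's actual configuration.

```python
import io
import logging

# Capture log output in memory so the effect of the level filter is visible.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

demo_logger = logging.getLogger("fallback_visibility_demo")  # hypothetical name
demo_logger.addHandler(handler)
demo_logger.setLevel(logging.INFO)  # assumed threshold for a production log
demo_logger.propagate = False

# A DEBUG record is dropped by the INFO threshold; a WARNING record is kept.
demo_logger.debug("Empty follow-up after tool calls: using prior turn content")
demo_logger.warning("Empty follow-up after tool calls: using prior turn content")

output = stream.getvalue()
print(output)
```

Under these assumptions only the WARNING line is written, so a fallback that logs at DEBUG leaves no trace in a log configured at INFO or above.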
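Relationship to Existing Issues
- […] _last_content_with_tools mechanism, but describes the content-dropping problem on Telegram. This issue describes the more severe loop termination consequence.
- […] _last_content_with_tools fallback prevents retries entirely.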
Existing Retry Mechanisms (for context)
| Failure mode | Retry mechanism | Status |
| --- | --- | --- |
| API call failure | 3x with exponential backoff | ✅ Works |
| Stream transport error | HERMES_STREAM_RETRIES (default 2) | ✅ Works |
| Empty response (no content, no reasoning) | 3x silent retry | ✅ Works |
| Thinking-only (reasoning but no text) | 2x prefill continuation | ✅ Works |
| 429 rate limit | retry-after + credential rotation | ✅ Works |
| Empty response WITH _last_content_with_tools | 0x retry, immediate exit | ❌ This bug |
Suggested Direction
The _last_content_with_tools fallback should not short-circuit the retry pipeline. A possible approach:
- When _last_content_with_tools exists but the model returns empty, still attempt the standard 3x empty-response retries first
- Only fall back to _last_content_with_tools after retries are exhausted
- Consider distinguishing housekeeping-only tool-call turns (memory, todo, skill_manage: safe to use content as final response) from substantive tool-call turns (terminal, read_file, search_files, delegate_task: content is likely a progress update)
- Note: the code at L8876-8882 already has _HOUSEKEEPING_TOOLS detection for muting output; this signal could be reused
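One way the direction above might look, reduced to a pure decision function. This is a sketch under stated assumptions, not a patch for run_agent.py: decide_on_empty_response, HOUSEKEEPING_TOOLS, MAX_EMPTY_RETRIES, and the action strings are hypothetical, and the tool set only lists the housekeeping examples named in this report.

```python
from __future__ import annotations

# Assumption: only the housekeeping tools named in this report.
HOUSEKEEPING_TOOLS = frozenset({"memory", "todo", "skill_manage"})
MAX_EMPTY_RETRIES = 3  # assumption: mirrors the 3x empty-response retry


def decide_on_empty_response(
    fallback: str | None,
    prior_tool_names: list[str],
    empty_retries: int,
) -> tuple[str, int]:
    """Return (action, updated_retry_count) for an empty follow-up response.

    action is one of "use_fallback", "retry" or "give_up".
    """
    housekeeping_only = bool(prior_tool_names) and all(
        name in HOUSEKEEPING_TOOLS for name in prior_tool_names
    )
    if fallback and housekeeping_only:
        # The earlier text was probably the real answer ("You're welcome!").
        return "use_fallback", 0
    if empty_retries < MAX_EMPTY_RETRIES:
        # Keep counting instead of resetting, so retries can accumulate.
        return "retry", empty_retries + 1
    if fallback:
        # Retries exhausted: fall back, ideally with a visible warning.
        return "use_fallback", empty_retries
    return "give_up", empty_retries


print(decide_on_empty_response("Now dispatching task 5:", ["terminal"], 0))
print(decide_on_empty_response("You're welcome!", ["memory"], 0))
```

Keeping the decision separate from the loop makes each branch easy to unit test. How it would interact with the thinking-prefill retry and the API modes is not covered by this sketch.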
Note: I am not prescribing a specific fix. The interaction between _last_content_with_tools, empty-response retries, thinking-prefill retries, and the Codex/Anthropic API modes is complex. The maintainers are best positioned to design the right solution.
Environment
- hermes-agent gateway (Discord platform)
- Model: claude-opus-4.6-1m via custom provider
- 52 API calls / 19 minutes before silent termination
- HERMES_MAX_ITERATIONS=90 (well within budget)
- No errors in gateway.log, errors.log, or gateway.error.log