Skip to content

[Bug]: _last_content_with_tools fallback bypasses empty-response retries, causing silent agent loop termination mid-task #7968

@nocoo

Description

@nocoo

Bug Description

When an assistant turn contains both text content and tool calls, the content is captured in _last_content_with_tools (L8874). If the next model response after tool execution returns empty content with no tool calls, the fallback at L8997-9014 immediately uses the captured content as the final response and breaks out of the loop — bypassing the standard 3x empty-response retry mechanism (L9048-9058) and the 2x thinking-prefill retry (L9028-9041).

This causes the agent to silently terminate mid-task when the model happens to return empty content at an intermediate point. The user receives the captured text (which is often a progress update like "Task 4 done! Now starting task 5:") as the final response, with no indication that the agent loop has stopped.

Reproduction Scenario

  1. Give the agent a long multi-step task (e.g., 5 subtasks dispatched via tool calls)
  2. After completing subtask 4, the model generates: "Tests passed! Now dispatching task 5 (UI):" + [tool_call: terminal(read sidebar.tsx)]
  3. The content is saved to _last_content_with_tools
  4. Tool results return successfully
  5. The model is called again but returns empty content, no tool calls (transient, would succeed on retry)
  6. Instead of retrying (as would happen without _last_content_with_tools), the fallback triggers immediately → loop exits → user receives "Tests passed! Now dispatching task 5:" as final response
  7. Task 5 is never executed. No warning or error is emitted.

Real-world impact

In a gateway (Discord) session, this caused a 5.5-hour silent stall. The agent completed 4/5 subtasks in 19 minutes, then the fallback triggered at iteration 52. The user received a message saying "now dispatching task 5" and assumed work was in progress. The agent was actually idle. The user only discovered the stall 5.5 hours later.

Root Cause

run_agent.py L8992-9014:

if not self._has_content_after_think_block(final_response):
    fallback = getattr(self, '_last_content_with_tools', None)
    if fallback:
        logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
        self._last_content_with_tools = None
        self._empty_content_retries = 0          # ← resets retry counter!
        # ... rewrites the original assistant msg content ...
        final_response = self._strip_think_blocks(fallback).strip()
        self._response_was_previewed = True
        break                                     # ← exits loop immediately

The fallback:

  • Bypasses the 3x empty-response retry (L9048-9058) that would otherwise trigger
  • Resets _empty_content_retries to 0, ensuring retries can never accumulate
  • Breaks immediately — no chance for the model to recover with a retry
  • Emits only a logger.debug() — invisible to users and gateway logs

The design intent ("model already said what it needed to say, e.g. 'You're welcome!' + memory save") is valid for housekeeping tool calls, but does not account for substantive tool calls (terminal, read_file, search_files, etc.) where the content is a progress update, not a final answer.

Relationship to Existing Issues

Existing Retry Mechanisms (for context)

Failure mode Retry mechanism Status
API call failure 3x with exponential backoff ✅ Works
Stream transport error HERMES_STREAM_RETRIES (default 2) ✅ Works
Empty response (no content, no reasoning) 3x silent retry ✅ Works
Thinking-only (reasoning but no text) 2x prefill continuation ✅ Works
429 rate limit retry-after + credential rotation ✅ Works
Empty response WITH _last_content_with_tools 0x retry, immediate exit This bug

Suggested Direction

The _last_content_with_tools fallback should not short-circuit the retry pipeline. A possible approach:

  1. When _last_content_with_tools exists but the model returns empty, still attempt the standard 3x empty-response retries first
  2. Only fall back to _last_content_with_tools after retries are exhausted
  3. Consider distinguishing housekeeping-only tool-call turns (memory, todo, skill_manage — safe to use content as final response) from substantive tool-call turns (terminal, read_file, search_files, delegate_task — content is likely a progress update)
  4. Note: the code at L8876-8882 already has _HOUSEKEEPING_TOOLS detection for muting output — this signal could be reused

Note: I am not prescribing a specific fix. The interaction between _last_content_with_tools, empty-response retries, thinking-prefill retries, and the Codex/Anthropic API modes is complex. The maintainers are best positioned to design the right solution.

Environment

  • hermes-agent gateway (Discord platform)
  • Model: claude-opus-4.6-1m via custom provider
  • 52 API calls / 19 minutes before silent termination
  • HERMES_MAX_ITERATIONS=90 (well within budget)
  • No errors in gateway.log, errors.log, or gateway.error.log

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1High — major feature broken, no workaroundcomp/agentCore agent loop, run_agent.py, prompt buildersweeper:implemented-on-mainSweeper: behavior already present on current maintype/bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions