
Bug: Empty response after tool calls silently abandons incomplete multi-step tasks #9400

@sniperHW

Description

When executing multi-step tasks (e.g., fetch a webpage → extract content → save to file), Hermes Agent sometimes silently drops the unfinished steps after a tool call returns. The status line shows:

↻ Empty response after tool calls — using earlier content as final answer

The remaining steps are never executed. This is especially frequent when using non-Claude models (e.g., GLM-5), but the root cause is in the agent loop design, not the model.

Reproduction

  1. Give Hermes a 3+ step task that requires sequential tool calls, e.g., "fetch this URL, extract the article, and save it to a file"
  2. The first 2 tool calls execute successfully
  3. On the 3rd turn, the model returns an empty response (no text, no tool_calls)
  4. Hermes falls back to using earlier content as the final answer and breaks out of the loop
  5. Steps 3+ are never executed

Root Cause Analysis

The agent loop in run_agent.py (lines ~10141-10164) handles empty responses with a fallback chain:

Empty response detected
  ├─ Partial stream content? → use it, break
  ├─ _last_content_with_tools exists? → reuse it, break    ← this path fires
  ├─ Thinking-only content? → continue loop
  └─ Nothing? → retry (max 3)

When the model emitted text alongside a tool call in a previous turn (e.g., "OK, fetching the page" + browser_navigate), that text is stored in _last_content_with_tools. On a subsequent empty response, the fallback reuses this old text as the "final answer" and exits the conversation loop entirely.

The critical issue is that the fallback was designed for graceful degradation ("at least give the user something"), but it produces a worse outcome: incomplete tasks are silently abandoned.

Once run_conversation() returns, the todo list, skill instructions, and pending steps are all lost. There is no mechanism to detect that tasks remain unfinished.
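To make the failure mode concrete, here is a minimal, self-contained model of the fallback chain above. It is a sketch, not the actual run_agent.py code: apart from mirroring `_last_content_with_tools`, every name (`handle_empty_response`, `Action`, the parameters) is hypothetical, and the branch order simply follows the tree. What happens once the retry budget is exhausted is not stated in the issue, so that last branch is an assumption.

```python
from enum import Enum
from typing import Optional, Tuple


class Action(Enum):
    BREAK = "break"        # exit the loop, returning the given text as the final answer
    CONTINUE = "continue"  # go around the loop again
    RETRY = "retry"        # re-request the model, counting against the retry limit


MAX_EMPTY_RETRIES = 3


def handle_empty_response(
    partial_stream: Optional[str],
    last_content_with_tools: Optional[str],  # mirrors _last_content_with_tools
    thinking_only: bool,
    empty_retries: int,
) -> Tuple[Action, Optional[str]]:
    """Hypothetical model of the fallback chain; note that it never looks at pending tasks."""
    if partial_stream:
        return Action.BREAK, partial_stream
    if last_content_with_tools:
        # The problematic path: stale text from an earlier tool-calling turn
        # becomes the "final answer" even though steps remain.
        return Action.BREAK, last_content_with_tools
    if thinking_only:
        return Action.CONTINUE, None
    if empty_retries < MAX_EMPTY_RETRIES:
        return Action.RETRY, None
    # Assumption: once the retry budget is spent, the loop gives up.
    return Action.BREAK, None


# Turn 1 said "OK, fetching the page" alongside browser_navigate; turn 3 comes back empty.
action, answer = handle_empty_response(None, "OK, fetching the page", False, 0)
print(action, answer)  # Action.BREAK OK, fetching the page
```

Because the second branch fires before any check on remaining work, the loop exits with the stale text whenever an earlier turn paired prose with a tool call.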

Why Claude Code handles this correctly

Claude Code faces the same empty-response problem but avoids task loss through cross-session task persistence:

  1. Immutable JSON task manifest: A read-only task checklist is generated at the start. The execution agent can only flip status fields; it can never delete or rewrite tasks.
  2. Forced 3-step wake-up ritual: Every new session (including those restarted after empty responses) runs pwd → git log → read progress.txt before doing anything else.
  3. Context reset: Rather than compressing overflowing context, Claude Code wipes it entirely and boots a fresh agent with a structured handoff file.

| Dimension | Hermes Agent | Claude Code |
|---|---|---|
| Task state storage | Volatile (in-message todo list) | Persistent (JSON + progress.txt on disk) |
| After empty response | Fallback → break → loop exits | New session reads progress file → resumes |
| State tamper resistance | Model can forget/skip tasks | JSON "physical lock": model only changes status |
| Recovery granularity | Entire conversation lost | Per-step precise recovery |
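The "physical lock" idea (a fixed task list where only statuses can change) can be sketched in a few lines. This is an illustration under assumptions, not how Claude Code implements it: `Task`, `TaskManifest`, `set_status`, and the three status values are hypothetical names. Python cannot stop determined code from touching private attributes; the point is that `set_status` is the only operation the executing agent would be given.

```python
import json
from dataclasses import dataclass
from typing import Dict, Tuple

ALLOWED_STATUSES = ("pending", "in_progress", "done")


@dataclass(frozen=True)
class Task:
    """One checklist entry; frozen so its id and description cannot be rewritten."""
    id: str
    description: str


class TaskManifest:
    """Task list fixed at creation time; afterwards only statuses can change."""

    def __init__(self, tasks: Tuple[Task, ...]) -> None:
        self._tasks = tuple(tasks)  # a tuple: no way to append, delete, or reorder
        self._status: Dict[str, str] = {t.id: "pending" for t in self._tasks}

    def set_status(self, task_id: str, status: str) -> None:
        """The only mutation the executing agent is allowed to perform."""
        if task_id not in self._status:
            raise KeyError(f"unknown task {task_id!r}: tasks cannot be added later")
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"invalid status {status!r}")
        self._status[task_id] = status

    def pending(self) -> Tuple[Task, ...]:
        """Tasks not yet marked done; a non-empty result means the loop should not exit."""
        return tuple(t for t in self._tasks if self._status[t.id] != "done")

    def to_json(self) -> str:
        """Serialize for a progress file that a fresh session can read."""
        return json.dumps(
            [{"id": t.id, "description": t.description, "status": self._status[t.id]}
             for t in self._tasks],
            indent=2,
        )


manifest = TaskManifest((
    Task("1", "Fetch the URL"),
    Task("2", "Extract the article"),
    Task("3", "Save it to a file"),
))
manifest.set_status("1", "done")
manifest.set_status("2", "done")
print([t.id for t in manifest.pending()])  # ['3']
```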

Suggested Solutions

Option A (Source-level fix): Detect pending tasks before exiting

Modify the fallback logic to check for unfinished tasks before breaking out of the loop. If pending work exists, inject a continuation prompt instead of exiting:

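# Pseudocode: runs inside the agent loop's empty-response fallback branch.
# has_pending_todos() is a proposed helper, not an existing function.
if fallback and has_pending_todos(messages):
    # Work remains: prompt the model to continue instead of exiting.
    messages.append({"role": "user", "content": "Please continue with the remaining steps."})
    continue  # stay in the loop
else:
    break  # no pending work: keep the existing fallback behavior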

This is the most robust of the three options. To prevent infinite loops, each continuation prompt should count against the existing empty-response retry limit (max 3).
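A runnable version of Option A, assuming a heavily simplified loop: `run_loop`, `call_model`, and `has_pending_todos` are hypothetical stand-ins passed in as callables, and tool execution is elided so that any non-empty reply simply ends the run. The continuation prompt draws from the same retry budget as plain empty-response retries, which is what bounds the loop.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]
CONTINUE_PROMPT = "Please continue with the remaining steps."
MAX_EMPTY_RETRIES = 3


def run_loop(
    call_model: Callable[[List[Message]], Optional[str]],
    has_pending_todos: Callable[[List[Message]], bool],
    messages: List[Message],
    last_content_with_tools: Optional[str],
) -> Optional[str]:
    """Simplified agent loop in which an empty reply does not end the task while work remains."""
    empty_retries = 0
    while True:
        reply = call_model(messages)  # None or "" models an empty response
        if reply:
            return reply  # tool-call handling is elided; non-empty text ends the run here
        if has_pending_todos(messages) and empty_retries < MAX_EMPTY_RETRIES:
            empty_retries += 1  # shares the empty-response budget, so it cannot loop forever
            messages.append({"role": "user", "content": CONTINUE_PROMPT})
            continue
        # Nothing pending (or budget exhausted): fall back as the current code does.
        return last_content_with_tools


# Usage with a scripted model: one empty reply, then a real answer.
msgs: List[Message] = [{"role": "user", "content": "Fetch the URL, extract the article, save it"}]
replies = iter([None, "Saved the article to article.md"])
print(run_loop(lambda m: next(replies), lambda m: True, msgs, "OK, fetching the page"))
# prints: Saved the article to article.md
print(msgs[-1]["content"])  # prints: Please continue with the remaining steps.
```

With a model that stays empty, the loop makes at most four calls (one original plus three continuations) before falling back, so the worst case is the current behavior, not a hang.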

Option B: Persistent task state file

Add an optional mechanism to write task state to disk (similar to Claude Code's progress.txt). If the loop exits with pending tasks, the next user turn can detect them and resume automatically.
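A minimal sketch of Option B, assuming a hypothetical JSON file of `{"task": ..., "status": ...}` entries; the function names and file name are illustrative, not part of Hermes Agent. The state is written through a temporary file and renamed into place, so an interrupted write does not leave a truncated file, and the next turn turns any unfinished entries into a resume instruction.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical on-disk format: [{"task": "...", "status": "pending" | "done"}, ...]
TaskState = List[Dict[str, str]]


def save_task_state(path: Path, tasks: TaskState) -> None:
    """Write the task list via a temp file so readers never see a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(tasks, indent=2), encoding="utf-8")
    tmp.replace(path)  # single rename over the old file (atomic on POSIX)


def resume_prompt(path: Path) -> Optional[str]:
    """On the next user turn, return a resume instruction if unfinished tasks were saved."""
    if not path.exists():
        return None
    saved: TaskState = json.loads(path.read_text(encoding="utf-8"))
    pending = [t["task"] for t in saved if t["status"] != "done"]
    if not pending:
        return None
    steps = "\n".join(f"- {task}" for task in pending)
    return f"The previous run stopped with unfinished tasks:\n{steps}\nPlease resume them."


with tempfile.TemporaryDirectory() as d:
    state_file = Path(d) / "hermes_tasks.json"  # hypothetical file name
    save_task_state(state_file, [
        {"task": "Fetch the URL", "status": "done"},
        {"task": "Extract the article", "status": "done"},
        {"task": "Save it to a file", "status": "pending"},
    ])
    print(resume_prompt(state_file))
```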

Option C: Configurable fallback behavior

Add a config option like fallback_on_empty: "continue" | "exit" so users can choose whether empty responses should retry with a prompt or exit gracefully.
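One way Option C could be wired up, as a sketch only: the `fallback_on_empty` key comes from the proposal above, while `read_fallback_mode`, `on_empty_response`, and the return strings are hypothetical. The default here is `"exit"` so that existing users keep today's behavior; defaulting to `"continue"` is an equally valid choice.

```python
from typing import Any, Literal, Mapping, cast

FallbackMode = Literal["continue", "exit"]
DEFAULT_FALLBACK: FallbackMode = "exit"  # assumption: preserve current behavior by default


def read_fallback_mode(config: Mapping[str, Any]) -> FallbackMode:
    """Read the proposed fallback_on_empty option, rejecting unknown values early."""
    value = config.get("fallback_on_empty", DEFAULT_FALLBACK)
    if value not in ("continue", "exit"):
        raise ValueError(f"fallback_on_empty must be 'continue' or 'exit', got {value!r}")
    return cast(FallbackMode, value)


def on_empty_response(mode: FallbackMode, pending: bool, retries_left: int) -> str:
    """Return 'prompt' to inject a continuation message, or 'exit' to use the old fallback."""
    if mode == "continue" and pending and retries_left > 0:
        return "prompt"
    return "exit"


mode = read_fallback_mode({"fallback_on_empty": "continue"})
print(on_empty_response(mode, pending=True, retries_left=2))  # prompt
```

Even in `"continue"` mode the decision still requires pending work and remaining retries, so the option changes when the loop exits but never removes the bound.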

Environment

  • Hermes Agent version: latest (from repo)
  • Model: GLM-5 (via custom provider), but the issue affects any model prone to empty responses
  • Platform: CLI + WebUI

Related

This aligns with the Harness Engineering insight from Anthropic's "Effective harnesses for long-running agents": every harness component encodes an assumption about what the model cannot do. The current fallback assumes "empty response = task complete," which is frequently incorrect for non-Claude models.
