Description
When executing multi-step tasks (e.g., fetch a webpage → extract content → save to file), Hermes-Agent sometimes silently drops incomplete tasks after a tool call returns. The status line shows:
↻ Empty response after tool calls — using earlier content as final answer
The remaining steps are never executed. This is especially frequent when using non-Claude models (e.g., GLM-5), but the root cause is in the agent loop design, not the model.
Reproduction
- Give Hermes a 3+ step task that requires sequential tool calls, e.g., "fetch this URL, extract the article, and save it to a file"
- The first 2 tool calls execute successfully
- On the 3rd turn, the model returns an empty response (no text, no tool_calls)
- Hermes falls back to using earlier content as the final answer and breaks out of the loop
- Steps 3+ are never executed
Root Cause Analysis
The agent loop in run_agent.py (lines ~10141-10164) handles empty responses with a fallback chain:
Empty response detected
├─ Partial stream content? → use it, break
├─ _last_content_with_tools exists? → reuse it, break ← this path fires
├─ Thinking-only content? → continue loop
└─ Nothing? → retry (max 3)
When the model emitted text alongside a tool call in a previous turn (e.g., "OK, fetching the page" + browser_navigate), that text is stored in _last_content_with_tools. On a subsequent empty response, the fallback reuses this old text as the "final answer" and exits the conversation loop entirely.
The critical issue: the fallback was designed for graceful degradation ("at least give the user something"), but it causes a worse outcome — silently abandoning incomplete tasks.
Once run_conversation() returns, the todo list, skill instructions, and pending steps are all lost. There is no mechanism to detect that tasks remain unfinished.
Why Claude Code handles this correctly
Claude Code faces the same empty-response problem but avoids task loss through cross-session task persistence:
- Immutable JSON task manifest — A read-only task checklist is generated at the start. The execution agent can only flip
status fields, never delete or rewrite tasks.
- Forced 3-step wake-up ritual — Every new session (including those restarted after empty responses) runs
pwd → git log → read progress.txt before doing anything else.
- Context Reset — Rather than compressing overflowing context, Claude Code wipes it entirely and boots a fresh agent with a structured handoff file.
| Dimension |
Hermes Agent |
Claude Code |
| Task state storage |
Volatile (in-message todo list) |
Persistent (JSON + progress.txt on disk) |
| After empty response |
Fallback → break → loop exits |
New session reads progress file → resumes |
| State tamper resistance |
Model can forget/skip tasks |
JSON "physical lock" — model only changes status |
| Recovery granularity |
Entire conversation lost |
Per-step precise recovery |
Suggested Solutions
Option A (Source-level fix): Detect pending tasks before exiting
Modify the fallback logic to check for unfinished tasks before breaking out of the loop. If pending work exists, inject a continuation prompt instead of exiting:
# Pseudocode
if fallback and has_pending_todos(messages):
messages.append({"role": "user", "content": "Please continue with the remaining steps."})
continue # stay in the loop
else:
break
This is the most robust solution. The empty response retry counter would prevent infinite loops.
Option B: Persistent task state file
Add an optional mechanism to write task state to disk (similar to Claude Code's progress.txt). If the loop exits with pending tasks, the next user turn can detect and resume automatically.
Option C: Configurable fallback behavior
Add a config option like fallback_on_empty: "continue" | "exit" so users can choose whether empty responses should retry with a prompt or exit gracefully.
Environment
- Hermes Agent version: latest (from repo)
- Model: GLM-5 (via custom provider), but the issue affects any model prone to empty responses
- Platform: CLI + WebUI
Related
This aligns with the Harness Engineering insight from Anthropic's "Effective harnesses for long-running agents" — every harness component encodes an assumption about what the model cannot do. The current fallback assumes "empty response = task complete," which is frequently incorrect for non-Claude models.
Description
When executing multi-step tasks (e.g., fetch a webpage → extract content → save to file), Hermes-Agent sometimes silently drops incomplete tasks after a tool call returns. The status line shows:
The remaining steps are never executed. This is especially frequent when using non-Claude models (e.g., GLM-5), but the root cause is in the agent loop design, not the model.
Reproduction
Root Cause Analysis
The agent loop in
run_agent.py(lines ~10141-10164) handles empty responses with a fallback chain:When the model emitted text alongside a tool call in a previous turn (e.g., "OK, fetching the page" +
browser_navigate), that text is stored in_last_content_with_tools. On a subsequent empty response, the fallback reuses this old text as the "final answer" and exits the conversation loop entirely.The critical issue: the fallback was designed for graceful degradation ("at least give the user something"), but it causes a worse outcome — silently abandoning incomplete tasks.
Once
run_conversation()returns, the todo list, skill instructions, and pending steps are all lost. There is no mechanism to detect that tasks remain unfinished.Why Claude Code handles this correctly
Claude Code faces the same empty-response problem but avoids task loss through cross-session task persistence:
statusfields, never delete or rewrite tasks.pwd→git log→read progress.txtbefore doing anything else.Suggested Solutions
Option A (Source-level fix): Detect pending tasks before exiting
Modify the fallback logic to check for unfinished tasks before breaking out of the loop. If pending work exists, inject a continuation prompt instead of exiting:
This is the most robust solution. The empty response retry counter would prevent infinite loops.
Option B: Persistent task state file
Add an optional mechanism to write task state to disk (similar to Claude Code's progress.txt). If the loop exits with pending tasks, the next user turn can detect and resume automatically.
Option C: Configurable fallback behavior
Add a config option like
fallback_on_empty: "continue" | "exit"so users can choose whether empty responses should retry with a prompt or exit gracefully.Environment
Related
This aligns with the Harness Engineering insight from Anthropic's "Effective harnesses for long-running agents" — every harness component encodes an assumption about what the model cannot do. The current fallback assumes "empty response = task complete," which is frequently incorrect for non-Claude models.