Skip to content

feat: agent resilience — truncated tool calls, empty response recovery, error sanitization#3838

Closed
teknium1 wants to merge 1 commit into
mainfrom
feat/agent-resilience
Closed

feat: agent resilience — truncated tool calls, empty response recovery, error sanitization#3838
teknium1 wants to merge 1 commit into
mainfrom
feat/agent-resilience

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Three agent resilience features ported from Ironclaw, addressing real failure modes in autonomous (cron/delegate) contexts.

1. Discard incomplete tool calls on finish_reason=length (ironclaw#1632)

When the LLM runs out of context mid-tool-call, the tool call JSON is likely incomplete/malformed. Instead of trying to parse and execute it:

  • Discard the tool calls, preserve any partial text
  • Inject a message: "Your response was truncated. Please summarize and continue shorter."
  • Track consecutive occurrences — after 3, temporarily disable tools for the next call

2. Empty response recovery (ironclaw#1677 + #1720)

When the LLM returns nothing (no content, no tool calls):

  • If meaningful text output exists earlier in conversation, treat as graceful completion
  • Otherwise nudge once ("Your previous response was empty. Please continue.")
  • Max 2 consecutive empties before giving up
  • Prevents silent hangs in autonomous contexts

3. Sanitize tool error results (ironclaw#1639)

Tool exceptions can contain arbitrary text that gets injected into LLM context. Now sanitized:

  • Strip XML-like boundary tags (tool_call, function_call, result, response, etc.)
  • Strip markdown code fences and CDATA sections
  • Cap at 2000 characters
  • Wrap with [TOOL_ERROR] prefix
  • Prevents injection attacks via crafted error messages

Changes

File Change
run_agent.py Truncated tool call handling + empty response recovery
model_tools.py _sanitize_tool_error() helper + error sanitization in handle_function_call()
tests/test_agent_resilience.py 18 tests across 4 test classes

Test plan

python -m pytest tests/test_agent_resilience.py -n0 -q  # 18 tests
python -m pytest tests/ -n0 -q                            # full suite

…, tool error sanitization

Three resilience features ported from Ironclaw:

1. Discard incomplete tool calls (ironclaw#1632)
   When finish_reason='length' and tool calls are present, they're likely
   incomplete. Discard them, inject a summarize notice. After 3 consecutive
   occurrences, temporarily disable tools.

2. Empty response recovery (ironclaw#1677 + #1720)
   When the LLM returns empty (no content, no tool calls):
   - If meaningful output exists earlier, treat as completion
   - Otherwise nudge once, then fail gracefully
   Max 2 consecutive empties before giving up.

3. Sanitize tool error results (ironclaw#1639)
   Strip XML boundary markers, CDATA sections, and code fences from error
   messages before sending to LLM. Cap at 2000 chars. Prevents
   injection attacks via crafted tool error messages.

18 new tests.
@alt-glitch alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 2, 2026
teknium1 added a commit that referenced this pull request May 16, 2026
…text (#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of #3838's three-feature scout).
The truncated-tool-call (#1632) and empty-response-recovery (#1677,
#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
@teknium1

Copy link
Copy Markdown
Contributor Author

Closing in favor of #26823, which salvages this PR's piece #3 (ironclaw#1639 tool error sanitization).

The other two pieces are already implemented more thoroughly on current main:

For piece #3 the salvage also catches a gap the original missed: tools/registry.py::dispatch has its own try/except that handles most tool exceptions before they ever reach handle_function_call's outer except — so the original sanitization would have been bypassed in the primary error path. PR #26823 patches both.

@teknium1 teknium1 closed this May 16, 2026
rousegordon-ops pushed a commit to rousegordon-ops/hermes-agent that referenced this pull request May 16, 2026
…text (NousResearch#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout).
The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677,
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).

(cherry picked from commit 627f8a5)
DIZ-admin pushed a commit to DIZ-admin/hermes-agent that referenced this pull request May 16, 2026
…text (NousResearch#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout).
The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677,
NousResearch#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
venyon2k pushed a commit to venyon2k/hermes-agent that referenced this pull request May 17, 2026
…text (NousResearch#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout).
The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677,
NousResearch#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
clckmedia pushed a commit to clckmedia/hermes-agent that referenced this pull request May 19, 2026
…text (NousResearch#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout).
The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677,
NousResearch#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).

(cherry picked from commit 627f8a5)
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…text (NousResearch#26823)

Adds _sanitize_tool_error() in model_tools and routes both error paths
through it: registry.dispatch's try/except (the primary path for tool
exceptions) and handle_function_call's outer except (defense in depth).

Stripping targets structural framing tokens that the model itself can
react to even though json.dumps already handles wire-layer escaping:
XML role tags (tool_call, function_call, result, response, output,
input, system, assistant, user), CDATA sections, and markdown code
fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR]
prefix.

Defense-in-depth: a tool exception carrying '<tool_call>...' won't
break message framing (json escapes it), but the model still reads
those tokens and they nudge it toward role-confusion framing.

Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout).
The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677,
NousResearch#1720) pieces are skipped because main now implements both far more
thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry +
length rewrite; L4500/L15090+ for empty-response scaffolding stripper,
multi-stage nudge, fallback model activation).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants