feat: agent resilience — truncated tool calls, empty response recovery, error sanitization by teknium1 · Pull Request #3838 · NousResearch/hermes-agent

teknium1 · 2026-03-30T00:59:35Z

Summary

Three agent resilience features ported from Ironclaw, addressing real failure modes in autonomous (cron/delegate) contexts.

1. Discard incomplete tool calls on finish_reason=length (ironclaw#1632)

When the LLM runs out of context mid-tool-call, the tool call JSON is likely incomplete/malformed. Instead of trying to parse and execute it:

Discard the tool calls, preserve any partial text
Inject a message: "Your response was truncated. Please summarize and continue shorter."
Track consecutive occurrences — after 3, temporarily disable tools for the next call

2. Empty response recovery (ironclaw#1677 + #1720)

When the LLM returns nothing (no content, no tool calls):

If meaningful text output exists earlier in conversation, treat as graceful completion
Otherwise nudge once ("Your previous response was empty. Please continue.")
Max 2 consecutive empties before giving up
Prevents silent hangs in autonomous contexts

3. Sanitize tool error results (ironclaw#1639)

Tool exceptions can contain arbitrary text that gets injected into LLM context. Now sanitized:

Strip XML-like boundary tags (tool_call, function_call, result, response, etc.)
Strip markdown code fences and CDATA sections
Cap at 2000 characters
Wrap with [TOOL_ERROR] prefix
Prevents injection attacks via crafted error messages

Changes

File	Change
`run_agent.py`	Truncated tool call handling + empty response recovery
`model_tools.py`	`_sanitize_tool_error()` helper + error sanitization in `handle_function_call()`
`tests/test_agent_resilience.py`	18 tests across 4 test classes

Test plan

python -m pytest tests/test_agent_resilience.py -n0 -q  # 18 tests
python -m pytest tests/ -n0 -q                            # full suite

…, tool error sanitization Three resilience features ported from Ironclaw: 1. Discard incomplete tool calls (ironclaw#1632) When finish_reason='length' and tool calls are present, they're likely incomplete. Discard them, inject a summarize notice. After 3 consecutive occurrences, temporarily disable tools. 2. Empty response recovery (ironclaw#1677 + #1720) When the LLM returns empty (no content, no tool calls): - If meaningful output exists earlier, treat as completion - Otherwise nudge once, then fail gracefully Max 2 consecutive empties before giving up. 3. Sanitize tool error results (ironclaw#1639) Strip XML boundary markers, CDATA sections, and code fences from error messages before sending to LLM. Cap at 2000 chars. Prevents injection attacks via crafted tool error messages. 18 new tests.

…text (#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of #3838's three-feature scout). The truncated-tool-call (#1632) and empty-response-recovery (#1677, #1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation).

teknium1 · 2026-05-16T07:57:57Z

Closing in favor of #26823, which salvages this PR's piece #3 (ironclaw#1639 tool error sanitization).

The other two pieces are already implemented more thoroughly on current main:

fix(gateway): overwrite stale PID in gateway_state.json on restart #1632 truncated tool calls → run_agent.py L8147 / L12209 / L13012 (retry counter, length finish_reason rewrite, refusal after 2nd consecutive truncation)
revert: revert SMS (Telnyx) platform adapter for review #1677 / fix(compressor): summary role can violate consecutive-role constraint #1720 empty-response recovery → run_agent.py L4500 / L15090+ (scaffolding stripper, multi-stage nudge, fallback model activation; dedicated empty-response-recovery.md skill reference)

For piece #3 the salvage also catches a gap the original missed: tools/registry.py::dispatch has its own try/except that handles most tool exceptions before they ever reach handle_function_call's outer except — so the original sanitization would have been bypassed in the primary error path. PR #26823 patches both.

…text (NousResearch#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout). The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677, thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation). (cherry picked from commit 627f8a5)

…text (NousResearch#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout). The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677, NousResearch#1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation).

…text (NousResearch#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout). The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677, NousResearch#1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation). (cherry picked from commit 627f8a5)

…text (NousResearch#26823) Adds _sanitize_tool_error() in model_tools and routes both error paths through it: registry.dispatch's try/except (the primary path for tool exceptions) and handle_function_call's outer except (defense in depth). Stripping targets structural framing tokens that the model itself can react to even though json.dumps already handles wire-layer escaping: XML role tags (tool_call, function_call, result, response, output, input, system, assistant, user), CDATA sections, and markdown code fences. Caps message body at 2000 chars and wraps with [TOOL_ERROR] prefix. Defense-in-depth: a tool exception carrying '<tool_call>...' won't break message framing (json escapes it), but the model still reads those tokens and they nudge it toward role-confusion framing. Ported from ironclaw#1639 (one piece of NousResearch#3838's three-feature scout). The truncated-tool-call (NousResearch#1632) and empty-response-recovery (NousResearch#1677, NousResearch#1720) pieces are skipped because main now implements both far more thoroughly (run_agent.py L8147/L12209/L13012 for truncation retry + length rewrite; L4500/L15090+ for empty-response scaffolding stripper, multi-stage nudge, fallback model activation).

alt-glitch added type/feature New feature or request P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 2, 2026

teknium1 mentioned this pull request May 16, 2026

security: sanitize tool error strings before injecting into model context (salvage of #3838 piece 3/3) #26823

Merged

teknium1 closed this May 16, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: agent resilience — truncated tool calls, empty response recovery, error sanitization#3838

feat: agent resilience — truncated tool calls, empty response recovery, error sanitization#3838
teknium1 wants to merge 1 commit into
mainfrom
feat/agent-resilience

teknium1 commented Mar 30, 2026

Uh oh!

teknium1 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

teknium1 commented Mar 30, 2026

Summary

1. Discard incomplete tool calls on finish_reason=length (ironclaw#1632)

2. Empty response recovery (ironclaw#1677 + #1720)

3. Sanitize tool error results (ironclaw#1639)

Changes

Test plan

Uh oh!

teknium1 commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants