fix: add circuit breaker for infinite tool-call loops by gzsiang · Pull Request #12632 · NousResearch/hermes-agent

gzsiang · 2026-04-19T17:29:38Z

Summary

Add a circuit breaker mechanism to detect and break infinite tool-call loops in the agent framework.

Problem

When the LLM gets stuck in a retry loop calling the same tool with identical arguments repeatedly, it wastes tokens and never makes progress. This can happen with any tool (terminal, execute_code, etc.).

Solution

Added circuit breaker logic in both execution paths:

_invoke_tool() — concurrent execution path
_execute_tool_calls_sequential() — sequential execution path

How it works:

Each tool call gets a signature: tool_name:md5(json_args)
Signatures are recorded in order
When the last N calls all have the same signature, the breaker triggers
Returns an error message to the LLM, forcing it to try a different strategy
After triggering, the counter resets so future calls can succeed

Default threshold: 5 consecutive identical calls (configurable via _circuit_breaker_threshold)
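The signature-and-window check described above can be sketched as a small standalone class. This is a hypothetical sketch, not the code in run_agent.py. The class name `ToolCallCircuitBreaker`, its `check()` method, the error text, and the use of `json.dumps(..., sort_keys=True)` to canonicalize arguments before hashing are all assumptions. Only the `tool_name:md5(json_args)` signature format, the "last N identical" rule, and the reset-after-trigger behavior come from the description.

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict, Optional


class ToolCallCircuitBreaker:
    """Hypothetical sketch: trip when the last N tool calls share one signature."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        # Only the most recent `threshold` signatures are kept, in call order.
        self._recent: Deque[str] = deque(maxlen=threshold)

    @staticmethod
    def signature(tool_name: str, args: Dict[str, Any]) -> str:
        # sort_keys=True is an assumption: it makes key order irrelevant to the hash.
        payload = json.dumps(args, sort_keys=True, ensure_ascii=False)
        digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
        return f"{tool_name}:{digest}"

    def check(self, tool_name: str, args: Dict[str, Any]) -> Optional[str]:
        """Record one call; return an error message if the breaker trips, else None."""
        self._recent.append(self.signature(tool_name, args))
        if len(self._recent) == self.threshold and len(set(self._recent)) == 1:
            self._recent.clear()  # reset so subsequent calls can succeed
            return (
                f"Circuit breaker: {tool_name} was called {self.threshold} times in a row "
                "with identical arguments. Try a different approach."
            )
        return None
```

In this sketch a caller would run `check()` before executing each tool call and, when it returns a message, hand that message back to the model as the tool result instead of running the tool. A bounded `deque` keeps memory constant no matter how long the session runs.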

Testing

22 existing tests pass
Manual testing confirms breaker triggers on repeated identical calls
Different tool/args combinations correctly reset the counter
After breaker triggers, new calls succeed normally

Changes

run_agent.py: Added _consecutive_tool_calls and _circuit_breaker_threshold state
run_agent.py: Added circuit breaker logic to _invoke_tool() and _execute_tool_calls_sequential()
Error messages in English for upstream compatibility

When the database contains non-UTF-8 binary data (e.g. from image attachments or binary tool outputs), FastAPI's jsonable_encoder fails with UnicodeDecodeError when serializing session data for the web UI. This adds a _sanitize_for_json() helper that recursively converts bytes to UTF-8 strings (or base64 as fallback) before JSON serialization, applied to all session-related API endpoints.

…(optional via config)

- Translate severity levels (HIGH→高危, CRITICAL→致命, etc.)
- Translate common rule titles (Pipe to interpreter→管道传输到解释器, etc.)
- Translate the 'Security scan' prefix to '安全扫描'
- Add rule_id-based lookup for comprehensive rule coverage
- Fall back to title string matching for rules not in the rule map
- Respect the approvals.language config (zh/en)
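The lookup order in the list above (rule_id first, then title string, then leave untranslated) can be sketched like this. The table names, the `localize_finding` function, and the `pipe_to_interpreter` rule_id are hypothetical. Only the translated strings themselves come from the commit message.

```python
from typing import Dict, Tuple

# Hypothetical tables; only the strings below come from the commit message.
_SEVERITY_ZH: Dict[str, str] = {"HIGH": "高危", "CRITICAL": "致命"}
_RULE_ZH_BY_ID: Dict[str, str] = {"pipe_to_interpreter": "管道传输到解释器"}  # illustrative rule_id
_TITLE_ZH: Dict[str, str] = {"Pipe to interpreter": "管道传输到解释器"}


def localize_finding(rule_id: str, title: str, severity: str,
                     language: str = "en") -> Tuple[str, str, str]:
    """Return (prefix, title, severity), translated only when language == 'zh'."""
    if language != "zh":
        return "Security scan", title, severity
    # 1) rule_id lookup, 2) exact title match, 3) keep the English title
    zh_title = _RULE_ZH_BY_ID.get(rule_id) or _TITLE_ZH.get(title, title)
    return "安全扫描", zh_title, _SEVERITY_ZH.get(severity, severity)
```

Keying on rule_id first means a rule keeps its translation even if its English title is later reworded. The title map is only a safety net.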

Add a circuit breaker mechanism in _invoke_tool() and the sequential execution path to detect and break infinite tool-call loops. When the same tool is called with identical arguments 3 consecutive times, the circuit breaker triggers and returns an error, forcing the LLM to try a different strategy. The breaker uses MD5 hashing of (tool_name, json_args) to create a stable signature, then checks whether the last N calls all have the same signature. After triggering, the counter resets so future calls can succeed.

- Added _consecutive_tool_calls and _circuit_breaker_threshold to AIAgent.__init__
- Added circuit breaker logic to _invoke_tool() (concurrent path)
- Added circuit breaker logic to _execute_tool_calls_sequential() (sequential path)
- All 22 existing tests pass

…shold
- Change error message from Chinese to English for upstream compatibility
- Increase default threshold from 3 to 5 to avoid false positives (legitimate retry scenarios exist, e.g. network jitter)
- Fix indentation bug in sequential execution path
- All 22 existing tests pass

…elp text
- Add _DESCRIPTION_ZH and _CATEGORY_ZH dicts in commands.py
- Add _desc_zh() and _cat_zh() helpers for runtime localization
- Localize COMMANDS, COMMANDS_BY_CATEGORY, gateway_help_lines(), telegram_bot_commands()
- Add HELP_ZH and HOTKEYS_ZH constants in core.ts for the web UI
- Add BRAND_ZH and getBrand() in theme.ts for branding strings
- Detect language from config.yaml (approvals.language or language key)
- Backward compatible: defaults to English when language != 'zh'

- banner.py: Chinese translations for welcome banner text (tools, skills, MCP servers, update messages)
- skin_engine.py: new 'chinese' skin with gold theme, auto-selected based on approvals.language config
- tips.py: full Chinese tips corpus (TIPS_ZH) with language detection

… split

Prevents the LLM from retrying different variations of a failing tool call. When a tool returns errors or empty results N times consecutively (regardless of argument changes), the counter triggers and forces a strategy change.

- Added _tool_failure_count dict and _tool_failure_threshold (5) in __init__
- Added _check_tool_failure() method to detect failures and update the counter
- Added check calls in both concurrent and sequential execution paths
- Each tool has an independent counter; success resets the counter to 0
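Per-tool failure counting as described in the list above can be sketched as follows. `ToolFailureTracker`, `looks_failed()`, and the warning text are hypothetical. The failure heuristic (empty output, a leading "error", or an `"error"` JSON key) is also assumed, and so is resetting the counter after the warning fires. The attribute name `_tool_failure_count`, the independent per-tool counters, and reset-on-success come from the commit message.

```python
from collections import defaultdict
from typing import DefaultDict, Optional


class ToolFailureTracker:
    """Hypothetical sketch of per-tool consecutive-failure counting."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._tool_failure_count: DefaultDict[str, int] = defaultdict(int)

    @staticmethod
    def looks_failed(result: str) -> bool:
        # Assumed heuristic: empty output or an error marker counts as a failure.
        text = (result or "").strip().lower()
        return not text or text.startswith("error") or '"error"' in text

    def check(self, tool_name: str, result: str) -> Optional[str]:
        """Call after each execution; return a warning once the threshold is hit."""
        if not self.looks_failed(result):
            self._tool_failure_count[tool_name] = 0  # success resets this tool only
            return None
        self._tool_failure_count[tool_name] += 1
        if self._tool_failure_count[tool_name] >= self.threshold:
            self._tool_failure_count[tool_name] = 0  # assumed: reset after warning
            return (f"{tool_name} failed {self.threshold} times in a row "
                    "(arguments varied). Change strategy instead of retrying.")
        return None
```

Unlike the identical-call breaker, this counter ignores the arguments entirely. That is why it catches the "retry with small variations" pattern the breaker misses.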

- Remove erroneous _check_tool_failure call in _invoke_tool that referenced undefined variable function_result
- Add temperature field to auxiliary.vision config (default: 0.1)
- Fixes 8 failing tests in concurrent tool execution and camofox vision

- Move _check_tool_failure calls to AFTER each tool execution
- Adds failure retry counter check for all tool types (todo, session_search, memory, memory manager tools, clarify, delegate_task, and generic tools)
- Prevents LLM from retrying different parameter variations of failing tools
- Complements the circuit breaker, which detects identical tool+args loops

- Add read_timestamps to _read_tracker initialization (file_tools.py)
- Fix camofox vision test mock path (test_browser_camofox.py)
- Catch ModuleNotFoundError in browser_tool camofox import
- Fix _CAMEL_ALIASES apiKey mapping in config.py
- Use os.path.join for hermes_home/.env path (file_safety.py)

Fixes 8 failing tests across 6 test files.

- Fix mock paths in test_zombie_process_cleanup.py (use run_agent.* instead of tools.*)
- Fix mock path in test_agent_cache.py (mock run_agent.cleanup_vm instead of _tt.cleanup_vm)
- Use get_hermes_home() in build_write_denied_paths for profile-aware path resolution
- Add ModuleNotFoundError handling in browser_tool camofox and auxiliary_client imports

Fixes 6 failing tests:
- test_zombie_process_cleanup.py: test_close_calls_cleanup_functions, test_close_survives_partial_failures
- test_agent_cache.py: test_close_vs_release_full_teardown_difference
- test_write_deny.py: test_hermes_env
- test_approval_heartbeat.py: test_wait_returns_immediately_on_user_response, test_heartbeat_import_failure_does_not_break_wait

- Fix build_write_denied_paths to use home/.hermes/.env instead of get_hermes_home()
- Fix mock paths in test_zombie_process_cleanup.py (use tools.process_registry.process_registry)

Fixes 6 failing tests across 4 test files.

pytest's caplog fixture doesn't automatically capture warnings from the hermes_cli.config logger because that logger's level is not set. This conftest.py sets the logger's level to DEBUG for all tests in tests/hermes_cli/, ensuring caplog captures messages at WARNING and above. Fixes 2 failing tests in test_provider_config_validation.py:
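A conftest.py of the kind described can be as small as a single `setLevel` call. This sketch assumes the file lives at tests/hermes_cli/conftest.py and sets the level at import time; the actual file's contents are not shown in the PR.

```python
# tests/hermes_cli/conftest.py (sketch; actual contents not shown in the PR)
import logging

# caplog attaches its handler to the root logger; records from hermes_cli.config
# only reach it if this logger's effective level lets them be created at all.
logging.getLogger("hermes_cli.config").setLevel(logging.DEBUG)
```

Because pytest imports conftest.py during collection, a module-level call like this stays in effect for the rest of the test session. An autouse fixture that restores the previous level on teardown would scope the change more tightly.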

- test_minimax_provider: add _fallback_chain init in test stub
- test_tips: shorten EN Tip 105 to 131 chars (was 160)
- test_concurrent_interrupt: fix polling_tool sig + re-apply after _make_agent

…_breaker_threshold from 5 to 3
- Reduces _tool_failure_threshold from 5 to 3
- Improves circuit breaker error message with specific suggestions
- Helps stop tool-loop failures earlier

When a tool call fails repeatedly, the compression model is asked for a suggestion. This gives the main model a fresh perspective to break out of tool loops.

- Added _get_tool_suggestion() for retry threshold failures
- Added _get_consecutive_suggestion() for identical-call circuit breaker
- Suggestions are appended to the error message with a 💡 emoji
- Gracefully falls back if compression model is unavailable

When a tool call fails repeatedly, suggestions are provided even without a compression model configured.

- Added _heuristic_suggestion(): pattern matching on error messages
- Added _heuristic_consecutive_suggestion(): 20+ tool-specific suggestions
- Compression model is tried first, then falls back to heuristics
- Works zero-config: no LLM needed for basic suggestions

Replaced complex pattern-matching heuristics with simple generic prompts.

- Removed _heuristic_suggestion() (8 pattern rules)
- Removed _heuristic_consecutive_suggestion() (20+ tool mappings)
- Fallback is now just a direct text suggestion
- Compression model is still tried first for smarter suggestions
- Net: -90 lines of dead pattern logic
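The final design (ask the compression model first, otherwise return a fixed generic hint) can be sketched as below. `get_tool_suggestion`, the `ask_model` callable standing in for the compression-model client, the prompt wording, and `GENERIC_SUGGESTION` are all hypothetical. The ordering, the 💡 prefix, and the graceful fallback come from the commit messages.

```python
from typing import Callable, Optional

# Hypothetical fallback text; the real wording is not shown in the PR.
GENERIC_SUGGESTION = (
    "Step back: re-read the error, check your assumptions, "
    "and try a different tool or approach instead of retrying."
)


def get_tool_suggestion(
    tool_name: str,
    error_text: str,
    ask_model: Optional[Callable[[str], str]] = None,
) -> str:
    """Ask an auxiliary model for advice, falling back to a fixed generic hint."""
    if ask_model is not None:
        prompt = (f"The tool '{tool_name}' keeps failing with: {error_text}\n"
                  "Suggest one concrete alternative strategy in one sentence.")
        try:
            advice = ask_model(prompt).strip()
            if advice:
                return f"💡 {advice}"
        except Exception:
            pass  # model unavailable or errored: use the generic hint below
    return f"💡 {GENERIC_SUGGESTION}"
```

Passing the model as a plain callable keeps the fallback path testable with no LLM configured, which is the zero-config property the earlier heuristic version was aiming for.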

Added README explaining:
- This is a fork of NousResearch/hermes-agent
- What Hermes Agent is
- Custom modifications in this fork (circuit breaker improvements, test fixes)

…role

The integration tests call write_file_tool, which goes through _get_file_ops(), which creates a terminal environment. Without an explicit TERMINAL_ENV=local, the CI environment may default to modal, which fails due to missing credentials. Fixes: test_net_new_file_no_warning, test_sibling_agent_write_surfaces_warning_through_handler

The test_terminal_and_file_toolsets_resolve_all_tools and test_terminal_tool_present tests call get_tool_definitions(), which filters tools based on check_terminal_requirements(). Without an explicit TERMINAL_ENV=local, the CI environment's TERMINAL_ENV=modal (without credentials) causes check_terminal_requirements() to return False, dropping the terminal tool from the schema. Fixes: test_terminal_and_file_toolsets_resolve_all_tools, test_terminal_tool_present
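One way to pin the backend for a single test without touching the CI environment is to patch `os.environ` for the duration of that test. The sketch below uses the standard library's `unittest.mock.patch.dict`; pytest's `monkeypatch.setenv("TERMINAL_ENV", "local")` achieves the same. The test function is a hypothetical example, and how the actual commits set the variable is not shown.

```python
import os
from unittest import mock


# Hypothetical example test; the real fixes live in the test files named above.
@mock.patch.dict(os.environ, {"TERMINAL_ENV": "local"})
def test_terminal_backend_is_local():
    # Inside the test, code such as check_terminal_requirements() would see "local".
    assert os.environ["TERMINAL_ENV"] == "local"
```

`patch.dict` restores the original environment when the test returns, so a CI-wide `TERMINAL_ENV=modal` setting is left intact for tests that genuinely need it.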

…erflow

gzsiang · 2026-04-22T16:28:29Z

Apologies — this PR inadvertently includes i18n/Chinese localization commits that were not intended to be part of the circuit breaker submission. Those changes were mixed into my fork's main branch by mistake.

I will open a clean PR with only the circuit breaker changes shortly.

The i18n work is tracked separately in #13625. I'll submit it as its own PR if the maintainers are interested.

Closing this one to keep things clean.

gzsiang · 2026-04-22T16:29:05Z

Update: The clean PR with only circuit breaker changes is now available at #14059.

gzsiang force-pushed the main branch 3 times, most recently from a0e7c45 to 06d29c7, on April 20, 2026 at 09:56

gzsiang mentioned this pull request Apr 21, 2026

fix(web): handle binary data in session messages to prevent 500 errors #12569

Closed

gzsiang force-pushed the main branch from 18218b4 to f960bbf on April 21, 2026 at 14:31

alt-glitch added the labels type/feature (New feature or request), comp/agent (Core agent loop, run_agent.py, prompt builder), comp/cli (CLI entry point, hermes_cli/, setup wizard), and comp/tui (Terminal UI: ui-tui/ + tui_gateway/) on Apr 21, 2026

gzsiang added 19 commits April 22, 2026 11:14

feat(i18n): add Chinese translations for dangerous command approvals …

7395bc4

…(optional via config)

feat(i18n): add Chinese translations for spinner thinking verbs

4b77710

fix(tips): add TIPS alias for backward compat, update tests for EN/ZH…

7b7bfae

… split

feat: i18n - localize user-facing strings in CLI to Chinese

96afb12

feat(i18n): add format_zh() for configurable Chinese localization

a57dfc1

fix(types): add explicit type annotation for HOTKEYS_ZH

8ed3cd7

fix(test): resolve remaining test failures

94683bc

- Fix build_write_denied_paths to use home/.hermes/.env instead of get_hermes_home()
- Fix mock paths in test_zombie_process_cleanup.py (use tools.process_registry.process_registry)

Fixes 6 failing tests across 4 test files.

gzsiang force-pushed the main branch from 7dae787 to 8ba679a on April 22, 2026 at 03:17

fix: fix 3 failing CI tests

9da12e1

- test_minimax_provider: add _fallback_chain init in test stub
- test_tips: shorten EN Tip 105 to 131 chars (was 160)
- test_concurrent_interrupt: fix polling_tool sig + re-apply after _make_agent

gzsiang added 13 commits April 22, 2026 15:06

chore: lower circuit breaker threshold from 5 to 3 - Reduces _circuit…

220c64b

…_breaker_threshold from 5 to 3
- Reduces _tool_failure_threshold from 5 to 3
- Improves circuit breaker error message with specific suggestions
- Helps stop tool-loop failures earlier

docs: add README to fork repository

480f7f5

Added README explaining:
- This is a fork of NousResearch/hermes-agent
- What Hermes Agent is
- Custom modifications in this fork (circuit breaker improvements, test fixes)

docs: add README with language toggle (EN default, CN collapsible)

d84c410

docs: add language switcher (English | 中文)

ba204ab

docs: add bilingual README with language switcher

13275c3

docs: refine README - add language config, clarify compression model …

070fe08

…role

chore: lower compression threshold 0.80 -> 0.70 to prevent context ov…

bd219fb

…erflow

fix: update compressor tests for 0.70 default threshold

16a1c0d

gzsiang closed this Apr 22, 2026

gzsiang mentioned this pull request Apr 22, 2026

feat: circuit breaker for infinite tool-call loops #14059

Closed

alt-glitch mentioned this pull request Apr 28, 2026

feat: circuit breaker with compression model judgment #16749

Closed

gzsiang wants to merge 33 commits into NousResearch:main from gzsiang:main


2 participants
