fix: add circuit breaker for infinite tool-call loops #12632

Closed
gzsiang wants to merge 33 commits into NousResearch:main from gzsiang:main

gzsiang commented Apr 19, 2026

Summary

Add a circuit breaker mechanism to detect and break infinite tool-call loops in the agent framework.

Problem

When the LLM gets stuck in a retry loop calling the same tool with identical arguments repeatedly, it wastes tokens and never makes progress. This can happen with any tool (terminal, execute_code, etc.).

Solution

Added circuit breaker logic in both execution paths:

  • _invoke_tool() — concurrent execution path
  • _execute_tool_calls_sequential() — sequential execution path

How it works:

  1. Each tool call gets a signature: tool_name:md5(json_args)
  2. Signatures are recorded in order
  3. When the last N calls all have the same signature, the breaker triggers
  4. Returns an error message to the LLM, forcing it to try a different strategy
  5. After triggering, the counter resets so future calls can succeed

Default threshold: 5 consecutive identical calls (configurable via _circuit_breaker_threshold)
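
The steps above can be sketched as a small standalone class. This is a minimal illustration, not the PR's code: the class name `ToolCallCircuitBreaker`, the `record()` method, and the use of `sort_keys=True` when serializing arguments are assumptions; in the PR the state lives on the agent as `_consecutive_tool_calls` and `_circuit_breaker_threshold`.

```python
import hashlib
import json


class ToolCallCircuitBreaker:
    """Hypothetical helper; the PR keeps this state on the agent itself."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self._recent: list[str] = []  # signatures, in call order

    @staticmethod
    def signature(tool_name: str, args: dict) -> str:
        # sort_keys keeps the hash independent of argument order (an assumption).
        payload = json.dumps(args, sort_keys=True, default=str)
        digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
        return f"{tool_name}:{digest}"

    def record(self, tool_name: str, args: dict) -> bool:
        """Record one call; return True when the breaker trips."""
        self._recent.append(self.signature(tool_name, args))
        self._recent = self._recent[-self.threshold:]  # keep only the last N
        tripped = (
            len(self._recent) == self.threshold
            and len(set(self._recent)) == 1
        )
        if tripped:
            self._recent.clear()  # reset so later calls can succeed
        return tripped
```

With `threshold=3`, three identical `record("terminal", {"command": "ls"})` calls return `False`, `False`, `True`, and a fourth identical call returns `False` again because the history was cleared.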

Testing

  • 22 existing tests pass
  • Manual testing confirms breaker triggers on repeated identical calls
  • Different tool/args combinations correctly reset the counter
  • After breaker triggers, new calls succeed normally

Changes

  • run_agent.py: Added _consecutive_tool_calls and _circuit_breaker_threshold state
  • run_agent.py: Added circuit breaker logic to _invoke_tool() and _execute_tool_calls_sequential()
  • Error messages in English for upstream compatibility

gzsiang force-pushed the main branch 3 times, most recently from a0e7c45 to 06d29c7 April 20, 2026 09:56

alt-glitch added labels Apr 21, 2026: type/feature (New feature or request); comp/agent (Core agent loop, run_agent.py, prompt builder); comp/cli (CLI entry point, hermes_cli/, setup wizard); comp/tui (Terminal UI, ui-tui/ + tui_gateway/)

gzsiang added 19 commits April 22, 2026 11:14

When the database contains non-UTF-8 binary data (e.g. from image
attachments or binary tool outputs), FastAPI's jsonable_encoder fails
with UnicodeDecodeError when serializing session data for the web UI.

This adds a _sanitize_for_json() helper that recursively converts
bytes to UTF-8 strings (or base64 as fallback) before JSON serialization,
applied to all session-related API endpoints.

- Translate severity levels (HIGH→高危, CRITICAL→致命, etc.)
- Translate common rule titles (Pipe to interpreter→管道传输到解释器, etc.)
- Translate 'Security scan' prefix to '安全扫描'
- Add rule_id-based lookup for comprehensive rule coverage
- Fallback to title string matching for rules not in the rule map
- Respects approvals.language config (zh/en)
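
The lookup order in this commit (rule_id map first, then title string matching, gated on the configured language) can be sketched as follows. The table names, the `pipe-to-interpreter` rule id, and the function name `localize_title` are hypothetical; only the translated strings themselves are quoted from the commit message.

```python
from typing import Optional

# Illustrative tables; the real maps in the fork are not shown in the PR text.
RULE_ID_ZH = {"pipe-to-interpreter": "管道传输到解释器"}  # hypothetical rule_id
TITLE_ZH = {"Pipe to interpreter": "管道传输到解释器"}


def localize_title(rule_id: Optional[str], title: str, language: str = "en") -> str:
    """Return a localized rule title, falling back to the English title."""
    if language != "zh":
        return title  # English stays the default
    if rule_id in RULE_ID_ZH:
        return RULE_ID_ZH[rule_id]  # preferred: lookup by rule_id
    for english, chinese in TITLE_ZH.items():
        if english in title:
            # Fallback: substring match on the title text.
            return title.replace(english, chinese)
    return title
```

Keying on `rule_id` first keeps the translation stable if the English title wording changes; the title match only covers rules missing from the rule map.
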

Add a circuit breaker mechanism in _invoke_tool() and the sequential
execution path to detect and break infinite tool-call loops.

When the same tool with identical arguments is called 3 consecutive
times, the circuit breaker triggers and returns an error, forcing the
LLM to try a different strategy.

The breaker uses MD5 hashing of (tool_name, json_args) to create a
stable signature, then checks if the last N calls all have the same
signature. After triggering, the counter resets so future calls can
succeed.

- Added _consecutive_tool_calls and _circuit_breaker_threshold to AIAgent.__init__
- Added circuit breaker logic to _invoke_tool() (concurrent path)
- Added circuit breaker logic to _execute_tool_calls_sequential() (sequential path)
- All 22 existing tests pass

…shold

- Change error message from Chinese to English for upstream compatibility
- Increase default threshold from 3 to 5 to avoid false positives
  (legitimate retry scenarios exist, e.g. network jitter)
- Fix indentation bug in sequential execution path
- All 22 existing tests pass

…elp text

- Add _DESCRIPTION_ZH and _CATEGORY_ZH dicts in commands.py
- Add _desc_zh() and _cat_zh() helpers for runtime localization
- Localize COMMANDS, COMMANDS_BY_CATEGORY, gateway_help_lines(), telegram_bot_commands()
- Add HELP_ZH and HOTKEYS_ZH constants in core.ts for web UI
- Add BRAND_ZH and getBrand() in theme.ts for branding strings
- Language detection from config.yaml (approvals.language or language key)
- Backward compatible: English defaults when language != 'zh'
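
A rough sketch of the language detection and description lookup described in this commit is shown below. The precedence of `approvals.language` over the top-level `language` key, the function names, and the single table entry are assumptions made for illustration.

```python
def detect_language(config: dict) -> str:
    """Return 'zh' or 'en' from a parsed config.yaml mapping (sketch)."""
    approvals = config.get("approvals") or {}
    # Assumed precedence: approvals.language, then the top-level language key.
    language = approvals.get("language") or config.get("language") or "en"
    return "zh" if str(language).lower() == "zh" else "en"


_DESCRIPTION_ZH = {"help": "显示帮助"}  # illustrative entry only


def describe(command: str, english: str, language: str) -> str:
    """Pick the Chinese description when available, else the English one."""
    if language == "zh":
        return _DESCRIPTION_ZH.get(command, english)
    return english
```

Any value other than `zh` resolves to English, which matches the backward-compatibility note in the commit.
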

- banner.py: Chinese translations for welcome banner text (tools, skills,
  MCP servers, update messages)
- skin_engine.py: new 'chinese' skin with gold theme, auto-select based
  on approvals.language config
- tips.py: full Chinese tips corpus (TIPS_ZH) with language detection

Prevents the LLM from retrying different variations of a failing tool call.
When a tool returns errors/empty results N times consecutively (regardless of
argument changes), the counter triggers and forces a strategy change.

- Added _tool_failure_count dict and _tool_failure_threshold (5) in __init__
- Added _check_tool_failure() method to detect failures and update counter
- Added check calls in both concurrent and sequential execution paths
- Each tool has independent counter; success resets counter to 0
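
The per-tool failure counter could be sketched like this. The class name, the failure heuristic (empty output or an "error" marker), and the reset after the threshold is reached are assumptions; the commit only states that each tool has its own counter and that success resets it to 0.

```python
class ToolFailureTracker:
    """Hypothetical stand-in for the _tool_failure_count state on the agent."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.counts: dict[str, int] = {}  # one counter per tool name

    @staticmethod
    def looks_like_failure(result: str) -> bool:
        # Assumed heuristic: empty output or an "error" marker is a failure.
        text = (result or "").strip()
        return not text or "error" in text.lower()

    def check(self, tool_name: str, result: str) -> bool:
        """Update the counter; return True when the threshold is reached."""
        if not self.looks_like_failure(result):
            self.counts[tool_name] = 0  # success resets this tool's counter
            return False
        self.counts[tool_name] = self.counts.get(tool_name, 0) + 1
        if self.counts[tool_name] >= self.threshold:
            self.counts[tool_name] = 0  # assumed: start fresh after tripping
            return True
        return False
```

Unlike the identical-call breaker, this counter ignores the arguments entirely, so it also trips when the model keeps varying parameters on a tool that fails every time.
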

- Remove erroneous _check_tool_failure call in _invoke_tool that referenced
  undefined variable function_result
- Add temperature field to auxiliary.vision config (default: 0.1)
- Fixes 8 failing tests in concurrent tool execution and camofox vision

- Move _check_tool_failure calls to AFTER each tool execution
- Adds failure retry counter check for all tool types (todo, session_search,
  memory, memory manager tools, clarify, delegate_task, and generic tools)
- Prevents LLM from retrying different parameter variations of failing tools
- Complements the circuit breaker which detects identical tool+args loops

- Add read_timestamps to _read_tracker initialization (file_tools.py)
- Fix camofox vision test mock path (test_browser_camofox.py)
- Catch ModuleNotFoundError in browser_tool camofox import
- Fix _CAMEL_ALIASES apiKey mapping in config.py
- Use os.path.join for hermes_home/.env path (file_safety.py)

Fixes 8 failing tests across 6 test files.

- Fix mock paths in test_zombie_process_cleanup.py (use run_agent.* instead of tools.*)
- Fix mock path in test_agent_cache.py (mock run_agent.cleanup_vm instead of _tt.cleanup_vm)
- Use get_hermes_home() in build_write_denied_paths for profile-aware path resolution
- Add ModuleNotFoundError handling in browser_tool camofox and auxiliary_client imports

Fixes 6 failing tests:
- test_zombie_process_cleanup.py: test_close_calls_cleanup_functions, test_close_survives_partial_failures
- test_agent_cache.py: test_close_vs_release_full_teardown_difference
- test_write_deny.py: test_hermes_env
- test_approval_heartbeat.py: test_wait_returns_immediately_on_user_response, test_heartbeat_import_failure_does_not_break_wait

- Fix build_write_denied_paths to use home/.hermes/.env instead of get_hermes_home()
- Fix mock paths in test_zombie_process_cleanup.py (use tools.process_registry.process_registry)

Fixes 6 failing tests across 4 test files.

pytest's caplog fixture doesn't automatically capture warnings from
hermes_cli.config logger because the logger's level is not set.

This conftest.py sets the logger level to DEBUG for all tests in
tests/hermes_cli/, ensuring caplog captures WARNING and above messages.
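
The level change this commit describes can be illustrated with the standard library alone. The helper name `logger_at_debug` and the restore-on-exit behavior are assumptions; in a conftest.py the same set-and-restore logic would typically sit inside an autouse pytest fixture.

```python
import logging
from contextlib import contextmanager


@contextmanager
def logger_at_debug(name: str = "hermes_cli.config"):
    """Temporarily set a logger to DEBUG, restoring its level afterwards."""
    logger = logging.getLogger(name)
    previous = logger.level
    logger.setLevel(logging.DEBUG)
    try:
        yield logger
    finally:
        logger.setLevel(previous)  # avoid leaking the level into other tests
```
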

Fixes 2 failing tests in test_provider_config_validation.py:
- test_minimax_provider: add _fallback_chain init in test stub
- test_tips: shorten EN Tip 105 to 131 chars (was 160)
- test_concurrent_interrupt: fix polling_tool sig + re-apply after _make_agent

gzsiang added 13 commits April 22, 2026 15:06

…_breaker_threshold from 5 to 3
- Reduces _tool_failure_threshold from 5 to 3
- Improves circuit breaker error message with specific suggestions
- Helps stop tool-loop failures earlier

When a tool call fails repeatedly, the compression model is asked
for a suggestion. This gives the main model a fresh perspective
to break out of tool loops.

- Added _get_tool_suggestion() for retry threshold failures
- Added _get_consecutive_suggestion() for identical-call circuit breaker
- Suggestions are appended to the error message with a 💡 emoji
- Gracefully falls back if compression model is unavailable

When a tool call fails repeatedly, suggestions are provided even
without a compression model configured.

- Added _heuristic_suggestion() — pattern matching on error messages
- Added _heuristic_consecutive_suggestion() — 20+ tool-specific suggestions
- Compression model is tried first, then falls back to heuristics
- Works zero-config — no LLM needed for basic suggestions

Replaced complex pattern-matching heuristics with simple generic prompts.
- Removed _heuristic_suggestion() (8 pattern rules)
- Removed _heuristic_consecutive_suggestion() (20+ tool mappings)
- Fallback is now just a direct text suggestion
- Compression model still tries first for smarter suggestions
- Net: -90 lines of dead pattern logic
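
After this simplification, the suggestion flow reduces to "ask the compression model, otherwise use a fixed text". The sketch below shows that shape under stated assumptions: `ask_model` is a hypothetical callable standing in for the compression model, and the generic suggestion wording is a placeholder, not the text used in the fork.

```python
from typing import Callable, Optional

# Placeholder wording; the actual fallback text is not quoted in the PR.
GENERIC_SUGGESTION = "Try a different tool or different arguments."


def build_error_with_suggestion(
    error: str,
    ask_model: Optional[Callable[[str], str]] = None,
) -> str:
    """Append a suggestion to an error, preferring the model's answer."""
    suggestion = None
    if ask_model is not None:
        try:
            suggestion = ask_model(error)  # compression model is tried first
        except Exception:
            suggestion = None  # model unavailable: fall through to the default
    if not suggestion:
        suggestion = GENERIC_SUGGESTION
    return f"{error}\n💡 {suggestion}"
```

Calling it without a model, or with one that raises or returns an empty string, yields the generic suggestion, which is what makes the feature work with zero configuration.
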

Added README explaining:
- This is a fork of NousResearch/hermes-agent
- What Hermes Agent is
- Custom modifications in this fork (circuit breaker improvements, test fixes)

The integration tests call write_file_tool which goes through
_get_file_ops() that creates a terminal environment. Without
explicit TERMINAL_ENV=local, the CI environment may default to
modal which fails due to missing credentials.

Fixes: test_net_new_file_no_warning, test_sibling_agent_write_surfaces_warning_through_handler

The test_terminal_and_file_toolsets_resolve_all_tools and
test_terminal_tool_present tests call get_tool_definitions() which
filters tools based on check_terminal_requirements(). Without
explicit TERMINAL_ENV=local, the CI environment's TERMINAL_ENV=modal
(without credentials) causes check_terminal_requirements() to return
False, dropping the terminal tool from the schema.

Fixes: test_terminal_and_file_toolsets_resolve_all_tools,
       test_terminal_tool_present
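
Pinning the environment variable for the duration of a test can be sketched with `unittest.mock.patch.dict`, which restores the previous environment on exit. Both function names here are hypothetical, and `read_terminal_env` is only a stand-in for however the project reads `TERMINAL_ENV`; under pytest, `monkeypatch.setenv("TERMINAL_ENV", "local")` achieves the same effect.

```python
import os
from unittest import mock


def read_terminal_env() -> str:
    # Hypothetical stand-in for the project's own TERMINAL_ENV lookup.
    return os.environ.get("TERMINAL_ENV", "")


def run_with_local_terminal(func):
    """Call func() with TERMINAL_ENV pinned to 'local', then restore os.environ."""
    with mock.patch.dict(os.environ, {"TERMINAL_ENV": "local"}):
        return func()
```

Setting the variable explicitly makes the tests independent of whatever the CI environment exports, such as `TERMINAL_ENV=modal` without credentials.
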
gzsiang commented Apr 22, 2026

Apologies — this PR inadvertently includes i18n/Chinese localization commits that were not intended to be part of the circuit breaker submission. Those changes were mixed into my fork's main branch by mistake.

I will open a clean PR with only the circuit breaker changes shortly.

The i18n work is tracked separately in #13625. I'll submit it as its own PR if the maintainers are interested.

Closing this one to keep things clean.

gzsiang commented Apr 22, 2026

Update: The clean PR with only circuit breaker changes is now available at #14059.
