feat: circuit breaker with compression model judgment by gzsiang · Pull Request #16749 · NousResearch/hermes-agent

gzsiang · 2026-04-28T01:01:15Z

Summary

Two-layer circuit breaker (disabled by default). Enable via config.yaml:

circuit_breaker:
  enabled: true
  consecutive_threshold: 5
  failure_threshold: 5

Changes

Compression model judges if agent is looping (YES/NO)
Fallback to main model self-reflection if no compression model
Config-gated (disabled by default)
Thresholds configurable

alt-glitch · 2026-04-28T01:13:34Z

Related to #7811 (open, circuit breaker + LLM fallback) and #14059 / #12632 (closed, same feature with dirty branches). Files changed here also include many unrelated modules (banner, skin_engine, tips, browser tests, TUI theme) — appears to be a dirty branch.

gzsiang · 2026-04-28T01:25:31Z

Thanks for the review, @alt-glitch!

I've rebased the branch onto latest upstream/main and force-pushed a clean set of 3 commits. The PR now only contains the circuit-breaker changes in run_agent.py (+296 lines).

The i18n changes have been moved to a separate branch and will be submitted as a separate PR.

Regarding related issues:

feat: circuit breaker and LLM fallback engine #7811 (circuit breaker + LLM fallback): This PR implements the LLM fallback approach described there — using the compression model to judge looping behavior, with a self-reflection fallback for users without a compression model configured.
feat: circuit breaker for infinite tool-call loops #14059 / fix: add circuit breaker for infinite tool-call loops #12632 (closed): Superseded by this PR. The original approach used hard-coded parameter hash matching; this version uses language model judgment instead, which avoids false positives (e.g., reading a large file in chunks).

The PR is now ready for review.

liuhao1024 · 2026-05-03T17:37:57Z

Hey @gzsiang, nice work on the circuit breaker concept! The compression-model-as-judge approach is clever. A couple of things I noticed:

1. function_result is undefined at the call site (will crash at runtime)

In _invoke_tool, the failure check is placed before the tool dispatch:

# Circuit breaker: track consecutive calls (before execution)
...
_cb_retry_msg = self._check_tool_failure(function_name, function_result)  # ← function_result not yet defined
if _cb_retry_msg is not None:
    return json.dumps({"error": _cb_retry_msg}, ensure_ascii=False)
if function_name == "todo":  # tool execution starts here

At this point function_result hasn't been assigned, so this raises NameError whenever the circuit breaker is enabled. The comment says "After execution" but the code sits above the dispatch block. Possibly the intent was to call _check_tool_failure after each tool returns?

2. Failure detection is inactive in sequential mode

In _execute_tool_calls_sequential, only the consecutive-call counter is incremented — _check_tool_failure is never called there. So Layer 2 (failure retry detection) silently does nothing in sequential execution, while the counter grows unboundedly.

3. Minor: _get_consecutive_suggestion / _try_auxiliary_model_consecutive appear unused

I don't see them called anywhere in the diff — they may be dead code.

Hope this helps!

ether-btc · 2026-05-03T20:31:56Z

The two-layer design is thoughtful: compression model as primary judge, main model self-reflection as fallback. This avoids a circular dependency where the main model would need to judge its own looping behavior.

Threshold calibration: consecutive_threshold: 5 and failure_threshold: 5 are hardcoded defaults. Have these been validated against any real production traces? The difference between "consecutive" and "cumulative" threshold matters significantly.
State persistence: When a circuit opens, is the state persisted across agent restarts? If the agent restarts, does the circuit automatically reset to "closed," potentially allowing the same looping pattern to immediately recur?
Compression model availability: If the configured compression model is unavailable, the fallback is main model self-reflection. But self-reflection is itself a call that could loop. Is there a depth limit on the reflection chain?
Disabled by default: The config-gated design is good. Has there been consideration of an opt-out for the compression model judgment specifically, separate from the circuit breaker enable flag?

Clean single-file implementation. Worth merging with documentation.

gzsiang · 2026-05-04T03:45:24Z

Thanks for the thorough review, @liuhao1024! All three issues are now fixed in the latest commit:

1. NameError fix — Removed the misplaced _check_tool_failure() call that sat above the tool dispatch in _invoke_tool() (where function_result was not yet defined). The call is now placed correctly after handle_function_call() returns, so function_result is always available.

2. Sequential path now active — Added a _check_tool_failure() call in _execute_tool_calls_sequential(), right after _detect_tool_failure() and before the result is appended to messages. Layer 2 failure detection is now functional in sequential mode.

3. Dead code removed — _get_consecutive_suggestion() and _try_auxiliary_model_consecutive() were never called in the current implementation (the equivalent logic lives in _check_tool_loop()). Both methods have been deleted.

All circuit-breaker tests still pass (18 passed, 3 skipped). The 3 pre-existing failures in test_concurrent_interrupt / test_real_interrupt_subagent are unrelated to this PR and already present on main.

gzsiang · 2026-05-04T08:31:44Z

@ether-btc Thanks for the thoughtful questions! Addressing each:

1. Threshold calibration
The default of 5 was chosen based on practical testing — it gives the agent enough room for legitimate retries (e.g., reading different sections of a large file, fixing parameters after an error) while still catching loops early. Importantly, Layer 1 (consecutive call check) does not hard-break at the threshold; it asks the auxiliary model to judge whether the calls are purposeful. The threshold only determines when to invite judgment, not when to block. Layer 2 (failure check) does hard-break, but that triggers only on actual failures, not on successful repeated calls.

Regarding consecutive vs cumulative: the current implementation is strictly consecutive — the counter resets when a different tool is called. This is intentional: cumulative counting would produce false positives on agents that legitimately use the same tool at different stages of a task.

2. State persistence across restarts
The counters (_consecutive_tool_calls, _tool_failure_count) are in-memory only — they reset on agent restart. This is by design: circuit breaker protects against loops within a single session. Cross-session persistence would introduce complexity (when to decay? how to handle context changes?) without clear benefit, since a restart implies a fresh session with different context.

3. Self-reflection depth / loop risk
The self-reflection fallback does not create a recursive call chain — it returns a text prompt injected into the current turn as a tool error message, not a new LLM invocation. However, there is a practical concern: weak or heavily-quantized models may ignore the self-reflection prompt, leading to a "soft loop" where the agent is told to stop but keeps going. The counter continues to increment, and on the next threshold crossing the same self-reflection message is returned, which the model may again ignore. This is an inherent limitation of self-reflection on low-capability models — the auxiliary model judgment path avoids it entirely.

4. Separate opt-out for compression model judgment
Currently not supported — enabled: true activates both layers together. However, the architecture makes this easy to add: a single boolean config (e.g., use_auxiliary_judgment: false) would skip the call_llm step in _check_tool_loop and go straight to the self-reflection fallback. The main use case would be users who have a compression model configured but don't trust its judgment (e.g., a weak local model producing too many false positives). I'd suggest adding this in a follow-up PR rather than bloating the initial implementation.

Appreciate the review! 🙏

…avior Replace hard-coded circuit breaker with two-layer approach: 1. Consecutive call tracking: Count consecutive calls to the same tool (regardless of success/failure). When threshold (5) is reached, trigger a loop check. 2. Compression model judgment: Ask the compression model to analyze if the agent is looping/repeating useless work. This is closer to how a human would notice repetitive behavior. 3. Failure tracking: Keep the existing failure counter for tools that keep returning errors. Key changes: - Remove _circuit_breaker_threshold (old identical-args breaker) - Add _consecutive_threshold = 5 (new consecutive call threshold) - Add _check_tool_loop() method that asks compression model: "Is the agent looping?" based on recent tool calls and results - If compression model says YES -> trigger circuit breaker - If compression model says NO -> reset counter and allow continuation - Examples in prompt: reading same file repeatedly = YES, reading different sections = NO, same failing command = YES, different params = NO This avoids false positives (e.g. reading a large file in chunks) while catching actual loops (e.g. repeatedly reading the same file).

When compression model is unavailable (not configured), fall back to letting the main model self-reflect — show it the consecutive call count and last result, ask it to assess if it's looping. Two paths: - With compression model: fast judgment by auxiliary model (YES/NO) - Without compression model: main model self-reflection prompt Both paths avoid hard-coded rules, using language model judgment instead of mechanical thresholds.

…eaker Circuit breaker is now disabled by default. Enable via config.yaml: circuit_breaker: enabled: true consecutive_threshold: 5 # calls before asking compression model failure_threshold: 5 # consecutive failures before breaker Both layers (consecutive call check and failure check) are gated by this config option. When disabled, no tracking occurs, adding zero overhead.

… model - Replace compression model references with auxiliary model - Update method names: _try_compression_model_* -> _try_auxiliary_model_* - Update documentation

…e dead code - Fix NameError: _check_tool_failure() was called before tool dispatch in _invoke_tool(), where function_result was not yet defined. Moved the call to after the tool result is available (handle_function_call path). - Fix silent no-op in sequential mode: _execute_tool_calls_sequential() incremented the consecutive counter but never called _check_tool_failure(), so Layer 2 failure detection was inactive in sequential execution. Added the call after _detect_tool_failure() and before result is appended. - Remove dead code: _get_consecutive_suggestion() and _try_auxiliary_model_consecutive() were never called anywhere in the diff; the same logic is handled by _check_tool_loop(). Removed both methods.

- Layer 2 (failure check): detect Hermes non-JSON error markers ([error], Error executing tool, Error:, startswith(Error)) instead of resetting failure counter on JSONDecodeError - Self-reflection fallback: hard kill message ("YOU ARE STUCK IN A LOOP. DO NOT retry this tool.") instead of soft suggestion

… reasoning_content parsing

gzsiang · 2026-05-08T07:20:51Z

Circuit Breaker Redesign: From Loop Detection to Suggestion-Based Break

This commit significantly refactors the circuit breaker, shifting from a loop detection model to an auxiliary model suggestion mechanism.

Core Changes

1. Phase 1 bug fix
Fixed a bug in : when tools return JSON with format, is a value rather than a key. The JSON key check never matched and returned early without falling back to non-JSON text pattern matching. Now it properly falls through to text pattern checks (e.g., , , etc.) when JSON key conditions are not met.

2. Phase 2 redesign: from detector to breaker

Old behavior: asks auxiliary model is this a loop? -> answer yes/no -> if yes, returns STOP
New behavior: asks auxiliary model to directly suggest an alternative approach -> main LLM executes it -> breaks the deadlock
Auxiliary model increased from 10 to 1024
Added parameter so the auxiliary model can see the last call arguments

3. Fixed auxiliary model response parsing
Reasoning models like Qwen3.5 output responses in instead of . The old code only read , causing the auxiliary model to always return empty suggestions. Now both fields are checked.

4. All circuit breaker messages localized to Chinese

Test Results

Setup: Hermes Agent + Qwen3.6-35B-A3B (main model, IQ4_XS quantized) + Qwen3.5-4B (auxiliary model)

Scenario: The main model was stuck in a loop repeatedly calling with persistent errors during an HTML parsing task.

Actual session log excerpt:

Key finding:

Even when the 4B auxiliary model's suggestions are not perfectly accurate, the act of providing a suggestion alone is sufficient to break the main LLM's reasoning deadlock
The main model acts on the suggestion, receives new feedback during execution, and naturally returns to productive work
The IQ4_XS quantized main model can now complete tasks without getting stuck in unproductive tool loops

1. "error" in data 在 error=null 时也返回 True（键存在即判定为失败），导致 exit_code=0 的正常结果被误判为失败。修复：改为 data.get("error") is not None 2. terminal/execute_code 工具的结果包含 exit_code 字段，但原代码 non-JSON 分支检查的是 "exit code 1"（空格），与 JSON 中的 "exit_code"（下划线）不匹配，导致 exit_code!=0 时也无法被检测。修复：在 JSON 分支中增加 exit_code != 0 的检查。详见会话 20260511_121344_c5f11b： - 真实失败 2 次（exit_code=1） - 误判失败 2 次（exit_code=0，error=null，但 "error" in data = True） - 熔断在第 4 次计数时触发（阈值为 5，第 5 次调用被阻断）

Resolve merge conflicts in run_agent.py (4 conflicts): - Accept upstream changes for compression config, aux context, stream diagnostics, and provider profile path. - Circuit breaker code remains intact.

gzsiang · 2026-05-17T12:21:14Z

After extensive real-world usage, I've found a number of issues that need addressing, and the feature's use case is quite niche. I don't plan to maintain it going forward. Closing this PR.

Added Chinese description of fork features: - Circuit breaker (NousResearch#16749) - CLI Chinese localization (NousResearch#15282) - Message embedding (NousResearch#18059) - Emergency compression (NousResearch#18607)

…ature, PR NousResearch#16749 closed)

gzsiang mentioned this pull request Apr 28, 2026

feat: circuit breaker for infinite tool-call loops #14059

Closed

gzsiang force-pushed the pr-circuit-breaker-v2 branch from 1d51371 to 775c3f7 Compare April 28, 2026 01:20

gzsiang force-pushed the pr-circuit-breaker-v2 branch from 663a2fb to af8ce08 Compare May 3, 2026 19:17

ether-btc mentioned this pull request May 4, 2026

feat(circuit-breaker): add use_auxiliary_judgment option #19616

Closed

gzsiang force-pushed the pr-circuit-breaker-v2 branch from e94797d to f8b6fe0 Compare May 4, 2026 16:53

gzsiang added 6 commits May 6, 2026 17:14

refactor(circuit-breaker): Use auxiliary model instead of compression…

f1538bb

… model - Replace compression model references with auxiliary model - Update method names: _try_compression_model_* -> _try_auxiliary_model_* - Update documentation

gzsiang force-pushed the pr-circuit-breaker-v2 branch from f8b6fe0 to 86405c4 Compare May 6, 2026 11:27

fix: circuit breaker two-phase with auxiliary model suggestions + fix…

32dbc2b

… reasoning_content parsing

gzsiang added 3 commits May 8, 2026 18:25

fix: circuit breaker - reasoning_content fallback, i18n, ANSI display

e49e932

Merge upstream/main into pr-circuit-breaker-v2

eeaf31b

Resolve merge conflicts in run_agent.py (4 conflicts): - Accept upstream changes for compression config, aux context, stream diagnostics, and provider profile path. - Circuit breaker code remains intact.

gzsiang added 2 commits May 16, 2026 07:52

chore: remove .cli_i18n plan files

0066c37

chore: remove i18n changes from circuit-breaker branch

224b771

gzsiang closed this May 17, 2026

gzsiang added a commit to gzsiang/hermes-agent that referenced this pull request Jun 6, 2026

fix: remove dead circuit breaker code from run_agent.py (abandoned fe…

c9a3e2e

…ature, PR NousResearch#16749 closed)

gzsiang deleted the pr-circuit-breaker-v2 branch June 8, 2026 13:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: circuit breaker with compression model judgment#16749

feat: circuit breaker with compression model judgment#16749
gzsiang wants to merge 12 commits into
NousResearch:mainfrom
gzsiang:pr-circuit-breaker-v2

gzsiang commented Apr 28, 2026

Uh oh!

alt-glitch commented Apr 28, 2026

Uh oh!

gzsiang commented Apr 28, 2026

Uh oh!

liuhao1024 commented May 3, 2026

Uh oh!

ether-btc commented May 3, 2026

Uh oh!

gzsiang commented May 4, 2026

Uh oh!

gzsiang commented May 4, 2026

Uh oh!

gzsiang commented May 8, 2026

Uh oh!

gzsiang commented May 17, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

gzsiang commented Apr 28, 2026

Summary

Changes

Uh oh!

alt-glitch commented Apr 28, 2026

Uh oh!

gzsiang commented Apr 28, 2026

Uh oh!

liuhao1024 commented May 3, 2026

Uh oh!

ether-btc commented May 3, 2026

Uh oh!

gzsiang commented May 4, 2026

Uh oh!

gzsiang commented May 4, 2026

Uh oh!

gzsiang commented May 8, 2026

Circuit Breaker Redesign: From Loop Detection to Suggestion-Based Break

Core Changes

Test Results

Uh oh!

gzsiang commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

gzsiang commented May 17, 2026 •

edited

Loading