Skip to content

feat: circuit breaker with compression model judgment#16749

Closed
gzsiang wants to merge 12 commits into
NousResearch:mainfrom
gzsiang:pr-circuit-breaker-v2
Closed

feat: circuit breaker with compression model judgment#16749
gzsiang wants to merge 12 commits into
NousResearch:mainfrom
gzsiang:pr-circuit-breaker-v2

Conversation

@gzsiang

@gzsiang gzsiang commented Apr 28, 2026

Copy link
Copy Markdown

Summary

Two-layer circuit breaker (disabled by default). Enable via config.yaml:

circuit_breaker:
  enabled: true
  consecutive_threshold: 5
  failure_threshold: 5

Changes

  • Compression model judges if agent is looping (YES/NO)
  • Fallback to main model self-reflection if no compression model
  • Config-gated (disabled by default)
  • Thresholds configurable

@alt-glitch alt-glitch added type/feature New feature or request P3 Low — cosmetic, nice to have comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery comp/tui Terminal UI (ui-tui/ + tui_gateway/) labels Apr 28, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related to #7811 (open, circuit breaker + LLM fallback) and #14059 / #12632 (closed, same feature with dirty branches). Files changed here also include many unrelated modules (banner, skin_engine, tips, browser tests, TUI theme) — appears to be a dirty branch.

@gzsiang gzsiang force-pushed the pr-circuit-breaker-v2 branch from 1d51371 to 775c3f7 Compare April 28, 2026 01:20
@gzsiang

gzsiang commented Apr 28, 2026

Copy link
Copy Markdown
Author

Thanks for the review, @alt-glitch!

I've rebased the branch onto latest upstream/main and force-pushed a clean set of 3 commits. The PR now only contains the circuit-breaker changes in run_agent.py (+296 lines).

The i18n changes have been moved to a separate branch and will be submitted as a separate PR.

Regarding related issues:

The PR is now ready for review.

@liuhao1024

Copy link
Copy Markdown
Contributor

Hey @gzsiang, nice work on the circuit breaker concept! The compression-model-as-judge approach is clever. A couple of things I noticed:

1. function_result is undefined at the call site (will crash at runtime)

In _invoke_tool, the failure check is placed before the tool dispatch:

# Circuit breaker: track consecutive calls (before execution)
...
_cb_retry_msg = self._check_tool_failure(function_name, function_result)  # ← function_result not yet defined
if _cb_retry_msg is not None:
    return json.dumps({"error": _cb_retry_msg}, ensure_ascii=False)
if function_name == "todo":  # tool execution starts here

At this point function_result hasn't been assigned, so this raises NameError whenever the circuit breaker is enabled. The comment says "After execution" but the code sits above the dispatch block. Possibly the intent was to call _check_tool_failure after each tool returns?

2. Failure detection is inactive in sequential mode

In _execute_tool_calls_sequential, only the consecutive-call counter is incremented — _check_tool_failure is never called there. So Layer 2 (failure retry detection) silently does nothing in sequential execution, while the counter grows unboundedly.

3. Minor: _get_consecutive_suggestion / _try_auxiliary_model_consecutive appear unused

I don't see them called anywhere in the diff — they may be dead code.

Hope this helps!

@gzsiang gzsiang force-pushed the pr-circuit-breaker-v2 branch from 663a2fb to af8ce08 Compare May 3, 2026 19:17
@ether-btc

Copy link
Copy Markdown
Contributor

The two-layer design is thoughtful: compression model as primary judge, main model self-reflection as fallback. This avoids a circular dependency where the main model would need to judge its own looping behavior.

  1. Threshold calibration: consecutive_threshold: 5 and failure_threshold: 5 are hardcoded defaults. Have these been validated against any real production traces? The difference between "consecutive" and "cumulative" threshold matters significantly.

  2. State persistence: When a circuit opens, is the state persisted across agent restarts? If the agent restarts, does the circuit automatically reset to "closed," potentially allowing the same looping pattern to immediately recur?

  3. Compression model availability: If the configured compression model is unavailable, the fallback is main model self-reflection. But self-reflection is itself a call that could loop. Is there a depth limit on the reflection chain?

  4. Disabled by default: The config-gated design is good. Has there been consideration of an opt-out for the compression model judgment specifically, separate from the circuit breaker enable flag?

Clean single-file implementation. Worth merging with documentation.

@gzsiang

gzsiang commented May 4, 2026

Copy link
Copy Markdown
Author

Thanks for the thorough review, @liuhao1024! All three issues are now fixed in the latest commit:

1. NameError fix — Removed the misplaced _check_tool_failure() call that sat above the tool dispatch in _invoke_tool() (where function_result was not yet defined). The call is now placed correctly after handle_function_call() returns, so function_result is always available.

2. Sequential path now active — Added a _check_tool_failure() call in _execute_tool_calls_sequential(), right after _detect_tool_failure() and before the result is appended to messages. Layer 2 failure detection is now functional in sequential mode.

3. Dead code removed_get_consecutive_suggestion() and _try_auxiliary_model_consecutive() were never called in the current implementation (the equivalent logic lives in _check_tool_loop()). Both methods have been deleted.

All circuit-breaker tests still pass (18 passed, 3 skipped). The 3 pre-existing failures in test_concurrent_interrupt / test_real_interrupt_subagent are unrelated to this PR and already present on main.

@gzsiang

gzsiang commented May 4, 2026

Copy link
Copy Markdown
Author

@ether-btc Thanks for the thoughtful questions! Addressing each:

1. Threshold calibration
The default of 5 was chosen based on practical testing — it gives the agent enough room for legitimate retries (e.g., reading different sections of a large file, fixing parameters after an error) while still catching loops early. Importantly, Layer 1 (consecutive call check) does not hard-break at the threshold; it asks the auxiliary model to judge whether the calls are purposeful. The threshold only determines when to invite judgment, not when to block. Layer 2 (failure check) does hard-break, but that triggers only on actual failures, not on successful repeated calls.

Regarding consecutive vs cumulative: the current implementation is strictly consecutive — the counter resets when a different tool is called. This is intentional: cumulative counting would produce false positives on agents that legitimately use the same tool at different stages of a task.

2. State persistence across restarts
The counters (_consecutive_tool_calls, _tool_failure_count) are in-memory only — they reset on agent restart. This is by design: circuit breaker protects against loops within a single session. Cross-session persistence would introduce complexity (when to decay? how to handle context changes?) without clear benefit, since a restart implies a fresh session with different context.

3. Self-reflection depth / loop risk
The self-reflection fallback does not create a recursive call chain — it returns a text prompt injected into the current turn as a tool error message, not a new LLM invocation. However, there is a practical concern: weak or heavily-quantized models may ignore the self-reflection prompt, leading to a "soft loop" where the agent is told to stop but keeps going. The counter continues to increment, and on the next threshold crossing the same self-reflection message is returned, which the model may again ignore. This is an inherent limitation of self-reflection on low-capability models — the auxiliary model judgment path avoids it entirely.

4. Separate opt-out for compression model judgment
Currently not supported — enabled: true activates both layers together. However, the architecture makes this easy to add: a single boolean config (e.g., use_auxiliary_judgment: false) would skip the call_llm step in _check_tool_loop and go straight to the self-reflection fallback. The main use case would be users who have a compression model configured but don't trust its judgment (e.g., a weak local model producing too many false positives). I'd suggest adding this in a follow-up PR rather than bloating the initial implementation.

Appreciate the review! 🙏

gzsiang added 6 commits May 6, 2026 17:14
…avior

Replace hard-coded circuit breaker with two-layer approach:

1. Consecutive call tracking: Count consecutive calls to the same tool
   (regardless of success/failure). When threshold (5) is reached,
   trigger a loop check.

2. Compression model judgment: Ask the compression model to analyze
   if the agent is looping/repeating useless work. This is closer to
   how a human would notice repetitive behavior.

3. Failure tracking: Keep the existing failure counter for tools that
   keep returning errors.

Key changes:
- Remove _circuit_breaker_threshold (old identical-args breaker)
- Add _consecutive_threshold = 5 (new consecutive call threshold)
- Add _check_tool_loop() method that asks compression model:
  "Is the agent looping?" based on recent tool calls and results
- If compression model says YES -> trigger circuit breaker
- If compression model says NO -> reset counter and allow continuation
- Examples in prompt: reading same file repeatedly = YES, reading
  different sections = NO, same failing command = YES, different
  params = NO

This avoids false positives (e.g. reading a large file in chunks)
while catching actual loops (e.g. repeatedly reading the same file).
When compression model is unavailable (not configured), fall back to
letting the main model self-reflect — show it the consecutive call
count and last result, ask it to assess if it's looping.

Two paths:
- With compression model: fast judgment by auxiliary model (YES/NO)
- Without compression model: main model self-reflection prompt

Both paths avoid hard-coded rules, using language model judgment
instead of mechanical thresholds.
…eaker

Circuit breaker is now disabled by default. Enable via config.yaml:

    circuit_breaker:
      enabled: true
      consecutive_threshold: 5  # calls before asking compression model
      failure_threshold: 5      # consecutive failures before breaker

Both layers (consecutive call check and failure check) are gated by
this config option. When disabled, no tracking occurs, adding zero
overhead.
… model

- Replace compression model references with auxiliary model
- Update method names: _try_compression_model_* -> _try_auxiliary_model_*
- Update documentation
…e dead code

- Fix NameError: _check_tool_failure() was called before tool dispatch in
  _invoke_tool(), where function_result was not yet defined. Moved the call
  to after the tool result is available (handle_function_call path).

- Fix silent no-op in sequential mode: _execute_tool_calls_sequential()
  incremented the consecutive counter but never called _check_tool_failure(),
  so Layer 2 failure detection was inactive in sequential execution. Added
  the call after _detect_tool_failure() and before result is appended.

- Remove dead code: _get_consecutive_suggestion() and
  _try_auxiliary_model_consecutive() were never called anywhere in the diff;
  the same logic is handled by _check_tool_loop(). Removed both methods.
- Layer 2 (failure check): detect Hermes non-JSON error markers
  ([error], Error executing tool, Error:, startswith(Error))
  instead of resetting failure counter on JSONDecodeError
- Self-reflection fallback: hard kill message ("YOU ARE STUCK IN A LOOP.
  DO NOT retry this tool.") instead of soft suggestion
@gzsiang gzsiang force-pushed the pr-circuit-breaker-v2 branch from f8b6fe0 to 86405c4 Compare May 6, 2026 11:27
@gzsiang

gzsiang commented May 8, 2026

Copy link
Copy Markdown
Author

Circuit Breaker Redesign: From Loop Detection to Suggestion-Based Break

This commit significantly refactors the circuit breaker, shifting from a loop detection model to an auxiliary model suggestion mechanism.

Core Changes

1. Phase 1 bug fix
Fixed a bug in : when tools return JSON with format, is a value rather than a key. The JSON key check never matched and returned early without falling back to non-JSON text pattern matching. Now it properly falls through to text pattern checks (e.g., , , etc.) when JSON key conditions are not met.

2. Phase 2 redesign: from detector to breaker

  • Old behavior: asks auxiliary model is this a loop? -> answer yes/no -> if yes, returns STOP
  • New behavior: asks auxiliary model to directly suggest an alternative approach -> main LLM executes it -> breaks the deadlock
  • Auxiliary model increased from 10 to 1024
  • Added parameter so the auxiliary model can see the last call arguments

3. Fixed auxiliary model response parsing
Reasoning models like Qwen3.5 output responses in instead of . The old code only read , causing the auxiliary model to always return empty suggestions. Now both fields are checked.

4. All circuit breaker messages localized to Chinese

Test Results

Setup: Hermes Agent + Qwen3.6-35B-A3B (main model, IQ4_XS quantized) + Qwen3.5-4B (auxiliary model)

Scenario: The main model was stuck in a loop repeatedly calling with persistent errors during an HTML parsing task.

Actual session log excerpt:

Key finding:

  • Even when the 4B auxiliary model's suggestions are not perfectly accurate, the act of providing a suggestion alone is sufficient to break the main LLM's reasoning deadlock
  • The main model acts on the suggestion, receives new feedback during execution, and naturally returns to productive work
  • The IQ4_XS quantized main model can now complete tasks without getting stuck in unproductive tool loops

gzsiang added 3 commits May 8, 2026 18:25
1. "error" in data 在 error=null 时也返回 True(键存在即判定为失败),
   导致 exit_code=0 的正常结果被误判为失败。
   修复:改为 data.get("error") is not None

2. terminal/execute_code 工具的结果包含 exit_code 字段,但原代码
   non-JSON 分支检查的是 "exit code 1"(空格),与 JSON 中的
   "exit_code"(下划线)不匹配,导致 exit_code!=0 时也无法被检测。
   修复:在 JSON 分支中增加 exit_code != 0 的检查。

详见会话 20260511_121344_c5f11b:
- 真实失败 2 次(exit_code=1)
- 误判失败 2 次(exit_code=0,error=null,但 "error" in data = True)
- 熔断在第 4 次计数时触发(阈值为 5,第 5 次调用被阻断)
Resolve merge conflicts in run_agent.py (4 conflicts):
- Accept upstream changes for compression config, aux context,
  stream diagnostics, and provider profile path.
- Circuit breaker code remains intact.
@gzsiang

gzsiang commented May 17, 2026

Copy link
Copy Markdown
Author

After extensive real-world usage, I've found a number of issues that need addressing, and the feature's use case is quite niche. I don't plan to maintain it going forward. Closing this PR.

@gzsiang gzsiang closed this May 17, 2026
gzsiang added a commit to gzsiang/hermes-agent that referenced this pull request Jun 6, 2026
Added Chinese description of fork features:
- Circuit breaker (NousResearch#16749)
- CLI Chinese localization (NousResearch#15282)
- Message embedding (NousResearch#18059)
- Emergency compression (NousResearch#18607)
gzsiang added a commit to gzsiang/hermes-agent that referenced this pull request Jun 6, 2026
@gzsiang gzsiang deleted the pr-circuit-breaker-v2 branch June 8, 2026 13:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder comp/cli CLI entry point, hermes_cli/, setup wizard comp/gateway Gateway runner, session dispatch, delivery comp/tui Terminal UI (ui-tui/ + tui_gateway/) P3 Low — cosmetic, nice to have type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants