feat: add intra-loop stagnation detector (#415) by Aureliolo · Pull Request #458 · Aureliolo/synthorg

Aureliolo · 2026-03-15T18:32:18Z

Summary

Add stagnation detection that analyzes TurnRecord tool-call fingerprints across a sliding window, intervenes with corrective prompt injection, and terminates early with STAGNATION if correction fails
New StagnationDetector async protocol with ToolRepetitionDetector default implementation using dual-signal detection (repetition ratio + cycle detection)
StagnationConfig frozen model with configurable window_size, repetition_threshold, cycle_detection, max_corrections, min_tool_turns
STAGNATION termination reason + tool_call_fingerprints field on TurnRecord
Fingerprint computation: name:sha256(canonical_json)[:16], sorted per-turn
ReactLoop integration (loop-scoped corrections counter) + PlanExecuteLoop integration (per-step scoped)
Shared check_stagnation() helper in loop_helpers.py with proper error handling (non-critical — logged and skipped on failure)
AgentEngine wiring via stagnation_detector parameter
Checkpoint resume path preserves stagnation_detector (and approval_gate) via new read-only properties
Observability events: check_performed, detected, correction_injected, terminated
Design spec: docs/design/engine.md stagnation detection section
CLAUDE.md: package structure + event constants updated
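
The fingerprint format listed above (`name:sha256(canonical_json)[:16]`, sorted per turn) can be illustrated with a minimal sketch. The function names below are hypothetical, and "canonical JSON" is assumed to mean sorted keys with compact separators; the PR's actual helper in loop_helpers.py may differ.

```python
import hashlib
import json


def fingerprint_tool_call(name: str, arguments: dict) -> str:
    """Return "name:<first 16 hex chars of sha256(canonical JSON)>"."""
    # Assumption: canonical form = sorted keys + compact separators.
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{name}:{digest[:16]}"


def fingerprint_turn(calls: list[tuple[str, dict]]) -> tuple[str, ...]:
    """Fingerprint every call in one turn and sort the result.

    Sorting makes the per-turn tuple independent of call order.
    """
    return tuple(sorted(fingerprint_tool_call(n, a) for n, a in calls))
```

Because keys are sorted before hashing, two calls whose arguments differ only in key order produce the same fingerprint, which is what lets the detector treat them as the same call.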

Test plan

66 stagnation-specific tests (models, detector, fingerprints, cycle detection, Hypothesis properties)
Extended loop_protocol, loop_helpers, react_loop, plan_execute_loop tests
Protocol conformance test (isinstance check)
Repetition ratio exact-value tests
Direct _detect_cycle coverage (6 cases)
PlanExecuteLoop step corrections counter increment test
Full suite: 8072 passed, 94.49% coverage
mypy strict: 0 errors
ruff lint + format: clean

Review coverage

Pre-reviewed by 9 agents, 14 findings addressed:

2 CRITICAL (checkpoint resume dropping detector, unguarded check() call)
4 MAJOR (code duplication, function length, docs, warning message)
6 MEDIUM (deep-copy details, cycle_length constraint, cross-field validator, test gaps, corrective message edge case)
2 MINOR (protocol conformance test, step corrections counter test)

Closes #415

github-actions · 2026-03-15T18:32:29Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-03-15T18:32:35Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53cc38dd-ed0a-4154-93c1-94f430f9255a

📥 Commits

Reviewing files that changed from the base of the PR and between d6f1137 and f24b1cf.

📒 Files selected for processing (26)

CLAUDE.md
docs/design/engine.md
src/synthorg/engine/__init__.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/checkpoint/resume.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/react_loop.py
src/synthorg/engine/stagnation/__init__.py
src/synthorg/engine/stagnation/detector.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/stagnation/protocol.py
src/synthorg/observability/events/stagnation.py
tests/unit/engine/checkpoint/test_resume.py
tests/unit/engine/stagnation/__init__.py
tests/unit/engine/stagnation/test_detector.py
tests/unit/engine/stagnation/test_fingerprint.py
tests/unit/engine/stagnation/test_models.py
tests/unit/engine/stagnation/test_properties.py
tests/unit/engine/test_agent_engine.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
tests/unit/engine/test_plan_execute_loop.py
tests/unit/engine/test_react_loop.py
tests/unit/observability/test_events.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Stagnation detection added to detect repetitive tool usage, inject corrective prompts, and terminate tasks when unresolved; configurable sensitivity, windowing, and max-corrections.
Documentation
- Engine docs updated with stagnation detection design and integration guidance.
Observability
- New stagnation-related event names and updated logging guidance to surface checks, detections, injections, and terminations.
Tests
- Extensive unit and property tests for detection, fingerprinting, loop integration, and behavior coverage.

Walkthrough

Adds intra-loop stagnation detection: new stagnation package (protocol, models, ToolRepetitionDetector), integrates detector into ReactLoop and PlanExecuteLoop, wires detector through AgentEngine and checkpoint wrapping, augments loop records with tool_call_fingerprints, and adds tests and observability events for stagnation flows and corrective prompt injection/termination.

Changes

Cohort / File(s)	Summary
Stagnation core `src/synthorg/engine/stagnation/__init__.py`, `src/synthorg/engine/stagnation/protocol.py`, `src/synthorg/engine/stagnation/models.py`, `src/synthorg/engine/stagnation/detector.py`	New stagnation package: protocol, frozen models (config/result/verdict), and default ToolRepetitionDetector implementing windowed repetition and cycle detection, verdict construction, and corrective-message generation.
Loop integration `src/synthorg/engine/react_loop.py`, `src/synthorg/engine/plan_execute_loop.py`, `src/synthorg/engine/loop_helpers.py`, `src/synthorg/engine/loop_protocol.py`	Add STAGNATION TerminationReason, add TurnRecord.tool_call_fingerprints, compute fingerprints, check_stagnation and handler logic, and wire per-loop stagnation_detector plus corrections_injected across iterations/steps.
Engine wiring & helpers `src/synthorg/engine/__init__.py`, `src/synthorg/engine/agent_engine.py`, `src/synthorg/engine/checkpoint/resume.py`	Export stagnation types from engine package, add StagnationDetector parameter to AgentEngine and default loop construction, and preserve stagnation_detector when injecting checkpoint callbacks.
Observability `src/synthorg/observability/events/stagnation.py`, `src/synthorg/engine/loop_helpers.py`	New stagnation event constants and event emission points for checks, detections, corrections injected, and termination.
Tests — unit & property `tests/unit/engine/stagnation/*`, `tests/unit/engine/test_loop_helpers.py`, `tests/unit/engine/test_plan_execute_loop.py`, `tests/unit/engine/test_react_loop.py`, `tests/unit/engine/test_agent_engine.py`, `tests/unit/observability/test_events.py`, `tests/unit/engine/checkpoint/test_resume.py`	Comprehensive tests for detector logic, fingerprinting, models validation, property tests, loop integration, checkpoint preservation, and React/PlanExecute loop behaviors under NO_STAGNATION / INJECT_PROMPT / TERMINATE verdicts.
Manifests / metadata `manifest_file`, `pyproject.toml`, `requirements.txt`, `setup.py`, `Pipfile`	Small manifest adjustments recorded in summary (lines changed entries).

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Loop as Execution Loop
participant Detector as StagnationDetector
participant Engine as TaskEngine / AgentEngine
participant Tools as Tool Invoker
Loop->>Tools: perform tool call(s) for turn
Tools-->>Loop: tool call results
Loop->>Loop: compute tool_call_fingerprints
Loop->>Detector: async check(turns window, corrections_injected)
Detector-->>Loop: StagnationResult (NO_STAGNATION | INJECT_PROMPT | TERMINATE)
alt NO_STAGNATION
Loop->>Engine: continue normal loop
else INJECT_PROMPT
Loop->>Engine: inject corrective user prompt into context
Loop->>Loop: increment corrections_injected
else TERMINATE
Loop->>Engine: return ExecutionResult with TerminationReason.STAGNATION and metadata
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

test: add fuzz and property-based testing across all layers #421 — Adds Hypothesis-based/stagnation property tests and related test infrastructure; strongly related to the new property tests and test artifacts introduced here.
feat: implement checkpoint recovery strategy #367 — Changes to loop constructors and make_loop_with_callback to preserve injected loop-level state (checkpoint callback); closely related to the checkpoint + stagnation_detector preservation logic.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.06% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'feat: add intra-loop stagnation detector (`#415`)' directly and clearly describes the main change: introducing stagnation detection functionality to execution loops.
Description check	✅ Passed	The PR description provides detailed context about stagnation detection implementation, including the protocol design, configuration model, loop integrations, testing, and related changes—all relevant to the changeset.
Linked Issues check	✅ Passed	The PR fully implements the requirements from issue `#415`: adds a StagnationDetector protocol with ToolRepetitionDetector implementation, dual-signal detection (repetition ratio + cycle detection), configurable StagnationConfig, STAGNATION termination reason, tool_call_fingerprints field, integration into both ReactLoop and PlanExecuteLoop, and corrective prompt injection with termination on failure.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to stagnation detection requirements: new modules for detector/protocol/models, integration into loops, TurnRecord extension, observability events, documentation, and comprehensive tests. No extraneous modifications detected.

gemini-code-assist · 2026-03-15T18:32:49Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature to enhance the robustness and efficiency of agent execution: intra-loop stagnation detection. By analyzing patterns in tool calls, the system can now identify when an agent is repeatedly performing the same actions without making progress. It then attempts to guide the agent with corrective prompts and, if unsuccessful, terminates the execution early with a specific STAGNATION reason, preventing wasted resources and improving overall agent reliability.

Highlights

Stagnation Detection Core Logic: Implemented a new mechanism to detect and intervene when agents get stuck in repetitive tool-calling loops, using TurnRecord tool-call fingerprints.
Flexible Protocol & Default Implementation: Introduced an async StagnationDetector protocol, with a default ToolRepetitionDetector that uses dual-signal analysis (repetition ratio and cycle detection).
Configurable Behavior: Added StagnationConfig to allow customization of detection parameters like window size, repetition threshold, and maximum corrective interventions.
Loop Integration & Termination: Integrated stagnation checks into ReactLoop and PlanExecuteLoop, introducing STAGNATION as a new TerminationReason for early exit.
Enhanced Observability: Included new observability events to track stagnation checks, detections, corrections, and terminations.
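
To make the configuration surface concrete, here is a minimal sketch of a frozen config with the five fields named in the PR description. It uses a stdlib dataclass as a stand-in for the PR's Pydantic model, the default values are placeholders rather than the project's real defaults, and the bounds are assumptions inferred from the test names mentioned in the review (for example, min_tool_turns exceeding window_size being rejected).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagnationConfigSketch:
    # Placeholder defaults; the real StagnationConfig defaults are not
    # stated in this thread.
    window_size: int = 5
    repetition_threshold: float = 0.6
    cycle_detection: bool = True
    max_corrections: int = 1
    min_tool_turns: int = 2

    def __post_init__(self) -> None:
        if self.window_size < 1:
            raise ValueError("window_size must be >= 1")
        if not 0.0 <= self.repetition_threshold <= 1.0:
            raise ValueError("repetition_threshold must be within [0, 1]")
        if self.max_corrections < 0:
            raise ValueError("max_corrections must be >= 0")
        if self.min_tool_turns < 1:
            raise ValueError("min_tool_turns must be >= 1")
        # Cross-field check: the minimum cannot exceed the window itself.
        if self.min_tool_turns > self.window_size:
            raise ValueError("min_tool_turns must not exceed window_size")
```

With `max_corrections=0` a detector configured this way would presumably skip prompt injection and terminate on the first detection, which is why zero is accepted while negative values are not.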

Changelog

CLAUDE.md
- Updated the documentation to reflect the new stagnation detection feature and its associated event constants.
docs/design/engine.md
- Documented the design and implementation details of the new stagnation detection system, including its protocol, default implementation, configuration, and intervention flow.
src/synthorg/engine/__init__.py
- Exported the new StagnationConfig, StagnationDetector, StagnationResult, StagnationVerdict, and ToolRepetitionDetector classes.
src/synthorg/engine/agent_engine.py
- Modified the AgentEngine to allow injection of a StagnationDetector and updated the default loop creation to include it, along with an updated warning message for external loop provision.
src/synthorg/engine/checkpoint/resume.py
- Ensured that approval_gate and the newly added stagnation_detector are preserved when resuming ReactLoop and PlanExecuteLoop from a checkpoint.
src/synthorg/engine/loop_helpers.py
- Added utility functions for computing tool call fingerprints and a shared check_stagnation helper to encapsulate stagnation detection logic, including prompt injection and termination.
src/synthorg/engine/loop_protocol.py
- Extended the TerminationReason enum with STAGNATION and added a tool_call_fingerprints field to TurnRecord to store deterministic hashes of tool calls.
src/synthorg/engine/plan_execute_loop.py
- Integrated the check_stagnation logic into the PlanExecuteLoop, ensuring stagnation detection is performed per step with step-scoped corrections.
src/synthorg/engine/react_loop.py
- Integrated the check_stagnation logic into the ReactLoop, performing stagnation detection after each successful turn.
src/synthorg/engine/stagnation/__init__.py
- Created a new package for stagnation detection, re-exporting its public API.
src/synthorg/engine/stagnation/detector.py
- Implemented the ToolRepetitionDetector, which uses repetition ratio and cycle detection on tool call fingerprints to identify stagnation.
src/synthorg/engine/stagnation/models.py
- Defined Pydantic models for StagnationVerdict, StagnationConfig, and StagnationResult to structure stagnation-related data.
src/synthorg/engine/stagnation/protocol.py
- Defined the StagnationDetector asynchronous protocol for detecting intra-loop stagnation.
src/synthorg/observability/events/stagnation.py
- Defined new constants for observability events related to stagnation detection.
tests/unit/engine/stagnation/test_detector.py
- Added comprehensive unit tests for the ToolRepetitionDetector, covering various scenarios of repetition and cycle detection.
tests/unit/engine/stagnation/test_fingerprint.py
- Added unit tests to verify the deterministic and order-independent computation of tool call fingerprints.
tests/unit/engine/stagnation/test_models.py
- Added unit tests for the StagnationConfig and StagnationResult models, including validation rules.
tests/unit/engine/stagnation/test_properties.py
- Added property-based tests using Hypothesis for the fingerprint computation and detector behavior.
tests/unit/engine/test_loop_helpers.py
- Updated existing tests to account for the new tool_call_fingerprints field in TurnRecord and its handling.
tests/unit/engine/test_loop_protocol.py
- Updated tests for TerminationReason and TurnRecord to reflect the addition of STAGNATION and tool_call_fingerprints.
tests/unit/engine/test_plan_execute_loop.py
- Added integration tests to verify the PlanExecuteLoop's interaction with the new stagnation detector, including step-scoped behavior.
tests/unit/engine/test_react_loop.py
- Added integration tests to verify the ReactLoop's interaction with the new stagnation detector, including prompt injection and termination.
tests/unit/observability/test_events.py
- Updated the event discovery test to include the new stagnation domain module.

Activity

The author, Aureliolo, implemented this feature.
The pull request was pre-reviewed by 9 agents, and 14 findings were addressed, including 2 critical, 4 major, 6 medium, and 2 minor issues, indicating a thorough review process and significant iteration before the PR was opened.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a robust stagnation detection mechanism to prevent agents from getting stuck in unproductive loops. The implementation includes a new StagnationDetector protocol, a ToolRepetitionDetector that uses fingerprinting and cycle detection, and configuration models. The feature is well-integrated into the existing ReactLoop and PlanExecuteLoop, with special care for per-step detection in the latter. The changes are comprehensive, covering core logic, documentation, and extensive tests, including property-based tests. My review found one area for improvement in the implementation of the corrective message generation, where a branch of code appears to be unreachable, which can be simplified.

gemini-code-assist · 2026-03-15T18:35:11Z

src/synthorg/engine/stagnation/detector.py

+def _build_corrective_message(repeated_tools: list[str]) -> str:
+    """Build a corrective user-role message for prompt injection.
+
+    Args:
+        repeated_tools: Sorted list of repeated tool fingerprints.
+
+    Returns:
+        A corrective message string.
+    """
+    if repeated_tools:
+        tool_list = ", ".join(repeated_tools)
+        return (
+            "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
+            "calls show a repeating pattern without progress. The following "
+            "tools have been called with identical arguments multiple "
+            f"times: {tool_list}. Try a different approach: modify your "
+            "arguments, use different tools, or reconsider your strategy.]"
+        )
+    return (
+        "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
+        "calls show a repeating cycle pattern without progress. Try a "
+        "different approach: modify your arguments, use different tools, "
+        "or reconsider your strategy.]"
+    )


The final return in this function (the implicit else branch) appears to be unreachable. Stagnation is detected if either the tool repetition ratio is high or a cycle is detected. In both of these cases, there will be repeated tool fingerprints, which means the repeated_tools list will not be empty. This makes the `if repeated_tools:` check always true and the fallback return dead code.

You can simplify this function by removing the conditional logic and the unreachable branch.

def _build_corrective_message(repeated_tools: list[str]) -> str:
    """Build a corrective user-role message for prompt injection.

    Args:
        repeated_tools: Sorted list of repeated tool fingerprints.

    Returns:
        A corrective message string.
    """
    tool_list = ", ".join(repeated_tools)
    return (
        "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
        "calls show a repeating pattern without progress. The following "
        "tools have been called with identical arguments multiple "
        f"times: {tool_list}. Try a different approach: modify your "
        "arguments, use different tools, or reconsider your strategy.]"
    )

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

src/synthorg/engine/react_loop.py (1)
101-189: 🛠️ Refactor suggestion | 🟠 Major

Keep execute() under the 50-line cap.

The stagnation handoff makes the main loop denser and harder to scan. Extract one iteration or the post-turn stagnation handling into a small helper so execute() stays orchestration-only.

As per coding guidelines, Functions must be < 50 lines; files must be < 800 lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/react_loop.py` around lines 101 - 189, The execute()
method exceeds the 50-line guideline because the post-turn stagnation handling
is inlined; extract that block into a helper (e.g.,
_handle_post_turn_stagnation) that accepts (ctx, turns, corrections_injected,
execution_id) and returns either an ExecutionResult or (ctx,
corrections_injected), then replace the inlined stagnation logic in execute()
with a single await call to _handle_post_turn_stagnation and handle its return
the same way as the current code; touch references to check_stagnation and the
variables ctx, corrections_injected, and turns so the new helper calls
check_stagnation and preserves the existing control flow.
src/synthorg/engine/agent_engine.py (1)
151-189: 🛠️ Refactor suggestion | 🟠 Major

Extract the new loop-wiring branch out of AgentEngine.__init__.

This constructor is already far beyond the repo's 50-line limit, and the new approval/stagnation branch adds one more responsibility to a very busy setup path. Pull external-loop validation and default-loop construction into a helper so the constructor is easier to audit.

As per coding guidelines, Functions must be < 50 lines; files must be < 800 lines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 151 - 189, The constructor
AgentEngine.__init__ is doing external-loop validation and default-loop
construction inline; extract that branch into a new helper (e.g.,
_configure_execution_loop or _setup_loop_wiring) so __init__ stays small. Move
the logic that reads execution_loop, checks self._approval_gate and
self._stagnation_detector to emit APPROVAL_GATE_LOOP_WIRING_WARNING, and assigns
self._loop (using self._make_default_loop() when execution_loop is None) into
the helper; call this helper from __init__ passing the execution_loop argument
and ensure it returns/sets self._loop and retains existing behavior around
warning message and wiring semantics.
docs/design/engine.md (1)
289-290: ⚠️ Potential issue | 🟡 Minor

Update TerminationReason enum documentation to include STAGNATION.

The enum listing at lines 289-290 does not include the new STAGNATION termination reason that this PR introduces. The documentation should be updated to reflect the complete set of possible termination reasons.
 `TerminationReason`
-:   Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`,
-    `PARKED`.  `max_turns` defaults to 20.
+:   Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`,
+    `PARKED`, `STAGNATION`.  `max_turns` defaults to 20.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/engine.md` around lines 289 - 290, The TerminationReason enum
documentation is missing the new STAGNATION value; update the enum listing in
docs/design/engine.md so the list of termination reasons under TerminationReason
includes `STAGNATION` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`,
`SHUTDOWN`, `ERROR`, and `PARKED` and ensure any explanatory note (e.g., about
`max_turns` defaulting to 20) remains intact.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/loop_protocol.py`:
- Around line 65-68: Change the type of tool_call_fingerprints from tuple[str,
...] to tuple[NotBlankStr, ...] to enforce non-blank identifiers; update the
Field declaration for tool_call_fingerprints accordingly and ensure NotBlankStr
is imported from core.types at the top of the module so the Field uses
tuple[NotBlankStr, ...] as its type annotation and validation.

In `@tests/unit/engine/stagnation/test_models.py`:
- Around line 60-106: The tests around StagnationConfig (e.g.,
test_window_size_lower_bound, test_window_size_upper_bound,
test_repetition_threshold_lower_bound, test_repetition_threshold_upper_bound,
test_repetition_threshold_negative_rejected,
test_repetition_threshold_above_one_rejected, test_max_corrections_zero,
test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound,
test_min_tool_turns_zero_rejected,
test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate
them using pytest.mark.parametrize to cover similar boundary inputs and expected
ValidationError or accepted values in a single parametrized test per field
(window_size, repetition_threshold, max_corrections, min_tool_turns). Update
tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match)
and assert by either expecting ValidationError with match or validating the
created StagnationConfig attribute, keeping references to the StagnationConfig
constructor and the specific test names to locate and replace the current
individual tests (also apply the same refactor pattern to the similar tests at
the other section noted around lines 176-195).

In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 852-879: The test's _FakeStagnationDetector currently ignores the
corrections_injected argument so the assertion that check_count == 2 can pass
even if PlanExecuteLoop never increments corrections; modify
_FakeStagnationDetector to record the corrections_injected values passed into
check (e.g., add a list attribute like recorded_corrections) and append the
corrections_injected each time check() is called, then in the test assert that
recorded_corrections == [0, 1] (repeat the same change for the other occurrence
around lines 1008-1050) to ensure PlanExecuteLoop actually forwards the
incrementing correction counts.
- Around line 966-1006: The test currently only asserts
detector.last_turns_count == 1 which could be satisfied by a turn from step 1;
update the test to assert that the detector actually received the step-2 turn by
capturing the inspected turn(s) from the fake detector (the
_FakeStagnationDetector instance stored in detector) and asserting properties
that identify the step-2 tool turn (e.g., check
detector.last_inspected_turns[0].tool_calls_made or the inspected turn's tool
call id equals "tc-1" produced by _tool_use_response("echo","tc-1")); modify or
extend _FakeStagnationDetector to record and expose the last inspected turns if
needed, then replace the existing last_turns_count assertion with an assertion
that the inspected turn's tool_calls_made (or tool call id) matches the expected
step-2 value.

In `@tests/unit/engine/test_react_loop.py`:
- Around line 1039-1066: The test's fake stagnation detector
(_FakeStagnationDetector) must record the corrections_injected values passed to
its async check(...) method so the test can assert they were incremented; add an
attribute (e.g., corrections_args: list[int]) to _FakeStagnationDetector, append
the corrections_injected parameter inside check(...), and update the test
assertions (the places around the existing check_count assertions) to assert
that detector.corrections_args == [0, 1] (instead of relying only on
check_count). Apply the same change to the other instance referenced around
lines 1160-1198.
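
The recording fake suggested in the test comments above could look roughly like this. It is a sketch only: `RecordingDetector`, the string verdicts, and `run_toy_loop` are hypothetical stand-ins for the repository's `_FakeStagnationDetector`, its verdict enum, and the loops under test.

```python
import asyncio


class RecordingDetector:
    """Hypothetical fake that records each corrections_injected it receives."""

    def __init__(self, verdicts: list[str]) -> None:
        self._verdicts = list(verdicts)
        self.corrections_args: list[int] = []

    async def check(self, turns: tuple, *, corrections_injected: int = 0) -> str:
        self.corrections_args.append(corrections_injected)
        # Replay the scripted verdict for this call.
        return self._verdicts[len(self.corrections_args) - 1]


async def run_toy_loop(detector: RecordingDetector, max_turns: int) -> str:
    """Stand-in for the loop under test: forwards and increments the counter."""
    corrections = 0
    for _ in range(max_turns):
        verdict = await detector.check((), corrections_injected=corrections)
        if verdict == "terminate":
            return "stagnation"
        if verdict == "inject_prompt":
            corrections += 1
    return "max_turns"


detector = RecordingDetector(["inject_prompt", "terminate"])
outcome = asyncio.run(run_toy_loop(detector, max_turns=5))
```

Asserting on `detector.corrections_args == [0, 1]` fails if the loop never increments its counter, whereas a bare call-count assertion would still pass in that case.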

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 42dde218-8444-4f6d-b77f-bae296aa6b20

📥 Commits

Reviewing files that changed from the base of the PR and between 2feed09 and 1fcd75e.

📒 Files selected for processing (24)

CLAUDE.md
docs/design/engine.md
src/synthorg/engine/__init__.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/checkpoint/resume.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/react_loop.py
src/synthorg/engine/stagnation/__init__.py
src/synthorg/engine/stagnation/detector.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/stagnation/protocol.py
src/synthorg/observability/events/stagnation.py
tests/unit/engine/stagnation/__init__.py
tests/unit/engine/stagnation/test_detector.py
tests/unit/engine/stagnation/test_fingerprint.py
tests/unit/engine/stagnation/test_models.py
tests/unit/engine/stagnation/test_properties.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
tests/unit/engine/test_plan_execute_loop.py
tests/unit/engine/test_react_loop.py
tests/unit/observability/test_events.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Test (Python 3.14)
GitHub Check: Build Backend
GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) instead of except (A, B): — PEP 758 syntax enforced by ruff on Python 3.14
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Use @computed_field in Pydantic models for derived values instead of storing and validating redundant fields
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent calls) instead of bare create_task
Line length: 88 characters (enforced by ruff)
Functions must be < 50 lines; files must be < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate input at system boundaries (user input, external APIs, config files)
Create new objects instead of mutating existing ones (immutability principle)

Files:

src/synthorg/engine/loop_protocol.py
src/synthorg/engine/stagnation/__init__.py
tests/unit/observability/test_events.py
src/synthorg/engine/__init__.py
src/synthorg/engine/stagnation/protocol.py
tests/unit/engine/test_react_loop.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/checkpoint/resume.py
tests/unit/engine/stagnation/test_fingerprint.py
tests/unit/engine/stagnation/test_properties.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
src/synthorg/engine/stagnation/detector.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/observability/events/stagnation.py
src/synthorg/engine/react_loop.py
src/synthorg/engine/agent_engine.py
tests/unit/engine/stagnation/test_detector.py
tests/unit/engine/test_plan_execute_loop.py
tests/unit/engine/stagnation/test_models.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must import from synthorg.observability import get_logger and define logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the synthorg logger instead
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider); import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always log with structured kwargs: logger.info(EVENT, key=value) — never use old-style formatting logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never implement retry logic in driver subclasses or calling code — all provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig; retryable errors are RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state
For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use Pydantic v2 conventions: BaseModel, model_validator, computed_field, ConfigDict

Files:

src/synthorg/engine/loop_protocol.py
src/synthorg/engine/stagnation/__init__.py
src/synthorg/engine/__init__.py
src/synthorg/engine/stagnation/protocol.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/checkpoint/resume.py
src/synthorg/engine/stagnation/detector.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/observability/events/stagnation.py
src/synthorg/engine/react_loop.py
src/synthorg/engine/agent_engine.py

docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

ALWAYS read the relevant design page before implementing any feature or planning any issue. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec

Files:

docs/design/engine.md

docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

All docstrings in public APIs must be documented in Google style and reflected in docs/api/ auto-generated library reference (via mkdocstrings + Griffe AST-based parsing)

Files:

docs/design/engine.md

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Maintain 80% minimum code coverage (enforced in CI)
Each test must complete within 30 seconds (timeout enforcement)
Always include -n auto when running pytest via uv run python -m pytest — never run tests sequentially (pytest-xdist parallelism)
Prefer @pytest.mark.parametrize for testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names like example-provider, example-large-001, test-provider, test-small-001, or size aliases (large/medium/small)
Use Hypothesis for property-based testing with @given + @settings decorators; control profiles via HYPOTHESIS_PROFILE env var (ci for 200 examples, dev for 1000 examples)

Files:

tests/unit/observability/test_events.py
tests/unit/engine/test_react_loop.py
tests/unit/engine/stagnation/test_fingerprint.py
tests/unit/engine/stagnation/test_properties.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
tests/unit/engine/stagnation/test_detector.py
tests/unit/engine/test_plan_execute_loop.py
tests/unit/engine/stagnation/test_models.py

🧠 Learnings (16)

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

tests/unit/observability/test_events.py
CLAUDE.md
src/synthorg/engine/loop_helpers.py
src/synthorg/observability/events/stagnation.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

tests/unit/observability/test_events.py
CLAUDE.md
src/synthorg/engine/loop_helpers.py
src/synthorg/observability/events/stagnation.py

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

tests/unit/engine/stagnation/test_properties.py

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

tests/unit/engine/stagnation/test_properties.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

CLAUDE.md
src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always log with structured kwargs: `logger.info(EVENT, key=value)` — never use old-style formatting `logger.info("msg %s", val)`

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

CLAUDE.md
src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

CLAUDE.md

📚 Learning: 2026-03-15T12:05:56.884Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the synthorg logger instead

Applied to files:

CLAUDE.md

🧬 Code graph analysis (14)

src/synthorg/engine/__init__.py (3)

src/synthorg/engine/stagnation/models.py (3)

StagnationConfig (23-81)

StagnationResult (84-137)

StagnationVerdict (15-20)

src/synthorg/engine/stagnation/protocol.py (1)

StagnationDetector (15-46)

src/synthorg/engine/stagnation/detector.py (1)

ToolRepetitionDetector (27-116)

tests/unit/engine/test_react_loop.py (4)

src/synthorg/engine/stagnation/detector.py (2)

get_detector_type (50-52)

check (54-103)

tests/unit/engine/test_loop_helpers.py (5)

_ctx_with_user_msg (93-97)

_stop_response (47-53)

execute (77-85)

_tool_use_response (56-66)

_make_invoker (88-90)

src/synthorg/engine/react_loop.py (3)

ReactLoop (59-327)

stagnation_detector (93-95)

execute (101-196)

src/synthorg/engine/stagnation/models.py (2)

StagnationResult (84-137)

StagnationVerdict (15-20)

src/synthorg/engine/checkpoint/resume.py (2)

src/synthorg/engine/react_loop.py (3)

ReactLoop (59-327)

approval_gate (88-90)

stagnation_detector (93-95)

src/synthorg/engine/plan_execute_loop.py (2)

approval_gate (115-117)

stagnation_detector (120-122)

tests/unit/engine/stagnation/test_fingerprint.py (1)

src/synthorg/engine/loop_helpers.py (1)

compute_fingerprints (478-505)

tests/unit/engine/stagnation/test_properties.py (4)

src/synthorg/engine/loop_helpers.py (1)

compute_fingerprints (478-505)

src/synthorg/engine/loop_protocol.py (1)

TurnRecord (40-81)

src/synthorg/engine/stagnation/detector.py (2)

ToolRepetitionDetector (27-116)

check (54-103)

src/synthorg/engine/stagnation/models.py (2)

StagnationConfig (23-81)

StagnationVerdict (15-20)

tests/unit/engine/test_loop_helpers.py (1)

src/synthorg/engine/loop_helpers.py (2)

make_turn_record (459-475)

clear_last_turn_tool_calls (424-437)

tests/unit/engine/test_loop_protocol.py (1)

src/synthorg/engine/loop_protocol.py (2)

TerminationReason (28-37)

TurnRecord (40-81)

src/synthorg/engine/stagnation/detector.py (3)

src/synthorg/engine/loop_protocol.py (1)

TurnRecord (40-81)

src/synthorg/engine/stagnation/models.py (3)

StagnationConfig (23-81)

StagnationResult (84-137)

StagnationVerdict (15-20)

src/synthorg/engine/stagnation/protocol.py (2)

get_detector_type (44-46)

check (26-42)

src/synthorg/engine/loop_helpers.py (4)

src/synthorg/engine/stagnation/models.py (1)

StagnationVerdict (15-20)

src/synthorg/engine/stagnation/protocol.py (2)

StagnationDetector (15-46)

check (26-42)

src/synthorg/engine/loop_protocol.py (1)

TurnRecord (40-81)

src/synthorg/engine/stagnation/detector.py (1)

check (54-103)

src/synthorg/engine/plan_execute_loop.py (4)

src/synthorg/engine/loop_helpers.py (1)

check_stagnation (526-600)

src/synthorg/engine/stagnation/protocol.py (1)

StagnationDetector (15-46)

src/synthorg/engine/stagnation/detector.py (1)

config (46-48)

src/synthorg/engine/loop_protocol.py (1)

ExecutionResult (84-145)

src/synthorg/engine/react_loop.py (4)

src/synthorg/engine/loop_helpers.py (1)

check_stagnation (526-600)

src/synthorg/engine/stagnation/protocol.py (1)

StagnationDetector (15-46)

src/synthorg/engine/plan_execute_loop.py (2)

stagnation_detector (120-122)

approval_gate (115-117)

src/synthorg/engine/loop_protocol.py (1)

ExecutionResult (84-145)

src/synthorg/engine/agent_engine.py (2)

src/synthorg/engine/stagnation/protocol.py (1)

StagnationDetector (15-46)

src/synthorg/engine/react_loop.py (3)

stagnation_detector (93-95)

ReactLoop (59-327)

approval_gate (88-90)

tests/unit/engine/stagnation/test_detector.py (3)

src/synthorg/engine/loop_protocol.py (1)

TurnRecord (40-81)

src/synthorg/engine/stagnation/detector.py (5)

ToolRepetitionDetector (27-116)

check (54-103)

get_detector_type (50-52)

config (46-48)

_detect_cycle (153-173)

src/synthorg/engine/stagnation/models.py (2)

StagnationConfig (23-81)

StagnationVerdict (15-20)

tests/unit/engine/stagnation/test_models.py (1)

src/synthorg/engine/stagnation/models.py (3)

StagnationConfig (23-81)

StagnationResult (84-137)

StagnationVerdict (15-20)

🔇 Additional comments (36)

src/synthorg/observability/events/stagnation.py (1)

5-8: Looks good — stagnation events are correctly centralized as constants.

src/synthorg/engine/loop_protocol.py (1)

36-36: TerminationReason.STAGNATION addition is correct and consistent with loop termination semantics.

tests/unit/observability/test_events.py (1)

217-217: Good update — domain discovery expectation now includes the new stagnation events module.

src/synthorg/engine/stagnation/__init__.py (1)

10-24: Public stagnation API re-exports are clean and well-scoped.

tests/unit/engine/test_loop_protocol.py (1)

34-37: Tests are correctly aligned with the new STAGNATION reason and fingerprint field behavior.

Also applies to: 80-93

CLAUDE.md (1)

122-122: Documentation updates correctly capture the new stagnation subsystem and observability events.

Also applies to: 190-190

tests/unit/engine/stagnation/test_models.py (1)

13-59: Great coverage depth for stagnation model invariants and validation paths.

Also applies to: 108-202

tests/unit/engine/stagnation/test_fingerprint.py (1)

13-101: Fingerprint tests are solid and cover determinism, canonicalization, and output format well.

src/synthorg/engine/stagnation/protocol.py (1)

14-46: LGTM — the protocol stays intentionally minimal.

check() and get_detector_type() are the only hooks the loops need, and keeping check() async leaves room for service-backed detectors without another API break.

src/synthorg/engine/checkpoint/resume.py (1)

115-127: Nice preservation of loop-scoped collaborators.

Rebuilding the loop with both approval_gate and stagnation_detector keeps resumed executions behaviorally aligned with the original loop instance.

tests/unit/engine/test_loop_helpers.py (1)

522-533: Good coverage around the fingerprint lifecycle.

These assertions protect both sides of the TurnRecord.tool_call_fingerprints contract: creation in make_turn_record() and cleanup in clear_last_turn_tool_calls().

Also applies to: 623-637

docs/design/engine.md (1)

498-563: Well-documented stagnation detection design.

The Stagnation Detection section comprehensively covers the protocol interface, default implementation, configuration options, intervention flow, and loop integration semantics. The distinction between loop-scoped (ReactLoop) and step-scoped (PlanExecuteLoop) correction counters is clearly documented.

src/synthorg/engine/plan_execute_loop.py (3)

96-107: LGTM: Constructor properly accepts and stores stagnation detector.

The constructor signature and storage pattern matches the ReactLoop implementation, maintaining consistency across execution loop implementations.

114-122: LGTM: Read-only properties for checkpoint resume support.

The approval_gate and stagnation_detector properties enable checkpoint resume to preserve these components, as specified in the PR objectives.

653-686: LGTM: Step-scoped stagnation detection correctly implemented.

The implementation properly:

Initializes step_start_idx and step_corrections at step entry for step-scoped windowing

Passes only current step's turns (turns[step_start_idx:]) to the detector

Includes step_number for observability

Handles all three return types from check_stagnation

This aligns with the design spec stating "corrections counter is step-scoped, window resets across step boundaries."

tests/unit/engine/stagnation/test_properties.py (3)

47-48: Good: Tool name strategy correctly filters whitespace-only strings.

The filter lambda s: s.strip() ensures generated tool names comply with NotBlankStr validation, preventing invalid test inputs.

55-89: LGTM: Fingerprint property tests.

The tests verify two key properties:

Determinism: Same name + args always produces identical fingerprints

Format: Fingerprints match the name:16-char-hex format specification

Using rpartition(":") correctly handles tool names containing colons.
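
To illustrate the point about colons, here is a minimal sketch of why splitting from the right is the safer choice. The fingerprint string is a made-up example, not one taken from the test suite.

```python
# Hypothetical fingerprint whose tool name itself contains a colon.
fingerprint = "fs:read:0123456789abcdef"

# rpartition splits on the LAST colon, so the name stays intact.
name, _, digest = fingerprint.rpartition(":")

# partition splits on the FIRST colon and would cut the name short.
wrong_name, _, wrong_rest = fingerprint.partition(":")

print(name, digest)            # name keeps its inner colon
print(wrong_name, wrong_rest)  # name is truncated at the first colon
```

Because the hash suffix is hex-only and never contains a colon, the last colon is always the separator.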

94-116: No action needed. The async test methods correctly omit explicit @pytest.mark.asyncio decorators. The asyncio_mode = "auto" configuration in pyproject.toml (line 227) enables automatic async test discovery, which is the required pattern for this codebase per the coding guidelines.

> Likely an incorrect or invalid review comment.

tests/unit/engine/stagnation/test_detector.py (5)

14-27: LGTM: Well-designed test helper.

The _turn helper correctly derives tool_calls_made from fingerprints and sets appropriate finish_reason based on tool presence. This ensures test data is internally consistent.

30-83: LGTM: Comprehensive NO_STAGNATION test coverage.

Tests cover all paths that should return NO_STAGNATION:

Empty turns

Unique fingerprints

Below min_tool_turns threshold

Turns without tool calls

Disabled detector

Below repetition threshold

85-138: LGTM: INJECT_PROMPT and TERMINATE verdict tests.

Tests verify:

High repetition triggers INJECT_PROMPT with corrective message

Cycle detection triggers INJECT_PROMPT with cycle_length

Exceeding max_corrections triggers TERMINATE

max_corrections=0 skips INJECT_PROMPT and goes directly to TERMINATE

228-232: LGTM: Protocol conformance test.

Verifying isinstance(detector, StagnationDetector) confirms the implementation satisfies the runtime-checkable protocol.
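
For readers unfamiliar with runtime-checkable protocols, the following is a minimal sketch of what such a conformance check exercises. The names `StagnationDetectorSketch` and `FixedDetector`, the `check()` parameters, and the return value are assumptions for illustration, not the project's actual definitions.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class StagnationDetectorSketch(Protocol):
    """Hypothetical stand-in for the StagnationDetector protocol."""

    async def check(
        self, turns: tuple[object, ...], *, corrections_injected: int = 0
    ) -> object: ...

    def get_detector_type(self) -> str: ...


class FixedDetector:
    """Conforms structurally; it never inherits from the protocol."""

    async def check(
        self, turns: tuple[object, ...], *, corrections_injected: int = 0
    ) -> object:
        return "no_stagnation"  # placeholder verdict for the sketch

    def get_detector_type(self) -> str:
        return "fixed"


conforms = isinstance(FixedDetector(), StagnationDetectorSketch)
rejects = isinstance(object(), StagnationDetectorSketch)
verdict = asyncio.run(FixedDetector().check(()))
```

Note that a runtime `isinstance` check against a protocol only confirms that the named methods exist; it does not validate their signatures or that `check()` is a coroutine, which is what mypy strict covers.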

285-327: LGTM: Exact repetition ratio verification.

Testing precise ratio calculations (0.8, 0.5, 0.0) with pytest.approx verifies the algorithm's correctness on edge cases.

src/synthorg/engine/stagnation/models.py (4)

15-20: LGTM: StagnationVerdict enum.

Clean StrEnum with clear semantics for the three possible outcomes.

23-81: LGTM: StagnationConfig model.

Well-designed frozen config model with:

Sensible defaults (window_size=5, repetition_threshold=0.6, max_corrections=1)

Appropriate constraints (ge/le bounds)

Cross-field validation ensuring min_tool_turns <= window_size

84-137: LGTM: StagnationResult model with proper validation.

The model correctly:

Deep-copies details dict at construction boundary (lines 124-125)

Enforces corrective_message is required for INJECT_PROMPT and forbidden otherwise

Uses frozen config for immutability

Based on learnings: "frozen Pydantic models for config/identity" and "deep-copy at system boundaries."

140-142: LGTM: Reusable NO_STAGNATION_RESULT singleton.

Safe to reuse since StagnationResult is frozen. This avoids allocating new objects for the common no-stagnation case.

src/synthorg/engine/loop_helpers.py (4)

435-437: LGTM: Consistent clearing of tool call metadata.

Both tool_calls_made and tool_call_fingerprints are cleared together, maintaining consistency when shutdown fires before tool execution.

478-505: LGTM: Deterministic fingerprint computation.

The implementation correctly:

Canonicalizes JSON with sort_keys=True and compact separators

Uses default=str to handle non-JSON-serializable argument values

Truncates SHA-256 to 16 hex chars (64 bits, sufficient for fingerprint uniqueness within a detection window)

Returns sorted tuple for order-independent comparison across turns
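
The four points above can be sketched as follows. This is an illustrative reconstruction from the description, not the code in loop_helpers.py: the function name `compute_fingerprints_sketch` and its input shape (a list of name/arguments pairs) are assumptions.

```python
import hashlib
import json
from typing import Any


def compute_fingerprints_sketch(
    tool_calls: list[tuple[str, dict[str, Any]]],
) -> tuple[str, ...]:
    """Return sorted ``name:hash16`` fingerprints for one turn (sketch)."""
    fingerprints = []
    for name, arguments in tool_calls:
        # Canonical JSON: sorted keys, compact separators, and str() as a
        # fallback for values json cannot serialize natively.
        canonical = json.dumps(
            arguments,
            sort_keys=True,
            separators=(",", ":"),
            default=str,
        )
        # Keep the first 16 hex characters (64 bits) of the SHA-256 digest.
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        fingerprints.append(f"{name}:{digest}")
    # Sorting makes the per-turn result independent of call order.
    return tuple(sorted(fingerprints))


same_a = compute_fingerprints_sketch([("search", {"q": "x", "n": 1})])
same_b = compute_fingerprints_sketch([("search", {"n": 1, "q": "x"})])
```

Under these assumptions, `same_a` and `same_b` are equal because key order is normalized before hashing.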

526-599: LGTM: Well-structured stagnation check helper.

The implementation:

Gracefully handles detector failures (logs and continues execution)

Re-raises MemoryError and RecursionError per project conventions

Uses appropriate log levels (WARNING for termination, INFO for correction injection)

Attaches stagnation metadata to the result for observability

577-582: ✓ TerminationReason.STAGNATION is properly defined in the enum.

The enum value is correctly defined at line 36 of loop_protocol.py as STAGNATION = "stagnation".

src/synthorg/engine/stagnation/detector.py (5)

27-53: LGTM: ToolRepetitionDetector class structure.

The class correctly:

Defaults to StagnationConfig() when no config is provided

Exposes configuration via read-only property

Returns a meaningful detector type identifier

54-116: LGTM: Stagnation check logic.

The dual-signal detection approach is well-implemented:

Early exit for disabled config or insufficient tool turns

Repetition ratio computed across all fingerprints in window

Cycle detection runs if enabled

Either signal triggers stagnation detection

The window extraction correctly filters to tool-bearing turns and takes the most recent window_size entries.

119-138: LGTM: Correct repetition ratio computation.

The formula sum(c - 1 for c in counts.values() if c > 1) / total correctly counts excess occurrences as duplicates:

5 identical → 4/5 = 0.8

A,A,B,B → 2/4 = 0.5

All unique → 0/n = 0.0
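
The three worked values above can be reproduced with a small standalone sketch built around the quoted formula. The function name and the empty-input behavior are assumptions; only the `sum(c - 1 ...) / total` expression comes from the review.

```python
from collections import Counter
from collections.abc import Sequence


def repetition_ratio_sketch(fingerprints: Sequence[str]) -> float:
    """Fraction of fingerprints that are excess repeats of an earlier one."""
    total = len(fingerprints)
    if total == 0:
        return 0.0  # assumed convention for an empty window
    counts = Counter(fingerprints)
    # Each fingerprint seen c times contributes c - 1 duplicates.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return duplicates / total


five_identical = repetition_ratio_sketch(["A"] * 5)
two_pairs = repetition_ratio_sketch(["A", "A", "B", "B"])
all_unique = repetition_ratio_sketch(["A", "B", "C"])
```

One consequence of counting only excess occurrences is that the ratio can never reach 1.0 for a non-empty window, since the first occurrence of each fingerprint is not a duplicate.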

153-173: LGTM: Cycle detection algorithm.

The algorithm correctly identifies repeating patterns by comparing the last k turns with the preceding k turns, starting from k=2 to find the shortest cycle first.

176-237: LGTM: Result building and corrective messaging.

The helper functions correctly:

Choose INJECT_PROMPT vs TERMINATE based on corrections_injected < max_corrections

Build informative corrective messages with repeated tool names

Include repeated_tools in details for debugging

Log the STAGNATION_DETECTED event with all relevant context

src/synthorg/engine/loop_protocol.py

coderabbitai · 2026-03-15T18:42:14Z

tests/unit/engine/stagnation/test_models.py

+    def test_window_size_lower_bound(self) -> None:
+        with pytest.raises(ValidationError, match="greater than or equal to 2"):
+            StagnationConfig(window_size=1)
+
+    def test_window_size_upper_bound(self) -> None:
+        with pytest.raises(ValidationError, match="less than or equal to 50"):
+            StagnationConfig(window_size=51)
+
+    def test_repetition_threshold_lower_bound(self) -> None:
+        config = StagnationConfig(repetition_threshold=0.0)
+        assert config.repetition_threshold == 0.0
+
+    def test_repetition_threshold_upper_bound(self) -> None:
+        config = StagnationConfig(repetition_threshold=1.0)
+        assert config.repetition_threshold == 1.0
+
+    def test_repetition_threshold_negative_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            StagnationConfig(repetition_threshold=-0.1)
+
+    def test_repetition_threshold_above_one_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            StagnationConfig(repetition_threshold=1.1)
+
+    def test_max_corrections_zero(self) -> None:
+        config = StagnationConfig(max_corrections=0)
+        assert config.max_corrections == 0
+
+    def test_max_corrections_negative_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            StagnationConfig(max_corrections=-1)
+
+    def test_min_tool_turns_lower_bound(self) -> None:
+        config = StagnationConfig(min_tool_turns=1)
+        assert config.min_tool_turns == 1
+
+    def test_min_tool_turns_zero_rejected(self) -> None:
+        with pytest.raises(ValidationError):
+            StagnationConfig(min_tool_turns=0)
+
+    def test_min_tool_turns_exceeds_window_size_rejected(self) -> None:
+        with pytest.raises(
+            ValidationError,
+            match=r"min_tool_turns.*exceeds.*window_size",
+        ):
+            StagnationConfig(window_size=3, min_tool_turns=4)
+


🧹 Nitpick | 🔵 Trivial

Consolidate repetitive boundary tests with @pytest.mark.parametrize.

These cases are strong but repetitive; parameterization would reduce duplication and simplify future additions.

As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases".

Also applies to: 176-195

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/stagnation/test_models.py` around lines 60 - 106, The tests around StagnationConfig (e.g., test_window_size_lower_bound, test_window_size_upper_bound, test_repetition_threshold_lower_bound, test_repetition_threshold_upper_bound, test_repetition_threshold_negative_rejected, test_repetition_threshold_above_one_rejected, test_max_corrections_zero, test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound, test_min_tool_turns_zero_rejected, test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate them using pytest.mark.parametrize to cover similar boundary inputs and expected ValidationError or accepted values in a single parametrized test per field (window_size, repetition_threshold, max_corrections, min_tool_turns). Update tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match) and assert by either expecting ValidationError with match or validating the created StagnationConfig attribute, keeping references to the StagnationConfig constructor and the specific test names to locate and replace the current individual tests (also apply the same refactor pattern to the similar tests at the other section noted around lines 176-195).

tests/unit/engine/test_plan_execute_loop.py

tests/unit/engine/test_react_loop.py

codecov · 2026-03-15T18:50:56Z

Codecov Report

❌ Patch coverage is 99.30556% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.72%. Comparing base (24a0d7a) to head (f24b1cf).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
src/synthorg/engine/agent_engine.py	66.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #458      +/-   ##
==========================================
+ Coverage   93.67%   93.72%   +0.05%     
==========================================
  Files         469      474       +5     
  Lines       22219    22471     +252     
  Branches     2143     2166      +23     
==========================================
+ Hits        20814    21062     +248     
- Misses       1095     1098       +3     
- Partials      310      311       +1


Add stagnation detection that analyzes TurnRecord tool-call fingerprints across a sliding window, intervenes with corrective prompt injection, and terminates early with STAGNATION if correction fails.

- StagnationDetector protocol with async check() method
- ToolRepetitionDetector: dual-signal (repetition ratio + cycle detection)
- StagnationConfig: window_size, repetition_threshold, cycle_detection, max_corrections, min_tool_turns
- STAGNATION termination reason + tool_call_fingerprints on TurnRecord
- Fingerprint computation: name:sha256(canonical_json)[:16], sorted
- ReactLoop integration: loop-scoped corrections counter
- PlanExecuteLoop integration: per-step scoped detection
- AgentEngine wiring via stagnation_detector parameter
- Observability events: check_performed, detected, correction_injected, terminated
- Design spec: engine.md stagnation detection section
- 54 new tests (models, detector, fingerprints, Hypothesis properties)
- Extended loop_protocol, loop_helpers, react_loop, plan_execute_loop tests

Add 'stagnation' to the expected domain modules set in test_all_domain_modules_discovered.

- Fix checkpoint resume dropping stagnation_detector and approval_gate (CRITICAL: make_loop_with_callback now forwards both via properties)
- Add error handling around stagnation_detector.check() calls (CRITICAL: wrap in except MemoryError, RecursionError / except Exception)
- Extract shared check_stagnation() helper to loop_helpers.py (removes duplicated logic from ReactLoop + PlanExecuteLoop)
- Refactor ToolRepetitionDetector.check() into smaller methods (_extract_window, _compute_repetition_ratio, _build_stagnation_result)
- Add approval_gate/stagnation_detector read-only properties to both loops
- Update agent_engine warning to mention stagnation_detector
- Deep-copy StagnationResult.details at construction (matches ExecutionResult)
- Add ge=2 constraint to cycle_length field
- Add cross-field validator: min_tool_turns <= window_size
- Handle empty repeated_tools in corrective message (cycle-only trigger)
- Update CLAUDE.md Package Structure with stagnation/
- Add _detect_cycle direct tests (6 cases: short/cycle2/cycle3/almost/shortest/empty)
- Add repetition ratio exact-value tests (3 cases)
- Add protocol conformance test (isinstance check)
- Add PlanExecuteLoop step corrections counter increment test

Pre-reviewed by 9 agents, 14 findings addressed

…and Gemini

Critical fixes:
- Fix incomplete turns in PlanExecuteLoop STAGNATION result (rebuild with full turns)
- Add direct unit tests for check_stagnation() (8 tests: verdicts, exception paths)
- Add checkpoint resume tests verifying stagnation_detector preservation
- Fix TerminationReason docs missing STAGNATION in enum list and post-execution transitions

Major fixes:
- Refactor check_stagnation into two functions (<50 lines each)
- Fix inaccurate repetition_threshold boundary semantics in docstring
- Fix repetition ratio description in design spec
- Add turn context to stagnation error log (consistent with peer functions)
- Use NotBlankStr for tool_call_fingerprints (per CLAUDE.md convention)
- Remove dead else branch in _build_corrective_message (Gemini finding)
- Add AgentEngine stagnation_detector wiring test
- Narrow exception log message to type-only (security hardening)

Medium fixes:
- Add tool_call_fingerprints to TurnRecord Attributes docstring
- Add Args section to check_stagnation docstring
- Fix _compute_repetition_ratio docstring precision
- Enhance fake detectors: track corrections_seen, use TurnRecord types
- Verify step-2 turn identity in step-scoped test

Cast _loop to ReactLoop via isinstance before accessing stagnation_detector property (ExecutionLoop protocol does not expose it).

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/synthorg/engine/plan_execute_loop.py (1)

88-94: ⚠️ Potential issue | 🟡 Minor

Document stagnation_detector in the public API docs.

The class Args: block still stops at approval_gate, so the new constructor parameter is undocumented.

📝 Suggested docstring update

     Args:
         config: Loop configuration.  Defaults to ``PlanExecuteConfig()``.
         checkpoint_callback: Optional per-turn checkpoint callback.
         approval_gate: Optional gate that checks for pending escalations
             after tool execution and parks the agent when approval is
             required.  ``None`` disables approval checks.
+        stagnation_detector: Optional detector used to identify
+            repetitive tool-use loops within a step. ``None`` disables
+            stagnation checks.

As per coding guidelines: "Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this."

Also applies to: 96-103

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/plan_execute_loop.py` around lines 88 - 94, The docstring
for the public class or function that defines the loop (inspect the
PlanExecuteLoop class / __init__ in plan_execute_loop.py) is missing
documentation for the new constructor parameter stagnation_detector; update the
Google-style Args: block to add an entry for stagnation_detector describing its
purpose, expected type (e.g., callable or StagnationDetector), behavior (how it
detects/handles stagnation and whether None disables it), and default value, and
mirror this addition where the Args block continues around lines 96–103 so the
public API docs include the new parameter.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/engine.md`:
- Around line 468-472: The earlier “Transition sequences” summary table is out
of sync with the later text: add STAGNATION and PARKED to the table’s list of
termination reasons that leave the task in its current state (alongside the
existing MAX_TURNS / BUDGET entries) so the lifecycle outcomes match; update the
table rows/columns that mention termination reasons and their resulting task
state to include the STAGNATION and PARKED symbols and ensure the description
for those entries matches the wording used later in the document.

In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 675-690: The code currently reassigns the shared ctx when
check_stagnation() returns an updated AgentContext, which leaks step-scoped
corrective prompts into later steps; change the tuple unpack to capture the
returned context into a new local variable (e.g., step_ctx or ctx_for_check) and
only use that local context for handling this step's logic and step_corrections,
leaving the outer ctx untouched (keep using ctx elsewhere), and ensure
step_corrections is applied only to the current step flow; do not reassign ctx
from the result of check_stagnation() so corrective prompts remain step-scoped
(references: check_stagnation, ctx, step_corrections, stag_outcome,
ExecutionResult, model_copy).

In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 924-927: The assertion is too weak and may pass if _execute_step()
returns only a step-local slice; change the test to assert that the full two
turns (planning and tool-use) are present by using a stricter check on
result.turns (e.g., assert len(result.turns) == 2) and/or assert that one of the
turns is the planning turn (inspect turn.role or turn.type) so the STAGNATION
path reconstructs the complete turn sequence rather than just a step slice.

---

Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 88-94: The docstring for the public class or function that defines
the loop (inspect the PlanExecuteLoop class / __init__ in plan_execute_loop.py)
is missing documentation for the new constructor parameter stagnation_detector;
update the Google-style Args: block to add an entry for stagnation_detector
describing its purpose, expected type (e.g., callable or StagnationDetector),
behavior (how it detects/handles stagnation and whether None disables it), and
default value, and mirror this addition where the Args block continues around
lines 96–103 so the public API docs include the new parameter.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6425edb9-dec5-4fc9-a018-a17beef932c3

📥 Commits

Reviewing files that changed from the base of the PR and between 1fcd75e and d6f1137.

📒 Files selected for processing (11)

docs/design/engine.md
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/stagnation/detector.py
src/synthorg/engine/stagnation/models.py
tests/unit/engine/checkpoint/test_resume.py
tests/unit/engine/test_agent_engine.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_plan_execute_loop.py
tests/unit/engine/test_react_loop.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Analyze (python)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python code—Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) in exception handlers—PEP 758 except syntax for Python 3.14. Ruff enforces this.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this.
Prefer immutability—create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Handle errors explicitly—never silently swallow exceptions.
Use a line length of 88 characters (ruff enforced).

Files:

tests/unit/engine/test_agent_engine.py
tests/unit/engine/test_react_loop.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/loop_helpers.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/checkpoint/test_resume.py
src/synthorg/engine/stagnation/models.py
tests/unit/engine/test_plan_execute_loop.py
src/synthorg/engine/stagnation/detector.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test organization.
Use asyncio_mode = "auto" in pytest configuration—no manual @pytest.mark.asyncio needed on async tests.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis for property-based testing with @given + @settings. Run dev profile with HYPOTHESIS_PROFILE=dev for 1000 examples.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.
Do NOT use vendor names (Anthropic, OpenAI, Claude, GPT) in tests. Use test-provider, test-small-001, etc.

Files:

tests/unit/engine/test_agent_engine.py
tests/unit/engine/test_react_loop.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/checkpoint/test_resume.py
tests/unit/engine/test_plan_execute_loop.py

src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with adopted conventions: use @computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Use structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines.
Validate at system boundaries (user input, external APIs, config files).
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name for the module logger (not _logger, not log).
Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Log structured data with logger.info(EVENT, key=value)—never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG level logging is for object creation, internal flow, and entry/exit of key functions.
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig.
Retryable errors are RateLimitError, P...

Files:

src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/stagnation/detector.py

src/synthorg/{engine,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.

Files:

src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/stagnation/models.py
src/synthorg/engine/stagnation/detector.py

docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Markdown documentation files must follow Zensical build conventions with mkdocs.yml at repo root.

Files:

docs/design/engine.md

🧠 Learnings (11)

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.

Applied to files:

src/synthorg/engine/plan_execute_loop.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

src/synthorg/engine/loop_protocol.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

src/synthorg/engine/loop_protocol.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

src/synthorg/engine/loop_helpers.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T11:48:14.867Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

src/synthorg/engine/stagnation/models.py

📚 Learning: 2026-03-15T19:03:01.705Z

Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

src/synthorg/engine/stagnation/models.py

🧬 Code graph analysis (8)

tests/unit/engine/test_agent_engine.py (1)

src/synthorg/engine/react_loop.py (1)

ReactLoop (59-327)

tests/unit/engine/test_react_loop.py (3)

src/synthorg/engine/loop_protocol.py (3)

TerminationReason (28-37)

TurnRecord (40-83)

execute (166-192)

src/synthorg/engine/react_loop.py (3)

ReactLoop (59-327)

stagnation_detector (93-95)

execute (101-196)

src/synthorg/engine/stagnation/models.py (2)

StagnationResult (87-140)

StagnationVerdict (15-20)

src/synthorg/engine/plan_execute_loop.py (5)

src/synthorg/engine/loop_helpers.py (1)

check_stagnation (526-587)

src/synthorg/engine/stagnation/protocol.py (1)

StagnationDetector (15-46)

src/synthorg/engine/react_loop.py (1)

stagnation_detector (93-95)

src/synthorg/engine/stagnation/detector.py (1)

config (46-48)

src/synthorg/engine/loop_protocol.py (1)

ExecutionResult (86-147)

src/synthorg/engine/loop_helpers.py (3)

src/synthorg/engine/stagnation/models.py (2)

StagnationResult (87-140)

StagnationVerdict (15-20)

src/synthorg/engine/stagnation/protocol.py (2)

StagnationDetector (15-46)

check (26-42)

src/synthorg/engine/loop_protocol.py (2)

TurnRecord (40-83)

ExecutionResult (86-147)

tests/unit/engine/test_loop_helpers.py (11)

src/synthorg/engine/loop_helpers.py (1)

check_stagnation (526-587)

tests/unit/engine/test_cost_recording.py (6)

record (75-80)

record (179-180)

record (195-196)

_turn (21-36)

_result (39-48)

test_memory_error_propagates (175-189)

tests/unit/engine/test_plan_execute_loop.py (3)

_stop_response (90-96)

get_detector_type (868-869)

check (871-883)

tests/unit/engine/test_react_loop.py (3)

_stop_response (41-47)

get_detector_type (1055-1056)

check (1058-1068)

tests/unit/engine/stagnation/test_detector.py (1)

_turn (14-27)

tests/unit/engine/stagnation/test_properties.py (1)

_turn (15-26)

src/synthorg/engine/loop_protocol.py (3)

TurnRecord (40-83)

ExecutionResult (86-147)

TerminationReason (28-37)

tests/unit/engine/conftest.py (1)

engine (449-460)

src/synthorg/engine/stagnation/detector.py (2)

get_detector_type (50-52)

check (54-103)

src/synthorg/engine/stagnation/models.py (2)

StagnationResult (87-140)

StagnationVerdict (15-20)

tests/unit/engine/test_agent_engine.py (1)

test_memory_error_propagates (1282-1303)

tests/unit/engine/checkpoint/test_resume.py (1)

src/synthorg/engine/plan_execute_loop.py (1)

stagnation_detector (120-122)

tests/unit/engine/test_plan_execute_loop.py (4)

src/synthorg/engine/loop_protocol.py (3)

TerminationReason (28-37)

TurnRecord (40-83)

execute (166-192)

src/synthorg/engine/stagnation/detector.py (2)

get_detector_type (50-52)

check (54-103)

src/synthorg/engine/stagnation/models.py (2)

StagnationResult (87-140)

StagnationVerdict (15-20)

src/synthorg/engine/plan_execute_loop.py (3)

PlanExecuteLoop (82-896)

stagnation_detector (120-122)

execute (128-206)

src/synthorg/engine/stagnation/detector.py (3)

src/synthorg/engine/loop_protocol.py (1)

TurnRecord (40-83)

src/synthorg/engine/stagnation/models.py (3)

StagnationConfig (23-84)

StagnationResult (87-140)

StagnationVerdict (15-20)

src/synthorg/engine/stagnation/protocol.py (2)

get_detector_type (44-46)

check (26-42)

🔇 Additional comments (19)

src/synthorg/engine/stagnation/models.py (5)

1-12: LGTM!

Module docstring is clear and imports are appropriate. No usage of from __future__ import annotations as per coding guidelines. Using Self from typing and Pydantic v2 constructs correctly.

15-21: LGTM!

Clean enum definition using StrEnum with clear semantic values. The three verdicts (NO_STAGNATION, INJECT_PROMPT, TERMINATE) align well with the stagnation detection flow described in PR objectives.

23-84: LGTM!

Well-designed frozen configuration model with sensible defaults and constraints. The _validate_min_within_window validator correctly ensures min_tool_turns <= window_size with a clear error message explaining why the configuration is invalid.
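
For readers without the diff at hand, the cross-field check described above can be sketched roughly as follows. This is a stdlib stand-in, not the PR's Pydantic model: the class name `StagnationConfigSketch` is hypothetical, a frozen dataclass replaces the frozen Pydantic model, and no defaults are shown because the real defaults are not quoted on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagnationConfigSketch:
    """Hypothetical stand-in for the frozen StagnationConfig model."""

    window_size: int
    repetition_threshold: float
    cycle_detection: bool
    max_corrections: int
    min_tool_turns: int

    def __post_init__(self) -> None:
        # Cross-field rule named in the review: min_tool_turns <= window_size.
        # If it were larger, the window could never hold enough tool-bearing
        # turns for the detector to run.
        if self.min_tool_turns > self.window_size:
            msg = (
                f"min_tool_turns ({self.min_tool_turns}) must not exceed "
                f"window_size ({self.window_size})"
            )
            raise ValueError(msg)
```

In the real model the same rule is enforced by the `_validate_min_within_window` validator; the sketch only mirrors its effect.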

87-140: LGTM!

Excellent implementation following the coding guidelines. The __init__ override correctly deep-copies the details dict at the construction boundary per the guideline: "copy.deepcopy() at system boundaries". The _validate_corrective_message validator ensures invariant consistency between verdict and corrective_message. Based on learnings: "For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries."
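
The deep-copy-at-construction idea can be illustrated with a small stdlib sketch. It assumes a frozen dataclass in place of the Pydantic model, and the name `StagnationResultSketch` and the string verdict are hypothetical; only the copying behavior is meant to match the comment above.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class StagnationResultSketch:
    """Hypothetical stand-in showing deep-copy at the construction boundary."""

    verdict: str
    details: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # frozen=True blocks normal assignment, so bypass it once here to
        # store a private copy that the caller cannot mutate afterwards.
        object.__setattr__(self, "details", copy.deepcopy(self.details))
```

With this in place, later changes to the caller's dictionary (including nested lists) do not leak into the stored result.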

143-146: LGTM!

Good optimization pattern — a reusable frozen instance for the common no-stagnation case avoids repeated allocations.

src/synthorg/engine/loop_helpers.py (7)

8-9: LGTM!

Standard library imports for fingerprint computation (hashlib for SHA-256, json for canonical serialization).

23-26: LGTM!

Event constants correctly imported from the domain-specific synthorg.observability.events.stagnation module per coding guidelines.

424-437: LGTM!

Correctly extended to clear both tool_calls_made and tool_call_fingerprints together. Since fingerprints are derived from tool calls, clearing one without the other would create an inconsistent state.

478-505: LGTM!

Well-designed fingerprinting function with deterministic properties:

sort_keys=True ensures consistent JSON regardless of dict key order

Compact separators avoid whitespace variance

Sorted output tuple ensures same tool calls produce identical fingerprints regardless of call order

16 hex chars (64 bits) provides sufficient collision resistance for this use case

The default=str fallback is defensive for non-JSON-serializable types. This is acceptable given tool arguments should typically be JSON-serializable, but be aware this could produce different fingerprints for semantically equivalent objects with non-deterministic __str__ representations.
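
The properties listed above can be reproduced with a short sketch of the `name:sha256(canonical_json)[:16]` scheme. The function name `compute_fingerprints` and the `(name, arguments)` input shape are assumptions for illustration; the actual helper in loop_helpers.py reads tool calls from the provider response.

```python
import hashlib
import json


def compute_fingerprints(tool_calls):
    """Return sorted ``name:hash16`` fingerprints for one turn.

    ``tool_calls`` is assumed to be an iterable of ``(name, arguments)``
    pairs, where ``arguments`` is a JSON-like dict.
    """
    fingerprints = []
    for name, arguments in tool_calls:
        # Canonical JSON: sorted keys and compact separators, so equal
        # arguments always serialize to the same string.
        canonical = json.dumps(
            arguments,
            sort_keys=True,
            separators=(",", ":"),
            default=str,  # defensive fallback for non-JSON values
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        fingerprints.append(f"{name}:{digest}")
    # Sorting makes the result independent of call order within the turn.
    return tuple(sorted(fingerprints))
```

Two turns that issue the same calls with the same arguments, in any order and with any key order, therefore produce identical tuples.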

459-475: LGTM!

Clean integration of fingerprint computation into turn record creation. The tool_call_fingerprints field is correctly populated from the response's tool calls.

526-587: LGTM!

Well-designed advisory stagnation check with appropriate error handling:

Correctly returns None on detector absence or failure (non-blocking)

Properly logs errors with context before continuing

Clear separation between detection (check) and verdict handling

Uses PEP 758 exception syntax correctly (except MemoryError, RecursionError:)

The advisory design ensures stagnation detection never interrupts an otherwise-healthy loop, as documented in the PR objectives.
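
A rough sketch of the advisory error handling described here, under several assumptions: the wrapper name `check_stagnation_safely` and the `check()` signature are hypothetical, stdlib `logging` stands in for the project's `get_logger`, and the parenthesized `except (...)` form is used so the sketch also runs on interpreters older than Python 3.14 (the PR itself uses the unparenthesized PEP 758 form).

```python
import asyncio
import logging

logger = logging.getLogger("stagnation_sketch")


async def check_stagnation_safely(detector, turns, corrections_injected):
    """Run the detector; treat ordinary failures as 'no result'."""
    if detector is None:
        return None  # detection disabled
    try:
        return await detector.check(
            turns, corrections_injected=corrections_injected
        )
    except (MemoryError, RecursionError):
        # Fatal conditions are never swallowed.
        raise
    except Exception as exc:
        # Log the exception type only, then let the loop continue.
        logger.warning("stagnation check failed: %s", type(exc).__name__)
        return None
```

The caller can then treat `None` as "carry on with the turn", which is what keeps a faulty detector from ending an otherwise healthy loop.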

590-645: LGTM!

Clean verdict dispatch with appropriate logging levels:

WARNING for STAGNATION_TERMINATED (problematic termination)

INFO for STAGNATION_CORRECTION_INJECTED (state transition)

The corrective message is correctly injected as a USER role message, which aligns with the pattern of system interventions that appear as user guidance to the model.

src/synthorg/engine/stagnation/detector.py (7)

1-24: LGTM!

Module docstring clearly describes the dual-signal detection approach. Imports are appropriate, and the logger is correctly initialized with get_logger(__name__) per coding guidelines.

27-103: LGTM!

Clean implementation of the StagnationDetector protocol:

check method signature matches the protocol definition

get_detector_type returns a unique identifier

Early returns for disabled state or insufficient tool-bearing turns

Properly delegates computation to focused helper functions

The dual-signal detection (ratio >= threshold OR cycle_length is not None) aligns with the PR objectives.

105-116: LGTM!

Correct windowing logic that filters to tool-bearing turns only. The negative slice [-window_size:] gracefully handles cases where fewer tool turns exist than the window size.
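
The windowing step is small enough to sketch directly. The function name `extract_window` and the input shape (one tuple of fingerprints per turn, empty for turns without tool calls) are assumptions; `window_size` is taken to be at least 1.

```python
def extract_window(turn_fingerprints, window_size):
    """Keep only tool-bearing turns, then take the last ``window_size``."""
    tool_turns = [fps for fps in turn_fingerprints if fps]
    # A negative slice simply returns the whole list when it is shorter
    # than window_size, so no explicit length check is needed.
    return tool_turns[-window_size:]
```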

119-140: LGTM!

Correct repetition ratio computation. The formula (sum of c-1 for repeated) / total counts excess occurrences (duplicates beyond the first) as a fraction of total fingerprints. Empty window handling returns (0.0, Counter()) appropriately.
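
As a worked example of the formula, the sketch below counts fingerprints across the window and divides the excess occurrences by the total. The name `compute_repetition_ratio` is illustrative, and counting individual fingerprints (rather than whole per-turn tuples) is an assumption about the real helper.

```python
from collections import Counter


def compute_repetition_ratio(window):
    """Return ``(ratio, counts)`` for a window of per-turn fingerprint tuples."""
    counts = Counter(fp for turn in window for fp in turn)
    total = sum(counts.values())
    if total == 0:
        return 0.0, Counter()
    # Every occurrence beyond the first of a fingerprint counts as excess.
    excess = sum(c - 1 for c in counts.values() if c > 1)
    return excess / total, counts
```

For a window with fingerprints `a, a, a, b`, the excess is 2 out of 4 fingerprints, so the ratio is 0.5.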

143-175: LGTM!

Sound cycle detection algorithm that identifies repeating A→B→A→B patterns by comparing tail segments with their predecessors. The iteration from cycle_len=2 to n//2 ensures the shortest cycle is found first. For sequences shorter than 4 turns, no cycle can be detected (correctly returns None).
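
The tail-versus-predecessor comparison can be sketched as below. The name `detect_cycle` and the plain-list input are assumptions for illustration; each element stands for one turn's fingerprint tuple.

```python
def detect_cycle(sequence):
    """Return the shortest cycle length found at the tail, or None."""
    n = len(sequence)
    # Try cycle lengths 2 .. n // 2, shortest first.
    for cycle_len in range(2, n // 2 + 1):
        tail = sequence[-cycle_len:]
        previous = sequence[-2 * cycle_len : -cycle_len]
        if tail == previous:
            return cycle_len
    return None
```

With fewer than four elements the range is empty, so the function returns `None`, matching the behavior described above.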

178-213: LGTM!

Correct verdict selection logic:

INJECT_PROMPT when corrections_injected < max_corrections

TERMINATE when corrections exhausted

The repeated_tools extraction and sorting ensures deterministic logging and message generation. The INFO level for STAGNATION_DETECTED is appropriate as a state transition event (distinct from the termination action logged separately).
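
Combining the dual-signal trigger with the correction budget gives a decision rule along these lines. This is a sketch: `select_verdict` is a hypothetical name, the enum values are placeholders, and a `str`/`Enum` mix-in is used instead of `StrEnum` so the example runs on older interpreters.

```python
from enum import Enum


class Verdict(str, Enum):
    # Member names follow the review text; the values are assumptions.
    NO_STAGNATION = "no_stagnation"
    INJECT_PROMPT = "inject_prompt"
    TERMINATE = "terminate"


def select_verdict(
    ratio,
    cycle_length,
    corrections_injected,
    *,
    repetition_threshold,
    max_corrections,
):
    """Pick a verdict from the two signals and the corrections used so far."""
    stagnating = ratio >= repetition_threshold or cycle_length is not None
    if not stagnating:
        return Verdict.NO_STAGNATION
    if corrections_injected < max_corrections:
        return Verdict.INJECT_PROMPT
    return Verdict.TERMINATE
```

In the PR the trigger check lives in `check()` and the budget check in `_build_stagnation_result`; the sketch merges them only to show the full decision in one place.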

216-233: LGTM!

Clear, actionable corrective message that guides the agent to change its approach. The [SYSTEM INTERVENTION: ...] prefix clearly marks this as an automated intervention. The docstring correctly notes that repeated_tools is always non-empty when this function is called — stagnation detection requires at least one repeated fingerprint (via ratio or cycle signal).

docs/design/engine.md

src/synthorg/engine/plan_execute_loop.py

tests/unit/engine/test_plan_execute_loop.py

…rtion

- Add stagnation_detector to PlanExecuteLoop and ReactLoop Args docstrings
- Strengthen STAGNATION turn count assertion: verify both planning and tool-use turns are present (not just >= 1)

🤖 I have created a release *beep* *boop*

---

## [0.2.6](v0.2.5...v0.2.6) (2026-03-15)

### Features

* add intra-loop stagnation detector ([#415](#415)) ([#458](#458)) ([8e9f34f](8e9f34f))
* add RFC 9457 structured error responses (Phase 1) ([#457](#457)) ([6612a99](6612a99)), closes [#419](#419)
* implement AgentStateRepository for runtime state persistence ([#459](#459)) ([5009da7](5009da7))
* **site:** add SEO essentials, contact form, early-access banner ([#467](#467)) ([11b645e](11b645e)), closes [#466](#466)

### Bug Fixes

* CLI improvements - config show, completion install, enhanced doctor, Sigstore verification ([#465](#465)) ([9e08cec](9e08cec))
* **site:** add reCAPTCHA v3, main landmark, and docs sitemap ([#469](#469)) ([fa6d35c](fa6d35c))
* use force-tag-creation instead of manual tag creation hack ([#462](#462)) ([2338004](2338004))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Aureliolo temporarily deployed to ci March 15, 2026 18:32 — with GitHub Actions Inactive

Aureliolo temporarily deployed to cloudflare-preview March 15, 2026 18:33 — with GitHub Actions Inactive

gemini-code-assist bot reviewed Mar 15, 2026


coderabbitai bot reviewed Mar 15, 2026


Aureliolo added 5 commits March 15, 2026 20:09

fix: add stagnation to event module discovery test

fd290be

Add 'stagnation' to the expected domain modules set in test_all_domain_modules_discovered.

fix: resolve mypy attr-defined error in agent engine stagnation test

70d97c4

Cast _loop to ReactLoop via isinstance before accessing stagnation_detector property (ExecutionLoop protocol does not expose it).

coderabbitai bot requested changes Mar 15, 2026


docs/design/engine.md (resolved)

src/synthorg/engine/plan_execute_loop.py (resolved)

tests/unit/engine/test_plan_execute_loop.py (outdated, resolved)

fix: add stagnation_detector to loop docstrings, strengthen turn asse…

f24b1cf

…rtion

- Add stagnation_detector to PlanExecuteLoop and ReactLoop Args docstrings
- Strengthen STAGNATION turn count assertion: verify both planning and tool-use turns are present (not just >= 1)

Aureliolo force-pushed the feat/stagnation-detector branch from d6f1137 to f24b1cf Compare March 15, 2026 19:23

Aureliolo temporarily deployed to ci March 15, 2026 19:23 — with GitHub Actions Inactive

Aureliolo had a problem deploying to cloudflare-preview March 15, 2026 19:24 — with GitHub Actions Error

Aureliolo merged commit 8e9f34f into main Mar 15, 2026
27 of 29 checks passed

Aureliolo deleted the feat/stagnation-detector branch March 15, 2026 19:24

Aureliolo temporarily deployed to cloudflare-preview March 15, 2026 19:24 — with GitHub Actions Inactive

Aureliolo mentioned this pull request Mar 15, 2026

chore(main): release 0.2.6 #463

Merged

Aureliolo mentioned this pull request Mar 15, 2026

chore(main): release 0.2.0 #471

Closed

This was referenced Mar 17, 2026

feat(engine): implement context budget management in execution loops #520

Merged

feat(engine): implement execution loop auto-selection based on task complexity #567

Merged

feat(engine): implement Hybrid Plan + ReAct execution loop #582

Merged
