
feat: add intra-loop stagnation detector (#415) #458

Merged
Aureliolo merged 6 commits into main from feat/stagnation-detector
Mar 15, 2026


Conversation

@Aureliolo
Owner

Summary

  • Add stagnation detection that analyzes TurnRecord tool-call fingerprints across a sliding window, intervenes with corrective prompt injection, and terminates early with STAGNATION if correction fails
  • New StagnationDetector async protocol with ToolRepetitionDetector default implementation using dual-signal detection (repetition ratio + cycle detection)
  • StagnationConfig frozen model with configurable window_size, repetition_threshold, cycle_detection, max_corrections, min_tool_turns
  • STAGNATION termination reason + tool_call_fingerprints field on TurnRecord
  • Fingerprint computation: name:sha256(canonical_json)[:16], sorted per-turn
  • ReactLoop integration (loop-scoped corrections counter) + PlanExecuteLoop integration (per-step scoped)
  • Shared check_stagnation() helper in loop_helpers.py with proper error handling (non-critical — logged and skipped on failure)
  • AgentEngine wiring via stagnation_detector parameter
  • Checkpoint resume path preserves stagnation_detector (and approval_gate) via new read-only properties
  • Observability events: check_performed, detected, correction_injected, terminated
  • Design spec: docs/design/engine.md stagnation detection section
  • CLAUDE.md: package structure + event constants updated
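The fingerprint format named in the summary, name:sha256(canonical_json)[:16] sorted per turn, can be sketched as below. This is a minimal illustration rather than the code in loop_helpers.py: the function names and the exact canonicalisation (sorted keys, compact separators) are assumptions.

```python
import hashlib
import json
from typing import Any


def fingerprint_tool_call(name: str, arguments: dict[str, Any]) -> str:
    """Return ``name:<first 16 hex chars of sha256(canonical JSON)>``."""
    # Assumed canonical form: sorted keys and compact separators, so that
    # argument order and whitespace do not change the hash.
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{name}:{digest[:16]}"


def fingerprint_turn(calls: list[tuple[str, dict[str, Any]]]) -> tuple[str, ...]:
    """Fingerprint every call in one turn and sort the result."""
    # Sorting makes the per-turn tuple independent of call order.
    return tuple(sorted(fingerprint_tool_call(n, a) for n, a in calls))
```

Two calls with the same tool name and the same arguments in a different key order produce the same fingerprint, which is the property the detector relies on when it compares turns.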

Test plan

  • 66 stagnation-specific tests (models, detector, fingerprints, cycle detection, Hypothesis properties)
  • Extended loop_protocol, loop_helpers, react_loop, plan_execute_loop tests
  • Protocol conformance test (isinstance check)
  • Repetition ratio exact-value tests
  • Direct _detect_cycle coverage (6 cases)
  • PlanExecuteLoop step corrections counter increment test
  • Full suite: 8072 passed, 94.49% coverage
  • mypy strict: 0 errors
  • ruff lint + format: clean
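The protocol conformance test listed above (an isinstance check) presumably relies on a runtime-checkable protocol. A minimal sketch of that idea follows; the protocol body, the check signature, and both toy classes are assumptions made for illustration, not the PR's definitions.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class StagnationDetectorSketch(Protocol):
    """Hypothetical stand-in for the PR's async detector protocol."""

    async def check(self, turns: tuple, corrections_injected: int) -> str: ...


class AlwaysOkDetector:
    """Toy detector that never reports stagnation."""

    async def check(self, turns: tuple, corrections_injected: int) -> str:
        return "NO_STAGNATION"


class NotADetector:
    """Has no ``check`` method, so it fails the isinstance check."""


def run_check(detector: StagnationDetectorSketch) -> str:
    """Drive the async check from synchronous code."""
    return asyncio.run(detector.check((), 0))
```

Note that isinstance against a runtime-checkable protocol only verifies that the named members exist. It does not verify signatures or that check is a coroutine function, so a conformance test of this kind is a smoke test rather than a full type check.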

Review coverage

Pre-reviewed by 9 agents, 14 findings addressed:

  • 2 CRITICAL (checkpoint resume dropping detector, unguarded check() call)
  • 4 MAJOR (code duplication, function length, docs, warning message)
  • 6 MEDIUM (deep-copy details, cycle_length constraint, cross-field validator, test gaps, corrective message edge case)
  • 2 MINOR (protocol conformance test, step corrections counter test)

Closes #415

@github-actions
Contributor

github-actions bot commented Mar 15, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 15, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 53cc38dd-ed0a-4154-93c1-94f430f9255a

📥 Commits

Reviewing files that changed from the base of the PR and between d6f1137 and f24b1cf.

📒 Files selected for processing (26)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/stagnation/__init__.py
  • src/synthorg/engine/stagnation/detector.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/stagnation/protocol.py
  • src/synthorg/observability/events/stagnation.py
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/stagnation/__init__.py
  • tests/unit/engine/stagnation/test_detector.py
  • tests/unit/engine/stagnation/test_fingerprint.py
  • tests/unit/engine/stagnation/test_models.py
  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_plan_execute_loop.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/observability/test_events.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Stagnation detection added to detect repetitive tool usage, inject corrective prompts, and terminate tasks when unresolved; configurable sensitivity, windowing, and max-corrections.
  • Documentation

    • Engine docs updated with stagnation detection design and integration guidance.
  • Observability

    • New stagnation-related event names and updated logging guidance to surface checks, detections, injections, and terminations.
  • Tests

    • Extensive unit and property tests for detection, fingerprinting, loop integration, and behavior coverage.

Walkthrough

Adds intra-loop stagnation detection: new stagnation package (protocol, models, ToolRepetitionDetector), integrates detector into ReactLoop and PlanExecuteLoop, wires detector through AgentEngine and checkpoint wrapping, augments loop records with tool_call_fingerprints, and adds tests and observability events for stagnation flows and corrective prompt injection/termination.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Stagnation core: src/synthorg/engine/stagnation/__init__.py, src/synthorg/engine/stagnation/protocol.py, src/synthorg/engine/stagnation/models.py, src/synthorg/engine/stagnation/detector.py | New stagnation package: protocol, frozen models (config/result/verdict), and default ToolRepetitionDetector implementing windowed repetition and cycle detection, verdict construction, and corrective-message generation. |
| Loop integration: src/synthorg/engine/react_loop.py, src/synthorg/engine/plan_execute_loop.py, src/synthorg/engine/loop_helpers.py, src/synthorg/engine/loop_protocol.py | Add STAGNATION TerminationReason, add TurnRecord.tool_call_fingerprints, compute fingerprints, check_stagnation and handler logic, and wire per-loop stagnation_detector plus corrections_injected across iterations/steps. |
| Engine wiring & helpers: src/synthorg/engine/__init__.py, src/synthorg/engine/agent_engine.py, src/synthorg/engine/checkpoint/resume.py | Export stagnation types from engine package, add StagnationDetector parameter to AgentEngine and default loop construction, and preserve stagnation_detector when injecting checkpoint callbacks. |
| Observability: src/synthorg/observability/events/stagnation.py, src/synthorg/engine/loop_helpers.py | New stagnation event constants and event emission points for checks, detections, corrections injected, and termination. |
| Tests (unit & property): tests/unit/engine/stagnation/*, tests/unit/engine/test_loop_helpers.py, tests/unit/engine/test_plan_execute_loop.py, tests/unit/engine/test_react_loop.py, tests/unit/engine/test_agent_engine.py, tests/unit/observability/test_events.py, tests/unit/engine/checkpoint/test_resume.py | Comprehensive tests for detector logic, fingerprinting, models validation, property tests, loop integration, checkpoint preservation, and React/PlanExecute loop behaviors under NO_STAGNATION / INJECT_PROMPT / TERMINATE verdicts. |
| Manifests / metadata: manifest_file, pyproject.toml, requirements.txt, setup.py, Pipfile | Small manifest adjustments recorded in summary (lines changed entries). |

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Loop as Execution Loop
participant Detector as StagnationDetector
participant Engine as TaskEngine / AgentEngine
participant Tools as Tool Invoker
Loop->>Tools: perform tool call(s) for turn
Tools-->>Loop: tool call results
Loop->>Loop: compute tool_call_fingerprints
Loop->>Detector: async check(turns window, corrections_injected)
Detector-->>Loop: StagnationResult (NO_STAGNATION | INJECT_PROMPT | TERMINATE)
alt NO_STAGNATION
Loop->>Engine: continue normal loop
else INJECT_PROMPT
Loop->>Engine: inject corrective user prompt into context
Loop->>Loop: increment corrections_injected
else TERMINATE
Loop->>Engine: return ExecutionResult with TerminationReason.STAGNATION and metadata
end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.06% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title 'feat: add intra-loop stagnation detector (#415)' directly and clearly describes the main change: introducing stagnation detection functionality to execution loops. |
| Description check | ✅ Passed | The PR description provides detailed context about stagnation detection implementation, including the protocol design, configuration model, loop integrations, testing, and related changes, all relevant to the changeset. |
| Linked Issues check | ✅ Passed | The PR fully implements the requirements from issue #415: adds a StagnationDetector protocol with ToolRepetitionDetector implementation, dual-signal detection (repetition ratio + cycle detection), configurable StagnationConfig, STAGNATION termination reason, tool_call_fingerprints field, integration into both ReactLoop and PlanExecuteLoop, and corrective prompt injection with termination on failure. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to stagnation detection requirements: new modules for detector/protocol/models, integration into loops, TurnRecord extension, observability events, documentation, and comprehensive tests. No extraneous modifications detected. |


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a crucial feature to enhance the robustness and efficiency of agent execution: intra-loop stagnation detection. By analyzing patterns in tool calls, the system can now identify when an agent is repeatedly performing the same actions without making progress. It then attempts to guide the agent with corrective prompts and, if unsuccessful, terminates the execution early with a specific STAGNATION reason, preventing wasted resources and improving overall agent reliability.

Highlights

  • Stagnation Detection Core Logic: Implemented a new mechanism to detect and intervene when agents get stuck in repetitive tool-calling loops, using TurnRecord tool-call fingerprints.
  • Flexible Protocol & Default Implementation: Introduced an async StagnationDetector protocol, with a default ToolRepetitionDetector that uses dual-signal analysis (repetition ratio and cycle detection).
  • Configurable Behavior: Added StagnationConfig to allow customization of detection parameters like window size, repetition threshold, and maximum corrective interventions.
  • Loop Integration & Termination: Integrated stagnation checks into ReactLoop and PlanExecuteLoop, introducing STAGNATION as a new TerminationReason for early exit.
  • Enhanced Observability: Included new observability events to track stagnation checks, detections, corrections, and terminations.
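The configurable behavior described above can be pictured as a frozen model with a cross-field check. The PR uses a frozen Pydantic model; the sketch below substitutes a stdlib frozen dataclass so that it runs without dependencies. All default values and the window_size lower bound are illustrative assumptions. The rule that min_tool_turns may not exceed window_size, and the threshold and counter bounds, follow the test names quoted in the review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagnationConfigSketch:
    """Dataclass stand-in for the PR's frozen StagnationConfig model."""

    window_size: int = 5  # illustrative default, not from the PR
    repetition_threshold: float = 0.6  # illustrative default
    cycle_detection: bool = True
    max_corrections: int = 1  # illustrative default
    min_tool_turns: int = 2  # illustrative default

    def __post_init__(self) -> None:
        if self.window_size < 1:  # assumed bound; the real one is not stated
            raise ValueError("window_size must be positive")
        if not 0.0 <= self.repetition_threshold <= 1.0:
            raise ValueError("repetition_threshold must be within [0, 1]")
        if self.max_corrections < 0:
            raise ValueError("max_corrections must be >= 0")
        if self.min_tool_turns < 1:
            raise ValueError("min_tool_turns must be >= 1")
        # Cross-field rule: the detector cannot require more tool turns
        # than the sliding window can hold.
        if self.min_tool_turns > self.window_size:
            raise ValueError("min_tool_turns must not exceed window_size")
```

Because the dataclass is frozen, assigning to a field after construction raises an error, so a config shared between loops cannot be changed mid-run.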
Changelog
  • CLAUDE.md
    • Updated the documentation to reflect the new stagnation detection feature and its associated event constants.
  • docs/design/engine.md
    • Documented the design and implementation details of the new stagnation detection system, including its protocol, default implementation, configuration, and intervention flow.
  • src/synthorg/engine/__init__.py
    • Exported the new StagnationConfig, StagnationDetector, StagnationResult, StagnationVerdict, and ToolRepetitionDetector classes.
  • src/synthorg/engine/agent_engine.py
    • Modified the AgentEngine to allow injection of a StagnationDetector and updated the default loop creation to include it, along with an updated warning message for external loop provision.
  • src/synthorg/engine/checkpoint/resume.py
    • Ensured that approval_gate and the newly added stagnation_detector are preserved when resuming ReactLoop and PlanExecuteLoop from a checkpoint.
  • src/synthorg/engine/loop_helpers.py
    • Added utility functions for computing tool call fingerprints and a shared check_stagnation helper to encapsulate stagnation detection logic, including prompt injection and termination.
  • src/synthorg/engine/loop_protocol.py
    • Extended the TerminationReason enum with STAGNATION and added a tool_call_fingerprints field to TurnRecord to store deterministic hashes of tool calls.
  • src/synthorg/engine/plan_execute_loop.py
    • Integrated the check_stagnation logic into the PlanExecuteLoop, ensuring stagnation detection is performed per step with step-scoped corrections.
  • src/synthorg/engine/react_loop.py
    • Integrated the check_stagnation logic into the ReactLoop, performing stagnation detection after each successful turn.
  • src/synthorg/engine/stagnation/__init__.py
    • Created a new package for stagnation detection, re-exporting its public API.
  • src/synthorg/engine/stagnation/detector.py
    • Implemented the ToolRepetitionDetector, which uses repetition ratio and cycle detection on tool call fingerprints to identify stagnation.
  • src/synthorg/engine/stagnation/models.py
    • Defined Pydantic models for StagnationVerdict, StagnationConfig, and StagnationResult to structure stagnation-related data.
  • src/synthorg/engine/stagnation/protocol.py
    • Defined the StagnationDetector asynchronous protocol for detecting intra-loop stagnation.
  • src/synthorg/observability/events/stagnation.py
    • Defined new constants for observability events related to stagnation detection.
  • tests/unit/engine/stagnation/test_detector.py
    • Added comprehensive unit tests for the ToolRepetitionDetector, covering various scenarios of repetition and cycle detection.
  • tests/unit/engine/stagnation/test_fingerprint.py
    • Added unit tests to verify the deterministic and order-independent computation of tool call fingerprints.
  • tests/unit/engine/stagnation/test_models.py
    • Added unit tests for the StagnationConfig and StagnationResult models, including validation rules.
  • tests/unit/engine/stagnation/test_properties.py
    • Added property-based tests using Hypothesis for the fingerprint computation and detector behavior.
  • tests/unit/engine/test_loop_helpers.py
    • Updated existing tests to account for the new tool_call_fingerprints field in TurnRecord and its handling.
  • tests/unit/engine/test_loop_protocol.py
    • Updated tests for TerminationReason and TurnRecord to reflect the addition of STAGNATION and tool_call_fingerprints.
  • tests/unit/engine/test_plan_execute_loop.py
    • Added integration tests to verify the PlanExecuteLoop's interaction with the new stagnation detector, including step-scoped behavior.
  • tests/unit/engine/test_react_loop.py
    • Added integration tests to verify the ReactLoop's interaction with the new stagnation detector, including prompt injection and termination.
  • tests/unit/observability/test_events.py
    • Updated the event discovery test to include the new stagnation domain module.
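The PR description calls the shared check_stagnation helper non-critical: a failing detector is logged and skipped rather than allowed to abort the run. A minimal sketch of that guard follows. The function name safe_check, the logger name, and the detector signature are assumptions, and the real helper also handles prompt injection and termination.

```python
import asyncio
import logging

logger = logging.getLogger("stagnation_sketch")


async def safe_check(detector, turns, corrections_injected):
    """Run the detector; on any failure, log it and return None."""
    if detector is None:
        return None  # stagnation detection is optional
    try:
        return await detector.check(turns, corrections_injected)
    except Exception:
        # Non-critical path: record the failure and let the loop continue.
        logger.exception("stagnation check failed; skipping")
        return None


class BrokenDetector:
    """Toy detector that always fails, used to exercise the guard."""

    async def check(self, turns, corrections_injected):
        raise RuntimeError("detector bug")


def demo():
    """Return the guarded result for a failing detector."""
    return asyncio.run(safe_check(BrokenDetector(), (), 0))
```

Catching Exception rather than BaseException means asyncio.CancelledError still propagates, so task cancellation is not swallowed by the guard.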
Activity
  • The author, Aureliolo, implemented this feature.
  • The pull request was pre-reviewed by 9 agents, and 14 findings were addressed, including 2 critical, 4 major, 6 medium, and 2 minor issues, indicating a thorough review process and significant iteration before the PR was opened.

Aureliolo temporarily deployed to cloudflare-preview March 15, 2026 18:33 with GitHub Actions (Inactive)
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a robust stagnation detection mechanism to prevent agents from getting stuck in unproductive loops. The implementation includes a new StagnationDetector protocol, a ToolRepetitionDetector that uses fingerprinting and cycle detection, and configuration models. The feature is well-integrated into the existing ReactLoop and PlanExecuteLoop, with special care for per-step detection in the latter. The changes are comprehensive, covering core logic, documentation, and extensive tests, including property-based tests. My review found one area for improvement in the implementation of the corrective message generation, where a branch of code appears to be unreachable, which can be simplified.

Comment on lines +214 to +237
def _build_corrective_message(repeated_tools: list[str]) -> str:
    """Build a corrective user-role message for prompt injection.

    Args:
        repeated_tools: Sorted list of repeated tool fingerprints.

    Returns:
        A corrective message string.
    """
    if repeated_tools:
        tool_list = ", ".join(repeated_tools)
        return (
            "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
            "calls show a repeating pattern without progress. The following "
            "tools have been called with identical arguments multiple "
            f"times: {tool_list}. Try a different approach: modify your "
            "arguments, use different tools, or reconsider your strategy.]"
        )
    return (
        "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
        "calls show a repeating cycle pattern without progress. Try a "
        "different approach: modify your arguments, use different tools, "
        "or reconsider your strategy.]"
    )
Contributor


medium

The fallback return at the end of this function (the implicit else branch) appears to be unreachable. Stagnation is detected if either the tool repetition ratio is high or a cycle is detected. In both cases there will be repeated tool fingerprints, so the repeated_tools list will not be empty. This makes the if repeated_tools: check always true and the fallback branch dead code.

You can simplify this function by removing the conditional logic and the unreachable branch.

def _build_corrective_message(repeated_tools: list[str]) -> str:
    """Build a corrective user-role message for prompt injection.

    Args:
        repeated_tools: Sorted list of repeated tool fingerprints.

    Returns:
        A corrective message string.
    """
    tool_list = ", ".join(repeated_tools)
    return (
        "[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
        "calls show a repeating pattern without progress. The following "
        "tools have been called with identical arguments multiple "
        f"times: {tool_list}. Try a different approach: modify your "
        "arguments, use different tools, or reconsider your strategy.]"
    )


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/synthorg/engine/react_loop.py (1)

101-189: 🛠️ Refactor suggestion | 🟠 Major

Keep execute() under the 50-line cap.

The stagnation handoff makes the main loop denser and harder to scan. Extract one iteration or the post-turn stagnation handling into a small helper so execute() stays orchestration-only.

As per coding guidelines, Functions must be < 50 lines; files must be < 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/react_loop.py` around lines 101 - 189, The execute()
method exceeds the 50-line guideline because the post-turn stagnation handling
is inlined; extract that block into a helper (e.g.,
_handle_post_turn_stagnation) that accepts (ctx, turns, corrections_injected,
execution_id) and returns either an ExecutionResult or (ctx,
corrections_injected), then replace the inlined stagnation logic in execute()
with a single await call to _handle_post_turn_stagnation and handle its return
the same way as the current code; touch references to check_stagnation and the
variables ctx, corrections_injected, and turns so the new helper calls
check_stagnation and preserves the existing control flow.
src/synthorg/engine/agent_engine.py (1)

151-189: 🛠️ Refactor suggestion | 🟠 Major

Extract the new loop-wiring branch out of AgentEngine.__init__.

This constructor is already far beyond the repo's 50-line limit, and the new approval/stagnation branch adds one more responsibility to a very busy setup path. Pull external-loop validation and default-loop construction into a helper so the constructor is easier to audit.

As per coding guidelines, Functions must be < 50 lines; files must be < 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 151 - 189, The constructor
AgentEngine.__init__ is doing external-loop validation and default-loop
construction inline; extract that branch into a new helper (e.g.,
_configure_execution_loop or _setup_loop_wiring) so __init__ stays small. Move
the logic that reads execution_loop, checks self._approval_gate and
self._stagnation_detector to emit APPROVAL_GATE_LOOP_WIRING_WARNING, and assigns
self._loop (using self._make_default_loop() when execution_loop is None) into
the helper; call this helper from __init__ passing the execution_loop argument
and ensure it returns/sets self._loop and retains existing behavior around
warning message and wiring semantics.
docs/design/engine.md (1)

289-290: ⚠️ Potential issue | 🟡 Minor

Update TerminationReason enum documentation to include STAGNATION.

The enum listing at lines 289-290 does not include the new STAGNATION termination reason that this PR introduces. The documentation should be updated to reflect the complete set of possible termination reasons.

 `TerminationReason`
-:   Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`,
-    `PARKED`.  `max_turns` defaults to 20.
+:   Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`,
+    `PARKED`, `STAGNATION`.  `max_turns` defaults to 20.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/design/engine.md` around lines 289 - 290, The TerminationReason enum
documentation is missing the new STAGNATION value; update the enum listing in
docs/design/engine.md so the list of termination reasons under TerminationReason
includes `STAGNATION` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`,
`SHUTDOWN`, `ERROR`, and `PARKED` and ensure any explanatory note (e.g., about
`max_turns` defaulting to 20) remains intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/loop_protocol.py`:
- Around line 65-68: Change the type of tool_call_fingerprints from tuple[str,
...] to tuple[NotBlankStr, ...] to enforce non-blank identifiers; update the
Field declaration for tool_call_fingerprints accordingly and ensure NotBlankStr
is imported from core.types at the top of the module so the Field uses
tuple[NotBlankStr, ...] as its type annotation and validation.

In `@tests/unit/engine/stagnation/test_models.py`:
- Around line 60-106: The tests around StagnationConfig (e.g.,
test_window_size_lower_bound, test_window_size_upper_bound,
test_repetition_threshold_lower_bound, test_repetition_threshold_upper_bound,
test_repetition_threshold_negative_rejected,
test_repetition_threshold_above_one_rejected, test_max_corrections_zero,
test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound,
test_min_tool_turns_zero_rejected,
test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate
them using pytest.mark.parametrize to cover similar boundary inputs and expected
ValidationError or accepted values in a single parametrized test per field
(window_size, repetition_threshold, max_corrections, min_tool_turns). Update
tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match)
and assert by either expecting ValidationError with match or validating the
created StagnationConfig attribute, keeping references to the StagnationConfig
constructor and the specific test names to locate and replace the current
individual tests (also apply the same refactor pattern to the similar tests at
the other section noted around lines 176-195).

In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 852-879: The test's _FakeStagnationDetector currently ignores the
corrections_injected argument so the assertion that check_count == 2 can pass
even if PlanExecuteLoop never increments corrections; modify
_FakeStagnationDetector to record the corrections_injected values passed into
check (e.g., add a list attribute like recorded_corrections) and append the
corrections_injected each time check() is called, then in the test assert that
recorded_corrections == [0, 1] (repeat the same change for the other occurrence
around lines 1008-1050) to ensure PlanExecuteLoop actually forwards the
incrementing correction counts.
- Around line 966-1006: The test currently only asserts
detector.last_turns_count == 1 which could be satisfied by a turn from step 1;
update the test to assert that the detector actually received the step-2 turn by
capturing the inspected turn(s) from the fake detector (the
_FakeStagnationDetector instance stored in detector) and asserting properties
that identify the step-2 tool turn (e.g., check
detector.last_inspected_turns[0].tool_calls_made or the inspected turn's tool
call id equals "tc-1" produced by _tool_use_response("echo","tc-1")); modify or
extend _FakeStagnationDetector to record and expose the last inspected turns if
needed, then replace the existing last_turns_count assertion with an assertion
that the inspected turn's tool_calls_made (or tool call id) matches the expected
step-2 value.

In `@tests/unit/engine/test_react_loop.py`:
- Around line 1039-1066: The test's fake stagnation detector
(_FakeStagnationDetector) must record the corrections_injected values passed to
its async check(...) method so the test can assert they were incremented; add an
attribute (e.g., corrections_args: list[int]) to _FakeStagnationDetector, append
the corrections_injected parameter inside check(...), and update the test
assertions (the places around the existing check_count assertions) to assert
that detector.corrections_args == [0, 1] (instead of relying only on
check_count). Apply the same change to the other instance referenced around
lines 1160-1198.

---

Outside diff comments:
In `@docs/design/engine.md`:
- Around line 289-290: The TerminationReason enum documentation is missing the
new STAGNATION value; update the enum listing in docs/design/engine.md so the
list of termination reasons under TerminationReason includes `STAGNATION`
alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`, and
`PARKED` and ensure any explanatory note (e.g., about `max_turns` defaulting to
20) remains intact.

In `@src/synthorg/engine/agent_engine.py`:
- Around line 151-189: The constructor AgentEngine.__init__ is doing
external-loop validation and default-loop construction inline; extract that
branch into a new helper (e.g., _configure_execution_loop or _setup_loop_wiring)
so __init__ stays small. Move the logic that reads execution_loop, checks
self._approval_gate and self._stagnation_detector to emit
APPROVAL_GATE_LOOP_WIRING_WARNING, and assigns self._loop (using
self._make_default_loop() when execution_loop is None) into the helper; call
this helper from __init__ passing the execution_loop argument and ensure it
returns/sets self._loop and retains existing behavior around warning message and
wiring semantics.

In `@src/synthorg/engine/react_loop.py`:
- Around line 101-189: The execute() method exceeds the 50-line guideline
because the post-turn stagnation handling is inlined; extract that block into a
helper (e.g., _handle_post_turn_stagnation) that accepts (ctx, turns,
corrections_injected, execution_id) and returns either an ExecutionResult or
(ctx, corrections_injected), then replace the inlined stagnation logic in
execute() with a single await call to _handle_post_turn_stagnation and handle
its return the same way as the current code; touch references to
check_stagnation and the variables ctx, corrections_injected, and turns so the
new helper calls check_stagnation and preserves the existing control flow.
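The test-side request in the inline comments, a fake detector that records each corrections_injected value it receives so the test can assert [0, 1], could look roughly like the sketch below. The class name, the string verdicts, and toy_loop are hypothetical; toy_loop only mimics the counter handling of the real loops.

```python
import asyncio


class RecordingFakeDetector:
    """Fake detector that replays scripted verdicts and records arguments."""

    def __init__(self, verdicts: list[str]) -> None:
        self._verdicts = list(verdicts)
        self.corrections_args: list[int] = []

    async def check(self, turns: tuple, corrections_injected: int) -> str:
        # Record what the loop forwarded, then replay the next verdict.
        self.corrections_args.append(corrections_injected)
        return self._verdicts.pop(0) if self._verdicts else "NO_STAGNATION"


async def toy_loop(detector: RecordingFakeDetector, max_turns: int = 3) -> int:
    """Stand-in loop: one check per turn, counter bumped on injection."""
    corrections = 0
    for _ in range(max_turns):
        verdict = await detector.check((), corrections)
        if verdict == "TERMINATE":
            break
        if verdict == "INJECT_PROMPT":
            corrections += 1
    return corrections


def run(detector: RecordingFakeDetector) -> int:
    """Drive the toy loop from synchronous test code."""
    return asyncio.run(toy_loop(detector))
```

Asserting on corrections_args instead of a bare call count fails if the loop stops incrementing the counter, which is the gap the review points out.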

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 42dde218-8444-4f6d-b77f-bae296aa6b20

📥 Commits

Reviewing files that changed from the base of the PR and between 2feed09 and 1fcd75e.

📒 Files selected for processing (24)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/stagnation/__init__.py
  • src/synthorg/engine/stagnation/detector.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/stagnation/protocol.py
  • src/synthorg/observability/events/stagnation.py
  • tests/unit/engine/stagnation/__init__.py
  • tests/unit/engine/stagnation/test_detector.py
  • tests/unit/engine/stagnation/test_fingerprint.py
  • tests/unit/engine/stagnation/test_models.py
  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_plan_execute_loop.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) instead of except (A, B): — PEP 758 syntax enforced by ruff on Python 3.14
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Use @computed_field in Pydantic models for derived values instead of storing and validating redundant fields
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent calls) instead of bare create_task
Line length: 88 characters (enforced by ruff)
Functions must be < 50 lines; files must be < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate input at system boundaries (user input, external APIs, config files)
Create new objects instead of mutating existing ones (immutability principle)

Files:

  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/stagnation/__init__.py
  • tests/unit/observability/test_events.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/stagnation/protocol.py
  • tests/unit/engine/test_react_loop.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/checkpoint/resume.py
  • tests/unit/engine/stagnation/test_fingerprint.py
  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • src/synthorg/engine/stagnation/detector.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/observability/events/stagnation.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/engine/stagnation/test_detector.py
  • tests/unit/engine/test_plan_execute_loop.py
  • tests/unit/engine/stagnation/test_models.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must import from synthorg.observability import get_logger and define logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the synthorg logger instead
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider); import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always log with structured kwargs: logger.info(EVENT, key=value) — never use old-style formatting logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never implement retry logic in driver subclasses or calling code — all provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig; retryable errors are RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state
For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use Pydantic v2 conventions: BaseModel, model_validator, computed_field, ConfigDict

Files:

  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/stagnation/__init__.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/stagnation/protocol.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/checkpoint/resume.py
  • src/synthorg/engine/stagnation/detector.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/observability/events/stagnation.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/engine/agent_engine.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

ALWAYS read the relevant design page before implementing any feature or planning any issue. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec

Files:

  • docs/design/engine.md
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

All docstrings in public APIs must be documented in Google style and reflected in docs/api/ auto-generated library reference (via mkdocstrings + Griffe AST-based parsing)

Files:

  • docs/design/engine.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Maintain 80% minimum code coverage (enforced in CI)
Each test must complete within 30 seconds (timeout enforcement)
Always include -n auto when running pytest via uv run python -m pytest — never run tests sequentially (pytest-xdist parallelism)
Prefer @pytest.mark.parametrize for testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names like example-provider, example-large-001, test-provider, test-small-001, or size aliases (large/medium/small)
Use Hypothesis for property-based testing with @given + @settings decorators; control profiles via HYPOTHESIS_PROFILE env var (ci for 200 examples, dev for 1000 examples)

Files:

  • tests/unit/observability/test_events.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/engine/stagnation/test_fingerprint.py
  • tests/unit/engine/stagnation/test_properties.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/stagnation/test_detector.py
  • tests/unit/engine/test_plan_execute_loop.py
  • tests/unit/engine/stagnation/test_models.py
🧠 Learnings (16)
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • tests/unit/observability/test_events.py
  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/observability/events/stagnation.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/observability/events/stagnation.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

  • tests/unit/engine/stagnation/test_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always log with structured kwargs: `logger.info(EVENT, key=value)` — never use old-style formatting `logger.info("msg %s", val)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the synthorg logger instead

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (14)
src/synthorg/engine/__init__.py (3)
src/synthorg/engine/stagnation/models.py (3)
  • StagnationConfig (23-81)
  • StagnationResult (84-137)
  • StagnationVerdict (15-20)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/stagnation/detector.py (1)
  • ToolRepetitionDetector (27-116)
tests/unit/engine/test_react_loop.py (4)
src/synthorg/engine/stagnation/detector.py (2)
  • get_detector_type (50-52)
  • check (54-103)
tests/unit/engine/test_loop_helpers.py (5)
  • _ctx_with_user_msg (93-97)
  • _stop_response (47-53)
  • execute (77-85)
  • _tool_use_response (56-66)
  • _make_invoker (88-90)
src/synthorg/engine/react_loop.py (3)
  • ReactLoop (59-327)
  • stagnation_detector (93-95)
  • execute (101-196)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (84-137)
  • StagnationVerdict (15-20)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/react_loop.py (3)
  • ReactLoop (59-327)
  • approval_gate (88-90)
  • stagnation_detector (93-95)
src/synthorg/engine/plan_execute_loop.py (2)
  • approval_gate (115-117)
  • stagnation_detector (120-122)
tests/unit/engine/stagnation/test_fingerprint.py (1)
src/synthorg/engine/loop_helpers.py (1)
  • compute_fingerprints (478-505)
tests/unit/engine/stagnation/test_properties.py (4)
src/synthorg/engine/loop_helpers.py (1)
  • compute_fingerprints (478-505)
src/synthorg/engine/loop_protocol.py (1)
  • TurnRecord (40-81)
src/synthorg/engine/stagnation/detector.py (2)
  • ToolRepetitionDetector (27-116)
  • check (54-103)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationConfig (23-81)
  • StagnationVerdict (15-20)
tests/unit/engine/test_loop_helpers.py (1)
src/synthorg/engine/loop_helpers.py (2)
  • make_turn_record (459-475)
  • clear_last_turn_tool_calls (424-437)
tests/unit/engine/test_loop_protocol.py (1)
src/synthorg/engine/loop_protocol.py (2)
  • TerminationReason (28-37)
  • TurnRecord (40-81)
src/synthorg/engine/stagnation/detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
  • TurnRecord (40-81)
src/synthorg/engine/stagnation/models.py (3)
  • StagnationConfig (23-81)
  • StagnationResult (84-137)
  • StagnationVerdict (15-20)
src/synthorg/engine/stagnation/protocol.py (2)
  • get_detector_type (44-46)
  • check (26-42)
src/synthorg/engine/loop_helpers.py (4)
src/synthorg/engine/stagnation/models.py (1)
  • StagnationVerdict (15-20)
src/synthorg/engine/stagnation/protocol.py (2)
  • StagnationDetector (15-46)
  • check (26-42)
src/synthorg/engine/loop_protocol.py (1)
  • TurnRecord (40-81)
src/synthorg/engine/stagnation/detector.py (1)
  • check (54-103)
src/synthorg/engine/plan_execute_loop.py (4)
src/synthorg/engine/loop_helpers.py (1)
  • check_stagnation (526-600)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/stagnation/detector.py (1)
  • config (46-48)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionResult (84-145)
src/synthorg/engine/react_loop.py (4)
src/synthorg/engine/loop_helpers.py (1)
  • check_stagnation (526-600)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/plan_execute_loop.py (2)
  • stagnation_detector (120-122)
  • approval_gate (115-117)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionResult (84-145)
src/synthorg/engine/agent_engine.py (2)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/react_loop.py (3)
  • stagnation_detector (93-95)
  • ReactLoop (59-327)
  • approval_gate (88-90)
tests/unit/engine/stagnation/test_detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
  • TurnRecord (40-81)
src/synthorg/engine/stagnation/detector.py (5)
  • ToolRepetitionDetector (27-116)
  • check (54-103)
  • get_detector_type (50-52)
  • config (46-48)
  • _detect_cycle (153-173)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationConfig (23-81)
  • StagnationVerdict (15-20)
tests/unit/engine/stagnation/test_models.py (1)
src/synthorg/engine/stagnation/models.py (3)
  • StagnationConfig (23-81)
  • StagnationResult (84-137)
  • StagnationVerdict (15-20)
🔇 Additional comments (36)
src/synthorg/observability/events/stagnation.py (1)

5-8: Looks good — stagnation events are correctly centralized as constants.

src/synthorg/engine/loop_protocol.py (1)

36-36: TerminationReason.STAGNATION addition is correct and consistent with loop termination semantics.

tests/unit/observability/test_events.py (1)

217-217: Good update — domain discovery expectation now includes the new stagnation events module.

src/synthorg/engine/stagnation/__init__.py (1)

10-24: Public stagnation API re-exports are clean and well-scoped.

tests/unit/engine/test_loop_protocol.py (1)

34-37: Tests are correctly aligned with the new STAGNATION reason and fingerprint field behavior.

Also applies to: 80-93

CLAUDE.md (1)

122-122: Documentation updates correctly capture the new stagnation subsystem and observability events.

Also applies to: 190-190

tests/unit/engine/stagnation/test_models.py (1)

13-59: Great coverage depth for stagnation model invariants and validation paths.

Also applies to: 108-202

tests/unit/engine/stagnation/test_fingerprint.py (1)

13-101: Fingerprint tests are solid and cover determinism, canonicalization, and output format well.

src/synthorg/engine/stagnation/protocol.py (1)

14-46: LGTM — the protocol stays intentionally minimal.

check() and get_detector_type() are the only hooks the loops need, and keeping check() async leaves room for service-backed detectors without another API break.

src/synthorg/engine/checkpoint/resume.py (1)

115-127: Nice preservation of loop-scoped collaborators.

Rebuilding the loop with both approval_gate and stagnation_detector keeps resumed executions behaviorally aligned with the original loop instance.

tests/unit/engine/test_loop_helpers.py (1)

522-533: Good coverage around the fingerprint lifecycle.

These assertions protect both sides of the TurnRecord.tool_call_fingerprints contract: creation in make_turn_record() and cleanup in clear_last_turn_tool_calls().

Also applies to: 623-637

docs/design/engine.md (1)

498-563: Well-documented stagnation detection design.

The Stagnation Detection section comprehensively covers the protocol interface, default implementation, configuration options, intervention flow, and loop integration semantics. The distinction between loop-scoped (ReactLoop) and step-scoped (PlanExecuteLoop) correction counters is clearly documented.

src/synthorg/engine/plan_execute_loop.py (3)

96-107: LGTM: Constructor properly accepts and stores stagnation detector.

The constructor signature and storage pattern matches the ReactLoop implementation, maintaining consistency across execution loop implementations.


114-122: LGTM: Read-only properties for checkpoint resume support.

The approval_gate and stagnation_detector properties enable checkpoint resume to preserve these components, as specified in the PR objectives.


653-686: LGTM: Step-scoped stagnation detection correctly implemented.

The implementation properly:

  1. Initializes step_start_idx and step_corrections at step entry for step-scoped windowing
  2. Passes only current step's turns (turns[step_start_idx:]) to the detector
  3. Includes step_number for observability
  4. Handles all three return types from check_stagnation

This aligns with the design spec stating "corrections counter is step-scoped, window resets across step boundaries."

tests/unit/engine/stagnation/test_properties.py (3)

47-48: Good: Tool name strategy correctly filters whitespace-only strings.

The filter lambda s: s.strip() ensures generated tool names comply with NotBlankStr validation, preventing invalid test inputs.


55-89: LGTM: Fingerprint property tests.

The tests verify two key properties:

  1. Determinism: Same name + args always produces identical fingerprints
  2. Format: Fingerprints match the name:16-char-hex format specification

Using rpartition(":") correctly handles tool names containing colons.


94-116: No action needed. The async test methods correctly omit explicit @pytest.mark.asyncio decorators. The asyncio_mode = "auto" configuration in pyproject.toml (line 227) enables automatic async test discovery, which is the required pattern for this codebase per the coding guidelines.

> Likely an incorrect or invalid review comment.
tests/unit/engine/stagnation/test_detector.py (5)

14-27: LGTM: Well-designed test helper.

The _turn helper correctly derives tool_calls_made from fingerprints and sets appropriate finish_reason based on tool presence. This ensures test data is internally consistent.


30-83: LGTM: Comprehensive NO_STAGNATION test coverage.

Tests cover all paths that should return NO_STAGNATION:

  • Empty turns
  • Unique fingerprints
  • Below min_tool_turns threshold
  • Turns without tool calls
  • Disabled detector
  • Below repetition threshold

85-138: LGTM: INJECT_PROMPT and TERMINATE verdict tests.

Tests verify:

  • High repetition triggers INJECT_PROMPT with corrective message
  • Cycle detection triggers INJECT_PROMPT with cycle_length
  • Exceeding max_corrections triggers TERMINATE
  • max_corrections=0 skips INJECT_PROMPT and goes directly to TERMINATE

228-232: LGTM: Protocol conformance test.

Verifying isinstance(detector, StagnationDetector) confirms the implementation satisfies the runtime-checkable protocol.


285-327: LGTM: Exact repetition ratio verification.

Testing precise ratio calculations (0.8, 0.5, 0.0) with pytest.approx ensures the algorithm correctness for edge cases.

src/synthorg/engine/stagnation/models.py (4)

15-20: LGTM: StagnationVerdict enum.

Clean StrEnum with clear semantics for the three possible outcomes.


23-81: LGTM: StagnationConfig model.

Well-designed frozen config model with:

  • Sensible defaults (window_size=5, repetition_threshold=0.6, max_corrections=1)
  • Appropriate constraints (ge/le bounds)
  • Cross-field validation ensuring min_tool_turns <= window_size
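
To make the constraints above concrete, here is a minimal sketch that uses a frozen stdlib dataclass as a stand-in for the real Pydantic model. The class name `StagnationConfigSketch`, the `min_tool_turns` and `cycle_detection` defaults, and the error messages are assumptions for illustration; the bounds mirror the boundary tests quoted in this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagnationConfigSketch:
    """Stand-in for the frozen config model (not the real Pydantic class)."""

    window_size: int = 5
    repetition_threshold: float = 0.6
    max_corrections: int = 1
    min_tool_turns: int = 2  # assumed default; not stated in the review
    cycle_detection: bool = True  # assumed default; not stated in the review

    def __post_init__(self) -> None:
        # Per-field bounds, taken from the boundary tests in test_models.py.
        if not 2 <= self.window_size <= 50:
            raise ValueError("window_size must be between 2 and 50")
        if not 0.0 <= self.repetition_threshold <= 1.0:
            raise ValueError("repetition_threshold must be between 0.0 and 1.0")
        if self.max_corrections < 0:
            raise ValueError("max_corrections must be >= 0")
        if self.min_tool_turns < 1:
            raise ValueError("min_tool_turns must be >= 1")
        # Cross-field rule: the minimum cannot exceed the window itself.
        if self.min_tool_turns > self.window_size:
            raise ValueError("min_tool_turns exceeds window_size")
```

With this stand-in, `StagnationConfigSketch(window_size=3, min_tool_turns=4)` raises, while `StagnationConfigSketch(max_corrections=0)` is accepted.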

84-137: LGTM: StagnationResult model with proper validation.

The model correctly:

  • Deep-copies details dict at construction boundary (line 124-125)
  • Enforces corrective_message is required for INJECT_PROMPT and forbidden otherwise
  • Uses frozen config for immutability

Based on learnings: "frozen Pydantic models for config/identity" and "deep-copy at system boundaries."


140-142: LGTM: Reusable NO_STAGNATION_RESULT singleton.

Safe to reuse since StagnationResult is frozen. This avoids allocating new objects for the common no-stagnation case.

src/synthorg/engine/loop_helpers.py (4)

435-437: LGTM: Consistent clearing of tool call metadata.

Both tool_calls_made and tool_call_fingerprints are cleared together, maintaining consistency when shutdown fires before tool execution.


478-505: LGTM: Deterministic fingerprint computation.

The implementation correctly:

  • Canonicalizes JSON with sort_keys=True and compact separators
  • Uses default=str to handle non-JSON-serializable argument values
  • Truncates SHA-256 to 16 hex chars (sufficient for fingerprint uniqueness)
  • Returns sorted tuple for order-independent comparison across turns
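
The fingerprint scheme described above (`name:sha256(canonical_json)[:16]`, sorted per turn) can be sketched as follows. This is an illustration only: the function name `compute_fingerprints_sketch` and its input shape (a list of `(name, arguments)` pairs) are assumptions, and the real `compute_fingerprints` signature may differ.

```python
import hashlib
import json
from typing import Any


def compute_fingerprints_sketch(
    tool_calls: list[tuple[str, dict[str, Any]]],
) -> tuple[str, ...]:
    """Return sorted ``name:hash16`` fingerprints for one turn's tool calls."""
    fingerprints = []
    for name, arguments in tool_calls:
        # Canonical JSON: sorted keys and compact separators, so that
        # argument order and whitespace never change the hash.
        canonical = json.dumps(
            arguments, sort_keys=True, separators=(",", ":"), default=str
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        fingerprints.append(f"{name}:{digest}")
    # Sorting makes the per-turn result independent of call order.
    return tuple(sorted(fingerprints))
```

Two calls with the same arguments in a different key order produce the same fingerprint, which is what lets the detector treat them as repeats.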

526-599: LGTM: Well-structured stagnation check helper.

The implementation:

  • Gracefully handles detector failures (logs and continues execution)
  • Re-raises MemoryError and RecursionError per project conventions
  • Uses appropriate log levels (WARNING for termination, INFO for correction injection)
  • Attaches stagnation metadata to the result for observability

577-582: TerminationReason.STAGNATION is properly defined in the enum.

The enum value is correctly defined at line 36 of loop_protocol.py as STAGNATION = "stagnation".

src/synthorg/engine/stagnation/detector.py (5)

27-53: LGTM: ToolRepetitionDetector class structure.

The class correctly:

  • Defaults to StagnationConfig() when no config is provided
  • Exposes configuration via read-only property
  • Returns a meaningful detector type identifier

54-116: LGTM: Stagnation check logic.

The dual-signal detection approach is well-implemented:

  1. Early exit for disabled config or insufficient tool turns
  2. Repetition ratio computed across all fingerprints in window
  3. Cycle detection runs if enabled
  4. Either signal triggers stagnation detection

The window extraction correctly filters to tool-bearing turns and takes the most recent window_size entries.


119-138: LGTM: Correct repetition ratio computation.

The formula sum(c - 1 for c in counts.values() if c > 1) / total correctly counts excess occurrences as duplicates:

  • 5 identical → 4/5 = 0.8
  • A,A,B,B → 2/4 = 0.5
  • All unique → 0/n = 0.0
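
The three worked values above can be reproduced with a short sketch of the same formula. The function name `repetition_ratio` and the empty-input guard are assumptions; the actual helper is `_compute_repetition_ratio` and may handle edge cases differently.

```python
from collections import Counter


def repetition_ratio(fingerprints: list[str]) -> float:
    """Fraction of fingerprints that are excess repeats of an earlier one."""
    if not fingerprints:
        return 0.0  # assumed behavior for an empty window
    counts = Counter(fingerprints)
    # Each fingerprint seen c times contributes c - 1 duplicates.
    duplicates = sum(c - 1 for c in counts.values() if c > 1)
    return duplicates / len(fingerprints)
```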

153-173: LGTM: Cycle detection algorithm.

The algorithm correctly identifies repeating patterns by comparing the last k turns with the preceding k turns, starting from k=2 to find the shortest cycle first.


176-237: LGTM: Result building and corrective messaging.

The helper functions correctly:

  • Choose INJECT_PROMPT vs TERMINATE based on corrections_injected < max_corrections
  • Build informative corrective messages with repeated tool names
  • Include repeated_tools in details for debugging
  • Log the STAGNATION_DETECTED event with all relevant context
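
The verdict selection in the first bullet can be sketched as below. The enum member names come from this review; the string values, the `Verdict` class name, and the `choose_verdict` helper are assumptions, and `(str, Enum)` is used here in place of `StrEnum` only to keep the sketch runnable on older Python versions.

```python
from enum import Enum


class Verdict(str, Enum):
    """Stand-in for StagnationVerdict; the string values are assumed."""

    NO_STAGNATION = "no_stagnation"
    INJECT_PROMPT = "inject_prompt"
    TERMINATE = "terminate"


def choose_verdict(
    stagnating: bool, corrections_injected: int, max_corrections: int
) -> Verdict:
    """Pick a verdict from the detection signal and the corrections budget."""
    if not stagnating:
        return Verdict.NO_STAGNATION
    # Corrections remain in the budget: try a corrective prompt first.
    if corrections_injected < max_corrections:
        return Verdict.INJECT_PROMPT
    # Budget exhausted (or zero from the start): stop the loop.
    return Verdict.TERMINATE
```

With `max_corrections=0` the comparison is false on the first detection, so the sketch goes straight to `TERMINATE`, matching the behavior the detector tests check.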

Comment on lines +60 to +106
    def test_window_size_lower_bound(self) -> None:
        with pytest.raises(ValidationError, match="greater than or equal to 2"):
            StagnationConfig(window_size=1)

    def test_window_size_upper_bound(self) -> None:
        with pytest.raises(ValidationError, match="less than or equal to 50"):
            StagnationConfig(window_size=51)

    def test_repetition_threshold_lower_bound(self) -> None:
        config = StagnationConfig(repetition_threshold=0.0)
        assert config.repetition_threshold == 0.0

    def test_repetition_threshold_upper_bound(self) -> None:
        config = StagnationConfig(repetition_threshold=1.0)
        assert config.repetition_threshold == 1.0

    def test_repetition_threshold_negative_rejected(self) -> None:
        with pytest.raises(ValidationError):
            StagnationConfig(repetition_threshold=-0.1)

    def test_repetition_threshold_above_one_rejected(self) -> None:
        with pytest.raises(ValidationError):
            StagnationConfig(repetition_threshold=1.1)

    def test_max_corrections_zero(self) -> None:
        config = StagnationConfig(max_corrections=0)
        assert config.max_corrections == 0

    def test_max_corrections_negative_rejected(self) -> None:
        with pytest.raises(ValidationError):
            StagnationConfig(max_corrections=-1)

    def test_min_tool_turns_lower_bound(self) -> None:
        config = StagnationConfig(min_tool_turns=1)
        assert config.min_tool_turns == 1

    def test_min_tool_turns_zero_rejected(self) -> None:
        with pytest.raises(ValidationError):
            StagnationConfig(min_tool_turns=0)

    def test_min_tool_turns_exceeds_window_size_rejected(self) -> None:
        with pytest.raises(
            ValidationError,
            match=r"min_tool_turns.*exceeds.*window_size",
        ):
            StagnationConfig(window_size=3, min_tool_turns=4)

🧹 Nitpick | 🔵 Trivial

Consolidate repetitive boundary tests with @pytest.mark.parametrize.

These cases are strong but repetitive; parameterization would reduce duplication and simplify future additions.

As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases".

Also applies to: 176-195

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/stagnation/test_models.py` around lines 60 - 106, The tests
around StagnationConfig (e.g., test_window_size_lower_bound,
test_window_size_upper_bound, test_repetition_threshold_lower_bound,
test_repetition_threshold_upper_bound,
test_repetition_threshold_negative_rejected,
test_repetition_threshold_above_one_rejected, test_max_corrections_zero,
test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound,
test_min_tool_turns_zero_rejected,
test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate
them using pytest.mark.parametrize to cover similar boundary inputs and expected
ValidationError or accepted values in a single parametrized test per field
(window_size, repetition_threshold, max_corrections, min_tool_turns). Update
tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match)
and assert by either expecting ValidationError with match or validating the
created StagnationConfig attribute, keeping references to the StagnationConfig
constructor and the specific test names to locate and replace the current
individual tests (also apply the same refactor pattern to the similar tests at
the other section noted around lines 176-195).

@codecov

codecov bot commented Mar 15, 2026

Codecov Report

❌ Patch coverage is 99.30556% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 93.72%. Comparing base (24a0d7a) to head (f24b1cf).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/synthorg/engine/agent_engine.py | 66.66% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #458      +/-   ##
==========================================
+ Coverage   93.67%   93.72%   +0.05%     
==========================================
  Files         469      474       +5     
  Lines       22219    22471     +252     
  Branches     2143     2166      +23     
==========================================
+ Hits        20814    21062     +248     
- Misses       1095     1098       +3     
- Partials      310      311       +1     


Add stagnation detection that analyzes TurnRecord tool-call fingerprints
across a sliding window, intervenes with corrective prompt injection,
and terminates early with STAGNATION if correction fails.

- StagnationDetector protocol with async check() method
- ToolRepetitionDetector: dual-signal (repetition ratio + cycle detection)
- StagnationConfig: window_size, repetition_threshold, cycle_detection,
  max_corrections, min_tool_turns
- STAGNATION termination reason + tool_call_fingerprints on TurnRecord
- Fingerprint computation: name:sha256(canonical_json)[:16], sorted
- ReactLoop integration: loop-scoped corrections counter
- PlanExecuteLoop integration: per-step scoped detection
- AgentEngine wiring via stagnation_detector parameter
- Observability events: check_performed, detected, correction_injected,
  terminated
- Design spec: engine.md stagnation detection section
- 54 new tests (models, detector, fingerprints, Hypothesis properties)
- Extended loop_protocol, loop_helpers, react_loop, plan_execute_loop tests

Add 'stagnation' to the expected domain modules set in
test_all_domain_modules_discovered.

- Fix checkpoint resume dropping stagnation_detector and approval_gate
  (CRITICAL: make_loop_with_callback now forwards both via properties)
- Add error handling around stagnation_detector.check() calls
  (CRITICAL: wrap in except MemoryError, RecursionError / except Exception)
- Extract shared check_stagnation() helper to loop_helpers.py
  (removes duplicated logic from ReactLoop + PlanExecuteLoop)
- Refactor ToolRepetitionDetector.check() into smaller methods
  (_extract_window, _compute_repetition_ratio, _build_stagnation_result)
- Add approval_gate/stagnation_detector read-only properties to both loops
- Update agent_engine warning to mention stagnation_detector
- Deep-copy StagnationResult.details at construction (matches ExecutionResult)
- Add ge=2 constraint to cycle_length field
- Add cross-field validator: min_tool_turns <= window_size
- Handle empty repeated_tools in corrective message (cycle-only trigger)
- Update CLAUDE.md Package Structure with stagnation/
- Add _detect_cycle direct tests (6 cases: short/cycle2/cycle3/almost/shortest/empty)
- Add repetition ratio exact-value tests (3 cases)
- Add protocol conformance test (isinstance check)
- Add PlanExecuteLoop step corrections counter increment test

Pre-reviewed by 9 agents, 14 findings addressed
…and Gemini

Critical fixes:
- Fix incomplete turns in PlanExecuteLoop STAGNATION result (rebuild with full turns)
- Add direct unit tests for check_stagnation() (8 tests: verdicts, exception paths)
- Add checkpoint resume tests verifying stagnation_detector preservation
- Fix TerminationReason docs missing STAGNATION in enum list and post-execution transitions

Major fixes:
- Refactor check_stagnation into two functions (<50 lines each)
- Fix inaccurate repetition_threshold boundary semantics in docstring
- Fix repetition ratio description in design spec
- Add turn context to stagnation error log (consistent with peer functions)
- Use NotBlankStr for tool_call_fingerprints (per CLAUDE.md convention)
- Remove dead else branch in _build_corrective_message (Gemini finding)
- Add AgentEngine stagnation_detector wiring test
- Narrow exception log message to type-only (security hardening)

Medium fixes:
- Add tool_call_fingerprints to TurnRecord Attributes docstring
- Add Args section to check_stagnation docstring
- Fix _compute_repetition_ratio docstring precision
- Enhance fake detectors: track corrections_seen, use TurnRecord types
- Verify step-2 turn identity in step-scoped test

Cast _loop to ReactLoop via isinstance before accessing
stagnation_detector property (ExecutionLoop protocol does not
expose it).


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)

88-94: ⚠️ Potential issue | 🟡 Minor

Document stagnation_detector in the public API docs.

The class Args: block still stops at approval_gate, so the new constructor parameter is undocumented.

📝 Suggested docstring update
     Args:
         config: Loop configuration.  Defaults to ``PlanExecuteConfig()``.
         checkpoint_callback: Optional per-turn checkpoint callback.
         approval_gate: Optional gate that checks for pending escalations
             after tool execution and parks the agent when approval is
             required.  ``None`` disables approval checks.
+        stagnation_detector: Optional detector used to identify
+            repetitive tool-use loops within a step. ``None`` disables
+            stagnation checks.
As per coding guidelines: "Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this."

Also applies to: 96-103

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/plan_execute_loop.py` around lines 88 - 94, The docstring
for the public class or function that defines the loop (inspect the
PlanExecuteLoop class / __init__ in plan_execute_loop.py) is missing
documentation for the new constructor parameter stagnation_detector; update the
Google-style Args: block to add an entry for stagnation_detector describing its
purpose, expected type (e.g., callable or StagnationDetector), behavior (how it
detects/handles stagnation and whether None disables it), and default value, and
mirror this addition where the Args block continues around lines 96–103 so the
public API docs include the new parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/engine.md`:
- Around line 468-472: The earlier “Transition sequences” summary table is out
of sync with the later text: add STAGNATION and PARKED to the table’s list of
termination reasons that leave the task in its current state (alongside the
existing MAX_TURNS / BUDGET entries) so the lifecycle outcomes match; update the
table rows/columns that mention termination reasons and their resulting task
state to include the STAGNATION and PARKED symbols and ensure the description
for those entries matches the wording used later in the document.

In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 675-690: The code currently reassigns the shared ctx when
check_stagnation() returns an updated AgentContext, which leaks step-scoped
corrective prompts into later steps; change the tuple unpack to capture the
returned context into a new local variable (e.g., step_ctx or ctx_for_check) and
only use that local context for handling this step's logic and step_corrections,
leaving the outer ctx untouched (keep using ctx elsewhere), and ensure
step_corrections is applied only to the current step flow; do not reassign ctx
from the result of check_stagnation() so corrective prompts remain step-scoped
(references: check_stagnation, ctx, step_corrections, stag_outcome,
ExecutionResult, model_copy).

In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 924-927: The assertion is too weak and may pass if _execute_step()
returns only a step-local slice; change the test to assert that the full two
turns (planning and tool-use) are present by using a stricter check on
result.turns (e.g., assert len(result.turns) == 2) and/or assert that one of the
turns is the planning turn (inspect turn.role or turn.type) so the STAGNATION
path reconstructs the complete turn sequence rather than just a step slice.

---

Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 88-94: The docstring for the public class or function that defines
the loop (inspect the PlanExecuteLoop class / __init__ in plan_execute_loop.py)
is missing documentation for the new constructor parameter stagnation_detector;
update the Google-style Args: block to add an entry for stagnation_detector
describing its purpose, expected type (e.g., callable or StagnationDetector),
behavior (how it detects/handles stagnation and whether None disables it), and
default value, and mirror this addition where the Args block continues around
lines 96–103 so the public API docs include the new parameter.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6425edb9-dec5-4fc9-a018-a17beef932c3

📥 Commits

Reviewing files that changed from the base of the PR and between 1fcd75e and d6f1137.

📒 Files selected for processing (11)
  • docs/design/engine.md
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/stagnation/detector.py
  • src/synthorg/engine/stagnation/models.py
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_plan_execute_loop.py
  • tests/unit/engine/test_react_loop.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python code—Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) in exception handlers—PEP 758 except syntax for Python 3.14. Ruff enforces this.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this.
Prefer immutability—create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Handle errors explicitly—never silently swallow exceptions.
Use Line length of 88 characters (ruff enforced).

Files:

  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_react_loop.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/loop_helpers.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/checkpoint/test_resume.py
  • src/synthorg/engine/stagnation/models.py
  • tests/unit/engine/test_plan_execute_loop.py
  • src/synthorg/engine/stagnation/detector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test organization.
Use asyncio_mode = "auto" in pytest configuration—no manual @pytest.mark.asyncio needed on async tests.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis for property-based testing with @given + @settings. Run dev profile with HYPOTHESIS_PROFILE=dev for 1000 examples.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.
Do NOT use vendor names (Anthropic, OpenAI, Claude, GPT) in tests. Use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/checkpoint/test_resume.py
  • tests/unit/engine/test_plan_execute_loop.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with adopted conventions: use @computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Use structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines.
Validate at system boundaries (user input, external APIs, config files).
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name for the module logger (not _logger, not log).
Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Log structured data with logger.info(EVENT, key=value)—never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG level logging is for object creation, internal flow, and entry/exit of key functions.
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig.
Retryable errors are RateLimitError, P...

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/stagnation/detector.py
src/synthorg/{engine,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/stagnation/models.py
  • src/synthorg/engine/stagnation/detector.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Markdown documentation files must follow Zensical build conventions with mkdocs.yml at repo root.

Files:

  • docs/design/engine.md
🧠 Learnings (11)
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.

Applied to files:

  • src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/loop_protocol.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/loop_protocol.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/stagnation/models.py
🧬 Code graph analysis (8)
tests/unit/engine/test_agent_engine.py (1)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (59-327)
tests/unit/engine/test_react_loop.py (3)
src/synthorg/engine/loop_protocol.py (3)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
  • execute (166-192)
src/synthorg/engine/react_loop.py (3)
  • ReactLoop (59-327)
  • stagnation_detector (93-95)
  • execute (101-196)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
src/synthorg/engine/plan_execute_loop.py (5)
src/synthorg/engine/loop_helpers.py (1)
  • check_stagnation (526-587)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/react_loop.py (1)
  • stagnation_detector (93-95)
src/synthorg/engine/stagnation/detector.py (1)
  • config (46-48)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionResult (86-147)
src/synthorg/engine/loop_helpers.py (3)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
src/synthorg/engine/stagnation/protocol.py (2)
  • StagnationDetector (15-46)
  • check (26-42)
src/synthorg/engine/loop_protocol.py (2)
  • TurnRecord (40-83)
  • ExecutionResult (86-147)
tests/unit/engine/test_loop_helpers.py (11)
src/synthorg/engine/loop_helpers.py (1)
  • check_stagnation (526-587)
tests/unit/engine/test_cost_recording.py (6)
  • record (75-80)
  • record (179-180)
  • record (195-196)
  • _turn (21-36)
  • _result (39-48)
  • test_memory_error_propagates (175-189)
tests/unit/engine/test_plan_execute_loop.py (3)
  • _stop_response (90-96)
  • get_detector_type (868-869)
  • check (871-883)
tests/unit/engine/test_react_loop.py (3)
  • _stop_response (41-47)
  • get_detector_type (1055-1056)
  • check (1058-1068)
tests/unit/engine/stagnation/test_detector.py (1)
  • _turn (14-27)
tests/unit/engine/stagnation/test_properties.py (1)
  • _turn (15-26)
src/synthorg/engine/loop_protocol.py (3)
  • TurnRecord (40-83)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/stagnation/detector.py (2)
  • get_detector_type (50-52)
  • check (54-103)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
tests/unit/engine/test_agent_engine.py (1)
  • test_memory_error_propagates (1282-1303)
tests/unit/engine/checkpoint/test_resume.py (1)
src/synthorg/engine/plan_execute_loop.py (1)
  • stagnation_detector (120-122)
tests/unit/engine/test_plan_execute_loop.py (4)
src/synthorg/engine/loop_protocol.py (3)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
  • execute (166-192)
src/synthorg/engine/stagnation/detector.py (2)
  • get_detector_type (50-52)
  • check (54-103)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
src/synthorg/engine/plan_execute_loop.py (3)
  • PlanExecuteLoop (82-896)
  • stagnation_detector (120-122)
  • execute (128-206)
src/synthorg/engine/stagnation/detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
  • TurnRecord (40-83)
src/synthorg/engine/stagnation/models.py (3)
  • StagnationConfig (23-84)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
src/synthorg/engine/stagnation/protocol.py (2)
  • get_detector_type (44-46)
  • check (26-42)
🔇 Additional comments (19)
src/synthorg/engine/stagnation/models.py (5)

1-12: LGTM!

Module docstring is clear and imports are appropriate. No usage of from __future__ import annotations as per coding guidelines. Using Self from typing and Pydantic v2 constructs correctly.


15-21: LGTM!

Clean enum definition using StrEnum with clear semantic values. The three verdicts (NO_STAGNATION, INJECT_PROMPT, TERMINATE) align well with the stagnation detection flow described in PR objectives.


23-84: LGTM!

Well-designed frozen configuration model with sensible defaults and constraints. The _validate_min_within_window validator correctly ensures min_tool_turns <= window_size with a clear error message explaining why the configuration is invalid.
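
The cross-field rule can be sketched as follows. This is a minimal stand-in that uses a frozen dataclass rather than the PR's Pydantic model, so it runs without third-party packages; the class name `StagnationConfigSketch` and all default values are assumptions made for illustration, not values taken from the PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagnationConfigSketch:
    """Hypothetical stand-in for the frozen config model (defaults assumed)."""

    window_size: int = 5
    repetition_threshold: float = 0.6
    max_corrections: int = 1
    min_tool_turns: int = 2

    def __post_init__(self) -> None:
        # Cross-field rule named in the review: min_tool_turns <= window_size.
        if self.min_tool_turns > self.window_size:
            msg = (
                f"min_tool_turns ({self.min_tool_turns}) must not exceed "
                f"window_size ({self.window_size}); the window could never "
                "hold enough tool-bearing turns for a check to run"
            )
            raise ValueError(msg)
```

In the real model the same check would presumably live in a Pydantic `model_validator`; the dataclass form only shows the invariant itself.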


87-140: LGTM!

Excellent implementation following the coding guidelines. The __init__ override correctly deep-copies the details dict at the construction boundary per the guideline: "copy.deepcopy() at system boundaries". The _validate_corrective_message validator ensures invariant consistency between verdict and corrective_message. Based on learnings: "For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries."


143-146: LGTM!

Good optimization pattern — a reusable frozen instance for the common no-stagnation case avoids repeated allocations.

src/synthorg/engine/loop_helpers.py (7)

8-9: LGTM!

Standard library imports for fingerprint computation (hashlib for SHA-256, json for canonical serialization).


23-26: LGTM!

Event constants correctly imported from the domain-specific synthorg.observability.events.stagnation module per coding guidelines.


424-437: LGTM!

Correctly extended to clear both tool_calls_made and tool_call_fingerprints together. Since fingerprints are derived from tool calls, clearing one without the other would create an inconsistent state.


478-505: LGTM!

Well-designed fingerprinting function with deterministic properties:

  • sort_keys=True ensures consistent JSON regardless of dict key order
  • Compact separators avoid whitespace variance
  • Sorted output tuple ensures same tool calls produce identical fingerprints regardless of call order
  • 16 hex chars (64 bits) provides sufficient collision resistance for this use case

The default=str fallback is defensive for non-JSON-serializable types. This is acceptable given tool arguments should typically be JSON-serializable, but be aware this could produce different fingerprints for semantically equivalent objects with non-deterministic __str__ representations.
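
The properties listed above can be illustrated with a small sketch of the `name:sha256(canonical_json)[:16]` scheme. The function name `fingerprint_tool_calls` and the `(name, arguments)` tuple input are assumptions for this example; the helper in `loop_helpers.py` may take different types.

```python
import hashlib
import json
from typing import Any


def fingerprint_tool_calls(
    calls: list[tuple[str, dict[str, Any]]],
) -> tuple[str, ...]:
    """Return sorted ``name:digest`` fingerprints for one turn's tool calls."""
    fingerprints = []
    for name, arguments in calls:
        # Canonical JSON: sorted keys and compact separators, so key order
        # and whitespace cannot change the digest. default=str is the
        # defensive fallback for values that are not JSON-serializable.
        canonical = json.dumps(
            arguments, sort_keys=True, separators=(",", ":"), default=str
        )
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        fingerprints.append(f"{name}:{digest}")
    # Sorting makes the result independent of call order within the turn.
    return tuple(sorted(fingerprints))
```

Two turns that issue the same calls with the same arguments, in any order and with any dict key order, yield identical tuples under this sketch.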


459-475: LGTM!

Clean integration of fingerprint computation into turn record creation. The tool_call_fingerprints field is correctly populated from the response's tool calls.


526-587: LGTM!

Well-designed advisory stagnation check with appropriate error handling:

  • Correctly returns None on detector absence or failure (non-blocking)
  • Properly logs errors with context before continuing
  • Clear separation between detection (check) and verdict handling
  • Uses PEP 758 exception syntax correctly (except MemoryError, RecursionError:)

The advisory design ensures stagnation detection never interrupts an otherwise-healthy loop, as documented in the PR objectives.


590-645: LGTM!

Clean verdict dispatch with appropriate logging levels:

  • WARNING for STAGNATION_TERMINATED (problematic termination)
  • INFO for STAGNATION_CORRECTION_INJECTED (state transition)

The corrective message is correctly injected as a USER role message, which aligns with the pattern of system interventions that appear as user guidance to the model.

src/synthorg/engine/stagnation/detector.py (7)

1-24: LGTM!

Module docstring clearly describes the dual-signal detection approach. Imports are appropriate, and the logger is correctly initialized with get_logger(__name__) per coding guidelines.


27-103: LGTM!

Clean implementation of the StagnationDetector protocol:

  • check method signature matches the protocol definition
  • get_detector_type returns a unique identifier
  • Early returns for disabled state or insufficient tool-bearing turns
  • Properly delegates computation to focused helper functions

The dual-signal detection (ratio >= threshold OR cycle_length is not None) aligns with the PR objectives.


105-116: LGTM!

Correct windowing logic that filters to tool-bearing turns only. The negative slice [-window_size:] gracefully handles cases where fewer tool turns exist than the window size.
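
The windowing step might look like the sketch below. It models each turn as its tuple of fingerprints (an empty tuple means no tool calls); `extract_window` is a hypothetical name, and `window_size >= 1` is assumed, since `[-0:]` would return the whole list.

```python
def extract_window(
    turns: list[tuple[str, ...]], window_size: int
) -> list[tuple[str, ...]]:
    """Keep only tool-bearing turns, then take the most recent window."""
    tool_turns = [fingerprints for fingerprints in turns if fingerprints]
    # A negative slice returns every element when fewer than window_size exist.
    return tool_turns[-window_size:]
```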


119-140: LGTM!

Correct repetition ratio computation. The formula (sum of c-1 for repeated) / total counts excess occurrences (duplicates beyond the first) as a fraction of total fingerprints. Empty window handling returns (0.0, Counter()) appropriately.
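
As a worked illustration of that formula, here is a sketch over a flat list of fingerprints. The name `repetition_ratio` and the flat-list input are assumptions; the PR's `_compute_repetition_ratio` may flatten per-turn tuples differently.

```python
from collections import Counter


def repetition_ratio(fingerprints: list[str]) -> tuple[float, Counter[str]]:
    """Return (excess duplicates / total fingerprints, per-fingerprint counts)."""
    if not fingerprints:
        return 0.0, Counter()
    counts = Counter(fingerprints)
    # Each fingerprint seen c > 1 times contributes c - 1 excess occurrences.
    excess = sum(c - 1 for c in counts.values() if c > 1)
    return excess / len(fingerprints), counts
```

For `["a", "a", "a", "b"]` the excess is 2 out of 4 fingerprints, so the ratio is 0.5.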


143-175: LGTM!

Sound cycle detection algorithm that identifies repeating A→B→A→B patterns by comparing tail segments with their predecessors. The iteration from cycle_len=2 to n//2 ensures the shortest cycle is found first. For sequences shorter than 4 turns, no cycle can be detected (correctly returns None).


178-213: LGTM!

Correct verdict selection logic:

  • INJECT_PROMPT when corrections_injected < max_corrections
  • TERMINATE when corrections exhausted

The repeated_tools extraction and sorting ensures deterministic logging and message generation. The INFO level for STAGNATION_DETECTED is appropriate as a state transition event (distinct from the termination action logged separately).


216-233: LGTM!

Clear, actionable corrective message that guides the agent to change its approach. The [SYSTEM INTERVENTION: ...] prefix clearly marks this as an automated intervention. The docstring correctly notes that repeated_tools is always non-empty when this function is called — stagnation detection requires at least one repeated fingerprint (via ratio or cycle signal).

…rtion

- Add stagnation_detector to PlanExecuteLoop and ReactLoop Args docstrings
- Strengthen STAGNATION turn count assertion: verify both planning and
  tool-use turns are present (not just >= 1)
@Aureliolo Aureliolo force-pushed the feat/stagnation-detector branch from d6f1137 to f24b1cf Compare March 15, 2026 19:23
@Aureliolo Aureliolo merged commit 8e9f34f into main Mar 15, 2026
27 of 29 checks passed
@Aureliolo Aureliolo deleted the feat/stagnation-detector branch March 15, 2026 19:24
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 15, 2026 19:24 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 15, 2026
🤖 I have created a release *beep* *boop*
---


## [0.2.6](v0.2.5...v0.2.6) (2026-03-15)


### Features

* add intra-loop stagnation detector
([#415](#415))
([#458](#458))
([8e9f34f](8e9f34f))
* add RFC 9457 structured error responses (Phase 1)
([#457](#457))
([6612a99](6612a99)),
closes [#419](#419)
* implement AgentStateRepository for runtime state persistence
([#459](#459))
([5009da7](5009da7))
* **site:** add SEO essentials, contact form, early-access banner
([#467](#467))
([11b645e](11b645e)),
closes [#466](#466)


### Bug Fixes

* CLI improvements — config show, completion install, enhanced doctor,
Sigstore verification
([#465](#465))
([9e08cec](9e08cec))
* **site:** add reCAPTCHA v3, main landmark, and docs sitemap
([#469](#469))
([fa6d35c](fa6d35c))
* use force-tag-creation instead of manual tag creation hack
([#462](#462))
([2338004](2338004))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Development

Successfully merging this pull request may close these issues.

feat: add intra-loop stagnation detector to execution loops

1 participant