feat: add intra-loop stagnation detector (#415)#458
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (26)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds intra-loop stagnation detection: new stagnation package (protocol, models, ToolRepetitionDetector), integrates detector into ReactLoop and PlanExecuteLoop, wires detector through AgentEngine and checkpoint wrapping, augments loop records with tool_call_fingerprints, and adds tests and observability events for stagnation flows and corrective prompt injection/termination. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a crucial feature to enhance the robustness and efficiency of agent execution: intra-loop stagnation detection. By analyzing patterns in tool calls, the system can now identify when an agent is repeatedly performing the same actions without making progress. It then attempts to guide the agent with corrective prompts and, if unsuccessful, terminates the execution early with a specific STAGNATION reason, preventing wasted resources and improving overall agent reliability. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a robust stagnation detection mechanism to prevent agents from getting stuck in unproductive loops. The implementation includes a new StagnationDetector protocol, a ToolRepetitionDetector that uses fingerprinting and cycle detection, and configuration models. The feature is well-integrated into the existing ReactLoop and PlanExecuteLoop, with special care for per-step detection in the latter. The changes are comprehensive, covering core logic, documentation, and extensive tests, including property-based tests. My review found one area for improvement in the implementation of the corrective message generation, where a branch of code appears to be unreachable, which can be simplified.
| def _build_corrective_message(repeated_tools: list[str]) -> str: | ||
| """Build a corrective user-role message for prompt injection. | ||
|
|
||
| Args: | ||
| repeated_tools: Sorted list of repeated tool fingerprints. | ||
|
|
||
| Returns: | ||
| A corrective message string. | ||
| """ | ||
| if repeated_tools: | ||
| tool_list = ", ".join(repeated_tools) | ||
| return ( | ||
| "[SYSTEM INTERVENTION: Stagnation detected — your recent tool " | ||
| "calls show a repeating pattern without progress. The following " | ||
| "tools have been called with identical arguments multiple " | ||
| f"times: {tool_list}. Try a different approach: modify your " | ||
| "arguments, use different tools, or reconsider your strategy.]" | ||
| ) | ||
| return ( | ||
| "[SYSTEM INTERVENTION: Stagnation detected — your recent tool " | ||
| "calls show a repeating cycle pattern without progress. Try a " | ||
| "different approach: modify your arguments, use different tools, " | ||
| "or reconsider your strategy.]" | ||
| ) |
There was a problem hiding this comment.
The else block in this function appears to be unreachable. Stagnation is detected if either the tool repetition ratio is high or a cycle is detected. In both of these cases, there will be repeated tool fingerprints, which means the repeated_tools list will not be empty. This makes the if repeated_tools: check always true and the else block dead code.
You can simplify this function by removing the conditional logic and the unreachable branch.
def _build_corrective_message(repeated_tools: list[str]) -> str:
"""Build a corrective user-role message for prompt injection.
Args:
repeated_tools: Sorted list of repeated tool fingerprints.
Returns:
A corrective message string.
"""
tool_list = ", ".join(repeated_tools)
return (
"[SYSTEM INTERVENTION: Stagnation detected — your recent tool "
"calls show a repeating pattern without progress. The following "
"tools have been called with identical arguments multiple "
f"times: {tool_list}. Try a different approach: modify your "
"arguments, use different tools, or reconsider your strategy.]"
)There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/synthorg/engine/react_loop.py (1)
101-189: 🛠️ Refactor suggestion | 🟠 MajorKeep
execute()under the 50-line cap.The stagnation handoff makes the main loop denser and harder to scan. Extract one iteration or the post-turn stagnation handling into a small helper so
execute()stays orchestration-only.As per coding guidelines,
Functions must be < 50 lines; files must be < 800 lines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/react_loop.py` around lines 101 - 189, The execute() method exceeds the 50-line guideline because the post-turn stagnation handling is inlined; extract that block into a helper (e.g., _handle_post_turn_stagnation) that accepts (ctx, turns, corrections_injected, execution_id) and returns either an ExecutionResult or (ctx, corrections_injected), then replace the inlined stagnation logic in execute() with a single await call to _handle_post_turn_stagnation and handle its return the same way as the current code; touch references to check_stagnation and the variables ctx, corrections_injected, and turns so the new helper calls check_stagnation and preserves the existing control flow.src/synthorg/engine/agent_engine.py (1)
151-189: 🛠️ Refactor suggestion | 🟠 MajorExtract the new loop-wiring branch out of
AgentEngine.__init__.This constructor is already far beyond the repo's 50-line limit, and the new approval/stagnation branch adds one more responsibility to a very busy setup path. Pull external-loop validation and default-loop construction into a helper so the constructor is easier to audit.
As per coding guidelines,
Functions must be < 50 lines; files must be < 800 lines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/agent_engine.py` around lines 151 - 189, The constructor AgentEngine.__init__ is doing external-loop validation and default-loop construction inline; extract that branch into a new helper (e.g., _configure_execution_loop or _setup_loop_wiring) so __init__ stays small. Move the logic that reads execution_loop, checks self._approval_gate and self._stagnation_detector to emit APPROVAL_GATE_LOOP_WIRING_WARNING, and assigns self._loop (using self._make_default_loop() when execution_loop is None) into the helper; call this helper from __init__ passing the execution_loop argument and ensure it returns/sets self._loop and retains existing behavior around warning message and wiring semantics.docs/design/engine.md (1)
289-290:⚠️ Potential issue | 🟡 MinorUpdate
TerminationReasonenum documentation to includeSTAGNATION.The enum listing at lines 289-290 does not include the new
STAGNATIONtermination reason that this PR introduces. The documentation should be updated to reflect the complete set of possible termination reasons.`TerminationReason` -: Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`, - `PARKED`. `max_turns` defaults to 20. +: Enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`, + `PARKED`, `STAGNATION`. `max_turns` defaults to 20.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/design/engine.md` around lines 289 - 290, The TerminationReason enum documentation is missing the new STAGNATION value; update the enum listing in docs/design/engine.md so the list of termination reasons under TerminationReason includes `STAGNATION` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`, and `PARKED` and ensure any explanatory note (e.g., about `max_turns` defaulting to 20) remains intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/loop_protocol.py`:
- Around line 65-68: Change the type of tool_call_fingerprints from tuple[str,
...] to tuple[NotBlankStr, ...] to enforce non-blank identifiers; update the
Field declaration for tool_call_fingerprints accordingly and ensure NotBlankStr
is imported from core.types at the top of the module so the Field uses
tuple[NotBlankStr, ...] as its type annotation and validation.
In `@tests/unit/engine/stagnation/test_models.py`:
- Around line 60-106: The tests around StagnationConfig (e.g.,
test_window_size_lower_bound, test_window_size_upper_bound,
test_repetition_threshold_lower_bound, test_repetition_threshold_upper_bound,
test_repetition_threshold_negative_rejected,
test_repetition_threshold_above_one_rejected, test_max_corrections_zero,
test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound,
test_min_tool_turns_zero_rejected,
test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate
them using pytest.mark.parametrize to cover similar boundary inputs and expected
ValidationError or accepted values in a single parametrized test per field
(window_size, repetition_threshold, max_corrections, min_tool_turns). Update
tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match)
and assert by either expecting ValidationError with match or validating the
created StagnationConfig attribute, keeping references to the StagnationConfig
constructor and the specific test names to locate and replace the current
individual tests (also apply the same refactor pattern to the similar tests at
the other section noted around lines 176-195).
In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 852-879: The test's _FakeStagnationDetector currently ignores the
corrections_injected argument so the assertion that check_count == 2 can pass
even if PlanExecuteLoop never increments corrections; modify
_FakeStagnationDetector to record the corrections_injected values passed into
check (e.g., add a list attribute like recorded_corrections) and append the
corrections_injected each time check() is called, then in the test assert that
recorded_corrections == [0, 1] (repeat the same change for the other occurrence
around lines 1008-1050) to ensure PlanExecuteLoop actually forwards the
incrementing correction counts.
- Around line 966-1006: The test currently only asserts
detector.last_turns_count == 1 which could be satisfied by a turn from step 1;
update the test to assert that the detector actually received the step-2 turn by
capturing the inspected turn(s) from the fake detector (the
_FakeStagnationDetector instance stored in detector) and asserting properties
that identify the step-2 tool turn (e.g., check
detector.last_inspected_turns[0].tool_calls_made or the inspected turn's tool
call id equals "tc-1" produced by _tool_use_response("echo","tc-1")); modify or
extend _FakeStagnationDetector to record and expose the last inspected turns if
needed, then replace the existing last_turns_count assertion with an assertion
that the inspected turn's tool_calls_made (or tool call id) matches the expected
step-2 value.
In `@tests/unit/engine/test_react_loop.py`:
- Around line 1039-1066: The test's fake stagnation detector
(_FakeStagnationDetector) must record the corrections_injected values passed to
its async check(...) method so the test can assert they were incremented; add an
attribute (e.g., corrections_args: list[int]) to _FakeStagnationDetector, append
the corrections_injected parameter inside check(...), and update the test
assertions (the places around the existing check_count assertions) to assert
that detector.corrections_args == [0, 1] (instead of relying only on
check_count). Apply the same change to the other instance referenced around
lines 1160-1198.
---
Outside diff comments:
In `@docs/design/engine.md`:
- Around line 289-290: The TerminationReason enum documentation is missing the
new STAGNATION value; update the enum listing in docs/design/engine.md so the
list of termination reasons under TerminationReason includes `STAGNATION`
alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `SHUTDOWN`, `ERROR`, and
`PARKED` and ensure any explanatory note (e.g., about `max_turns` defaulting to
20) remains intact.
In `@src/synthorg/engine/agent_engine.py`:
- Around line 151-189: The constructor AgentEngine.__init__ is doing
external-loop validation and default-loop construction inline; extract that
branch into a new helper (e.g., _configure_execution_loop or _setup_loop_wiring)
so __init__ stays small. Move the logic that reads execution_loop, checks
self._approval_gate and self._stagnation_detector to emit
APPROVAL_GATE_LOOP_WIRING_WARNING, and assigns self._loop (using
self._make_default_loop() when execution_loop is None) into the helper; call
this helper from __init__ passing the execution_loop argument and ensure it
returns/sets self._loop and retains existing behavior around warning message and
wiring semantics.
In `@src/synthorg/engine/react_loop.py`:
- Around line 101-189: The execute() method exceeds the 50-line guideline
because the post-turn stagnation handling is inlined; extract that block into a
helper (e.g., _handle_post_turn_stagnation) that accepts (ctx, turns,
corrections_injected, execution_id) and returns either an ExecutionResult or
(ctx, corrections_injected), then replace the inlined stagnation logic in
execute() with a single await call to _handle_post_turn_stagnation and handle
its return the same way as the current code; touch references to
check_stagnation and the variables ctx, corrections_injected, and turns so the
new helper calls check_stagnation and preserves the existing control flow.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 42dde218-8444-4f6d-b77f-bae296aa6b20
📒 Files selected for processing (24)
CLAUDE.mddocs/design/engine.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/loop_protocol.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/stagnation/__init__.pysrc/synthorg/engine/stagnation/detector.pysrc/synthorg/engine/stagnation/models.pysrc/synthorg/engine/stagnation/protocol.pysrc/synthorg/observability/events/stagnation.pytests/unit/engine/stagnation/__init__.pytests/unit/engine/stagnation/test_detector.pytests/unit/engine/stagnation/test_fingerprint.pytests/unit/engine/stagnation/test_models.pytests/unit/engine/stagnation/test_properties.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/test_loop_protocol.pytests/unit/engine/test_plan_execute_loop.pytests/unit/engine/test_react_loop.pytests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) instead ofexcept (A, B):— PEP 758 syntax enforced by ruff on Python 3.14
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
UseNotBlankStrfromcore.typesfor all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Use@computed_fieldin Pydantic models for derived values instead of storing and validating redundant fields
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent calls) instead of barecreate_task
Line length: 88 characters (enforced by ruff)
Functions must be < 50 lines; files must be < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate input at system boundaries (user input, external APIs, config files)
Create new objects instead of mutating existing ones (immutability principle)
Files:
src/synthorg/engine/loop_protocol.pysrc/synthorg/engine/stagnation/__init__.pytests/unit/observability/test_events.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/stagnation/protocol.pytests/unit/engine/test_react_loop.pysrc/synthorg/engine/stagnation/models.pysrc/synthorg/engine/checkpoint/resume.pytests/unit/engine/stagnation/test_fingerprint.pytests/unit/engine/stagnation/test_properties.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/test_loop_protocol.pysrc/synthorg/engine/stagnation/detector.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/observability/events/stagnation.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/agent_engine.pytests/unit/engine/stagnation/test_detector.pytests/unit/engine/test_plan_execute_loop.pytests/unit/engine/stagnation/test_models.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must importfrom synthorg.observability import get_loggerand definelogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code — use the synthorg logger instead
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider); import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always log with structured kwargs:logger.info(EVENT, key=value)— never use old-style formattinglogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never implement retry logic in driver subclasses or calling code — all provider calls go throughBaseCompletionProviderwhich applies retry + rate limiting automatically
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig; retryable errors areRateLimitError,ProviderTimeoutError,ProviderConnectionError,ProviderInternalError
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state
For dict/list fields in frozen Pydantic models, usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use Pydantic v2 conventions:BaseModel,model_validator,computed_field,ConfigDict
Files:
src/synthorg/engine/loop_protocol.pysrc/synthorg/engine/stagnation/__init__.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/stagnation/protocol.pysrc/synthorg/engine/stagnation/models.pysrc/synthorg/engine/checkpoint/resume.pysrc/synthorg/engine/stagnation/detector.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/observability/events/stagnation.pysrc/synthorg/engine/react_loop.pysrc/synthorg/engine/agent_engine.py
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
ALWAYS read the relevant design page before implementing any feature or planning any issue. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec
Files:
docs/design/engine.md
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
All docstrings in public APIs must be documented in Google style and reflected in
docs/api/auto-generated library reference (via mkdocstrings + Griffe AST-based parsing)
Files:
docs/design/engine.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowto categorize tests
Maintain 80% minimum code coverage (enforced in CI)
Each test must complete within 30 seconds (timeout enforcement)
Always include-n autowhen running pytest viauv run python -m pytest— never run tests sequentially (pytest-xdist parallelism)
Prefer@pytest.mark.parametrizefor testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names likeexample-provider,example-large-001,test-provider,test-small-001, or size aliases (large/medium/small)
Use Hypothesis for property-based testing with@given+@settingsdecorators; control profiles viaHYPOTHESIS_PROFILEenv var (cifor 200 examples,devfor 1000 examples)
Files:
tests/unit/observability/test_events.pytests/unit/engine/test_react_loop.pytests/unit/engine/stagnation/test_fingerprint.pytests/unit/engine/stagnation/test_properties.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/test_loop_protocol.pytests/unit/engine/stagnation/test_detector.pytests/unit/engine/test_plan_execute_loop.pytests/unit/engine/stagnation/test_models.py
🧠 Learnings (16)
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
tests/unit/observability/test_events.pyCLAUDE.mdsrc/synthorg/engine/loop_helpers.pysrc/synthorg/observability/events/stagnation.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.pyCLAUDE.mdsrc/synthorg/engine/loop_helpers.pysrc/synthorg/observability/events/stagnation.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/engine/stagnation/test_properties.py
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)
Applied to files:
tests/unit/engine/stagnation/test_properties.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.mdsrc/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Always log with structured kwargs: `logger.info(EVENT, key=value)` — never use old-style formatting `logger.info("msg %s", val)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`
Applied to files:
CLAUDE.mdsrc/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T12:05:56.884Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:05:56.884Z
Learning: Applies to src/synthorg/**/*.py : Never use `import logging`, `logging.getLogger()`, or `print()` in application code — use the synthorg logger instead
Applied to files:
CLAUDE.md
🧬 Code graph analysis (14)
src/synthorg/engine/__init__.py (3)
src/synthorg/engine/stagnation/models.py (3)
StagnationConfig(23-81)StagnationResult(84-137)StagnationVerdict(15-20)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)src/synthorg/engine/stagnation/detector.py (1)
ToolRepetitionDetector(27-116)
tests/unit/engine/test_react_loop.py (4)
src/synthorg/engine/stagnation/detector.py (2)
get_detector_type(50-52)check(54-103)tests/unit/engine/test_loop_helpers.py (5)
_ctx_with_user_msg(93-97)_stop_response(47-53)execute(77-85)_tool_use_response(56-66)_make_invoker(88-90)src/synthorg/engine/react_loop.py (3)
ReactLoop(59-327)stagnation_detector(93-95)execute(101-196)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(84-137)StagnationVerdict(15-20)
src/synthorg/engine/checkpoint/resume.py (2)
src/synthorg/engine/react_loop.py (3)
ReactLoop(59-327)approval_gate(88-90)stagnation_detector(93-95)src/synthorg/engine/plan_execute_loop.py (2)
approval_gate(115-117)stagnation_detector(120-122)
tests/unit/engine/stagnation/test_fingerprint.py (1)
src/synthorg/engine/loop_helpers.py (1)
compute_fingerprints(478-505)
tests/unit/engine/stagnation/test_properties.py (4)
src/synthorg/engine/loop_helpers.py (1)
compute_fingerprints(478-505)src/synthorg/engine/loop_protocol.py (1)
TurnRecord(40-81)src/synthorg/engine/stagnation/detector.py (2)
ToolRepetitionDetector(27-116)check(54-103)src/synthorg/engine/stagnation/models.py (2)
StagnationConfig(23-81)StagnationVerdict(15-20)
tests/unit/engine/test_loop_helpers.py (1)
src/synthorg/engine/loop_helpers.py (2)
make_turn_record(459-475)clear_last_turn_tool_calls(424-437)
tests/unit/engine/test_loop_protocol.py (1)
src/synthorg/engine/loop_protocol.py (2)
TerminationReason(28-37)TurnRecord(40-81)
src/synthorg/engine/stagnation/detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
TurnRecord(40-81)src/synthorg/engine/stagnation/models.py (3)
StagnationConfig(23-81)StagnationResult(84-137)StagnationVerdict(15-20)src/synthorg/engine/stagnation/protocol.py (2)
get_detector_type(44-46)check(26-42)
src/synthorg/engine/loop_helpers.py (4)
src/synthorg/engine/stagnation/models.py (1)
StagnationVerdict(15-20)src/synthorg/engine/stagnation/protocol.py (2)
StagnationDetector(15-46)check(26-42)src/synthorg/engine/loop_protocol.py (1)
TurnRecord(40-81)src/synthorg/engine/stagnation/detector.py (1)
check(54-103)
src/synthorg/engine/plan_execute_loop.py (4)
src/synthorg/engine/loop_helpers.py (1)
check_stagnation(526-600)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)src/synthorg/engine/stagnation/detector.py (1)
config(46-48)src/synthorg/engine/loop_protocol.py (1)
ExecutionResult(84-145)
src/synthorg/engine/react_loop.py (4)
src/synthorg/engine/loop_helpers.py (1)
check_stagnation(526-600)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)src/synthorg/engine/plan_execute_loop.py (2)
stagnation_detector(120-122)approval_gate(115-117)src/synthorg/engine/loop_protocol.py (1)
ExecutionResult(84-145)
src/synthorg/engine/agent_engine.py (2)
src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)src/synthorg/engine/react_loop.py (3)
stagnation_detector(93-95)ReactLoop(59-327)approval_gate(88-90)
tests/unit/engine/stagnation/test_detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
TurnRecord(40-81)src/synthorg/engine/stagnation/detector.py (5)
ToolRepetitionDetector(27-116)check(54-103)get_detector_type(50-52)config(46-48)_detect_cycle(153-173)src/synthorg/engine/stagnation/models.py (2)
StagnationConfig(23-81)StagnationVerdict(15-20)
tests/unit/engine/stagnation/test_models.py (1)
src/synthorg/engine/stagnation/models.py (3)
StagnationConfig(23-81)StagnationResult(84-137)StagnationVerdict(15-20)
🔇 Additional comments (36)
src/synthorg/observability/events/stagnation.py (1)
5-8: Looks good — stagnation events are correctly centralized as constants.src/synthorg/engine/loop_protocol.py (1)
36-36:TerminationReason.STAGNATIONaddition is correct and consistent with loop termination semantics.tests/unit/observability/test_events.py (1)
217-217: Good update — domain discovery expectation now includes the newstagnationevents module.src/synthorg/engine/stagnation/__init__.py (1)
10-24: Public stagnation API re-exports are clean and well-scoped.tests/unit/engine/test_loop_protocol.py (1)
34-37: Tests are correctly aligned with the new STAGNATION reason and fingerprint field behavior.Also applies to: 80-93
CLAUDE.md (1)
122-122: Documentation updates correctly capture the new stagnation subsystem and observability events.Also applies to: 190-190
tests/unit/engine/stagnation/test_models.py (1)
13-59: Great coverage depth for stagnation model invariants and validation paths.Also applies to: 108-202
tests/unit/engine/stagnation/test_fingerprint.py (1)
13-101: Fingerprint tests are solid and cover determinism, canonicalization, and output format well.src/synthorg/engine/stagnation/protocol.py (1)
14-46: LGTM — the protocol stays intentionally minimal.
check()andget_detector_type()are the only hooks the loops need, and keepingcheck()async leaves room for service-backed detectors without another API break.src/synthorg/engine/checkpoint/resume.py (1)
115-127: Nice preservation of loop-scoped collaborators.Rebuilding the loop with both
approval_gateandstagnation_detectorkeeps resumed executions behaviorally aligned with the original loop instance.tests/unit/engine/test_loop_helpers.py (1)
522-533: Good coverage around the fingerprint lifecycle.These assertions protect both sides of the
TurnRecord.tool_call_fingerprintscontract: creation inmake_turn_record()and cleanup inclear_last_turn_tool_calls().Also applies to: 623-637
docs/design/engine.md (1)
498-563: Well-documented stagnation detection design.The Stagnation Detection section comprehensively covers the protocol interface, default implementation, configuration options, intervention flow, and loop integration semantics. The distinction between loop-scoped (ReactLoop) and step-scoped (PlanExecuteLoop) correction counters is clearly documented.
src/synthorg/engine/plan_execute_loop.py (3)
96-107: LGTM: Constructor properly accepts and stores stagnation detector.The constructor signature and storage pattern matches the ReactLoop implementation, maintaining consistency across execution loop implementations.
114-122: LGTM: Read-only properties for checkpoint resume support.The
approval_gateandstagnation_detectorproperties enable checkpoint resume to preserve these components, as specified in the PR objectives.
653-686: LGTM: Step-scoped stagnation detection correctly implemented.The implementation properly:
- Initializes
step_start_idxandstep_correctionsat step entry for step-scoped windowing- Passes only current step's turns (
turns[step_start_idx:]) to the detector- Includes
step_numberfor observability- Handles all three return types from
check_stagnationThis aligns with the design spec stating "corrections counter is step-scoped, window resets across step boundaries."
tests/unit/engine/stagnation/test_properties.py (3)
47-48: Good: Tool name strategy correctly filters whitespace-only strings.The filter
lambda s: s.strip()ensures generated tool names comply withNotBlankStrvalidation, preventing invalid test inputs.
55-89: LGTM: Fingerprint property tests.The tests verify two key properties:
- Determinism: Same name + args always produces identical fingerprints
- Format: Fingerprints match the
name:16-char-hexformat specificationUsing
rpartition(":")correctly handles tool names containing colons.
94-116: No action needed. The async test methods correctly omit explicit@pytest.mark.asynciodecorators. Theasyncio_mode = "auto"configuration inpyproject.toml(line 227) enables automatic async test discovery, which is the required pattern for this codebase per the coding guidelines.> Likely an incorrect or invalid review comment.tests/unit/engine/stagnation/test_detector.py (5)
14-27: LGTM: Well-designed test helper.The
_turnhelper correctly derivestool_calls_madefrom fingerprints and sets appropriatefinish_reasonbased on tool presence. This ensures test data is internally consistent.
30-83: LGTM: Comprehensive NO_STAGNATION test coverage.Tests cover all paths that should return NO_STAGNATION:
- Empty turns
- Unique fingerprints
- Below min_tool_turns threshold
- Turns without tool calls
- Disabled detector
- Below repetition threshold
85-138: LGTM: INJECT_PROMPT and TERMINATE verdict tests.Tests verify:
- High repetition triggers INJECT_PROMPT with corrective message
- Cycle detection triggers INJECT_PROMPT with cycle_length
- Exceeding max_corrections triggers TERMINATE
- max_corrections=0 skips INJECT_PROMPT and goes directly to TERMINATE
228-232: LGTM: Protocol conformance test.Verifying
isinstance(detector, StagnationDetector)confirms the implementation satisfies the runtime-checkable protocol.
285-327: LGTM: Exact repetition ratio verification.Testing precise ratio calculations (0.8, 0.5, 0.0) with
pytest.approxensures the algorithm correctness for edge cases.src/synthorg/engine/stagnation/models.py (4)
15-20: LGTM: StagnationVerdict enum.Clean StrEnum with clear semantics for the three possible outcomes.
23-81: LGTM: StagnationConfig model.Well-designed frozen config model with:
- Sensible defaults (window_size=5, repetition_threshold=0.6, max_corrections=1)
- Appropriate constraints (ge/le bounds)
- Cross-field validation ensuring
min_tool_turns <= window_size
84-137: LGTM: StagnationResult model with proper validation.The model correctly:
- Deep-copies
detailsdict at construction boundary (line 124-125)- Enforces
corrective_messageis required for INJECT_PROMPT and forbidden otherwise- Uses frozen config for immutability
Based on learnings: "frozen Pydantic models for config/identity" and "deep-copy at system boundaries."
140-142: LGTM: Reusable NO_STAGNATION_RESULT singleton.Safe to reuse since
StagnationResultis frozen. This avoids allocating new objects for the common no-stagnation case.src/synthorg/engine/loop_helpers.py (4)
435-437: LGTM: Consistent clearing of tool call metadata.Both
tool_calls_madeandtool_call_fingerprintsare cleared together, maintaining consistency when shutdown fires before tool execution.
478-505: LGTM: Deterministic fingerprint computation.The implementation correctly:
- Canonicalizes JSON with
sort_keys=Trueand compact separators- Uses
default=strto handle non-JSON-serializable argument values- Truncates SHA-256 to 16 hex chars (sufficient for fingerprint uniqueness)
- Returns sorted tuple for order-independent comparison across turns
526-599: LGTM: Well-structured stagnation check helper.The implementation:
- Gracefully handles detector failures (logs and continues execution)
- Re-raises
MemoryErrorandRecursionErrorper project conventions- Uses appropriate log levels (WARNING for termination, INFO for correction injection)
- Attaches stagnation metadata to the result for observability
577-582: ✓TerminationReason.STAGNATIONis properly defined in the enum.The enum value is correctly defined at line 36 of
loop_protocol.pyasSTAGNATION = "stagnation".src/synthorg/engine/stagnation/detector.py (5)
27-53: LGTM: ToolRepetitionDetector class structure.The class correctly:
- Defaults to
StagnationConfig()when no config is provided- Exposes configuration via read-only property
- Returns a meaningful detector type identifier
54-116: LGTM: Stagnation check logic.The dual-signal detection approach is well-implemented:
- Early exit for disabled config or insufficient tool turns
- Repetition ratio computed across all fingerprints in window
- Cycle detection runs if enabled
- Either signal triggers stagnation detection
The window extraction correctly filters to tool-bearing turns and takes the most recent
window_sizeentries.
119-138: LGTM: Correct repetition ratio computation.The formula
sum(c - 1 for c in counts.values() if c > 1) / totalcorrectly counts excess occurrences as duplicates:
- 5 identical → 4/5 = 0.8
- A,A,B,B → 2/4 = 0.5
- All unique → 0/n = 0.0
153-173: LGTM: Cycle detection algorithm.The algorithm correctly identifies repeating patterns by comparing the last
kturns with the precedingkturns, starting fromk=2to find the shortest cycle first.
176-237: LGTM: Result building and corrective messaging.The helper functions correctly:
- Choose INJECT_PROMPT vs TERMINATE based on
corrections_injected < max_corrections- Build informative corrective messages with repeated tool names
- Include
repeated_toolsin details for debugging- Log the STAGNATION_DETECTED event with all relevant context
| def test_window_size_lower_bound(self) -> None: | ||
| with pytest.raises(ValidationError, match="greater than or equal to 2"): | ||
| StagnationConfig(window_size=1) | ||
|
|
||
| def test_window_size_upper_bound(self) -> None: | ||
| with pytest.raises(ValidationError, match="less than or equal to 50"): | ||
| StagnationConfig(window_size=51) | ||
|
|
||
| def test_repetition_threshold_lower_bound(self) -> None: | ||
| config = StagnationConfig(repetition_threshold=0.0) | ||
| assert config.repetition_threshold == 0.0 | ||
|
|
||
| def test_repetition_threshold_upper_bound(self) -> None: | ||
| config = StagnationConfig(repetition_threshold=1.0) | ||
| assert config.repetition_threshold == 1.0 | ||
|
|
||
| def test_repetition_threshold_negative_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| StagnationConfig(repetition_threshold=-0.1) | ||
|
|
||
| def test_repetition_threshold_above_one_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| StagnationConfig(repetition_threshold=1.1) | ||
|
|
||
| def test_max_corrections_zero(self) -> None: | ||
| config = StagnationConfig(max_corrections=0) | ||
| assert config.max_corrections == 0 | ||
|
|
||
| def test_max_corrections_negative_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| StagnationConfig(max_corrections=-1) | ||
|
|
||
| def test_min_tool_turns_lower_bound(self) -> None: | ||
| config = StagnationConfig(min_tool_turns=1) | ||
| assert config.min_tool_turns == 1 | ||
|
|
||
| def test_min_tool_turns_zero_rejected(self) -> None: | ||
| with pytest.raises(ValidationError): | ||
| StagnationConfig(min_tool_turns=0) | ||
|
|
||
| def test_min_tool_turns_exceeds_window_size_rejected(self) -> None: | ||
| with pytest.raises( | ||
| ValidationError, | ||
| match=r"min_tool_turns.*exceeds.*window_size", | ||
| ): | ||
| StagnationConfig(window_size=3, min_tool_turns=4) | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consolidate repetitive boundary tests with @pytest.mark.parametrize.
These cases are strong but repetitive; parameterization would reduce duplication and simplify future additions.
As per coding guidelines: "Prefer @pytest.mark.parametrize for testing similar cases".
Also applies to: 176-195
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/stagnation/test_models.py` around lines 60 - 106, The tests
around StagnationConfig (e.g., test_window_size_lower_bound,
test_window_size_upper_bound, test_repetition_threshold_lower_bound,
test_repetition_threshold_upper_bound,
test_repetition_threshold_negative_rejected,
test_repetition_threshold_above_one_rejected, test_max_corrections_zero,
test_max_corrections_negative_rejected, test_min_tool_turns_lower_bound,
test_min_tool_turns_zero_rejected,
test_min_tool_turns_exceeds_window_size_rejected) are repetitive; consolidate
them using pytest.mark.parametrize to cover similar boundary inputs and expected
ValidationError or accepted values in a single parametrized test per field
(window_size, repetition_threshold, max_corrections, min_tool_turns). Update
tests to iterate tuples of (input_kwargs, should_raise, expected_value_or_match)
and assert by either expecting ValidationError with match or validating the
created StagnationConfig attribute, keeping references to the StagnationConfig
constructor and the specific test names to locate and replace the current
individual tests (also apply the same refactor pattern to the similar tests at
the other section noted around lines 176-195).
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #458 +/- ##
==========================================
+ Coverage 93.67% 93.72% +0.05%
==========================================
Files 469 474 +5
Lines 22219 22471 +252
Branches 2143 2166 +23
==========================================
+ Hits 20814 21062 +248
- Misses 1095 1098 +3
- Partials 310 311 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Add stagnation detection that analyzes TurnRecord tool-call fingerprints across a sliding window, intervenes with corrective prompt injection, and terminates early with STAGNATION if correction fails. - StagnationDetector protocol with async check() method - ToolRepetitionDetector: dual-signal (repetition ratio + cycle detection) - StagnationConfig: window_size, repetition_threshold, cycle_detection, max_corrections, min_tool_turns - STAGNATION termination reason + tool_call_fingerprints on TurnRecord - Fingerprint computation: name:sha256(canonical_json)[:16], sorted - ReactLoop integration: loop-scoped corrections counter - PlanExecuteLoop integration: per-step scoped detection - AgentEngine wiring via stagnation_detector parameter - Observability events: check_performed, detected, correction_injected, terminated - Design spec: engine.md stagnation detection section - 54 new tests (models, detector, fingerprints, Hypothesis properties) - Extended loop_protocol, loop_helpers, react_loop, plan_execute_loop tests
Add 'stagnation' to the expected domain modules set in test_all_domain_modules_discovered.
- Fix checkpoint resume dropping stagnation_detector and approval_gate (CRITICAL: make_loop_with_callback now forwards both via properties) - Add error handling around stagnation_detector.check() calls (CRITICAL: wrap in except MemoryError, RecursionError / except Exception) - Extract shared check_stagnation() helper to loop_helpers.py (removes duplicated logic from ReactLoop + PlanExecuteLoop) - Refactor ToolRepetitionDetector.check() into smaller methods (_extract_window, _compute_repetition_ratio, _build_stagnation_result) - Add approval_gate/stagnation_detector read-only properties to both loops - Update agent_engine warning to mention stagnation_detector - Deep-copy StagnationResult.details at construction (matches ExecutionResult) - Add ge=2 constraint to cycle_length field - Add cross-field validator: min_tool_turns <= window_size - Handle empty repeated_tools in corrective message (cycle-only trigger) - Update CLAUDE.md Package Structure with stagnation/ - Add _detect_cycle direct tests (6 cases: short/cycle2/cycle3/almost/shortest/empty) - Add repetition ratio exact-value tests (3 cases) - Add protocol conformance test (isinstance check) - Add PlanExecuteLoop step corrections counter increment test Pre-reviewed by 9 agents, 14 findings addressed
…and Gemini Critical fixes: - Fix incomplete turns in PlanExecuteLoop STAGNATION result (rebuild with full turns) - Add direct unit tests for check_stagnation() (8 tests: verdicts, exception paths) - Add checkpoint resume tests verifying stagnation_detector preservation - Fix TerminationReason docs missing STAGNATION in enum list and post-execution transitions Major fixes: - Refactor check_stagnation into two functions (<50 lines each) - Fix inaccurate repetition_threshold boundary semantics in docstring - Fix repetition ratio description in design spec - Add turn context to stagnation error log (consistent with peer functions) - Use NotBlankStr for tool_call_fingerprints (per CLAUDE.md convention) - Remove dead else branch in _build_corrective_message (Gemini finding) - Add AgentEngine stagnation_detector wiring test - Narrow exception log message to type-only (security hardening) Medium fixes: - Add tool_call_fingerprints to TurnRecord Attributes docstring - Add Args section to check_stagnation docstring - Fix _compute_repetition_ratio docstring precision - Enhance fake detectors: track corrections_seen, use TurnRecord types - Verify step-2 turn identity in step-scoped test
Cast _loop to ReactLoop via isinstance before accessing stagnation_detector property (ExecutionLoop protocol does not expose it).
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)
88-94:⚠️ Potential issue | 🟡 MinorDocument
stagnation_detectorin the public API docs.The class
Args:block still stops atapproval_gate, so the new constructor parameter is undocumented.As per coding guidelines: "Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this."📝 Suggested docstring update
Args: config: Loop configuration. Defaults to ``PlanExecuteConfig()``. checkpoint_callback: Optional per-turn checkpoint callback. approval_gate: Optional gate that checks for pending escalations after tool execution and parks the agent when approval is required. ``None`` disables approval checks. + stagnation_detector: Optional detector used to identify + repetitive tool-use loops within a step. ``None`` disables + stagnation checks.Also applies to: 96-103
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/plan_execute_loop.py` around lines 88 - 94, The docstring for the public class or function that defines the loop (inspect the PlanExecuteLoop class / __init__ in plan_execute_loop.py) is missing documentation for the new constructor parameter stagnation_detector; update the Google-style Args: block to add an entry for stagnation_detector describing its purpose, expected type (e.g., callable or StagnationDetector), behavior (how it detects/handles stagnation and whether None disables it), and default value, and mirror this addition where the Args block continues around lines 96–103 so the public API docs include the new parameter.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/engine.md`:
- Around line 468-472: The earlier “Transition sequences” summary table is out
of sync with the later text: add STAGNATION and PARKED to the table’s list of
termination reasons that leave the task in its current state (alongside the
existing MAX_TURNS / BUDGET entries) so the lifecycle outcomes match; update the
table rows/columns that mention termination reasons and their resulting task
state to include the STAGNATION and PARKED symbols and ensure the description
for those entries matches the wording used later in the document.
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 675-690: The code currently reassigns the shared ctx when
check_stagnation() returns an updated AgentContext, which leaks step-scoped
corrective prompts into later steps; change the tuple unpack to capture the
returned context into a new local variable (e.g., step_ctx or ctx_for_check) and
only use that local context for handling this step's logic and step_corrections,
leaving the outer ctx untouched (keep using ctx elsewhere), and ensure
step_corrections is applied only to the current step flow; do not reassign ctx
from the result of check_stagnation() so corrective prompts remain step-scoped
(references: check_stagnation, ctx, step_corrections, stag_outcome,
ExecutionResult, model_copy).
In `@tests/unit/engine/test_plan_execute_loop.py`:
- Around line 924-927: The assertion is too weak and may pass if _execute_step()
returns only a step-local slice; change the test to assert that the full two
turns (planning and tool-use) are present by using a stricter check on
result.turns (e.g., assert len(result.turns) == 2) and/or assert that one of the
turns is the planning turn (inspect turn.role or turn.type) so the STAGNATION
path reconstructs the complete turn sequence rather than just a step slice.
---
Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 88-94: The docstring for the public class or function that defines
the loop (inspect the PlanExecuteLoop class / __init__ in plan_execute_loop.py)
is missing documentation for the new constructor parameter stagnation_detector;
update the Google-style Args: block to add an entry for stagnation_detector
describing its purpose, expected type (e.g., callable or StagnationDetector),
behavior (how it detects/handles stagnation and whether None disables it), and
default value, and mirror this addition where the Args block continues around
lines 96–103 so the public API docs include the new parameter.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6425edb9-dec5-4fc9-a018-a17beef932c3
📒 Files selected for processing (11)
docs/design/engine.mdsrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/loop_protocol.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/stagnation/detector.pysrc/synthorg/engine/stagnation/models.pytests/unit/engine/checkpoint/test_resume.pytests/unit/engine/test_agent_engine.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/test_plan_execute_loop.pytests/unit/engine/test_react_loop.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python code—Python 3.14 has PEP 649 native lazy annotations.
Useexcept A, B:syntax (no parentheses) in exception handlers—PEP 758 except syntax for Python 3.14. Ruff enforces this.
Add type hints to all public functions and classes. Use mypy strict mode.
Use Google-style docstrings on all public classes and functions. Ruff D rules enforce this.
Prefer immutability—create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Handle errors explicitly—never silently swallow exceptions.
Use Line length of 88 characters (ruff enforced).
Files:
tests/unit/engine/test_agent_engine.pytests/unit/engine/test_react_loop.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_protocol.pysrc/synthorg/engine/loop_helpers.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/checkpoint/test_resume.pysrc/synthorg/engine/stagnation/models.pytests/unit/engine/test_plan_execute_loop.pysrc/synthorg/engine/stagnation/detector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowfor test organization.
Use asyncio_mode = "auto" in pytest configuration—no manual@pytest.mark.asyncioneeded on async tests.
Prefer@pytest.mark.parametrizefor testing similar cases.
Use Hypothesis for property-based testing with@given+@settings. Run dev profile with HYPOTHESIS_PROFILE=dev for 1000 examples.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic.
Do NOT use vendor names (Anthropic, OpenAI, Claude, GPT) in tests. Use test-provider, test-small-001, etc.
Files:
tests/unit/engine/test_agent_engine.pytests/unit/engine/test_react_loop.pytests/unit/engine/test_loop_helpers.pytests/unit/engine/checkpoint/test_resume.pytests/unit/engine/test_plan_execute_loop.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with adopted conventions: use@computed_fieldfor derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Use structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines.
Validate at system boundaries (user input, external APIs, config files).
Every module with business logic MUST have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__). Never use import logging / logging.getLogger() / print() in application code.
Always useloggeras the variable name for the module logger (not _logger, not log).
Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events. import EVENT_CONSTANT.
Log structured data with logger.info(EVENT, key=value)—never use logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG level logging is for object creation, internal flow, and entry/exit of key functions.
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig.
Retryable errors are RateLimitError, P...
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_protocol.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/stagnation/models.pysrc/synthorg/engine/stagnation/detector.py
src/synthorg/{engine,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_protocol.pysrc/synthorg/engine/loop_helpers.pysrc/synthorg/engine/stagnation/models.pysrc/synthorg/engine/stagnation/detector.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Markdown documentation files must follow Zensical build conventions with mkdocs.yml at repo root.
Files:
docs/design/engine.md
🧠 Learnings (11)
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.
Applied to files:
src/synthorg/engine/plan_execute_loop.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/loop_protocol.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/loop_protocol.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.
Applied to files:
src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.
Applied to files:
src/synthorg/engine/stagnation/models.py
📚 Learning: 2026-03-15T19:03:01.705Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:03:01.705Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/engine/stagnation/models.py
🧬 Code graph analysis (8)
tests/unit/engine/test_agent_engine.py (1)
src/synthorg/engine/react_loop.py (1)
ReactLoop(59-327)
tests/unit/engine/test_react_loop.py (3)
src/synthorg/engine/loop_protocol.py (3)
TerminationReason(28-37)TurnRecord(40-83)execute(166-192)src/synthorg/engine/react_loop.py (3)
ReactLoop(59-327)stagnation_detector(93-95)execute(101-196)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)
src/synthorg/engine/plan_execute_loop.py (5)
src/synthorg/engine/loop_helpers.py (1)
check_stagnation(526-587)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)src/synthorg/engine/react_loop.py (1)
stagnation_detector(93-95)src/synthorg/engine/stagnation/detector.py (1)
config(46-48)src/synthorg/engine/loop_protocol.py (1)
ExecutionResult(86-147)
src/synthorg/engine/loop_helpers.py (3)
src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)src/synthorg/engine/stagnation/protocol.py (2)
StagnationDetector(15-46)check(26-42)src/synthorg/engine/loop_protocol.py (2)
TurnRecord(40-83)ExecutionResult(86-147)
tests/unit/engine/test_loop_helpers.py (11)
src/synthorg/engine/loop_helpers.py (1)
check_stagnation(526-587)tests/unit/engine/test_cost_recording.py (6)
record(75-80)record(179-180)record(195-196)_turn(21-36)_result(39-48)test_memory_error_propagates(175-189)tests/unit/engine/test_plan_execute_loop.py (3)
_stop_response(90-96)get_detector_type(868-869)check(871-883)tests/unit/engine/test_react_loop.py (3)
_stop_response(41-47)get_detector_type(1055-1056)check(1058-1068)tests/unit/engine/stagnation/test_detector.py (1)
_turn(14-27)tests/unit/engine/stagnation/test_properties.py (1)
_turn(15-26)src/synthorg/engine/loop_protocol.py (3)
TurnRecord(40-83)ExecutionResult(86-147)TerminationReason(28-37)tests/unit/engine/conftest.py (1)
engine(449-460)src/synthorg/engine/stagnation/detector.py (2)
get_detector_type(50-52)check(54-103)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)tests/unit/engine/test_agent_engine.py (1)
test_memory_error_propagates(1282-1303)
tests/unit/engine/checkpoint/test_resume.py (1)
src/synthorg/engine/plan_execute_loop.py (1)
stagnation_detector(120-122)
tests/unit/engine/test_plan_execute_loop.py (4)
src/synthorg/engine/loop_protocol.py (3)
TerminationReason(28-37)TurnRecord(40-83)execute(166-192)src/synthorg/engine/stagnation/detector.py (2)
get_detector_type(50-52)check(54-103)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)src/synthorg/engine/plan_execute_loop.py (3)
PlanExecuteLoop(82-896)stagnation_detector(120-122)execute(128-206)
src/synthorg/engine/stagnation/detector.py (3)
src/synthorg/engine/loop_protocol.py (1)
TurnRecord(40-83)src/synthorg/engine/stagnation/models.py (3)
StagnationConfig(23-84)StagnationResult(87-140)StagnationVerdict(15-20)src/synthorg/engine/stagnation/protocol.py (2)
get_detector_type(44-46)check(26-42)
🔇 Additional comments (19)
src/synthorg/engine/stagnation/models.py (5)
1-12: LGTM!Module docstring is clear and imports are appropriate. No usage of
from __future__ import annotationsas per coding guidelines. UsingSelffromtypingand Pydantic v2 constructs correctly.
15-21: LGTM!Clean enum definition using
StrEnumwith clear semantic values. The three verdicts (NO_STAGNATION, INJECT_PROMPT, TERMINATE) align well with the stagnation detection flow described in PR objectives.
23-84: LGTM!Well-designed frozen configuration model with sensible defaults and constraints. The
_validate_min_within_windowvalidator correctly ensuresmin_tool_turns <= window_sizewith a clear error message explaining why the configuration is invalid.
87-140: LGTM!Excellent implementation following the coding guidelines. The
__init__override correctly deep-copies thedetailsdict at the construction boundary per the guideline: "copy.deepcopy() at system boundaries". The_validate_corrective_messagevalidator ensures invariant consistency betweenverdictandcorrective_message. Based on learnings: "For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries."
143-146: LGTM!Good optimization pattern — a reusable frozen instance for the common no-stagnation case avoids repeated allocations.
src/synthorg/engine/loop_helpers.py (7)
8-9: LGTM!Standard library imports for fingerprint computation (hashlib for SHA-256, json for canonical serialization).
23-26: LGTM!Event constants correctly imported from the domain-specific
synthorg.observability.events.stagnationmodule per coding guidelines.
424-437: LGTM!Correctly extended to clear both
tool_calls_madeandtool_call_fingerprintstogether. Since fingerprints are derived from tool calls, clearing one without the other would create an inconsistent state.
478-505: LGTM!Well-designed fingerprinting function with deterministic properties:
sort_keys=Trueensures consistent JSON regardless of dict key order- Compact separators avoid whitespace variance
- Sorted output tuple ensures same tool calls produce identical fingerprints regardless of call order
- 16 hex chars (64 bits) provides sufficient collision resistance for this use case
The
default=strfallback is defensive for non-JSON-serializable types. This is acceptable given tool arguments should typically be JSON-serializable, but be aware this could produce different fingerprints for semantically equivalent objects with non-deterministic__str__representations.
459-475: LGTM!Clean integration of fingerprint computation into turn record creation. The
tool_call_fingerprintsfield is correctly populated from the response's tool calls.
526-587: LGTM!Well-designed advisory stagnation check with appropriate error handling:
- Correctly returns
Noneon detector absence or failure (non-blocking)- Properly logs errors with context before continuing
- Clear separation between detection (
check) and verdict handling- Uses PEP 758 exception syntax correctly (
except MemoryError, RecursionError:)The advisory design ensures stagnation detection never interrupts an otherwise-healthy loop, as documented in the PR objectives.
590-645: LGTM!Clean verdict dispatch with appropriate logging levels:
WARNINGfor STAGNATION_TERMINATED (problematic termination)INFOfor STAGNATION_CORRECTION_INJECTED (state transition)The corrective message is correctly injected as a
USERrole message, which aligns with the pattern of system interventions that appear as user guidance to the model.src/synthorg/engine/stagnation/detector.py (7)
1-24: LGTM!Module docstring clearly describes the dual-signal detection approach. Imports are appropriate, and the logger is correctly initialized with
get_logger(__name__)per coding guidelines.
27-103: LGTM!Clean implementation of the
StagnationDetectorprotocol:
checkmethod signature matches the protocol definitionget_detector_typereturns a unique identifier- Early returns for disabled state or insufficient tool-bearing turns
- Properly delegates computation to focused helper functions
The dual-signal detection (
ratio >= thresholdORcycle_length is not None) aligns with the PR objectives.
105-116: LGTM!Correct windowing logic that filters to tool-bearing turns only. The negative slice
[-window_size:]gracefully handles cases where fewer tool turns exist than the window size.
119-140: LGTM!Correct repetition ratio computation. The formula
(sum of c-1 for repeated) / totalcounts excess occurrences (duplicates beyond the first) as a fraction of total fingerprints. Empty window handling returns(0.0, Counter())appropriately.
143-175: LGTM!Sound cycle detection algorithm that identifies repeating A→B→A→B patterns by comparing tail segments with their predecessors. The iteration from
cycle_len=2ton//2ensures the shortest cycle is found first. For sequences shorter than 4 turns, no cycle can be detected (correctly returnsNone).
178-213: LGTM!Correct verdict selection logic:
INJECT_PROMPTwhencorrections_injected < max_correctionsTERMINATEwhen corrections exhaustedThe
repeated_toolsextraction and sorting ensures deterministic logging and message generation. TheINFOlevel forSTAGNATION_DETECTEDis appropriate as a state transition event (distinct from the termination action logged separately).
216-233: LGTM!Clear, actionable corrective message that guides the agent to change its approach. The
[SYSTEM INTERVENTION: ...]prefix clearly marks this as an automated intervention. The docstring correctly notes thatrepeated_toolsis always non-empty when this function is called — stagnation detection requires at least one repeated fingerprint (via ratio or cycle signal).
…rtion - Add stagnation_detector to PlanExecuteLoop and ReactLoop Args docstrings - Strengthen STAGNATION turn count assertion: verify both planning and tool-use turns are present (not just >= 1)
d6f1137 to
f24b1cf
Compare
🤖 I have created a release *beep* *boop* --- ## [0.2.6](v0.2.5...v0.2.6) (2026-03-15) ### Features * add intra-loop stagnation detector ([#415](#415)) ([#458](#458)) ([8e9f34f](8e9f34f)) * add RFC 9457 structured error responses (Phase 1) ([#457](#457)) ([6612a99](6612a99)), closes [#419](#419) * implement AgentStateRepository for runtime state persistence ([#459](#459)) ([5009da7](5009da7)) * **site:** add SEO essentials, contact form, early-access banner ([#467](#467)) ([11b645e](11b645e)), closes [#466](#466) ### Bug Fixes * CLI improvements — config show, completion install, enhanced doctor, Sigstore verification ([#465](#465)) ([9e08cec](9e08cec)) * **site:** add reCAPTCHA v3, main landmark, and docs sitemap ([#469](#469)) ([fa6d35c](fa6d35c)) * use force-tag-creation instead of manual tag creation hack ([#462](#462)) ([2338004](2338004)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
TurnRecordtool-call fingerprints across a sliding window, intervenes with corrective prompt injection, and terminates early withSTAGNATIONif correction failsStagnationDetectorasync protocol withToolRepetitionDetectordefault implementation using dual-signal detection (repetition ratio + cycle detection)StagnationConfigfrozen model with configurable window_size, repetition_threshold, cycle_detection, max_corrections, min_tool_turnsSTAGNATIONtermination reason +tool_call_fingerprintsfield onTurnRecordname:sha256(canonical_json)[:16], sorted per-turncheck_stagnation()helper inloop_helpers.pywith proper error handling (non-critical — logged and skipped on failure)AgentEnginewiring viastagnation_detectorparametercheck_performed,detected,correction_injected,terminateddocs/design/engine.mdstagnation detection sectionTest plan
isinstancecheck)_detect_cyclecoverage (6 cases)Review coverage
Pre-reviewed by 9 agents, 14 findings addressed:
Closes #415