Skip to content

feat(engine): implement Hybrid Plan + ReAct execution loop#582

Merged
Aureliolo merged 9 commits intomainfrom
feat/hybrid-execution-loop
Mar 19, 2026
Merged

feat(engine): implement Hybrid Plan + ReAct execution loop#582
Aureliolo merged 9 commits intomainfrom
feat/hybrid-execution-loop

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

@Aureliolo Aureliolo commented Mar 19, 2026

Summary

  • Implement Loop 3 (Hybrid) from design spec SS6.5 -- combines strategic planning (3-7 steps) with per-step mini-ReAct sub-loops, progress-summary checkpoints, and LLM-decided replanning
  • Extract shared plan utilities into plan_helpers.py to eliminate duplication between PlanExecuteLoop and HybridLoop
  • Make hybrid a first-class buildable loop type in the auto-selection system (loop_selector.py)
  • Wire hybrid_loop_config through AgentEngine for runtime configuration

Key features

  • Per-step turn limit (max_turns_per_step, default 5) -- steps that exhaust their budget are marked FAILED
  • Progress summary after each step via LLM call (tools=None, SYSTEM category) with replan decision parsing
  • Optional LLM-decided replanning on step completion (allow_replan_on_completion) with distinct prompt for success vs failure triggers
  • Plan truncation to max_plan_steps with warning log
  • Turn budget advisory at loop start when config exceeds available turns
  • Full integration with stagnation detection, context compaction, checkpoint, and approval gate subsystems

New files

  • hybrid_loop.py -- HybridLoop class (ExecutionLoop protocol)
  • hybrid_models.py -- HybridLoopConfig (frozen Pydantic, 7 fields)
  • plan_helpers.py -- update_step_status, extract_task_summary, assess_step_success

Pre-PR review

8 agents reviewed this PR (code-reviewer, conventions-enforcer, logging-audit, test-analyzer, silent-failure-hunter, issue-verifier, docs-consistency, type-design). 14 findings identified and addressed in a follow-up commit:

  • Fixed misleading replan prompt (said "failed" on success-path replans)
  • Added logging to _parse_replan_decision and empty summary warning
  • Removed unused EXECUTION_HYBRID_STEP_CHECKPOINT constant
  • Updated stale "not yet implemented" docstrings and design spec references
  • Updated CLAUDE.md Package Structure and engine.md Loop Integration sections
  • Added 12 new tests (replan parsing, checkpoint callback, provider exceptions)

Test plan

  • 59 new tests (27 hybrid loop + 20 config model + 12 review-driven additions)
  • All 9598 unit tests pass
  • Coverage 94% (above 80% threshold)
  • mypy strict mode passes
  • ruff lint + format clean
  • Pre-commit hooks pass (including pre-push mypy + pytest)
  • No regressions in PlanExecuteLoop (28 tests) or loop selector (50 tests)

Closes #199

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Hybrid execution loop combining initial plan with reactive step-by-step execution, including per-step turn limits, progress checkpoints, and optional intelligent replanning.
  • Bug Fixes

    • Improved error message sanitization for better security and clarity.
  • Documentation

    • Updated engine documentation to describe hybrid execution behavior and configuration parameters.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 19, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 10c1beed-10f2-4d72-85c1-05262a5583a2

📥 Commits

Reviewing files that changed from the base of the PR and between 83f8be1 and 1d354f6.

📒 Files selected for processing (1)
  • src/synthorg/engine/loop_selector.py

Walkthrough

Adds a new HybridLoop execution strategy and immutable HybridLoopConfig, implementing a hybrid "plan + mini‑ReAct per step" flow with per-step turn limits, progress‑summary checkpoints, optional replanning, compaction at turn boundaries, and observability events for hybrid-specific conditions. Introduces hybrid_helpers and plan_helpers utilities, caps parsed plan step arrays, wires hybrid_loop_config and checkpoint callbacks through the loop selector and AgentEngine, re-exports HybridLoop/HybridLoopConfig, updates engine design docs, and adds comprehensive unit tests. No public API signatures were removed.

Possibly related PRs

  • PR 520: overlaps on compaction/compaction_callback wiring used by execution loops, including HybridLoop.
  • PR 387: touches engine loop implementations and PlanExecuteLoop step/tool handling that interacts with the new plan_helpers refactor.
  • PR 182: modifies AgentEngine initialization and fatal-error sanitization, overlapping the AgentEngine constructor and error-handling changes.
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately and concisely describes the main change: implementing a Hybrid Plan + ReAct execution loop, which is the primary objective of the changeset.
Description check ✅ Passed The PR description is comprehensive and clearly related to the changeset, covering implementation details, key features, test results, and linking to issue #199.
Linked Issues check ✅ Passed All key coding requirements from issue #199 are met: HybridLoop implementation with per-step turn limits, progress summary checkpoints, optional replanning, ExecutionLoop protocol compliance, integration with existing subsystems, plan truncation, and loop selector wiring.
Out of Scope Changes check ✅ Passed All changes are in scope: HybridLoop/HybridLoopConfig implementation, plan_helpers extraction, loop_selector updates, AgentEngine wiring, documentation, observability constants, and comprehensive test coverage. No unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 58.00% which is sufficient. The required threshold is 40.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch feat/hybrid-execution-loop
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent engine's capabilities by introducing a sophisticated Hybrid Plan + ReAct execution loop. This new loop allows agents to perform high-level strategic planning while executing each step tactically with ReAct, incorporating dynamic replanning and progress monitoring. The changes improve the agent's ability to handle complex tasks by providing more granular control over execution, better resource management, and increased adaptability to unforeseen circumstances during task execution.

Highlights

  • Hybrid Plan + ReAct Execution Loop Implemented: Introduced a new HybridLoop that combines strategic planning (3-7 steps) with per-step mini-ReAct sub-loops, enabling more robust and adaptable agent execution. This loop includes per-step turn limits, progress-summary checkpoints, and LLM-decided replanning capabilities.
  • Enhanced Loop Selection and Configuration: Integrated the HybridLoop as a first-class buildable loop type within the loop_selector.py system. The AgentEngine now wires through hybrid_loop_config for runtime customization, and the hybrid_fallback mechanism has been updated to reflect the HybridLoop's availability.
  • Shared Plan Utilities Extracted: Refactored common plan-related helper functions, such as update_step_status, extract_task_summary, and assess_step_success, into a new plan_helpers.py module to reduce duplication and improve maintainability across plan-based execution loops.
  • Comprehensive Testing and Documentation Updates: Added 59 new unit tests covering the hybrid loop and its configuration model, ensuring high coverage and stability. Documentation in CLAUDE.md and docs/design/engine.md was updated to reflect the new loop, its features, and integration details.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 10:56 — with GitHub Actions Inactive
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 19, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 87.01595% with 57 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.46%. Comparing base (dfed931) to head (1d354f6).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/engine/hybrid_helpers.py 82.42% 15 Missing and 14 partials ⚠️
src/synthorg/engine/hybrid_loop.py 89.84% 10 Missing and 10 partials ⚠️
src/synthorg/engine/plan_parsing.py 25.00% 2 Missing and 1 partial ⚠️
src/synthorg/engine/agent_engine.py 77.77% 1 Missing and 1 partial ⚠️
src/synthorg/engine/loop_selector.py 85.71% 0 Missing and 1 partial ⚠️
src/synthorg/engine/plan_execute_loop.py 87.50% 0 Missing and 1 partial ⚠️
src/synthorg/engine/plan_helpers.py 96.15% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #582      +/-   ##
==========================================
- Coverage   92.55%   92.46%   -0.10%     
==========================================
  Files         547      551       +4     
  Lines       27274    27681     +407     
  Branches     2615     2678      +63     
==========================================
+ Hits        25244    25595     +351     
- Misses       1605     1634      +29     
- Partials      425      452      +27     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request introduces the new Hybrid Plan + ReAct execution loop, which combines strategic planning with per-step mini-ReAct sub-loops, progress-summary checkpoints, and LLM-decided replanning. This is a significant feature addition, and the implementation appears robust and well-tested. The changes include adding new files for the HybridLoop and its configuration model, updating documentation, and refactoring shared plan utilities into plan_helpers.py to reduce duplication. The auto-selection system (loop_selector.py) has also been updated to recognize and build the new hybrid loop directly, removing previous fallback mechanisms.

The code adheres to good practices, including clear docstrings, comprehensive unit tests, and proper handling of various execution flow scenarios like budget exhaustion, shutdown, and replanning. The new HybridLoopConfig model is well-defined with appropriate validation. Overall, this is a well-executed feature implementation.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/engine/agent_engine.py (1)

186-200: 🧹 Nitpick | 🔵 Trivial

Add hybrid_loop_config to the docstring.

The new hybrid_loop_config parameter is not documented in the class docstring's Args section. Add an entry to maintain docstring completeness:

        hybrid_loop_config: Optional configuration for the Hybrid
            Plan + ReAct loop.  Used when auto-selection picks
            ``"hybrid"`` or when ``loop_type="hybrid"`` is built.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 186 - 200, The class
__init__ docstring is missing the new parameter hybrid_loop_config; update the
class (or __init__) docstring Args section to include an entry for
hybrid_loop_config describing it as an optional configuration for the Hybrid
Plan + ReAct loop, used when auto-selection picks "hybrid" or when
loop_type="hybrid" is built, matching the style of the other parameters (e.g.,
provider, auto_loop_config) and referencing the hybrid_loop_config parameter
name exactly so readers and tools pick up the new parameter.
src/synthorg/engine/plan_execute_loop.py (1)

580-625: ⚠️ Potential issue | 🟡 Minor

Capture the task summary before mutating ctx.

extract_task_summary() now runs after ctx = ctx.with_message(message), so a non-task-bound context with no earlier USER message will record the internal planning/replan prompt itself as original_task_summary. That pollutes plan metadata and any downstream prompt text that relies on it. Cache the summary from the pre-prompt context and pass that cached value into parse_plan().

Suggested fix
-        ctx = ctx.with_message(message)
+        task_summary = extract_task_summary(ctx)
+        ctx = ctx.with_message(message)
@@
         plan = parse_plan(
             response,
             ctx.execution_id,
-            extract_task_summary(ctx),
+            task_summary,
             revision_number=revision_number,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/plan_execute_loop.py` around lines 580 - 625, The code
calls extract_task_summary() after mutating ctx (via ctx =
ctx.with_message(message)), which lets the planning prompt be recorded as the
original task summary; capture the task summary from the pre-mutation context
(e.g., original_task_summary = extract_task_summary(ctx) before ctx =
ctx.with_message(message)) and pass that cached original_task_summary into
parse_plan() (replace the current extract_task_summary(ctx) argument) so
parse_plan() uses the pre-prompt summary; update references to
extract_task_summary in this block accordingly (symbols: ctx.with_message,
extract_task_summary, parse_plan).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 445-447: The checkpoint summary message is being constructed as a
user message (summary_msg = ChatMessage(role=MessageRole.USER,
content=_build_summary_prompt(...))) which incorrectly mutates chat state;
change the role to MessageRole.SYSTEM so the summary is sent as an internal
system prompt—locate the summary_msg construction in hybrid_helpers.py and
replace MessageRole.USER with MessageRole.SYSTEM while keeping usage of
ChatMessage and the _build_summary_prompt call unchanged.
- Around line 610-621: The replan logic in do_replan currently omits the
checkpoint_callback so call_planner is invoked without per-turn checkpointing;
update the do_replan signature to accept a checkpoint_callback parameter
(matching the type used by call_planner) and thread it through any internal
planner invocations so call_planner(...,
checkpoint_callback=checkpoint_callback) is used; then update all callers of
do_replan (including the replan invocation sites around the other occurrence
noted) to pass the same checkpoint_callback they receive (or propagate it from
their caller), ensuring replan turns use the same checkpoint subsystem as
initial planning.
- Around line 254-259: Before appending the prompt or invoking the LLM, check
ctx.has_turns_remaining and return MAX_TURNS if no turns remain; specifically,
in the summary and planner helper flows (e.g., run_progress_summary and the
planner helper) avoid creating the message via ctx.with_message, computing
turn_number from ctx.turn_count, or calling call_provider when
ctx.has_turns_remaining is False — this prevents making the extra provider call
that later causes ctx.with_turn_completed to raise. Ensure the guard is placed
at the start of each helper so you return MAX_TURNS before any prompt
construction or call_provider invocation.
- Around line 373-376: Don't coerce the "replan" value with bool() — instead
parse it explicitly: after json.loads(json_str) and verifying data is a dict,
get the raw value via data.get("replan") and if it's a bool return it; if it's a
str, compare raw.lower() to "true" or "false" and return the corresponding
boolean; for any other type or unrecognized string treat it as a parse miss
(e.g., return None or the existing "no parse" sentinel) rather than using
bool(raw).

In `@src/synthorg/engine/hybrid_loop.py`:
- Around line 416-428: The replan-on-completion path in
_decide_replan_on_completion calls run_progress_summary() but does not re-check
the shutdown/budget guards afterwards, so if the summary changed ctx.turn_count
or cost it can still invoke do_replan(); update _decide_replan_on_completion to
re-evaluate the same shutdown/budget guard logic immediately after
run_progress_summary() and before calling do_replan(), returning the appropriate
non-replan branch when shutdown or out-of-budget is detected (mirror the guard
check used in attempt_replan()); apply the same fix to the analogous block
around lines 451-460 so both success and failure paths consistently re-check
before triggering planner calls.
- Around line 269-273: The loop is modifying the live ExecutionPlan via
update_step_status(plan, step_idx, ...) but all_plans remains stale so
metadata["final_plan"] ends up containing the pre-execution snapshot;
synchronize the live plan with the stored history by either replacing the last
entry in all_plans with the updated plan after every call to update_step_status
(locations: where plan = update_step_status(...) around lines with plan/step_idx
and in attempt_replan()), or change _finalize() to accept the live plan and
derive final_plan from that live instance instead of all_plans[-1]; ensure the
same fix is applied to the failed-step update path in attempt_replan() and the
other occurrences noted (similar blocks at the other indicated locations).

In `@src/synthorg/engine/plan_helpers.py`:
- Around line 72-76: Replace the hardcoded event string in the logger.warning
call inside extract_task_summary_fallback with the appropriate event constant
from synthorg.observability.events (e.g., import the fallback constant from
synthorg.observability.events.plan or the relevant domain and use that constant
instead of "plan_helpers.extract_task_summary_fallback"); update the import at
top of src/synthorg/engine/plan_helpers.py to pull the chosen event constant
(for example TASK_SUMMARY_FALLBACK or SUMMARY_FALLBACK from
synthorg.observability.events.plan) and pass that constant as the first argument
to logger.warning while leaving the execution_id and note kwargs unchanged.

In `@tests/unit/engine/test_hybrid_loop.py`:
- Around line 62-71: The _single_step_plan() fixture returns a one-step
CompletionResponse which lets tests pass despite breaking the intended Hybrid
multi-step contract; update _single_step_plan() (and any callers relying on
_plan_response) to produce a minimum of 3 steps (preferably 3–7 as described in
docs/design/engine.md) with distinct step_number and description fields, and
adjust any tests that intentionally expect a single-step to either use a new
single-step fixture name or be updated to assert the multi-step behavior; ensure
the returned type remains CompletionResponse and reuse _plan_response to build
the multi-step payload.
- Around line 1-1217: The test module is too large and should be split into
focused test files to meet the guideline of files < 800 lines and functions < 50
lines; refactor by extracting related test groups (e.g., TestHybridLoopProtocol,
TestHybridLoopBasic, TestHybridLoopWithTools, TestHybridLoopReplanning,
TestHybridLoopBudget/Shutdown/MaxTurns, TestHybridLoopStagnation,
TestHybridLoopModelTiering/Metadata/ContextImmutability,
TestHybridLoopCheckpointCallback/TestHybridLoopCompaction, and
TestParseReplanDecision/_parse_replan_decision helper tests) into separate
modules (one per concern), keep helper functions like _single_step_plan,
_multi_step_plan, _usage, _make_invoker, and _ctx_with_user_msg in a shared test
utils module to avoid duplication, and ensure each new file imports the required
symbols (HybridLoop, HybridLoopConfig, do_replan, _parse_replan_decision,
AgentContext, mock_provider_factory fixtures, etc.) and that no test function
exceeds 50 lines; run test suite to update any relative imports or fixtures
after splitting.

In `@tests/unit/engine/test_loop_selector.py`:
- Around line 406-435: Add assertions to the hybrid factory tests to verify
checkpoint_callback and compaction_callback are forwarded by
build_execution_loop; in the test functions (e.g., test_build_hybrid,
test_build_hybrid_with_config, test_build_hybrid_with_gates) pass mock callbacks
into build_execution_loop via the checkpoint_callback and compaction_callback
parameters and assert that the returned HybridLoop instance (HybridLoop) has its
checkpoint_callback and compaction_callback attributes set to those mocks,
ensuring build_execution_loop correctly propagates these callbacks.

---

Outside diff comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 186-200: The class __init__ docstring is missing the new parameter
hybrid_loop_config; update the class (or __init__) docstring Args section to
include an entry for hybrid_loop_config describing it as an optional
configuration for the Hybrid Plan + ReAct loop, used when auto-selection picks
"hybrid" or when loop_type="hybrid" is built, matching the style of the other
parameters (e.g., provider, auto_loop_config) and referencing the
hybrid_loop_config parameter name exactly so readers and tools pick up the new
parameter.

In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 580-625: The code calls extract_task_summary() after mutating ctx
(via ctx = ctx.with_message(message)), which lets the planning prompt be
recorded as the original task summary; capture the task summary from the
pre-mutation context (e.g., original_task_summary = extract_task_summary(ctx)
before ctx = ctx.with_message(message)) and pass that cached
original_task_summary into parse_plan() (replace the current
extract_task_summary(ctx) argument) so parse_plan() uses the pre-prompt summary;
update references to extract_task_summary in this block accordingly (symbols:
ctx.with_message, extract_task_summary, parse_plan).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d728b2d2-fc97-4e1a-97c6-d601e5d4cf0a

📥 Commits

Reviewing files that changed from the base of the PR and between 962c324 and 8fe07e3.

📒 Files selected for processing (18)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_hybrid_models.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_plan_helpers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax: except A, B: (no parentheses) — not PEP 654 syntax
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)

Files:

  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_hybrid_models.py
  • src/synthorg/engine/plan_helpers.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_plan_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import logger via from synthorg.observability import get_logger then use logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments: logger.info(EVENT, key=value) — never use formatted strings like logger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use @computed_field for derived values instead of storing and validating redundant fields

Files:

  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

ALWAYS use uv sync (not cd ... commands) when referencing command execution in documentation and examples

Files:

  • CLAUDE.md
  • docs/design/engine.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark all unit tests with @pytest.mark.unit, integration tests with @pytest.mark.integration, e2e tests with @pytest.mark.e2e, and slow tests with @pytest.mark.slow
Use pytest.mark.parametrize for testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given + @settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with -n auto for parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); use pytest --cov=synthorg --cov-fail-under=80 to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally

Files:

  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_hybrid_models.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_plan_helpers.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/design/**/*.md: ALWAYS read the relevant docs/design/ page before implementing any feature or planning any issue; design spec is the starting point for architecture and behavior
If implementation deviates from the spec, alert the user and explain why; user decides whether to proceed or update the spec; then update the relevant design page to reflect approved deviations

Files:

  • docs/design/engine.md
🧠 Learnings (9)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves

Applied to files:

  • src/synthorg/engine/plan_models.py
  • tests/unit/engine/test_hybrid_models.py
  • src/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/plan_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST import logger via `from synthorg.observability import get_logger` then use `logger = get_logger(__name__)`

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events

Applied to files:

  • src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : Maintain minimum 80% code coverage (enforced in CI); use `pytest --cov=synthorg --cov-fail-under=80` to verify locally

Applied to files:

  • tests/unit/engine/test_plan_helpers.py
🧬 Code graph analysis (10)
src/synthorg/engine/__init__.py (2)
src/synthorg/engine/hybrid_loop.py (1)
  • HybridLoop (86-784)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/agent_engine.py (2)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (80-98)
  • extract_task_summary (55-77)
  • update_step_status (24-52)
src/synthorg/engine/plan_models.py (1)
  • StepStatus (15-22)
src/synthorg/engine/loop_selector.py (4)
src/synthorg/engine/hybrid_loop.py (5)
  • HybridLoop (86-784)
  • approval_gate (124-126)
  • stagnation_detector (129-131)
  • compaction_callback (134-136)
  • config (119-121)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_execute_loop.py (4)
  • approval_gate (131-133)
  • stagnation_detector (136-138)
  • compaction_callback (141-143)
  • config (126-128)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionLoop (158-196)
tests/unit/engine/test_agent_engine_auto_loop.py (3)
src/synthorg/budget/enforcer.py (2)
  • BudgetEnforcer (56-473)
  • cost_tracker (89-91)
web/src/api/types.ts (2)
  • BudgetConfig (406-413)
  • BudgetAlertConfig (393-397)
tests/unit/engine/conftest.py (2)
  • mock_provider_factory (314-316)
  • MockCompletionProvider (207-289)
tests/unit/engine/test_hybrid_models.py (1)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
tests/unit/engine/test_loop_selector.py (5)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/hybrid_loop.py (5)
  • HybridLoop (86-784)
  • config (119-121)
  • get_loop_type (138-140)
  • approval_gate (124-126)
  • stagnation_detector (129-131)
src/synthorg/engine/loop_selector.py (3)
  • select_loop_type (228-263)
  • AutoLoopConfig (94-168)
  • build_execution_loop (266-324)
src/synthorg/engine/plan_execute_loop.py (4)
  • config (126-128)
  • get_loop_type (145-147)
  • approval_gate (131-133)
  • stagnation_detector (136-138)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
tests/unit/engine/test_plan_helpers.py (4)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • from_identity (190-225)
  • with_message (227-236)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (80-98)
  • extract_task_summary (55-77)
  • update_step_status (24-52)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/hybrid_loop.py (9)
src/synthorg/engine/hybrid_helpers.py (5)
  • attempt_replan (509-607)
  • call_planner (228-308)
  • do_replan (610-698)
  • run_progress_summary (406-503)
  • warn_insufficient_budget (167-189)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_helpers.py (5)
  • build_result (512-527)
  • call_provider (156-213)
  • check_budget (107-153)
  • check_response_errors (216-250)
  • check_shutdown (62-104)
src/synthorg/engine/loop_protocol.py (3)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/plan_helpers.py (1)
  • update_step_status (24-52)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/engine/plan_execute_loop.py (3)
  • config (126-128)
  • execute (149-227)
  • _run_steps (256-350)
src/synthorg/engine/context.py (2)
  • AgentContext (104-412)
  • has_turns_remaining (410-412)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/hybrid_helpers.py (11)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/loop_helpers.py (7)
  • build_result (512-527)
  • call_provider (156-213)
  • check_budget (107-153)
  • check_response_errors (216-250)
  • check_shutdown (62-104)
  • make_turn_record (463-479)
  • response_to_message (454-460)
src/synthorg/engine/loop_protocol.py (3)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (80-98)
  • extract_task_summary (55-77)
  • update_step_status (24-52)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/engine/plan_parsing.py (1)
  • parse_plan (57-100)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • with_turn_completed (238-287)
  • has_turns_remaining (410-412)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/hybrid_loop.py (1)
  • config (119-121)
src/synthorg/engine/plan_execute_loop.py (1)
  • config (126-128)
tests/integration/engine/test_graceful_shutdown.py (1)
  • shutdown_checker (208-214)
🔇 Additional comments (9)
src/synthorg/engine/plan_models.py (1)

103-103: LGTM! Stricter validation for config model.

Adding extra="forbid" improves config hygiene by rejecting unknown fields, catching typos and misconfigurations early. This aligns with HybridLoopConfig and is appropriate for frozen config models.

src/synthorg/engine/hybrid_models.py (1)

12-73: LGTM! Well-structured configuration model.

The HybridLoopConfig model follows project conventions:

  • Frozen + extra="forbid" for strict config validation
  • NotBlankStr | None for optional model overrides
  • Bounded numeric constraints with sensible defaults
  • Comprehensive docstring documenting all attributes

Based on learnings: "Use frozen Pydantic models for config/identity."

tests/unit/engine/test_hybrid_models.py (1)

1-101: LGTM! Comprehensive config model tests.

Good coverage of:

  • Default values verification
  • Custom value overrides
  • Immutability enforcement (frozen)
  • Schema strictness (extra="forbid")
  • Numeric range constraints (both violations and boundaries)
  • String validation (NotBlankStr rejection)

Effective use of @pytest.mark.parametrize for testing multiple similar cases.

src/synthorg/observability/events/execution.py (1)

83-95: LGTM! Hybrid loop event constants.

New constants follow existing naming conventions (EXECUTION_HYBRID_*"execution.hybrid.*") and cover the key observability points for the hybrid loop: step turn limits, progress summaries, replan decisions, budget warnings, and plan truncation.

src/synthorg/engine/plan_parsing.py (1)

210-221: LGTM! Defensive parse-time cap on plan steps.

Good safeguard against unbounded memory allocation from misbehaving LLM output. The 50-step cap at parse time provides a safety net before individual loop configs apply their own (typically stricter) limits. Clear warning log captures the original count for debugging.

src/synthorg/engine/__init__.py (1)

113-114: LGTM! Clean public API re-exports.

HybridLoop and HybridLoopConfig are properly imported and added to __all__, following the existing pattern for loop exports and maintaining alphabetical ordering.

Also applies to: 287-288

CLAUDE.md (1)

127-127: LGTM! Documentation updated for hybrid loop.

Package structure documentation accurately reflects the new hybrid execution loop components: HybridLoop, HybridLoopConfig, shared plan helpers, and the optional hybrid fallback in loop selection. The description captures key features (planning, mini-ReAct steps, turn limits, checkpoints, replanning).

src/synthorg/engine/agent_engine.py (2)

1069-1069: LGTM! Hybrid config correctly forwarded to loop builder.

The hybrid_loop_config is properly passed to build_execution_loop, enabling the HybridLoop to receive its configuration when auto-selected.


1221-1225: LGTM! Error message sanitization.

Good defensive practice:

  • Truncates to 200 chars to prevent log bloat from verbose exceptions
  • Filters non-printable characters for log hygiene
  • Prevents internal paths from leaking into user-facing error messages

Comment on lines +445 to +447
summary_msg = ChatMessage(
role=MessageRole.USER,
content=_build_summary_prompt(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Use a system message for the checkpoint summary prompt.

Issue #199 calls this summary hop out as a tools=None SYSTEM call. Sending it as MessageRole.USER turns an internal coordination prompt into a new user request and changes the chat state in a way the design was explicitly trying to avoid. Please switch this to MessageRole.SYSTEM.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/hybrid_helpers.py` around lines 445 - 447, The checkpoint
summary message is being constructed as a user message (summary_msg =
ChatMessage(role=MessageRole.USER, content=_build_summary_prompt(...))) which
incorrectly mutates chat state; change the role to MessageRole.SYSTEM so the
summary is sent as an internal system prompt—locate the summary_msg construction
in hybrid_helpers.py and replace MessageRole.USER with MessageRole.SYSTEM while
keeping usage of ChatMessage and the _build_summary_prompt call unchanged.

Comment on lines +269 to +273
plan = update_step_status(
plan,
step_idx,
StepStatus.IN_PROGRESS,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

final_plan is built from a stale plan snapshot.

update_step_status() returns a new ExecutionPlan, but the loop only appends to all_plans when a plan is first created or re-planned. By the time _finalize() runs, all_plans[-1] can still be the pre-execution version, so metadata["final_plan"] misses the real COMPLETED/FAILED outcomes. Please either keep the current plan synchronized in all_plans or pass the live plan into _finalize() and derive final_plan from that. The failed-step update in attempt_replan() has the same problem.

Also applies to: 381-385, 775-780

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/hybrid_loop.py` around lines 269 - 273, The loop is
modifying the live ExecutionPlan via update_step_status(plan, step_idx, ...) but
all_plans remains stale so metadata["final_plan"] ends up containing the
pre-execution snapshot; synchronize the live plan with the stored history by
either replacing the last entry in all_plans with the updated plan after every
call to update_step_status (locations: where plan = update_step_status(...)
around lines with plan/step_idx and in attempt_replan()), or change _finalize()
to accept the live plan and derive final_plan from that live instance instead of
all_plans[-1]; ensure the same fix is applied to the failed-step update path in
attempt_replan() and the other occurrences noted (similar blocks at the other
indicated locations).

Comment on lines +62 to +71
def _single_step_plan() -> CompletionResponse:
return _plan_response(
[
{
"step_number": 1,
"description": "Analyze and solve the problem",
"expected_outcome": "Problem solved",
},
]
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These fixtures currently bless single-step hybrid plans.

The PR objective and docs/design/engine.md both describe Hybrid as a strategic 3–7 step loop, but _single_step_plan() is reused as a passing happy-path fixture across much of this suite. That means the tests will still go green if the planner collapses into a one-step pseudo-ReAct flow, which is the exact failure mode Loop 3 was meant to avoid. Please either enforce the minimum step count and update these fixtures, or explicitly relax the design contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_hybrid_loop.py` around lines 62 - 71, The
_single_step_plan() fixture returns a one-step CompletionResponse which lets
tests pass despite breaking the intended Hybrid multi-step contract; update
_single_step_plan() (and any callers relying on _plan_response) to produce a
minimum of 3 steps (preferably 3–7 as described in docs/design/engine.md) with
distinct step_number and description fields, and adjust any tests that
intentionally expect a single-step to either use a new single-step fixture name
or be updated to assert the multi-step behavior; ensure the returned type
remains CompletionResponse and reuse _plan_response to build the multi-step
payload.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)

279-283: ⚠️ Potential issue | 🟠 Major

Mirror immutable plan updates into all_plans.

update_step_status() returns a new ExecutionPlan, but _finalize() serializes all_plans, not the local plan. Because these new plan objects never get written back, metadata["plans"] and metadata["final_plan"] keep stale step statuses whenever a step starts, completes, or fails.

🧩 Minimal fix
             plan = update_step_status(
                 plan,
                 step_idx,
                 StepStatus.IN_PROGRESS,
             )
+            all_plans[-1] = plan
@@
                 plan = update_step_status(
                     plan,
                     step_idx,
                     StepStatus.COMPLETED,
                 )
+                all_plans[-1] = plan
@@
-        plan = update_step_status(plan, step_idx, StepStatus.FAILED)
+        plan = update_step_status(plan, step_idx, StepStatus.FAILED)
+        all_plans[-1] = plan

Also applies to: 310-314, 373-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/plan_execute_loop.py` around lines 279 - 283,
update_step_status returns a new ExecutionPlan but the code only assigns it to
the local variable plan, leaving all_plans (and
metadata["plans"]/metadata["final_plan"] used by _finalize) stale; after each
call to update_step_status (e.g., in the blocks that update step statuses to
IN_PROGRESS, SUCCESS, FAILURE), replace the corresponding entry in the all_plans
list with the returned plan and also update metadata["plans"] (and
metadata["final_plan"] if applicable) to reference that new plan so serialized
state reflects the immutable change before calling _finalize.
♻️ Duplicate comments (2)
tests/unit/engine/_hybrid_loop_helpers.py (1)

46-55: ⚠️ Potential issue | 🟠 Major

Don't make one-step plans the default Hybrid fixture.

This shared helper is reused as a normal success path, so the suite still goes green if the planner collapses into a one-step pseudo-ReAct loop. Keep a one-step fixture only for explicit degenerate/replan-edge tests, and make the default planner fixture multi-step.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/_hybrid_loop_helpers.py` around lines 46 - 55, The helper
_single_step_plan currently returns a one-step plan which makes the shared
default fixture too trivial; change the default helper to produce a multi-step
plan (eg. at least two steps with distinct step_number, description,
expected_outcome) and rename it or add a new _multi_step_plan used as the
default planner fixture, leaving _single_step_plan only for explicit
degenerate/replan-edge tests; update any tests or fixtures that referenced the
old default to use the new multi-step helper (and keep _plan_response as the
underlying constructor).
src/synthorg/engine/hybrid_helpers.py (1)

458-459: ⚠️ Potential issue | 🟡 Minor

Summary message uses MessageRole.USER instead of MessageRole.SYSTEM.

The past review flagged that issue #199 specifies the progress-summary as a tools=None SYSTEM call. However, line 459 still constructs the message with MessageRole.USER. This turns an internal coordination prompt into a user request, which may affect chat state differently than intended.

🔧 Proposed fix
     summary_msg = ChatMessage(
-        role=MessageRole.USER,
+        role=MessageRole.SYSTEM,
         content=_build_summary_prompt(
             plan,
             step_idx,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/hybrid_helpers.py` around lines 458 - 459, The
summary_msg is being constructed with MessageRole.USER but should be a SYSTEM
message; update the ChatMessage creation for summary_msg to use
MessageRole.SYSTEM (replace MessageRole.USER with MessageRole.SYSTEM) so the
progress-summary is sent as an internal SYSTEM call rather than a user prompt;
ensure the change is made where summary_msg = ChatMessage(...) is defined so any
downstream handling expects a SYSTEM role.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 1224-1228: raw_msg -> sanitized -> error_msg currently only strips
non-printable chars and still allows filesystem paths/URLs into
ExecutionResult.error_message; update the sanitization logic in agent_engine.py
(where raw_msg, sanitized, and error_msg are set) to detect and redact path-like
substrings (e.g., absolute/relative paths, drive letters, URL patterns, and long
dot-delimited tokens) before truncation, replacing them with a generic
placeholder like "[REDACTED_PATH]"; after redaction, trim to the 200-char limit
and if the result contains no alphanumeric characters (or is empty) set
error_msg to a safe fallback such as "Error: details redacted" while preserving
the exception class name (type(exc).__name__) in the final string.

In `@src/synthorg/engine/plan_helpers.py`:
- Around line 43-48: The bounds check in update_step_status currently raises
IndexError without logging; before raising, emit a WARNING-level observability
event that includes step_idx, step_count (len(plan.steps)) and
plan.revision_number using the plan domain event constant from
synthorg.observability.events.plan (e.g., PLAN_STEP_INDEX_OUT_OF_RANGE), then
raise the IndexError as before; locate the check around update_step_status,
reference plan.steps and plan.revision_number, and ensure the warning uses the
event constant and structured fields (step_idx, step_count, revision) per the
project logging guidelines.

In `@tests/unit/engine/test_hybrid_loop_advanced.py`:
- Around line 310-351: The TestParseReplanDecision class contains many
near-duplicate tests for the helper _parse_replan_decision; collapse them into a
single parameterized test: add a pytest.mark.parametrize on (content, expected)
with the 12 input/expected pairs from the removed methods and implement a single
test method (e.g., test_parse_replan_decision) that asserts
_parse_replan_decision(content) is expected; keep descriptive ids for each case
and remove the individual test_* methods.

---

Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 279-283: update_step_status returns a new ExecutionPlan but the
code only assigns it to the local variable plan, leaving all_plans (and
metadata["plans"]/metadata["final_plan"] used by _finalize) stale; after each
call to update_step_status (e.g., in the blocks that update step statuses to
IN_PROGRESS, SUCCESS, FAILURE), replace the corresponding entry in the all_plans
list with the returned plan and also update metadata["plans"] (and
metadata["final_plan"] if applicable) to reference that new plan so serialized
state reflects the immutable change before calling _finalize.

---

Duplicate comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 458-459: The summary_msg is being constructed with
MessageRole.USER but should be a SYSTEM message; update the ChatMessage creation
for summary_msg to use MessageRole.SYSTEM (replace MessageRole.USER with
MessageRole.SYSTEM) so the progress-summary is sent as an internal SYSTEM call
rather than a user prompt; ensure the change is made where summary_msg =
ChatMessage(...) is defined so any downstream handling expects a SYSTEM role.

In `@tests/unit/engine/_hybrid_loop_helpers.py`:
- Around line 46-55: The helper _single_step_plan currently returns a one-step
plan which makes the shared default fixture too trivial; change the default
helper to produce a multi-step plan (eg. at least two steps with distinct
step_number, description, expected_outcome) and rename it or add a new
_multi_step_plan used as the default planner fixture, leaving _single_step_plan
only for explicit degenerate/replan-edge tests; update any tests or fixtures
that referenced the old default to use the new multi-step helper (and keep
_plan_response as the underlying constructor).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3843e4a7-f844-4442-abfe-9c650adea9b6

📥 Commits

Reviewing files that changed from the base of the PR and between 8fe07e3 and 02f362f.

📒 Files selected for processing (11)
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax: except A, B: (no parentheses) — not PEP 654 syntax
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)

Files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import logger via from synthorg.observability import get_logger then use logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments: logger.info(EVENT, key=value) — never use formatted strings like logger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use @computed_field for derived values instead of storing and validating redundant fields

Files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark all unit tests with @pytest.mark.unit, integration tests with @pytest.mark.integration, e2e tests with @pytest.mark.e2e, and slow tests with @pytest.mark.slow
Use pytest.mark.parametrize for testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given + @settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with -n auto for parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); use pytest --cov=synthorg --cov-fail-under=80 to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally

Files:

  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
🧠 Learnings (10)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • tests/unit/engine/test_hybrid_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events

Applied to files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging with keyword arguments: `logger.info(EVENT, key=value)` — never use formatted strings like `logger.info('msg %s', val)`

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/synthorg/engine/plan_helpers.py
🧬 Code graph analysis (7)
src/synthorg/engine/agent_engine.py (1)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (81-99)
  • extract_task_summary (56-78)
  • update_step_status (25-53)
src/synthorg/engine/plan_models.py (1)
  • StepStatus (15-22)
tests/unit/engine/test_loop_selector.py (3)
src/synthorg/engine/hybrid_loop.py (6)
  • HybridLoop (84-794)
  • config (116-118)
  • get_loop_type (135-137)
  • approval_gate (121-123)
  • stagnation_detector (126-128)
  • compaction_callback (131-133)
src/synthorg/engine/loop_selector.py (3)
  • select_loop_type (228-263)
  • AutoLoopConfig (94-168)
  • build_execution_loop (266-324)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
tests/unit/engine/test_hybrid_loop_advanced.py (6)
src/synthorg/engine/hybrid_helpers.py (1)
  • _parse_replan_decision (356-413)
src/synthorg/engine/hybrid_loop.py (5)
  • HybridLoop (84-794)
  • stagnation_detector (126-128)
  • execute (139-214)
  • config (116-118)
  • compaction_callback (131-133)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_protocol.py (2)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
tests/unit/engine/_hybrid_loop_helpers.py (7)
  • _ctx_with_user_msg (159-161)
  • _make_invoker (154-156)
  • _single_step_plan (46-55)
  • _stop_response (80-86)
  • _summary_response (89-100)
  • _tool_use_response (103-113)
  • execute (143-151)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/context.py (1)
  • AgentContext (104-412)
src/synthorg/engine/plan_models.py (2)
  • ExecutionPlan (53-88)
  • StepStatus (15-22)
src/synthorg/engine/hybrid_loop.py (7)
src/synthorg/engine/hybrid_helpers.py (8)
  • attempt_replan (522-624)
  • build_step_message (103-127)
  • call_planner (228-312)
  • do_replan (627-719)
  • handle_step_completion (130-161)
  • invoke_checkpoint_callback (195-222)
  • run_progress_summary (416-516)
  • warn_insufficient_budget (167-189)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_helpers.py (9)
  • build_result (512-527)
  • call_provider (156-213)
  • check_budget (107-153)
  • check_response_errors (216-250)
  • check_shutdown (62-104)
  • execute_tool_calls (253-355)
  • invoke_compaction (654-689)
  • make_turn_record (463-479)
  • response_to_message (454-460)
src/synthorg/engine/loop_protocol.py (3)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/plan_helpers.py (1)
  • update_step_status (25-53)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/hybrid_helpers.py (7)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/loop_helpers.py (4)
  • build_result (512-527)
  • call_provider (156-213)
  • check_budget (107-153)
  • check_response_errors (216-250)
src/synthorg/engine/loop_protocol.py (3)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (81-99)
  • extract_task_summary (56-78)
  • update_step_status (25-53)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/engine/plan_parsing.py (1)
  • parse_plan (57-100)
src/synthorg/engine/context.py (4)
  • AgentContext (104-412)
  • has_turns_remaining (410-412)
  • with_message (227-236)
  • with_turn_completed (238-287)
🔇 Additional comments (13)
tests/unit/engine/test_hybrid_loop_advanced.py (2)

35-77: LGTM!

The stagnation detection tests are well-structured, with inline detector classes that correctly implement the StagnationDetector protocol. The test verifies that TerminationReason.STAGNATION is returned when the detector signals TERMINATE.


354-398: LGTM!

Provider error handling tests correctly verify that TerminationReason.ERROR is returned when the provider raises during planning or step execution. The type: ignore[arg-type] comments are appropriate for the test doubles.

src/synthorg/engine/hybrid_loop.py (5)

377-378: LGTM!

The stale plan snapshot issue flagged in the past review has been addressed. The code now synchronizes the live plan into all_plans after updating step status:

if all_plans:
    all_plans[-1] = plan

This ensures final_plan in metadata reflects the real COMPLETED/FAILED outcomes.


457-462: LGTM!

The shutdown/budget guard issue flagged in the past review has been addressed. The code now re-checks both conditions after the progress summary and before triggering do_replan():

shutdown_result = check_shutdown(ctx, shutdown_checker, turns)
if shutdown_result is not None:
    return self._finalize(shutdown_result, all_plans, replans_used)
budget_result = check_budget(ctx, budget_checker, turns)
if budget_result is not None:
    return self._finalize(budget_result, all_plans, replans_used)

139-214: LGTM!

The execute method is well-structured with clear phase separation:

  1. Pre-execution setup (model selection, tool definitions, budget warning)
  2. Planning phase with proper early termination handling
  3. Step execution delegation

The method correctly handles ExecutionResult returns from the planning phase and propagates them through _finalize.


628-642: LGTM!

The stagnation detection is correctly scoped to only the current step's turns (turns[step_start_idx:]), preventing cross-step false positives. The correction count is tracked per-step and the stagnation result is properly handled with turn records updated on termination.


778-793: LGTM!

The _finalize method correctly:

  1. Deep-copies the metadata dict per project conventions
  2. Adds hybrid-specific fields (loop_type, plans, final_plan, replans_used)
  3. Returns an immutable copy via model_copy
src/synthorg/engine/hybrid_helpers.py (6)

627-639: LGTM!

The checkpoint callback threading issue flagged in the past review has been addressed. The do_replan function now:

  1. Accepts checkpoint_callback as a keyword parameter (line 638)
  2. Passes it through to call_planner (line 707)

This ensures replan turns are checkpointed consistently with other LLM turns.


254-255: LGTM!

The turn exhaustion guard flagged in the past review has been addressed. The function now returns MAX_TURNS early before attempting any LLM calls:

if not ctx.has_turns_remaining:
    return build_result(ctx, TerminationReason.MAX_TURNS, turns)

377-386: LGTM!

The bool() parsing issue flagged in the past review has been addressed. The function now explicitly handles boolean and string types:

raw = data.get("replan")
if isinstance(raw, bool):
    return raw
if isinstance(raw, str):
    return raw.lower() == "true"
# Non-bool, non-str, or missing -- treat as no-replan
return False

This prevents bool("false") from incorrectly returning True.


214-215: LGTM!

The exception handling uses the correct PEP 758 syntax (except MemoryError, RecursionError:) as required by the coding guidelines.


695-698: LGTM!

The replan message correctly uses MessageRole.USER since it's requesting a new plan from the LLM (similar to the initial planning prompt). This is distinct from the progress-summary which is internal coordination.


70-97: LGTM!

The truncate_plan function correctly handles plan truncation:

  1. Returns the original plan unchanged if within limits
  2. Logs a warning with detailed context
  3. Renumbers steps to maintain sequential step numbers (required by ExecutionPlan validation)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 1228-1232: The current sanitization call using re.sub on raw_msg
(assigning to sanitized) misses several common prefixes and relative paths;
update the regex passed to re.sub (the pattern used where sanitized is defined)
to also match /root, /srv, /app, /data and relative paths starting with ./ or
../, or alternatively broaden the absolute-path branch to cover additional
common top-level dirs and a dot-segment branch for ./ and ../ so all such
occurrences are replaced with "[REDACTED_PATH]". Ensure the change targets the
re.sub invocation that produces sanitized from raw_msg so all new patterns are
sanitized consistently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fcfec21d-4680-4f9e-a4a4-6775708a447e

📥 Commits

Reviewing files that changed from the base of the PR and between 02f362f and 34b19d3.

📒 Files selected for processing (5)
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax: except A, B: (no parentheses) — not PEP 654 syntax
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import logger via from synthorg.observability import get_logger then use logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments: logger.info(EVENT, key=value) — never use formatted strings like logger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use @computed_field for derived values instead of storing and validating redundant fields

Files:

  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark all unit tests with @pytest.mark.unit, integration tests with @pytest.mark.integration, e2e tests with @pytest.mark.e2e, and slow tests with @pytest.mark.slow
Use pytest.mark.parametrize for testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given + @settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with -n auto for parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); use pytest --cov=synthorg --cov-fail-under=80 to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally

Files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
🧠 Learnings (14)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events

Applied to files:

  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging with keyword arguments: `logger.info(EVENT, key=value)` — never use formatted strings like `logger.info('msg %s', val)`

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : Use `pytest.mark.parametrize` for testing multiple similar cases instead of writing separate test functions

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
🧬 Code graph analysis (4)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (91-109)
  • extract_task_summary (66-88)
  • update_step_status (28-63)
src/synthorg/engine/plan_models.py (1)
  • StepStatus (15-22)
src/synthorg/engine/agent_engine.py (1)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/context.py (1)
  • AgentContext (104-412)
src/synthorg/engine/plan_models.py (2)
  • ExecutionPlan (53-88)
  • StepStatus (15-22)
tests/unit/engine/test_hybrid_loop_advanced.py (6)
src/synthorg/engine/hybrid_helpers.py (1)
  • _parse_replan_decision (356-413)
src/synthorg/engine/hybrid_loop.py (5)
  • HybridLoop (84-794)
  • stagnation_detector (126-128)
  • execute (139-214)
  • config (116-118)
  • compaction_callback (131-133)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_protocol.py (2)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/stagnation/models.py (2)
  • StagnationResult (87-140)
  • StagnationVerdict (15-20)
tests/unit/engine/_hybrid_loop_helpers.py (7)
  • _ctx_with_user_msg (159-161)
  • _make_invoker (154-156)
  • _single_step_plan (46-55)
  • _stop_response (80-86)
  • _summary_response (89-100)
  • _tool_use_response (103-113)
  • execute (143-151)
🔇 Additional comments (15)
src/synthorg/engine/agent_engine.py (2)

9-9: Path sanitization implemented correctly.

The sanitization logic addresses the prior review finding. The regex patterns cover:

  • Windows paths (C:\...)
  • Unix paths starting with /home, /usr, /var, /tmp, /etc, /opt
  • HTTP/HTTPS URLs

Also applies to: 1225-1237


164-166: Clean integration of hybrid_loop_config.

The configuration is correctly threaded through:

  1. Accepted as optional parameter in __init__
  2. Stored on the instance
  3. Passed to build_execution_loop when auto-selection triggers

Also applies to: 190-190, 204-204, 1069-1074

src/synthorg/engine/plan_execute_loop.py (3)

59-63: Good extraction of shared utilities.

Importing assess_step_success, extract_task_summary, and update_step_status from plan_helpers consolidates logic shared with HybridLoop, improving maintainability.


444-446: Correct fix for stale plan metadata.

Syncing all_plans[-1] = plan before building the final result ensures that metadata["final_plan"] reflects the latest step statuses. This addresses the bug mentioned in the commit messages.


583-583: Helper function usage is consistent.

The calls to extract_task_summary, assess_step_success, and update_step_status properly replace the removed in-class methods while maintaining the same behavior.

Also applies to: 628-628, 818-818

src/synthorg/engine/plan_helpers.py (4)

1-26: Well-structured module setup.

The module correctly follows all coding guidelines:

  • Centralized logger via get_logger(__name__)
  • Event constants imported from synthorg.observability.events.execution
  • Type hints and Google-style docstrings present

28-63: Correct implementation of update_step_status.

The function:

  • Validates bounds and logs at WARNING before raising IndexError
  • Uses model_copy to maintain immutability
  • Includes all required context (step_idx, step_count, revision) in the log

66-88: Correct implementation of extract_task_summary.

The function properly handles the fallback chain:

  1. Task title (preferred)
  2. First user message content
  3. Default "task" with warning log using the event constant

91-109: LGTM!

The function is simple and well-documented. The rationale for treating MAX_TOKENS as success (truncated summary still represents completed work) is clearly explained.

tests/unit/engine/test_hybrid_loop_advanced.py (5)

31-128: Comprehensive stagnation detection tests.

Both test cases properly verify:

  • TERMINATE verdict leading to TerminationReason.STAGNATION
  • INJECT_PROMPT verdict allowing recovery to TerminationReason.COMPLETED

The custom detector classes follow the protocol interface correctly.


130-159: Good verification of model tiering.

The test correctly validates that planner_model and executor_model are used for their respective phases, checking the recorded model sequence.


162-307: Well-structured integration tests.

The tests cover important scenarios:

  • Metadata structure validation (loop_type, replans_used, final_plan)
  • Context immutability (original turn_count and conversation unchanged)
  • Checkpoint callback invocation count and failure isolation
  • Compaction callback integration

310-344: Excellent use of pytest.mark.parametrize.

The 12 test cases are properly consolidated into a single parametrized test with descriptive IDs, following the coding guidelines. This addresses the previous review comment.


347-391: Good error handling test coverage.

Both tests verify that provider failures during planning and step execution correctly result in TerminationReason.ERROR. The inline FailingProvider classes are appropriate for testing error paths.

src/synthorg/observability/events/execution.py (1)

62-64: Event constants follow naming conventions.

The new constants are properly:

  • Typed as Final[str]
  • Namespaced under execution.plan.* and execution.hybrid.*
  • Named descriptively for their use cases

Also applies to: 86-99

Aureliolo and others added 8 commits March 19, 2026 14:18
Implement Loop 3 (Hybrid) from the design spec (SS6.5). The hybrid loop
combines strategic planning (3-7 steps) with per-step mini-ReAct
sub-loops, progress-summary checkpoints, and LLM-decided replanning.

Key features:
- Per-step turn limit (max_turns_per_step, default 5)
- Progress summary after each step via LLM call (tools=None)
- Optional replanning on completion (allow_replan_on_completion)
- Plan truncation to max_plan_steps with warning
- Turn budget advisory at loop start
- Full integration with stagnation, compaction, and checkpoint systems

New files:
- hybrid_loop.py: HybridLoop class (ExecutionLoop protocol)
- hybrid_models.py: HybridLoopConfig (frozen Pydantic)
- plan_helpers.py: extracted shared plan utilities

Integration:
- loop_selector.py: hybrid added to _BUILDABLE_LOOP_TYPES, factory updated
- AutoLoopConfig: hybrid_fallback defaults to None (hybrid is native)
- agent_engine.py: hybrid_loop_config wired through

Tests: 47 new tests (27 hybrid loop + 20 config model), all existing
tests updated for hybrid-native behavior. Full suite: 9437 passed.

Closes #199

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add type: ignore annotations for parametrized dict unpacking and
isinstance guards before len/indexing on metadata values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ogging

Pre-reviewed by 8 agents, 14 findings addressed:

- Fix misleading replan prompt: _do_replan now differentiates between
  step failure and success-triggered replanning via step_failed param
- Extract _handle_completed_step from _run_steps to reduce nesting
- Extract _build_summary_prompt and _parse_replan_decision to module level
- Add logging to _parse_replan_decision (JSON parse fallback, non-dict)
- Warn on empty progress summary response content
- Remove unused EXECUTION_HYBRID_STEP_CHECKPOINT constant
- Update stale "not yet implemented" docstrings in loop_selector.py
- Update CLAUDE.md Package Structure with hybrid loop files
- Update docs/design/engine.md: auto-selection, stagnation, compaction
- Add stagnation turns fixup comment
- Extract _MAX_TASK_SUMMARY_LENGTH constant in plan_helpers.py
- Add 12 new tests: _parse_replan_decision (8), checkpoint callback (2),
  provider exceptions (2)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docs:
- Fix wrong field name allow_replan -> allow_replan_on_completion
- Add missing hybrid YAML config fields (planner_model, executor_model, max_replans)
- Update CompactionCallback wiring to include HybridLoop
- Replace em-dashes with ASCII dashes in plan_execute_loop.py
- Fix misleading "3-7 steps" docstring (only ceiling enforced)
- Remove unnecessary inline comments

Security:
- Escape LLM-sourced step content before XML-framed re-injection
- Cap plan step count at parse time (max 50) to prevent unbounded allocation
- Sanitize exception messages in agent_engine error path

Code quality:
- Split hybrid_loop.py (1219 -> 785 lines) into hybrid_loop.py + hybrid_helpers.py
- Fix _parse_replan_decision: regex heuristic, warning on fallback, dedicated event constant
- Add TOOL_USE with no tool calls guard (parity with ReactLoop)
- Add bounds check to update_step_status
- Add logger declaration to plan_helpers.py
- Add PlanExecuteConfig extra="forbid" for strict validation
- Add checkpoint_callback parameter to build_execution_loop factory
- Add new event constants: EXECUTION_HYBRID_REPLAN_PARSE_TRACE, EXECUTION_HYBRID_PROGRESS_SUMMARY_EMPTY

Tests:
- Add direct unit tests for plan_helpers.py (10 tests)
- Add tests: compaction callback, replan budget sharing, last-step guard, success-path replan prompt
- Fix test_provider_error_during_step to use FailingProvider
- Move _parse_replan_decision import to module level
- Strengthen final_plan metadata assertion
- Extract budget setup helper in auto_loop tests
- Parametrize duplicate fallback tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Import _parse_replan_decision from hybrid_helpers (not hybrid_loop)
- Add CompletionConfig import for do_replan test calls
- Remove stale re-export of _parse_replan_decision from hybrid_loop

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Source fixes:
- Thread checkpoint_callback through do_replan and attempt_replan
  so replan turns use the same checkpoint subsystem as initial planning
- Add has_turns_remaining guard in call_planner and run_progress_summary
  to prevent wasted LLM calls when turn budget is exhausted
- Parse replan decision explicitly (bool/str) instead of coercing with
  bool() to prevent "yes"/1/[1] from triggering unintended replans
- Add shutdown/budget re-check in _decide_replan_on_completion before
  do_replan (mirrors attempt_replan guards)
- Sync live plan into all_plans before finalization so final_plan
  metadata reflects actual step statuses
- Replace hardcoded event string with EXECUTION_PLAN_SUMMARY_FALLBACK
  constant in plan_helpers.py
- Capture task_summary before ctx.with_message in call_planner and
  plan_execute_loop._call_planner to prevent planning prompt from
  being used as task summary
- Add hybrid_loop_config to AgentEngine class docstring

Tests:
- Add checkpoint_callback/compaction_callback assertions to
  build_execution_loop factory tests
- Split test_hybrid_loop.py (1217 lines) into 3 focused modules
  plus shared helpers -- all under 800 lines
- Add replan parse tests for string and integer values

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redact filesystem paths and URLs from error_message in agent_engine
  to prevent internal details leaking to API consumers
- Log WARNING with event constant before raising IndexError in
  update_step_status (CLAUDE.md: all error paths must log before raising)
- Parametrize TestParseReplanDecision (12 individual methods -> 1
  parametrized test with descriptive IDs)
- Fix stale all_plans in PlanExecuteLoop._build_final_result (same bug
  as hybrid_loop, syncs live plan so final_plan metadata reflects
  actual step statuses)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover /root, /srv, /app, /data and relative paths (./  ../) in
the error_message sanitization regex.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/loop_selector.py (1)

266-317: ⚠️ Potential issue | 🟠 Major

Pass checkpoint_callback to ReactLoop in build_execution_loop().

ReactLoop.__init__ accepts and uses checkpoint_callback (lines 87, 93, 275–277), but build_execution_loop() omits it when instantiating ReactLoop. This parameter should be wired consistently across all loop types for feature parity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/loop_selector.py` around lines 266 - 317,
build_execution_loop currently omits passing checkpoint_callback into the
ReactLoop constructor; update the ReactLoop instantiation inside
build_execution_loop to include checkpoint_callback=checkpoint_callback so
ReactLoop receives the same per-turn checkpoint callback as PlanExecuteLoop and
HybridLoop (refer to build_execution_loop, ReactLoop, and the
checkpoint_callback parameter).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 458-467: Change the ChatMessage construction for the progress
summary so it uses MessageRole.SYSTEM instead of MessageRole.USER: in the
creation of summary_msg (which calls _build_summary_prompt with plan, step_idx,
and ask_replan), replace MessageRole.USER with MessageRole.SYSTEM to align this
system-initiated internal prompt with other summary/reconciliation prompts (see
uses in compaction/summarizer and checkpoint/resume).

---

Outside diff comments:
In `@src/synthorg/engine/loop_selector.py`:
- Around line 266-317: build_execution_loop currently omits passing
checkpoint_callback into the ReactLoop constructor; update the ReactLoop
instantiation inside build_execution_loop to include
checkpoint_callback=checkpoint_callback so ReactLoop receives the same per-turn
checkpoint callback as PlanExecuteLoop and HybridLoop (refer to
build_execution_loop, ReactLoop, and the checkpoint_callback parameter).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 40a20aa7-29a2-40cf-8874-be8a0ea46f93

📥 Commits

Reviewing files that changed from the base of the PR and between 34b19d3 and 83f8be1.

📒 Files selected for processing (21)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/test_hybrid_models.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_plan_helpers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Python 3.14+ with PEP 649 native lazy annotations; no from __future__ import annotations
Use PEP 758 except syntax: except A, B: (no parentheses) in Python 3.14
All public functions and classes require type hints; mypy strict mode enforced
Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
Create new objects instead of mutating existing ones; use copy.deepcopy() at construction and MappingProxyType wrapping for non-Pydantic collections (registries, BaseTool)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state evolution; never mix static config fields with mutable runtime fields
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier/name fields
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code; prefer structured concurrency over bare create_task
Enforce 88-character line length via ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/__init__.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/engine/test_hybrid_models.py
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/plan_helpers.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • tests/unit/engine/test_plan_helpers.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST use from synthorg.observability import get_logger and logger = get_logger(__name__); never use import logging or logging.getLogger() in application code
Variable name must always be logger (not _logger, not log)
Use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api); import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging: logger.info(EVENT, key=value); never use formatted strings like logger.info("msg %s", val)
Log all error paths at WARNING or ERROR with context before raising
Log all state transitions at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig
Retryable errors: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError (all with is_retryable=True). Non-retryable errors raise immediately without retry. RetryExhaustedError signals all retries failed.
Rate limiter respects RateLimitError.retry_after from providers and automatically pauses future requests
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths.

Files:

  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Linter (ruff) enforces ruff check + format; mypy strict type-checking enforced in CI and pre-push; pytest with coverage 80% minimum enforced in CI
PEP 649 lazy annotations supported natively in Python 3.14; no workarounds needed for forward references

Files:

  • src/synthorg/engine/plan_parsing.py
  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/plan_helpers.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use test-provider, test-small-001, etc. instead of real vendor names; never use vendor names in tests
Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain 80% minimum code coverage; enforced in CI
asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
30-second timeout per test; enforce via pytest configuration
ALWAYS include -n auto when running pytest; never run tests sequentially via pytest-xdist parallelism
Prefer @pytest.mark.parametrize for testing similar cases
Use Hypothesis with @given and @settings for property-based testing; profiles: ci (50 examples, default) and dev (1000 examples) via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_hybrid_models.py
  • tests/unit/engine/test_hybrid_loop_replanning.py
  • tests/unit/engine/_hybrid_loop_helpers.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_hybrid_loop.py
  • tests/unit/engine/test_hybrid_loop_advanced.py
  • tests/unit/engine/test_plan_helpers.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Docs are in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages); architecture in docs/architecture/; roadmap in docs/roadmap/; security in docs/security.md; licensing in docs/licensing.md; reference in docs/reference/; library reference auto-generated in docs/api/ from docstrings via mkdocstrings + Griffe

Files:

  • docs/design/engine.md
🧠 Learnings (21)
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state evolution; never mix static config fields with mutable runtime fields

Applied to files:

  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/plan_models.py
  • src/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/plan_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST use `from synthorg.observability import get_logger` and `logger = get_logger(__name__)`; never use `import logging` or `logging.getLogger()` in application code

Applied to files:

  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging: `logger.info(EVENT, key=value)`; never use formatted strings like `logger.info("msg %s", val)`

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Variable name must always be `logger` (not `_logger`, not `log`)

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • tests/unit/engine/test_hybrid_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.

Applied to files:

  • tests/unit/engine/test_hybrid_loop_advanced.py
🧬 Code graph analysis (10)
src/synthorg/engine/__init__.py (2)
src/synthorg/engine/hybrid_loop.py (1)
  • HybridLoop (84-794)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
tests/unit/engine/test_agent_engine_auto_loop.py (3)
src/synthorg/budget/enforcer.py (2)
  • BudgetEnforcer (56-473)
  • cost_tracker (89-91)
web/src/api/types.ts (2)
  • BudgetConfig (406-413)
  • BudgetAlertConfig (393-397)
tests/unit/engine/conftest.py (2)
  • mock_provider_factory (314-316)
  • MockCompletionProvider (207-289)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
  • assess_step_success (91-109)
  • extract_task_summary (66-88)
  • update_step_status (28-63)
src/synthorg/engine/plan_models.py (1)
  • StepStatus (15-22)
tests/unit/engine/test_hybrid_models.py (1)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_selector.py (5)
src/synthorg/engine/hybrid_loop.py (5)
  • HybridLoop (84-794)
  • approval_gate (121-123)
  • stagnation_detector (126-128)
  • compaction_callback (131-133)
  • config (116-118)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_execute_loop.py (4)
  • approval_gate (131-133)
  • stagnation_detector (136-138)
  • compaction_callback (141-143)
  • config (126-128)
src/synthorg/engine/plan_models.py (1)
  • PlanExecuteConfig (91-118)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionLoop (158-196)
tests/unit/engine/test_loop_selector.py (5)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/hybrid_loop.py (6)
  • HybridLoop (84-794)
  • config (116-118)
  • get_loop_type (135-137)
  • approval_gate (121-123)
  • stagnation_detector (126-128)
  • compaction_callback (131-133)
src/synthorg/engine/loop_selector.py (3)
  • select_loop_type (228-263)
  • AutoLoopConfig (94-168)
  • build_execution_loop (266-324)
src/synthorg/engine/plan_execute_loop.py (5)
  • config (126-128)
  • get_loop_type (145-147)
  • approval_gate (131-133)
  • stagnation_detector (136-138)
  • compaction_callback (141-143)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
src/synthorg/engine/context.py (1)
  • AgentContext (104-412)
src/synthorg/engine/plan_models.py (2)
  • ExecutionPlan (53-88)
  • StepStatus (15-22)
tests/unit/engine/test_hybrid_loop_advanced.py (4)
src/synthorg/engine/hybrid_helpers.py (1)
  • _parse_replan_decision (356-413)
src/synthorg/engine/hybrid_loop.py (3)
  • HybridLoop (84-794)
  • stagnation_detector (126-128)
  • config (116-118)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_protocol.py (1)
  • TerminationReason (28-37)
src/synthorg/engine/hybrid_loop.py (8)
src/synthorg/providers/enums.py (1)
  • MessageRole (15-21)
src/synthorg/engine/hybrid_helpers.py (5)
  • attempt_replan (522-624)
  • call_planner (228-312)
  • do_replan (627-719)
  • invoke_checkpoint_callback (195-222)
  • run_progress_summary (416-516)
src/synthorg/engine/hybrid_models.py (1)
  • HybridLoopConfig (12-73)
src/synthorg/engine/loop_helpers.py (7)
  • build_result (512-527)
  • call_provider (156-213)
  • check_budget (107-153)
  • check_response_errors (216-250)
  • check_shutdown (62-104)
  • check_stagnation (530-591)
  • execute_tool_calls (253-355)
src/synthorg/engine/loop_protocol.py (3)
  • ExecutionResult (86-147)
  • TerminationReason (28-37)
  • TurnRecord (40-83)
src/synthorg/engine/plan_helpers.py (1)
  • update_step_status (28-63)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
tests/unit/engine/test_plan_helpers.py (3)
src/synthorg/engine/context.py (3)
  • AgentContext (104-412)
  • from_identity (190-225)
  • with_message (227-236)
src/synthorg/engine/plan_models.py (3)
  • ExecutionPlan (53-88)
  • PlanStep (25-50)
  • StepStatus (15-22)
src/synthorg/providers/enums.py (2)
  • FinishReason (24-31)
  • MessageRole (15-21)
🔇 Additional comments (75)
src/synthorg/engine/plan_models.py (1)

103-103: LGTM — stricter config validation.

Adding extra="forbid" aligns PlanExecuteConfig with the new HybridLoopConfig pattern and prevents misconfigured fields from being silently ignored.

src/synthorg/engine/plan_parsing.py (1)

210-221: LGTM — sensible parse-time safety cap.

The 50-step limit prevents unbounded allocation from misbehaving LLM output while being generous enough for legitimate plans. The warning log provides good context for debugging. This complements the per-loop max_plan_steps config that applies further truncation after parsing.

src/synthorg/engine/__init__.py (1)

113-114: LGTM — clean public API extension.

The new hybrid-loop types are properly re-exported and added to __all__, following the existing alphabetical ordering convention.

Also applies to: 287-288

CLAUDE.md (1)

127-127: LGTM — comprehensive documentation update.

The engine section now accurately documents the hybrid loop module, its configuration, and the shared plan helpers. The "optional hybrid fallback" wording correctly reflects the default None behavior in AutoLoopConfig.

src/synthorg/engine/agent_engine.py (2)

164-166: LGTM — clean hybrid config threading.

The hybrid_loop_config parameter is properly:

  • Documented in the class docstring
  • Accepted as an optional __init__ parameter
  • Stored on the instance
  • Passed to build_execution_loop during auto-selection

This enables runtime configuration of hybrid loops without breaking the existing API.

Also applies to: 190-190, 204-204, 1073-1073


1225-1239: LGTM — improved error message sanitization.

The sanitization logic now:

  • Redacts Windows paths (C:\...)
  • Redacts common Unix paths (/home, /usr, /var, /tmp, /etc, /opt, /root, /srv, /app, /data)
  • Redacts relative paths (./, ../)
  • Redacts URLs (http://, https://)
  • Strips non-printable characters
  • Caps at 200 characters
  • Falls back to "details redacted" when no alphanumeric content remains

This addresses the prior review feedback about path leakage.

tests/unit/engine/test_agent_engine_auto_loop.py (2)

55-67: LGTM — good test helper extraction.

The _make_budget_enforcer() helper centralizes budget setup with sensible defaults, improving test maintainability and reducing duplication.


214-264: LGTM — test expectations updated for hybrid loop.

The tests correctly expect "hybrid" for complex tasks with OK budget (or unknown budget), and the additional summary_response accounts for the hybrid loop's progress-summary LLM call after step execution.

Also applies to: 274-325

docs/design/engine.md (3)

405-411: LGTM — config example matches implementation.

The YAML example now correctly uses allow_replan_on_completion (not allow_replan) and includes max_replans: 3, matching HybridLoopConfig field names.


434-436: LGTM — hybrid fallback docs updated.

The documentation now correctly reflects that hybrid_fallback defaults to None and the hybrid loop runs directly when not configured, aligning with AutoLoopConfig defaults.

Also applies to: 487-489


606-608: LGTM — subsystem integration docs extended.

Stagnation detection and compaction documentation now correctly describe HybridLoop's per-step mini-ReAct semantics, matching the implementation pattern shared with PlanExecuteLoop.

Also applies to: 650-651, 688-691

src/synthorg/engine/plan_execute_loop.py (3)

59-63: LGTM — clean extraction to shared helpers.

The refactor to use assess_step_success, extract_task_summary, and update_step_status from plan_helpers enables code reuse with HybridLoop while maintaining the same behavior.


444-446: LGTM — ensures final metadata reflects step statuses.

Syncing all_plans[-1] = plan before building the final result ensures the final_plan metadata correctly includes the latest step statuses (COMPLETED, FAILED, etc.) rather than stale PENDING statuses.


279-283: LGTM — consistent use of shared helpers.

All call sites now use the extracted helper functions with the same semantics as the original inline logic.

Also applies to: 310-314, 373-373, 583-583, 628-628, 818-818

src/synthorg/engine/loop_selector.py (4)

22-25: LGTM!

Correct imports added for HybridLoop and the new CheckpointCallback type. The type-checking import pattern is properly used for types only needed in annotations.


47-50: LGTM!

Adding "hybrid" to _BUILDABLE_LOOP_TYPES correctly enables the factory to construct hybrid loops. The docstring accurately describes the purpose.


104-129: LGTM!

The hybrid_fallback field defaulting to None aligns with the PR objective of making hybrid a first-class loop type. The docstring clearly explains the new behavior.


156-168: LGTM!

The validation logic correctly ensures both hybrid_fallback and default_loop_type are buildable when set, preventing infinite loops or runtime failures.

src/synthorg/engine/hybrid_models.py (1)

1-73: LGTM!

Well-designed frozen configuration model following project conventions:

  • ConfigDict(frozen=True, extra="forbid") ensures immutability and rejects unknown fields.
  • NotBlankStr | None for model overrides prevents whitespace-only values.
  • Sensible numeric constraints (ge/le) with reasonable defaults.
  • Comprehensive Google-style docstring documents all attributes.
tests/unit/engine/test_hybrid_models.py (1)

1-101: LGTM!

Comprehensive unit tests for HybridLoopConfig:

  • Default value assertions match the model definition.
  • Custom value override test confirms all fields are settable.
  • Immutability test catches frozen model violations.
  • Parametrized boundary tests cover all numeric constraints efficiently.
  • Blank string rejection validates NotBlankStr behavior.
  • Uses test-* model names per project guidelines.
tests/unit/engine/_hybrid_loop_helpers.py (1)

1-180: LGTM!

Well-organized test helper module that centralizes fixture construction:

  • Factory functions provide consistent CompletionResponse variants for different scenarios.
  • _StubTool and _make_invoker enable tool-use testing without external dependencies.
  • _make_plan_model creates valid ExecutionPlan fixtures.
  • Extracted to keep individual test files under 800 lines per project guidelines.
tests/unit/engine/test_loop_selector.py (4)

165-184: LGTM!

Good use of parametrization to test both hybrid_fallback=None (preserves hybrid) and explicit fallback behavior in a single test method.


217-227: LGTM!

New test case validates the important scenario where budget is OK and no fallback is configured, ensuring hybrid is preserved.


310-321: LGTM!

Good tests confirming that hybrid_fallback=None and default_loop_type="hybrid" are now valid configurations since hybrid is buildable.


406-441: LGTM!

Comprehensive hybrid loop factory tests:

  • Basic construction and get_loop_type() verification.
  • Config propagation confirms HybridLoopConfig values flow through.
  • Gate/callback wiring test addresses prior review feedback by asserting checkpoint_callback and compaction_callback propagation.

Note: Accessing loop._checkpoint_callback (Line 440) is acceptable in tests to verify internal wiring.

tests/unit/engine/test_hybrid_loop_replanning.py (3)

1-95: LGTM!

Solid replanning unit tests:

  • test_max_replans_exhausted verifies max_replans=0 terminates with ERROR.
  • test_successful_replan_on_failure validates replan → completion flow and metadata tracking.
  • test_content_filter_during_step_returns_error covers content filter edge case.

97-159: LGTM!

Excellent test validating that replan prompts differ semantically between success and failure triggers. Directly testing do_replan helper with step_failed=True/False and asserting prompt content differences ensures the LLM receives appropriate context.


162-228: LGTM!

Important behavioral tests:

  • test_replan_budget_shared_between_failure_and_completion confirms max_replans is a shared budget across all replan triggers.
  • test_last_step_no_replan_on_completion verifies completion-triggered replanning is correctly skipped on the final step.
src/synthorg/engine/plan_helpers.py (4)

1-26: LGTM!

Proper module setup:

  • Correct logger initialization with get_logger(__name__).
  • Event constants imported from synthorg.observability.events.execution.
  • Type-checking imports used appropriately.

28-63: LGTM!

update_step_status correctly:

  • Validates bounds and logs EXECUTION_PLAN_STEP_INDEX_OUT_OF_RANGE with structured context before raising IndexError.
  • Returns a new immutable plan via model_copy without mutating the original.
  • Converts steps list back to tuple for the frozen model.

66-88: LGTM!

extract_task_summary properly prioritizes sources:

  1. Task title from task_execution when available.
  2. First user message content.
  3. Fallback to "task" with a warning log including execution_id.

Truncation to _MAX_TASK_SUMMARY_LENGTH (200) prevents overly long summaries.


91-109: LGTM!

assess_step_success correctly treats both STOP and MAX_TOKENS as successful outcomes. The docstring clearly explains why MAX_TOKENS is considered success (truncated summary still represents completed work).

tests/unit/engine/test_plan_helpers.py (4)

1-57: LGTM!

Well-structured test helpers:

  • _make_plan creates valid ExecutionPlan fixtures with configurable step count.
  • _make_response creates minimal CompletionResponse for testing finish reason classification.

64-112: LGTM!

Thorough update_step_status test coverage:

  • Verifies immutability (original plan unchanged).
  • Tests boundary conditions (first index, last index).
  • Validates IndexError with descriptive message for out-of-range and negative indices.

120-179: LGTM!

Comprehensive extract_task_summary tests:

  • Validates task title extraction when task_execution present.
  • Tests fallback to first user message.
  • Verifies "task" fallback for empty context.
  • Confirms exact 200-character truncation.

187-208: LGTM!

Efficient parametrized test covers all FinishReason variants, validating that STOP and MAX_TOKENS return True while TOOL_USE, CONTENT_FILTER, and ERROR return False.

src/synthorg/observability/events/execution.py (1)

62-99: LGTM!

The new observability event constants follow the established naming conventions and are well-organized with a clear section comment for hybrid loop events. The use of Final[str] is consistent with the rest of the module.

tests/unit/engine/test_hybrid_loop.py (7)

1-42: LGTM!

The module structure is clean with proper imports, clear docstring explaining test scope, and appropriate separation of concerns (replanning and advanced tests in separate files as documented).


44-67: LGTM!

Protocol compliance and configuration tests are concise and verify the essential contract: loop type identity, interface implementation, and config defaults/overrides.


70-146: LGTM!

Basic execution tests properly verify turn sequencing and call categories. The mix of single-step and multi-step plans appropriately covers both edge cases and the strategic multi-step flow.


149-179: LGTM!

Tool invocation test correctly verifies that tool calls are counted and the turn sequence includes the tool response handling.


182-242: LGTM!

Per-step turn limit tests effectively verify the hybrid-specific behavior where steps that exhaust their turn budget are marked as failed and trigger the replan exhaustion path.


245-298: LGTM!

Progress summary tests properly verify the LLM-decided replanning mechanism with both enabled and disabled configurations.


301-485: LGTM!

Budget, shutdown, max turns, and plan parsing tests comprehensively cover termination conditions. The counter-based callbacks effectively simulate mid-execution interruptions, and the plan truncation test verifies that max_plan_steps is enforced.

tests/unit/engine/test_hybrid_loop_advanced.py (7)

1-29: LGTM!

The module correctly separates advanced test scenarios with clean imports and proper use of shared test helpers.


31-127: LGTM!

Stagnation tests properly implement the StagnationDetector protocol and verify both termination and correction injection behaviors within step execution.


130-159: LGTM!

Model tiering test correctly verifies that planner_model is used for planning and progress summaries while executor_model is used for step execution. The test uses appropriate test-* model names per coding guidelines.


162-272: LGTM!

Metadata structure, context immutability, and checkpoint callback tests provide solid coverage of hybrid-specific requirements. The callback failure test correctly verifies that checkpoint exceptions don't propagate.


275-307: LGTM!

Compaction callback test verifies the integration point is working. The assertion len(compaction_calls) >= 1 appropriately accounts for implementation details of when compaction is triggered.


310-344: LGTM!

The parametrized test structure follows the coding guidelines and comprehensively covers _parse_replan_decision edge cases including JSON parsing, markdown fences, text heuristics, and malformed inputs.


347-391: LGTM!

Provider error handling tests correctly verify that exceptions during planning and step execution result in TerminationReason.ERROR. The inline FailingProvider classes effectively simulate provider failures.

src/synthorg/engine/hybrid_loop.py (12)

1-82: LGTM!

The module docstring, imports, and logging setup follow coding guidelines. The use of get_logger(__name__) and event constants from the observability module is correct.


84-138: LGTM!

The HybridLoop class follows the immutable config pattern with proper encapsulation. The constructor accepts optional dependencies and stores them as private attributes with read-only property accessors.


139-214: LGTM!

The execute method has a clean two-phase structure with proper initialization of mutable state within method scope. The turn budget warning is appropriately issued at loop start.


243-354: Verify that all_plans is properly synchronized after IN_PROGRESS status update.

The status update at line 266 sets plan to a new copy with IN_PROGRESS status, but all_plans isn't immediately updated. This is fine because:

  1. On step completion, _handle_completed_step (line 377-378) syncs all_plans[-1] = plan
  2. On step failure, attempt_replan appends the new plan to all_plans
  3. _build_final_result (line 505-506) syncs before building the result

The IN_PROGRESS status is transient during step execution and correctly transitions to COMPLETED or FAILED before metadata finalization.


356-423: LGTM!

The completed step handler correctly updates step status, syncs the live plan to all_plans[-1], and conditionally runs progress summaries with proper error propagation.


425-491: LGTM!

The replan decision logic correctly re-checks shutdown and budget guards after the progress summary before issuing the replan call. The guard conditions at lines 448-454 prevent unnecessary replanning when not warranted.


493-535: LGTM!

The final result builder correctly syncs the live plan to all_plans before building metadata, ensuring final_plan reflects actual step status changes. The explicit comment at lines 503-504 documents this intent.


537-575: LGTM!

The plan generation method cleanly delegates to call_planner and applies truncation. The logging provides good observability with step count and revision number.


577-653: LGTM!

The step execution method implements the mini-ReAct sub-loop with proper per-step turn limits, stagnation detection scoped to step turns, and context compaction at turn boundaries.


664-748: LGTM!

The step turn execution properly handles the response flow: shutdown/budget checks, provider call, turn recording, error checking, context update, checkpointing, and branching to tool call handling or step completion assessment.


750-774: LGTM!

Tool call handling correctly re-checks shutdown before execution and clears pending tool calls from the turn record when shutdown is triggered, preventing incomplete tool execution from being recorded.


778-794: LGTM!

The finalize method correctly deep-copies metadata before mutation and properly serializes execution plans using model_dump(). The empty all_plans guard prevents index errors.

src/synthorg/engine/hybrid_helpers.py (12)

1-65: LGTM!

The module docstring clearly explains the purpose (stateless helpers to keep the main loop under 800 lines). Imports are well-organized with TYPE_CHECKING for forward references.


67-97: LGTM!

The truncation function correctly maintains immutability by returning a new plan copy and properly renumbers steps to satisfy the sequential step number invariant.


100-127: LGTM!

The step message builder properly escapes < and > characters in user-provided content to prevent prompt injection, and the instruction explicitly tells the LLM to treat XML tag contents as data.


130-161: LGTM!

Step completion handling correctly identifies the anomalous case of TOOL_USE without actual tool calls as a failure, and logs truncation warnings for MAX_TOKENS responses.


164-189: LGTM!

The budget warning calculation is a reasonable estimate that helps users identify configuration issues early. The comment appropriately notes that replan overhead is excluded from the estimate.


192-222: LGTM!

The checkpoint callback invocation correctly uses PEP 758 except syntax (except MemoryError, RecursionError:) and follows the principle that checkpointing failures must not interrupt execution while critical errors (memory, recursion) should propagate.


225-312: LGTM!

The planner call helper correctly guards against exhausted turns before making the call, uses SYSTEM category for planning turns, and threads the checkpoint callback through.


315-353: LGTM!

The summary prompt builder provides clear context about plan progress and conditionally includes the replan question with a JSON format example when appropriate.


356-413: LGTM!

The replan decision parser correctly handles JSON with explicit boolean and string parsing (avoiding the bool() coercion issue from the past review), supports markdown-fenced JSON, and falls back to a regex heuristic with appropriate logging.


468-516: LGTM!

The remainder of the progress summary function correctly handles the provider call, turn recording with SYSTEM category, empty content warnings, and replan decision parsing.


519-624: LGTM!

The replan attempt function correctly marks the failed step, enforces replan budget limits with appropriate error messages, guards against exhausted resources, and threads the checkpoint callback through to do_replan.


627-719: LGTM!

The replanning function correctly threads the checkpoint callback to call_planner (addressing the past review concern), builds context-appropriate prompts distinguishing between failure and completion triggers, and applies plan truncation to the result.

Comment on lines +458 to +467
summary_msg = ChatMessage(
role=MessageRole.USER,
content=_build_summary_prompt(
plan,
step_idx,
ask_replan=(
config.allow_replan_on_completion and step_idx < len(plan.steps) - 1
),
),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if there's documentation about MessageRole usage for progress summaries
rg -n "progress.?summary" -A3 -B3 --type=md | head -50

# Check how other internal coordination prompts use MessageRole
ast-grep --pattern 'ChatMessage(
  $$$
  role=MessageRole.$_,
  $$$
)'

Repository: Aureliolo/synthorg

Length of output: 3205


🏁 Script executed:

cat -n src/synthorg/engine/hybrid_helpers.py | sed -n '440,490p'

Repository: Aureliolo/synthorg

Length of output: 1832


🏁 Script executed:

# Find all ChatMessage instantiations to understand MessageRole patterns
rg -n "ChatMessage\(" -A2 src/ --type=py | grep -A2 "role=" | head -80

Repository: Aureliolo/synthorg

Length of output: 4027


🏁 Script executed:

# Search for LLMCallCategory.SYSTEM usage to understand when it's used
rg -n "LLMCallCategory\.SYSTEM" -B5 -A2 src/ --type=py | head -100

Repository: Aureliolo/synthorg

Length of output: 2235


Change MessageRole.USER to MessageRole.SYSTEM for the progress summary prompt.

The progress summary at line 459 uses MessageRole.USER but is recorded with LLMCallCategory.SYSTEM. Similar internal coordination operations—such as the compaction summarizer (compaction/summarizer.py:188) and checkpoint recovery prompt (checkpoint/resume.py:87)—use MessageRole.SYSTEM for their summary and reconciliation prompts. Using MessageRole.SYSTEM here aligns with the pattern for system-initiated internal operations and makes the conversation structure consistent with comparable prompts in the codebase.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/hybrid_helpers.py` around lines 458 - 467, Change the
ChatMessage construction for the progress summary so it uses MessageRole.SYSTEM
instead of MessageRole.USER: in the creation of summary_msg (which calls
_build_summary_prompt with plan, step_idx, and ask_replan), replace
MessageRole.USER with MessageRole.SYSTEM to align this system-initiated internal
prompt with other summary/reconciliation prompts (see uses in
compaction/summarizer and checkpoint/resume).

…_loop

PlanExecuteLoop and HybridLoop already received checkpoint_callback
from the factory; ReactLoop was the only loop type missing it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 008147c into main Mar 19, 2026
11 of 13 checks passed
@Aureliolo Aureliolo deleted the feat/hybrid-execution-loop branch March 19, 2026 13:32
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 13:33 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.7](v0.3.6...v0.3.7)
(2026-03-19)


### Features

* **engine:** implement Hybrid Plan + ReAct execution loop
([#582](#582))
([008147c](008147c))
* implement first-run setup wizard
([#584](#584))
([dfed931](dfed931))


### Bug Fixes

* **api:** address ZAP DAST scan findings
([#579](#579))
([ce9a3e0](ce9a3e0))
* **cli:** regenerate compose and re-exec binary on update
([#576](#576))
([3f226eb](3f226eb))


### CI/CD

* add SBOM generation to Docker and CLI releases
([#580](#580))
([db459cf](db459cf))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.7](v0.3.6...v0.3.7)
(2026-03-19)


### Features

* **engine:** implement Hybrid Plan + ReAct execution loop
([#582](#582))
([008147c](008147c))
* implement first-run setup wizard
([#584](#584))
([dfed931](dfed931))


### Bug Fixes

* **api:** address ZAP DAST scan findings
([#579](#579))
([ce9a3e0](ce9a3e0))
* **cli:** auto-delete binary on Windows, prune images, fix GoReleaser
([#590](#590))
([eb7c691](eb7c691))
* **cli:** regenerate compose and re-exec binary on update
([#576](#576))
([3f226eb](3f226eb))


### CI/CD

* add SBOM generation to Docker and CLI releases
([#580](#580))
([db459cf](db459cf))


### Maintenance

* **main:** release 0.3.7
([#583](#583))
([bf58779](bf58779))
* reset failed v0.3.7 release
([#591](#591))
([b69000d](b69000d))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added engine Hybrid Plan + ReAct execution loop
  * Added first-run setup wizard

* **Bug Fixes**
  * Addressed ZAP DAST scan issues
  * Fixed CLI Windows/image/update issues

* **Maintenance**
  * Added SBOM generation for Docker/CLI releases
  * General maintenance updates

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Aureliolo added a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.7](v0.3.6...v0.3.7)
(2026-03-19)


### Features

* **engine:** implement Hybrid Plan + ReAct execution loop
([#582](#582))
([008147c](008147c))
* implement first-run setup wizard
([#584](#584))
([dfed931](dfed931))


### Bug Fixes

* **api:** address ZAP DAST scan findings
([#579](#579))
([ce9a3e0](ce9a3e0))
* **ci:** reset failed v0.3.7 release and fix syft SBOM scan
([#593](#593))
([d1508c2](d1508c2))
* **cli:** auto-delete binary on Windows, prune images, fix GoReleaser
([#590](#590))
([eb7c691](eb7c691))
* **cli:** regenerate compose and re-exec binary on update
([#576](#576))
([3f226eb](3f226eb))


### CI/CD

* add SBOM generation to Docker and CLI releases
([#580](#580))
([db459cf](db459cf))


---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.3.7](v0.3.6...v0.3.7)
(2026-03-19)


### Features

* **engine:** implement Hybrid Plan + ReAct execution loop
([#582](#582))
([008147c](008147c))
* implement first-run setup wizard
([#584](#584))
([dfed931](dfed931))


### Bug Fixes

* **api:** address ZAP DAST scan findings
([#579](#579))
([ce9a3e0](ce9a3e0))
* **ci:** remove CLI SBOM generation, reset failed v0.3.7
([#595](#595))
([d0f4992](d0f4992))
* **ci:** reset failed v0.3.7 release and fix syft SBOM scan
([#593](#593))
([d1508c2](d1508c2))
* **cli:** auto-delete binary on Windows, prune images, fix GoReleaser
([#590](#590))
([eb7c691](eb7c691))
* **cli:** regenerate compose and re-exec binary on update
([#576](#576))
([3f226eb](3f226eb))


### CI/CD

* add SBOM generation to Docker and CLI releases
([#580](#580))
([db459cf](db459cf))




---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: implement Hybrid Plan + ReAct execution loop

1 participant