feat(engine): implement Hybrid Plan + ReAct execution loop#582
feat(engine): implement Hybrid Plan + ReAct execution loop#582
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdds a new HybridLoop execution strategy and immutable HybridLoopConfig, implementing a hybrid "plan + mini‑ReAct per step" flow with per-step turn limits, progress‑summary checkpoints, optional replanning, compaction at turn boundaries, and observability events for hybrid-specific conditions. Introduces hybrid_helpers and plan_helpers utilities, caps parsed plan step arrays, wires hybrid_loop_config and checkpoint callbacks through the loop selector and AgentEngine, re-exports HybridLoop/HybridLoopConfig, updates engine design docs, and adds comprehensive unit tests. No public API signatures were removed. Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
📝 Coding Plan
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the agent engine's capabilities by introducing a sophisticated Hybrid Plan + ReAct execution loop. This new loop allows agents to perform high-level strategic planning while executing each step tactically with ReAct, incorporating dynamic replanning and progress monitoring. The changes improve the agent's ability to handle complex tasks by providing more granular control over execution, better resource management, and increased adaptability to unforeseen circumstances during task execution. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #582 +/- ##
==========================================
- Coverage 92.55% 92.46% -0.10%
==========================================
Files 547 551 +4
Lines 27274 27681 +407
Branches 2615 2678 +63
==========================================
+ Hits 25244 25595 +351
- Misses 1605 1634 +29
- Partials 425 452 +27 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Code Review
The pull request introduces the new Hybrid Plan + ReAct execution loop, which combines strategic planning with per-step mini-ReAct sub-loops, progress-summary checkpoints, and LLM-decided replanning. This is a significant feature addition, and the implementation appears robust and well-tested. The changes include adding new files for the HybridLoop and its configuration model, updating documentation, and refactoring shared plan utilities into plan_helpers.py to reduce duplication. The auto-selection system (loop_selector.py) has also been updated to recognize and build the new hybrid loop directly, removing previous fallback mechanisms.
The code adheres to good practices, including clear docstrings, comprehensive unit tests, and proper handling of various execution flow scenarios like budget exhaustion, shutdown, and replanning. The new HybridLoopConfig model is well-defined with appropriate validation. Overall, this is a well-executed feature implementation.
962c324 to
8fe07e3
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/engine/agent_engine.py (1)
186-200: 🧹 Nitpick | 🔵 TrivialAdd
hybrid_loop_configto the docstring.The new
hybrid_loop_configparameter is not documented in the class docstring's Args section. Add an entry to maintain docstring completeness:hybrid_loop_config: Optional configuration for the Hybrid Plan + ReAct loop. Used when auto-selection picks ``"hybrid"`` or when ``loop_type="hybrid"`` is built.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/agent_engine.py` around lines 186 - 200, The class __init__ docstring is missing the new parameter hybrid_loop_config; update the class (or __init__) docstring Args section to include an entry for hybrid_loop_config describing it as an optional configuration for the Hybrid Plan + ReAct loop, used when auto-selection picks "hybrid" or when loop_type="hybrid" is built, matching the style of the other parameters (e.g., provider, auto_loop_config) and referencing the hybrid_loop_config parameter name exactly so readers and tools pick up the new parameter.src/synthorg/engine/plan_execute_loop.py (1)
580-625:⚠️ Potential issue | 🟡 MinorCapture the task summary before mutating
ctx.
extract_task_summary()now runs afterctx = ctx.with_message(message), so a non-task-bound context with no earlier USER message will record the internal planning/replan prompt itself asoriginal_task_summary. That pollutes plan metadata and any downstream prompt text that relies on it. Cache the summary from the pre-prompt context and pass that cached value intoparse_plan().Suggested fix
- ctx = ctx.with_message(message) + task_summary = extract_task_summary(ctx) + ctx = ctx.with_message(message) @@ plan = parse_plan( response, ctx.execution_id, - extract_task_summary(ctx), + task_summary, revision_number=revision_number, )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/plan_execute_loop.py` around lines 580 - 625, The code calls extract_task_summary() after mutating ctx (via ctx = ctx.with_message(message)), which lets the planning prompt be recorded as the original task summary; capture the task summary from the pre-mutation context (e.g., original_task_summary = extract_task_summary(ctx) before ctx = ctx.with_message(message)) and pass that cached original_task_summary into parse_plan() (replace the current extract_task_summary(ctx) argument) so parse_plan() uses the pre-prompt summary; update references to extract_task_summary in this block accordingly (symbols: ctx.with_message, extract_task_summary, parse_plan).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 445-447: The checkpoint summary message is being constructed as a
user message (summary_msg = ChatMessage(role=MessageRole.USER,
content=_build_summary_prompt(...))) which incorrectly mutates chat state;
change the role to MessageRole.SYSTEM so the summary is sent as an internal
system prompt—locate the summary_msg construction in hybrid_helpers.py and
replace MessageRole.USER with MessageRole.SYSTEM while keeping usage of
ChatMessage and the _build_summary_prompt call unchanged.
- Around line 610-621: The replan logic in do_replan currently omits the
checkpoint_callback so call_planner is invoked without per-turn checkpointing;
update the do_replan signature to accept a checkpoint_callback parameter
(matching the type used by call_planner) and thread it through any internal
planner invocations so call_planner(...,
checkpoint_callback=checkpoint_callback) is used; then update all callers of
do_replan (including the replan invocation sites around the other occurrence
noted) to pass the same checkpoint_callback they receive (or propagate it from
their caller), ensuring replan turns use the same checkpoint subsystem as
initial planning.
- Around line 254-259: Before appending the prompt or invoking the LLM, check
ctx.has_turns_remaining and return MAX_TURNS if no turns remain; specifically,
in the summary and planner helper flows (e.g., run_progress_summary and the
planner helper) avoid creating the message via ctx.with_message, computing
turn_number from ctx.turn_count, or calling call_provider when
ctx.has_turns_remaining is False — this prevents making the extra provider call
that later causes ctx.with_turn_completed to raise. Ensure the guard is placed
at the start of each helper so you return MAX_TURNS before any prompt
construction or call_provider invocation.
- Around line 373-376: Don't coerce the "replan" value with bool() — instead
parse it explicitly: after json.loads(json_str) and verifying data is a dict,
get the raw value via data.get("replan") and if it's a bool return it; if it's a
str, compare raw.lower() to "true" or "false" and return the corresponding
boolean; for any other type or unrecognized string treat it as a parse miss
(e.g., return None or the existing "no parse" sentinel) rather than using
bool(raw).
In `@src/synthorg/engine/hybrid_loop.py`:
- Around line 416-428: The replan-on-completion path in
_decide_replan_on_completion calls run_progress_summary() but does not re-check
the shutdown/budget guards afterwards, so if the summary changed ctx.turn_count
or cost it can still invoke do_replan(); update _decide_replan_on_completion to
re-evaluate the same shutdown/budget guard logic immediately after
run_progress_summary() and before calling do_replan(), returning the appropriate
non-replan branch when shutdown or out-of-budget is detected (mirror the guard
check used in attempt_replan()); apply the same fix to the analogous block
around lines 451-460 so both success and failure paths consistently re-check
before triggering planner calls.
- Around line 269-273: The loop is modifying the live ExecutionPlan via
update_step_status(plan, step_idx, ...) but all_plans remains stale so
metadata["final_plan"] ends up containing the pre-execution snapshot;
synchronize the live plan with the stored history by either replacing the last
entry in all_plans with the updated plan after every call to update_step_status
(locations: where plan = update_step_status(...) around lines with plan/step_idx
and in attempt_replan()), or change _finalize() to accept the live plan and
derive final_plan from that live instance instead of all_plans[-1]; ensure the
same fix is applied to the failed-step update path in attempt_replan() and the
other occurrences noted (similar blocks at the other indicated locations).
In `@src/synthorg/engine/plan_helpers.py`:
- Around line 72-76: Replace the hardcoded event string in the logger.warning
call inside extract_task_summary_fallback with the appropriate event constant
from synthorg.observability.events (e.g., import the fallback constant from
synthorg.observability.events.plan or the relevant domain and use that constant
instead of "plan_helpers.extract_task_summary_fallback"); update the import at
top of src/synthorg/engine/plan_helpers.py to pull the chosen event constant
(for example TASK_SUMMARY_FALLBACK or SUMMARY_FALLBACK from
synthorg.observability.events.plan) and pass that constant as the first argument
to logger.warning while leaving the execution_id and note kwargs unchanged.
In `@tests/unit/engine/test_hybrid_loop.py`:
- Around line 62-71: The _single_step_plan() fixture returns a one-step
CompletionResponse which lets tests pass despite breaking the intended Hybrid
multi-step contract; update _single_step_plan() (and any callers relying on
_plan_response) to produce a minimum of 3 steps (preferably 3–7 as described in
docs/design/engine.md) with distinct step_number and description fields, and
adjust any tests that intentionally expect a single-step to either use a new
single-step fixture name or be updated to assert the multi-step behavior; ensure
the returned type remains CompletionResponse and reuse _plan_response to build
the multi-step payload.
- Around line 1-1217: The test module is too large and should be split into
focused test files to meet the guideline of files < 800 lines and functions < 50
lines; refactor by extracting related test groups (e.g., TestHybridLoopProtocol,
TestHybridLoopBasic, TestHybridLoopWithTools, TestHybridLoopReplanning,
TestHybridLoopBudget/Shutdown/MaxTurns, TestHybridLoopStagnation,
TestHybridLoopModelTiering/Metadata/ContextImmutability,
TestHybridLoopCheckpointCallback/TestHybridLoopCompaction, and
TestParseReplanDecision/_parse_replan_decision helper tests) into separate
modules (one per concern), keep helper functions like _single_step_plan,
_multi_step_plan, _usage, _make_invoker, and _ctx_with_user_msg in a shared test
utils module to avoid duplication, and ensure each new file imports the required
symbols (HybridLoop, HybridLoopConfig, do_replan, _parse_replan_decision,
AgentContext, mock_provider_factory fixtures, etc.) and that no test function
exceeds 50 lines; run test suite to update any relative imports or fixtures
after splitting.
In `@tests/unit/engine/test_loop_selector.py`:
- Around line 406-435: Add assertions to the hybrid factory tests to verify
checkpoint_callback and compaction_callback are forwarded by
build_execution_loop; in the test functions (e.g., test_build_hybrid,
test_build_hybrid_with_config, test_build_hybrid_with_gates) pass mock callbacks
into build_execution_loop via the checkpoint_callback and compaction_callback
parameters and assert that the returned HybridLoop instance (HybridLoop) has its
checkpoint_callback and compaction_callback attributes set to those mocks,
ensuring build_execution_loop correctly propagates these callbacks.
---
Outside diff comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 186-200: The class __init__ docstring is missing the new parameter
hybrid_loop_config; update the class (or __init__) docstring Args section to
include an entry for hybrid_loop_config describing it as an optional
configuration for the Hybrid Plan + ReAct loop, used when auto-selection picks
"hybrid" or when loop_type="hybrid" is built, matching the style of the other
parameters (e.g., provider, auto_loop_config) and referencing the
hybrid_loop_config parameter name exactly so readers and tools pick up the new
parameter.
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 580-625: The code calls extract_task_summary() after mutating ctx
(via ctx = ctx.with_message(message)), which lets the planning prompt be
recorded as the original task summary; capture the task summary from the
pre-mutation context (e.g., original_task_summary = extract_task_summary(ctx)
before ctx = ctx.with_message(message)) and pass that cached
original_task_summary into parse_plan() (replace the current
extract_task_summary(ctx) argument) so parse_plan() uses the pre-prompt summary;
update references to extract_task_summary in this block accordingly (symbols:
ctx.with_message, extract_task_summary, parse_plan).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d728b2d2-fc97-4e1a-97c6-d601e5d4cf0a
📒 Files selected for processing (18)
CLAUDE.mddocs/design/engine.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/engine/loop_selector.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/plan_parsing.pysrc/synthorg/observability/events/execution.pytests/unit/engine/test_agent_engine_auto_loop.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_hybrid_models.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_plan_helpers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax:except A, B:(no parentheses) — not PEP 654 syntax
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)
Files:
src/synthorg/engine/plan_parsing.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_selector.pytests/unit/engine/test_agent_engine_auto_loop.pytests/unit/engine/test_hybrid_models.pysrc/synthorg/engine/plan_helpers.pytests/unit/engine/test_loop_selector.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/observability/events/execution.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_plan_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_helpers.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST import logger viafrom synthorg.observability import get_loggerthen uselogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments:logger.info(EVENT, key=value)— never use formatted strings likelogger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use@computed_fieldfor derived values instead of storing and validating redundant fields
Files:
src/synthorg/engine/plan_parsing.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_selector.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_helpers.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
ALWAYS use
uv sync(notcd ...commands) when referencing command execution in documentation and examples
Files:
CLAUDE.mddocs/design/engine.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark all unit tests with@pytest.mark.unit, integration tests with@pytest.mark.integration, e2e tests with@pytest.mark.e2e, and slow tests with@pytest.mark.slow
Usepytest.mark.parametrizefor testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given+@settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with-n autofor parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); usepytest --cov=synthorg --cov-fail-under=80to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally
Files:
tests/unit/engine/test_agent_engine_auto_loop.pytests/unit/engine/test_hybrid_models.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_plan_helpers.py
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
docs/design/**/*.md: ALWAYS read the relevant docs/design/ page before implementing any feature or planning any issue; design spec is the starting point for architecture and behavior
If implementation deviates from the spec, alert the user and explain why; user decides whether to proceed or update the spec; then update the relevant design page to reflect approved deviations
Files:
docs/design/engine.md
🧠 Learnings (9)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Applied to files:
src/synthorg/engine/plan_models.pytests/unit/engine/test_hybrid_models.pysrc/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/plan_models.pysrc/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/engine/plan_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST import logger via `from synthorg.observability import get_logger` then use `logger = get_logger(__name__)`
Applied to files:
src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Applied to files:
src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : Maintain minimum 80% code coverage (enforced in CI); use `pytest --cov=synthorg --cov-fail-under=80` to verify locally
Applied to files:
tests/unit/engine/test_plan_helpers.py
🧬 Code graph analysis (10)
src/synthorg/engine/__init__.py (2)
src/synthorg/engine/hybrid_loop.py (1)
HybridLoop(86-784)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/agent_engine.py (2)
tests/unit/engine/conftest.py (1)
engine(449-460)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
assess_step_success(80-98)extract_task_summary(55-77)update_step_status(24-52)src/synthorg/engine/plan_models.py (1)
StepStatus(15-22)
src/synthorg/engine/loop_selector.py (4)
src/synthorg/engine/hybrid_loop.py (5)
HybridLoop(86-784)approval_gate(124-126)stagnation_detector(129-131)compaction_callback(134-136)config(119-121)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/plan_execute_loop.py (4)
approval_gate(131-133)stagnation_detector(136-138)compaction_callback(141-143)config(126-128)src/synthorg/engine/loop_protocol.py (1)
ExecutionLoop(158-196)
tests/unit/engine/test_agent_engine_auto_loop.py (3)
src/synthorg/budget/enforcer.py (2)
BudgetEnforcer(56-473)cost_tracker(89-91)web/src/api/types.ts (2)
BudgetConfig(406-413)BudgetAlertConfig(393-397)tests/unit/engine/conftest.py (2)
mock_provider_factory(314-316)MockCompletionProvider(207-289)
tests/unit/engine/test_hybrid_models.py (1)
src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
tests/unit/engine/test_loop_selector.py (5)
tests/unit/engine/conftest.py (1)
engine(449-460)src/synthorg/engine/hybrid_loop.py (5)
HybridLoop(86-784)config(119-121)get_loop_type(138-140)approval_gate(124-126)stagnation_detector(129-131)src/synthorg/engine/loop_selector.py (3)
select_loop_type(228-263)AutoLoopConfig(94-168)build_execution_loop(266-324)src/synthorg/engine/plan_execute_loop.py (4)
config(126-128)get_loop_type(145-147)approval_gate(131-133)stagnation_detector(136-138)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
tests/unit/engine/test_plan_helpers.py (4)
src/synthorg/engine/context.py (3)
AgentContext(104-412)from_identity(190-225)with_message(227-236)src/synthorg/engine/plan_helpers.py (3)
assess_step_success(80-98)extract_task_summary(55-77)update_step_status(24-52)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)
src/synthorg/engine/hybrid_loop.py (9)
src/synthorg/engine/hybrid_helpers.py (5)
attempt_replan(509-607)call_planner(228-308)do_replan(610-698)run_progress_summary(406-503)warn_insufficient_budget(167-189)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_helpers.py (5)
build_result(512-527)call_provider(156-213)check_budget(107-153)check_response_errors(216-250)check_shutdown(62-104)src/synthorg/engine/loop_protocol.py (3)
ExecutionResult(86-147)TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/plan_helpers.py (1)
update_step_status(24-52)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/engine/plan_execute_loop.py (3)
config(126-128)execute(149-227)_run_steps(256-350)src/synthorg/engine/context.py (2)
AgentContext(104-412)has_turns_remaining(410-412)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)
src/synthorg/engine/hybrid_helpers.py (11)
src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)src/synthorg/engine/loop_helpers.py (7)
build_result(512-527)call_provider(156-213)check_budget(107-153)check_response_errors(216-250)check_shutdown(62-104)make_turn_record(463-479)response_to_message(454-460)src/synthorg/engine/loop_protocol.py (3)
ExecutionResult(86-147)TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/plan_helpers.py (3)
assess_step_success(80-98)extract_task_summary(55-77)update_step_status(24-52)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/engine/plan_parsing.py (1)
parse_plan(57-100)src/synthorg/engine/context.py (3)
AgentContext(104-412)with_turn_completed(238-287)has_turns_remaining(410-412)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/hybrid_loop.py (1)
config(119-121)src/synthorg/engine/plan_execute_loop.py (1)
config(126-128)tests/integration/engine/test_graceful_shutdown.py (1)
shutdown_checker(208-214)
🔇 Additional comments (9)
src/synthorg/engine/plan_models.py (1)
103-103: LGTM! Stricter validation for config model.Adding
extra="forbid"improves config hygiene by rejecting unknown fields, catching typos and misconfigurations early. This aligns withHybridLoopConfigand is appropriate for frozen config models.src/synthorg/engine/hybrid_models.py (1)
12-73: LGTM! Well-structured configuration model.The
HybridLoopConfigmodel follows project conventions:
- Frozen + extra="forbid" for strict config validation
NotBlankStr | Nonefor optional model overrides- Bounded numeric constraints with sensible defaults
- Comprehensive docstring documenting all attributes
Based on learnings: "Use frozen Pydantic models for config/identity."
tests/unit/engine/test_hybrid_models.py (1)
1-101: LGTM! Comprehensive config model tests.Good coverage of:
- Default values verification
- Custom value overrides
- Immutability enforcement (frozen)
- Schema strictness (extra="forbid")
- Numeric range constraints (both violations and boundaries)
- String validation (NotBlankStr rejection)
Effective use of
@pytest.mark.parametrizefor testing multiple similar cases.src/synthorg/observability/events/execution.py (1)
83-95: LGTM! Hybrid loop event constants.New constants follow existing naming conventions (
EXECUTION_HYBRID_*→"execution.hybrid.*") and cover the key observability points for the hybrid loop: step turn limits, progress summaries, replan decisions, budget warnings, and plan truncation.src/synthorg/engine/plan_parsing.py (1)
210-221: LGTM! Defensive parse-time cap on plan steps.Good safeguard against unbounded memory allocation from misbehaving LLM output. The 50-step cap at parse time provides a safety net before individual loop configs apply their own (typically stricter) limits. Clear warning log captures the original count for debugging.
src/synthorg/engine/__init__.py (1)
113-114: LGTM! Clean public API re-exports.
HybridLoopandHybridLoopConfigare properly imported and added to__all__, following the existing pattern for loop exports and maintaining alphabetical ordering.Also applies to: 287-288
CLAUDE.md (1)
127-127: LGTM! Documentation updated for hybrid loop.Package structure documentation accurately reflects the new hybrid execution loop components:
HybridLoop,HybridLoopConfig, shared plan helpers, and the optional hybrid fallback in loop selection. The description captures key features (planning, mini-ReAct steps, turn limits, checkpoints, replanning).src/synthorg/engine/agent_engine.py (2)
1069-1069: LGTM! Hybrid config correctly forwarded to loop builder.The
hybrid_loop_configis properly passed tobuild_execution_loop, enabling the HybridLoop to receive its configuration when auto-selected.
1221-1225: LGTM! Error message sanitization.Good defensive practice:
- Truncates to 200 chars to prevent log bloat from verbose exceptions
- Filters non-printable characters for log hygiene
- Prevents internal paths from leaking into user-facing error messages
| summary_msg = ChatMessage( | ||
| role=MessageRole.USER, | ||
| content=_build_summary_prompt( |
There was a problem hiding this comment.
Use a system message for the checkpoint summary prompt.
Issue #199 calls this summary hop out as a tools=None SYSTEM call. Sending it as MessageRole.USER turns an internal coordination prompt into a new user request and changes the chat state in a way the design was explicitly trying to avoid. Please switch this to MessageRole.SYSTEM.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/hybrid_helpers.py` around lines 445 - 447, The checkpoint
summary message is being constructed as a user message (summary_msg =
ChatMessage(role=MessageRole.USER, content=_build_summary_prompt(...))) which
incorrectly mutates chat state; change the role to MessageRole.SYSTEM so the
summary is sent as an internal system prompt—locate the summary_msg construction
in hybrid_helpers.py and replace MessageRole.USER with MessageRole.SYSTEM while
keeping usage of ChatMessage and the _build_summary_prompt call unchanged.
| plan = update_step_status( | ||
| plan, | ||
| step_idx, | ||
| StepStatus.IN_PROGRESS, | ||
| ) |
There was a problem hiding this comment.
final_plan is built from a stale plan snapshot.
update_step_status() returns a new ExecutionPlan, but the loop only appends to all_plans when a plan is first created or re-planned. By the time _finalize() runs, all_plans[-1] can still be the pre-execution version, so metadata["final_plan"] misses the real COMPLETED/FAILED outcomes. Please either keep the current plan synchronized in all_plans or pass the live plan into _finalize() and derive final_plan from that. The failed-step update in attempt_replan() has the same problem.
Also applies to: 381-385, 775-780
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/hybrid_loop.py` around lines 269 - 273, The loop is
modifying the live ExecutionPlan via update_step_status(plan, step_idx, ...) but
all_plans remains stale so metadata["final_plan"] ends up containing the
pre-execution snapshot; synchronize the live plan with the stored history by
either replacing the last entry in all_plans with the updated plan after every
call to update_step_status (locations: where plan = update_step_status(...)
around lines with plan/step_idx and in attempt_replan()), or change _finalize()
to accept the live plan and derive final_plan from that live instance instead of
all_plans[-1]; ensure the same fix is applied to the failed-step update path in
attempt_replan() and the other occurrences noted (similar blocks at the other
indicated locations).
| def _single_step_plan() -> CompletionResponse: | ||
| return _plan_response( | ||
| [ | ||
| { | ||
| "step_number": 1, | ||
| "description": "Analyze and solve the problem", | ||
| "expected_outcome": "Problem solved", | ||
| }, | ||
| ] | ||
| ) |
There was a problem hiding this comment.
These fixtures currently bless single-step hybrid plans.
The PR objective and docs/design/engine.md both describe Hybrid as a strategic 3–7 step loop, but _single_step_plan() is reused as a passing happy-path fixture across much of this suite. That means the tests will still go green if the planner collapses into a one-step pseudo-ReAct flow, which is the exact failure mode Loop 3 was meant to avoid. Please either enforce the minimum step count and update these fixtures, or explicitly relax the design contract.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_hybrid_loop.py` around lines 62 - 71, The
_single_step_plan() fixture returns a one-step CompletionResponse which lets
tests pass despite breaking the intended Hybrid multi-step contract; update
_single_step_plan() (and any callers relying on _plan_response) to produce a
minimum of 3 steps (preferably 3–7 as described in docs/design/engine.md) with
distinct step_number and description fields, and adjust any tests that
intentionally expect a single-step to either use a new single-step fixture name
or be updated to assert the multi-step behavior; ensure the returned type
remains CompletionResponse and reuse _plan_response to build the multi-step
payload.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/engine/plan_execute_loop.py (1)
279-283:⚠️ Potential issue | 🟠 MajorMirror immutable plan updates into
all_plans.
update_step_status()returns a newExecutionPlan, but_finalize()serializesall_plans, not the localplan. Because these new plan objects never get written back,metadata["plans"]andmetadata["final_plan"]keep stale step statuses whenever a step starts, completes, or fails.🧩 Minimal fix
plan = update_step_status( plan, step_idx, StepStatus.IN_PROGRESS, ) + all_plans[-1] = plan @@ plan = update_step_status( plan, step_idx, StepStatus.COMPLETED, ) + all_plans[-1] = plan @@ - plan = update_step_status(plan, step_idx, StepStatus.FAILED) + plan = update_step_status(plan, step_idx, StepStatus.FAILED) + all_plans[-1] = planAlso applies to: 310-314, 373-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/plan_execute_loop.py` around lines 279 - 283, update_step_status returns a new ExecutionPlan but the code only assigns it to the local variable plan, leaving all_plans (and metadata["plans"]/metadata["final_plan"] used by _finalize) stale; after each call to update_step_status (e.g., in the blocks that update step statuses to IN_PROGRESS, SUCCESS, FAILURE), replace the corresponding entry in the all_plans list with the returned plan and also update metadata["plans"] (and metadata["final_plan"] if applicable) to reference that new plan so serialized state reflects the immutable change before calling _finalize.
♻️ Duplicate comments (2)
tests/unit/engine/_hybrid_loop_helpers.py (1)
46-55:⚠️ Potential issue | 🟠 MajorDon't make one-step plans the default Hybrid fixture.
This shared helper is reused as a normal success path, so the suite still goes green if the planner collapses into a one-step pseudo-ReAct loop. Keep a one-step fixture only for explicit degenerate/replan-edge tests, and make the default planner fixture multi-step.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/engine/_hybrid_loop_helpers.py` around lines 46 - 55, The helper _single_step_plan currently returns a one-step plan which makes the shared default fixture too trivial; change the default helper to produce a multi-step plan (eg. at least two steps with distinct step_number, description, expected_outcome) and rename it or add a new _multi_step_plan used as the default planner fixture, leaving _single_step_plan only for explicit degenerate/replan-edge tests; update any tests or fixtures that referenced the old default to use the new multi-step helper (and keep _plan_response as the underlying constructor).src/synthorg/engine/hybrid_helpers.py (1)
458-459:⚠️ Potential issue | 🟡 MinorSummary message uses
MessageRole.USERinstead ofMessageRole.SYSTEM.The past review flagged that issue
#199specifies the progress-summary as atools=NoneSYSTEMcall. However, line 459 still constructs the message withMessageRole.USER. This turns an internal coordination prompt into a user request, which may affect chat state differently than intended.🔧 Proposed fix
summary_msg = ChatMessage( - role=MessageRole.USER, + role=MessageRole.SYSTEM, content=_build_summary_prompt( plan, step_idx,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/hybrid_helpers.py` around lines 458 - 459, The summary_msg is being constructed with MessageRole.USER but should be a SYSTEM message; update the ChatMessage creation for summary_msg to use MessageRole.SYSTEM (replace MessageRole.USER with MessageRole.SYSTEM) so the progress-summary is sent as an internal SYSTEM call rather than a user prompt; ensure the change is made where summary_msg = ChatMessage(...) is defined so any downstream handling expects a SYSTEM role.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 1224-1228: raw_msg -> sanitized -> error_msg currently only strips
non-printable chars and still allows filesystem paths/URLs into
ExecutionResult.error_message; update the sanitization logic in agent_engine.py
(where raw_msg, sanitized, and error_msg are set) to detect and redact path-like
substrings (e.g., absolute/relative paths, drive letters, URL patterns, and long
dot-delimited tokens) before truncation, replacing them with a generic
placeholder like "[REDACTED_PATH]"; after redaction, trim to the 200-char limit
and if the result contains no alphanumeric characters (or is empty) set
error_msg to a safe fallback such as "Error: details redacted" while preserving
the exception class name (type(exc).__name__) in the final string.
In `@src/synthorg/engine/plan_helpers.py`:
- Around line 43-48: The bounds check in update_step_status currently raises
IndexError without logging; before raising, emit a WARNING-level observability
event that includes step_idx, step_count (len(plan.steps)) and
plan.revision_number using the plan domain event constant from
synthorg.observability.events.plan (e.g., PLAN_STEP_INDEX_OUT_OF_RANGE), then
raise the IndexError as before; locate the check around update_step_status,
reference plan.steps and plan.revision_number, and ensure the warning uses the
event constant and structured fields (step_idx, step_count, revision) per the
project logging guidelines.
In `@tests/unit/engine/test_hybrid_loop_advanced.py`:
- Around line 310-351: The TestParseReplanDecision class contains many
near-duplicate tests for the helper _parse_replan_decision; collapse them into a
single parameterized test: add a pytest.mark.parametrize on (content, expected)
with the 12 input/expected pairs from the removed methods and implement a single
test method (e.g., test_parse_replan_decision) that asserts
_parse_replan_decision(content) is expected; keep descriptive ids for each case
and remove the individual test_* methods.
---
Outside diff comments:
In `@src/synthorg/engine/plan_execute_loop.py`:
- Around line 279-283: update_step_status returns a new ExecutionPlan but the
code only assigns it to the local variable plan, leaving all_plans (and
metadata["plans"]/metadata["final_plan"] used by _finalize) stale; after each
call to update_step_status (e.g., in the blocks that update step statuses to
IN_PROGRESS, SUCCESS, FAILURE), replace the corresponding entry in the all_plans
list with the returned plan and also update metadata["plans"] (and
metadata["final_plan"] if applicable) to reference that new plan so serialized
state reflects the immutable change before calling _finalize.
---
Duplicate comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 458-459: The summary_msg is being constructed with
MessageRole.USER but should be a SYSTEM message; update the ChatMessage creation
for summary_msg to use MessageRole.SYSTEM (replace MessageRole.USER with
MessageRole.SYSTEM) so the progress-summary is sent as an internal SYSTEM call
rather than a user prompt; ensure the change is made where summary_msg =
ChatMessage(...) is defined so any downstream handling expects a SYSTEM role.
In `@tests/unit/engine/_hybrid_loop_helpers.py`:
- Around line 46-55: The helper _single_step_plan currently returns a one-step
plan which makes the shared default fixture too trivial; change the default
helper to produce a multi-step plan (eg. at least two steps with distinct
step_number, description, expected_outcome) and rename it or add a new
_multi_step_plan used as the default planner fixture, leaving _single_step_plan
only for explicit degenerate/replan-edge tests; update any tests or fixtures
that referenced the old default to use the new multi-step helper (and keep
_plan_response as the underlying constructor).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3843e4a7-f844-4442-abfe-9c650adea9b6
📒 Files selected for processing (11)
src/synthorg/engine/agent_engine.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_hybrid_loop_advanced.pytests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax:except A, B:(no parentheses) — not PEP 654 syntax
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)
Files:
src/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_hybrid_loop_advanced.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_helpers.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST import logger viafrom synthorg.observability import get_loggerthen uselogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments:logger.info(EVENT, key=value)— never use formatted strings likelogger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use@computed_fieldfor derived values instead of storing and validating redundant fields
Files:
src/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_helpers.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark all unit tests with@pytest.mark.unit, integration tests with@pytest.mark.integration, e2e tests with@pytest.mark.e2e, and slow tests with@pytest.mark.slow
Usepytest.mark.parametrizefor testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given+@settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with-n autofor parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); usepytest --cov=synthorg --cov-fail-under=80to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally
Files:
tests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_hybrid_loop_advanced.py
🧠 Learnings (10)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines
Applied to files:
tests/unit/engine/test_hybrid_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Applied to files:
src/synthorg/observability/events/execution.pysrc/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging with keyword arguments: `logger.info(EVENT, key=value)` — never use formatted strings like `logger.info('msg %s', val)`
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
src/synthorg/engine/plan_helpers.py
🧬 Code graph analysis (7)
src/synthorg/engine/agent_engine.py (1)
src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
assess_step_success(81-99)extract_task_summary(56-78)update_step_status(25-53)src/synthorg/engine/plan_models.py (1)
StepStatus(15-22)
tests/unit/engine/test_loop_selector.py (3)
src/synthorg/engine/hybrid_loop.py (6)
HybridLoop(84-794)config(116-118)get_loop_type(135-137)approval_gate(121-123)stagnation_detector(126-128)compaction_callback(131-133)src/synthorg/engine/loop_selector.py (3)
select_loop_type(228-263)AutoLoopConfig(94-168)build_execution_loop(266-324)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
tests/unit/engine/test_hybrid_loop_advanced.py (6)
src/synthorg/engine/hybrid_helpers.py (1)
_parse_replan_decision(356-413)src/synthorg/engine/hybrid_loop.py (5)
HybridLoop(84-794)stagnation_detector(126-128)execute(139-214)config(116-118)compaction_callback(131-133)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_protocol.py (2)
TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)tests/unit/engine/_hybrid_loop_helpers.py (7)
_ctx_with_user_msg(159-161)_make_invoker(154-156)_single_step_plan(46-55)_stop_response(80-86)_summary_response(89-100)_tool_use_response(103-113)execute(143-151)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)src/synthorg/engine/context.py (1)
AgentContext(104-412)src/synthorg/engine/plan_models.py (2)
ExecutionPlan(53-88)StepStatus(15-22)
src/synthorg/engine/hybrid_loop.py (7)
src/synthorg/engine/hybrid_helpers.py (8)
attempt_replan(522-624)build_step_message(103-127)call_planner(228-312)do_replan(627-719)handle_step_completion(130-161)invoke_checkpoint_callback(195-222)run_progress_summary(416-516)warn_insufficient_budget(167-189)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_helpers.py (9)
build_result(512-527)call_provider(156-213)check_budget(107-153)check_response_errors(216-250)check_shutdown(62-104)execute_tool_calls(253-355)invoke_compaction(654-689)make_turn_record(463-479)response_to_message(454-460)src/synthorg/engine/loop_protocol.py (3)
ExecutionResult(86-147)TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/plan_helpers.py (1)
update_step_status(25-53)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)
src/synthorg/engine/hybrid_helpers.py (7)
src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)src/synthorg/engine/loop_helpers.py (4)
build_result(512-527)call_provider(156-213)check_budget(107-153)check_response_errors(216-250)src/synthorg/engine/loop_protocol.py (3)
ExecutionResult(86-147)TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/plan_helpers.py (3)
assess_step_success(81-99)extract_task_summary(56-78)update_step_status(25-53)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/engine/plan_parsing.py (1)
parse_plan(57-100)src/synthorg/engine/context.py (4)
AgentContext(104-412)has_turns_remaining(410-412)with_message(227-236)with_turn_completed(238-287)
🔇 Additional comments (13)
tests/unit/engine/test_hybrid_loop_advanced.py (2)
35-77: LGTM!The stagnation detection tests are well-structured, with inline detector classes that correctly implement the
StagnationDetectorprotocol. The test verifies thatTerminationReason.STAGNATIONis returned when the detector signalsTERMINATE.
354-398: LGTM!Provider error handling tests correctly verify that
TerminationReason.ERRORis returned when the provider raises during planning or step execution. Thetype: ignore[arg-type]comments are appropriate for the test doubles.src/synthorg/engine/hybrid_loop.py (5)
377-378: LGTM!The stale plan snapshot issue flagged in the past review has been addressed. The code now synchronizes the live plan into
all_plansafter updating step status:if all_plans: all_plans[-1] = planThis ensures
final_planin metadata reflects the realCOMPLETED/FAILEDoutcomes.
457-462: LGTM!The shutdown/budget guard issue flagged in the past review has been addressed. The code now re-checks both conditions after the progress summary and before triggering
do_replan():shutdown_result = check_shutdown(ctx, shutdown_checker, turns) if shutdown_result is not None: return self._finalize(shutdown_result, all_plans, replans_used) budget_result = check_budget(ctx, budget_checker, turns) if budget_result is not None: return self._finalize(budget_result, all_plans, replans_used)
139-214: LGTM!The
executemethod is well-structured with clear phase separation:
- Pre-execution setup (model selection, tool definitions, budget warning)
- Planning phase with proper early termination handling
- Step execution delegation
The method correctly handles
ExecutionResultreturns from the planning phase and propagates them through_finalize.
628-642: LGTM!The stagnation detection is correctly scoped to only the current step's turns (
turns[step_start_idx:]), preventing cross-step false positives. The correction count is tracked per-step and the stagnation result is properly handled with turn records updated on termination.
778-793: LGTM!The
_finalizemethod correctly:
- Deep-copies the metadata dict per project conventions
- Adds hybrid-specific fields (
loop_type,plans,final_plan,replans_used)- Returns an immutable copy via
model_copysrc/synthorg/engine/hybrid_helpers.py (6)
627-639: LGTM!The checkpoint callback threading issue flagged in the past review has been addressed. The
do_replanfunction now:
- Accepts
checkpoint_callbackas a keyword parameter (line 638)- Passes it through to
call_planner(line 707)This ensures replan turns are checkpointed consistently with other LLM turns.
254-255: LGTM!The turn exhaustion guard flagged in the past review has been addressed. The function now returns
MAX_TURNSearly before attempting any LLM calls:if not ctx.has_turns_remaining: return build_result(ctx, TerminationReason.MAX_TURNS, turns)
377-386: LGTM!The
bool()parsing issue flagged in the past review has been addressed. The function now explicitly handles boolean and string types:raw = data.get("replan") if isinstance(raw, bool): return raw if isinstance(raw, str): return raw.lower() == "true" # Non-bool, non-str, or missing -- treat as no-replan return FalseThis prevents
bool("false")from incorrectly returningTrue.
214-215: LGTM!The exception handling uses the correct PEP 758 syntax (
except MemoryError, RecursionError:) as required by the coding guidelines.
695-698: LGTM!The replan message correctly uses
MessageRole.USERsince it's requesting a new plan from the LLM (similar to the initial planning prompt). This is distinct from the progress-summary which is internal coordination.
70-97: LGTM!The
truncate_planfunction correctly handles plan truncation:
- Returns the original plan unchanged if within limits
- Logs a warning with detailed context
- Renumbers steps to maintain sequential step numbers (required by
ExecutionPlanvalidation)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 1228-1232: The current sanitization call using re.sub on raw_msg
(assigning to sanitized) misses several common prefixes and relative paths;
update the regex passed to re.sub (the pattern used where sanitized is defined)
to also match /root, /srv, /app, /data and relative paths starting with ./ or
../, or alternatively broaden the absolute-path branch to cover additional
common top-level dirs and a dot-segment branch for ./ and ../ so all such
occurrences are replaced with "[REDACTED_PATH]". Ensure the change targets the
re.sub invocation that produces sanitized from raw_msg so all new patterns are
sanitized consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fcfec21d-4680-4f9e-a4a4-6775708a447e
📒 Files selected for processing (5)
src/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.pytests/unit/engine/test_hybrid_loop_advanced.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict) for all data models
Use PEP 758 except syntax:except A, B:(no parentheses) — not PEP 654 syntax
Do NOT usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Include type hints on all public functions and classes; enforce with mypy strict mode
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Prefer immutability: create new objects rather than mutating existing ones; use copy.deepcopy() at construction and MappingProxyType for non-Pydantic collections
Use asyncio.TaskGroup for fan-out/fan-in parallel operations (structured concurrency) instead of bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (enforced by ruff)
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.pytests/unit/engine/test_hybrid_loop_advanced.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST import logger viafrom synthorg.observability import get_loggerthen uselogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code; only use the centralized get_logger function
Always use structured logging with keyword arguments:logger.info(EVENT, key=value)— never use formatted strings likelogger.info('msg %s', val)
Use event name constants from synthorg.observability.events. (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Use NotBlankStr (from core.types) for all identifier/name fields in models instead of manual whitespace validators
Never implement retry logic in provider driver subclasses or calling code — it is handled automatically by BaseCompletionProvider base class
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, or comments; use generic names like example-provider or example-large-001
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (model_copy(update=...)) for runtime state that evolves
Use@computed_fieldfor derived values instead of storing and validating redundant fields
Files:
src/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Mark all unit tests with@pytest.mark.unit, integration tests with@pytest.mark.integration, e2e tests with@pytest.mark.e2e, and slow tests with@pytest.mark.slow
Usepytest.mark.parametrizefor testing multiple similar cases instead of writing separate test functions
Use Hypothesis (@given+@settings) for property-based testing in Python unit tests
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests; use generic names like test-provider, test-small-001, or test-medium-001
Set test timeout to 30 seconds per test; use pytest-xdist with-n autofor parallel execution (ALWAYS include this flag, never run tests sequentially)
Maintain minimum 80% code coverage (enforced in CI); usepytest --cov=synthorg --cov-fail-under=80to verify locally
Mock time.monotonic() and asyncio.sleep() in timing-sensitive tests to make them deterministic instead of widening timing margins
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally
Files:
tests/unit/engine/test_hybrid_loop_advanced.py
🧠 Learnings (14)
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events.<domain> (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool) for all log events
Applied to files:
src/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging with keyword arguments: `logger.info(EVENT, key=value)` — never use formatted strings like `logger.info('msg %s', val)`
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : Use `pytest.mark.parametrize` for testing multiple similar cases instead of writing separate test functions
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T11:29:37.522Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:29:37.522Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
🧬 Code graph analysis (4)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
assess_step_success(91-109)extract_task_summary(66-88)update_step_status(28-63)src/synthorg/engine/plan_models.py (1)
StepStatus(15-22)
src/synthorg/engine/agent_engine.py (1)
src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)src/synthorg/engine/context.py (1)
AgentContext(104-412)src/synthorg/engine/plan_models.py (2)
ExecutionPlan(53-88)StepStatus(15-22)
tests/unit/engine/test_hybrid_loop_advanced.py (6)
src/synthorg/engine/hybrid_helpers.py (1)
_parse_replan_decision(356-413)src/synthorg/engine/hybrid_loop.py (5)
HybridLoop(84-794)stagnation_detector(126-128)execute(139-214)config(116-118)compaction_callback(131-133)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_protocol.py (2)
TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/stagnation/models.py (2)
StagnationResult(87-140)StagnationVerdict(15-20)tests/unit/engine/_hybrid_loop_helpers.py (7)
_ctx_with_user_msg(159-161)_make_invoker(154-156)_single_step_plan(46-55)_stop_response(80-86)_summary_response(89-100)_tool_use_response(103-113)execute(143-151)
🔇 Additional comments (15)
src/synthorg/engine/agent_engine.py (2)
9-9: Path sanitization implemented correctly.The sanitization logic addresses the prior review finding. The regex patterns cover:
- Windows paths (
C:\...)- Unix paths starting with
/home,/usr,/var,/tmp,/etc,/opt- HTTP/HTTPS URLs
Also applies to: 1225-1237
164-166: Clean integration ofhybrid_loop_config.The configuration is correctly threaded through:
- Accepted as optional parameter in
__init__- Stored on the instance
- Passed to
build_execution_loopwhen auto-selection triggersAlso applies to: 190-190, 204-204, 1069-1074
src/synthorg/engine/plan_execute_loop.py (3)
59-63: Good extraction of shared utilities.Importing
assess_step_success,extract_task_summary, andupdate_step_statusfromplan_helpersconsolidates logic shared withHybridLoop, improving maintainability.
444-446: Correct fix for stale plan metadata.Syncing
all_plans[-1] = planbefore building the final result ensures thatmetadata["final_plan"]reflects the latest step statuses. This addresses the bug mentioned in the commit messages.
583-583: Helper function usage is consistent.The calls to
extract_task_summary,assess_step_success, andupdate_step_statusproperly replace the removed in-class methods while maintaining the same behavior.Also applies to: 628-628, 818-818
src/synthorg/engine/plan_helpers.py (4)
1-26: Well-structured module setup.The module correctly follows all coding guidelines:
- Centralized logger via
get_logger(__name__)- Event constants imported from
synthorg.observability.events.execution- Type hints and Google-style docstrings present
28-63: Correct implementation ofupdate_step_status.The function:
- Validates bounds and logs at WARNING before raising
IndexError- Uses
model_copyto maintain immutability- Includes all required context (
step_idx,step_count,revision) in the log
66-88: Correct implementation ofextract_task_summary.The function properly handles the fallback chain:
- Task title (preferred)
- First user message content
- Default
"task"with warning log using the event constant
91-109: LGTM!The function is simple and well-documented. The rationale for treating
MAX_TOKENSas success (truncated summary still represents completed work) is clearly explained.tests/unit/engine/test_hybrid_loop_advanced.py (5)
31-128: Comprehensive stagnation detection tests.Both test cases properly verify:
TERMINATEverdict leading toTerminationReason.STAGNATIONINJECT_PROMPTverdict allowing recovery toTerminationReason.COMPLETEDThe custom detector classes follow the protocol interface correctly.
130-159: Good verification of model tiering.The test correctly validates that
planner_modelandexecutor_modelare used for their respective phases, checking the recorded model sequence.
162-307: Well-structured integration tests.The tests cover important scenarios:
- Metadata structure validation (
loop_type,replans_used,final_plan)- Context immutability (original
turn_countand conversation unchanged)- Checkpoint callback invocation count and failure isolation
- Compaction callback integration
310-344: Excellent use ofpytest.mark.parametrize.The 12 test cases are properly consolidated into a single parametrized test with descriptive IDs, following the coding guidelines. This addresses the previous review comment.
347-391: Good error handling test coverage.Both tests verify that provider failures during planning and step execution correctly result in
TerminationReason.ERROR. The inlineFailingProviderclasses are appropriate for testing error paths.src/synthorg/observability/events/execution.py (1)
62-64: Event constants follow naming conventions.The new constants are properly:
- Typed as
Final[str]- Namespaced under
execution.plan.*andexecution.hybrid.*- Named descriptively for their use cases
Also applies to: 86-99
Implement Loop 3 (Hybrid) from the design spec (SS6.5). The hybrid loop combines strategic planning (3-7 steps) with per-step mini-ReAct sub-loops, progress-summary checkpoints, and LLM-decided replanning. Key features: - Per-step turn limit (max_turns_per_step, default 5) - Progress summary after each step via LLM call (tools=None) - Optional replanning on completion (allow_replan_on_completion) - Plan truncation to max_plan_steps with warning - Turn budget advisory at loop start - Full integration with stagnation, compaction, and checkpoint systems New files: - hybrid_loop.py: HybridLoop class (ExecutionLoop protocol) - hybrid_models.py: HybridLoopConfig (frozen Pydantic) - plan_helpers.py: extracted shared plan utilities Integration: - loop_selector.py: hybrid added to _BUILDABLE_LOOP_TYPES, factory updated - AutoLoopConfig: hybrid_fallback defaults to None (hybrid is native) - agent_engine.py: hybrid_loop_config wired through Tests: 47 new tests (27 hybrid loop + 20 config model), all existing tests updated for hybrid-native behavior. Full suite: 9437 passed. Closes #199 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add type: ignore annotations for parametrized dict unpacking and isinstance guards before len/indexing on metadata values. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ogging Pre-reviewed by 8 agents, 14 findings addressed: - Fix misleading replan prompt: _do_replan now differentiates between step failure and success-triggered replanning via step_failed param - Extract _handle_completed_step from _run_steps to reduce nesting - Extract _build_summary_prompt and _parse_replan_decision to module level - Add logging to _parse_replan_decision (JSON parse fallback, non-dict) - Warn on empty progress summary response content - Remove unused EXECUTION_HYBRID_STEP_CHECKPOINT constant - Update stale "not yet implemented" docstrings in loop_selector.py - Update CLAUDE.md Package Structure with hybrid loop files - Update docs/design/engine.md: auto-selection, stagnation, compaction - Add stagnation turns fixup comment - Extract _MAX_TASK_SUMMARY_LENGTH constant in plan_helpers.py - Add 12 new tests: _parse_replan_decision (8), checkpoint callback (2), provider exceptions (2) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Docs: - Fix wrong field name allow_replan -> allow_replan_on_completion - Add missing hybrid YAML config fields (planner_model, executor_model, max_replans) - Update CompactionCallback wiring to include HybridLoop - Replace em-dashes with ASCII dashes in plan_execute_loop.py - Fix misleading "3-7 steps" docstring (only ceiling enforced) - Remove unnecessary inline comments Security: - Escape LLM-sourced step content before XML-framed re-injection - Cap plan step count at parse time (max 50) to prevent unbounded allocation - Sanitize exception messages in agent_engine error path Code quality: - Split hybrid_loop.py (1219 -> 785 lines) into hybrid_loop.py + hybrid_helpers.py - Fix _parse_replan_decision: regex heuristic, warning on fallback, dedicated event constant - Add TOOL_USE with no tool calls guard (parity with ReactLoop) - Add bounds check to update_step_status - Add logger declaration to plan_helpers.py - Add PlanExecuteConfig extra="forbid" for strict validation - Add checkpoint_callback parameter to build_execution_loop factory - Add new event constants: EXECUTION_HYBRID_REPLAN_PARSE_TRACE, EXECUTION_HYBRID_PROGRESS_SUMMARY_EMPTY Tests: - Add direct unit tests for plan_helpers.py (10 tests) - Add tests: compaction callback, replan budget sharing, last-step guard, success-path replan prompt - Fix test_provider_error_during_step to use FailingProvider - Move _parse_replan_decision import to module level - Strengthen final_plan metadata assertion - Extract budget setup helper in auto_loop tests - Parametrize duplicate fallback tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Import _parse_replan_decision from hybrid_helpers (not hybrid_loop) - Add CompletionConfig import for do_replan test calls - Remove stale re-export of _parse_replan_decision from hybrid_loop Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Source fixes: - Thread checkpoint_callback through do_replan and attempt_replan so replan turns use the same checkpoint subsystem as initial planning - Add has_turns_remaining guard in call_planner and run_progress_summary to prevent wasted LLM calls when turn budget is exhausted - Parse replan decision explicitly (bool/str) instead of coercing with bool() to prevent "yes"/1/[1] from triggering unintended replans - Add shutdown/budget re-check in _decide_replan_on_completion before do_replan (mirrors attempt_replan guards) - Sync live plan into all_plans before finalization so final_plan metadata reflects actual step statuses - Replace hardcoded event string with EXECUTION_PLAN_SUMMARY_FALLBACK constant in plan_helpers.py - Capture task_summary before ctx.with_message in call_planner and plan_execute_loop._call_planner to prevent planning prompt from being used as task summary - Add hybrid_loop_config to AgentEngine class docstring Tests: - Add checkpoint_callback/compaction_callback assertions to build_execution_loop factory tests - Split test_hybrid_loop.py (1217 lines) into 3 focused modules plus shared helpers -- all under 800 lines - Add replan parse tests for string and integer values Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Redact filesystem paths and URLs from error_message in agent_engine to prevent internal details leaking to API consumers - Log WARNING with event constant before raising IndexError in update_step_status (CLAUDE.md: all error paths must log before raising) - Parametrize TestParseReplanDecision (12 individual methods -> 1 parametrized test with descriptive IDs) - Fix stale all_plans in PlanExecuteLoop._build_final_result (same bug as hybrid_loop, syncs live plan so final_plan metadata reflects actual step statuses) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover /root, /srv, /app, /data and relative paths (./ ../) in the error_message sanitization regex. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
34b19d3 to
83f8be1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/engine/loop_selector.py (1)
266-317:⚠️ Potential issue | 🟠 MajorPass
checkpoint_callbacktoReactLoopinbuild_execution_loop().
ReactLoop.__init__accepts and usescheckpoint_callback(lines 87, 93, 275–277), butbuild_execution_loop()omits it when instantiatingReactLoop. This parameter should be wired consistently across all loop types for feature parity.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/loop_selector.py` around lines 266 - 317, build_execution_loop currently omits passing checkpoint_callback into the ReactLoop constructor; update the ReactLoop instantiation inside build_execution_loop to include checkpoint_callback=checkpoint_callback so ReactLoop receives the same per-turn checkpoint callback as PlanExecuteLoop and HybridLoop (refer to build_execution_loop, ReactLoop, and the checkpoint_callback parameter).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/hybrid_helpers.py`:
- Around line 458-467: Change the ChatMessage construction for the progress
summary so it uses MessageRole.SYSTEM instead of MessageRole.USER: in the
creation of summary_msg (which calls _build_summary_prompt with plan, step_idx,
and ask_replan), replace MessageRole.USER with MessageRole.SYSTEM to align this
system-initiated internal prompt with other summary/reconciliation prompts (see
uses in compaction/summarizer and checkpoint/resume).
---
Outside diff comments:
In `@src/synthorg/engine/loop_selector.py`:
- Around line 266-317: build_execution_loop currently omits passing
checkpoint_callback into the ReactLoop constructor; update the ReactLoop
instantiation inside build_execution_loop to include
checkpoint_callback=checkpoint_callback so ReactLoop receives the same per-turn
checkpoint callback as PlanExecuteLoop and HybridLoop (refer to
build_execution_loop, ReactLoop, and the checkpoint_callback parameter).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 40a20aa7-29a2-40cf-8874-be8a0ea46f93
📒 Files selected for processing (21)
CLAUDE.mddocs/design/engine.mdsrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/engine/loop_selector.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/plan_parsing.pysrc/synthorg/observability/events/execution.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_agent_engine_auto_loop.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_hybrid_loop_advanced.pytests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/test_hybrid_models.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_plan_helpers.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python 3.14+ with PEP 649 native lazy annotations; nofrom __future__ import annotations
Use PEP 758 except syntax:except A, B:(no parentheses) in Python 3.14
All public functions and classes require type hints; mypy strict mode enforced
Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
Create new objects instead of mutating existing ones; usecopy.deepcopy()at construction andMappingProxyTypewrapping for non-Pydantic collections (registries, BaseTool)
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state evolution; never mix static config fields with mutable runtime fields
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict); use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier/name fields
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code; prefer structured concurrency over barecreate_task
Enforce 88-character line length via ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files
Files:
src/synthorg/engine/plan_parsing.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/__init__.pytests/unit/engine/test_agent_engine_auto_loop.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pytests/unit/engine/test_hybrid_models.pysrc/synthorg/engine/loop_selector.pytests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_loop_selector.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/engine/plan_helpers.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_hybrid_loop_advanced.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.pytests/unit/engine/test_plan_helpers.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST usefrom synthorg.observability import get_loggerandlogger = get_logger(__name__); never useimport loggingorlogging.getLogger()in application code
Variable name must always belogger(not_logger, notlog)
Use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api); import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging:logger.info(EVENT, key=value); never use formatted strings likelogger.info("msg %s", val)
Log all error paths at WARNING or ERROR with context before raising
Log all state transitions at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
All provider calls go throughBaseCompletionProviderwhich applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
RetryConfigandRateLimiterConfigare set per-provider inProviderConfig
Retryable errors:RateLimitError,ProviderTimeoutError,ProviderConnectionError,ProviderInternalError(all withis_retryable=True). Non-retryable errors raise immediately without retry.RetryExhaustedErrorsignals all retries failed.
Rate limiter respectsRateLimitError.retry_afterfrom providers and automatically pauses future requests
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths.
Files:
src/synthorg/engine/plan_parsing.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_selector.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Linter (ruff) enforces ruff check + format; mypy strict type-checking enforced in CI and pre-push; pytest with coverage 80% minimum enforced in CI
PEP 649 lazy annotations supported natively in Python 3.14; no workarounds needed for forward references
Files:
src/synthorg/engine/plan_parsing.pysrc/synthorg/engine/plan_models.pysrc/synthorg/engine/__init__.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/engine/plan_execute_loop.pysrc/synthorg/engine/loop_selector.pysrc/synthorg/engine/hybrid_models.pysrc/synthorg/engine/plan_helpers.pysrc/synthorg/observability/events/execution.pysrc/synthorg/engine/hybrid_helpers.pysrc/synthorg/engine/hybrid_loop.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must usetest-provider,test-small-001, etc. instead of real vendor names; never use vendor names in tests
Mark tests with@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, or@pytest.mark.slow
Maintain 80% minimum code coverage; enforced in CI
asyncio_mode = "auto"— no manual@pytest.mark.asyncioneeded
30-second timeout per test; enforce via pytest configuration
ALWAYS include-n autowhen running pytest; never run tests sequentially viapytest-xdistparallelism
Prefer@pytest.mark.parametrizefor testing similar cases
Use Hypothesis with@givenand@settingsfor property-based testing; profiles:ci(50 examples, default) anddev(1000 examples) viaHYPOTHESIS_PROFILEenv var. Run dev profile:HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic instead of widening timing margins.
Files:
tests/unit/engine/test_agent_engine_auto_loop.pytests/unit/engine/test_hybrid_models.pytests/unit/engine/test_hybrid_loop_replanning.pytests/unit/engine/_hybrid_loop_helpers.pytests/unit/engine/test_loop_selector.pytests/unit/engine/test_hybrid_loop.pytests/unit/engine/test_hybrid_loop_advanced.pytests/unit/engine/test_plan_helpers.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Docs are in
docs/(Markdown, built with Zensical); design spec indocs/design/(7 pages); architecture indocs/architecture/; roadmap indocs/roadmap/; security indocs/security.md; licensing indocs/licensing.md; reference indocs/reference/; library reference auto-generated indocs/api/from docstrings via mkdocstrings + Griffe
Files:
docs/design/engine.md
🧠 Learnings (21)
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state evolution; never mix static config fields with mutable runtime fields
Applied to files:
src/synthorg/engine/plan_models.pysrc/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/plan_models.pysrc/synthorg/engine/hybrid_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/engine/plan_models.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST use `from synthorg.observability import get_logger` and `logger = get_logger(__name__)`; never use `import logging` or `logging.getLogger()` in application code
Applied to files:
src/synthorg/engine/loop_selector.pysrc/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Always use structured logging: `logger.info(EVENT, key=value)`; never use formatted strings like `logger.info("msg %s", val)`
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to src/synthorg/**/*.py : Variable name must always be `logger` (not `_logger`, not `log`)
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Structured kwargs in logging: always `logger.info(EVENT, key=value)` — never `logger.info('msg %s', val)`.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
src/synthorg/engine/plan_helpers.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to **/*.py : Keep functions under 50 lines and files under 800 lines
Applied to files:
tests/unit/engine/test_hybrid_loop.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-19T13:19:14.438Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T13:19:14.438Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
tests/unit/engine/test_hybrid_loop_advanced.py
🧬 Code graph analysis (10)
src/synthorg/engine/__init__.py (2)
src/synthorg/engine/hybrid_loop.py (1)
HybridLoop(84-794)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
tests/unit/engine/test_agent_engine_auto_loop.py (3)
src/synthorg/budget/enforcer.py (2)
BudgetEnforcer(56-473)cost_tracker(89-91)web/src/api/types.ts (2)
BudgetConfig(406-413)BudgetAlertConfig(393-397)tests/unit/engine/conftest.py (2)
mock_provider_factory(314-316)MockCompletionProvider(207-289)
src/synthorg/engine/plan_execute_loop.py (2)
src/synthorg/engine/plan_helpers.py (3)
assess_step_success(91-109)extract_task_summary(66-88)update_step_status(28-63)src/synthorg/engine/plan_models.py (1)
StepStatus(15-22)
tests/unit/engine/test_hybrid_models.py (1)
src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/loop_selector.py (5)
src/synthorg/engine/hybrid_loop.py (5)
HybridLoop(84-794)approval_gate(121-123)stagnation_detector(126-128)compaction_callback(131-133)config(116-118)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/plan_execute_loop.py (4)
approval_gate(131-133)stagnation_detector(136-138)compaction_callback(141-143)config(126-128)src/synthorg/engine/plan_models.py (1)
PlanExecuteConfig(91-118)src/synthorg/engine/loop_protocol.py (1)
ExecutionLoop(158-196)
tests/unit/engine/test_loop_selector.py (5)
tests/unit/engine/conftest.py (1)
engine(449-460)src/synthorg/engine/hybrid_loop.py (6)
HybridLoop(84-794)config(116-118)get_loop_type(135-137)approval_gate(121-123)stagnation_detector(126-128)compaction_callback(131-133)src/synthorg/engine/loop_selector.py (3)
select_loop_type(228-263)AutoLoopConfig(94-168)build_execution_loop(266-324)src/synthorg/engine/plan_execute_loop.py (5)
config(126-128)get_loop_type(145-147)approval_gate(131-133)stagnation_detector(136-138)compaction_callback(141-143)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)
src/synthorg/engine/plan_helpers.py (3)
src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)src/synthorg/engine/context.py (1)
AgentContext(104-412)src/synthorg/engine/plan_models.py (2)
ExecutionPlan(53-88)StepStatus(15-22)
tests/unit/engine/test_hybrid_loop_advanced.py (4)
src/synthorg/engine/hybrid_helpers.py (1)
_parse_replan_decision(356-413)src/synthorg/engine/hybrid_loop.py (3)
HybridLoop(84-794)stagnation_detector(126-128)config(116-118)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_protocol.py (1)
TerminationReason(28-37)
src/synthorg/engine/hybrid_loop.py (8)
src/synthorg/providers/enums.py (1)
MessageRole(15-21)src/synthorg/engine/hybrid_helpers.py (5)
attempt_replan(522-624)call_planner(228-312)do_replan(627-719)invoke_checkpoint_callback(195-222)run_progress_summary(416-516)src/synthorg/engine/hybrid_models.py (1)
HybridLoopConfig(12-73)src/synthorg/engine/loop_helpers.py (7)
build_result(512-527)call_provider(156-213)check_budget(107-153)check_response_errors(216-250)check_shutdown(62-104)check_stagnation(530-591)execute_tool_calls(253-355)src/synthorg/engine/loop_protocol.py (3)
ExecutionResult(86-147)TerminationReason(28-37)TurnRecord(40-83)src/synthorg/engine/plan_helpers.py (1)
update_step_status(28-63)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/engine/stagnation/protocol.py (1)
StagnationDetector(15-46)
tests/unit/engine/test_plan_helpers.py (3)
src/synthorg/engine/context.py (3)
AgentContext(104-412)from_identity(190-225)with_message(227-236)src/synthorg/engine/plan_models.py (3)
ExecutionPlan(53-88)PlanStep(25-50)StepStatus(15-22)src/synthorg/providers/enums.py (2)
FinishReason(24-31)MessageRole(15-21)
🔇 Additional comments (75)
src/synthorg/engine/plan_models.py (1)
103-103: LGTM — stricter config validation.Adding
extra="forbid"alignsPlanExecuteConfigwith the newHybridLoopConfigpattern and prevents misconfigured fields from being silently ignored.src/synthorg/engine/plan_parsing.py (1)
210-221: LGTM — sensible parse-time safety cap.The 50-step limit prevents unbounded allocation from misbehaving LLM output while being generous enough for legitimate plans. The warning log provides good context for debugging. This complements the per-loop
max_plan_stepsconfig that applies further truncation after parsing.src/synthorg/engine/__init__.py (1)
113-114: LGTM — clean public API extension.The new hybrid-loop types are properly re-exported and added to
__all__, following the existing alphabetical ordering convention.Also applies to: 287-288
CLAUDE.md (1)
127-127: LGTM — comprehensive documentation update.The engine section now accurately documents the hybrid loop module, its configuration, and the shared plan helpers. The "optional hybrid fallback" wording correctly reflects the default
Nonebehavior inAutoLoopConfig.src/synthorg/engine/agent_engine.py (2)
164-166: LGTM — clean hybrid config threading.The
hybrid_loop_configparameter is properly:
- Documented in the class docstring
- Accepted as an optional
__init__parameter- Stored on the instance
- Passed to
build_execution_loopduring auto-selectionThis enables runtime configuration of hybrid loops without breaking the existing API.
Also applies to: 190-190, 204-204, 1073-1073
1225-1239: LGTM — improved error message sanitization.The sanitization logic now:
- Redacts Windows paths (
C:\...)- Redacts common Unix paths (
/home,/usr,/var,/tmp,/etc,/opt,/root,/srv,/app,/data)- Redacts relative paths (
./,../)- Redacts URLs (
http://,https://)- Strips non-printable characters
- Caps at 200 characters
- Falls back to "details redacted" when no alphanumeric content remains
This addresses the prior review feedback about path leakage.
tests/unit/engine/test_agent_engine_auto_loop.py (2)
55-67: LGTM — good test helper extraction.The
_make_budget_enforcer()helper centralizes budget setup with sensible defaults, improving test maintainability and reducing duplication.
214-264: LGTM — test expectations updated for hybrid loop.The tests correctly expect
"hybrid"for complex tasks with OK budget (or unknown budget), and the additionalsummary_responseaccounts for the hybrid loop's progress-summary LLM call after step execution.Also applies to: 274-325
docs/design/engine.md (3)
405-411: LGTM — config example matches implementation.The YAML example now correctly uses
allow_replan_on_completion(notallow_replan) and includesmax_replans: 3, matchingHybridLoopConfigfield names.
434-436: LGTM — hybrid fallback docs updated.The documentation now correctly reflects that
hybrid_fallbackdefaults toNoneand the hybrid loop runs directly when not configured, aligning withAutoLoopConfigdefaults.Also applies to: 487-489
606-608: LGTM — subsystem integration docs extended.Stagnation detection and compaction documentation now correctly describe
HybridLoop's per-step mini-ReAct semantics, matching the implementation pattern shared withPlanExecuteLoop.Also applies to: 650-651, 688-691
src/synthorg/engine/plan_execute_loop.py (3)
59-63: LGTM — clean extraction to shared helpers.The refactor to use
assess_step_success,extract_task_summary, andupdate_step_statusfromplan_helpersenables code reuse withHybridLoopwhile maintaining the same behavior.
444-446: LGTM — ensures final metadata reflects step statuses.Syncing
all_plans[-1] = planbefore building the final result ensures thefinal_planmetadata correctly includes the latest step statuses (COMPLETED, FAILED, etc.) rather than stale PENDING statuses.
279-283: LGTM — consistent use of shared helpers.All call sites now use the extracted helper functions with the same semantics as the original inline logic.
Also applies to: 310-314, 373-373, 583-583, 628-628, 818-818
src/synthorg/engine/loop_selector.py (4)
22-25: LGTM!Correct imports added for
HybridLoopand the newCheckpointCallbacktype. The type-checking import pattern is properly used for types only needed in annotations.
47-50: LGTM!Adding
"hybrid"to_BUILDABLE_LOOP_TYPEScorrectly enables the factory to construct hybrid loops. The docstring accurately describes the purpose.
104-129: LGTM!The
hybrid_fallbackfield defaulting toNonealigns with the PR objective of making hybrid a first-class loop type. The docstring clearly explains the new behavior.
156-168: LGTM!The validation logic correctly ensures both
hybrid_fallbackanddefault_loop_typeare buildable when set, preventing infinite loops or runtime failures.src/synthorg/engine/hybrid_models.py (1)
1-73: LGTM!Well-designed frozen configuration model following project conventions:
ConfigDict(frozen=True, extra="forbid")ensures immutability and rejects unknown fields.NotBlankStr | Nonefor model overrides prevents whitespace-only values.- Sensible numeric constraints (
ge/le) with reasonable defaults.- Comprehensive Google-style docstring documents all attributes.
tests/unit/engine/test_hybrid_models.py (1)
1-101: LGTM!Comprehensive unit tests for
HybridLoopConfig:
- Default value assertions match the model definition.
- Custom value override test confirms all fields are settable.
- Immutability test catches frozen model violations.
- Parametrized boundary tests cover all numeric constraints efficiently.
- Blank string rejection validates
NotBlankStrbehavior.- Uses
test-*model names per project guidelines.tests/unit/engine/_hybrid_loop_helpers.py (1)
1-180: LGTM!Well-organized test helper module that centralizes fixture construction:
- Factory functions provide consistent
CompletionResponsevariants for different scenarios._StubTooland_make_invokerenable tool-use testing without external dependencies._make_plan_modelcreates validExecutionPlanfixtures.- Extracted to keep individual test files under 800 lines per project guidelines.
tests/unit/engine/test_loop_selector.py (4)
165-184: LGTM!Good use of parametrization to test both
hybrid_fallback=None(preserves hybrid) and explicit fallback behavior in a single test method.
217-227: LGTM!New test case validates the important scenario where budget is OK and no fallback is configured, ensuring hybrid is preserved.
310-321: LGTM!Good tests confirming that
hybrid_fallback=Noneanddefault_loop_type="hybrid"are now valid configurations since hybrid is buildable.
406-441: LGTM!Comprehensive hybrid loop factory tests:
- Basic construction and
get_loop_type()verification.- Config propagation confirms
HybridLoopConfigvalues flow through.- Gate/callback wiring test addresses prior review feedback by asserting
checkpoint_callbackandcompaction_callbackpropagation.Note: Accessing
loop._checkpoint_callback(Line 440) is acceptable in tests to verify internal wiring.tests/unit/engine/test_hybrid_loop_replanning.py (3)
1-95: LGTM!Solid replanning unit tests:
test_max_replans_exhaustedverifiesmax_replans=0terminates with ERROR.test_successful_replan_on_failurevalidates replan → completion flow and metadata tracking.test_content_filter_during_step_returns_errorcovers content filter edge case.
97-159: LGTM!Excellent test validating that replan prompts differ semantically between success and failure triggers. Directly testing
do_replanhelper withstep_failed=True/Falseand asserting prompt content differences ensures the LLM receives appropriate context.
162-228: LGTM!Important behavioral tests:
test_replan_budget_shared_between_failure_and_completionconfirmsmax_replansis a shared budget across all replan triggers.test_last_step_no_replan_on_completionverifies completion-triggered replanning is correctly skipped on the final step.src/synthorg/engine/plan_helpers.py (4)
1-26: LGTM!Proper module setup:
- Correct logger initialization with
get_logger(__name__).- Event constants imported from
synthorg.observability.events.execution.- Type-checking imports used appropriately.
28-63: LGTM!
update_step_statuscorrectly:
- Validates bounds and logs
EXECUTION_PLAN_STEP_INDEX_OUT_OF_RANGEwith structured context before raisingIndexError.- Returns a new immutable plan via
model_copywithout mutating the original.- Converts steps list back to tuple for the frozen model.
66-88: LGTM!
extract_task_summaryproperly prioritizes sources:
- Task title from
task_executionwhen available.- First user message content.
- Fallback to
"task"with a warning log includingexecution_id.Truncation to
_MAX_TASK_SUMMARY_LENGTH(200) prevents overly long summaries.
91-109: LGTM!
assess_step_successcorrectly treats bothSTOPandMAX_TOKENSas successful outcomes. The docstring clearly explains whyMAX_TOKENSis considered success (truncated summary still represents completed work).tests/unit/engine/test_plan_helpers.py (4)
1-57: LGTM!Well-structured test helpers:
_make_plancreates validExecutionPlanfixtures with configurable step count._make_responsecreates minimalCompletionResponsefor testing finish reason classification.
64-112: LGTM!Thorough
update_step_statustest coverage:
- Verifies immutability (original plan unchanged).
- Tests boundary conditions (first index, last index).
- Validates
IndexErrorwith descriptive message for out-of-range and negative indices.
120-179: LGTM!Comprehensive
extract_task_summarytests:
- Validates task title extraction when
task_executionpresent.- Tests fallback to first user message.
- Verifies
"task"fallback for empty context.- Confirms exact 200-character truncation.
187-208: LGTM!Efficient parametrized test covers all
FinishReasonvariants, validating thatSTOPandMAX_TOKENSreturnTruewhileTOOL_USE,CONTENT_FILTER, andERRORreturnFalse.src/synthorg/observability/events/execution.py (1)
62-99: LGTM!The new observability event constants follow the established naming conventions and are well-organized with a clear section comment for hybrid loop events. The use of
Final[str]is consistent with the rest of the module.tests/unit/engine/test_hybrid_loop.py (7)
1-42: LGTM!The module structure is clean with proper imports, clear docstring explaining test scope, and appropriate separation of concerns (replanning and advanced tests in separate files as documented).
44-67: LGTM!Protocol compliance and configuration tests are concise and verify the essential contract: loop type identity, interface implementation, and config defaults/overrides.
70-146: LGTM!Basic execution tests properly verify turn sequencing and call categories. The mix of single-step and multi-step plans appropriately covers both edge cases and the strategic multi-step flow.
149-179: LGTM!Tool invocation test correctly verifies that tool calls are counted and the turn sequence includes the tool response handling.
182-242: LGTM!Per-step turn limit tests effectively verify the hybrid-specific behavior where steps that exhaust their turn budget are marked as failed and trigger the replan exhaustion path.
245-298: LGTM!Progress summary tests properly verify the LLM-decided replanning mechanism with both enabled and disabled configurations.
301-485: LGTM!Budget, shutdown, max turns, and plan parsing tests comprehensively cover termination conditions. The counter-based callbacks effectively simulate mid-execution interruptions, and the plan truncation test verifies that
max_plan_stepsis enforced.tests/unit/engine/test_hybrid_loop_advanced.py (7)
1-29: LGTM!The module correctly separates advanced test scenarios with clean imports and proper use of shared test helpers.
31-127: LGTM!Stagnation tests properly implement the
StagnationDetectorprotocol and verify both termination and correction injection behaviors within step execution.
130-159: LGTM!Model tiering test correctly verifies that
planner_modelis used for planning and progress summaries whileexecutor_modelis used for step execution. The test uses appropriatetest-*model names per coding guidelines.
162-272: LGTM!Metadata structure, context immutability, and checkpoint callback tests provide solid coverage of hybrid-specific requirements. The callback failure test correctly verifies that checkpoint exceptions don't propagate.
275-307: LGTM!Compaction callback test verifies the integration point is working. The assertion
len(compaction_calls) >= 1appropriately accounts for implementation details of when compaction is triggered.
310-344: LGTM!The parametrized test structure follows the coding guidelines and comprehensively covers
_parse_replan_decisionedge cases including JSON parsing, markdown fences, text heuristics, and malformed inputs.
347-391: LGTM!Provider error handling tests correctly verify that exceptions during planning and step execution result in
TerminationReason.ERROR. The inlineFailingProviderclasses effectively simulate provider failures.src/synthorg/engine/hybrid_loop.py (12)
1-82: LGTM!The module docstring, imports, and logging setup follow coding guidelines. The use of
get_logger(__name__)and event constants from the observability module is correct.
84-138: LGTM!The
HybridLoopclass follows the immutable config pattern with proper encapsulation. The constructor accepts optional dependencies and stores them as private attributes with read-only property accessors.
139-214: LGTM!The
executemethod has a clean two-phase structure with proper initialization of mutable state within method scope. The turn budget warning is appropriately issued at loop start.
243-354: Verify thatall_plansis properly synchronized after IN_PROGRESS status update.The status update at line 266 sets
planto a new copy withIN_PROGRESSstatus, butall_plansisn't immediately updated. This is fine because:
- On step completion,
_handle_completed_step(line 377-378) syncsall_plans[-1] = plan- On step failure,
attempt_replanappends the new plan toall_plans_build_final_result(line 505-506) syncs before building the resultThe
IN_PROGRESSstatus is transient during step execution and correctly transitions toCOMPLETEDorFAILEDbefore metadata finalization.
356-423: LGTM!The completed step handler correctly updates step status, syncs the live plan to
all_plans[-1], and conditionally runs progress summaries with proper error propagation.
425-491: LGTM!The replan decision logic correctly re-checks shutdown and budget guards after the progress summary before issuing the replan call. The guard conditions at lines 448-454 prevent unnecessary replanning when not warranted.
493-535: LGTM!The final result builder correctly syncs the live plan to
all_plansbefore building metadata, ensuringfinal_planreflects actual step status changes. The explicit comment at lines 503-504 documents this intent.
537-575: LGTM!The plan generation method cleanly delegates to
call_plannerand applies truncation. The logging provides good observability with step count and revision number.
577-653: LGTM!The step execution method implements the mini-ReAct sub-loop with proper per-step turn limits, stagnation detection scoped to step turns, and context compaction at turn boundaries.
664-748: LGTM!The step turn execution properly handles the response flow: shutdown/budget checks, provider call, turn recording, error checking, context update, checkpointing, and branching to tool call handling or step completion assessment.
750-774: LGTM!Tool call handling correctly re-checks shutdown before execution and clears pending tool calls from the turn record when shutdown is triggered, preventing incomplete tool execution from being recorded.
778-794: LGTM!The finalize method correctly deep-copies metadata before mutation and properly serializes execution plans using
model_dump(). The emptyall_plansguard prevents index errors.src/synthorg/engine/hybrid_helpers.py (12)
1-65: LGTM!The module docstring clearly explains the purpose (stateless helpers to keep the main loop under 800 lines). Imports are well-organized with TYPE_CHECKING for forward references.
67-97: LGTM!The truncation function correctly maintains immutability by returning a new plan copy and properly renumbers steps to satisfy the sequential step number invariant.
100-127: LGTM!The step message builder properly escapes
<and>characters in user-provided content to prevent prompt injection, and the instruction explicitly tells the LLM to treat XML tag contents as data.
130-161: LGTM!Step completion handling correctly identifies the anomalous case of
TOOL_USEwithout actual tool calls as a failure, and logs truncation warnings forMAX_TOKENSresponses.
164-189: LGTM!The budget warning calculation is a reasonable estimate that helps users identify configuration issues early. The comment appropriately notes that replan overhead is excluded from the estimate.
192-222: LGTM!The checkpoint callback invocation correctly uses PEP 758 except syntax (
except MemoryError, RecursionError:) and follows the principle that checkpointing failures must not interrupt execution while critical errors (memory, recursion) should propagate.
225-312: LGTM!The planner call helper correctly guards against exhausted turns before making the call, uses
SYSTEMcategory for planning turns, and threads the checkpoint callback through.
315-353: LGTM!The summary prompt builder provides clear context about plan progress and conditionally includes the replan question with a JSON format example when appropriate.
356-413: LGTM!The replan decision parser correctly handles JSON with explicit boolean and string parsing (avoiding the
bool()coercion issue from the past review), supports markdown-fenced JSON, and falls back to a regex heuristic with appropriate logging.
468-516: LGTM!The remainder of the progress summary function correctly handles the provider call, turn recording with
SYSTEMcategory, empty content warnings, and replan decision parsing.
519-624: LGTM!The replan attempt function correctly marks the failed step, enforces replan budget limits with appropriate error messages, guards against exhausted resources, and threads the checkpoint callback through to
do_replan.
627-719: LGTM!The replanning function correctly threads the checkpoint callback to
call_planner(addressing the past review concern), builds context-appropriate prompts distinguishing between failure and completion triggers, and applies plan truncation to the result.
| summary_msg = ChatMessage( | ||
| role=MessageRole.USER, | ||
| content=_build_summary_prompt( | ||
| plan, | ||
| step_idx, | ||
| ask_replan=( | ||
| config.allow_replan_on_completion and step_idx < len(plan.steps) - 1 | ||
| ), | ||
| ), | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if there's documentation about MessageRole usage for progress summaries
rg -n "progress.?summary" -A3 -B3 --type=md | head -50
# Check how other internal coordination prompts use MessageRole
ast-grep --pattern 'ChatMessage(
$$$
role=MessageRole.$_,
$$$
)'Repository: Aureliolo/synthorg
Length of output: 3205
🏁 Script executed:
cat -n src/synthorg/engine/hybrid_helpers.py | sed -n '440,490p'Repository: Aureliolo/synthorg
Length of output: 1832
🏁 Script executed:
# Find all ChatMessage instantiations to understand MessageRole patterns
rg -n "ChatMessage\(" -A2 src/ --type=py | grep -A2 "role=" | head -80Repository: Aureliolo/synthorg
Length of output: 4027
🏁 Script executed:
# Search for LLMCallCategory.SYSTEM usage to understand when it's used
rg -n "LLMCallCategory\.SYSTEM" -B5 -A2 src/ --type=py | head -100Repository: Aureliolo/synthorg
Length of output: 2235
Change MessageRole.USER to MessageRole.SYSTEM for the progress summary prompt.
The progress summary at line 459 uses MessageRole.USER but is recorded with LLMCallCategory.SYSTEM. Similar internal coordination operations—such as the compaction summarizer (compaction/summarizer.py:188) and checkpoint recovery prompt (checkpoint/resume.py:87)—use MessageRole.SYSTEM for their summary and reconciliation prompts. Using MessageRole.SYSTEM here aligns with the pattern for system-initiated internal operations and makes the conversation structure consistent with comparable prompts in the codebase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/hybrid_helpers.py` around lines 458 - 467, Change the
ChatMessage construction for the progress summary so it uses MessageRole.SYSTEM
instead of MessageRole.USER: in the creation of summary_msg (which calls
_build_summary_prompt with plan, step_idx, and ask_replan), replace
MessageRole.USER with MessageRole.SYSTEM to align this system-initiated internal
prompt with other summary/reconciliation prompts (see uses in
compaction/summarizer and checkpoint/resume).
…_loop PlanExecuteLoop and HybridLoop already received checkpoint_callback from the factory; ReactLoop was the only loop type missing it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
🤖 I have created a release *beep* *boop* --- ## [0.3.7](v0.3.6...v0.3.7) (2026-03-19) ### Features * **engine:** implement Hybrid Plan + ReAct execution loop ([#582](#582)) ([008147c](008147c)) * implement first-run setup wizard ([#584](#584)) ([dfed931](dfed931)) ### Bug Fixes * **api:** address ZAP DAST scan findings ([#579](#579)) ([ce9a3e0](ce9a3e0)) * **cli:** regenerate compose and re-exec binary on update ([#576](#576)) ([3f226eb](3f226eb)) ### CI/CD * add SBOM generation to Docker and CLI releases ([#580](#580)) ([db459cf](db459cf)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.3.7](v0.3.6...v0.3.7) (2026-03-19) ### Features * **engine:** implement Hybrid Plan + ReAct execution loop ([#582](#582)) ([008147c](008147c)) * implement first-run setup wizard ([#584](#584)) ([dfed931](dfed931)) ### Bug Fixes * **api:** address ZAP DAST scan findings ([#579](#579)) ([ce9a3e0](ce9a3e0)) * **cli:** auto-delete binary on Windows, prune images, fix GoReleaser ([#590](#590)) ([eb7c691](eb7c691)) * **cli:** regenerate compose and re-exec binary on update ([#576](#576)) ([3f226eb](3f226eb)) ### CI/CD * add SBOM generation to Docker and CLI releases ([#580](#580)) ([db459cf](db459cf)) ### Maintenance * **main:** release 0.3.7 ([#583](#583)) ([bf58779](bf58779)) * reset failed v0.3.7 release ([#591](#591)) ([b69000d](b69000d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added engine Hybrid Plan + ReAct execution loop * Added first-run setup wizard * **Bug Fixes** * Addressed ZAP DAST scan issues * Fixed CLI Windows/image/update issues * **Maintenance** * Added SBOM generation for Docker/CLI releases * General maintenance updates <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [0.3.7](v0.3.6...v0.3.7) (2026-03-19) ### Features * **engine:** implement Hybrid Plan + ReAct execution loop ([#582](#582)) ([008147c](008147c)) * implement first-run setup wizard ([#584](#584)) ([dfed931](dfed931)) ### Bug Fixes * **api:** address ZAP DAST scan findings ([#579](#579)) ([ce9a3e0](ce9a3e0)) * **ci:** reset failed v0.3.7 release and fix syft SBOM scan ([#593](#593)) ([d1508c2](d1508c2)) * **cli:** auto-delete binary on Windows, prune images, fix GoReleaser ([#590](#590)) ([eb7c691](eb7c691)) * **cli:** regenerate compose and re-exec binary on update ([#576](#576)) ([3f226eb](3f226eb)) ### CI/CD * add SBOM generation to Docker and CLI releases ([#580](#580)) ([db459cf](db459cf)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.3.7](v0.3.6...v0.3.7) (2026-03-19) ### Features * **engine:** implement Hybrid Plan + ReAct execution loop ([#582](#582)) ([008147c](008147c)) * implement first-run setup wizard ([#584](#584)) ([dfed931](dfed931)) ### Bug Fixes * **api:** address ZAP DAST scan findings ([#579](#579)) ([ce9a3e0](ce9a3e0)) * **ci:** remove CLI SBOM generation, reset failed v0.3.7 ([#595](#595)) ([d0f4992](d0f4992)) * **ci:** reset failed v0.3.7 release and fix syft SBOM scan ([#593](#593)) ([d1508c2](d1508c2)) * **cli:** auto-delete binary on Windows, prune images, fix GoReleaser ([#590](#590)) ([eb7c691](eb7c691)) * **cli:** regenerate compose and re-exec binary on update ([#576](#576)) ([3f226eb](3f226eb)) ### CI/CD * add SBOM generation to Docker and CLI releases ([#580](#580)) ([db459cf](db459cf)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
plan_helpers.pyto eliminate duplication between PlanExecuteLoop and HybridLooploop_selector.py)hybrid_loop_configthroughAgentEnginefor runtime configurationKey features
max_turns_per_step, default 5) -- steps that exhaust their budget are marked FAILEDallow_replan_on_completion) with distinct prompt for success vs failure triggersmax_plan_stepswith warning logNew files
hybrid_loop.py-- HybridLoop class (ExecutionLoop protocol)hybrid_models.py-- HybridLoopConfig (frozen Pydantic, 7 fields)plan_helpers.py--update_step_status,extract_task_summary,assess_step_successPre-PR review
8 agents reviewed this PR (code-reviewer, conventions-enforcer, logging-audit, test-analyzer, silent-failure-hunter, issue-verifier, docs-consistency, type-design). 14 findings identified and addressed in a follow-up commit:
_parse_replan_decisionand empty summary warningEXECUTION_HYBRID_STEP_CHECKPOINTconstantTest plan
Closes #199
🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation