feat(engine): implement execution loop auto-selection based on task complexity#567

Merged
Aureliolo merged 11 commits into main from
feat/execution-loop-auto-select
Mar 19, 2026

Conversation

@Aureliolo
Owner

Summary

  • Add automatic execution loop selection that maps task estimated_complexity to the optimal loop: simple -> ReAct, medium -> Plan-and-Execute, complex/epic -> Hybrid (falls back to Plan-and-Execute until HybridLoop is implemented)
  • Budget-aware downgrade: when monthly utilization >= threshold (default 80%), complex tasks use Plan-and-Execute instead of Hybrid to conserve budget
  • New AutoLoopConfig (frozen Pydantic model) with configurable rules, budget threshold, and hybrid fallback
  • BudgetEnforcer.get_budget_utilization_pct() for querying current monthly budget state
  • AgentEngine accepts auto_loop_config (mutually exclusive with execution_loop) and resolves the loop per-task in _execute()
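
The selection order summarized above (rule match, then budget-aware downgrade, then hybrid fallback) can be sketched roughly as follows. This is a minimal illustration, not the code in loop_selector.py: `AutoLoopConfigSketch`, `select_loop_type_sketch`, and the field name `budget_tight_threshold` are hypothetical, and a frozen dataclass stands in for the real frozen Pydantic model.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Default complexity -> loop mapping as described in the summary.
DEFAULT_RULES: Mapping[str, str] = {
    "simple": "react",
    "medium": "plan_execute",
    "complex": "hybrid",
    "epic": "hybrid",
}


@dataclass(frozen=True)
class AutoLoopConfigSketch:
    """Hypothetical stand-in for AutoLoopConfig (field names are assumed)."""

    budget_tight_threshold: float = 80.0  # percent of the monthly budget
    hybrid_fallback: Optional[str] = "plan_execute"  # used until HybridLoop exists


def select_loop_type_sketch(
    complexity: str,
    config: AutoLoopConfigSketch = AutoLoopConfigSketch(),
    budget_utilization_pct: Optional[float] = None,
) -> str:
    # Step 1: rule match (unmapped complexities default to "react" here).
    loop_type = DEFAULT_RULES.get(complexity, "react")
    if loop_type != "hybrid":
        return loop_type
    # Step 2: budget-aware downgrade when utilization is known and >= threshold.
    if (
        budget_utilization_pct is not None
        and budget_utilization_pct >= config.budget_tight_threshold
    ):
        return "plan_execute"
    # Step 3: hybrid fallback while no hybrid loop is available.
    if config.hybrid_fallback is not None:
        return config.hybrid_fallback
    return "hybrid"
```

With the defaults, a complex task resolves to plan_execute either way; the budget step only changes the outcome once a real hybrid loop exists and `hybrid_fallback` is disabled.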

Test plan

  • 35 unit tests for loop_selector.py (all complexity mappings, budget downgrade, hybrid fallback, interaction priority, model validation, factory, logging)
  • 6 integration tests for AgentEngine auto-loop (simple->react, medium->plan_execute, mutual exclusivity, budget-aware tight/ok, budget error fallback)
  • 6 tests for BudgetEnforcer.get_budget_utilization_pct (correct %, disabled, zero, over-budget, tracker failure, MemoryError propagation)
  • Full suite: 9484 passed, 93.93% coverage
  • Lint (ruff), type-check (mypy), format all clean
  • Pre-push hooks passed (mypy + pytest + gitleaks)

Review coverage

Pre-reviewed by 11 agents (docs-consistency, code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, conventions-enforcer, resilience-audit, async-concurrency-reviewer, issue-resolution-verifier). 16 findings addressed in second commit.

Closes #200

🤖 Generated with Claude Code

@github-actions
Contributor

github-actions bot commented Mar 18, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 18, 2026

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Automatic per-task execution-loop auto-selection by complexity with budget-aware downgrade, hybrid fallback and configurable rules; engine queries budget utilization and emits new observability events for auto-selection, budget downgrade/unavailable, hybrid fallback, rule misses, and unknown loop types.
  • Documentation

    • Agent orchestration docs and README updated to describe auto-selection and the revised execution pipeline.
  • Bug Fixes

  • Tests

    • Extensive unit and integration tests covering selection logic, budget interactions, fallbacks, validation, and resume paths.

Walkthrough

Adds automatic per-task execution-loop selection based on task complexity with budget-aware downgrade and hybrid fallback. Introduces a new loop_selector module and re-exports, integrates auto-loop config into AgentEngine (per-task resolution and resume wiring), adds budget utilization querying, new observability events, tests, docs updates, and CI vulnerability ignore entries.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: CLAUDE.md, README.md, docs/design/engine.md | Documented execution-loop auto-selection behavior, three-step selection (rule match → budget-aware downgrade → hybrid fallback), default_loop_type, and AgentEngine “Resolve execution loop” step. |
| Auto-loop Selection Core: src/synthorg/engine/loop_selector.py, src/synthorg/engine/__init__.py | New module and re-exports: AutoLoopRule, AutoLoopConfig, DEFAULT_AUTO_LOOP_RULES, select_loop_type(), build_execution_loop() implementing rule validation, selection logic (including budget downgrade and hybrid fallback), and loop factory creation. |
| AgentEngine Integration: src/synthorg/engine/agent_engine.py | Adds auto_loop_config to constructor (mutually exclusive with execution_loop), per-execution _resolve_loop(...) that dry-runs selection, optionally queries BudgetEnforcer.get_budget_utilization_pct(), builds the concrete loop via build_execution_loop, threads resolved loop through execution and resume paths, and emits new execution events. |
| Budget & Observability: src/synthorg/budget/enforcer.py, src/synthorg/observability/events/budget.py, src/synthorg/observability/events/execution.py | Adds BudgetEnforcer.get_budget_utilization_pct() with error handling and observability events; introduces new budget and execution-loop event constants used during selection and error paths. |
| Tests: tests/unit/budget/test_enforcer.py, tests/unit/engine/test_loop_selector.py, tests/unit/engine/test_agent_engine_auto_loop.py | New unit tests covering budget utilization behavior, select_loop_type rules/validation/fallbacks, build_execution_loop, AgentEngine auto-loop integration (including resume path), and emitted observability events. |
| Config / CI ignores: .github/.grype.yaml, .github/.trivyignore.yaml | Added CVE-2026-32767 ignore entries for vulnerability scanners (configuration-only changes). |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AgentEngine
    participant LoopSelector
    participant BudgetEnforcer
    participant LoopFactory
    participant ExecutionLoop

    Client->>AgentEngine: run(task, auto_loop_config)
    AgentEngine->>AgentEngine: _resolve_loop(task)
    AgentEngine->>LoopSelector: select_loop_type(complexity, rules, budget_utilization=None)
    LoopSelector-->>AgentEngine: loop_type
    alt loop_type == "hybrid" and BudgetEnforcer present
        AgentEngine->>BudgetEnforcer: get_budget_utilization_pct()
        BudgetEnforcer-->>AgentEngine: budget_pct or None
        AgentEngine->>LoopSelector: select_loop_type(complexity, rules, budget_pct)
        LoopSelector-->>AgentEngine: final_loop_type
    end
    AgentEngine->>LoopFactory: build_execution_loop(final_loop_type, ...)
    LoopFactory-->>AgentEngine: ExecutionLoop instance
    AgentEngine->>ExecutionLoop: execute(task)
    ExecutionLoop-->>AgentEngine: result
    AgentEngine-->>Client: AgentRunResult
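
The two-pass resolution in the diagram can be sketched as below. This is a simplified illustration under stated assumptions: `resolve_loop_type`, `RULES`, `THRESHOLD_PCT`, and `fake_utilization` are hypothetical names, the hybrid fallback step is left out, and the budget query is modeled as a plain async callable rather than a BudgetEnforcer.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

RULES = {
    "simple": "react",
    "medium": "plan_execute",
    "complex": "hybrid",
    "epic": "hybrid",
}
THRESHOLD_PCT = 80.0


def select_loop_type(complexity: str, budget_pct: Optional[float] = None) -> str:
    loop_type = RULES.get(complexity, "react")
    if (
        loop_type == "hybrid"
        and budget_pct is not None
        and budget_pct >= THRESHOLD_PCT
    ):
        return "plan_execute"
    return loop_type


async def resolve_loop_type(
    complexity: str,
    get_utilization: Optional[Callable[[], Awaitable[Optional[float]]]] = None,
) -> str:
    # First pass: dry run with no budget information.
    loop_type = select_loop_type(complexity)
    # Only a "hybrid" result can be downgraded, so only then query the budget.
    if loop_type == "hybrid" and get_utilization is not None:
        budget_pct = await get_utilization()
        loop_type = select_loop_type(complexity, budget_pct)
    return loop_type


calls: List[int] = []


async def fake_utilization() -> Optional[float]:
    calls.append(1)  # record each budget query
    return 92.5


simple_result = asyncio.run(resolve_loop_type("simple", fake_utilization))
complex_result = asyncio.run(resolve_loop_type("complex", fake_utilization))
```

The dry run keeps the budget lookup off the path for simple and medium tasks, which is why `calls` ends up with a single entry here.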

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 67.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and specifically summarizes the main change: implementing automatic execution loop selection based on task complexity, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description comprehensively covers the changeset: auto-selection logic, budget-aware downgrade, new models/configs, BudgetEnforcer method, AgentEngine integration, test coverage, and review status. |
| Linked Issues check | ✅ Passed | The PR fully implements issue #200 objectives: auto-selection mapping complexity to loop types [loop_selector.py], configurable rules via AutoLoopConfig, budget-aware downgrade via get_budget_utilization_pct [enforcer.py], AgentEngine integration with per-task resolution, and comprehensive validation. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the linked issue #200: loop auto-selection, budget integration, and AgentEngine refactoring. Two CVE ignores added to .github configs are unrelated maintenance but minimal and non-functional. |


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a sophisticated, automated system for dynamically selecting the most appropriate execution loop for agent tasks. By considering both task complexity and current budget utilization, the system optimizes resource allocation and operational efficiency, ensuring that agents use the most suitable strategy for their given workload while also managing costs. This enhancement significantly improves the adaptability and intelligence of the agent engine.

Highlights

  • Automatic Execution Loop Selection: Implemented automatic execution loop selection that maps task complexity (simple, medium, complex/epic) to the optimal loop type (ReAct, Plan-and-Execute, Hybrid). Hybrid currently falls back to Plan-and-Execute until fully implemented.
  • Budget-Aware Downgrade: Introduced a budget-aware mechanism that downgrades complex tasks from Hybrid to Plan-and-Execute when monthly budget utilization exceeds a configurable threshold (default 80%), aiming to conserve budget.
  • New Configuration Model: Added a new AutoLoopConfig (a frozen Pydantic model) to configure auto-selection rules, budget thresholds, and hybrid fallback behavior.
  • Budget Utilization Query: Provided a new BudgetEnforcer.get_budget_utilization_pct() method for querying the current monthly budget utilization state.
  • AgentEngine Integration: Updated AgentEngine to accept auto_loop_config (mutually exclusive with execution_loop) and dynamically resolve the appropriate execution loop for each task in its _execute() method.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 18, 2026 22:34 — with GitHub Actions Inactive
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a well-designed feature for automatically selecting the agent execution loop based on task complexity and budget utilization. The implementation is robust, with new configuration models, a dedicated selection module, and comprehensive tests. My review identifies a critical syntax error and a significant logic bug in the checkpoint resume path that should be addressed. I've also included a couple of medium-severity suggestions to improve logging and future flexibility.

            monthly_cost = await self._cost_tracker.get_total_cost(
                start=period_start,
            )
        except MemoryError, RecursionError:

critical

The except MemoryError, RecursionError: syntax is from Python 2. For Python 3, multiple exceptions in a single except block must be grouped in a tuple. This will cause a SyntaxError at runtime.

        except (MemoryError, RecursionError):
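
For context, the error handling around this clause can be sketched as follows, based on the behaviors listed in the PR's test plan (correct percentage, over-budget, tracker failure, MemoryError propagation). The function name, its parameters, and the choice to return None for a non-positive budget or a tracker failure are assumptions, not the actual BudgetEnforcer code. The sketch uses the parenthesized tuple form because it parses on every Python 3 release; the project guidelines quoted in this thread call for the unparenthesized PEP 758 form on Python 3.14.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def get_budget_utilization_pct_sketch(
    get_total_cost: Callable[[], Awaitable[float]],
    monthly_budget: float,
) -> Optional[float]:
    # Assumption: a non-positive budget means utilization is undefined.
    if monthly_budget <= 0:
        return None
    try:
        monthly_cost = await get_total_cost()
    except (MemoryError, RecursionError):
        # Non-recoverable errors propagate to the caller unchanged.
        raise
    except Exception:
        # Assumption: a tracker failure yields "unknown"; real code would log it.
        return None
    return monthly_cost / monthly_budget * 100.0


async def quarter_spent() -> float:
    return 25.0


async def broken_tracker() -> float:
    raise RuntimeError("tracker unavailable")


async def out_of_memory() -> float:
    raise MemoryError


pct = asyncio.run(get_budget_utilization_pct_sketch(quarter_spent, 100.0))
unknown = asyncio.run(get_budget_utilization_pct_sketch(broken_tracker, 100.0))
```

Returning None rather than raising on a tracker failure lets the caller skip the budget-aware downgrade instead of failing the whole task.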

        )

-        loop = self._make_loop_with_callback(agent_id, task_id)
+        loop = self._make_loop_with_callback(self._loop, agent_id, task_id)

high

When resuming from a checkpoint, the execution loop is hardcoded to self._loop, which bypasses the new auto-selection logic. If a task was using an auto-selected loop (e.g., plan_execute for a complex task) and crashed, it would be resumed with the default loop (e.g., ReactLoop), which could lead to incorrect behavior. The loop for a resumed execution should also be resolved dynamically using _resolve_loop based on the task's complexity.

        resolved_loop = await self._resolve_loop(checkpoint_ctx.task_execution.task)
        loop = self._make_loop_with_callback(resolved_loop, agent_id, task_id)
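
The mismatch described above can be shown with a toy model. Everything here is hypothetical (`TaskSketch`, `resolve_loop`, and both resume helpers are illustrative stand-ins, and loops are represented by their type names only); it is a sketch of the concern, not of the engine's resume code.

```python
from dataclasses import dataclass

# Assumed mapping after hybrid fallback: complex tasks run under plan_execute.
RULES = {"simple": "react", "medium": "plan_execute", "complex": "plan_execute"}


@dataclass(frozen=True)
class TaskSketch:
    task_id: str
    estimated_complexity: str


def resolve_loop(task: TaskSketch, default_loop: str = "react") -> str:
    """Pick the loop type for a task from its complexity."""
    return RULES.get(task.estimated_complexity, default_loop)


def resume_with_default(task: TaskSketch, default_loop: str = "react") -> str:
    # Variant flagged in the review: the task is ignored on resume.
    return default_loop


def resume_with_resolution(task: TaskSketch, default_loop: str = "react") -> str:
    # Suggested variant: re-resolve from the checkpointed task.
    return resolve_loop(task, default_loop)


task = TaskSketch(task_id="t-1", estimated_complexity="complex")
original = resolve_loop(task)
```

Re-resolving only reproduces the original choice if the inputs are unchanged; if budget utilization has crossed the threshold in the meantime, a persisted loop type on the checkpoint would be the more reliable source.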

Comment on lines +1005 to +1009
                    note=(
                        "budget enforcer present but utilization "
                        "unknown; proceeding without budget-aware "
                        "loop downgrade"
                    ),

medium

The log message for unavailable budget utilization is a bit verbose and wrapped in unnecessary parentheses. It can be simplified for better readability in logs.

                    note="budget utilization unknown; proceeding without budget-aware loop downgrade"

Comment on lines +155 to +158
    loop_type = next(
        (r.loop_type for r in rules if r.complexity == complexity),
        "react",
    )

medium

The fallback loop type for when no rule matches a task's complexity is hardcoded to "react". This could be made more flexible by adding a default_loop_type field to AutoLoopConfig, similar to hybrid_fallback. This would allow users to configure a different default if desired, for instance, defaulting to plan_execute for any unmapped complexities.
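
A minimal sketch of that suggestion, assuming a `default_loop_type` field with "react" as its default. `RuleSketch`, `ConfigSketch`, and `match_rule` are hypothetical names, and frozen dataclasses stand in for the Pydantic models.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RuleSketch:
    complexity: str
    loop_type: str


@dataclass(frozen=True)
class ConfigSketch:
    rules: Tuple[RuleSketch, ...]
    default_loop_type: str = "react"  # used when no rule matches


def match_rule(complexity: str, config: ConfigSketch) -> str:
    # Same next(...) lookup as in the diff, with a configurable fallback.
    return next(
        (r.loop_type for r in config.rules if r.complexity == complexity),
        config.default_loop_type,
    )


rules = (RuleSketch("simple", "react"), RuleSketch("medium", "plan_execute"))
lenient = ConfigSketch(rules=rules)
cautious = ConfigSketch(rules=rules, default_loop_type="plan_execute")
```

Keeping "react" as the field's default preserves the current behavior for existing configs while letting callers opt into a different fallback.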

@codecov

codecov bot commented Mar 18, 2026

Codecov Report

❌ Patch coverage is 93.98496% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 92.66%. Comparing base (9b6bf33) to head (2c232a3).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/synthorg/engine/loop_selector.py | 91.56% | 5 Missing and 2 partials ⚠️ |
| src/synthorg/engine/agent_engine.py | 96.15% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #567      +/-   ##
==========================================
+ Coverage   92.64%   92.66%   +0.01%     
==========================================
  Files         544      545       +1     
  Lines       26931    27061     +130     
  Branches     2582     2603      +21     
==========================================
+ Hits        24951    25075     +124     
  Misses       1568     1568              
- Partials      412      418       +6     

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 903-904: The resume path is rebuilding the loop using self._loop
(which may be the default ReactLoop) causing checkpoints created under a
different loop like PlanExecuteLoop to resume under the wrong loop; change the
resume code in the resume path (where self._make_loop_with_callback and
loop.execute are used) to resolve/recreate the original loop type from persisted
checkpoint metadata (preferably stored on checkpoint_ctx, e.g.
checkpoint_ctx.task_execution.task or an explicit loop_type field) before
injecting the checkpoint callback, and pass that reconstructed loop into
self._make_loop_with_callback instead of self._loop so loop-specific state and
callbacks are preserved (also apply same change to the other resume locations
around the 987-1031 range).

In `@src/synthorg/engine/loop_selector.py`:
- Around line 40-41: The AutoLoopConfig should validate provided loop_type and
hybrid_fallback against the allowed set _KNOWN_LOOP_TYPES as soon as the config
is constructed (instead of letting build_execution_loop fail later); add a
validation guard in AutoLoopConfig (e.g., in its __post_init__ or factory
method) that checks loop_type is in _KNOWN_LOOP_TYPES and, if hybrid_fallback is
provided, that it is also in _KNOWN_LOOP_TYPES (and disallow None if "hybrid"
cannot be built), raising a clear ValueError for invalid values; update any
callers that construct AutoLoopConfig to rely on this and remove downstream
assumptions in build_execution_loop or factory code (reference symbols:
AutoLoopConfig, loop_type, hybrid_fallback, _KNOWN_LOOP_TYPES,
build_execution_loop).
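
Construction-time validation of the kind requested for loop_selector.py could look roughly like this. It is a sketch under assumptions: the contents of `_KNOWN_LOOP_TYPES` are guessed from the loop names used in this PR, frozen dataclasses with `__post_init__` stand in for the Pydantic models (where a model validator would play the same role), and the rule about disallowing a None fallback is not covered.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed set of loop type names; the real constant may differ.
_KNOWN_LOOP_TYPES = frozenset({"react", "plan_execute", "hybrid"})


def _check_loop_type(field_name: str, value: str) -> None:
    if value not in _KNOWN_LOOP_TYPES:
        known = ", ".join(sorted(_KNOWN_LOOP_TYPES))
        raise ValueError(f"{field_name}={value!r} is not one of: {known}")


@dataclass(frozen=True)
class AutoLoopRuleSketch:
    complexity: str
    loop_type: str

    def __post_init__(self) -> None:
        # Fail at construction instead of later in the loop factory.
        _check_loop_type("loop_type", self.loop_type)


@dataclass(frozen=True)
class AutoLoopConfigSketch:
    rules: Tuple[AutoLoopRuleSketch, ...] = ()
    hybrid_fallback: Optional[str] = "plan_execute"

    def __post_init__(self) -> None:
        if self.hybrid_fallback is not None:
            _check_loop_type("hybrid_fallback", self.hybrid_fallback)


ok = AutoLoopConfigSketch(rules=(AutoLoopRuleSketch("simple", "react"),))
```

Raising at construction means a typo such as "plan-execute" surfaces when the config is loaded rather than on the first task that hits the affected rule.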
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0665d678-2aab-4e6a-b3c7-1f434f03681f

📥 Commits

Reviewing files that changed from the base of the PR and between ae01b06 and 79d4708.

📒 Files selected for processing (12)
  • CLAUDE.md
  • README.md
  • docs/design/engine.md
  • src/synthorg/budget/enforcer.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/budget/test_enforcer.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (10)
{src/synthorg/**/*.py,tests/**/*.py,**/*.md,web/**/{*.ts,*.js,*.vue}}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths/modules

Files:

  • README.md
  • docs/design/engine.md
  • src/synthorg/budget/enforcer.py
  • CLAUDE.md
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
  • tests/unit/budget/test_enforcer.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Markdown: use for all documentation files (docs/, site/, README, etc.)

Files:

  • README.md
  • docs/design/engine.md
  • CLAUDE.md
docs/design/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/design/*.md: When approved deviations occur, update the relevant docs/design/ page to reflect the new reality
Design spec pages: 7 pages in docs/design/ — index, agents, organization, communication, engine, memory, operations

Files:

  • docs/design/engine.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14
Type hints required on all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr for all identifier/name fields including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Line length: 88 characters (ruff)
Functions should be < 50 lines, files < 800 lines
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
  • tests/unit/budget/test_enforcer.py
**/{*.py,*.go}

📄 CodeRabbit inference engine (CLAUDE.md)

Handle errors explicitly, never silently swallow them

Files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
  • tests/unit/budget/test_enforcer.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code
Variable name for logger: always logger (not _logger, not log)
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
Library reference: auto-generated from docstrings via mkdocstrings + Griffe (AST-based, no imports) in docs/api/
Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP, tool factory, sandbox factory)

Files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
src/synthorg/budget/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Files:

  • src/synthorg/budget/enforcer.py
src/synthorg/observability/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/observability/events/budget.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
Prefer @pytest.mark.parametrize for testing similar cases
Never skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins

Files:

  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/budget/test_enforcer.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Files:

  • src/synthorg/engine/loop_selector.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
🧠 Learnings (14)
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • README.md
  • docs/design/engine.md
  • CLAUDE.md
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/__init__.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/budget.py
  • tests/unit/budget/test_enforcer.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • CLAUDE.md
  • src/synthorg/engine/__init__.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/communication/**/*.py : Communication package (communication/): message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution; meeting/ subpackage for meeting protocol (round-robin, position papers, structured phases), scheduler (frequency, participant resolver), orchestrator

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue — DESIGN_SPEC.md is a pointer file linking to 7 design pages (Agents, Organization, Communication, Engine, Memory, Operations)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/backup/**/*.py : Backup package (backup/): scheduled/manual/lifecycle backups of persistence DB, agent memory, company config. BackupService orchestrator, BackupScheduler (periodic asyncio task), RetentionManager (count + age pruning), tar.gz compression, SHA-256 checksums, manifest tracking, validated restore with atomic rollback and safety backup. handlers/ subpackage: ComponentHandler protocol + concrete handlers (PersistenceComponentHandler, MemoryComponentHandler, ConfigComponentHandler)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/observability/events/budget.py
🧬 Code graph analysis (5)
tests/unit/engine/test_loop_selector.py (1)
src/synthorg/engine/loop_selector.py (4)
  • AutoLoopConfig (76-116)
  • AutoLoopRule (44-56)
  • build_execution_loop (194-236)
  • select_loop_type (119-191)
src/synthorg/engine/loop_selector.py (4)
src/synthorg/engine/plan_execute_loop.py (1)
  • PlanExecuteLoop (84-930)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionLoop (158-196)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
src/synthorg/engine/agent_engine.py (6)
src/synthorg/engine/loop_selector.py (3)
  • AutoLoopConfig (76-116)
  • build_execution_loop (194-236)
  • select_loop_type (119-191)
src/synthorg/engine/react_loop.py (3)
  • stagnation_detector (104-106)
  • get_loop_type (113-115)
  • approval_gate (99-101)
src/synthorg/engine/plan_execute_loop.py (3)
  • stagnation_detector (131-133)
  • get_loop_type (140-142)
  • approval_gate (126-128)
src/synthorg/engine/loop_protocol.py (2)
  • get_loop_type (194-196)
  • ExecutionLoop (158-196)
src/synthorg/engine/checkpoint/resume.py (1)
  • make_loop_with_callback (99-144)
src/synthorg/budget/enforcer.py (1)
  • get_budget_utilization_pct (96-130)
src/synthorg/engine/__init__.py (1)
src/synthorg/engine/loop_selector.py (4)
  • AutoLoopConfig (76-116)
  • AutoLoopRule (44-56)
  • build_execution_loop (194-236)
  • select_loop_type (119-191)
tests/unit/budget/test_enforcer.py (3)
tests/unit/budget/test_enforcer_quota.py (1)
  • _patch_periods (61-75)
tests/unit/budget/conftest.py (1)
  • make_cost_record (288-309)
src/synthorg/budget/enforcer.py (2)
  • cost_tracker (92-94)
  • get_budget_utilization_pct (96-130)
🪛 markdownlint-cli2 (0.21.0)
docs/design/engine.md

[warning] 421-421: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🔇 Additional comments (11)
tests/unit/budget/test_enforcer.py (1)

1088-1161: LGTM! Comprehensive test coverage for get_budget_utilization_pct.

The test suite covers all key scenarios: correct percentage calculation, disabled budget (returns None), zero spend, over-budget (>100%), graceful degradation on tracker failure, and MemoryError propagation. Good use of pytest.approx for float comparisons and proper isolation via _patch_periods.

CLAUDE.md (1)

127-127: LGTM! Documentation accurately reflects the new auto-loop selection feature.

The engine package description now includes the loop selector module and its public API surface (AutoLoopConfig, AutoLoopRule, select_loop_type, build_execution_loop), aligning with the implementation in loop_selector.py.

README.md (1)

35-35: LGTM! README accurately reflects the new auto-selection capability.

The concise addition of "auto-selection by complexity" correctly summarizes the feature for end users.

src/synthorg/observability/events/budget.py (1)

34-36: LGTM! New budget utilization event constants follow established patterns.

The constants BUDGET_UTILIZATION_QUERIED and BUDGET_UTILIZATION_ERROR are correctly typed with Final[str] and follow the existing budget.<domain>.<action> naming convention.

src/synthorg/engine/__init__.py (1)

121-127: LGTM! Public API surface correctly expanded for auto-loop selection.

The engine package properly re-exports the new loop selector symbols (AutoLoopConfig, AutoLoopRule, DEFAULT_AUTO_LOOP_RULES, build_execution_loop, select_loop_type) with alphabetically sorted __all__ entries.

Also applies to: 213-213, 237-238, 375-375, 382-382

src/synthorg/budget/enforcer.py (1)

96-130: LGTM! Well-implemented budget utilization query with proper error handling.

The method follows established patterns in the class:

  • Graceful degradation on tracker failures (returns None)
  • MemoryError/RecursionError properly re-raised
  • Structured logging with domain-specific event constants
  • Clear docstring explaining return semantics
src/synthorg/observability/events/execution.py (1)

75-82: LGTM! New auto-selection event constants follow established patterns.

The six new EXECUTION_LOOP_* constants are consistently named, properly typed with Final[str], and cover the key decision points in the auto-loop selection flow (auto-selected, budget downgrade, hybrid fallback, no rule match, unknown type, budget unavailable).

docs/design/engine.md (2)

417-428: LGTM! Auto-selection documentation clearly explains the three-layer decision process.

The tip block now accurately describes:

  1. Rule matching (complexity → loop type)
  2. Budget-aware downgrade (hybrid → plan_execute when utilization ≥ threshold)
  3. Hybrid fallback (until HybridLoop is implemented)

This aligns with the implementation in select_loop_type().


475-493: LGTM! Pipeline steps updated to reflect auto-loop resolution.

The new step 8 "Resolve execution loop" clearly documents the dynamic loop selection via select_loop_type() with budget utilization query, and subsequent steps are properly renumbered.

tests/unit/engine/test_loop_selector.py (1)

168-193: Good precedence coverage.

These cases pin the budget downgrade -> hybrid fallback ordering, which is exactly the kind of selector behavior that tends to regress when new branches get added.

tests/unit/engine/test_agent_engine_auto_loop.py (1)

150-253: Useful end-to-end budget-selection coverage.

Driving the real engine and asserting EXECUTION_LOOP_AUTO_SELECTED gives good protection against selector/engine wiring regressions.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (2)
src/synthorg/engine/loop_selector.py (1)

122-143: ⚠️ Potential issue | 🟠 Major

The validator still allows runtime-only hybrid failures.

It now rejects unknown loop names, but configs like the default rules plus hybrid_fallback=None or hybrid_fallback="hybrid" still pass construction even though build_execution_loop() cannot instantiate "hybrid". That means the first complex/epic task fails only at execution time. Reject any config whose reachable output set still includes "hybrid" until HybridLoop exists.

Suggested validation guard
+_BUILDABLE_LOOP_TYPES: frozenset[str] = frozenset({"react", "plan_execute"})
+
 class AutoLoopConfig(BaseModel):
@@
     @model_validator(mode="after")
     def _validate_rules_and_fallbacks(self) -> Self:
         """Validate unique complexities and known loop types."""
         seen: set[Complexity] = set()
         for rule in self.rules:
@@
                 msg = f"Unknown loop type in rules: {rule.loop_type!r}"
                 raise ValueError(msg)
             seen.add(rule.complexity)
+        hybrid_reachable = (
+            self.default_loop_type == "hybrid"
+            or any(rule.loop_type == "hybrid" for rule in self.rules)
+        )
         if (
             self.hybrid_fallback is not None
-            and self.hybrid_fallback not in _KNOWN_LOOP_TYPES
+            and self.hybrid_fallback not in _BUILDABLE_LOOP_TYPES
         ):
-            msg = f"Unknown hybrid_fallback: {self.hybrid_fallback!r}"
+            msg = f"Unsupported hybrid_fallback: {self.hybrid_fallback!r}"
             raise ValueError(msg)
+        if hybrid_reachable and self.hybrid_fallback is None:
+            msg = "hybrid_fallback=None is unsupported until HybridLoop exists"
+            raise ValueError(msg)
         if self.default_loop_type not in _KNOWN_LOOP_TYPES:
             msg = f"Unknown default_loop_type: {self.default_loop_type!r}"
             raise ValueError(msg)
         return self

As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/loop_selector.py` around lines 122 - 143, The validator
_validate_rules_and_fallbacks currently only checks membership in
_KNOWN_LOOP_TYPES, allowing configurations (rules, hybrid_fallback,
default_loop_type) that include the "hybrid" name to pass even though
build_execution_loop cannot instantiate a HybridLoop; update the validator to
treat "hybrid" as currently unavailable by rejecting any configuration where any
reachable loop name (from each rule.loop_type, hybrid_fallback if not None, and
default_loop_type) equals "hybrid" (or more generally any name in a new
UNAVAILABLE_LOOP_TYPES set) and raise a ValueError with a clear message; refer
to symbols _validate_rules_and_fallbacks, rules, rule.loop_type,
hybrid_fallback, default_loop_type, build_execution_loop, and HybridLoop so the
check prevents configs that would fail at execution time.
src/synthorg/engine/agent_engine.py (1)

909-914: ⚠️ Potential issue | 🟠 Major

Resume still recomputes the loop from live state.

This uses the current budget/config instead of the loop that produced the checkpoint. With a custom hybrid_fallback="react", a complex task can checkpoint under ReactLoop and resume under PlanExecuteLoop once budget crosses the threshold, or the reverse. That replays checkpoint state under a different loop family. Persist the selected loop type with the checkpoint and rebuild from that value on resume.
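
One way to picture the suggested fix is sketched below, assuming the checkpoint can carry a plain string. `SketchCheckpoint`, the stub loop classes, and the factory mapping are hypothetical names for illustration. Only `get_loop_type()` corresponds to a method referenced in this review.

```python
from dataclasses import dataclass


class StubReactLoop:
    """Stand-in for a ReAct-style loop (not the real ReactLoop)."""

    def get_loop_type(self) -> str:
        return "react"


class StubPlanExecuteLoop:
    """Stand-in for a plan-and-execute loop (not the real PlanExecuteLoop)."""

    def get_loop_type(self) -> str:
        return "plan_execute"


# Hypothetical factory mapping from persisted name to loop class.
LOOP_FACTORIES = {
    "react": StubReactLoop,
    "plan_execute": StubPlanExecuteLoop,
}


@dataclass(frozen=True)
class SketchCheckpoint:
    """Hypothetical checkpoint record that remembers its loop family."""

    task_id: str
    loop_type: str


def make_checkpoint(task_id: str, loop) -> SketchCheckpoint:
    # Record the loop that is actually running, not the current config.
    return SketchCheckpoint(task_id=task_id, loop_type=loop.get_loop_type())


def rebuild_loop(checkpoint: SketchCheckpoint):
    # Resume under the persisted loop type; live budget state is ignored.
    try:
        factory = LOOP_FACTORIES[checkpoint.loop_type]
    except KeyError:
        msg = f"Cannot resume unknown loop type: {checkpoint.loop_type!r}"
        raise ValueError(msg) from None
    return factory()
```

With this shape, a task that checkpointed under the ReAct-style loop resumes under the same family even if budget utilization has crossed the threshold in the meantime.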

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 909 - 914, Resume currently
recomputes the loop from live config via _resolve_loop, which can replay a
checkpoint under a different loop family (e.g., ReactLoop vs PlanExecuteLoop);
persist the loop identity with the checkpoint (e.g., checkpoint_ctx.loop_type or
similar) when creating a checkpoint, and on resume use that persisted loop_type
to reconstruct the original base loop instead of calling _resolve_loop. Update
the checkpoint creation path to record the loop type and update the resume path
around checkpoint_ctx.task_execution / base_loop selection to rebuild the exact
loop class (via the existing loop factory/mapping used by _resolve_loop or a
small switch handling ReactLoop/PlanExecuteLoop/hybrid_fallback) before calling
_make_loop_with_callback(agent_id, task_id).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/engine.md`:
- Around line 417-433: The admonition under "Auto-selection" is being parsed as
an indented code block because the numbered list and subsequent lines are
indented; remove the leading indentation so the content is normal prose.
Specifically, unindent the numbered list and the paragraphs referencing
execution_loop: "auto", AutoLoopConfig.rules, AutoLoopRule, hybrid_fallback, and
default_loop_type so they align directly under the admonition header (ensure
there's a blank line after the tip header), use standard Markdown list
indentation (no 4-space block), and verify the three numbered items render as a
normal ordered list rather than a code fence.

In `@tests/unit/engine/test_agent_engine_auto_loop.py`:
- Around line 331-358: The test currently calls _resolve_loop() directly instead
of exercising the resume path; update the test to call
AgentEngine._execute_resumed_loop() and ensure the resume path awaits
_resolve_loop by either (a) providing a minimal checkpoint-like context so
_execute_resumed_loop runs through the resume branch, or (b)
patching/monkeypatching AgentEngine._resolve_loop with an async mock/coroutine
that records when it's awaited and returns a loop, then call
engine._execute_resumed_loop(task, ...) and assert the mock was awaited and
returned value used (e.g., loop.get_loop_type() or log event); reference the
methods _execute_resumed_loop and _resolve_loop and keep the existing assertions
about the EXECUTION_LOOP_AUTO_SELECTED log entry.
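
Option (b) above could look roughly like this. It is a sketch against a stub engine, not the real AgentEngine: `StubEngine` and its one-argument method signatures are assumptions, and only the patching pattern with `unittest.mock.AsyncMock` is the point.

```python
import asyncio
from unittest.mock import AsyncMock, Mock


class StubEngine:
    """Hypothetical engine exposing the two methods named in the comment."""

    async def _resolve_loop(self, task):
        raise NotImplementedError("replaced by a mock in the test")

    async def _execute_resumed_loop(self, task):
        # The resume path is expected to await _resolve_loop.
        loop = await self._resolve_loop(task)
        return loop.get_loop_type()


def check_resume_awaits_resolve_loop() -> str:
    engine = StubEngine()

    fake_loop = Mock()
    fake_loop.get_loop_type.return_value = "plan_execute"

    # Replace the coroutine method with an awaitable mock on the instance.
    engine._resolve_loop = AsyncMock(return_value=fake_loop)

    result = asyncio.run(engine._execute_resumed_loop("task-1"))

    # Fails if the resume path never awaited _resolve_loop.
    engine._resolve_loop.assert_awaited_once_with("task-1")
    return result
```

Asserting on the await, rather than calling `_resolve_loop()` directly, is what ties the test to the resume path itself.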

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3e2b2f75-d27f-48c4-9a7f-875dddb2bfae

📥 Commits

Reviewing files that changed from the base of the PR and between 79d4708 and 1e690af.

📒 Files selected for processing (7)
  • CLAUDE.md
  • docs/design/engine.md
  • src/synthorg/budget/enforcer.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (9)
{src/synthorg/**/*.py,tests/**/*.py,**/*.md,web/**/{*.ts,*.js,*.vue}}

📄 CodeRabbit inference engine (CLAUDE.md)

Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths/modules

Files:

  • CLAUDE.md
  • src/synthorg/budget/enforcer.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
  • docs/design/engine.md
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Markdown: use for all documentation files (docs/, site/, README, etc.)

Files:

  • CLAUDE.md
  • docs/design/engine.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14
Type hints required on all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr for all identifier/name fields including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Line length: 88 characters (ruff)
Functions should be < 50 lines, files < 800 lines
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/synthorg/budget/enforcer.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
**/{*.py,*.go}

📄 CodeRabbit inference engine (CLAUDE.md)

Handle errors explicitly, never silently swallow them

Files:

  • src/synthorg/budget/enforcer.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code
Variable name for logger: always logger (not _logger, not log)
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging for object creation, internal flow, entry/exit of key functions
Library reference: auto-generated from docstrings via mkdocstrings + Griffe (AST-based, no imports) in docs/api/
Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP, tool factory, sandbox factory)

Files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
src/synthorg/budget/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Files:

  • src/synthorg/budget/enforcer.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Async testing: asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
Prefer @pytest.mark.parametrize for testing similar cases
Never skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins

Files:

  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
src/synthorg/engine/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
docs/design/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

docs/design/*.md: When approved deviations occur, update the relevant docs/design/ page to reflect the new reality
Design spec pages: 7 pages in docs/design/ — index, agents, organization, communication, engine, memory, operations

Files:

  • docs/design/engine.md
🧠 Learnings (13)
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • CLAUDE.md
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
  • docs/design/engine.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/communication/**/*.py : Communication package (communication/): message bus, dispatcher, messenger, channels, delegation, loop prevention, conflict resolution; meeting/ subpackage for meeting protocol (round-robin, position papers, structured phases), scheduler (frequency, participant resolver), orchestrator

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue — DESIGN_SPEC.md is a pointer file linking to 7 design pages (Agents, Organization, Communication, Engine, Memory, Operations)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/backup/**/*.py : Backup package (backup/): scheduled/manual/lifecycle backups of persistence DB, agent memory, company config. BackupService orchestrator, BackupScheduler (periodic asyncio task), RetentionManager (count + age pruning), tar.gz compression, SHA-256 checksums, manifest tracking, validated restore with atomic rollback and safety backup. handlers/ subpackage: ComponentHandler protocol + concrete handlers (PersistenceComponentHandler, MemoryComponentHandler, ConfigComponentHandler)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/enforcer.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-18T21:35:45.198Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:35:45.198Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/loop_selector.py
🧬 Code graph analysis (3)
tests/unit/engine/test_agent_engine_auto_loop.py (4)
src/synthorg/budget/enforcer.py (2)
  • BudgetEnforcer (56-473)
  • cost_tracker (89-91)
tests/unit/engine/conftest.py (4)
  • engine (449-460)
  • MockCompletionProvider (207-289)
  • make_completion_response (292-310)
  • mock_provider_factory (314-316)
src/synthorg/engine/agent_engine.py (3)
  • AgentEngine (130-1296)
  • run (287-397)
  • _resolve_loop (998-1044)
src/synthorg/engine/loop_selector.py (1)
  • AutoLoopConfig (83-143)
src/synthorg/engine/agent_engine.py (3)
src/synthorg/engine/loop_selector.py (3)
  • AutoLoopConfig (83-143)
  • build_execution_loop (228-272)
  • select_loop_type (146-225)
src/synthorg/engine/checkpoint/resume.py (1)
  • make_loop_with_callback (99-144)
src/synthorg/budget/enforcer.py (1)
  • get_budget_utilization_pct (93-123)
src/synthorg/engine/loop_selector.py (5)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/plan_execute_loop.py (1)
  • PlanExecuteLoop (84-930)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionLoop (158-196)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
🪛 markdownlint-cli2 (0.21.0)
docs/design/engine.md

[warning] 421-421: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
tests/unit/engine/test_agent_engine_auto_loop.py (1)

357-385: ⚠️ Potential issue | 🟡 Minor

Assert that the resolved loop is the one that actually runs.

resolve_mock.assert_awaited_once() only proves the resume path looked up a loop. If _execute_resumed_loop() regresses to self._loop after that await, this test can still pass. Patch resolved_loop.execute (or return a loop double) and assert that exact mock was awaited.

🧪 Tighten the assertion
         resolved_loop: ExecutionLoop = PlanExecuteLoop()
         resolve_mock = AsyncMock(return_value=resolved_loop)
@@
         with (
             patch.object(engine, "_resolve_loop", resolve_mock),
             patch.object(
-                PlanExecuteLoop,
+                resolved_loop,
                 "execute",
                 new_callable=AsyncMock,
                 return_value=exec_result,
-            ),
+            ) as execute_mock,
         ):
             await engine._execute_resumed_loop(
                 checkpoint_ctx,
                 str(sample_agent_with_personality.id),
                 str(task.id),
             )
@@
         resolve_mock.assert_awaited_once()
+        execute_mock.assert_awaited_once()
         call_task = resolve_mock.call_args[0][0]
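
The difference between patching the class and patching the resolved instance can be shown with a small standalone sketch. `FakeLoop`, `FakeEngine`, and `run_check` are hypothetical stand-ins for illustration, not the project's real `ExecutionLoop` or `AgentEngine`; only the `unittest.mock` calls are real APIs.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class FakeLoop:
    """Hypothetical stand-in for an ExecutionLoop implementation."""

    async def execute(self) -> str:
        return "real"


class FakeEngine:
    """Hypothetical stand-in for the engine's resume path."""

    def __init__(self, default_loop: FakeLoop) -> None:
        self._loop = default_loop

    async def _resolve_loop(self) -> FakeLoop:
        return self._loop

    async def execute_resumed(self) -> str:
        loop = await self._resolve_loop()
        # A regression would call self._loop.execute() here instead.
        return await loop.execute()


async def run_check() -> str:
    engine = FakeEngine(default_loop=FakeLoop())
    resolved_loop = FakeLoop()
    resolve_mock = AsyncMock(return_value=resolved_loop)
    with (
        patch.object(engine, "_resolve_loop", resolve_mock),
        # Patch the instance, not the class, so only this loop is mocked.
        patch.object(
            resolved_loop,
            "execute",
            new_callable=AsyncMock,
            return_value="resumed",
        ) as execute_mock,
    ):
        result = await engine.execute_resumed()
    resolve_mock.assert_awaited_once()
    # Fails if the engine ran any loop other than resolved_loop.
    execute_mock.assert_awaited_once()
    return result
```

Because the mock is attached to `resolved_loop` only, a fallback to the engine's default loop would leave `execute_mock` unawaited and the final assertion would fail, which is the property the review asks the test to prove.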
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_agent_engine_auto_loop.py` around lines 357 - 385, The
test currently only asserts resolve_mock.assert_awaited_once(), which doesn't
prove the returned resolved_loop actually executed; update the test to patch or
replace resolved_loop.execute with an AsyncMock (instead of patching
PlanExecuteLoop.execute) and have resolve_mock return that resolved_loop; then
after awaiting engine._execute_resumed_loop(...) assert that
resolved_loop.execute (the AsyncMock) was awaited exactly once to ensure the
specific resolved loop instance was run. Use the existing resolved_loop and
resolve_mock identifiers and assert on resolved_loop.execute.await_count or
resolved_loop.execute.assert_awaited_once().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/.trivyignore.yaml:
- Around line 13-19: Update the suppression for CVE-2026-32767 in
.trivyignore.yaml to include package scope and an expiration: add a purls entry
that targets the specific package (e.g., pkg:deb/debian/libexpat@2.7.4) and add
an expired_at field set ~90 days out so the suppression is time-limited; ensure
the suppression still contains the original statement text and the CVE id
(CVE-2026-32767) so future scans only ignore this exact CVE for that specific
package until the expiration.

In `@src/synthorg/engine/loop_selector.py`:
- Around line 167-246: The select_loop_type function is too large and should be
split into small helpers: extract the rule lookup, budget downgrade, and hybrid
fallback into three private functions (e.g., _match_loop_type(rules, complexity,
default_loop_type), _downgrade_for_budget(loop_type, budget_utilization_pct,
budget_tight_threshold), and _apply_hybrid_fallback(loop_type, hybrid_fallback))
and have select_loop_type call them in order; ensure each helper preserves the
existing logging calls and semantics (use the same log events
EXECUTION_LOOP_NO_RULE_MATCH, EXECUTION_LOOP_BUDGET_DOWNGRADE,
EXECUTION_LOOP_HYBRID_FALLBACK and the same parameter names), accept and return
the same values (strings or None where applicable), and keep select_loop_type
under 50 lines by delegating rule matching, budget-aware downgrade, and hybrid
fallback to these helpers.
- Around line 148-157: The current validation in loop_selector.py incorrectly
rejects configurations where default_loop_type == "hybrid" even when
self.hybrid_fallback is provided and would redirect to a buildable loop type;
update the validation logic so that the ValueError for an unbuildable
default_loop_type is only raised if the default remains unbuildable after
applying hybrid_fallback (i.e., if self.default_loop_type not in
_BUILDABLE_LOOP_TYPES AND not (self.default_loop_type == "hybrid" and
self.hybrid_fallback in _BUILDABLE_LOOP_TYPES)). Keep the existing
has_hybrid_rule check that requires hybrid_fallback when unbuildable loop types
exist, and ensure select_loop_type and AutoLoopConfig behavior remains
consistent with this adjusted validation.

---

Duplicate comments:
In `@tests/unit/engine/test_agent_engine_auto_loop.py`:
- Around line 357-385: The test currently only asserts
resolve_mock.assert_awaited_once(), which doesn't prove the returned
resolved_loop actually executed; update the test to patch or replace
resolved_loop.execute with an AsyncMock (instead of patching
PlanExecuteLoop.execute) and have resolve_mock return that resolved_loop; then
after awaiting engine._execute_resumed_loop(...) assert that
resolved_loop.execute (the AsyncMock) was awaited exactly once to ensure the
specific resolved loop instance was run. Use the existing resolved_loop and
resolve_mock identifiers and assert on resolved_loop.execute.await_count or
resolved_loop.execute.assert_awaited_once().
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fcd9ff83-0145-4eee-9e0e-7f7428653746

📥 Commits

Reviewing files that changed from the base of the PR and between 1e690af and 30c2a9a.

📒 Files selected for processing (5)
  • .github/.grype.yaml
  • .github/.trivyignore.yaml
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
  • src/synthorg/engine/loop_selector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async tests use asyncio_mode = 'auto' — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis for property-based testing with @given + @settings. Hypothesis profiles: ci (50 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/loop_selector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/loop_selector.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Security scanning: pip-audit (Python), npm audit (web dependencies), Trivy + Grype (Docker images), govulncheck (Go), gitleaks (secrets), zizmor (GitHub Actions workflows), OSSF Scorecard (supply chain), ZAP DAST (API), Socket.dev (typosquatting/malware detection).

Applied to files:

  • .github/.grype.yaml
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
🧬 Code graph analysis (2)
tests/unit/engine/test_agent_engine_auto_loop.py (6)
src/synthorg/budget/enforcer.py (1)
  • BudgetEnforcer (56-473)
tests/unit/engine/conftest.py (3)
  • engine (449-460)
  • MockCompletionProvider (207-289)
  • make_completion_response (292-310)
src/synthorg/engine/agent_engine.py (1)
  • _execute_resumed_loop (886-926)
src/synthorg/engine/loop_selector.py (1)
  • AutoLoopConfig (85-164)
src/synthorg/engine/plan_execute_loop.py (1)
  • PlanExecuteLoop (84-930)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
tests/unit/engine/test_loop_selector.py (2)
src/synthorg/engine/loop_selector.py (4)
  • AutoLoopConfig (85-164)
  • AutoLoopRule (53-65)
  • build_execution_loop (249-293)
  • select_loop_type (167-246)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
🔇 Additional comments (1)
.github/.grype.yaml (1)

10-14: Duplicate of the same suppression-risk pattern: avoid CVE-only global ignore.

Mirror the Trivy fix here: constrain by package/artifact where possible and track an expiry/revalidation date so this exception doesn’t become permanent technical risk.

In the current Grype configuration format, what ignore-rule fields can constrain a CVE suppression to a specific package/image/namespace, and is there built-in support for expiration metadata?

Based on learnings, this repository treats Trivy and Grype as core security scanners and should keep suppressions tightly controlled.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
.github/.trivyignore.yaml (1)

14-16: ⚠️ Potential issue | 🟠 Major

Use Trivy package scoping key purls (not paths) and verify expired_at schema format.

At Line 14, this looks like a package URL scope but it is under paths. In Trivy YAML ignores, package scoping is handled via purls; using paths here can make the suppression mis-scoped or ineffective. Also confirm whether Line 16 accepts RFC3339 timestamp or requires date-only format in your Trivy version.

Suggested fix
   - id: CVE-2026-32767
-    paths:
+    purls:
       - "pkg:apk/alpine/libexpat"
-    expired_at: "2026-06-17T00:00:00Z"
+    expired_at: "2026-06-17"

#!/bin/bash
# Read-only verification for Trivy ignore schema usage in repo.
# 1) Confirm current key usage in ignore file.
# 2) Check Trivy docs for supported fields and date format.

set -euo pipefail

echo "== Current ignore entry =="
cat -n .github/.trivyignore.yaml | sed -n '10,30p'

echo
echo "== Search for purls/paths usage =="
rg -n '^\s*(purls|paths|expired_at)\s*:' .github/.trivyignore.yaml

echo
echo "== Fetch Trivy docs snippets (public) =="
curl -fsSL https://trivy.dev/latest/docs/configuration/filtering/ | rg -n 'trivyignore|purls|paths|expired_at' -C 2 || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/.trivyignore.yaml around lines 14 - 16, Replace the incorrect
"paths" key with the Trivy package scoping key "purls" for the entry that
currently reads a package URL (i.e., change the mapping that uses "paths: -
\"pkg:apk/alpine/libexpat\"" to use "purls: - \"pkg:apk/alpine/libexpat\""), and
validate/adjust the "expired_at" value to the schema your Trivy version expects
(confirm whether it requires full RFC3339 timestamp or a date-only string and
update the "expired_at" value accordingly); look for the keys "paths", "purls",
and "expired_at" to locate and fix the entry.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/loop_selector.py`:
- Around line 62-65: The Pydantic models AutoLoopRule and AutoLoopConfig
currently use model_config = ConfigDict(frozen=True) which leaves the default
extra behavior (ignore); update both to forbid unknown fields by changing their
model_config to ConfigDict(frozen=True, extra="forbid") so typos in config keys
raise errors instead of being dropped.
- Around line 53-65: AutoLoopRule currently allows any non-blank string for
loop_type though the docstring lists allowed values; add a pydantic
field_validator on AutoLoopRule.loop_type that checks the value is one of the
known options ("react","plan_execute","hybrid") and raises a ValidationError for
anything else so invalid types are rejected at construction; update error text
to mention the allowed values and ensure this prevents typos from propagating to
select_loop_type() and build_execution_loop() (AutoLoopConfig can still perform
its own validation but the model must enforce the contract).
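
Both checks can be illustrated without Pydantic. The sketch below swaps in a frozen stdlib dataclass as a stand-in for the real models, so it is not the project's implementation: `RuleSketch`, `from_mapping`, and `_KNOWN_LOOP_TYPES` are hypothetical names, and in the actual code the same contract would presumably be expressed with `ConfigDict(frozen=True, extra="forbid")` and a field validator, as the comments above suggest.

```python
from dataclasses import dataclass, fields
from typing import Any

# Allowed values as listed in the review comment.
_KNOWN_LOOP_TYPES = frozenset({"react", "plan_execute", "hybrid"})


@dataclass(frozen=True)
class RuleSketch:
    """Hypothetical stdlib stand-in for AutoLoopRule."""

    complexity: str
    loop_type: str

    def __post_init__(self) -> None:
        # Reject typos at construction instead of at loop-build time.
        if self.loop_type not in _KNOWN_LOOP_TYPES:
            allowed = ", ".join(sorted(_KNOWN_LOOP_TYPES))
            msg = f"Unknown loop_type {self.loop_type!r}; allowed: {allowed}"
            raise ValueError(msg)

    @classmethod
    def from_mapping(cls, data: dict[str, Any]) -> "RuleSketch":
        """Build from config data, refusing unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(data) - known)
        if unknown:
            msg = f"Unknown fields: {unknown}"
            raise ValueError(msg)
        return cls(**data)
```

With this shape, a misspelled key such as `loop_typ` or a misspelled value such as `plan-execute` fails when the config is loaded, not later inside `select_loop_type()` or `build_execution_loop()`.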

---

Duplicate comments:
In @.github/.trivyignore.yaml:
- Around line 14-16: Replace the incorrect "paths" key with the Trivy package
scoping key "purls" for the entry that currently reads a package URL (i.e.,
change the mapping that uses "paths: - \"pkg:apk/alpine/libexpat\"" to use
"purls: - \"pkg:apk/alpine/libexpat\""), and validate/adjust the "expired_at"
value to the schema your Trivy version expects (confirm whether it requires full
RFC3339 timestamp or a date-only string and update the "expired_at" value
accordingly); look for the keys "paths", "purls", and "expired_at" to locate and
fix the entry.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dce009fd-f218-4410-bd7c-16c08c1532df

📥 Commits

Reviewing files that changed from the base of the PR and between 30c2a9a and 93f65f4.

📒 Files selected for processing (4)
  • .github/.trivyignore.yaml
  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • src/synthorg/engine/loop_selector.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/loop_selector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/loop_selector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async tests use asyncio_mode = 'auto' — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis for property-based testing with @given + @settings. Hypothesis profiles: ci (50 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Functions must be less than 50 lines; files must be less than 800 lines.

Applied to files:

  • src/synthorg/engine/loop_selector.py
🧬 Code graph analysis (2)
tests/unit/engine/test_loop_selector.py (1)
src/synthorg/engine/loop_selector.py (4)
  • AutoLoopConfig (85-171)
  • AutoLoopRule (53-65)
  • build_execution_loop (270-314)
  • select_loop_type (231-267)
tests/unit/engine/test_agent_engine_auto_loop.py (4)
src/synthorg/budget/enforcer.py (1)
  • cost_tracker (89-91)
src/synthorg/engine/agent_engine.py (1)
  • _execute_resumed_loop (886-926)
src/synthorg/engine/loop_selector.py (1)
  • AutoLoopConfig (85-171)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)

@Aureliolo Aureliolo force-pushed the feat/execution-loop-auto-select branch from 93f65f4 to e891cb7 Compare March 19, 2026 08:02
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 08:03 — with GitHub Actions Inactive
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (2)
.github/.trivyignore.yaml (1)

14-15: ⚠️ Potential issue | 🟠 Major

Narrow purls to the concrete vulnerable package version.

Good improvement adding purls, but Line 15 still suppresses all Alpine libexpat versions. Use a versioned PURL so the ignore remains minimal and doesn’t hide future unrelated findings.

🔧 Proposed tightening
-    purls:
-      - "pkg:apk/alpine/libexpat"
+    purls:
+      - "pkg:apk/alpine/libexpat@2.7.4"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/.trivyignore.yaml around lines 14 - 15, The purls entry currently
suppresses all Alpine libexpat versions by using "pkg:apk/alpine/libexpat";
narrow this to the concrete vulnerable package version by replacing that string
with a versioned PURL (e.g., "pkg:apk/alpine/libexpat@<version>" or include the
revision like "@<version>-r<rev>") that matches the exact version reported by
the scanner; ensure you follow the purl format used elsewhere so only the
specific vulnerable release is ignored rather than all libexpat packages.
src/synthorg/engine/agent_engine.py (1)

909-914: ⚠️ Potential issue | 🟠 Major

Persist the selected loop in checkpoint metadata.

This resume path re-runs auto-selection against the current budget/config state. With a valid config like hybrid_fallback="react", a task can checkpoint under ReactLoop and resume under PlanExecuteLoop after the budget threshold flips, which changes loop semantics mid-execution. Resume should rebuild from the loop type stored with the checkpoint instead of calling _resolve_loop(...) again here.
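
A minimal sketch of the idea, assuming the checkpoint can carry one extra string field. All names here (`CheckpointSketch`, `save_checkpoint`, `loop_for_resume`, the `*Sketch` loop classes, and the `loop_type` attribute) are hypothetical and do not reflect the project's actual checkpoint schema.

```python
from collections.abc import Callable
from dataclasses import dataclass


class ReactLoopSketch:
    """Hypothetical stand-in for ReactLoop."""

    loop_type = "react"


class PlanExecuteLoopSketch:
    """Hypothetical stand-in for PlanExecuteLoop."""

    loop_type = "plan_execute"


# Maps the persisted identifier back to a loop factory.
_LOOP_FACTORIES: dict[str, Callable[[], object]] = {
    "react": ReactLoopSketch,
    "plan_execute": PlanExecuteLoopSketch,
}


@dataclass(frozen=True)
class CheckpointSketch:
    """Checkpoint metadata that records which loop was running."""

    task_id: str
    loop_type: str


def save_checkpoint(task_id: str, loop: object) -> CheckpointSketch:
    """Record the loop type chosen at the start of execution."""
    return CheckpointSketch(task_id=task_id, loop_type=loop.loop_type)


def loop_for_resume(checkpoint: CheckpointSketch) -> object:
    """Rebuild the original loop without re-running auto-selection."""
    try:
        factory = _LOOP_FACTORIES[checkpoint.loop_type]
    except KeyError:
        msg = f"Checkpoint has unknown loop_type {checkpoint.loop_type!r}"
        raise ValueError(msg) from None
    return factory()
```

Under this design the budget state at resume time has no influence on the loop type, so a task that checkpointed under one loop cannot resume under another.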

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 909 - 914, The resume path
currently re-runs loop auto-selection by calling _resolve_loop and then
_make_loop_with_callback, which allows the loop type to change between
checkpoint/save and resume; instead, persist the resolved loop identifier/type
into the checkpoint metadata when creating/saving a checkpoint and, in the
resume code path where checkpoint_ctx.task_execution is present, read that
stored loop type and reconstruct the same loop instance (use the stored loop id
to choose the loop factory instead of calling _resolve_loop); ensure
_make_loop_with_callback is invoked with the loop built from the stored loop
type so resumed executions retain the original loop semantics.
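The suggestion above can be sketched as follows. This is a minimal illustration, not the engine's actual checkpoint model: `Checkpoint`, `LOOP_FACTORIES`, `save_checkpoint`, and `loop_for_resume` are hypothetical names, the factories return placeholder strings instead of real loop objects, and falling back to re-selection for checkpoints without a stored loop type is an assumption.

```python
from collections.abc import Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record; the real model is not shown in this PR."""

    task_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


# Hypothetical registry of loop factories keyed by loop type name.
# The string return values stand in for real loop instances.
LOOP_FACTORIES: dict[str, Callable[[], str]] = {
    "react": lambda: "ReactLoop()",
    "plan_execute": lambda: "PlanExecuteLoop()",
}


def save_checkpoint(task_id: str, loop_type: str) -> Checkpoint:
    """Record the loop type that was actually selected for this run."""
    return Checkpoint(task_id=task_id, metadata={"loop_type": loop_type})


def loop_for_resume(checkpoint: Checkpoint, reselect: Callable[[], str]) -> str:
    """Rebuild the loop from stored metadata.

    Re-selection is used only when the checkpoint carries no usable loop
    type (assumed here to cover checkpoints written before this change).
    """
    stored = checkpoint.metadata.get("loop_type")
    loop_type = stored if stored in LOOP_FACTORIES else reselect()
    return LOOP_FACTORIES[loop_type]()
```

With this shape, a task checkpointed under `"react"` resumes under the same loop even if a fresh selection would now return `"plan_execute"`.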
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/.grype.yaml:
- Around line 10-14: The suppression for CVE-2026-32767 is currently global;
narrow it to the affected package by adding a package scope (e.g., set
package.name: libexpat, and optionally package.version/type/language) under the
CVE-2026-32767 entry so only libexpat matches, and consider adding namespace or
match-type/fix-state if needed; also add a nearby comment with an audit date to
remind reviewers there is no native expiration.

In @.github/.trivyignore.yaml:
- Line 16: The expired_at value is using an RFC3339 timestamp instead of the
date-only format Trivy expects; update the expired_at key value (expired_at)
from the timestamp string to a date-only string in YYYY-MM-DD format (e.g.,
"2026-06-17") so the .trivyignore rule is recognized correctly.

In `@src/synthorg/engine/agent_engine.py`:
- Around line 1013-1031: Determine the candidate loop type first without hitting
the budget API by calling select_loop_type with budget_utilization_pct=None
(using the same complexity, rules, budget_tight_threshold, hybrid_fallback and
default_loop_type from _auto_loop_config and task.estimated_complexity) to get a
preliminary_loop_type; only if preliminary_loop_type == "hybrid" and
self._budget_enforcer is not None, call await
self._budget_enforcer.get_budget_utilization_pct(), log
EXECUTION_LOOP_BUDGET_UNAVAILABLE if it returns None, then call select_loop_type
again with the obtained budget_utilization_pct to get the final loop_type;
reference functions/fields: select_loop_type, get_budget_utilization_pct,
self._budget_enforcer, EXECUTION_LOOP_BUDGET_UNAVAILABLE, _auto_loop_config,
task.estimated_complexity.

---

Duplicate comments:
In @.github/.trivyignore.yaml:
- Around line 14-15: The purls entry currently suppresses all Alpine libexpat
versions by using "pkg:apk/alpine/libexpat"; narrow this to the concrete
vulnerable package version by replacing that string with a versioned PURL (e.g.,
"pkg:apk/alpine/libexpat@<version>" or include the revision like
"@<version>-r<rev>") that matches the exact version reported by the scanner;
ensure you follow the purl format used elsewhere so only the specific vulnerable
release is ignored rather than all libexpat packages.

In `@src/synthorg/engine/agent_engine.py`:
- Around line 909-914: The resume path currently re-runs loop auto-selection by
calling _resolve_loop and then _make_loop_with_callback, which allows the loop
type to change between checkpoint/save and resume; instead, persist the resolved
loop identifier/type into the checkpoint metadata when creating/saving a
checkpoint and, in the resume code path where checkpoint_ctx.task_execution is
present, read that stored loop type and reconstruct the same loop instance (use
the stored loop id to choose the loop factory instead of calling _resolve_loop);
ensure _make_loop_with_callback is invoked with the loop built from the stored
loop type so resumed executions retain the original loop semantics.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9ecf9e7a-ad8b-4d3b-8ecd-1176eeee76d9

📥 Commits

Reviewing files that changed from the base of the PR and between 93f65f4 and e891cb7.

📒 Files selected for processing (14)
  • .github/.grype.yaml
  • .github/.trivyignore.yaml
  • CLAUDE.md
  • README.md
  • docs/design/engine.md
  • src/synthorg/budget/enforcer.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/budget/test_enforcer.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • tests/unit/engine/test_loop_selector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/engine/__init__.py
  • tests/unit/budget/test_enforcer.py
  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
  • src/synthorg/engine/loop_selector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/engine/__init__.py
  • src/synthorg/budget/enforcer.py
  • src/synthorg/observability/events/execution.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/loop_selector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async tests use asyncio_mode = 'auto' — no manual @pytest.mark.asyncio needed.
Test timeout: 30 seconds per test.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis for property-based testing with @given + @settings. Hypothesis profiles: ci (50 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

Files:

  • tests/unit/budget/test_enforcer.py
  • tests/unit/engine/test_loop_selector.py
  • tests/unit/engine/test_agent_engine_auto_loop.py
🧠 Learnings (15)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Security scanning: pip-audit (Python), npm audit (web dependencies), Trivy + Grype (Docker images), govulncheck (Go), gitleaks (secrets), zizmor (GitHub Actions workflows), OSSF Scorecard (supply chain), ZAP DAST (API), Socket.dev (typosquatting/malware detection).

Applied to files:

  • .github/.grype.yaml
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Docs source in docs/ (Markdown, built with Zensical). Design spec: docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture: docs/architecture/. Roadmap: docs/roadmap/. Security: docs/security.md. Licensing: docs/licensing.md. Reference: docs/reference/. Custom templates: docs/overrides/.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Always read the relevant `docs/design/` page before implementing any feature or planning any issue. DESIGN_SPEC.md is a pointer file linking to the 7 design pages (index, agents, organization, communication, engine, memory, operations).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/synthorg/**/*.py : Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/execution.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/budget/test_enforcer.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.

Applied to files:

  • src/synthorg/budget/enforcer.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to tests/**/*.py : Test timeout: 30 seconds per test.

Applied to files:

  • tests/unit/engine/test_loop_selector.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Functions must be less than 50 lines; files must be less than 800 lines.

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/loop_selector.py
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Applies to src/**/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/loop_selector.py
🧬 Code graph analysis (5)
src/synthorg/engine/__init__.py (2)
tests/unit/engine/conftest.py (1)
  • engine (449-460)
src/synthorg/engine/loop_selector.py (4)
  • AutoLoopConfig (94-180)
  • AutoLoopRule (53-74)
  • build_execution_loop (279-323)
  • select_loop_type (240-276)
tests/unit/budget/test_enforcer.py (4)
tests/unit/budget/test_enforcer_quota.py (2)
  • _make_budget_config (33-42)
  • _patch_periods (61-75)
tests/unit/engine/test_agent_engine_budget.py (1)
  • _make_budget_config (34-46)
tests/unit/budget/conftest.py (1)
  • make_cost_record (288-309)
src/synthorg/budget/enforcer.py (2)
  • cost_tracker (89-91)
  • get_budget_utilization_pct (93-123)
src/synthorg/engine/agent_engine.py (1)
src/synthorg/engine/loop_selector.py (3)
  • AutoLoopConfig (94-180)
  • build_execution_loop (279-323)
  • select_loop_type (240-276)
tests/unit/engine/test_agent_engine_auto_loop.py (4)
src/synthorg/budget/enforcer.py (2)
  • BudgetEnforcer (56-473)
  • cost_tracker (89-91)
src/synthorg/engine/agent_engine.py (3)
  • AgentEngine (130-1296)
  • run (287-397)
  • _execute_resumed_loop (886-926)
src/synthorg/engine/loop_selector.py (1)
  • AutoLoopConfig (94-180)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
src/synthorg/engine/loop_selector.py (4)
src/synthorg/engine/plan_execute_loop.py (1)
  • PlanExecuteLoop (84-930)
src/synthorg/engine/react_loop.py (1)
  • ReactLoop (61-352)
src/synthorg/engine/loop_protocol.py (1)
  • ExecutionLoop (158-196)
src/synthorg/engine/stagnation/protocol.py (1)
  • StagnationDetector (15-46)
🪛 markdownlint-cli2 (0.21.0)
docs/design/engine.md

[warning] 421-421: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)

🔇 Additional comments (12)
src/synthorg/observability/events/execution.py (1)

75-82: LGTM!

The new execution loop auto-selection event constants follow the established naming convention (execution.loop.* namespace) and are properly typed with Final[str]. The grouping comment clearly identifies their purpose.

README.md (1)

35-35: LGTM!

The README update accurately reflects the new auto-selection capability alongside the existing ReAct and Plan-and-Execute loops.

CLAUDE.md (1)

127-127: LGTM!

The package structure documentation accurately captures all new loop selector components and their behavior (complexity-based routing, budget-aware downgrade, hybrid fallback, and configurable defaults).

src/synthorg/observability/events/budget.py (1)

34-36: LGTM!

The new budget utilization event constants follow the established naming convention and are consistent with the existing budget event patterns.

src/synthorg/engine/__init__.py (2)

121-127: LGTM!

The new loop selector API is properly imported and re-exported, maintaining the package's public interface consistency.


213-238: LGTM!

The __all__ list is correctly updated with the new exports in alphabetical order.

tests/unit/budget/test_enforcer.py (1)

1088-1161: LGTM!

Comprehensive test coverage for get_budget_utilization_pct() including:

  • Normal percentage calculation
  • Disabled budget (total_monthly=0) returning None
  • Zero spend returning 0.0%
  • Over-budget scenario returning >100%
  • Graceful degradation on tracker failure
  • MemoryError propagation (correctly not swallowed)

The tests follow project conventions with proper markers and use pytest.approx for float comparisons.

docs/design/engine.md (2)

417-433: LGTM!

The "Auto-selection" documentation clearly explains the three-layer selection flow:

  1. Rule matching with default_loop_type fallback
  2. Budget-aware downgrade when utilization ≥ threshold
  3. Hybrid fallback when hybrid loop is unavailable

This matches the implementation in select_loop_type().


479-486: LGTM!

The new "Resolve execution loop" step (step 8) accurately documents the per-task loop resolution flow, including budget utilization query and the various fallback behaviors.

src/synthorg/budget/enforcer.py (2)

93-123: LGTM!

The new get_budget_utilization_pct() method is well-implemented:

  • Correctly guards against division by zero with the total_monthly <= 0 early return
  • Uses PEP 758 except syntax (except MemoryError, RecursionError:)
  • Implements graceful degradation (returns None on failure, logs exception)
  • Properly propagates critical errors (MemoryError, RecursionError)
  • Structured logging with domain-specific event constants

34-36: LGTM!

New budget utilization event constant imports properly added alongside existing budget event imports.

src/synthorg/engine/loop_selector.py (1)

133-180: Good fail-fast validation boundary.

Rejecting unknown and unbuildable loop configs in AutoLoopConfig keeps bad selector setups from escaping into runtime failures in build_execution_loop().

Based on learnings, "Validate at system boundaries (user input, external APIs, config files)."


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 318-325: The auto-selection event currently omits agent/task
identifiers making it hard to correlate under concurrency; either (A) add
agent_id and task_id to the EXECUTION_LOOP_AUTO_SELECTED log by passing them
into _resolve_loop() so that _resolve_loop() includes agent_id and task_id in
its event emission (update the _resolve_loop signature and any callers
accordingly), or (B) emit a separate logger.info immediately after determining
loop_mode (the block using loop_mode and EXECUTION_ENGINE_START) that logs
EXECUTION_LOOP_AUTO_SELECTED with agent_id and task_id and the resolved loop
value; update references to _auto_loop_config, loop_mode,
EXECUTION_LOOP_AUTO_SELECTED and _resolve_loop() to ensure the identifiers are
included in the emitted event.
- Around line 998-1057: The _resolve_loop method mixes loop-selection logic,
budget I/O, logging, and loop construction and should be extracted into a
standalone resolver function in loop_selector.py; move the core logic that calls
select_loop_type (both preliminary and final), awaits
self._budget_enforcer.get_budget_utilization_pct(), logs the
EXECUTION_LOOP_AUTO_SELECTED/EXECUTION_LOOP_BUDGET_UNAVAILABLE events, and
returns the result of build_execution_loop into a new function (e.g.,
resolve_execution_loop(cfg, task, budget_enforcer, approval_gate,
stagnation_detector)) and have AgentEngine._resolve_loop delegate to it while
preserving use of self._auto_loop_config, self._budget_enforcer,
self._approval_gate, and self._stagnation_detector so tests can import and
exercise the selector in isolation.
- Around line 184-197: In AgentEngine.__init__, besides the mutual-exclusion
check for execution_loop and auto_loop_config, validate the provided
auto_loop_config (self._auto_loop_config / auto_loop_config) immediately to
ensure it cannot resolve to an unbuildable loop type (specifically avoid any
reachable "hybrid" selection); if validation fails, log via
EXECUTION_ENGINE_ERROR with a clear reason and raise ValueError so invalid
configs fail fast (this is the same validation that must prevent the later
failure when the engine attempts to build the loop around lines ~1053-1057).
Implement or call a helper validator that enumerates reachable auto-selected
loop types from auto_loop_config, rejects any disallowed type (e.g., "hybrid"),
and surface a descriptive error during construction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: fbfc3ef4-983e-48b4-a606-7aa2fa3d67c7

📥 Commits

Reviewing files that changed from the base of the PR and between e891cb7 and e65c05b.

📒 Files selected for processing (3)
  • .github/.grype.yaml
  • .github/.trivyignore.yaml
  • src/synthorg/engine/agent_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • src/synthorg/engine/agent_engine.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/agent_engine.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/agent_engine.py
🧠 Learnings (1)
📚 Learning: 2026-03-19T07:09:59.660Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:09:59.660Z
Learning: Security scanning: pip-audit (Python), npm audit (web dependencies), Trivy + Grype (Docker images), govulncheck (Go), gitleaks (secrets), zizmor (GitHub Actions workflows), OSSF Scorecard (supply chain), ZAP DAST (API), Socket.dev (typosquatting/malware detection).

Applied to files:

  • .github/.grype.yaml
🧬 Code graph analysis (1)
src/synthorg/engine/agent_engine.py (2)
src/synthorg/engine/loop_selector.py (3)
  • AutoLoopConfig (94-180)
  • build_execution_loop (279-323)
  • select_loop_type (240-276)
src/synthorg/budget/enforcer.py (1)
  • get_budget_utilization_pct (93-123)
🔇 Additional comments (2)
.github/.trivyignore.yaml (1)

13-22: Scoped, time-bounded Trivy suppression looks good.

This ignore rule is now properly constrained with purls and expired_at, and the rationale is clear.

.github/.grype.yaml (1)

10-18: Grype suppression scoping and audit note are well implemented.

Package-level scoping plus the explicit review reminder is a solid, low-risk configuration update.

Comment on lines +184 to +197
        auto_loop_config: AutoLoopConfig | None = None,
    ) -> None:
        if execution_loop is not None and auto_loop_config is not None:
            msg = "execution_loop and auto_loop_config are mutually exclusive"
            logger.warning(
                EXECUTION_ENGINE_ERROR,
                reason=msg,
            )
            raise ValueError(msg)
        self._provider = provider
        self._approval_store = approval_store
        self._parked_context_repo = parked_context_repo
        self._stagnation_detector = stagnation_detector
        self._auto_loop_config = auto_loop_config

⚠️ Potential issue | 🟠 Major

Reject unbuildable auto-loop configs at construction.

Right now __init__ only enforces mutual exclusivity. A config that can still resolve to hybrid will instantiate successfully and then fail later when Line 1053 tries to build that loop, so the engine accepts bad configuration and surfaces it only on live traffic. Fail fast here by validating the reachable auto-selected loop types during initialization.

As per coding guidelines, Validate at system boundaries (user input, external APIs, config files).

Also applies to: 1053-1057

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 184 - 197, In
AgentEngine.__init__, besides the mutual-exclusion check for execution_loop and
auto_loop_config, validate the provided auto_loop_config (self._auto_loop_config
/ auto_loop_config) immediately to ensure it cannot resolve to an unbuildable
loop type (specifically avoid any reachable "hybrid" selection); if validation
fails, log via EXECUTION_ENGINE_ERROR with a clear reason and raise ValueError
so invalid configs fail fast (this is the same validation that must prevent the
later failure when the engine attempts to build the loop around lines
~1053-1057). Implement or call a helper validator that enumerates reachable
auto-selected loop types from auto_loop_config, rejects any disallowed type
(e.g., "hybrid"), and surface a descriptive error during construction.

Comment on lines +998 to +1057
    async def _resolve_loop(self, task: Task) -> ExecutionLoop:
        """Select the execution loop for a task.

        When ``auto_loop_config`` is set, selects the loop based on
        task complexity and optional budget state. Otherwise returns
        the statically configured loop (``self._loop``).

        Note: auto-selected loops use default ``PlanExecuteConfig``
        and do not receive a compaction callback. Provide an
        ``execution_loop`` directly for custom plan-execute config
        or compaction.
        """
        if self._auto_loop_config is None:
            return self._loop

        cfg = self._auto_loop_config
        # Dry-run without budget and without hybrid fallback to see the
        # raw rule result. Only query budget when "hybrid" is the raw
        # match (budget downgrade applies before hybrid fallback).
        preliminary = select_loop_type(
            complexity=task.estimated_complexity,
            rules=cfg.rules,
            budget_utilization_pct=None,
            budget_tight_threshold=cfg.budget_tight_threshold,
            hybrid_fallback=None,
            default_loop_type=cfg.default_loop_type,
        )

        budget_utilization_pct: float | None = None
        if preliminary == "hybrid" and self._budget_enforcer is not None:
            budget_utilization_pct = (
                await self._budget_enforcer.get_budget_utilization_pct()
            )
            if budget_utilization_pct is None:
                logger.debug(
                    EXECUTION_LOOP_BUDGET_UNAVAILABLE,
                    note="budget utilization unknown; skipping budget-aware downgrade",
                )

        loop_type = select_loop_type(
            complexity=task.estimated_complexity,
            rules=cfg.rules,
            budget_utilization_pct=budget_utilization_pct,
            budget_tight_threshold=cfg.budget_tight_threshold,
            hybrid_fallback=cfg.hybrid_fallback,
            default_loop_type=cfg.default_loop_type,
        )

        logger.info(
            EXECUTION_LOOP_AUTO_SELECTED,
            complexity=task.estimated_complexity.value,
            selected_loop=loop_type,
            budget_utilization_pct=budget_utilization_pct,
        )

        return build_execution_loop(
            loop_type,
            approval_gate=self._approval_gate,
            stagnation_detector=self._stagnation_detector,
        )

🛠️ Refactor suggestion | 🟠 Major

Extract _resolve_loop() into a dedicated resolver.

This helper now mixes rule evaluation, budget I/O, observability, and loop construction inside a ~60-line method on a ~1.3k-line class. Moving it beside src/synthorg/engine/loop_selector.py would keep AgentEngine within the repo’s size limits and make the auto-selection path easier to test in isolation.

As per coding guidelines, Functions must be less than 50 lines; files must be less than 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 998 - 1057, The
_resolve_loop method mixes loop-selection logic, budget I/O, logging, and loop
construction and should be extracted into a standalone resolver function in
loop_selector.py; move the core logic that calls select_loop_type (both
preliminary and final), awaits
self._budget_enforcer.get_budget_utilization_pct(), logs the
EXECUTION_LOOP_AUTO_SELECTED/EXECUTION_LOOP_BUDGET_UNAVAILABLE events, and
returns the result of build_execution_loop into a new function (e.g.,
resolve_execution_loop(cfg, task, budget_enforcer, approval_gate,
stagnation_detector)) and have AgentEngine._resolve_loop delegate to it while
preserving use of self._auto_loop_config, self._budget_enforcer,
self._approval_gate, and self._stagnation_detector so tests can import and
exercise the selector in isolation.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/engine/agent_engine.py (1)

1-10: 🧹 Nitpick | 🔵 Trivial

File exceeds 800-line limit (currently ~1318 lines).

The file significantly exceeds the 800-line limit per coding guidelines. Consider extracting cohesive groups of functionality:

  • Auto-loop resolution logic → loop_selector.py (as previously suggested)
  • Checkpoint resume methods (_resume_from_checkpoint, _reconstruct_and_run_resume, _execute_resumed_loop, _finalize_resume) → a dedicated resume.py or similar
  • Error handling methods (_handle_budget_error, _handle_fatal_error, _build_error_execution) → error_handlers.py

This would improve maintainability and testability while bringing the file within guidelines.

As per coding guidelines, Functions must be less than 50 lines; files must be less than 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 1 - 10, The file
src/synthorg/engine/agent_engine.py exceeds the 800-line limit and must be split
into smaller modules; extract the auto-loop resolution logic into a new
loop_selector.py, move checkpoint resume methods (_resume_from_checkpoint,
_reconstruct_and_run_resume, _execute_resumed_loop, _finalize_resume) into a
resume.py, and move error handling methods (_handle_budget_error,
_handle_fatal_error, _build_error_execution) into error_handlers.py; update
imports in agent_engine.py to reference the new modules and ensure all moved
functions keep their signatures and any shared helper dependencies are either
moved or imported so agent_engine.run() and its callers continue to work without
behavior changes.
♻️ Duplicate comments (1)
src/synthorg/engine/agent_engine.py (1)

1000-1066: 🧹 Nitpick | 🔵 Trivial

Function exceeds 50-line limit — consider extraction to loop_selector.py.

_resolve_loop is approximately 55-60 lines of code (excluding docstring), slightly exceeding the 50-line function limit. A past review suggested extracting this logic into loop_selector.py alongside the other selection functions. This would:

  1. Keep AgentEngine within file size limits.
  2. Make the auto-selection logic testable in isolation.
  3. Improve cohesion by co-locating all loop selection logic.

The implementation itself is correct: the preliminary check optimizes budget lookups, and the logging includes the required identifiers.

As per coding guidelines, Functions must be less than 50 lines; files must be less than 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 1000 - 1066, extract the
auto-selection logic from AgentEngine._resolve_loop into a new helper module
loop_selector.py: move the preliminary select_loop_type budget-dry-run, the
conditional budget_utilization_pct fetch (using self._budget_enforcer), the
second select_loop_type call, and the logger.info(EXECUTION_LOOP_AUTO_SELECTED)
into a single exported function (e.g., select_and_build_execution_loop) that
accepts the task (or task.estimated_complexity), cfg (self._auto_loop_config),
agent_id, task_id, budget_enforcer, approval_gate, and stagnation_detector and
returns the built ExecutionLoop via build_execution_loop; then simplify
AgentEngine._resolve_loop to return self._loop when _auto_loop_config is None or
delegate to the new helper when present. Ensure you reference and reuse existing
symbols: select_loop_type, build_execution_loop, EXECUTION_LOOP_AUTO_SELECTED,
self._auto_loop_config, self._budget_enforcer, self._approval_gate, and
self._stagnation_detector.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 909-916: The resume logic currently calls _resolve_loop during
resume (using checkpoint_ctx.task_execution.task, agent_id, task_id) which can
yield a different loop type if budget/utilization changed; instead persist the
originally selected loop type in the checkpoint metadata when creating/saving a
checkpoint and use that stored loop type on resume to reconstruct base_loop
(fall back to calling _resolve_loop only if the persisted loop type is missing
or invalid). Update the checkpoint save/load paths to include the loop type,
then modify the resume path that computes base_loop (checkpoint_ctx,
_resolve_loop, _make_loop_with_callback, base_loop, task_execution, task,
agent_id, task_id) to prefer the persisted loop type and only recompute as a
last resort.
- Around line 630-633: The parameter named loop is reassigned with
self._make_loop_with_callback(loop, agent_id, task_id) which shadows the
original parameter; rename the reassigned variable (e.g., wrapped_loop) to avoid
confusion and improve readability. Update the assignment where loop =
self._make_loop_with_callback(...) to wrapped_loop =
self._make_loop_with_callback(...), and then update subsequent uses (such as
wrapped_loop.execute(...) that currently use coro = loop.execute(...)) to
reference wrapped_loop; ensure references to the original parameter name are
preserved if needed elsewhere in the function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a5d58c1-deb6-4c41-8eb1-21a827103f90

📥 Commits

Reviewing files that changed from the base of the PR and between e65c05b and 98b51c3.

📒 Files selected for processing (1)
  • src/synthorg/engine/agent_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • src/synthorg/engine/agent_engine.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/agent_engine.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/agent_engine.py
🧠 Learnings (1)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/agent_engine.py
🔇 Additional comments (6)
src/synthorg/engine/agent_engine.py (6)

38-42: LGTM!

Imports for the new auto-loop selection components are correctly added.


244-255: LGTM!

The loop_type logging correctly distinguishes between auto mode and static loop configuration. DEBUG level is appropriate for engine creation events.


318-327: LGTM!

Loop mode determination and logging at execution start is correctly implemented with proper INFO-level observability.


431-443: LGTM!

Per-task loop resolution is correctly integrated into the execution flow, enabling task complexity-based loop selection.


597-611: LGTM!

The explicit loop parameter enables both static and auto-selected loops to receive checkpoint callbacks.


184-197: AutoLoopConfig already validates unbuildable loop types at construction—no additional validation needed.

AutoLoopConfig's @model_validator (loop_selector.py:133–187) comprehensively validates that configs cannot resolve to unbuildable loop types:

  • If rules contain unbuildable types (e.g., "hybrid"), hybrid_fallback must be non-None and buildable
  • If default_loop_type="hybrid", it must have a buildable hybrid_fallback
  • These constraints ensure _match_loop_type() never returns an unbuildable type

Validation occurs at construction, so any invalid config fails immediately. The code at lines 184–197 simply stores an already-validated instance.

Comment on lines +909 to +916
        base_loop = self._loop
        if checkpoint_ctx.task_execution is not None:
            base_loop = await self._resolve_loop(
                checkpoint_ctx.task_execution.task,
                agent_id,
                task_id,
            )
        loop = self._make_loop_with_callback(base_loop, agent_id, task_id)

🧹 Nitpick | 🔵 Trivial

Resume path now resolves loop from task — note potential inconsistency.

This correctly addresses the past review comment about resume bypassing auto-loop resolution. However, if budget utilization changes between the original execution and resume, the re-resolved loop type may differ from what was originally selected. For deterministic resume behavior, consider persisting the originally selected loop type in checkpoint metadata rather than recomputing it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 909 - 916, The resume logic
currently calls _resolve_loop during resume (using
checkpoint_ctx.task_execution.task, agent_id, task_id) which can yield a
different loop type if budget/utilization changed; instead persist the
originally selected loop type in the checkpoint metadata when creating/saving a
checkpoint and use that stored loop type on resume to reconstruct base_loop
(fall back to calling _resolve_loop only if the persisted loop type is missing
or invalid). Update the checkpoint save/load paths to include the loop type,
then modify the resume path that computes base_loop (checkpoint_ctx,
_resolve_loop, _make_loop_with_callback, base_loop, task_execution, task,
agent_id, task_id) to prefer the persisted loop type and only recompute as a
last resort.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 08:49 — with GitHub Actions Inactive
Aureliolo and others added 11 commits March 19, 2026 09:53
…omplexity

Add automatic execution loop selection that inspects task
estimated_complexity and optional budget state to choose the
optimal loop per task:

- simple -> ReAct
- medium -> Plan-and-Execute
- complex/epic -> Hybrid (falls back to Plan-and-Execute until
  HybridLoop is implemented)

Budget-aware: when monthly utilization >= threshold, complex tasks
are downgraded from Hybrid to Plan-and-Execute to conserve budget.

New modules:
- loop_selector.py: AutoLoopConfig, AutoLoopRule, select_loop_type(),
  build_execution_loop()
- BudgetEnforcer.get_budget_utilization_pct() for budget state queries

AgentEngine accepts auto_loop_config (mutually exclusive with
execution_loop) and resolves the loop per-task in _execute().

Closes #200

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ction

Pre-reviewed by 11 agents, 16 findings addressed:

- Add NotBlankStr for loop_type and hybrid_fallback fields
- Add uniqueness validator for complexities in AutoLoopConfig.rules
- Add import-time completeness guard on DEFAULT_AUTO_LOOP_RULES
- Add warning log before ValueError raises in build_execution_loop
  and AgentEngine.__init__
- Fix EXECUTION_ENGINE_CREATED log to show "auto" when auto_loop_config set
- Add budget-unavailable warning in _resolve_loop
- Add no-rule-match warning in select_loop_type
- Use next() idiom instead of for-loop + break
- Update module docstring to describe budget-downgrade layer
- Add MemoryError re-raise test for get_budget_utilization_pct
- Add validation boundary tests for AutoLoopConfig
- Update CLAUDE.md Package Structure with loop_selector.py
- Update docs/design/engine.md auto-selection tip with 3-layer logic
- Add loop resolution step to AgentEngine pipeline docs
- Update README.md with auto-selection mention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… and Gemini

- Fix resume path to call _resolve_loop instead of using static self._loop (#1)
- Validate loop_type/hybrid_fallback against _KNOWN_LOOP_TYPES at config time (#3)
- Fix redundant any() scan producing false-positive NO_RULE_MATCH warning (#4)
- Downgrade EXECUTION_LOOP_BUDGET_UNAVAILABLE to DEBUG to avoid log noise (#5)
- Add auto_loop_config to AgentEngine class docstring (#6)
- Reduce enforcer.py to 799 lines (was 806, limit 800) (#7)
- Fix select_loop_type Returns docstring accuracy (#8)
- Fix build_execution_loop docstring to mention hybrid (#9)
- Add EXECUTION_LOOP_BUDGET_UNAVAILABLE assertion in budget-error test (#10)
- Add resume path test for _resolve_loop (#11)
- Add test: rule mapping to react does not trigger NO_RULE_MATCH (#12)
- Add _resolve_loop docstring note about compaction/plan_execute_config (#13)
- Update module docstring to mention AutoLoopConfig/AutoLoopRule (#14)
- Simplify verbose log note string (#15)
- Add configurable default_loop_type to AutoLoopConfig (Gemini enhancement)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…election

- Reject unbuildable loop configs at construction: hybrid_fallback=None
  with hybrid rules, unbuildable default_loop_type/hybrid_fallback
- Add _BUILDABLE_LOOP_TYPES set (react, plan_execute) for validation
- Rewrite resume path test to exercise _execute_resumed_loop via
  mocked _resolve_loop (verifies wiring, not just direct call)
- 5 new tests for buildability validation

Assessed and skipped (not needed):
- engine.md admonition formatting: 4-space indent is correct MkDocs
  admonition syntax; markdownlint MD046 is a false positive
- Persist loop type in checkpoint: requires AgentContext schema change;
  current _resolve_loop approach gives consistent results for same task
  complexity (acceptable interim)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CVE-2026-32767 is a SiYuan Note authorization bypass (CWE-863,
arbitrary SQL execution), not a libexpat vulnerability. Trivy
incorrectly maps this CVE to libexpat 2.7.4 in the web image.
Our nginx-unprivileged image serves static files and does not
run SiYuan or any SQL database.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…election

- Split select_loop_type into three private helpers (_match_loop_type,
  _downgrade_for_budget, _apply_hybrid_fallback) to satisfy <50 line limit
- Fix validation: default_loop_type="hybrid" is now accepted when
  hybrid_fallback redirects to a buildable type (was incorrectly rejected)
- Add _BUILDABLE_LOOP_TYPES validation for hybrid_fallback (must be buildable
  since it is the redirect target, not the source)
- Resume test now verifies resolved_loop.execute was actually awaited, not
  just that _resolve_loop was called
- Trivyignore: add paths scope (pkg:apk/alpine/libexpat) and expired_at
  (90 days) for CVE-2026-32767 suppression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ore purls

- AutoLoopRule and AutoLoopConfig: add extra="forbid" to reject typos
- AutoLoopRule: field_validator on loop_type checks _KNOWN_LOOP_TYPES
  at rule construction (catches typos before reaching AutoLoopConfig)
- .trivyignore.yaml: fix paths -> purls for PURL-scoped suppression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- _resolve_loop: skip budget API call unless raw rule match is "hybrid"
  (dry-run with hybrid_fallback=None to see pre-fallback result)
- .grype.yaml: scope CVE-2026-32767 to package libexpat/apk + audit date
- .trivyignore.yaml: fix expired_at from RFC3339 to YYYY-MM-DD date format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
EXECUTION_LOOP_AUTO_SELECTED now includes agent_id and task_id for
log correlation under concurrency. _resolve_loop accepts optional
agent_id/task_id params, threaded from _execute and resume path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…h_timeout

Avoids parameter shadowing where the loop param was reassigned after
wrapping with checkpoint callback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trivy's Go time parser requires full RFC3339 timestamps
(2026-06-17T00:00:00Z), not date-only strings. The previous
round's change to YYYY-MM-DD broke all three Docker image scans.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
src/synthorg/engine/agent_engine.py (2)

909-916: ⚠️ Potential issue | 🟠 Major

Persist the selected loop type in checkpoint metadata.

Line 911 re-runs auto-selection from the current task/budget state. Because select_loop_type() downgrades for budget before applying hybrid_fallback, a checkpoint can resume under a different concrete loop than the one that created it. Prefer rebuilding base_loop from a loop type stored with the checkpoint, and only fall back to _resolve_loop() for older checkpoints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 909 - 916, The checkpoint
resume logic should use the loop type that was recorded when the checkpoint was
created instead of re-running select_loop_type() (which can downgrade due to
budget/hybrid_fallback); update the checkpoint writer to persist the chosen loop
type (e.g., a checkpoint_ctx.loop_type or similar metadata field) when creating
checkpoints, and modify the resume path in agent_engine.py (the block that sets
base_loop and calls _resolve_loop/_make_loop_with_callback) to reconstruct
base_loop from that persisted loop type first, only calling
_resolve_loop(agent_task, agent_id, task_id) for older checkpoints that lack the
stored loop_type; ensure the code references the persisted field name
consistently and retains the existing _make_loop_with_callback usage.

1000-1066: 🛠️ Refactor suggestion | 🟠 Major

Extract _resolve_loop() into a dedicated resolver.

This helper is already 60+ lines inside a 1.3k-line class, and it now mixes rule evaluation, budget I/O, observability, and loop construction. Moving it beside loop_selector.py would bring this change back inside the repo’s size limits and make the auto-selection path easier to test in isolation. As per coding guidelines, Functions must be less than 50 lines; files must be less than 800 lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 1000 - 1066, The
_resolve_loop method mixes rule evaluation, budget I/O, logging, and loop
construction inside a large class; extract it into a dedicated resolver function
(e.g., resolve_execution_loop) placed next to loop_selector.py. Move the logic
that calls select_loop_type twice, queries
self._budget_enforcer.get_budget_utilization_pct(), logs using
EXECUTION_LOOP_AUTO_SELECTED/EXECUTION_LOOP_BUDGET_UNAVAILABLE, and returns
build_execution_loop into the new resolver; make it accept the Task (or its
estimated_complexity), the auto-loop config (cfg), budget_enforcer,
approval_gate, stagnation_detector, and agent_id/task_id as parameters so no
class state is referenced directly. Replace the original _resolve_loop body with
a thin wrapper that forwards the right attributes to the new resolver. Ensure
you preserve behavior (including hybrid/dry-run semantics, budget None handling,
and the same log fields) and add unit tests for select_loop_type interactions
and budget-unavailable branching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5f71b15-2ccf-4a5e-bb6c-ffe2c68f8955

📥 Commits

Reviewing files that changed from the base of the PR and between 98b51c3 and cf33c78.

📒 Files selected for processing (1)
  • src/synthorg/engine/agent_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling — PEP 758 except syntax, enforced by ruff on Python 3.14.
All public functions require type hints — mypy strict mode enforced.
Docstrings must use Google style and are required on all public classes and functions — enforced by ruff D rules.

Files:

  • src/synthorg/engine/agent_engine.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff).
Functions must be less than 50 lines; files must be less than 800 lines.
Handle errors explicitly, never silently swallow them.
Validate at system boundaries (user input, external APIs, config files).
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/agent_engine.py
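
As an illustration of the immutability guideline for non-Pydantic collections quoted above, here is a minimal sketch of `copy.deepcopy()` at construction plus `MappingProxyType` wrapping. `ToolRegistrySketch` and its methods are hypothetical names, not classes from this repository.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


class ToolRegistrySketch:
    """Hypothetical read-only registry; not a class from the project."""

    def __init__(self, tools: Mapping[str, dict[str, Any]]) -> None:
        # Deep-copy at construction so later changes to the caller's data
        # cannot leak in, then wrap so the mapping itself is read-only.
        self._tools: Mapping[str, dict[str, Any]] = MappingProxyType(
            copy.deepcopy(dict(tools))
        )

    def get(self, name: str) -> dict[str, Any]:
        # Return a copy: the proxy is shallow and would not protect
        # the nested dicts from mutation by the caller.
        return copy.deepcopy(self._tools[name])

    def with_tool(self, name: str, spec: dict[str, Any]) -> "ToolRegistrySketch":
        # "Create new objects, never mutate": build a new registry instead.
        return ToolRegistrySketch({**self._tools, name: spec})
```

Note that `MappingProxyType` only blocks writes to the top-level mapping, which is why `get()` hands out copies in this sketch.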
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__).
Never use import logging / logging.getLogger() / print() in application code.
Always use logger as the variable name (not _logger, not log).
Event names must always use constants from the domain-specific module under synthorg.observability.events. Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Use structured kwargs in logger calls: always logger.info(EVENT, key=value) — never logger.info("msg %s", val).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO level.
DEBUG logging for object creation, internal flow, entry/exit of key functions.
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True) include: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately.
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Files:

  • src/synthorg/engine/agent_engine.py
🧠 Learnings (1)
📚 Learning: 2026-03-15T11:48:14.867Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T11:48:14.867Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/agent_engine.py
🧬 Code graph analysis (1)
src/synthorg/engine/agent_engine.py (3)
src/synthorg/engine/loop_selector.py (3)
  • AutoLoopConfig (94-180)
  • build_execution_loop (279-323)
  • select_loop_type (240-276)
src/synthorg/engine/loop_protocol.py (3)
  • get_loop_type (194-196)
  • ExecutionLoop (158-196)
  • execute (166-192)
src/synthorg/budget/enforcer.py (1)
  • get_budget_utilization_pct (93-123)
🔇 Additional comments (1)
src/synthorg/engine/agent_engine.py (1)

431-434: Per-task loop resolution is wired correctly.

Resolving the loop inside _execute() and threading it through _run_loop_with_timeout() keeps auto mode on the selected loop instead of falling back to self._loop.

@Aureliolo Aureliolo force-pushed the feat/execution-loop-auto-select branch from cf33c78 to 2c232a3 Compare March 19, 2026 08:56
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 08:58 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 5bfc2c6 into main Mar 19, 2026
29 of 30 checks passed
@Aureliolo Aureliolo deleted the feat/execution-loop-auto-select branch March 19, 2026 09:00
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 19, 2026 09:00 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 19, 2026
🤖 I have created a release *beep* *boop*
---


## [0.3.6](v0.3.5...v0.3.6) (2026-03-19)


### Features

* **cli:** add backup subcommands (backup, backup list, backup restore)
([#568](#568))
([4c06b1d](4c06b1d))
* **engine:** implement execution loop auto-selection based on task
complexity ([#567](#567))
([5bfc2c6](5bfc2c6))


### Bug Fixes

* activate structured logging pipeline -- wire 8-sink system, integrate
Uvicorn, suppress spam
([#572](#572))
([9b6bf33](9b6bf33))
* **cli:** bump grpc-go v1.79.3 -- CVE-2026-33186 auth bypass
([#574](#574))
([f0171c9](f0171c9))
* resolve OpenAPI schema validation warnings for union/optional fields
([#558](#558))
([5d96b2b](5d96b2b))


### CI/CD

* bump codecov/codecov-action from 5.5.2 to 5.5.3 in the minor-and-patch
group ([#571](#571))
([267f685](267f685))
* ignore chainguard/python in Dependabot docker updates
([#575](#575))
([1935eaa](1935eaa))


### Maintenance

* bump the major group across 1 directory with 2 updates
([#570](#570))
([b98f82c](b98f82c))
* bump the minor-and-patch group across 2 directories with 4 updates
([#569](#569))
([3295168](3295168))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

feat: implement execution loop auto-selection based on task complexity
