
refactor: pre-PR review improvements for ExecutionLoop + ReAct loop (#124)#141

Merged
Aureliolo merged 3 commits into main from feat/execution-loop-protocol on Mar 6, 2026

Conversation

@Aureliolo
Owner

Summary

  • loop_protocol.py: Convert total_tool_calls and total_tokens to @computed_field, add model_validator for error_message/termination_reason cross-field consistency
  • react_loop.py: Decompose monolithic execute() into 5 focused helpers (_check_budget, _call_provider, _check_response_errors, _handle_completion, _execute_tool_calls), add comprehensive error handling for provider failures, budget checker exceptions, and tool execution errors
  • errors.py: Update docstrings to clarify relationship between loop-internal TerminationReason and engine-layer exceptions
  • DESIGN_SPEC.md: Add ExecutionLoop protocol interface docs to §6.5, update §15.3 project structure with loop_protocol.py and react_loop.py, fix diagram termination text ("blocked" → "error")
  • Tests: Add 6 new test cases covering provider exceptions, tool execution errors, MAX_TOKENS finish reason, TOOL_USE with empty calls, budget checker exceptions, and computed field validation

Closes #124

Test plan

  • uv run ruff check src/ tests/ — all checks passed
  • uv run ruff format src/ tests/ — all formatted
  • uv run mypy src/ tests/ — no issues in 215 files
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 1896 passed, 95.11% coverage

Review coverage

Pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 16 findings consolidated, all 16 implemented.

Aureliolo and others added 2 commits March 6, 2026 19:14
Implement the ExecutionLoop protocol and ReAct execution loop that
wires together the provider, tool system, and agent context into a
think-act-observe loop. This is the critical-path foundation for the
agent engine (#11).

New files:
- engine/loop_protocol.py: ExecutionLoop protocol, ExecutionResult,
  TurnRecord, TerminationReason, BudgetChecker
- engine/react_loop.py: ReactLoop implementation

Modified:
- engine/errors.py: BudgetExhaustedError, LoopExecutionError
- observability/events/execution.py: 7 loop event constants
- tools/invoker.py: registry property (read-only)
- engine/__init__.py: re-export new public API
- CLAUDE.md: add dependency-order planning, GH query, PR ref rules

Tests: 27 new tests (100% coverage on new code)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…124)

- Convert total_tool_calls and total_tokens to @computed_field
- Add model_validator for error_message/termination_reason consistency
- Decompose ReactLoop.execute() into focused helper methods
- Add comprehensive error handling (provider, budget, tool exceptions)
- Use logger.exception in except blocks (TRY400 compliance)
- Use PEP 758 except syntax where compatible with ruff
- Update DESIGN_SPEC.md §6.5 with ExecutionLoop protocol docs
- Update §15.3 project structure with loop_protocol.py and react_loop.py
- Add tests for new error paths and computed fields

Pre-reviewed by 10 agents, 16 findings addressed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
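The TRY400 item in the commit message refers to the ruff rule that `except` blocks should log with `logger.exception` rather than `logger.error`, so the traceback is attached automatically. A minimal illustration (the helper name is hypothetical, not from the PR):

```python
import logging

logger = logging.getLogger("react_loop_example")


def safe_provider_call(fn):
    """Return fn()'s result, or None after logging the failure."""
    try:
        return fn()
    except Exception:
        # TRY400: logger.exception attaches the active traceback;
        # logger.error inside an except block would silently drop it.
        logger.exception("provider call failed")
        return None
```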
Copilot AI review requested due to automatic review settings March 6, 2026 18:41
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 666a93e4-839a-4d74-822a-9efb8c69bdcf

📥 Commits

Reviewing files that changed from the base of the PR and between 62732ba and 6e739c7.

📒 Files selected for processing (5)
  • src/ai_company/engine/loop_protocol.py
  • src/ai_company/engine/react_loop.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_react_loop.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added ExecutionLoop protocol and ReactLoop for structured agent execution, per‑turn tracking (tokens, cost, tool usage), explicit termination reasons, new observability events, and a read‑only ToolInvoker registry.
  • Bug Fixes

    • Added specialized exceptions for budget exhaustion and execution failures to improve error reporting.
  • Tests

    • Comprehensive unit tests and test doubles for loop protocol and ReactLoop behaviors.
  • Documentation

    • Planning guidance (dependency-first prioritization) and GitHub workflow/query guidance.

Walkthrough

Adds an ExecutionLoop protocol and a ReAct implementation, supporting structured execution results, termination reasons, turn records, budget checking, observability events, new engine errors, a ToolInvoker registry accessor, and extensive unit tests validating loop behavior.

Changes

  • Design spec — DESIGN_SPEC.md: Introduces the ExecutionLoop abstraction and documents the ReAct loop and termination update.
  • Engine public API — src/ai_company/engine/__init__.py: Re-exports the new loop protocol, ReactLoop, errors, and types (BudgetChecker, ExecutionResult, TurnRecord, TerminationReason, LoopExecutionError, BudgetExhaustedError).
  • Loop protocol — src/ai_company/engine/loop_protocol.py: New ExecutionLoop protocol, TerminationReason enum, immutable TurnRecord and ExecutionResult models, BudgetChecker type alias, and validation logic.
  • ReAct implementation — src/ai_company/engine/react_loop.py: New ReactLoop class implementing ExecutionLoop: async execute flow, budget checks, provider calls, tool invocation, turn recording, termination handling, logging/events, and error mapping.
  • Engine errors — src/ai_company/engine/errors.py: Adds BudgetExhaustedError and LoopExecutionError subclasses of EngineError for loop-termination signaling.
  • Observability events — src/ai_company/observability/events/execution.py: Adds execution.loop.* event constants (start, turn_start, turn_complete, tool_calls, terminated, budget_exhausted, error).
  • Tool invoker — src/ai_company/tools/invoker.py: Adds a read-only registry property to ToolInvoker exposing the underlying ToolRegistry.
  • Tests, engine fixtures — tests/unit/engine/conftest.py: Adds the MockCompletionProvider test double and a fixture for provider construction.
  • Tests, unit — tests/unit/engine/test_loop_protocol.py, tests/unit/engine/test_react_loop.py, tests/unit/engine/test_errors.py: New comprehensive tests for loop protocol models, ReactLoop behavior (budget, tools, errors, turn accounting), and the new engine errors.

Sequence Diagram(s)

mermaid
sequenceDiagram
actor Client
participant ReactLoop as "ReactLoop"
participant Provider as "CompletionProvider"
participant Tools as "ToolInvoker"
participant Budget as "BudgetChecker"
participant Observ as "Observability"
Client->>ReactLoop: execute(context, provider, tool_invoker?, budget_checker?)
ReactLoop->>Budget: check(context)
Budget-->>ReactLoop: allowed / exhausted
ReactLoop->>Provider: complete(messages, config)
Provider-->>ReactLoop: CompletionResponse
ReactLoop->>Observ: emit EXECUTION_LOOP_TURN_START
alt response requests tools
ReactLoop->>Tools: invoke(tool_calls)
Tools-->>ReactLoop: tool results
ReactLoop->>Observ: emit EXECUTION_LOOP_TOOL_CALLS
end
ReactLoop->>Observ: emit EXECUTION_LOOP_TURN_COMPLETE
ReactLoop-->>Client: ExecutionResult (context, termination_reason, turn_records)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 29.27%, below the required threshold of 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Title check (✅ Passed): The title accurately summarizes the main change: refactoring improvements to the ExecutionLoop and ReAct loop implementations, with a focus on code quality and error handling.
  • Description check (✅ Passed): The description clearly relates to the changeset, detailing specific improvements including computed field conversions, helper method decomposition, error handling, documentation updates, and test additions.
  • Linked Issues check (✅ Passed): All major objectives from issue #124 are met: the ExecutionLoop protocol is defined with standard operations (execute, get_loop_type), the ReAct loop implements think→act→observe with error handling, the protocol supports future architectures, and comprehensive unit tests cover all specified scenarios including max-turns, tool calls, and budget exhaustion.
  • Out of Scope Changes check (✅ Passed): All changes remain directly within the scope of issue #124: protocol definition, ReAct implementation, error handling, documentation updates per DESIGN_SPEC §6.5, and testing. No unrelated modifications detected.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly.

This pull request significantly refactors the agent's execution loop, moving towards a more modular and robust design. By introducing a formal ExecutionLoop protocol and breaking down the ReAct loop into smaller, focused components, the system gains improved clarity and error management. The changes enhance the reliability of agent execution by explicitly handling various termination conditions and potential failures, while also updating documentation and adding comprehensive tests to support the new architecture.

Highlights

  • Execution Loop Protocol: Introduced a new ExecutionLoop protocol and supporting models (TerminationReason, TurnRecord, ExecutionResult, BudgetChecker) to standardize agent execution.
  • ReAct Loop Refactoring: The monolithic ReactLoop.execute() method was decomposed into five focused helper methods, significantly improving readability, maintainability, and testability.
  • Enhanced Error Handling: Comprehensive error handling was added for provider failures, budget checker exceptions, and tool execution errors, ensuring robust loop termination and clear error reporting.
  • New Error Types: Added BudgetExhaustedError and LoopExecutionError to the engine layer for converting internal loop termination reasons into explicit exceptions.
  • Documentation Updates: The DESIGN_SPEC.md was updated to include documentation for the ExecutionLoop protocol, reflect the new project structure, and correct diagram terminology.
  • New Test Cases: Six new test cases were added to cover various scenarios, including provider exceptions, tool execution errors, MAX_TOKENS finish reason, TOOL_USE with empty calls, budget checker exceptions, and computed field validation.
Changelog
  • CLAUDE.md
    • Added a guideline to prioritize issues by dependency order.
    • Added guidelines for GitHub issue queries and preserving PR issue references.
  • DESIGN_SPEC.md
    • Added a new section detailing the ExecutionLoop protocol interface and its supporting models.
    • Updated the diagram termination text from 'blocked' to 'error'.
    • Updated the project structure section to include loop_protocol.py and react_loop.py.
  • src/ai_company/engine/__init__.py
    • Updated the module docstring to reflect the inclusion of execution loops.
    • Imported new error types (BudgetExhaustedError, LoopExecutionError) and loop-related models/protocols (BudgetChecker, ExecutionLoop, ExecutionResult, TerminationReason, TurnRecord, ReactLoop).
    • Updated the __all__ export list to include the newly imported loop components and error types.
  • src/ai_company/engine/errors.py
    • Added BudgetExhaustedError class with a clarifying docstring.
    • Added LoopExecutionError class with a clarifying docstring.
  • src/ai_company/engine/loop_protocol.py
    • Added a new file defining the ExecutionLoop runtime-checkable protocol.
    • Defined TerminationReason enum for various loop termination states.
    • Defined TurnRecord Pydantic model for per-turn metadata, including a total_tokens computed field.
    • Defined ExecutionResult Pydantic model for loop outcomes, including a total_tool_calls computed field and a model_validator for error_message consistency.
    • Defined BudgetChecker as a callable type alias.
  • src/ai_company/engine/react_loop.py
    • Added a new file implementing the ReactLoop class, conforming to the ExecutionLoop protocol.
    • Decomposed the main execute method into five private helper methods: _check_budget, _call_provider, _check_response_errors, _handle_completion, and _execute_tool_calls.
    • Implemented comprehensive error handling within the loop for provider exceptions, budget checker failures, and tool execution issues, returning ExecutionResult with TerminationReason.ERROR.
    • Added logic to handle MAX_TOKENS finish reason as a COMPLETED termination.
    • Added logic to handle TOOL_USE finish reason with no actual tool calls as an ERROR termination.
  • src/ai_company/observability/events/execution.py
    • Added new constants for execution loop observability events, such as EXECUTION_LOOP_START, EXECUTION_LOOP_TURN_START, EXECUTION_LOOP_TURN_COMPLETE, EXECUTION_LOOP_TOOL_CALLS, EXECUTION_LOOP_TERMINATED, EXECUTION_LOOP_BUDGET_EXHAUSTED, and EXECUTION_LOOP_ERROR.
  • src/ai_company/tools/invoker.py
    • Added a registry property to ToolInvoker for read-only access to the underlying ToolRegistry.
  • tests/unit/engine/conftest.py
    • Added TYPE_CHECKING import.
    • Imported additional models (ChatMessage, CompletionConfig, CompletionResponse, StreamChunk) and ModelCapabilities.
    • Added MockCompletionProvider class as a test double for CompletionProvider.
    • Added mock_provider_factory fixture to expose MockCompletionProvider for test construction.
  • tests/unit/engine/test_errors.py
    • Imported BudgetExhaustedError and LoopExecutionError.
    • Added unit tests to verify BudgetExhaustedError and LoopExecutionError are subclasses of EngineError and behave as expected.
  • tests/unit/engine/test_loop_protocol.py
    • Added a new file with unit tests for TerminationReason enum, TurnRecord model (including computed fields and immutability), and ExecutionResult model (including computed fields, error message validation, and immutability).
    • Added tests to confirm ReactLoop correctly conforms to the ExecutionLoop protocol.
  • tests/unit/engine/test_react_loop.py
    • Added a new file with extensive unit tests for the ReactLoop implementation.
    • Included tests for basic completion, single and multi-turn tool calls, max turns termination, budget exhaustion, missing tool invoker scenarios, LLM error responses (content filter, error finish reason), turn record accuracy, context immutability, conversation state preservation, custom completion configuration, provider exceptions, tool execution exceptions, MAX_TOKENS finish reason, TOOL_USE with empty tool calls, and budget checker exceptions.
Activity
  • The pull request was pre-reviewed by 10 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency).
  • 16 findings from the pre-review were consolidated and all 16 have been implemented in this pull request.
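The read-only registry accessor mentioned in the changelog is, in essence, a property without a setter. A minimal sketch (the class internals here are stand-ins, not the project's actual code):

```python
class ToolRegistry:
    """Stand-in for the real registry; holds tool objects by name."""

    def __init__(self) -> None:
        self._tools: dict[str, object] = {}


class ToolInvoker:
    def __init__(self, registry: ToolRegistry) -> None:
        self._registry = registry

    @property
    def registry(self) -> ToolRegistry:
        # A property with no setter: reading works, assignment raises
        # AttributeError, so the reference cannot be swapped out.
        return self._registry
```

As CodeRabbit notes later in the thread, "read-only" here only prevents rebinding; the returned registry object itself is still live and mutable.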

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed ExecutionLoop protocol and a robust ReactLoop implementation, which is a significant architectural improvement for agent execution. The code is well-structured with a clear separation of concerns, and the test coverage is comprehensive. The decomposition of the main execution logic into smaller helper methods is excellent for readability and maintainability. My primary concern is a critical syntax issue in the exception handling within the new react_loop.py file, which seems to originate from a misleading guideline in the project's documentation. This issue will prevent the code from running in a standard Python 3 environment.

Comment on lines +231 to +232
except MemoryError, RecursionError:
    raise
Contributor


critical

This except block uses comma-separated syntax for catching multiple exceptions. On Python versions before 3.14 this raises a SyntaxError; multiple exception types must be enclosed in parentheses as a tuple.

While the CLAUDE.md file recommends this comma-separated syntax, PEP 758 only makes it valid starting with Python 3.14 (and only without an as clause). Unless the project's minimum supported version is 3.14, this should be corrected to keep the code runnable in a standard Python 3 environment.

        except (MemoryError, RecursionError):
            raise

Comment on lines +361 to +362
except MemoryError, RecursionError:
raise
Contributor


critical

As with the other except block in this file, this comma-separated form is a SyntaxError on Python versions before 3.14 (PEP 758 legalizes it only from 3.14 onward). The exceptions should be grouped in a tuple to keep the code valid on all supported versions.

Suggested change
except MemoryError, RecursionError:
    raise
except (MemoryError, RecursionError):
    raise

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR refactors the execution loop layer by introducing a formal ExecutionLoop protocol (loop_protocol.py) and decomposing the monolithic execute() method into 5 focused helpers inside a new ReactLoop class (react_loop.py). Supporting additions include two new engine-layer exception classes, 7 observability event constants, a registry property on ToolInvoker, and 6 new test cases covering previously untested failure modes.

Key changes:

  • loop_protocol.py — defines ExecutionLoop (runtime-checkable protocol), ExecutionResult, TurnRecord, TerminationReason, and BudgetChecker; uses @computed_field for derived totals and a cross-field @model_validator to enforce the error_message/termination_reason invariant
  • react_loop.py — implements the protocol via ReactLoop; each helper (_check_budget, _call_provider, _check_response_errors, _handle_completion, _execute_tool_calls) has a single responsibility; non-recoverable errors (MemoryError, RecursionError) are re-raised using the project's PEP 758 syntax (except A, B:)
  • Bug in _check_response_errors — when a CONTENT_FILTER or ERROR finish reason is returned, the method manually constructs the updated context via model_copy but omits the task_execution.with_cost(usage) update that AgentContext.with_turn_completed() always performs; for task-bound executions this creates a silent discrepancy between result.context.accumulated_cost (updated) and result.context.task_execution.accumulated_cost (stale); none of the new tests cover this path because all test contexts are created without a bound task
  • Test coverage is otherwise excellent — 13 test classes covering budget exhaustion, provider failures, MemoryError/RecursionError propagation, MAX_TOKENS, TOOL_USE with empty calls, and invoker exceptions
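The BudgetChecker alias Greptile mentions is presumably a plain callable type over the agent context. A sketch under that assumption (the `Ctx` stand-in, `accumulated_cost` attribute, and factory function are hypothetical):

```python
from collections.abc import Callable


class Ctx:
    """Hypothetical stand-in for the loop's AgentContext."""

    accumulated_cost = 0.75


# A BudgetChecker, as described, is a callable taking the context and
# returning True once the budget is exhausted.
BudgetChecker = Callable[[Ctx], bool]


def make_cost_checker(limit: float) -> BudgetChecker:
    def check(ctx: Ctx) -> bool:
        return ctx.accumulated_cost >= limit

    return check
```

Keeping the checker as a bare callable (rather than a protocol class) lets callers pass closures or lambdas without any adapter.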

Confidence Score: 3/5

  • Safe to merge for non-task-bound executions; task-bound agents that receive CONTENT_FILTER or ERROR responses will silently under-report cost in task_execution.
  • The overall architecture and decomposition are clean and the vast majority of the code is correct. The single confirmed bug — missing task_execution.with_cost() in _check_response_errors — is narrowly scoped to task-bound executions hitting error finish reasons, but it is a silent data-integrity issue with no test coverage, which warrants holding the score back from a 4.
  • src/ai_company/engine/react_loop.py — specifically _check_response_errors (lines 279–286)

Important Files Changed

  • src/ai_company/engine/react_loop.py — New ReAct loop implementation; well-structured decomposition into 5 helpers with correct error handling, but _check_response_errors misses task_execution.with_cost() for task-bound agents on CONTENT_FILTER/ERROR responses.
  • src/ai_company/engine/loop_protocol.py — New protocol file defining ExecutionLoop, ExecutionResult, TurnRecord, TerminationReason, and BudgetChecker; clean Pydantic v2 design with correct @computed_field and cross-field @model_validator usage.
  • tests/unit/engine/test_react_loop.py — Comprehensive test coverage across 13 test classes; good coverage of error paths, budget, tool exceptions, and config propagation, but no test exercises CONTENT_FILTER or ERROR finish reasons on a task-bound context, leaving the task_execution cost-tracking bug undetected.
  • src/ai_company/engine/errors.py — Adds BudgetExhaustedError and LoopExecutionError engine-layer exceptions with clear docstrings explaining the loop-internal vs engine-layer separation.
  • tests/unit/engine/test_loop_protocol.py — Thorough unit tests for TurnRecord, ExecutionResult, and TerminationReason; correctly validates frozen behaviour, computed fields, and the error_message cross-field invariant.
  • tests/unit/engine/conftest.py — Adds MockCompletionProvider and the mock_provider_factory fixture; correctly implements the full CompletionProvider interface including stream and get_model_capabilities.
  • src/ai_company/engine/__init__.py — Re-exports all new public symbols (ReactLoop, ExecutionLoop, ExecutionResult, TurnRecord, TerminationReason, BudgetChecker, BudgetExhaustedError, LoopExecutionError); __all__ kept alphabetically sorted.
  • src/ai_company/tools/invoker.py — Adds a read-only registry property to ToolInvoker exposing the underlying ToolRegistry; a minimal, clean addition needed by _get_tool_definitions.
  • src/ai_company/observability/events/execution.py — Adds 7 new Final[str] event constants for loop lifecycle events; follows the established naming convention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([execute called]) --> B{has_turns_remaining?}
    B -- No --> Z([return MAX_TURNS])
    B -- Yes --> C[_check_budget]
    C -- exhausted --> Y([return BUDGET_EXHAUSTED])
    C -- checker raises --> X([return ERROR])
    C -- ok --> D[_call_provider]
    D -- provider raises --> W([return ERROR])
    D -- ok --> E[append TurnRecord]
    E --> F[_check_response_errors]
    F -- CONTENT_FILTER / ERROR --> V([return ERROR\n⚠️ task_execution cost NOT updated])
    F -- ok --> G[ctx.with_turn_completed]
    G --> H{tool_calls empty?}
    H -- Yes --> I[_handle_completion]
    I -- TOOL_USE + no calls --> U([return ERROR])
    I -- STOP / MAX_TOKENS --> T([return COMPLETED])
    H -- No --> J[_execute_tool_calls]
    J -- no invoker --> S([return ERROR])
    J -- invoke_all raises --> R([return ERROR])
    J -- ok --> K[append tool result messages to ctx]
    K --> B

Last reviewed commit: 6e739c7


Copilot AI left a comment


Pull request overview

Adds the ExecutionLoop protocol and a ReAct-based loop implementation with structured observability, clearer engine-layer error types, updated design docs, and comprehensive unit coverage for loop termination/error paths.

Changes:

  • Introduce ExecutionLoop protocol + frozen result models (ExecutionResult, TurnRecord, TerminationReason) with computed fields and cross-field validation.
  • Implement ReactLoop with decomposed helpers, explicit handling for provider/tool/budget failures, and new execution-loop observability events.
  • Add/extend unit tests and documentation to cover new loop behaviors and clarify engine vs loop error semantics.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

  • src/ai_company/engine/loop_protocol.py — Defines the protocol and result models with computed fields and validation.
  • src/ai_company/engine/react_loop.py — Implements the ReAct execution loop with structured logging and robust error handling.
  • src/ai_company/observability/events/execution.py — Adds event constants for loop lifecycle/turn/tool logging.
  • src/ai_company/tools/invoker.py — Exposes ToolInvoker.registry for deriving tool definitions.
  • src/ai_company/engine/errors.py — Adds engine-layer exceptions to mirror loop termination reasons.
  • src/ai_company/engine/__init__.py — Re-exports new loop APIs and related errors.
  • tests/unit/engine/conftest.py — Adds a MockCompletionProvider test double and fixture.
  • tests/unit/engine/test_react_loop.py — Adds comprehensive ReactLoop behavior tests across success and failure modes.
  • tests/unit/engine/test_loop_protocol.py — Tests protocol models, computed fields, and validation rules.
  • tests/unit/engine/test_errors.py — Extends error hierarchy tests for new exceptions.
  • DESIGN_SPEC.md — Documents the ExecutionLoop protocol and updates diagrams/project structure.
  • CLAUDE.md — Updates contributor workflow guidance.


Comment on lines +179 to +194
try:
    exhausted = budget_checker(ctx)
except Exception as exc:
    error_msg = f"Budget checker failed: {type(exc).__name__}: {exc}"
    logger.exception(
        EXECUTION_LOOP_ERROR,
        execution_id=ctx.execution_id,
        turn=ctx.turn_count,
        error=error_msg,
    )
    return _build_result(
        ctx,
        TerminationReason.ERROR,
        turns,
        error_message=error_msg,
    )

Copilot AI Mar 6, 2026


In _check_budget, MemoryError/RecursionError are currently caught by the broad except Exception and converted into an ExecutionResult. Elsewhere in this loop (provider call + tool execution) these exceptions are treated as non-recoverable and re-raised. Consider adding an explicit except MemoryError, RecursionError: raise here as well to keep fatal error handling consistent and avoid masking OOM/recursion failures.

Copilot uses AI. Check for mistakes.
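The consistency Copilot asks for — re-raising non-recoverable exceptions before the broad handler can swallow them — looks roughly like this (the function shape and tuple return values are simplified placeholders, not the project's actual API):

```python
def run_budget_check(ctx, budget_checker):
    """Simplified guard around the budget checker call."""
    try:
        exhausted = budget_checker(ctx)
    except (MemoryError, RecursionError):
        # Non-recoverable: re-raise before the broad handler below can
        # convert the failure into an ordinary error result.
        raise
    except Exception as exc:
        return ("error", f"Budget checker failed: {type(exc).__name__}: {exc}")
    return ("budget_exhausted", None) if exhausted else (None, None)
```

Ordering matters: the narrow `except (MemoryError, RecursionError)` clause must precede `except Exception`, or it is unreachable.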

@pytest.mark.unit
class TestReactLoopToolExecutionException:
    """Tool invoker raising fatal error during invoke_all()."""

Copilot AI Mar 6, 2026


The class docstring says the tool invoker raises a fatal error during invoke_all(), but this test actually exercises a tool raising RuntimeError, which ToolInvoker.invoke() converts into a ToolResult(is_error=True) (i.e., not fatal). Updating the docstring to match the behavior would make the test intent clearer.

Suggested change
    """Tool invoker raising fatal error during invoke_all()."""
    """Tool execution errors are captured by ToolInvoker and do not crash the loop."""

Comment on lines +657 to +660
assert result.termination_reason in (
    TerminationReason.ERROR,
    TerminationReason.COMPLETED,
)

Copilot AI Mar 6, 2026


This assertion allows both ERROR and COMPLETED, which makes the test too permissive and reduces its ability to catch regressions. Given the current setup (provider has only one response), the loop should deterministically return TerminationReason.ERROR on the next provider call due to IndexError (no more responses); it would be better to assert the exact termination reason and, ideally, key parts of the error message.

Suggested change
assert result.termination_reason in (
    TerminationReason.ERROR,
    TerminationReason.COMPLETED,
)
assert result.termination_reason == TerminationReason.ERROR


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 811-814: The spec removes a distinct blocked termination state and
folds it into ERROR which loses important semantics; update the
TerminationReason enum to include BLOCKED as a first-class value (e.g., add
`BLOCKED` alongside `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`) and
adjust ExecutionResult rules so that an error message is required only when
TerminationReason == `ERROR`, while `BLOCKED` is documented as a
resumable/parked state (not an outright failure) and any references to
termination handling or consumers of ExecutionResult/TurnRecord reflect this
distinction; if BLOCKED is intentionally deferred, add a clear note in the spec
stating it will be introduced in M3 rather than collapsing it into ERROR.

In `@src/ai_company/tools/invoker.py`:
- Around line 87-90: The current registry property on ToolInvoker exposes the
live ToolRegistry (_registry), widening the public surface and allowing callers
to bypass validation via registry.get(...).execute(...); change the registry
property to instead return only provider-facing tool definitions (e.g., a
deep-copied mapping/list of ToolDefinition or ToolSchema objects) that contain
schemas/metadata needed by ReactLoop and do not expose ToolRegistry methods;
ensure the property returns immutable or copied schema objects (not the original
_registry or Tool instances) so callers cannot call execute or mutate the
underlying registry.

In `@tests/unit/engine/test_react_loop.py`:
- Around line 554-570: The test only verifies completion succeeded but doesn't
confirm the custom CompletionConfig was passed through to the provider; update
test_custom_completion_config to assert the MockCompletionProvider recorded the
request with the override. After calling ReactLoop.execute, inspect the
provider's request log (e.g., provider.request_log or provider.requests) and
assert the first request's completion config or fields (temperature and
max_tokens) match the passed custom_config, ensuring ReactLoop.execute actually
forwarded the override to the provider.
- Around line 667-687: The test incorrectly treats FinishReason.MAX_TOKENS as a
successful completion; update the test (test_max_tokens_returns_completed) to
assert that a MAX_TOKENS finish is NOT mapped to TerminationReason.COMPLETED
(e.g., assert result.termination_reason != TerminationReason.COMPLETED and/or
assert it equals the non-success code your system uses such as
TerminationReason.INTERRUPTED/INCOMPLETE), and verify ReactLoop.execute’s
mapping of CompletionResponse.finish_reason (FinishReason.MAX_TOKENS) treats it
as a non-success path instead of producing TerminationReason.COMPLETED.
- Around line 577-597: The test currently expects the loop to return a generic
TerminationReason.ERROR and mask the provider's exception; instead change it to
assert the provider exception propagates: update
test_provider_exception_returns_error_result to call loop.execute with the
_FailingProvider and use pytest.raises(ConnectionError) (or assert the raised
exception is ConnectionError) around the await, remove the post-call assertions
about termination_reason/error_message, and ensure the failing provider class
and its complete method remain the same so the ConnectionError is what bubbles
up to the engine layer.
- Around line 620-660: The test test_tool_exception_returns_error_result is
non-deterministic because the mock provider only returns one response; add a
second response when constructing provider (use mock_provider_factory([... ,
_tool_use_response(...)])) so the ReactLoop actually continues past the tool
error, and then change the final assertion to require a single deterministic
outcome (e.g., assert result.termination_reason == TerminationReason.COMPLETED).
Update only the provider creation and the assertion; leave _ExplodingTool,
ToolInvoker, and ReactLoop usage unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1463bb3e-80fb-4849-a07a-1c27ef9bb915

📥 Commits

Reviewing files that changed from the base of the PR and between 09619cb and 62732ba.

📒 Files selected for processing (12)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/loop_protocol.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/tools/invoker.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_errors.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/test_react_loop.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling per PEP 758 — ruff enforces this on Python 3.14
Add type hints to all public functions; mypy strict mode must pass
Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
Create new objects for immutability; never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields
Use Pydantic v2 with adopted conventions: use @computed_field for derived values instead of storing redundant fields (e.g. TokenUsage.total_tokens); use NotBlankStr for all identifier/name fields including optional and tuple variants instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig
Retryable erro...

Files:

  • tests/unit/engine/test_errors.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/react_loop.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/engine/test_loop_protocol.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/loop_protocol.py
  • tests/unit/engine/conftest.py
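The frozen-model and computed-field conventions listed above can be sketched in stdlib form. This is a hypothetical dataclass analogue, not the project's Pydantic code: `TokenUsage` and its fields stand in for the real models, with `@property` playing the role of `@computed_field` and `replace()` playing the role of `model_copy(update=...)`.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TokenUsage:
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        # Derived value computed on access rather than stored redundantly
        # (the Pydantic equivalent would be @computed_field).
        return self.input_tokens + self.output_tokens

usage = TokenUsage(input_tokens=100, output_tokens=40)
# State evolves by creating a new object; the original is never mutated.
updated = replace(usage, output_tokens=60)
```

Because the derived total is computed, the copy's `total_tokens` tracks the updated fields automatically and the two values can never drift apart.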
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Set test timeout to 30 seconds per test
Use tests/**/*.py for vendor-agnostic naming with test-provider, test-small-001 instead of real vendor model names

Files:

  • tests/unit/engine/test_errors.py
  • tests/unit/engine/test_react_loop.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/engine/conftest.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every module with business logic MUST have from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code
Always use logger as the variable name for loggers, not _logger or log
Always use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider); import directly
Always use structured logging with logger.info(EVENT, key=value) format; never use logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/engine/__init__.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/errors.py
  • src/ai_company/engine/react_loop.py
  • src/ai_company/tools/invoker.py
  • src/ai_company/engine/loop_protocol.py
DESIGN_SPEC.md

📄 CodeRabbit inference engine (CLAUDE.md)

Update DESIGN_SPEC.md to reflect new reality when approved deviations occur

Files:

  • DESIGN_SPEC.md
🧠 Learnings (22)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: At every phase of planning and implementation, be critical and actively look for ways to improve the design in the spirit of what we're building (robustness, correctness, simplicity, future-proofing where it's free); surface improvements as suggestions, not silent changes

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Every implementation plan must be presented to the user for accept/deny before coding starts

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue; the design spec is the starting point for architecture, data models, and behavior

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Never defer work—do not suggest "this can be done later" or "consider for a future PR". Complete all requested changes fully.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Never defer work. Do not suggest 'this can be done later' or 'consider for a future PR'. Complete all requested changes fully.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Alert the user and explain why if implementation deviates from the spec; user decides whether to proceed or update the spec. Do NOT silently diverge — every deviation needs explicit user approval

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Enable pre-commit hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: After finishing an issue implementation, create a feature branch (`<type>/<slug>`), commit, and push — do NOT create a PR automatically

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Use commit message format `<type>: <description>` with types: feat, fix, refactor, docs, test, chore, perf, ci; enforced by commitizen

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Always create a PR for issue work. When implementing changes for a GitHub issue, create a branch and open a pull request. Do not wait to be asked.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: No bypassing CI - never use `git push --no-verify` or modify test coverage thresholds to make tests pass. If tests fail, fix the actual issue. Pre-push hooks exist to catch problems before they reach CI.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: NEVER create a PR directly — `gh pr create` is blocked by hookify; ALWAYS use `/pre-pr-review` to create PRs

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • src/ai_company/engine/__init__.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • src/ai_company/engine/__init__.py
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Applies to src/**/*.py : Always use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`); import directly

Applied to files:

  • src/ai_company/observability/events/execution.py
📚 Learning: 2026-03-06T17:20:51.959Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T17:20:51.959Z
Learning: Applies to **/*.py : `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains

Applied to files:

  • src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_react_loop.py
  • tests/unit/engine/test_loop_protocol.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/engine/conftest.py
🔇 Additional comments (13)
src/ai_company/engine/react_loop.py (5)

1-40: LGTM!

Module docstring clearly describes the ReAct pattern, imports are well-organized with TYPE_CHECKING guards, and observability setup follows project conventions with get_logger(__name__).


56-168: LGTM!

The execute loop structure is clean and follows the documented think→act→observe pattern. The decomposition into focused helper methods improves readability, and each early-return path correctly propagates ExecutionResult. The immutable context evolution via ctx.with_turn_completed() and ctx.with_message() aligns with the project's frozen model conventions.


322-389: LGTM!

Comprehensive error handling for tool execution: gracefully handles missing tool invoker when the LLM unexpectedly requests tools, and properly catches tool execution failures with full context logging. The context evolution loop using ctx.with_message() correctly maintains immutability.


392-439: LGTM!

Helper functions are well-focused and correctly handle the data transformations. _build_result properly converts the mutable turns list to an immutable tuple for the frozen ExecutionResult model.


231-248: No issues. The code correctly uses PEP 758 exception syntax for Python 3.14.

src/ai_company/engine/__init__.py (1)

1-61: LGTM!

Public API surface cleanly expanded with all new loop protocol types, the ReactLoop implementation, and related error types. The __all__ list maintains alphabetical ordering and the docstring is updated to reflect the new scope.

tests/unit/engine/conftest.py (2)

176-230: LGTM!

The MockCompletionProvider test double correctly implements the CompletionProvider protocol with matching method signatures. The implementation is appropriately minimal: complete() pops pre-configured responses, stream() raises NotImplementedError (acceptable for tests not exercising streaming), and get_model_capabilities() returns valid minimal capabilities using the vendor-agnostic "test-provider" name.


233-236: LGTM!

The fixture returning the class type (rather than an instance) provides good test flexibility—each test can construct a mock with its specific response sequence.

src/ai_company/engine/loop_protocol.py (5)

1-21: LGTM!

Module structure is clean with appropriate imports. The TYPE_CHECKING guard for forward references (CompletionConfig, CompletionProvider, ToolInvoker) is correctly applied while AgentContext is imported directly since it's used in runtime field definitions.


23-29: LGTM!

Clear, minimal enum covering the four termination scenarios documented in the PR objectives.


32-63: LGTM!

TurnRecord is well-designed with appropriate field constraints (gt=0, ge=0), immutable tuple for tool_calls_made, and @computed_field for derived total_tokens following project conventions.


66-117: LGTM!

ExecutionResult is well-designed with proper validation. The @model_validator correctly enforces the error_message↔ERROR invariant. The docstring appropriately warns about metadata dict mutation and references the project's deep-copy convention for system boundaries.


120-160: LGTM!

The BudgetChecker type alias is simple and well-documented. The ExecutionLoop protocol with @runtime_checkable provides a clean contract that ReactLoop implements, with comprehensive docstrings explaining the callback semantics and configuration options.

Comment on lines +811 to +814
- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
- **`TurnRecord`** — frozen per-turn stats (tokens, cost, tool calls, finish reason)
- **`ExecutionResult`** — frozen outcome with final context, termination reason, turn records, and optional error message (required when reason is `ERROR`)
- **`BudgetChecker`** — callback type `Callable[[AgentContext], bool]` invoked before each LLM call

⚠️ Potential issue | 🟠 Major

Keep a first-class blocked termination state.

Issue #124 still calls out blocked termination, and later sections in this spec already distinguish parked/resumable work from outright failure. Folding those cases into ERROR loses whether the loop failed versus needs external input and can be resumed; if BLOCKED is intentionally deferred for M3, call that out explicitly instead of redefining the protocol here.

📝 Suggested spec correction
-- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `ERROR`
+- **`TerminationReason`** — enum: `COMPLETED`, `MAX_TURNS`, `BUDGET_EXHAUSTED`, `BLOCKED`, `ERROR`

Comment on lines +87 to +90
@property
def registry(self) -> ToolRegistry:
"""Read-only access to the underlying tool registry."""
return self._registry

🧹 Nitpick | 🔵 Trivial

Expose only provider-facing tool definitions here.

ReactLoop only needs schemas. Returning the live ToolRegistry widens ToolInvoker's public surface and makes it easy for callers to step around the validation/deep-copy boundary via registry.get(...).execute(...).

♻️ Narrow the accessor
-from ai_company.providers.models import ToolCall, ToolResult
+from ai_company.providers.models import ToolCall, ToolDefinition, ToolResult
...
-    @property
-    def registry(self) -> ToolRegistry:
-        """Read-only access to the underlying tool registry."""
-        return self._registry
+    def to_definitions(self) -> tuple[ToolDefinition, ...]:
+        """Return provider-facing tool definitions."""
+        return self._registry.to_definitions()
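The narrowing can be illustrated with a simplified stand-in. This sketch holds raw schema dicts instead of a real `ToolRegistry`; the point is the pattern the comment describes — deep-copied schemas behind a read-only view, with no path back to `execute()` or the live registry.

```python
import copy
from types import MappingProxyType

class ToolInvoker:
    """Simplified stand-in: holds schema dicts rather than a ToolRegistry."""

    def __init__(self, schemas: dict[str, dict]) -> None:
        # Deep-copy at construction so later caller mutations cannot leak in.
        self._schemas = {name: copy.deepcopy(s) for name, s in schemas.items()}

    def to_definitions(self) -> MappingProxyType:
        # Fresh deep copy behind a read-only proxy: callers get the schemas
        # they need for provider calls, and nothing else.
        return MappingProxyType(copy.deepcopy(self._schemas))
```

Because each call returns a new deep copy, even mutating a nested dict on the returned view leaves the invoker's internal state untouched.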

Comment on lines +577 to +597
async def test_provider_exception_returns_error_result(
self,
sample_agent_context: AgentContext,
) -> None:
ctx = _ctx_with_user_msg(sample_agent_context)

class _FailingProvider:
async def complete(self, *_args: Any, **_kwargs: Any) -> None:
msg = "connection refused"
raise ConnectionError(msg)

loop = ReactLoop()
result = await loop.execute(
context=ctx,
provider=_FailingProvider(), # type: ignore[arg-type]
)

assert result.termination_reason == TerminationReason.ERROR
assert result.error_message is not None
assert "ConnectionError" in result.error_message


⚠️ Potential issue | 🟠 Major

Don't lock provider failures into TerminationReason.ERROR.

An error result here hides the original provider exception from the engine layer. That breaks the fallback path for retry exhaustion and similar provider failures, which the engine is supposed to catch directly rather than receiving a generic loop error.

As per coding guidelines, "`RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains."
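The propagation contract the reviewer wants can be sketched without pytest. This is a toy stand-in for the loop body, assuming `_FailingProvider` mirrors the one in the test above; the key is that the provider call is not wrapped in a catch-all, so the exception reaches the caller intact.

```python
import asyncio

class _FailingProvider:
    async def complete(self, *args, **kwargs):
        raise ConnectionError("connection refused")

async def execute(provider):
    # Stand-in for the loop body: no blanket except around the provider call,
    # so provider failures propagate to the engine layer unchanged.
    return await provider.complete()

def provider_error_propagates() -> bool:
    try:
        asyncio.run(execute(_FailingProvider()))
    except ConnectionError:
        return True
    return False
```

In the real test this would be `pytest.raises(ConnectionError)` around the `await`, replacing the assertions on `termination_reason` and `error_message`.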


Comment on lines +620 to +660
async def test_tool_exception_returns_error_result(
self,
sample_agent_context: AgentContext,
mock_provider_factory: type[MockCompletionProvider],
) -> None:
ctx = _ctx_with_user_msg(sample_agent_context)
provider = mock_provider_factory([_tool_use_response("explode", "tc-1")])

class _ExplodingTool(BaseTool):
def __init__(self) -> None:
super().__init__(
name="explode",
description="boom",
)

async def execute(
self,
*,
arguments: dict[str, Any],
) -> ToolExecutionResult:
msg = "kaboom"
raise RuntimeError(msg)

registry = ToolRegistry([_ExplodingTool()])
invoker = ToolInvoker(registry)
loop = ReactLoop()

result = await loop.execute(
context=ctx,
provider=provider,
tool_invoker=invoker,
)

# The tool error is caught by ToolInvoker.invoke and returned
# as ToolResult(is_error=True), so the loop continues normally.
# It terminates because the mock has no more responses.
# This validates the loop doesn't crash on tool errors.
assert result.termination_reason in (
TerminationReason.ERROR,
TerminationReason.COMPLETED,
)

⚠️ Potential issue | 🟡 Minor

Make the post-tool-error contract deterministic.

Allowing either ERROR or COMPLETED means this test will keep passing even if the loop starts terminating for the wrong reason; at the moment it's mostly asserting the mock provider's exhaustion behavior. Feed a second provider response and pin the expected outcome after a recoverable tool failure.

💚 Make the expectation explicit
-        provider = mock_provider_factory([_tool_use_response("explode", "tc-1")])
+        provider = mock_provider_factory(
+            [
+                _tool_use_response("explode", "tc-1"),
+                _stop_response("done after tool error"),
+            ]
+        )
...
-        # It terminates because the mock has no more responses.
-        # This validates the loop doesn't crash on tool errors.
-        assert result.termination_reason in (
-            TerminationReason.ERROR,
-            TerminationReason.COMPLETED,
-        )
+        assert result.termination_reason == TerminationReason.COMPLETED
+        assert result.context.conversation[2].tool_result is not None
+        assert result.context.conversation[2].tool_result.is_error is True

Comment on lines +667 to +687
async def test_max_tokens_returns_completed(
self,
sample_agent_context: AgentContext,
mock_provider_factory: type[MockCompletionProvider],
) -> None:
ctx = _ctx_with_user_msg(sample_agent_context)
response = CompletionResponse(
content="partial output",
finish_reason=FinishReason.MAX_TOKENS,
usage=_usage(),
model="test-model-001",
)
provider = mock_provider_factory([response])
loop = ReactLoop()

result = await loop.execute(context=ctx, provider=provider)

assert result.termination_reason == TerminationReason.COMPLETED
assert len(result.turns) == 1
assert result.turns[0].finish_reason == FinishReason.MAX_TOKENS


⚠️ Potential issue | 🟠 Major

MAX_TOKENS is not a successful completion signal.

A FinishReason.MAX_TOKENS response only says the model hit the output ceiling; it says nothing about the task being done. Baking TerminationReason.COMPLETED into the test will bless truncated answers as successful executions.
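The mapping the comment objects to can be sketched as follows. This is a hedged illustration, not the project's code: the actual non-success value the system uses (the comment suggests `INTERRUPTED` or `INCOMPLETE`) is unspecified, so `ERROR` stands in for "anything but COMPLETED" here.

```python
from enum import Enum

class FinishReason(Enum):
    STOP = "stop"
    MAX_TOKENS = "max_tokens"

class TerminationReason(Enum):
    COMPLETED = "completed"
    ERROR = "error"  # placeholder for whatever non-success value the system uses

def map_finish_reason(finish: FinishReason) -> TerminationReason:
    # Only a clean STOP counts as success. MAX_TOKENS means the output hit the
    # ceiling and was truncated, so it must not be blessed as COMPLETED.
    if finish is FinishReason.STOP:
        return TerminationReason.COMPLETED
    return TerminationReason.ERROR
```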


Source fixes:
- Use NotBlankStr for TurnRecord.tool_calls_made (convention consistency)
- Add MemoryError/RecursionError guard to _check_budget (consistency with
  _call_provider and _execute_tool_calls)
- Fix cost accounting: error responses now include failing turn's cost
  in context.accumulated_cost via model_copy
- Update module, method, and class docstrings for accuracy

Test improvements:
- Tighten overly permissive tool error assertion (ERROR, not ERROR|COMPLETED)
- Add RecursionError propagation tests for provider and tool execution
- Add invoke_all exception path test with mock ToolInvoker
- Add empty ToolRegistry test
- Add cost accounting test for error responses
- Add TurnRecord field validation boundary tests
- Record and verify CompletionConfig forwarding in MockCompletionProvider

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
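
The MemoryError/RecursionError guard described in the commit message above can be sketched as follows. `run_budget_checker` and its return convention are hypothetical stand-ins for the private `_check_budget` helper, whose actual signature is not shown in this PR:

```python
def run_budget_checker(checker, ctx):
    """Call a budget checker, converting ordinary failures into an error
    message while letting unrecoverable interpreter errors propagate.

    Returns None when the budget check passes, or an error string when the
    checker itself raises a recoverable exception.
    """
    try:
        checker(ctx)
        return None  # budget OK
    except (MemoryError, RecursionError):
        raise  # never swallow interpreter-level failures
    except Exception as exc:  # deliberate broad guard: checker code is untrusted
        return f"budget checker failed: {exc!r}"
```

The same two-tier `except` shape would apply in `_call_provider` and `_execute_tool_calls`, which is the consistency point the commit message makes.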
@Aureliolo Aureliolo merged commit 8dfb3c0 into main Mar 6, 2026
8 of 9 checks passed
@Aureliolo Aureliolo deleted the feat/execution-loop-protocol branch March 6, 2026 19:20
Comment on lines +279 to +292 of src/ai_company/engine/react_loop.py
        updated_ctx = ctx.model_copy(
            update={
                "turn_count": ctx.turn_count + 1,
                "accumulated_cost": add_token_usage(
                    ctx.accumulated_cost, response.usage
                ),
            },
        )
        return _build_result(
            updated_ctx,
            TerminationReason.ERROR,
            turns,
            error_message=error_msg,
        )

task_execution cost skipped on error responses

_check_response_errors manually assembles the context update via model_copy but omits the task_execution.with_cost(usage) step that AgentContext.with_turn_completed() always performs. For task-bound executions, this means the CONTENT_FILTER and ERROR finish-reason paths never credit the failing turn's token cost to the TaskExecution, creating a gap between result.context.accumulated_cost (updated) and result.context.task_execution.accumulated_cost (stale).

with_turn_completed (context.py line 223–224) shows the pattern:

if self.task_execution is not None:
    updates["task_execution"] = self.task_execution.with_cost(usage)

This same guard is missing here:

Suggested change

        updated_ctx = ctx.model_copy(
            update={
                "turn_count": ctx.turn_count + 1,
                "accumulated_cost": add_token_usage(
                    ctx.accumulated_cost, response.usage
                ),
            },
        )
        return _build_result(
            updated_ctx,
            TerminationReason.ERROR,
            turns,
            error_message=error_msg,
        )

        updated_ctx = ctx.model_copy(
            update={
                "turn_count": ctx.turn_count + 1,
                "accumulated_cost": add_token_usage(
                    ctx.accumulated_cost, response.usage
                ),
                **(
                    {"task_execution": ctx.task_execution.with_cost(response.usage)}
                    if ctx.task_execution is not None
                    else {}
                ),
            },
        )

None of the existing tests catch this because every test that hits this error path uses the sample_agent_context fixture, which has no task_execution (AgentContext.from_identity without a task= argument). A test exercising CONTENT_FILTER on a task-bound context would expose the discrepancy.
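
The guarded update pattern this comment asks for can be sketched with frozen dataclasses standing in for the project's Pydantic models; integer token counts replace the real `TokenUsage` type, and `replace` plays the role of `model_copy(update=...)`:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TaskExecution:
    accumulated_cost: int = 0

    def with_cost(self, usage: int) -> "TaskExecution":
        return replace(self, accumulated_cost=self.accumulated_cost + usage)


@dataclass(frozen=True)
class AgentContext:
    turn_count: int = 0
    accumulated_cost: int = 0
    task_execution: Optional[TaskExecution] = None


def charge_failed_turn(ctx: AgentContext, usage: int) -> AgentContext:
    """Credit a failing turn's cost to the context, and — when the execution
    is task-bound — also to the TaskExecution, mirroring with_turn_completed."""
    updates = {
        "turn_count": ctx.turn_count + 1,
        "accumulated_cost": ctx.accumulated_cost + usage,
    }
    if ctx.task_execution is not None:
        updates["task_execution"] = ctx.task_execution.with_cost(usage)
    return replace(ctx, **updates)
```

With this shape, `accumulated_cost` on the context and on the task execution stay in sync on error paths, which is exactly the gap the review identifies.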




Development

Successfully merging this pull request may close these issues.

Define ExecutionLoop protocol and implement ReAct loop (DESIGN_SPEC §6.5)

2 participants