Skip to content

feat: implement AgentEngine core orchestrator (#11)#142

Closed
Aureliolo wants to merge 2 commits intomainfrom
feat/agent-engine-core
Closed

feat: implement AgentEngine core orchestrator (#11)#142
Aureliolo wants to merge 2 commits intomainfrom
feat/agent-engine-core

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • AgentEngine — top-level orchestrator tying together prompt construction, execution context, execution loop, tool invocation, and budget tracking into a single run() entry point
  • AgentRunResult — frozen Pydantic model wrapping ExecutionResult with engine-level metadata (system prompt, duration, agent/task IDs, computed convenience fields)
  • 4 new execution event constants for debug observability (EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COST_SKIPPED, EXECUTION_ENGINE_COST_FAILED)
  • 32 unit tests covering happy path, validation errors, error handling, immutability, budget checking, cost recording, tool integration, custom loops, computed fields, completion config forwarding, max_turns propagation, and deadline formatting

Key design decisions

  • Cost recording failures are non-fatal — a successful run is never downgraded to an error because of a tracker failure
  • MemoryError and RecursionError propagate unconditionally; all other exceptions are caught and wrapped in an error result
  • DEFAULT_MAX_TURNS imported from context.py (single source of truth, no duplication)
  • _prepare_context() extracted to keep all methods under the 50-line convention
  • Wall-clock duration_seconds captured in run() and preserved even in error results

Documentation updates

  • DESIGN_SPEC.md §6.5: Added AgentEngine orchestrator description and AgentRunResult model documentation
  • DESIGN_SPEC.md §15.3: Added run_result.py to project structure
  • CLAUDE.md: Updated engine/ package description

Closes #11

Test plan

  • uv run ruff check src/ tests/ — all checks pass
  • uv run mypy src/ tests/ — no issues in 218 files
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 1937 passed, 95.38% coverage
  • agent_engine.py at 99% coverage, run_result.py at 100%
  • Pre-commit hooks pass (ruff, gitleaks, commitizen)

Review coverage

Pre-reviewed by 9 agents: code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency. 26 findings total, 23 implemented (4 Critical, 6 Major, 10 Medium, 3 Minor) + 3 suggestions. 3 items deferred as future M4 concerns (exception object preservation, broad catch narrowing, fresh context in error path).

Aureliolo and others added 2 commits March 6, 2026 20:51
Add AgentEngine as the top-level orchestrator that ties together prompt
construction, execution context, execution loop, tool invocation, and
budget tracking into a single run() entry point.

New files:
- engine/agent_engine.py: AgentEngine class with run() method
- engine/run_result.py: AgentRunResult frozen model with computed fields
- tests/unit/engine/test_agent_engine.py: 27 unit tests (14 classes)

Modified:
- engine/__init__.py: export AgentEngine, AgentRunResult
- observability/events/execution.py: 6 new engine event constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 9 agents, 23 findings addressed:
- Critical: cost recording failure no longer destroys successful results
- Major: extracted _prepare_context, fixed duration in error path,
  removed duplicate DEFAULT_MAX_TURNS, added Raises docstring
- Added 5 new tests (zero-cost skip, cost-tracker failure, completion
  config forwarding, max_turns forwarding, deadline formatting)
- Updated DESIGN_SPEC.md §6.5 with AgentEngine + AgentRunResult docs
- Added 4 new execution event constants for debug observability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 20:12
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 6, 2026

📝 Walkthrough

Summary by CodeRabbit

New Features

  • Introduced a new agent orchestration engine as the primary entry point for executing agents on tasks
  • Added structured execution results with metadata including duration, cost, turn count, and success status
  • Expanded execution architecture with dedicated engines for task routing, workflow orchestration, meetings, and HR operations

Walkthrough

Introduces AgentEngine, a top-level orchestrator for agent execution that validates inputs, builds system prompts, creates execution context, delegates to an ExecutionLoop, handles task state transitions, applies budget constraints, records costs, and returns structured results with metadata.

Changes

Cohort / File(s) Summary
Agent Engine Core Implementation
src/ai_company/engine/agent_engine.py, src/ai_company/engine/run_result.py
New AgentEngine orchestrator class implementing the run() pipeline with validation, prompt/context preparation, task state transitions, budget enforcement, cost tracking, and error handling. New AgentRunResult frozen model wrapping ExecutionResult with engine metadata and computed properties (termination_reason, total_turns, total_cost_usd, is_success).
Engine Package Exports
src/ai_company/engine/__init__.py
Re-exports AgentEngine and AgentRunResult as public API symbols from the engine package.
Documentation & Package Description
CLAUDE.md, DESIGN_SPEC.md
Updated package structure description to "Agent orchestration, execution loops, and task lifecycle." Added AgentEngine Orchestrator specification detailing run() pipeline, error handling, crash recovery (Fail-and-Reassign MVP), and new public engine module declarations (agent_engine, run_result, task_engine, workflow_engine, meeting_engine, hr_engine).
Observability Events
src/ai_company/observability/events/execution.py
Added ten execution engine lifecycle and cost-tracking event constants: EXECUTION_ENGINE_CREATED, START, PROMPT_BUILT, COMPLETE, ERROR, INVALID_INPUT, TASK_TRANSITION, COST_RECORDED, COST_SKIPPED, COST_FAILED.
Unit Tests
tests/unit/engine/test_agent_engine.py
Comprehensive test suite covering run results, success status, system prompt construction, task state transitions, tool integration, budget checking, cost tracking, error propagation (MemoryError/RecursionError), execution loop selection, result properties, formatting helpers, and immutability validation.

Sequence Diagram

sequenceDiagram
    participant Client
    participant AgentEngine
    participant Validator
    participant PromptBuilder
    participant ExecutionLoop
    participant CostTracker
    participant TaskMgmt

    Client->>AgentEngine: run(identity, task, ...)
    AgentEngine->>Validator: _validate_agent(identity, agent_id)
    Validator-->>AgentEngine: ✓ or ExecutionStateError
    AgentEngine->>Validator: _validate_task(task, agent_id, task_id)
    Validator-->>AgentEngine: ✓ or ExecutionStateError
    
    AgentEngine->>PromptBuilder: build_system_prompt(agent config)
    PromptBuilder-->>AgentEngine: SystemPrompt
    
    AgentEngine->>TaskMgmt: transition task ASSIGNED→IN_PROGRESS
    TaskMgmt-->>AgentEngine: ✓
    
    AgentEngine->>AgentEngine: _prepare_context(identity, task, system_prompt, ...)
    AgentEngine->>AgentEngine: seed conversation(system msg + task instruction)
    
    AgentEngine->>ExecutionLoop: run(context, budget_checker, tool_invoker, max_turns)
    ExecutionLoop-->>AgentEngine: ExecutionResult(termination_reason, turns, cost, ...)
    
    alt MemoryError or RecursionError
        AgentEngine->>AgentEngine: re-raise exception
    else Other Exception
        AgentEngine->>AgentEngine: _handle_fatal_error() → AgentRunResult(termination=ERROR)
    end
    
    AgentEngine->>CostTracker: _record_costs(result, identity, agent_id, task_id)
    CostTracker-->>AgentEngine: ✓ (or fail gracefully)
    
    AgentEngine->>Client: AgentRunResult(execution_result, system_prompt, duration, agent_id, task_id, ...)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #141: Introduces ExecutionLoop protocol and ReAct loop implementation that AgentEngine directly delegates to, forming a core architectural dependency.
  • PR #71: Adds Task domain model with TaskStatus, budget_limit fields, and cost types that AgentEngine references for validation, transitions, and budget enforcement.
  • PR #105: Introduces build_system_prompt utility and SystemPrompt model that AgentEngine uses in context preparation and result wrapping.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.92% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: implementation of the AgentEngine core orchestrator, which is the primary focus of this PR.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, covering the AgentEngine implementation, AgentRunResult model, design decisions, test results, and documentation updates.
Linked Issues check ✅ Passed The PR successfully implements all core requirements from issue #11: AgentEngine orchestrator [#11], system prompt construction from agent config [#11], ExecutionLoop protocol integration with default ReactLoop [#11], response metadata collection [#11], async execution with max iteration limit [#11], tool integration [#11], and comprehensive unit/integration tests [#11].
Out of Scope Changes check ✅ Passed All code changes are directly aligned with issue #11 requirements: AgentEngine and AgentRunResult implementations, execution event constants for observability, comprehensive test suite, and DESIGN_SPEC/CLAUDE.md documentation updates are all in scope.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/agent-engine-core

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the foundational AgentEngine and AgentRunResult components, establishing a robust and centralized mechanism for executing agent tasks within the system. This change streamlines the agent execution flow, from initial prompt construction to final result encapsulation, while also significantly improving observability and error handling. The new architecture provides a clear, single entry point for agent operations, making the system more modular and easier to manage.

Highlights

  • Core Orchestrator: Implemented AgentEngine as the top-level orchestrator, integrating prompt construction, execution context, execution loop, tool invocation, and budget tracking into a single run() entry point.
  • Result Model: Introduced AgentRunResult, a frozen Pydantic model that wraps ExecutionResult with engine-level metadata such as system prompt, duration, agent/task IDs, and computed convenience fields.
  • Observability: Added 4 new execution event constants (EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COST_SKIPPED, EXECUTION_ENGINE_COST_FAILED) for enhanced debug observability.
  • Unit Testing: Included 32 unit tests covering various scenarios including happy paths, validation errors, error handling, immutability, budget checking, cost recording, tool integration, custom loops, computed fields, completion config forwarding, max_turns propagation, and deadline formatting.
  • Design Decisions: Key design decisions include making cost recording failures non-fatal, unconditionally propagating MemoryError and RecursionError, centralizing DEFAULT_MAX_TURNS, extracting _prepare_context() for conciseness, and capturing wall-clock duration_seconds even in error results.
  • Documentation: Updated DESIGN_SPEC.md with descriptions of the AgentEngine orchestrator and AgentRunResult model, and added run_result.py to the project structure. Also updated CLAUDE.md with an enhanced description for the engine/ package.
Changelog
  • CLAUDE.md
    • Updated the description of the engine/ package to reflect its expanded role in agent orchestration and execution loops.
  • DESIGN_SPEC.md
    • Added a new section detailing the AgentEngine orchestrator, outlining its pipeline steps and error handling strategy.
    • Documented the AgentRunResult Pydantic model, including its attributes and computed fields.
    • Updated the project structure to include run_result.py within the engine/ package.
  • src/ai_company/engine/init.py
    • Imported AgentEngine and AgentRunResult to make them accessible at the package level.
    • Added AgentEngine and AgentRunResult to the __all__ export list.
  • src/ai_company/engine/agent_engine.py
    • Added the AgentEngine class, which serves as the top-level orchestrator for agent execution.
    • Implemented the run method to manage the agent's lifecycle, including validation, prompt building, context creation, loop delegation, and cost recording.
    • Included helper methods for input validation, tool definition extraction, task status transitions, and error handling.
    • Defined _format_task_instruction and _make_budget_checker utility functions.
  • src/ai_company/engine/run_result.py
    • Added the AgentRunResult Pydantic model, designed to encapsulate the outcome of an agent engine run.
    • Defined fields for execution_result, system_prompt, duration_seconds, agent_id, and task_id.
    • Implemented computed properties for termination_reason, total_turns, total_cost_usd, and is_success.
  • src/ai_company/observability/events/execution.py
    • Added new EXECUTION_ENGINE_ prefixed constants for various events within the AgentEngine lifecycle, such as creation, start, prompt building, completion, errors, invalid input, task transitions, and cost recording/skipping/failing.
  • tests/unit/engine/test_agent_engine.py
    • Added extensive unit tests for the AgentEngine class, covering basic runs, system prompt integration, task transitions, invalid input handling, tool integration, budget checking, cost recording, completion config forwarding, max turns propagation, error handling, non-recoverable errors, duration tracking, and computed convenience fields.
    • Included tests for the immutability of agent identity and task objects after execution.
    • Added tests for the _format_task_instruction helper function.
Activity
  • Pre-reviewed by 9 agents: code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency.
  • 26 findings total were identified, with 23 implemented (4 Critical, 6 Major, 10 Medium, 3 Minor) and 3 suggestions.
  • 3 items were deferred as future M4 concerns: exception object preservation, broad catch narrowing, and fresh context in error path.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the AgentEngine as the core orchestrator, a significant and well-implemented feature with well-structured code and comprehensive unit tests, along with thorough documentation updates in DESIGN_SPEC.md and CLAUDE.md. However, a significant security vulnerability related to prompt injection was identified: the AgentEngine directly concatenates potentially untrusted task and identity data into LLM prompts without sanitization, which could allow an attacker to manipulate the agent's behavior or execute unauthorized tool calls. Remediation is strongly recommended. Additionally, there are minor areas for improvement related to code consistency and formatting, specifically a formatting inconsistency in task instruction string generation and an inconsistent style for handling type-only imports.

Comment on lines +422 to +438
def _format_task_instruction(task: Task) -> str:
"""Format a task into a user message for the initial conversation."""
parts = [f"# Task: {task.title}", "", task.description]

if task.acceptance_criteria:
parts.append("")
parts.append("## Acceptance Criteria")
parts.extend(f"- {c.description}" for c in task.acceptance_criteria)

if task.budget_limit > 0:
parts.append("")
parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")

if task.deadline:
parts.append(f"**Deadline:** {task.deadline}")

return "\n".join(parts)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

A prompt injection vulnerability exists because the AgentEngine constructs prompts by directly concatenating fields from the Task and AgentIdentity objects without any sanitization or escaping. If an attacker controls Task or AgentIdentity fields (e.g., title, description, acceptance_criteria, name, personality.description), they can inject malicious instructions, potentially leading to LLM manipulation, unauthorized tool execution, or data exfiltration. Remediation requires implementing sanitization or escaping for all untrusted inputs, considering structured prompt formats or LLM-based guardrails, and instructing the LLM to treat user-provided task data as untrusted. (Also, note a minor formatting inconsistency: the deadline section lacks a blank line before it, unlike the budget limit section, which can reduce readability.)

Comment on lines +227 to +247
system_prompt = build_system_prompt(
agent=identity,
task=task,
available_tools=tool_defs,
)

ctx = AgentContext.from_identity(
identity,
task=task,
max_turns=max_turns,
)
# Seed conversation with system prompt and task instruction
ctx = ctx.with_message(
ChatMessage(role=MessageRole.SYSTEM, content=system_prompt.content),
)
ctx = ctx.with_message(
ChatMessage(
role=MessageRole.USER,
content=_format_task_instruction(task),
),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The AgentEngine constructs prompts by directly concatenating fields from the Task and AgentIdentity objects without any sanitization or escaping. Specifically, in _prepare_context, the task and identity objects are passed to build_system_prompt to create the system prompt, and the task object is passed to _format_task_instruction to create the initial user message. If an attacker can control the fields of the Task (e.g., title, description, acceptance_criteria) or AgentIdentity (e.g., name, personality.description), they can inject malicious instructions into the prompt. This can lead to manipulation of the LLM's behavior, unauthorized tool execution, or exfiltration of sensitive information.

To remediate this, implement sanitization or escaping for all untrusted inputs included in prompts. Consider using structured prompt formats or LLM-based guardrails to detect and prevent prompt injection. Ensure that the LLM is instructed to treat user-provided task data as untrusted content.

Comment on lines +9 to +14
from ai_company.core.types import NotBlankStr # noqa: TC001
from ai_company.engine.loop_protocol import (
ExecutionResult,
TerminationReason,
)
from ai_company.engine.prompt import SystemPrompt # noqa: TC001
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with other files in this PR (e.g., agent_engine.py), type-only imports should be placed within a TYPE_CHECKING block. This avoids using noqa suppressions and is a standard practice for handling circular dependencies and improving startup time. You would also need to update the type hints in the class to be string forward references (e.g., agent_id: "NotBlankStr").

import pytest

from ai_company.budget.tracker import CostTracker
from ai_company.core.agent import AgentIdentity # noqa: TC001
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

For consistency with other files in this PR (e.g., agent_engine.py), type-only imports should be placed within a TYPE_CHECKING block. This avoids using noqa suppressions. You would also need to update the type hints in function signatures to be string forward references (e.g., sample_agent_with_personality: "AgentIdentity").

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements AgentEngine, the top-level orchestrator that ties together prompt construction, execution context, execution loop, tool invocation, and budget tracking. It also introduces AgentRunResult, a frozen Pydantic model wrapping ExecutionResult with engine-level metadata.

Changes:

  • AgentEngine + AgentRunResult implementation with 32 unit tests covering happy paths, error handling, budget checking, tool integration, and computed fields
  • 10 new observability event constants under EXECUTION_ENGINE_* namespace
  • Documentation updates in DESIGN_SPEC.md (§6.5 AgentEngine orchestrator) and CLAUDE.md

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/ai_company/engine/agent_engine.py Core orchestrator: validation, prompt building, context seeding, loop delegation, cost recording, error wrapping
src/ai_company/engine/run_result.py Frozen AgentRunResult model with computed fields delegating to ExecutionResult
src/ai_company/observability/events/execution.py 10 new EXECUTION_ENGINE_* event constants
src/ai_company/engine/__init__.py Adds AgentEngine and AgentRunResult to public API exports
tests/unit/engine/test_agent_engine.py 32 unit tests covering orchestration, error handling, budget, cost recording, immutability
DESIGN_SPEC.md AgentEngine pipeline description and AgentRunResult field documentation in §6.5
CLAUDE.md Updates engine/ package description

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

max_turns=max_turns,
start=start,
)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both except clauses use except MemoryError, RecursionError: which is invalid Python 3 syntax. In Python 3, this is a SyntaxError. The correct syntax for catching multiple exception types is except (MemoryError, RecursionError):. Without the parentheses, this code will fail to parse entirely and cannot be imported or executed.

The correct pattern, consistent with how it is used elsewhere in the codebase (e.g., src/ai_company/tools/invoker.py), is except (MemoryError, RecursionError):.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
timestamp=datetime.now(UTC),
)
await self._cost_tracker.record(record)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same invalid Python 3 syntax as at line 138: except MemoryError, RecursionError: is a SyntaxError in Python 3. It must be written as except (MemoryError, RecursionError): to correctly catch both MemoryError and RecursionError and re-raise them.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
Comment on lines +25 to +34
EXECUTION_ENGINE_CREATED: Final[str] = "execution.engine.created"
EXECUTION_ENGINE_START: Final[str] = "execution.engine.start"
EXECUTION_ENGINE_PROMPT_BUILT: Final[str] = "execution.engine.prompt_built"
EXECUTION_ENGINE_COMPLETE: Final[str] = "execution.engine.complete"
EXECUTION_ENGINE_ERROR: Final[str] = "execution.engine.error"
EXECUTION_ENGINE_INVALID_INPUT: Final[str] = "execution.engine.invalid_input"
EXECUTION_ENGINE_TASK_TRANSITION: Final[str] = "execution.engine.task_transition"
EXECUTION_ENGINE_COST_RECORDED: Final[str] = "execution.engine.cost_recorded"
EXECUTION_ENGINE_COST_SKIPPED: Final[str] = "execution.engine.cost_skipped"
EXECUTION_ENGINE_COST_FAILED: Final[str] = "execution.engine.cost_failed"
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description states "4 new execution event constants for debug observability (EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COST_SKIPPED, EXECUTION_ENGINE_COST_FAILED)", but the diff actually adds 10 new constants: EXECUTION_ENGINE_CREATED, EXECUTION_ENGINE_START, EXECUTION_ENGINE_PROMPT_BUILT, EXECUTION_ENGINE_COMPLETE, EXECUTION_ENGINE_ERROR, EXECUTION_ENGINE_INVALID_INPUT, EXECUTION_ENGINE_TASK_TRANSITION, EXECUTION_ENGINE_COST_RECORDED, EXECUTION_ENGINE_COST_SKIPPED, and EXECUTION_ENGINE_COST_FAILED. The PR description should be updated to reflect the actual number of new constants added.

Copilot uses AI. Check for mistakes.
parts.append("")
parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")

if task.deadline:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deadline is appended without a blank-line separator before it, while budget_limit inserts an empty string (parts.append("")) before it for spacing. When a deadline exists but budget_limit is 0 (or negative), the deadline immediately follows the description or last acceptance criterion with no blank line, resulting in inconsistent markdown formatting compared to the budget block.

Suggested change
if task.deadline:
if task.deadline:
if task.budget_limit <= 0:
parts.append("")

Copilot uses AI. Check for mistakes.
task_id: Task identifier, if task-bound.
"""

model_config = ConfigDict(frozen=True)
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The codebase consistently includes a test_frozen test for every frozen Pydantic model (e.g., TurnRecord, ExecutionResult, SystemPrompt, AgentContext, TaskExecution, etc. all have dedicated test_frozen cases). The AgentRunResult model uses model_config = ConfigDict(frozen=True) but test_agent_engine.py does not include a corresponding test_frozen test. A test that attempts to reassign a field and expects a ValidationError should be added, consistent with the convention established throughout the engine test suite.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 2155-2156: Section 15.3's "planned-only" note conflicts with the
file tree because agent_engine.py is listed with a milestone tag (M3) but the PR
actually adds that file; update the doc so readers can tell which modules exist
now by either removing milestone tags from files that are already shipped
(remove "(M3)" from agent_engine.py and any other added files like run_result.py
if shipped) or rewrite the note to state that milestone tags indicate original
plan (not current presence) and explicitly mark planned-only files (e.g., add
"(planned)" or a separate subsection). Also update any cross-references
mentioned (CLAUDE.md, README.md) to reflect the corrected status of
agent_engine.py and related engine modules.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 162-212: After successfully calling _prepare_context (which
returns ctx and system_prompt) modify _execute so any exceptions raised after
that point are caught and forwarded to _handle_fatal_error with the original ctx
and system_prompt instead of letting _handle_fatal_error rebuild a blank
AgentContext; specifically wrap the post-_prepare_context work (including
creation of budget_checker/tool_invoker, the await self._loop.execute call,
_record_costs, and construction of AgentRunResult) in a try/except that calls
_handle_fatal_error(exception, ctx=ctx, system_prompt=system_prompt, ...) to
preserve the seeded conversation, max_turns and ASSIGNED->IN_PROGRESS state when
returning the AgentRunResult.
- Around line 94-148: Validate the incoming max_turns at the start of the method
(before the try/except that wraps _execute) so invalid caller input (e.g.,
max_turns <= 0) is rejected at the boundary rather than being converted into a
fatal engine error; add a short check right after agent_id/task_id assignment
that raises a clear user-level exception (e.g., ValueError with a concise
message) when max_turns is not positive, keeping references to the existing
symbols _execute, _handle_fatal_error, and AgentContext.from_identity to locate
the related logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 961cac72-615d-4e70-84bd-02851daef231

📥 Commits

Reviewing files that changed from the base of the PR and between 8dfb3c0 and b033a75.

📒 Files selected for processing (7)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • tests/unit/engine/test_agent_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement on non-Pydantic internal collections (registries, BaseTool)
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr from core.types for all identifier/name fields (including optional NotBlankStr | None and tuple tuple[NotBlankStr, ...] variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_agent_engine.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Logger variable name must always be logger (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs: logger.info(EVENT, key=value). Never use format strings like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/{engine,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
{src/ai_company,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_agent_engine.py
  • src/ai_company/engine/agent_engine.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test categorization
Maintain 80% minimum code coverage (enforced in CI)

Files:

  • tests/unit/engine/test_agent_engine.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Use base class functionality and leverage reusable components when possible in agent implementations
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/observability/events/execution.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/{engine,providers}/**/*.py : `RetryExhaustedError` signals that all retries failed—the engine layer catches this to trigger fallback chains

Applied to files:

  • src/ai_company/engine/__init__.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_agent_engine.py

Comment on lines +2155 to 2156
│ │ ├── run_result.py # AgentRunResult outcome model
│ │ ├── agent_engine.py # Agent execution engine (M3)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Section 15.3 now contradicts itself.

The note above this tree says milestone-tagged files are planned-only, but agent_engine.py is still tagged (M3) even though this PR adds it. Please either drop milestone tags from shipped files or rewrite the note so readers can tell which engine modules actually exist.

Based on learnings, "When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` around lines 2155 - 2156, Section 15.3's "planned-only" note
conflicts with the file tree because agent_engine.py is listed with a milestone
tag (M3) but the PR actually adds that file; update the doc so readers can tell
which modules exist now by either removing milestone tags from files that are
already shipped (remove "(M3)" from agent_engine.py and any other added files
like run_result.py if shipped) or rewrite the note to state that milestone tags
indicate original plan (not current presence) and explicitly mark planned-only
files (e.g., add "(planned)" or a separate subsection). Also update any
cross-references mentioned (CLAUDE.md, README.md) to reflect the corrected
status of agent_engine.py and related engine modules.

Comment on lines +94 to +148
max_turns: int = DEFAULT_MAX_TURNS,
) -> AgentRunResult:
"""Execute an agent on a task.

Args:
identity: Frozen agent identity card.
task: Task to execute (must be ASSIGNED or IN_PROGRESS).
completion_config: Optional per-run LLM config override.
max_turns: Maximum LLM turns allowed.

Returns:
``AgentRunResult`` with execution outcome and metadata.

Raises:
ExecutionStateError: If the agent is not ACTIVE or the task
is not ASSIGNED/IN_PROGRESS.
MemoryError: Re-raised unconditionally (non-recoverable).
RecursionError: Re-raised unconditionally (non-recoverable).
"""
agent_id = str(identity.id)
task_id = task.id

self._validate_agent(identity, agent_id)
self._validate_task(task, agent_id, task_id)

logger.info(
EXECUTION_ENGINE_START,
agent_id=agent_id,
task_id=task_id,
loop_type=self._loop.get_loop_type(),
max_turns=max_turns,
)

start = time.monotonic()
try:
return await self._execute(
identity=identity,
task=task,
agent_id=agent_id,
task_id=task_id,
completion_config=completion_config,
max_turns=max_turns,
start=start,
)
except MemoryError, RecursionError:
raise
except Exception as exc:
return self._handle_fatal_error(
exc=exc,
identity=identity,
task=task,
agent_id=agent_id,
task_id=task_id,
duration_seconds=time.monotonic() - start,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate max_turns before entering the generic error wrapper.

Line 94 exposes max_turns, but the first real validation happens later inside AgentContext.from_identity(). Today max_turns <= 0 gets caught by the broad except Exception path and comes back as TerminationReason.ERROR, which turns bad caller input into a runtime engine failure instead of rejecting it at the boundary.

Suggested fix
         agent_id = str(identity.id)
         task_id = task.id
 
+        if max_turns <= 0:
+            msg = f"max_turns must be greater than 0, got {max_turns}"
+            logger.warning(
+                EXECUTION_ENGINE_INVALID_INPUT,
+                agent_id=agent_id,
+                task_id=task_id,
+                reason=msg,
+            )
+            raise ValueError(msg)
+
         self._validate_agent(identity, agent_id)
         self._validate_task(task, agent_id, task_id)

As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 94 - 148, Validate the
incoming max_turns at the start of the method (before the try/except that wraps
_execute) so invalid caller input (e.g., max_turns <= 0) is rejected at the
boundary rather than being converted into a fatal engine error; add a short
check right after agent_id/task_id assignment that raises a clear user-level
exception (e.g., ValueError with a concise message) when max_turns is not
positive, keeping references to the existing symbols _execute,
_handle_fatal_error, and AgentContext.from_identity to locate the related logic.

Comment on lines +162 to +212
ctx, system_prompt = self._prepare_context(
identity=identity,
task=task,
agent_id=agent_id,
task_id=task_id,
max_turns=max_turns,
)
budget_checker = _make_budget_checker(task)
tool_invoker = self._make_tool_invoker()

logger.debug(
EXECUTION_ENGINE_PROMPT_BUILT,
agent_id=agent_id,
task_id=task_id,
estimated_tokens=system_prompt.estimated_tokens,
)

execution_result = await self._loop.execute(
context=ctx,
provider=self._provider,
tool_invoker=tool_invoker,
budget_checker=budget_checker,
completion_config=completion_config,
)
duration = time.monotonic() - start

await self._record_costs(
execution_result,
identity,
agent_id,
task_id,
)

result = AgentRunResult(
execution_result=execution_result,
system_prompt=system_prompt,
duration_seconds=duration,
agent_id=agent_id,
task_id=task_id,
)
logger.info(
EXECUTION_ENGINE_COMPLETE,
agent_id=agent_id,
task_id=task_id,
termination_reason=result.termination_reason.value,
total_turns=result.total_turns,
total_tokens=execution_result.context.accumulated_cost.total_tokens,
duration_seconds=duration,
cost_usd=result.total_cost_usd,
)
return result
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Preserve the prepared context/prompt on fatal errors.

Once _prepare_context() succeeds, any later exception still falls through to _handle_fatal_error(), which rebuilds a blank AgentContext and synthetic prompt at Lines 400-412. That drops the seeded conversation, the caller-supplied max_turns, and the ASSIGNED -> IN_PROGRESS transition, so the returned AgentRunResult can report misleading state after execution has already started.

Catch the post-setup failure path inside _execute() and thread the prepared ctx / system_prompt into _handle_fatal_error() when they already exist.

Also applies to: 381-419

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 162 - 212, After
successfully calling _prepare_context (which returns ctx and system_prompt)
modify _execute so any exceptions raised after that point are caught and
forwarded to _handle_fatal_error with the original ctx and system_prompt instead
of letting _handle_fatal_error rebuild a blank AgentContext; specifically wrap
the post-_prepare_context work (including creation of
budget_checker/tool_invoker, the await self._loop.execute call, _record_costs,
and construction of AgentRunResult) in a try/except that calls
_handle_fatal_error(exception, ctx=ctx, system_prompt=system_prompt, ...) to
preserve the seeded conversation, max_turns and ASSIGNED->IN_PROGRESS state when
returning the AgentRunResult.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR introduces AgentEngine, the top-level orchestrator that ties together prompt construction, execution context, the ReactLoop, tool invocation, and cost tracking into a single run() entry point, along with the AgentRunResult frozen Pydantic model and four new observability event constants. The implementation is well-designed, thoroughly tested (99%/100% coverage), with clear separation of concerns and explicit handling of non-recoverable errors.

Key finding:

  • Deadline formatting gap — In _format_task_instruction, the deadline block is missing the blank-line separator that the budget block includes. When both are present, they render on consecutive lines with no visual separation. A single-line fix adds the blank line before the deadline append, ensuring consistent Markdown formatting.

Confidence Score: 4/5

  • Safe to merge with a one-line fix to deadline formatting in _format_task_instruction.
  • The implementation is clean, well-structured, and comprehensively tested at 99%+ coverage. The only functional concern is the missing blank-line separator before the deadline in the task instruction formatter, which affects the quality of prompts sent to the LLM. This is a straightforward one-line fix. No security, data-loss, or runtime-crash risks identified.
  • src/ai_company/engine/agent_engine.py — specifically the deadline formatting in _format_task_instruction (lines 435–436).

Last reviewed commit: b033a75

Comment on lines +435 to +436
if task.deadline:
parts.append(f"**Deadline:** {task.deadline}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deadline section is missing the blank-line separator that the budget section includes. When both budget and deadline are present, they will render on consecutive lines with no visual separation. When only deadline is present, it butts directly against the preceding section.

Suggested change
if task.deadline:
parts.append(f"**Deadline:** {task.deadline}")
if task.deadline:
parts.append("")
parts.append(f"**Deadline:** {task.deadline}")

This ensures consistent Markdown formatting across all task instruction blocks.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 435-436

Comment:
The deadline section is missing the blank-line separator that the budget section includes. When both budget and deadline are present, they will render on consecutive lines with no visual separation. When only deadline is present, it butts directly against the preceding section.

```suggestion
    if task.deadline:
        parts.append("")
        parts.append(f"**Deadline:** {task.deadline}")
```

This ensures consistent Markdown formatting across all task instruction blocks.

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement agent engine core with ExecutionLoop protocol integration (DESIGN_SPEC §3.1, §6.1, §6.5)

2 participants