Skip to content

feat: implement AgentEngine core orchestrator (#11)#143

Merged
Aureliolo merged 6 commits intomainfrom
feat/agent-engine-core
Mar 6, 2026
Merged

feat: implement AgentEngine core orchestrator (#11)#143
Aureliolo merged 6 commits intomainfrom
feat/agent-engine-core

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

  • AgentEngine (engine/agent_engine.py, 513 lines) — top-level orchestrator that composes prompt construction, execution context, execution loop, tool invocation, and cost tracking into a single run() entry point
  • AgentRunResult (engine/run_result.py, 84 lines) — frozen Pydantic model wrapping ExecutionResult with engine metadata and computed fields (termination_reason, total_turns, total_cost_usd, is_success)
  • 10 new event constants in observability/events/execution.py for structured logging coverage
  • DESIGN_SPEC.md updated with AgentEngine orchestrator section (§6.5) and project structure
  • CLAUDE.md engine description updated

Key design decisions

  • run() validates inputs (agent ACTIVE, task ASSIGNED/IN_PROGRESS, max_turns >= 1), builds system prompt, seeds conversation, delegates to ExecutionLoop, records costs, and returns structured result
  • MemoryError/RecursionError always propagate — all other exceptions are caught and returned as error results
  • Cost recording failures are logged but never downgrade a successful run
  • _handle_fatal_error has a defensive secondary-failure guard
  • memory_messages parameter provides the injection hook for M5 working memory
  • Budget checker uses a closure capturing the task's budget limit

Test coverage

  • 1970 tests pass, 95.36% coverage
  • 3 test files: test_agent_engine.py (orchestration), test_agent_engine_errors.py (error handling), test_run_result.py (model + helpers)
  • 1 integration test: full pipeline AgentEngine → ReactLoop → tool call → result
  • Covers: happy paths, error containment, immutability, validation boundaries, cost recording edge cases, memory message ordering, budget checker closure logic

Closes #11

Test plan

  • uv run ruff check src/ tests/ — all checks passed
  • uv run mypy src/ tests/ — no issues in 222 files
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 1970 passed, 95.36% coverage
  • Pre-reviewed by 8 agents (code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, docs-consistency), 14 findings addressed

🤖 Generated with Claude Code

Aureliolo and others added 4 commits March 6, 2026 20:51
Add AgentEngine as the top-level orchestrator that ties together prompt
construction, execution context, execution loop, tool invocation, and
budget tracking into a single run() entry point.

New files:
- engine/agent_engine.py: AgentEngine class with run() method
- engine/run_result.py: AgentRunResult frozen model with computed fields
- tests/unit/engine/test_agent_engine.py: 27 unit tests (14 classes)

Modified:
- engine/__init__.py: export AgentEngine, AgentRunResult
- observability/events/execution.py: 6 new engine event constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 9 agents, 23 findings addressed:
- Critical: cost recording failure no longer destroys successful results
- Major: extracted _prepare_context, fixed duration in error path,
  removed duplicate DEFAULT_MAX_TURNS, added Raises docstring
- Added 5 new tests (zero-cost skip, cost-tracker failure, completion
  config forwarding, max_turns forwarding, deadline formatting)
- Updated DESIGN_SPEC.md §6.5 with AgentEngine + AgentRunResult docs
- Added 4 new execution event constants for debug observability

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add memory_messages parameter to engine.run() for working memory injection
- Add max_turns validation (>= 1) at engine boundary
- Extract _log_completion helper to keep _execute under 50 lines
- Defend _handle_fatal_error against secondary failures
- Add cost context to _record_costs exception log
- Add DEBUG log when cost_tracker is None
- Fix deadline blank-line separator in _format_task_instruction
- Fix task_id field description inconsistency in AgentRunResult
- Update engine __init__.py docstring
- Update DESIGN_SPEC.md engine section to match implementation
- Split test_agent_engine.py (829 lines) into 3 focused files:
  - test_agent_engine.py (core orchestration tests)
  - test_agent_engine_errors.py (error handling + edge cases)
  - test_run_result.py (AgentRunResult + helpers)
- Add integration test for full tool-call pipeline
- Fix mypy errors: hiring_date, FinishReason.TOOL_USE, parameters_schema,
  AgentContext.from_identity(), deadline as ISO string

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 8 agents, 14 findings addressed:

- Add MemoryError/RecursionError guard in build_system_prompt (prompt.py)
- Add MemoryError/RecursionError guard in CostTracker._resolve_department
- Bind exception and add error kwarg in _record_costs exception log
- Add clarifying comment for zero-cost AND guard condition
- Add comment explaining type: ignore[prop-decorator] in run_result.py
- Update DESIGN_SPEC.md: add memory_messages to run() signature and
  pipeline step 4
- Extract _make_completion_response to conftest (remove duplication)
- Replace private _loop attribute access with behavioral assertion
- Add RecursionError test for _record_costs propagation
- Add MemoryError test for _handle_fatal_error build path
- Add BLOCKED task status rejection test
- Add max_turns=1 boundary test
- Consolidate cost-recording error test classes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 20:59
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1546c4c9-fecb-4849-b03a-e43128b8fa44

📥 Commits

Reviewing files that changed from the base of the PR and between 0fb8035 and 528ca2c.

📒 Files selected for processing (3)
  • .claude/skills/aurelio-review-pr/SKILL.md
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Introduces AgentEngine and AgentRunResult for unified agent execution with duration, cost, termination reason, and success metadata.
    • Engine surface expanded for task routing, workflows, meetings, and HR coordination.
    • Structured execution-engine observability events added.
  • Bug Fixes

    • MemoryError/RecursionError now propagate rather than being swallowed.
    • Improved prompt-build and cost-recording error handling.
  • Tests

    • Extensive unit and integration tests covering execution flows, tools, budgeting, costs, and error cases.
  • Documentation

    • Engine description updated to reflect orchestration responsibilities.

Walkthrough

Adds a new AgentEngine orchestrator and AgentRunResult model, expands engine package exports and observability events, updates prompt/budget error handling to re-raise MemoryError/RecursionError, and adds comprehensive unit and integration tests for orchestration, tooling, costs, and error paths.

Changes

Cohort / File(s) Summary
Core Engine & Run Result
src/ai_company/engine/agent_engine.py, src/ai_company/engine/run_result.py
Adds AgentEngine orchestrator with async run() pipeline (validation, context/prompt build, execution loop delegation, cost recording) and immutable AgentRunResult model with computed fields.
Engine API Exports
src/ai_company/engine/__init__.py
Exports AgentEngine and AgentRunResult and updates module docstring to reflect new orchestration surface.
Prompt & Budget error handling
src/ai_company/engine/prompt.py, src/ai_company/budget/tracker.py
Modify error handling to re-raise MemoryError and RecursionError during system-prompt building and budget department resolution so they propagate instead of being swallowed.
Observability / Events
src/ai_company/observability/events/execution.py
Adds execution engine event constants (created, start, prompt_built, complete, error, invalid_input, task_transition, cost_recorded, cost_skipped, cost_failed).
Design & Docs
DESIGN_SPEC.md, CLAUDE.md
DESIGN_SPEC adds AgentEngine orchestration details and new engine modules; CLAUDE.md package description updated to "Agent orchestration, execution loops, and task lifecycle".
Tests — unit & integration
tests/unit/engine/test_agent_engine.py, tests/unit/engine/test_agent_engine_errors.py, tests/unit/engine/test_run_result.py, tests/unit/engine/conftest.py, tests/integration/engine/test_agent_engine_integration.py
Adds extensive unit tests covering happy paths, error propagation (including MemoryError/RecursionError), budget and cost-tracking behaviors, tool integration, and an integration test exercising a tool call loop with a mock provider and real ToolRegistry.
Engine submodules (placeholders)
src/ai_company/engine/task_engine.py, src/ai_company/engine/workflow_engine.py, src/ai_company/engine/meeting_engine.py, src/ai_company/engine/hr_engine.py
Adds new public engine modules/stubs for task routing, workflow, meeting coordination, and HR-related functionality referenced by design spec.
Misc — skill doc
.claude/skills/aurelio-review-pr/SKILL.md
Updates review-skill behavior: unfiltered reviewer fetching, expanded extraction rules, and new bot-handling protocol.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AgentEngine
    participant ExecutionLoop
    participant Provider
    participant ToolRegistry
    participant CostTracker

    Client->>AgentEngine: run(identity, task, max_turns, ...)
    AgentEngine->>AgentEngine: validate inputs & prepare context
    AgentEngine->>Provider: build/system prompt (via prompt builder)
    AgentEngine->>ToolRegistry: fetch tool definitions
    AgentEngine->>ExecutionLoop: start loop(context, tools, budget_checker)
    loop up to max_turns
        ExecutionLoop->>Provider: complete(messages, tools)
        Provider-->>ExecutionLoop: CompletionResponse
        alt tool call returned
            ExecutionLoop->>ToolRegistry: invoke(tool_call)
            ToolRegistry-->>ExecutionLoop: ToolExecutionResult
            ExecutionLoop->>ExecutionLoop: integrate tool result
        end
    end
    AgentEngine->>CostTracker: record_costs(execution_result)
    AgentEngine-->>Client: return AgentRunResult(execution_result, metadata)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.61% which is insufficient. The required threshold is 100.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: implementing AgentEngine, the core orchestrator. It is concise, specific, and directly related to the primary deliverable.
Description check ✅ Passed The description clearly covers the main changes (AgentEngine, AgentRunResult, events, documentation updates), design decisions, test coverage, and validates the PR against objectives. It is directly related to the changeset.
Linked Issues check ✅ Passed All coding requirements from issue #11 are met: core orchestration with agent config loading and prompt construction, async execution with max iteration limits, ExecutionLoop protocol delegation, tool calling support, error handling for MemoryError/RecursionError propagation, response metadata (tokens, cost, duration, iterations), and comprehensive test coverage (>95% coverage with unit and integration tests).
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #11 requirements. Exception handling additions in tracker.py and prompt.py (MemoryError/RecursionError propagation) support the engine's error strategy. CLAUDE.md and observability updates document the new components. No unrelated changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/agent-engine-core

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces the core AgentEngine orchestrator, a central component designed to manage the entire lifecycle of an agent's task execution. It streamlines the process from initial prompt generation and context setup to executing the agent's logic, invoking tools, tracking costs, and providing a comprehensive AgentRunResult. This foundational change significantly enhances the framework's ability to reliably run agents, offering improved error handling, observability, and a clear, structured output for each agent run.

Highlights

  • AgentEngine Implementation: Implemented AgentEngine as the top-level orchestrator for agent execution, handling prompt construction, execution context, loops, tool invocation, and cost tracking.
  • AgentRunResult Model: Introduced AgentRunResult, a Pydantic model to encapsulate execution outcomes with engine-level metadata such as termination reason, total turns, total cost, and success status.
  • Enhanced Observability: Added 10 new structured logging event constants for execution.engine.* to provide comprehensive coverage of engine activities.
  • Documentation Updates: Updated documentation in DESIGN_SPEC.md and CLAUDE.md to reflect the new AgentEngine orchestrator and updated project structure.
  • Robust Error Handling: Ensured robust error handling by unconditionally propagating MemoryError and RecursionError, while catching other exceptions and returning them as structured error results.
Changelog
  • CLAUDE.md
    • Updated the description for the engine/ directory.
  • DESIGN_SPEC.md
    • Added a new section detailing the AgentEngine orchestrator.
    • Updated the project structure to include new engine components.
  • src/ai_company/budget/tracker.py
    • Modified exception handling in _resolve_department to re-raise MemoryError and RecursionError.
  • src/ai_company/engine/init.py
    • Updated the module docstring to reflect new exports.
    • Exposed AgentEngine and AgentRunResult in the public API.
  • src/ai_company/engine/agent_engine.py
    • Added the AgentEngine class, which orchestrates agent execution.
  • src/ai_company/engine/prompt.py
    • Added exception handling for MemoryError and RecursionError in build_system_prompt.
  • src/ai_company/engine/run_result.py
    • Added the AgentRunResult Pydantic model for structured execution outcomes.
  • src/ai_company/observability/events/execution.py
    • Added new constants for EXECUTION_ENGINE related logging events.
  • tests/integration/engine/test_agent_engine_integration.py
    • Added an integration test to validate the full AgentEngine pipeline with tool calls.
  • tests/unit/engine/conftest.py
    • Added a helper function make_completion_response.
    • Imported FinishReason for use in tests.
  • tests/unit/engine/test_agent_engine.py
    • Added comprehensive unit tests for the AgentEngine class.
  • tests/unit/engine/test_agent_engine_errors.py
    • Added unit tests specifically for error handling and edge cases within AgentEngine.
  • tests/unit/engine/test_run_result.py
    • Added unit tests for the AgentRunResult model and associated helper functions.
Activity
  • The pull request was pre-reviewed by 8 agents (code-reviewer, python-reviewer, test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, docs-consistency), and 14 findings were addressed.
  • All ruff check and mypy checks passed.
  • pytest reported 1970 tests passed with 95.36% coverage, meeting the 80% coverage requirement.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the AgentEngine orchestrator, a major and well-structured core component with comprehensive test coverage. However, several security vulnerabilities were identified, including potential prompt injection where user-controlled data is directly concatenated into LLM prompts, information exposure through overly verbose error messages in the fatal error handler, and a systematic syntax error in exception handling that will lead to a denial of service in Python 3 environments. Additionally, a potential bug in the cost recording logic and a minor efficiency improvement were noted. Addressing these issues is critical for the robustness, correctness, and security of the new engine.

Comment on lines +400 to +401
except MemoryError, RecursionError:
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except A, B: syntax is from Python 2 and is invalid in Python 3, where it will raise a SyntaxError. The correct syntax for catching multiple exceptions is to use a tuple: except (MemoryError, RecursionError):.

While CLAUDE.md mentions using except A, B: and attributes it to PEP 758, this seems to be a misunderstanding. PEP 758 introduces the except* syntax for handling ExceptionGroups and does not change the standard except syntax. This code will fail to parse in a Python 3.14 environment.

        except (MemoryError, RecursionError):
            raise

Comment on lines +465 to +466
except MemoryError, RecursionError:
raise
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This except A, B: syntax is from Python 2 and is invalid in Python 3, where it will raise a SyntaxError. The correct syntax for catching multiple exceptions is to use a tuple: except (MemoryError, RecursionError):.

While CLAUDE.md mentions using except A, B: and attributes it to PEP 758, this seems to be a misunderstanding. PEP 758 introduces the except* syntax for handling ExceptionGroups and does not change the standard except syntax. This code will fail to parse in a Python 3.14 environment.

Suggested change
except MemoryError, RecursionError:
raise
except (MemoryError, RecursionError):
raise

Comment on lines +248 to +249
for msg in memory_messages:
ctx = ctx.with_message(msg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The _prepare_context method appends memory_messages to the conversation context without validating their roles. This creates a security vulnerability where an attacker could inject messages with the system role from untrusted input, overriding the agent's core instructions. It is critical to validate that memory_messages only contain allowed roles (e.g., user, assistant) before adding them to the context. Additionally, this loop creates a new AgentContext instance for each message, which can be inefficient for a large number of messages due to repeated tuple creation and object copying. Consider adding a with_messages(self, msgs: tuple[ChatMessage, ...]) method to AgentContext for improved efficiency.

Comment on lines +478 to +495
def _format_task_instruction(task: Task) -> str:
"""Format a task into a user message for the initial conversation."""
parts = [f"# Task: {task.title}", "", task.description]

if task.acceptance_criteria:
parts.append("")
parts.append("## Acceptance Criteria")
parts.extend(f"- {c.description}" for c in task.acceptance_criteria)

if task.budget_limit > 0:
parts.append("")
parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")

if task.deadline:
parts.append("")
parts.append(f"**Deadline:** {task.deadline}")

return "\n".join(parts)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The _format_task_instruction function constructs a user message by directly concatenating task fields (title, description, acceptance_criteria) without any sanitization or escaping. If these fields contain user-supplied data, an attacker can perform prompt injection to manipulate the LLM's behavior. Consider sanitizing these inputs or using a structured format (like XML or JSON) with clear delimiters to help the LLM distinguish between instructions and data.

usage = result.context.accumulated_cost
# Skip only when provably nothing happened (both zero); a run with
# tokens but zero cost (e.g., a test provider) is still recorded.
if usage.cost_usd <= 0.0 and usage.input_tokens == 0:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The condition to skip cost recording only checks for usage.input_tokens == 0. This could lead to incorrectly skipping records where input_tokens is 0 but output_tokens is greater than 0.

The accompanying comment says, 'a run with tokens but zero cost... is still recorded,' which this logic contradicts. To fix this, the condition should check that both input and output tokens are zero.

        if usage.cost_usd <= 0.0 and usage.input_tokens == 0 and usage.output_tokens == 0:

return None
try:
return self._department_resolver(agent_id)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

memory_messages=memory_messages,
start=start,
)
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.

        except (MemoryError, RecursionError):

)
except PromptBuildError:
raise # Already logged by inner functions.
except MemoryError, RecursionError:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Comment on lines +436 to +449
error_msg = f"{type(exc).__name__}: {exc}"
logger.exception(
EXECUTION_ENGINE_ERROR,
agent_id=agent_id,
task_id=task_id,
error=error_msg,
)

try:
ctx = AgentContext.from_identity(identity, task=task)
error_execution = ExecutionResult(
context=ctx,
termination_reason=TerminationReason.ERROR,
error_message=error_msg,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The _handle_fatal_error method includes the raw exception message in the AgentRunResult, which may be exposed to end-users. This can leak sensitive internal information such as file paths, database details, or system configurations. Use a generic error message for the public-facing result and log the detailed exception internally for debugging.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR introduces AgentEngine — the top-level orchestrator that wires together prompt construction, execution context, ExecutionLoop delegation, cost recording, and structured error handling into a single run() entry point — along with the accompanying AgentRunResult frozen Pydantic model, 10 new observability event constants, and comprehensive test coverage (1970 tests, 95.36% coverage).

Key observations:

  • Architecture is clean and well-layered: _prepare_context_execute_record_costs separation is clear, immutability via copy-on-modify is consistently applied, and the _handle_fatal_error secondary-failure guard is a thoughtful defensive pattern.
  • EXECUTION_ENGINE_PROMPT_BUILT fires in the wrong place: The event fires inside _execute() after budget-checker and tool-invoker setup, not in _prepare_context() where the prompt is actually built — misleading for trace-based latency analysis.
  • Cost record timestamps lose per-turn timing: All CostRecord objects for a run are stamped with datetime.now(UTC) at recording time (post-execution), so per-turn execution timestamps are unrecoverable from cost data alone.
  • Total budget limit shown in LLM task instruction: _format_task_instruction appends task.budget_limit (the configured total) to the user message. On resumed runs (which the IN_PROGRESS acceptance path allows), the LLM sees the full budget figure without deduction for prior spend, and the BudgetChecker closure similarly only tracks cost within the current run() call. This should be documented as a known M3 limitation if cross-run budget accounting is deferred.

Confidence Score: 4/5

  • Safe to merge with minor observability/design refinements; no correctness-breaking bugs in the core orchestration logic.
  • The orchestrator logic is sound, error containment is well-tested, and immutability is respected throughout. The three flagged items are observability/design concerns (event placement, cost timestamp granularity, budget-in-prompt on resumed runs) rather than correctness defects. Test coverage is comprehensive (1970 tests, 95.36%), and the separation of concerns is clean. All exceptions are properly handled, with MemoryError/RecursionError propagating as documented and all others caught and returned as structured error results.
  • src/ai_company/engine/agent_engine.py — the three observability/design comments target this file (event placement, cost timestamps, budget limit). All are improvements rather than blockers.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant AgentEngine
    participant _prepare_context
    participant ExecutionLoop
    participant _record_costs
    participant CostTracker

    Caller->>AgentEngine: run(identity, task, ...)
    AgentEngine->>AgentEngine: validate inputs (agent ACTIVE, task ASSIGNED/IN_PROGRESS, max_turns ≥ 1)
    AgentEngine->>_prepare_context: build_system_prompt + seed conversation + transition task
    _prepare_context-->>AgentEngine: (AgentContext, SystemPrompt)
    AgentEngine->>ExecutionLoop: execute(context, provider, tool_invoker, budget_checker, config)
    ExecutionLoop-->>AgentEngine: ExecutionResult
    AgentEngine->>_record_costs: per-turn CostRecords
    alt CostTracker configured
        _record_costs->>CostTracker: record(CostRecord) × N turns
        CostTracker-->>_record_costs: ok / Exception (logged, swallowed)
    else No CostTracker
        _record_costs-->>AgentEngine: skip (logged)
    end
    AgentEngine-->>Caller: AgentRunResult(execution_result, system_prompt, duration, ...)

    note over AgentEngine: MemoryError / RecursionError → re-raised unconditionally
    note over AgentEngine: All other exceptions → _handle_fatal_error → error AgentRunResult
Loading

Last reviewed commit: 528ca2c

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the AgentEngine core orchestrator (issue #11), which serves as the top-level entry point for running an agent on a task. It composes prompt construction, execution context management, execution loop delegation, tool invocation, and cost tracking into a single run() method, returning a structured AgentRunResult.

Changes:

  • AgentEngine orchestrator (agent_engine.py, 513 lines) with full pipeline: input validation → prompt building → context seeding → task transition → loop delegation → cost recording → result wrapping, plus _format_task_instruction and _make_budget_checker helpers
  • AgentRunResult model (run_result.py, 84 lines) — frozen Pydantic model with computed fields (termination_reason, total_turns, total_cost_usd, is_success) delegating to the inner ExecutionResult
  • 10 new execution.engine.* event constants, extensive test coverage across 3 unit test files + 1 integration test, and documentation updates to DESIGN_SPEC.md and CLAUDE.md

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/ai_company/engine/agent_engine.py New top-level orchestrator with run(), validation, context setup, cost recording, and error handling
src/ai_company/engine/run_result.py New frozen Pydantic model wrapping ExecutionResult with engine metadata and computed fields
src/ai_company/observability/events/execution.py 10 new structured logging event constants for the engine namespace
src/ai_company/engine/__init__.py Re-exports AgentEngine and AgentRunResult in public API
src/ai_company/engine/prompt.py Added except MemoryError, RecursionError clause (has a bug)
src/ai_company/budget/tracker.py Added except MemoryError, RecursionError clause (has a bug)
tests/unit/engine/conftest.py New make_completion_response factory helper for engine tests
tests/unit/engine/test_agent_engine.py 692 lines: happy paths, task transition, validation, budget, cost, tools, immutability
tests/unit/engine/test_agent_engine_errors.py 352 lines: error handling, non-recoverable propagation, fatal error paths
tests/unit/engine/test_run_result.py 426 lines: frozen model, computed fields, validation, helpers
tests/integration/engine/test_agent_engine_integration.py Full pipeline integration test with tool calls
DESIGN_SPEC.md New §6.5 AgentEngine Orchestrator section and project structure update
CLAUDE.md Updated engine directory description

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)
except PromptBuildError:
raise # Already logged by inner functions.
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
return None
try:
return self._department_resolver(agent_id)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
Comment on lines +130 to +131
@pytest.mark.integration
class TestAgentEngineToolCallIntegration:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This integration test file is missing the module-level pytestmark convention used by all other integration test files in the repository. All other integration test files (e.g., tests/integration/providers/test_provider_pipeline.py:28, tests/integration/providers/test_error_scenarios.py:44, tests/integration/observability/test_sink_routing_integration.py:15) use pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] at module level. Without pytest.mark.timeout(30), this test has no timeout guard and could hang indefinitely. Consider adding pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] and removing the class-level @pytest.mark.integration decorator.

Copilot uses AI. Check for mistakes.
memory_messages=memory_messages,
start=start,
)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError, shadowing the builtin. It does NOT catch both exception types. The correct Python 3 syntax to catch multiple exceptions is except (MemoryError, RecursionError): with parentheses (tuple form). The codebase already uses the correct form in src/ai_company/tools/invoker.py (lines 168, 256, 282, 352).

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
timestamp=datetime.now(UTC),
)
await self._cost_tracker.record(record)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError, shadowing the builtin — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
agent_id=agent_id,
task_id=task_id,
)
except MemoryError, RecursionError:
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Line 918: Update the public spec to mark run as asynchronous: change the
signature `run(identity, task, completion_config?, max_turns?, memory_messages?)
-> AgentRunResult` to an async form (e.g., `async run(...) ->
Promise<AgentRunResult>`), and update any mentions of `AgentEngine.run` to
reflect that callers must await the result; ensure the return type and
description in the spec indicate a Promise of AgentRunResult and note async
behavior for callers.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 156-157: The except blocks in agent_engine.py that catch
MemoryError/RecursionError should log a WARNING/ERROR with context and the
exception before re-raising (e.g., replace the bare re-raise in the except
(MemoryError, RecursionError) handlers with a call to the engine logger such as
logger.error or logger.exception including contextual identifiers like agent id,
task id or function name, and exc_info=True), then re-raise the original
exception; update the same pattern for the other occurrences noted (around the
handlers at the lines referenced) to ensure all fatal error paths emit
structured logs before propagating.
- Around line 376-385: The cost-skip guard in the agent engine incorrectly
treats a run with output tokens but zero input tokens and zero cost as "nothing
happened"; update the condition in the block handling
result.context.accumulated_cost (variable usage) so it only skips when
usage.cost_usd <= 0.0 AND usage.input_tokens == 0 AND usage.output_tokens == 0;
leave the existing logger.debug call (EXECUTION_ENGINE_COST_SKIPPED with
agent_id and task_id) intact so runs that produced output_tokens are still
recorded.
- Around line 181-208: The exception path after calling _prepare_context
discards the prepared AgentContext and system_prompt; modify agent_engine.py so
any error that occurs after _prepare_context (including during
_make_tool_invoker, _loop.execute, and _record_costs) preserves and returns or
logs the prepared context and prompt metadata instead of rebuilding from Task:
capture ctx and system_prompt immediately, wrap the subsequent work
(tool_invoker creation, _loop.execute, and _record_costs) in a
try/except/finally that on failure records the ASSIGNED->IN_PROGRESS transition
and includes ctx/system_prompt in the error result or error log, and apply the
same fix to the similar block referenced at lines 444-464 to ensure
telemetry/recovery sees the run as started.

In `@tests/unit/engine/test_agent_engine.py`:
- Around line 440-468: Add a regression test alongside
test_zero_cost_not_recorded that covers the case cost_usd == 0.0 but tokens > 0:
use CostTracker, create a Task and a response via
_make_completion_response(cost_usd=0.0, input_tokens=5, output_tokens=2), pass a
mock provider to AgentEngine(provider=..., cost_tracker=tracker), run
engine.run(...), then assert tracker.get_record_count() == 1 and optionally
verify the recorded entry's token fields and cost are persisted; reference test
function test_zero_cost_not_recorded, helper _make_completion_response, class
CostTracker, and AgentEngine/_record_costs to locate where to add this new test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 59e887fc-e7fd-43e8-bc42-00335982086b

📥 Commits

Reviewing files that changed from the base of the PR and between 8dfb3c0 and 79cb49b.

📒 Files selected for processing (14)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • src/ai_company/budget/tracker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • tests/integration/engine/__init__.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/unit/engine/test_run_result.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement on non-Pydantic internal collections (registries, BaseTool)
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr from core.types for all identifier/name fields (including optional NotBlankStr | None and tuple tuple[NotBlankStr, ...] variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine_errors.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • tests/unit/engine/conftest.py
  • src/ai_company/budget/tracker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/engine/prompt.py
  • tests/unit/engine/test_agent_engine.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test categorization
Maintain 80% minimum code coverage (enforced in CI)

Files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/unit/engine/conftest.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
{src/ai_company,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine_errors.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • tests/unit/engine/conftest.py
  • src/ai_company/budget/tracker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/engine/prompt.py
  • tests/unit/engine/test_agent_engine.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Logger variable name must always be logger (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs: logger.info(EVENT, key=value). Never use format strings like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • src/ai_company/budget/tracker.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
src/ai_company/{engine,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
🧠 Learnings (11)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine_errors.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • tests/unit/engine/test_agent_engine_errors.py
  • DESIGN_SPEC.md
  • src/ai_company/engine/prompt.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/observability/events/execution.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory

Applied to files:

  • tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14

Applied to files:

  • src/ai_company/engine/prompt.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

  • src/ai_company/engine/prompt.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.

Applied to files:

  • tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : Core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`) must maintain 100% test coverage

Applied to files:

  • tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement

Applied to files:

  • tests/unit/engine/test_agent_engine.py
🔇 Additional comments (11)
CLAUDE.md (1)

51-51: Good doc sync for the expanded engine scope.

This update matches the PR’s shift from a narrow execution engine to a broader orchestration layer with execution-loop responsibility.

Based on learnings, architecture/service changes should be reflected in existing documentation such as CLAUDE.md.

src/ai_company/budget/tracker.py (1)

304-305: LGTM!

The exception handling correctly re-raises MemoryError and RecursionError before the generic Exception handler, ensuring these non-recoverable errors propagate to callers. This aligns with the error propagation strategy across the engine layer.

src/ai_company/engine/run_result.py (1)

1-84: LGTM!

The AgentRunResult model is well-designed:

  • Frozen Pydantic model with proper field validation
  • Computed fields correctly delegate to the nested ExecutionResult
  • NotBlankStr used for identifier fields as per guidelines
  • Clear documentation for the type: ignore comments explaining mypy limitation
src/ai_company/observability/events/execution.py (1)

25-34: LGTM!

The new engine-level event constants follow the established naming conventions and provide comprehensive observability coverage for the AgentEngine lifecycle, including cost recording outcomes.

src/ai_company/engine/prompt.py (1)

220-221: LGTM!

The exception handling correctly ensures MemoryError and RecursionError propagate unconditionally, preventing these non-recoverable errors from being wrapped in PromptBuildError. The handler ordering (specific → fatal → generic) is appropriate.

src/ai_company/engine/__init__.py (1)

3-5: LGTM!

The public API exports are correctly expanded with AgentEngine and AgentRunResult. The __all__ list maintains alphabetical ordering, and the docstring accurately reflects the broader API surface.

Also applies to: 8-8, 36-36, 45-46

tests/unit/engine/conftest.py (1)

241-259: LGTM!

The make_completion_response helper is well-designed with sensible defaults and correctly constructs a valid CompletionResponse. It uses the generic "test-model-001" identifier as required by coding guidelines.

tests/unit/engine/test_agent_engine_errors.py (1)

1-352: LGTM!

Comprehensive error handling test coverage that validates:

  • Provider errors return error results (not crashes)
  • MemoryError/RecursionError propagate unconditionally
  • max_turns boundary validation
  • Cost recording fatal error propagation
  • Error result structure and fatal error recovery paths
  • Memory message ordering in conversation context

The tests are well-organized into focused test classes with proper markers.

tests/unit/engine/test_run_result.py (1)

1-426: LGTM!

Thorough test coverage for the AgentRunResult model including:

  • Frozen/immutable behavior verification
  • Computed field delegation for all termination reasons
  • Field validation constraints (negative duration, blank agent_id, optional task_id)
  • _format_task_instruction formatting variations
  • _make_budget_checker closure logic with boundary conditions

The test helpers (_test_identity, _make_run_result) are well-designed for focused, readable tests.

tests/integration/engine/test_agent_engine_integration.py (1)

130-207: Nice end-to-end regression coverage.

This exercises the real AgentEngine -> ReactLoop -> ToolRegistry path instead of a mocked loop, which makes it a strong guard against wiring regressions.

src/ai_company/engine/agent_engine.py (1)

467-475: The current implementation correctly preserves the original traceback.

raise exc from build_exc properly re-raises the original exception while documenting that a secondary failure (build_exc) occurred during error handling. The original exception object retains its __traceback__ from the point of initial failure. The proposed fix adds explicit traceback handling but produces no improvement—testing confirms both approaches yield identical traceback preservation. No change needed.

…Copilot, CodeRabbit

- DESIGN_SPEC.md: update stale M2 note to M3, mark run() as async
- agent_engine: log MemoryError/RecursionError before re-raising (3 sites)
- agent_engine: split _record_costs into _record_costs + _submit_cost (<50 lines)
- agent_engine: cost skip now checks output_tokens == 0, fix comment
- agent_engine: separate CostRecord construction from storage try/except
- agent_engine: fix inverted exception chain (raise exc from None)
- agent_engine: error-path SystemPrompt.metadata now has all 5 keys
- agent_engine: improve _EXECUTABLE_STATUSES docstring
- prompt.py: fix module docstring example to show realistic usage
- integration test: add pytestmark with timeout(30)
- unit test: add free-provider regression test (cost=0, tokens>0)
- unit test: fix Any -> Task type annotations in test_run_result.py
- skill: fix Phase 4 to fetch ALL reviewers unfiltered

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
src/ai_company/engine/prompt.py (1)

220-221: ⚠️ Potential issue | 🟠 Major

Log fatal prompt-build failures before re-raising.

This branch bypasses any PROMPT_* error event, so MemoryError/RecursionError from prompt construction lose prompt-layer context unless an outer caller happens to log them.

Suggested fix
-    except MemoryError, RecursionError:
-        raise
+    except MemoryError, RecursionError:
+        logger.error(
+            PROMPT_BUILD_ERROR,
+            agent_id=str(agent.id),
+            agent_name=agent.name,
+            error="non-recoverable error building prompt",
+            exc_info=True,
+        )
+        raise

As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/prompt.py` around lines 220 - 221, The except branch
that currently reads "except MemoryError, RecursionError: raise" should log the
failure (including exception info and prompt context) at WARNING or ERROR before
re-raising so PROMPT_* error events get context; update that except block to
catch the exceptions, call the module/class logger (e.g., logger.error or
self._logger.error) with a descriptive message plus exception info and any
relevant prompt identifiers, then re-raise the original exception.
src/ai_company/engine/agent_engine.py (1)

192-229: ⚠️ Potential issue | 🟠 Major

Preserve the prepared execution state on post-setup failures.

If anything fails after _prepare_context(), run() falls back to _handle_fatal_error(), and that path rebuilds a fresh AgentContext on Line 478. The returned error result then loses the seeded conversation, the ASSIGNED -> IN_PROGRESS transition, and any state already accumulated by the failing run.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 192 - 229, The failure
path currently rebuilds a fresh AgentContext in _handle_fatal_error, losing the
prepared state from _prepare_context; change run() so that any exception after
calling _prepare_context (i.e., after obtaining ctx and system_prompt) passes
that prepared ctx and system_prompt into _handle_fatal_error instead of letting
it recreate a new context. Concretely: update _handle_fatal_error's signature to
accept an optional AgentContext and SystemPrompt (or overload it) and in run()
catch post-setup exceptions (around _loop.execute/_record_costs/_log_completion)
and call _handle_fatal_error(error, ctx=ctx, system_prompt=system_prompt, ...)
so the seeded conversation, ASSIGNED->IN_PROGRESS transition and accumulated
state are preserved in the returned error result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/aurelio-review-pr/SKILL.md:
- Around line 264-266: The GH API fetch examples are truncating review bodies
with the jq slice (.[0:15000] or similar), which violates the "fetch full body"
requirement; update the commands in the blocks showing gh api
repos/.../pulls/NUMBER/reviews (and the other occurrences called out around
lines 274-276 and 282-284) to remove the slicing expression and instead output
the full .body (e.g., replace (.body // "" | if length > 15000 then .[0:15000]
else . end) with (.body // "")) so reviewer bodies are not truncated.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 387-414: The current code collapses multi-turn usage by creating a
single CostRecord from result.context.accumulated_cost, which loses per-call
granularity and undercounts record_count; instead iterate result.turns (or
result.context.turns if present) and create/persist one CostRecord per turn
using the turn-level tokens/cost and same metadata (agent_id, task_id,
identity.model.provider/model_id, timestamp), calling _submit_cost for each
record (or batching but ensuring record_count reflects per-call records) so
per-turn analytics are preserved; update any use of the local variable usage
(and the zero-cost skip) to apply per-turn and still keep the existing aggregate
skip logic only when every turn is zero.

---

Duplicate comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 192-229: The failure path currently rebuilds a fresh AgentContext
in _handle_fatal_error, losing the prepared state from _prepare_context; change
run() so that any exception after calling _prepare_context (i.e., after
obtaining ctx and system_prompt) passes that prepared ctx and system_prompt into
_handle_fatal_error instead of letting it recreate a new context. Concretely:
update _handle_fatal_error's signature to accept an optional AgentContext and
SystemPrompt (or overload it) and in run() catch post-setup exceptions (around
_loop.execute/_record_costs/_log_completion) and call _handle_fatal_error(error,
ctx=ctx, system_prompt=system_prompt, ...) so the seeded conversation,
ASSIGNED->IN_PROGRESS transition and accumulated state are preserved in the
returned error result.

In `@src/ai_company/engine/prompt.py`:
- Around line 220-221: The except branch that currently reads "except
MemoryError, RecursionError: raise" should log the failure (including exception
info and prompt context) at WARNING or ERROR before re-raising so PROMPT_* error
events get context; update that except block to catch the exceptions, call the
module/class logger (e.g., logger.error or self._logger.error) with a
descriptive message plus exception info and any relevant prompt identifiers,
then re-raise the original exception.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7cdb92d0-308c-4eb9-aefe-32e03cc5176f

📥 Commits

Reviewing files that changed from the base of the PR and between 79cb49b and 0fb8035.

📒 Files selected for processing (7)
  • .claude/skills/aurelio-review-pr/SKILL.md
  • DESIGN_SPEC.md
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/prompt.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_run_result.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement on non-Pydantic internal collections (registries, BaseTool)
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr from core.types for all identifier/name fields (including optional NotBlankStr | None and tuple tuple[NotBlankStr, ...] variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/ai_company/engine/prompt.py
  • tests/unit/engine/test_run_result.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/engine/agent_engine.py
  • tests/unit/engine/test_agent_engine.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Logger variable name must always be logger (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs: logger.info(EVENT, key=value). Never use format strings like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
src/ai_company/{engine,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
{src/ai_company,tests}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names. Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/engine/prompt.py
  • tests/unit/engine/test_run_result.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/engine/agent_engine.py
  • tests/unit/engine/test_agent_engine.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow for test categorization
Maintain 80% minimum code coverage (enforced in CI)

Files:

  • tests/unit/engine/test_run_result.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
🧠 Learnings (26)
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback

Applied to files:

  • .claude/skills/aurelio-review-pr/SKILL.md
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/memory/**/*.py,src/services/**/*.py} : Story state is maintained through `src/memory/story_state.py` module using Pydantic models for validation (StoryState, Character, Chapter, etc.)

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use `StoryState` from `memory/story_state.py` for context management and balance context size vs. token limits when passing story context

Applied to files:

  • DESIGN_SPEC.md
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/memory/story_state.py : Story state must be maintained through `src/memory/story_state.py` module using Pydantic models for validation (StoryState, Character, Chapter, etc.)

Applied to files:

  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • DESIGN_SPEC.md
  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14

Applied to files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • src/ai_company/engine/prompt.py
  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_run_result.py
  • tests/integration/engine/test_agent_engine_integration.py
  • src/ai_company/engine/agent_engine.py
  • tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to tests/**/*.py : Use pytest markers `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow` for test categorization

Applied to files:

  • tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to pyproject.toml : Set test timeout to 30 seconds per test

Applied to files:

  • tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory

Applied to files:

  • tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/{engine,providers}/**/*.py : `RetryExhaustedError` signals that all retries failed—the engine layer catches this to trigger fallback chains

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Use error handling decorators `handle_ollama_errors` and `retry_with_fallback` from utils/exceptions.py for LLM operations.

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Use decorators `handle_ollama_errors` and `retry_with_fallback` for error handling

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/*.py : All AI agents must extend `BaseAgent` from `src/agents/base.py` with retry logic and rate limiting

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/**/*.py : Use error handling decorators `handle_ollama_errors` and `retry_with_fallback` from `src/utils/error_handling.py` for Ollama operations

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/providers/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : RAG retrieval failures are non-fatal—agents proceed with empty context if retrieval fails rather than raising errors.

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Be aware of concurrent agent execution - don't modify shared state without synchronization and use thread-safe data structures when needed

Applied to files:

  • src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.

Applied to files:

  • tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.

Applied to files:

  • tests/unit/engine/test_agent_engine.py
🪛 LanguageTool
.claude/skills/aurelio-review-pr/SKILL.md

[style] ~260-~260: Consider using the typographical ellipsis character here instead.
Context: ...ng gh apialways unfiltered (no select(.user.login == ...) filtering): 1. *Review submissions...

(ELLIPSIS)


[style] ~270-~270: The phrase ‘Look for patterns’ is used very frequently. Consider using a less frequent alternative to set your writing apart from others.
Context: ... lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" ...

(LOOK_FOR_STYLE)


[style] ~290-~290: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ... no reviewer is accidentally missed. Important: Use gh api with --jq for filteri...

(EN_REPEATEDWORDS_IMPORTANT)

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +53 to +54
CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED,
FAILED) and BLOCKED/IN_REVIEW are not executable.
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring mentions FAILED as a non-executable terminal status, but TaskStatus.FAILED does not yet exist in core/enums.py (it's planned for future crash recovery per DESIGN_SPEC §6.6). Consider removing FAILED from this docstring to keep it accurate with respect to the current codebase, or add a note that it's a planned status.

Suggested change
CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED,
FAILED) and BLOCKED/IN_REVIEW are not executable.
CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED)
and BLOCKED/IN_REVIEW are not executable.

Copilot uses AI. Check for mistakes.
…ilot

- Remove non-existent FAILED from executable statuses docstring
- Per-turn CostRecord recording instead of single aggregate
- Pass tracker as explicit parameter to _submit_cost (eliminates type: ignore)
- Preserve prepared context in _handle_fatal_error on post-setup failures
- Clarify MemoryError/RecursionError propagation in _record_costs docstring
- Log before re-raising non-recoverable errors in prompt builder
- Remove body truncation from PR review skill fetch commands

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit f2eb73a into main Mar 6, 2026
7 checks passed
@Aureliolo Aureliolo deleted the feat/agent-engine-core branch March 6, 2026 21:50
Comment on lines +208 to +213
logger.debug(
EXECUTION_ENGINE_PROMPT_BUILT,
agent_id=agent_id,
task_id=task_id,
estimated_tokens=system_prompt.estimated_tokens,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misplaced EXECUTION_ENGINE_PROMPT_BUILT event

This log event fires inside _execute() — after _make_budget_checker() and _make_tool_invoker() have already run — but the system prompt was actually constructed earlier in _prepare_context() (line 250). The event name implies it fires immediately after the prompt build, but it fires much later (right before the execution loop starts at line 215). This can produce misleading traces: an observer correlating this event with prompt-construction latency would see inflated timing that includes tool-invoker and budget-checker setup.

Consider either:

  • Moving the log to the end of _prepare_context() where the prompt is actually built, or
  • Renaming the event to something like EXECUTION_ENGINE_READY / EXECUTION_ENGINE_LOOP_START to better reflect when it actually fires.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 208-213

Comment:
**Misplaced `EXECUTION_ENGINE_PROMPT_BUILT` event**

This log event fires inside `_execute()` — after `_make_budget_checker()` and `_make_tool_invoker()` have already run — but the system prompt was actually constructed earlier in `_prepare_context()` (line 250). The event name implies it fires immediately after the prompt build, but it fires much later (right before the execution loop starts at line 215). This can produce misleading traces: an observer correlating this event with prompt-construction latency would see inflated timing that includes tool-invoker and budget-checker setup.

Consider either:
- Moving the log to the end of `_prepare_context()` where the prompt is actually built, or
- Renaming the event to something like `EXECUTION_ENGINE_READY` / `EXECUTION_ENGINE_LOOP_START` to better reflect when it actually fires.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +416 to +425
record = CostRecord(
agent_id=agent_id,
task_id=task_id,
provider=identity.model.provider,
model=identity.model.model_id,
input_tokens=turn.input_tokens,
output_tokens=turn.output_tokens,
cost_usd=turn.cost_usd,
timestamp=datetime.now(UTC),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost record timestamps reflect recording time, not turn execution time

timestamp=datetime.now(UTC) is evaluated in the post-execution _record_costs loop (lines 398–432). Every CostRecord for a multi-turn run will therefore have an essentially identical timestamp (all calls happen within microseconds of each other, after the loop completes) that reflects when costs were recorded, not when each LLM turn actually ran.

This makes it impossible to reconstruct per-turn execution timing from cost records alone. For example, cost-analytics queries like "which turns were most expensive and when did they run?" cannot be answered from the stored records.

TurnRecord does not currently carry a timestamp, so a proper fix would require adding one. As a minimal improvement, you could at least document this limitation in the _record_costs docstring (lines 377–386) so future maintainers understand that CostRecord.timestamp represents the batch-recording instant rather than per-turn execution time.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 416-425

Comment:
**Cost record timestamps reflect recording time, not turn execution time**

`timestamp=datetime.now(UTC)` is evaluated in the post-execution `_record_costs` loop (lines 398–432). Every `CostRecord` for a multi-turn run will therefore have an essentially identical timestamp (all calls happen within microseconds of each other, after the loop completes) that reflects when costs were *recorded*, not when each LLM turn actually *ran*.

This makes it impossible to reconstruct per-turn execution timing from cost records alone. For example, cost-analytics queries like "which turns were most expensive and when did they run?" cannot be answered from the stored records.

`TurnRecord` does not currently carry a timestamp, so a proper fix would require adding one. As a minimal improvement, you could at least document this limitation in the `_record_costs` docstring (lines 377–386) so future maintainers understand that `CostRecord.timestamp` represents the batch-recording instant rather than per-turn execution time.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +560 to +562
if task.budget_limit > 0:
parts.append("")
parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Total budget limit shown to LLM; remaining budget not considered

The task instruction appends task.budget_limit — the total configured limit — to the user message sent to the LLM. If the same task is resumed across multiple run() calls (which the IN_PROGRESS acceptance path at line 51 allows), the LLM is told the full original budget even though some of it may have been consumed in a prior run. The in-run BudgetChecker (lines 583–586) similarly only checks against task.budget_limit, not remaining budget.

This creates two related issues:

  1. The LLM may believe it has more spending headroom than it actually does.
  2. Budget-aware reasoning by the LLM ("I have $X left") will be inaccurate on resumed runs.

If resumption is out of scope for M3, this should be documented as a known limitation. If it is in scope, consider passing remaining_budget = task.budget_limit - prior_spend and only showing/checking that value.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 560-562

Comment:
**Total budget limit shown to LLM; remaining budget not considered**

The task instruction appends `task.budget_limit` — the *total* configured limit — to the user message sent to the LLM. If the same task is resumed across multiple `run()` calls (which the `IN_PROGRESS` acceptance path at line 51 allows), the LLM is told the full original budget even though some of it may have been consumed in a prior run. The in-run `BudgetChecker` (lines 583–586) similarly only checks against `task.budget_limit`, not remaining budget.

This creates two related issues:
1. The LLM may believe it has more spending headroom than it actually does.
2. Budget-aware reasoning by the LLM ("I have $X left") will be inaccurate on resumed runs.

If resumption is out of scope for M3, this should be documented as a known limitation. If it is in scope, consider passing `remaining_budget = task.budget_limit - prior_spend` and only showing/checking that value.

How can I resolve this? If you propose a fix, please make it concise.

Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement agent engine core with ExecutionLoop protocol integration (DESIGN_SPEC §3.1, §6.1, §6.5)

2 participants