feat: implement AgentEngine core orchestrator (#11)#143
Conversation
Add AgentEngine as the top-level orchestrator that ties together prompt construction, execution context, execution loop, tool invocation, and budget tracking into a single run() entry point. New files: - engine/agent_engine.py: AgentEngine class with run() method - engine/run_result.py: AgentRunResult frozen model with computed fields - tests/unit/engine/test_agent_engine.py: 27 unit tests (14 classes) Modified: - engine/__init__.py: export AgentEngine, AgentRunResult - observability/events/execution.py: 6 new engine event constants Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 9 agents, 23 findings addressed: - Critical: cost recording failure no longer destroys successful results - Major: extracted _prepare_context, fixed duration in error path, removed duplicate DEFAULT_MAX_TURNS, added Raises docstring - Added 5 new tests (zero-cost skip, cost-tracker failure, completion config forwarding, max_turns forwarding, deadline formatting) - Updated DESIGN_SPEC.md §6.5 with AgentEngine + AgentRunResult docs - Added 4 new execution event constants for debug observability Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add memory_messages parameter to engine.run() for working memory injection - Add max_turns validation (>= 1) at engine boundary - Extract _log_completion helper to keep _execute under 50 lines - Defend _handle_fatal_error against secondary failures - Add cost context to _record_costs exception log - Add DEBUG log when cost_tracker is None - Fix deadline blank-line separator in _format_task_instruction - Fix task_id field description inconsistency in AgentRunResult - Update engine __init__.py docstring - Update DESIGN_SPEC.md engine section to match implementation - Split test_agent_engine.py (829 lines) into 3 focused files: - test_agent_engine.py (core orchestration tests) - test_agent_engine_errors.py (error handling + edge cases) - test_run_result.py (AgentRunResult + helpers) - Add integration test for full tool-call pipeline - Fix mypy errors: hiring_date, FinishReason.TOOL_USE, parameters_schema, AgentContext.from_identity(), deadline as ISO string Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 8 agents, 14 findings addressed: - Add MemoryError/RecursionError guard in build_system_prompt (prompt.py) - Add MemoryError/RecursionError guard in CostTracker._resolve_department - Bind exception and add error kwarg in _record_costs exception log - Add clarifying comment for zero-cost AND guard condition - Add comment explaining type: ignore[prop-decorator] in run_result.py - Update DESIGN_SPEC.md: add memory_messages to run() signature and pipeline step 4 - Extract _make_completion_response to conftest (remove duplication) - Replace private _loop attribute access with behavioral assertion - Add RecursionError test for _record_costs propagation - Add MemoryError test for _handle_fatal_error build path - Add BLOCKED task status rejection test - Add max_turns=1 boundary test - Consolidate cost-recording error test classes Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a new AgentEngine orchestrator and AgentRunResult model, expands engine package exports and observability events, updates prompt/budget error handling to re-raise MemoryError/RecursionError, and adds comprehensive unit and integration tests for orchestration, tooling, costs, and error paths. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant AgentEngine
participant ExecutionLoop
participant Provider
participant ToolRegistry
participant CostTracker
Client->>AgentEngine: run(identity, task, max_turns, ...)
AgentEngine->>AgentEngine: validate inputs & prepare context
AgentEngine->>Provider: build/system prompt (via prompt builder)
AgentEngine->>ToolRegistry: fetch tool definitions
AgentEngine->>ExecutionLoop: start loop(context, tools, budget_checker)
loop up to max_turns
ExecutionLoop->>Provider: complete(messages, tools)
Provider-->>ExecutionLoop: CompletionResponse
alt tool call returned
ExecutionLoop->>ToolRegistry: invoke(tool_call)
ToolRegistry-->>ExecutionLoop: ToolExecutionResult
ExecutionLoop->>ExecutionLoop: integrate tool result
end
end
AgentEngine->>CostTracker: record_costs(execution_result)
AgentEngine-->>Client: return AgentRunResult(execution_result, metadata)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces the core Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces the AgentEngine orchestrator, a major and well-structured core component with comprehensive test coverage. However, several security vulnerabilities were identified, including potential prompt injection where user-controlled data is directly concatenated into LLM prompts, information exposure through overly verbose error messages in the fatal error handler, and a systematic syntax error in exception handling that will lead to a denial of service in Python 3 environments. Additionally, a potential bug in the cost recording logic and a minor efficiency improvement were noted. Addressing these issues is critical for the robustness, correctness, and security of the new engine.
| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
This except A, B: syntax is from Python 2 and is invalid in Python 3, where it will raise a SyntaxError. The correct syntax for catching multiple exceptions is to use a tuple: except (MemoryError, RecursionError):.
While CLAUDE.md mentions using except A, B: and attributes it to PEP 758, this seems to be a misunderstanding. PEP 758 introduces the except* syntax for handling ExceptionGroups and does not change the standard except syntax. This code will fail to parse in a Python 3.14 environment.
except (MemoryError, RecursionError):
raise| except MemoryError, RecursionError: | ||
| raise |
There was a problem hiding this comment.
This except A, B: syntax is from Python 2 and is invalid in Python 3, where it will raise a SyntaxError. The correct syntax for catching multiple exceptions is to use a tuple: except (MemoryError, RecursionError):.
While CLAUDE.md mentions using except A, B: and attributes it to PEP 758, this seems to be a misunderstanding. PEP 758 introduces the except* syntax for handling ExceptionGroups and does not change the standard except syntax. This code will fail to parse in a Python 3.14 environment.
| except MemoryError, RecursionError: | |
| raise | |
| except (MemoryError, RecursionError): | |
| raise |
| for msg in memory_messages: | ||
| ctx = ctx.with_message(msg) |
There was a problem hiding this comment.
The _prepare_context method appends memory_messages to the conversation context without validating their roles. This creates a security vulnerability where an attacker could inject messages with the system role from untrusted input, overriding the agent's core instructions. It is critical to validate that memory_messages only contain allowed roles (e.g., user, assistant) before adding them to the context. Additionally, this loop creates a new AgentContext instance for each message, which can be inefficient for a large number of messages due to repeated tuple creation and object copying. Consider adding a with_messages(self, msgs: tuple[ChatMessage, ...]) method to AgentContext for improved efficiency.
| def _format_task_instruction(task: Task) -> str: | ||
| """Format a task into a user message for the initial conversation.""" | ||
| parts = [f"# Task: {task.title}", "", task.description] | ||
|
|
||
| if task.acceptance_criteria: | ||
| parts.append("") | ||
| parts.append("## Acceptance Criteria") | ||
| parts.extend(f"- {c.description}" for c in task.acceptance_criteria) | ||
|
|
||
| if task.budget_limit > 0: | ||
| parts.append("") | ||
| parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD") | ||
|
|
||
| if task.deadline: | ||
| parts.append("") | ||
| parts.append(f"**Deadline:** {task.deadline}") | ||
|
|
||
| return "\n".join(parts) |
There was a problem hiding this comment.
The _format_task_instruction function constructs a user message by directly concatenating task fields (title, description, acceptance_criteria) without any sanitization or escaping. If these fields contain user-supplied data, an attacker can perform prompt injection to manipulate the LLM's behavior. Consider sanitizing these inputs or using a structured format (like XML or JSON) with clear delimiters to help the LLM distinguish between instructions and data.
| usage = result.context.accumulated_cost | ||
| # Skip only when provably nothing happened (both zero); a run with | ||
| # tokens but zero cost (e.g., a test provider) is still recorded. | ||
| if usage.cost_usd <= 0.0 and usage.input_tokens == 0: |
There was a problem hiding this comment.
The condition to skip cost recording only checks for usage.input_tokens == 0. This could lead to incorrectly skipping records where input_tokens is 0 but output_tokens is greater than 0.
The accompanying comment says, 'a run with tokens but zero cost... is still recorded,' which this logic contradicts. To fix this, the condition should check that both input and output tokens are zero.
if usage.cost_usd <= 0.0 and usage.input_tokens == 0 and usage.output_tokens == 0:| return None | ||
| try: | ||
| return self._department_resolver(agent_id) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| memory_messages=memory_messages, | ||
| start=start, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.
except (MemoryError, RecursionError):| ) | ||
| except PromptBuildError: | ||
| raise # Already logged by inner functions. | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
The exception handling syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError, causing the application to crash and leading to a denial of service. In Python 3, multiple exceptions must be caught using a parenthesized tuple. The mention of except A, B: in CLAUDE.md and its attribution to PEP 758 is a misunderstanding, as PEP 758 introduces except* for ExceptionGroups, not a change to standard except syntax.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| error_msg = f"{type(exc).__name__}: {exc}" | ||
| logger.exception( | ||
| EXECUTION_ENGINE_ERROR, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| error=error_msg, | ||
| ) | ||
|
|
||
| try: | ||
| ctx = AgentContext.from_identity(identity, task=task) | ||
| error_execution = ExecutionResult( | ||
| context=ctx, | ||
| termination_reason=TerminationReason.ERROR, | ||
| error_message=error_msg, |
There was a problem hiding this comment.
The _handle_fatal_error method includes the raw exception message in the AgentRunResult, which may be exposed to end-users. This can leak sensitive internal information such as file paths, database details, or system configurations. Use a generic error message for the public-facing result and log the detailed exception internally for debugging.
Greptile SummaryThis PR introduces Key observations:
Confidence Score: 4/5
Sequence DiagramsequenceDiagram
participant Caller
participant AgentEngine
participant _prepare_context
participant ExecutionLoop
participant _record_costs
participant CostTracker
Caller->>AgentEngine: run(identity, task, ...)
AgentEngine->>AgentEngine: validate inputs (agent ACTIVE, task ASSIGNED/IN_PROGRESS, max_turns ≥ 1)
AgentEngine->>_prepare_context: build_system_prompt + seed conversation + transition task
_prepare_context-->>AgentEngine: (AgentContext, SystemPrompt)
AgentEngine->>ExecutionLoop: execute(context, provider, tool_invoker, budget_checker, config)
ExecutionLoop-->>AgentEngine: ExecutionResult
AgentEngine->>_record_costs: per-turn CostRecords
alt CostTracker configured
_record_costs->>CostTracker: record(CostRecord) × N turns
CostTracker-->>_record_costs: ok / Exception (logged, swallowed)
else No CostTracker
_record_costs-->>AgentEngine: skip (logged)
end
AgentEngine-->>Caller: AgentRunResult(execution_result, system_prompt, duration, ...)
note over AgentEngine: MemoryError / RecursionError → re-raised unconditionally
note over AgentEngine: All other exceptions → _handle_fatal_error → error AgentRunResult
Last reviewed commit: 528ca2c |
There was a problem hiding this comment.
Pull request overview
This PR implements the AgentEngine core orchestrator (issue #11), which serves as the top-level entry point for running an agent on a task. It composes prompt construction, execution context management, execution loop delegation, tool invocation, and cost tracking into a single run() method, returning a structured AgentRunResult.
Changes:
- AgentEngine orchestrator (
agent_engine.py, 513 lines) with full pipeline: input validation → prompt building → context seeding → task transition → loop delegation → cost recording → result wrapping, plus_format_task_instructionand_make_budget_checkerhelpers - AgentRunResult model (
run_result.py, 84 lines) — frozen Pydantic model with computed fields (termination_reason,total_turns,total_cost_usd,is_success) delegating to the innerExecutionResult - 10 new
execution.engine.*event constants, extensive test coverage across 3 unit test files + 1 integration test, and documentation updates to DESIGN_SPEC.md and CLAUDE.md
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
src/ai_company/engine/agent_engine.py |
New top-level orchestrator with run(), validation, context setup, cost recording, and error handling |
src/ai_company/engine/run_result.py |
New frozen Pydantic model wrapping ExecutionResult with engine metadata and computed fields |
src/ai_company/observability/events/execution.py |
10 new structured logging event constants for the engine namespace |
src/ai_company/engine/__init__.py |
Re-exports AgentEngine and AgentRunResult in public API |
src/ai_company/engine/prompt.py |
Added except MemoryError, RecursionError clause (has a bug) |
src/ai_company/budget/tracker.py |
Added except MemoryError, RecursionError clause (has a bug) |
tests/unit/engine/conftest.py |
New make_completion_response factory helper for engine tests |
tests/unit/engine/test_agent_engine.py |
692 lines: happy paths, task transition, validation, budget, cost, tools, immutability |
tests/unit/engine/test_agent_engine_errors.py |
352 lines: error handling, non-recoverable propagation, fatal error paths |
tests/unit/engine/test_run_result.py |
426 lines: frozen model, computed fields, validation, helpers |
tests/integration/engine/test_agent_engine_integration.py |
Full pipeline integration test with tool calls |
DESIGN_SPEC.md |
New §6.5 AgentEngine Orchestrator section and project structure update |
CLAUDE.md |
Updated engine directory description |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
| except PromptBuildError: | ||
| raise # Already logged by inner functions. | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| return None | ||
| try: | ||
| return self._department_resolver(agent_id) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| @pytest.mark.integration | ||
| class TestAgentEngineToolCallIntegration: |
There was a problem hiding this comment.
This integration test file is missing the module-level pytestmark convention used by all other integration test files in the repository. All other integration test files (e.g., tests/integration/providers/test_provider_pipeline.py:28, tests/integration/providers/test_error_scenarios.py:44, tests/integration/observability/test_sink_routing_integration.py:15) use pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] at module level. Without pytest.mark.timeout(30), this test has no timeout guard and could hang indefinitely. Consider adding pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)] and removing the class-level @pytest.mark.integration decorator.
| memory_messages=memory_messages, | ||
| start=start, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError, shadowing the builtin. It does NOT catch both exception types. The correct Python 3 syntax to catch multiple exceptions is except (MemoryError, RecursionError): with parentheses (tuple form). The codebase already uses the correct form in src/ai_company/tools/invoker.py (lines 168, 256, 282, 352).
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| timestamp=datetime.now(UTC), | ||
| ) | ||
| await self._cost_tracker.record(record) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError, shadowing the builtin — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
except MemoryError, RecursionError: is Python 2 syntax. In Python 3, this is either a SyntaxError, or it catches only MemoryError and binds it to the name RecursionError — it does NOT catch both exception types. The correct syntax is except (MemoryError, RecursionError): with parentheses. See the correct pattern in src/ai_company/tools/invoker.py:168.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Line 918: Update the public spec to mark run as asynchronous: change the
signature `run(identity, task, completion_config?, max_turns?, memory_messages?)
-> AgentRunResult` to an async form (e.g., `async run(...) ->
Promise<AgentRunResult>`), and update any mentions of `AgentEngine.run` to
reflect that callers must await the result; ensure the return type and
description in the spec indicate a Promise of AgentRunResult and note async
behavior for callers.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 156-157: The except blocks in agent_engine.py that catch
MemoryError/RecursionError should log a WARNING/ERROR with context and the
exception before re-raising (e.g., replace the bare re-raise in the except
(MemoryError, RecursionError) handlers with a call to the engine logger such as
logger.error or logger.exception including contextual identifiers like agent id,
task id or function name, and exc_info=True), then re-raise the original
exception; update the same pattern for the other occurrences noted (around the
handlers at the lines referenced) to ensure all fatal error paths emit
structured logs before propagating.
- Around line 376-385: The cost-skip guard in the agent engine incorrectly
treats a run with output tokens but zero input tokens and zero cost as "nothing
happened"; update the condition in the block handling
result.context.accumulated_cost (variable usage) so it only skips when
usage.cost_usd <= 0.0 AND usage.input_tokens == 0 AND usage.output_tokens == 0;
leave the existing logger.debug call (EXECUTION_ENGINE_COST_SKIPPED with
agent_id and task_id) intact so runs that produced output_tokens are still
recorded.
- Around line 181-208: The exception path after calling _prepare_context
discards the prepared AgentContext and system_prompt; modify agent_engine.py so
any error that occurs after _prepare_context (including during
_make_tool_invoker, _loop.execute, and _record_costs) preserves and returns or
logs the prepared context and prompt metadata instead of rebuilding from Task:
capture ctx and system_prompt immediately, wrap the subsequent work
(tool_invoker creation, _loop.execute, and _record_costs) in a
try/except/finally that on failure records the ASSIGNED->IN_PROGRESS transition
and includes ctx/system_prompt in the error result or error log, and apply the
same fix to the similar block referenced at lines 444-464 to ensure
telemetry/recovery sees the run as started.
In `@tests/unit/engine/test_agent_engine.py`:
- Around line 440-468: Add a regression test alongside
test_zero_cost_not_recorded that covers the case cost_usd == 0.0 but tokens > 0:
use CostTracker, create a Task and a response via
_make_completion_response(cost_usd=0.0, input_tokens=5, output_tokens=2), pass a
mock provider to AgentEngine(provider=..., cost_tracker=tracker), run
engine.run(...), then assert tracker.get_record_count() == 1 and optionally
verify the recorded entry's token fields and cost are persisted; reference test
function test_zero_cost_not_recorded, helper _make_completion_response, class
CostTracker, and AgentEngine/_record_costs to locate where to add this new test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 59e887fc-e7fd-43e8-bc42-00335982086b
📒 Files selected for processing (14)
CLAUDE.mdDESIGN_SPEC.mdsrc/ai_company/budget/tracker.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/prompt.pysrc/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pytests/integration/engine/__init__.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/conftest.pytests/unit/engine/test_agent_engine.pytests/unit/engine/test_agent_engine_errors.pytests/unit/engine/test_run_result.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement on non-Pydantic internal collections (registries,BaseTool)
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (withmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStrfromcore.typesfor all identifier/name fields (including optionalNotBlankStr | Noneand tupletuple[NotBlankStr, ...]variants) instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/engine/test_run_result.pytests/unit/engine/test_agent_engine_errors.pysrc/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pytests/unit/engine/conftest.pysrc/ai_company/budget/tracker.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/engine/prompt.pytests/unit/engine/test_agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowfor test categorization
Maintain 80% minimum code coverage (enforced in CI)
Files:
tests/unit/engine/test_run_result.pytests/unit/engine/test_agent_engine_errors.pytests/unit/engine/conftest.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_agent_engine.py
{src/ai_company,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names. Tests must usetest-provider,test-small-001, etc.
Files:
tests/unit/engine/test_run_result.pytests/unit/engine/test_agent_engine_errors.pysrc/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pytests/unit/engine/conftest.pysrc/ai_company/budget/tracker.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/engine/prompt.pytests/unit/engine/test_agent_engine.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST havefrom ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Logger variable name must always belogger(not_logger, notlog)
Use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs:logger.info(EVENT, key=value). Never use format strings likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig
Files:
src/ai_company/engine/run_result.pysrc/ai_company/observability/events/execution.pysrc/ai_company/budget/tracker.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/prompt.py
src/ai_company/{engine,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed—the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/run_result.pysrc/ai_company/engine/__init__.pysrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/prompt.py
🧠 Learnings (11)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_run_result.pytests/unit/engine/test_agent_engine_errors.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
tests/unit/engine/test_agent_engine_errors.pyDESIGN_SPEC.mdsrc/ai_company/engine/prompt.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/ai_company/observability/events/execution.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory
Applied to files:
tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
Applied to files:
src/ai_company/engine/prompt.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/ai_company/engine/prompt.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.
Applied to files:
tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : Core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`) must maintain 100% test coverage
Applied to files:
tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement
Applied to files:
tests/unit/engine/test_agent_engine.py
🔇 Additional comments (11)
CLAUDE.md (1)
51-51: Good doc sync for the expanded engine scope.This update matches the PR’s shift from a narrow execution engine to a broader orchestration layer with execution-loop responsibility.
Based on learnings, architecture/service changes should be reflected in existing documentation such as
CLAUDE.md.src/ai_company/budget/tracker.py (1)
304-305: LGTM!The exception handling correctly re-raises
MemoryErrorandRecursionErrorbefore the genericExceptionhandler, ensuring these non-recoverable errors propagate to callers. This aligns with the error propagation strategy across the engine layer.src/ai_company/engine/run_result.py (1)
1-84: LGTM!The
AgentRunResultmodel is well-designed:
- Frozen Pydantic model with proper field validation
- Computed fields correctly delegate to the nested
ExecutionResultNotBlankStrused for identifier fields as per guidelines- Clear documentation for the
type: ignorecomments explaining mypy limitationsrc/ai_company/observability/events/execution.py (1)
25-34: LGTM!The new engine-level event constants follow the established naming conventions and provide comprehensive observability coverage for the
AgentEnginelifecycle, including cost recording outcomes.src/ai_company/engine/prompt.py (1)
220-221: LGTM!The exception handling correctly ensures
MemoryErrorandRecursionErrorpropagate unconditionally, preventing these non-recoverable errors from being wrapped inPromptBuildError. The handler ordering (specific → fatal → generic) is appropriate.src/ai_company/engine/__init__.py (1)
3-5: LGTM!The public API exports are correctly expanded with
AgentEngineandAgentRunResult. The__all__list maintains alphabetical ordering, and the docstring accurately reflects the broader API surface.Also applies to: 8-8, 36-36, 45-46
tests/unit/engine/conftest.py (1)
241-259: LGTM!The
make_completion_responsehelper is well-designed with sensible defaults and correctly constructs a validCompletionResponse. It uses the generic"test-model-001"identifier as required by coding guidelines.tests/unit/engine/test_agent_engine_errors.py (1)
1-352: LGTM!Comprehensive error handling test coverage that validates:
- Provider errors return error results (not crashes)
MemoryError/RecursionErrorpropagate unconditionallymax_turnsboundary validation- Cost recording fatal error propagation
- Error result structure and fatal error recovery paths
- Memory message ordering in conversation context
The tests are well-organized into focused test classes with proper markers.
tests/unit/engine/test_run_result.py (1)
1-426: LGTM!Thorough test coverage for the
AgentRunResultmodel including:
- Frozen/immutable behavior verification
- Computed field delegation for all termination reasons
- Field validation constraints (negative duration, blank agent_id, optional task_id)
_format_task_instructionformatting variations_make_budget_checkerclosure logic with boundary conditionsThe test helpers (
_test_identity,_make_run_result) are well-designed for focused, readable tests.tests/integration/engine/test_agent_engine_integration.py (1)
130-207: Nice end-to-end regression coverage.This exercises the real
AgentEngine -> ReactLoop -> ToolRegistrypath instead of a mocked loop, which makes it a strong guard against wiring regressions.src/ai_company/engine/agent_engine.py (1)
467-475: The current implementation correctly preserves the original traceback.
raise exc from build_excproperly re-raises the original exception while documenting that a secondary failure (build_exc) occurred during error handling. The original exception object retains its__traceback__from the point of initial failure. The proposed fix adds explicit traceback handling but produces no improvement—testing confirms both approaches yield identical traceback preservation. No change needed.
…Copilot, CodeRabbit - DESIGN_SPEC.md: update stale M2 note to M3, mark run() as async - agent_engine: log MemoryError/RecursionError before re-raising (3 sites) - agent_engine: split _record_costs into _record_costs + _submit_cost (<50 lines) - agent_engine: cost skip now checks output_tokens == 0, fix comment - agent_engine: separate CostRecord construction from storage try/except - agent_engine: fix inverted exception chain (raise exc from None) - agent_engine: error-path SystemPrompt.metadata now has all 5 keys - agent_engine: improve _EXECUTABLE_STATUSES docstring - prompt.py: fix module docstring example to show realistic usage - integration test: add pytestmark with timeout(30) - unit test: add free-provider regression test (cost=0, tokens>0) - unit test: fix Any -> Task type annotations in test_run_result.py - skill: fix Phase 4 to fetch ALL reviewers unfiltered Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
src/ai_company/engine/prompt.py (1)
220-221:⚠️ Potential issue | 🟠 MajorLog fatal prompt-build failures before re-raising.
This branch bypasses any
PROMPT_*error event, soMemoryError/RecursionErrorfrom prompt construction lose prompt-layer context unless an outer caller happens to log them.Suggested fix
- except MemoryError, RecursionError: - raise + except MemoryError, RecursionError: + logger.error( + PROMPT_BUILD_ERROR, + agent_id=str(agent.id), + agent_name=agent.name, + error="non-recoverable error building prompt", + exc_info=True, + ) + raiseAs per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/prompt.py` around lines 220 - 221, The except branch that currently reads "except MemoryError, RecursionError: raise" should log the failure (including exception info and prompt context) at WARNING or ERROR before re-raising so PROMPT_* error events get context; update that except block to catch the exceptions, call the module/class logger (e.g., logger.error or self._logger.error) with a descriptive message plus exception info and any relevant prompt identifiers, then re-raise the original exception.src/ai_company/engine/agent_engine.py (1)
192-229:⚠️ Potential issue | 🟠 MajorPreserve the prepared execution state on post-setup failures.
If anything fails after
_prepare_context(),run()falls back to_handle_fatal_error(), and that path rebuilds a freshAgentContexton Line 478. The returned error result then loses the seeded conversation, theASSIGNED -> IN_PROGRESStransition, and any state already accumulated by the failing run.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 192 - 229, The failure path currently rebuilds a fresh AgentContext in _handle_fatal_error, losing the prepared state from _prepare_context; change run() so that any exception after calling _prepare_context (i.e., after obtaining ctx and system_prompt) passes that prepared ctx and system_prompt into _handle_fatal_error instead of letting it recreate a new context. Concretely: update _handle_fatal_error's signature to accept an optional AgentContext and SystemPrompt (or overload it) and in run() catch post-setup exceptions (around _loop.execute/_record_costs/_log_completion) and call _handle_fatal_error(error, ctx=ctx, system_prompt=system_prompt, ...) so the seeded conversation, ASSIGNED->IN_PROGRESS transition and accumulated state are preserved in the returned error result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/skills/aurelio-review-pr/SKILL.md:
- Around line 264-266: The GH API fetch examples are truncating review bodies
with the jq slice (.[0:15000] or similar), which violates the "fetch full body"
requirement; update the commands in the blocks showing gh api
repos/.../pulls/NUMBER/reviews (and the other occurrences called out around
lines 274-276 and 282-284) to remove the slicing expression and instead output
the full .body (e.g., replace (.body // "" | if length > 15000 then .[0:15000]
else . end) with (.body // "")) so reviewer bodies are not truncated.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 387-414: The current code collapses multi-turn usage by creating a
single CostRecord from result.context.accumulated_cost, which loses per-call
granularity and undercounts record_count; instead iterate result.turns (or
result.context.turns if present) and create/persist one CostRecord per turn
using the turn-level tokens/cost and same metadata (agent_id, task_id,
identity.model.provider/model_id, timestamp), calling _submit_cost for each
record (or batching but ensuring record_count reflects per-call records) so
per-turn analytics are preserved; update any use of the local variable usage
(and the zero-cost skip) to apply per-turn and still keep the existing aggregate
skip logic only when every turn is zero.
---
Duplicate comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 192-229: The failure path currently rebuilds a fresh AgentContext
in _handle_fatal_error, losing the prepared state from _prepare_context; change
run() so that any exception after calling _prepare_context (i.e., after
obtaining ctx and system_prompt) passes that prepared ctx and system_prompt into
_handle_fatal_error instead of letting it recreate a new context. Concretely:
update _handle_fatal_error's signature to accept an optional AgentContext and
SystemPrompt (or overload it) and in run() catch post-setup exceptions (around
_loop.execute/_record_costs/_log_completion) and call _handle_fatal_error(error,
ctx=ctx, system_prompt=system_prompt, ...) so the seeded conversation,
ASSIGNED->IN_PROGRESS transition and accumulated state are preserved in the
returned error result.
In `@src/ai_company/engine/prompt.py`:
- Around line 220-221: The except branch that currently reads "except
MemoryError, RecursionError: raise" should log the failure (including exception
info and prompt context) at WARNING or ERROR before re-raising so PROMPT_* error
events get context; update that except block to catch the exceptions, call the
module/class logger (e.g., logger.error or self._logger.error) with a
descriptive message plus exception info and any relevant prompt identifiers,
then re-raise the original exception.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7cdb92d0-308c-4eb9-aefe-32e03cc5176f
📒 Files selected for processing (7)
.claude/skills/aurelio-review-pr/SKILL.mdDESIGN_SPEC.mdsrc/ai_company/engine/agent_engine.pysrc/ai_company/engine/prompt.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_agent_engine.pytests/unit/engine/test_run_result.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
All public functions require type hints. Enforce mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones. Usecopy.deepcopy()at construction andMappingProxyTypewrapping for read-only enforcement on non-Pydantic internal collections (registries,BaseTool)
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (withmodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStrfromcore.typesfor all identifier/name fields (including optionalNotBlankStr | Noneand tupletuple[NotBlankStr, ...]variants) instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task
Maximum line length is 88 characters (enforced by ruff)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/ai_company/engine/prompt.pytests/unit/engine/test_run_result.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/engine/agent_engine.pytests/unit/engine/test_agent_engine.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST havefrom ai_company.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging,logging.getLogger(), orprint()in application code
Logger variable name must always belogger(not_logger, notlog)
Use event name constants from domain-specific modules underai_company.observability.events(e.g.PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with kwargs:logger.info(EVENT, key=value). Never use format strings likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
SetRetryConfigandRateLimiterConfigper-provider inProviderConfig
Files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
src/ai_company/{engine,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedErrorsignals that all retries failed—the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
{src/ai_company,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2).claude/skill/agent files, (3) third-party import paths/module names. Tests must usetest-provider,test-small-001, etc.
Files:
src/ai_company/engine/prompt.pytests/unit/engine/test_run_result.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/engine/agent_engine.pytests/unit/engine/test_agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowfor test categorization
Maintain 80% minimum code coverage (enforced in CI)
Files:
tests/unit/engine/test_run_result.pytests/integration/engine/test_agent_engine_integration.pytests/unit/engine/test_agent_engine.py
🧠 Learnings (26)
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: After the PR exists, use `/aurelio-review-pr` to handle external reviewer feedback
Applied to files:
.claude/skills/aurelio-review-pr/SKILL.md
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (with `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/memory/**/*.py,src/services/**/*.py} : Story state is maintained through `src/memory/story_state.py` module using Pydantic models for validation (StoryState, Character, Chapter, etc.)
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use `StoryState` from `memory/story_state.py` for context management and balance context size vs. token limits when passing story context
Applied to files:
DESIGN_SPEC.mdsrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/memory/story_state.py : Story state must be maintained through `src/memory/story_state.py` module using Pydantic models for validation (StoryState, Character, Chapter, etc.)
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
DESIGN_SPEC.mdsrc/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) for exception handling—PEP 758 except syntax. Ruff enforces this on Python 3.14
Applied to files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
src/ai_company/engine/prompt.pysrc/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_run_result.pytests/integration/engine/test_agent_engine_integration.pysrc/ai_company/engine/agent_engine.pytests/unit/engine/test_agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to tests/**/*.py : Use pytest markers `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow` for test categorization
Applied to files:
tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to pyproject.toml : Set test timeout to 30 seconds per test
Applied to files:
tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory
Applied to files:
tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/{engine,providers}/**/*.py : `RetryExhaustedError` signals that all retries failed—the engine layer catches this to trigger fallback chains
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Use error handling decorators `handle_ollama_errors` and `retry_with_fallback` from utils/exceptions.py for LLM operations.
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Use decorators `handle_ollama_errors` and `retry_with_fallback` for error handling
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/*.py : All AI agents must extend `BaseAgent` from `src/agents/base.py` with retry logic and rate limiting
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/agents/**/*.py : Use error handling decorators `handle_ollama_errors` and `retry_with_fallback` from `src/utils/error_handling.py` for Ollama operations
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/providers/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-06T19:21:45.815Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T19:21:45.815Z
Learning: Applies to src/ai_company/**/*.py : DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : RAG retrieval failures are non-fatal—agents proceed with empty context if retrieval fails rather than raising errors.
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Be aware of concurrent agent execution - don't modify shared state without synchronization and use thread-safe data structures when needed
Applied to files:
src/ai_company/engine/agent_engine.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.
Applied to files:
tests/unit/engine/test_agent_engine.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: All new code must have corresponding unit tests. When modifying existing code, update related tests. Tests should cover both happy paths and edge cases.
Applied to files:
tests/unit/engine/test_agent_engine.py
🪛 LanguageTool
.claude/skills/aurelio-review-pr/SKILL.md
[style] ~260-~260: Consider using the typographical ellipsis character here instead.
Context: ...ng gh api — always unfiltered (no select(.user.login == ...) filtering): 1. *Review submissions...
(ELLIPSIS)
[style] ~270-~270: The phrase ‘Look for patterns’ is used very frequently. Consider using a less frequent alternative to set your writing apart from others.
Context: ... lines are outside the PR's diff range. Look for patterns like "Outside diff range comments (N)" ...
(LOOK_FOR_STYLE)
[style] ~290-~290: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ... no reviewer is accidentally missed. Important: Use gh api with --jq for filteri...
(EN_REPEATEDWORDS_IMPORTANT)
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED, | ||
| FAILED) and BLOCKED/IN_REVIEW are not executable. |
There was a problem hiding this comment.
The docstring mentions FAILED as a non-executable terminal status, but TaskStatus.FAILED does not yet exist in core/enums.py (it's planned for future crash recovery per DESIGN_SPEC §6.6). Consider removing FAILED from this docstring to keep it accurate with respect to the current codebase, or add a note that it's a planned status.
| CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED, | |
| FAILED) and BLOCKED/IN_REVIEW are not executable. | |
| CREATED tasks lack an assignee; terminal statuses (COMPLETED, CANCELLED) | |
| and BLOCKED/IN_REVIEW are not executable. |
…ilot - Remove non-existent FAILED from executable statuses docstring - Per-turn CostRecord recording instead of single aggregate - Pass tracker as explicit parameter to _submit_cost (eliminates type: ignore) - Preserve prepared context in _handle_fatal_error on post-setup failures - Clarify MemoryError/RecursionError propagation in _record_costs docstring - Log before re-raising non-recoverable errors in prompt builder - Remove body truncation from PR review skill fetch commands Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| logger.debug( | ||
| EXECUTION_ENGINE_PROMPT_BUILT, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| estimated_tokens=system_prompt.estimated_tokens, | ||
| ) |
There was a problem hiding this comment.
Misplaced EXECUTION_ENGINE_PROMPT_BUILT event
This log event fires inside _execute() — after _make_budget_checker() and _make_tool_invoker() have already run — but the system prompt was actually constructed earlier in _prepare_context() (line 250). The event name implies it fires immediately after the prompt build, but it fires much later (right before the execution loop starts at line 215). This can produce misleading traces: an observer correlating this event with prompt-construction latency would see inflated timing that includes tool-invoker and budget-checker setup.
Consider either:
- Moving the log to the end of
_prepare_context()where the prompt is actually built, or - Renaming the event to something like
EXECUTION_ENGINE_READY/EXECUTION_ENGINE_LOOP_STARTto better reflect when it actually fires.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 208-213
Comment:
**Misplaced `EXECUTION_ENGINE_PROMPT_BUILT` event**
This log event fires inside `_execute()` — after `_make_budget_checker()` and `_make_tool_invoker()` have already run — but the system prompt was actually constructed earlier in `_prepare_context()` (line 250). The event name implies it fires immediately after the prompt build, but it fires much later (right before the execution loop starts at line 215). This can produce misleading traces: an observer correlating this event with prompt-construction latency would see inflated timing that includes tool-invoker and budget-checker setup.
Consider either:
- Moving the log to the end of `_prepare_context()` where the prompt is actually built, or
- Renaming the event to something like `EXECUTION_ENGINE_READY` / `EXECUTION_ENGINE_LOOP_START` to better reflect when it actually fires.
How can I resolve this? If you propose a fix, please make it concise.| record = CostRecord( | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| provider=identity.model.provider, | ||
| model=identity.model.model_id, | ||
| input_tokens=turn.input_tokens, | ||
| output_tokens=turn.output_tokens, | ||
| cost_usd=turn.cost_usd, | ||
| timestamp=datetime.now(UTC), | ||
| ) |
There was a problem hiding this comment.
Cost record timestamps reflect recording time, not turn execution time
timestamp=datetime.now(UTC) is evaluated in the post-execution _record_costs loop (lines 398–432). Every CostRecord for a multi-turn run will therefore have an essentially identical timestamp (all calls happen within microseconds of each other, after the loop completes) that reflects when costs were recorded, not when each LLM turn actually ran.
This makes it impossible to reconstruct per-turn execution timing from cost records alone. For example, cost-analytics queries like "which turns were most expensive and when did they run?" cannot be answered from the stored records.
TurnRecord does not currently carry a timestamp, so a proper fix would require adding one. As a minimal improvement, you could at least document this limitation in the _record_costs docstring (lines 377–386) so future maintainers understand that CostRecord.timestamp represents the batch-recording instant rather than per-turn execution time.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 416-425
Comment:
**Cost record timestamps reflect recording time, not turn execution time**
`timestamp=datetime.now(UTC)` is evaluated in the post-execution `_record_costs` loop (lines 398–432). Every `CostRecord` for a multi-turn run will therefore have an essentially identical timestamp (all calls happen within microseconds of each other, after the loop completes) that reflects when costs were *recorded*, not when each LLM turn actually *ran*.
This makes it impossible to reconstruct per-turn execution timing from cost records alone. For example, cost-analytics queries like "which turns were most expensive and when did they run?" cannot be answered from the stored records.
`TurnRecord` does not currently carry a timestamp, so a proper fix would require adding one. As a minimal improvement, you could at least document this limitation in the `_record_costs` docstring (lines 377–386) so future maintainers understand that `CostRecord.timestamp` represents the batch-recording instant rather than per-turn execution time.
How can I resolve this? If you propose a fix, please make it concise.| if task.budget_limit > 0: | ||
| parts.append("") | ||
| parts.append(f"**Budget limit:** ${task.budget_limit:.2f} USD") |
There was a problem hiding this comment.
Total budget limit shown to LLM; remaining budget not considered
The task instruction appends task.budget_limit — the total configured limit — to the user message sent to the LLM. If the same task is resumed across multiple run() calls (which the IN_PROGRESS acceptance path at line 51 allows), the LLM is told the full original budget even though some of it may have been consumed in a prior run. The in-run BudgetChecker (lines 583–586) similarly only checks against task.budget_limit, not remaining budget.
This creates two related issues:
- The LLM may believe it has more spending headroom than it actually does.
- Budget-aware reasoning by the LLM ("I have $X left") will be inaccurate on resumed runs.
If resumption is out of scope for M3, this should be documented as a known limitation. If it is in scope, consider passing remaining_budget = task.budget_limit - prior_spend and only showing/checking that value.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/engine/agent_engine.py
Line: 560-562
Comment:
**Total budget limit shown to LLM; remaining budget not considered**
The task instruction appends `task.budget_limit` — the *total* configured limit — to the user message sent to the LLM. If the same task is resumed across multiple `run()` calls (which the `IN_PROGRESS` acceptance path at line 51 allows), the LLM is told the full original budget even though some of it may have been consumed in a prior run. The in-run `BudgetChecker` (lines 583–586) similarly only checks against `task.budget_limit`, not remaining budget.
This creates two related issues:
1. The LLM may believe it has more spending headroom than it actually does.
2. Budget-aware reasoning by the LLM ("I have $X left") will be inaccurate on resumed runs.
If resumption is out of scope for M3, this should be documented as a known limitation. If it is in scope, consider passing `remaining_budget = task.budget_limit - prior_spend` and only showing/checking that value.
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
engine/agent_engine.py, 513 lines) — top-level orchestrator that composes prompt construction, execution context, execution loop, tool invocation, and cost tracking into a singlerun()entry pointengine/run_result.py, 84 lines) — frozen Pydantic model wrappingExecutionResultwith engine metadata and computed fields (termination_reason,total_turns,total_cost_usd,is_success)observability/events/execution.pyfor structured logging coverageKey design decisions
run()validates inputs (agent ACTIVE, task ASSIGNED/IN_PROGRESS, max_turns >= 1), builds system prompt, seeds conversation, delegates toExecutionLoop, records costs, and returns structured resultMemoryError/RecursionErroralways propagate — all other exceptions are caught and returned as error results_handle_fatal_errorhas a defensive secondary-failure guardmemory_messagesparameter provides the injection hook for M5 working memoryTest coverage
test_agent_engine.py(orchestration),test_agent_engine_errors.py(error handling),test_run_result.py(model + helpers)Closes #11
Test plan
uv run ruff check src/ tests/— all checks passeduv run mypy src/ tests/— no issues in 222 filesuv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80— 1970 passed, 95.36% coverage🤖 Generated with Claude Code