feat: implement agent runtime state vs immutable config split by Aureliolo · Pull Request #115 · Aureliolo/synthorg

Aureliolo · 2026-03-05T20:33:48Z

Summary

TaskExecution — frozen Pydantic model wrapping Task with evolving execution state (status transitions via model_copy(update=...)), cost accumulation (TokenUsage), turn counting, and StatusTransition audit trail
AgentContext — frozen runtime context wrapping AgentIdentity + optional TaskExecution with conversation history, accumulated cost, turn limits, and snapshot generation
AgentContextSnapshot — compact frozen DTO for reporting/logging with task_id/task_status pair invariant validation
New error types — ExecutionStateError and MaxTurnsExceededError under EngineError hierarchy
Public utilities — add_token_usage(), ZERO_TOKEN_USAGE, DEFAULT_MAX_TURNS exported from engine package
6 new event constants for structured logging of execution lifecycle events
DESIGN_SPEC.md updated — §3.1 runtime state docs, §6.1 TaskExecution wrapper note, §15.3 project structure (added errors.py, prompt_template.py), §15.5 "Config vs runtime split" marked Adopted
README.md — status updated to M3 in progress
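The copy-on-write transition pattern summarized above can be sketched with the standard library alone. This is a minimal illustration, not the PR's implementation: frozen dataclasses and `dataclasses.replace` stand in for frozen Pydantic models and `model_copy(update=...)`, and the three-state `TaskStatus` enum, the `VALID_TRANSITIONS` table, and the field names are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum


class TaskStatus(Enum):
    # Hypothetical subset of statuses, for illustration only.
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


# Hypothetical transition table; the project defines its own.
VALID_TRANSITIONS = {
    TaskStatus.ASSIGNED: {TaskStatus.IN_PROGRESS},
    TaskStatus.IN_PROGRESS: {TaskStatus.COMPLETED},
    TaskStatus.COMPLETED: set(),
}


class ExecutionStateError(Exception):
    """Stand-in for the engine error of the same name."""


@dataclass(frozen=True)
class StatusTransition:
    from_status: TaskStatus
    to_status: TaskStatus
    timestamp: datetime
    reason: str = ""


@dataclass(frozen=True)
class TaskExecution:
    status: TaskStatus
    transition_log: tuple[StatusTransition, ...] = ()

    def with_transition(self, target: TaskStatus, reason: str = "") -> "TaskExecution":
        if target not in VALID_TRANSITIONS[self.status]:
            raise ExecutionStateError(f"{self.status.name} -> {target.name} is not allowed")
        record = StatusTransition(self.status, target, datetime.now(timezone.utc), reason)
        # Copy-on-write: return a new instance and leave `self` untouched.
        return replace(self, status=target, transition_log=(*self.transition_log, record))


execution = TaskExecution(status=TaskStatus.ASSIGNED)
started = execution.with_transition(TaskStatus.IN_PROGRESS, reason="agent picked up task")
```

The original `execution` keeps its `ASSIGNED` status after the call, which is what makes the audit trail trustworthy: earlier states are never rewritten, only superseded.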

Pre-PR Review

Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency). 16 findings addressed:

Added @pytest.mark.unit markers to all test classes (was invisible to -m unit runs)
Added logger.error() before raising ExecutionStateError (CLAUDE.md logging rule)
Added tests for MaxTurnsExceededError, error hierarchy, __all__ re-exports, zero-value token usage
Used NotBlankStr for execution_id/agent_id identifier fields
Added model_validator for task_id/task_status pair invariant on AgentContextSnapshot
Promoted _ZERO_USAGE/_add_token_usage to public (ZERO_TOKEN_USAGE/add_token_usage)
Extracted DEFAULT_MAX_TURNS constant to avoid magic number duplication
Fixed _add_token_usage docstring accuracy (Returns section)
Improved MaxTurnsExceededError docstring with field references
Updated DESIGN_SPEC.md §15.3 (missing errors.py, prompt_template.py)
Updated README.md status (M2→M3)
Organized test files: moved error/export tests to dedicated test_errors.py/test_exports.py

Test Plan

uv run ruff check src/ tests/ — clean
uv run ruff format src/ tests/ — no changes
uv run mypy src/ tests/ — no issues (199 files)
uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 1770 passed, 95.25% coverage
All pre-commit hooks pass

closes #106

Add TaskExecution and AgentContext frozen Pydantic models that use model_copy(update=...) for O(1) state transitions without re-running validators. TaskExecution wraps Task with status transitions, cost accumulation, and audit trail. AgentContext wraps AgentIdentity with conversation history, turn tracking, and configurable max turns.

- Add StatusTransition audit record and TaskExecution runtime wrapper
- Add AgentContext and AgentContextSnapshot for execution tracking
- Add ExecutionStateError and MaxTurnsExceededError to engine errors
- Add 6 execution-domain event constants to observability
- Update DESIGN_SPEC.md sections 3.1, 6.1, 15.3, 15.5
- 53 new tests with 100% coverage on both modules

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Pre-reviewed by 9 agents, 16 findings addressed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-05T20:34:02Z

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

coderabbitai · 2026-03-05T20:34:09Z

Caution

Review failed

The pull request is closed.

📥 Commits

Reviewing files that changed from the base of the PR and between 8d01284 and 608a234.

📒 Files selected for processing (11)

DESIGN_SPEC.md
src/ai_company/engine/__init__.py
src/ai_company/engine/context.py
src/ai_company/engine/errors.py
src/ai_company/engine/task_execution.py
src/ai_company/observability/events.py
src/ai_company/providers/__init__.py
src/ai_company/providers/models.py
tests/unit/engine/test_context.py
tests/unit/engine/test_errors.py
tests/unit/engine/test_task_execution.py

📝 Walkthrough

Summary by CodeRabbit

New Features
- Single-agent (M3) runtime in progress with immutable execution contexts, turn limits, and audited task-state transitions.
- Execution wrappers now track turns, accumulated cost, and expose snapshots.
Error Handling
- New runtime errors for invalid state transitions and max-turns exceeded.
Observability
- New execution lifecycle event names for creation, transitions, cost, turns, snapshots, and failure signals.
Documentation
- Design spec and README updated to describe the adopted runtime model.
Tests
- Comprehensive unit tests added for contexts, task execution, errors, and exports.

Walkthrough

Adds an M3 runtime layer: immutable, audited TaskExecution wraps Task for status transitions and cost tracking; AgentContext wraps AgentIdentity + TaskExecution for conversation, turn management, and snapshots; new engine errors, observability event names, provider token-usage helpers, exports, and tests accompany the changes.

Changes

Cohort / File(s)	Summary
Design & Docs `DESIGN_SPEC.md`, `README.md`	Updated design narrative to adopt M3 runtime wrapper approach (TaskExecution, AgentContext) and updated milestone status.
Runtime models `src/ai_company/engine/task_execution.py`, `src/ai_company/engine/context.py`	Added `TaskExecution` (StatusTransition log, immutable transitions, cost/turn tracking) and `AgentContext` (execution_id, conversation, turns, cost, snapshot).
Engine errors `src/ai_company/engine/errors.py`	Added `ExecutionStateError` and `MaxTurnsExceededError`.
Observability & Exports `src/ai_company/observability/events.py`, `src/ai_company/engine/__init__.py`	Added execution.* event constants; expanded engine exports to include DEFAULT_MAX_TURNS, AgentContext/AgentContextSnapshot, StatusTransition, TaskExecution, new errors, and token-usage helpers.
Provider utilities `src/ai_company/providers/models.py`, `src/ai_company/providers/__init__.py`	Introduced `ZERO_TOKEN_USAGE` and `add_token_usage`, and exported them from providers package.
Tests `tests/unit/engine/conftest.py`, `tests/unit/engine/test_context.py`, `tests/unit/engine/test_task_execution.py`, `tests/unit/engine/test_errors.py`, `tests/unit/engine/test_exports.py`	Added fixtures and comprehensive unit tests covering TaskExecution, AgentContext lifecycle, error hierarchy, token-usage helpers, and engine exports.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant AgentContext
    participant TaskExecution
    participant Task
    participant Observability

    User->>AgentContext: from_identity(identity, task)
    AgentContext->>TaskExecution: from_task(task)
    TaskExecution->>Task: read snapshot/status
    TaskExecution->>Observability: emit EXECUTION_TASK_CREATED
    AgentContext->>Observability: emit EXECUTION_CONTEXT_CREATED

    User->>AgentContext: with_message(user_msg)
    AgentContext->>AgentContext: append conversation (immutable copy)

    User->>AgentContext: with_turn_completed(usage, response_msg)
    AgentContext->>AgentContext: increment turn_count, append response
    AgentContext->>TaskExecution: with_cost(usage)
    TaskExecution->>TaskExecution: update accumulated_cost
    TaskExecution->>Observability: emit EXECUTION_COST_RECORDED
    AgentContext->>Observability: emit EXECUTION_CONTEXT_TURN

    User->>AgentContext: with_task_transition(target_status)
    AgentContext->>TaskExecution: with_transition(target_status)
    TaskExecution->>TaskExecution: append StatusTransition, set timestamps
    TaskExecution->>Observability: emit EXECUTION_TASK_TRANSITION

    User->>AgentContext: to_snapshot()
    AgentContext->>Observability: emit EXECUTION_CONTEXT_SNAPSHOT

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

PR #115 — Implements the same config-vs-runtime split (TaskExecution, AgentContext, related errors, token-usage helpers, events, and tests); strongly related.
PR #71 — Introduced TaskStatus and transition validation logic that TaskExecution depends on; closely related.
PR #105 — Extended engine error surface and touched engine exports/events; has overlapping changes to error/event interfaces.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.32% which is insufficient. The required threshold is 100.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main change: implementing a split between agent runtime state and immutable config, which is the core focus of this PR.
Description check	✅ Passed	The description comprehensively documents the changes, including new types (TaskExecution, AgentContext, AgentContextSnapshot), error types, utilities, observability additions, documentation updates, and test coverage results.
Linked Issues check	✅ Passed	The PR fully satisfies issue `#106` acceptance criteria: introduces AgentContext for runtime state with conversation/cost/task reference, TaskExecution with cheap transitions avoiding re-validation, preserves frozen models for config/snapshots, and updates DESIGN_SPEC.md sections 3 and 6.
Out of Scope Changes check	✅ Passed	All changes align with issue `#106` objectives: new runtime types, error hierarchy, utilities, event constants, documentation updates, and comprehensive test coverage. No unrelated modifications detected.

gemini-code-assist · 2026-03-05T20:34:16Z

Summary of Changes

This pull request significantly refactors the agent execution engine by introducing a clear separation between immutable configuration models and mutable runtime state. It establishes new frozen Pydantic models, TaskExecution and AgentContext, to manage the dynamic aspects of agent and task execution, such as conversation history, accumulated costs, and status transitions, using model_copy for efficient state updates. This change improves the clarity, immutability, and auditability of the system's core execution flow.

Highlights

Runtime State Models: Introduced TaskExecution to wrap Task with evolving execution state (status, cost, turns, audit trail) and AgentContext to manage agent runtime context (conversation, cost, turn limits, TaskExecution).
Snapshot and DTO: Added AgentContextSnapshot, a compact frozen Data Transfer Object (DTO) for reporting and logging, including invariant validation for task_id/task_status pairs.
Error Handling: Defined new error types, ExecutionStateError for invalid state transitions and MaxTurnsExceededError for exceeding turn limits, both inheriting from EngineError.
Public Utilities: Exported add_token_usage(), ZERO_TOKEN_USAGE, and DEFAULT_MAX_TURNS from the engine package for broader use.
Observability: Added six new event constants for structured logging of execution lifecycle events, enhancing traceability.
Documentation Updates: Updated DESIGN_SPEC.md to reflect the adopted 'Config vs runtime split' strategy, detail runtime state, and update project structure. README.md status was also updated to M3 in progress.
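The task_id/task_status pair invariant mentioned for AgentContextSnapshot can be illustrated as follows. This is a sketch under assumptions: the PR uses a Pydantic `model_validator`, while the example uses a frozen dataclass with `__post_init__` so it runs on the standard library alone, and every field other than `task_id` and `task_status` is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentContextSnapshot:
    execution_id: str            # hypothetical field
    turn_count: int              # hypothetical field
    task_id: Optional[str] = None
    task_status: Optional[str] = None

    def __post_init__(self) -> None:
        # Pair invariant: the two task fields are either both set or both None.
        if (self.task_id is None) != (self.task_status is None):
            raise ValueError("task_id and task_status must be set together or both be None")


idle = AgentContextSnapshot(execution_id="exec-1", turn_count=0)
busy = AgentContextSnapshot(
    execution_id="exec-2", turn_count=3, task_id="task-42", task_status="in_progress"
)
```

Checking the pair at construction time means a log consumer never has to handle a snapshot that names a task without a status, or the reverse.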

Changelog

DESIGN_SPEC.md
- Updated runtime state documentation to reflect the adopted engine/ approach using frozen models and model_copy.
- Added a note explaining TaskExecution as a runtime wrapper for Task.
- Updated the project structure to include new errors.py, prompt_template.py, task_execution.py, and context.py files.
- Marked the 'Config vs runtime split' convention as 'Adopted (M3)'.
README.md
- Updated the project status from M2 to M3 in progress.
src/ai_company/engine/__init__.py
- Updated the module docstring to reflect re-exports for runtime execution state and engine errors.
- Added imports and re-exports for AgentContext, AgentContextSnapshot, DEFAULT_MAX_TURNS, ExecutionStateError, MaxTurnsExceededError, StatusTransition, TaskExecution, ZERO_TOKEN_USAGE, and add_token_usage.
src/ai_company/engine/context.py
- Added AgentContext model to manage agent runtime state, including conversation, accumulated cost, turn count, and task execution.
- Added AgentContextSnapshot model for compact reporting and logging of agent context state, including a model_validator for task_id/task_status invariant.
- Introduced DEFAULT_MAX_TURNS constant for LLM turn limits.
src/ai_company/engine/errors.py
- Added ExecutionStateError for invalid execution state transitions.
- Added MaxTurnsExceededError for when an agent's turn count exceeds its maximum limit.
src/ai_company/engine/task_execution.py
- Added TaskExecution model to wrap Task with evolving execution state, including status, transition log, accumulated cost, and turn count.
- Added StatusTransition model to record audit trails of task status changes.
- Implemented add_token_usage utility function for summing TokenUsage objects.
- Defined ZERO_TOKEN_USAGE constant for initial token usage.
src/ai_company/observability/events.py
- Added new event constants for execution lifecycle logging: EXECUTION_TASK_CREATED, EXECUTION_TASK_TRANSITION, EXECUTION_COST_RECORDED, EXECUTION_CONTEXT_CREATED, EXECUTION_CONTEXT_TURN, EXECUTION_CONTEXT_SNAPSHOT, and EXECUTION_CONTEXT_NO_TASK.
tests/unit/engine/conftest.py
- Added sample_token_usage fixture for testing cost accumulation.
- Added sample_task_execution fixture for TaskExecution instances.
- Added sample_agent_context fixture for AgentContext instances.
tests/unit/engine/test_context.py
- Added unit tests for AgentContext.from_identity factory, covering task and no-task scenarios, defaults, execution ID generation, and custom max turns.
- Added unit tests for AgentContext.with_message to verify conversation message appending.
- Added unit tests for AgentContext.with_turn_completed and has_turns_remaining, checking turn increments, cost accumulation, and boundary conditions.
- Added unit tests for AgentContext.with_task_transition, including delegation and error handling for missing task execution or invalid transitions.
- Added unit tests for AgentContext.to_snapshot, verifying snapshot content, task presence, immutability, and timestamps.
- Added unit tests to confirm AgentContext immutability and that model_copy preserves original objects.
- Added unit tests for logging events related to AgentContext operations.
tests/unit/engine/test_errors.py
- Added unit tests to verify the inheritance hierarchy of ExecutionStateError, MaxTurnsExceededError, and PromptBuildError from EngineError.
tests/unit/engine/test_exports.py
- Added unit tests to ensure all names listed in ai_company.engine.__all__ are correctly importable.
tests/unit/engine/test_task_execution.py
- Added unit tests for StatusTransition construction, default reason, and immutability.
- Added unit tests for TaskExecution.from_task factory, checking status matching, defaults, and initial terminal state.
- Added unit tests for TaskExecution.with_transition, covering valid/invalid transitions, log accumulation, and started_at/completed_at timestamps.
- Added unit tests for TaskExecution.with_cost to verify token usage and turn count accumulation.
- Added unit tests for TaskExecution.to_task_snapshot, ensuring status updates and field preservation.
- Added unit tests to confirm TaskExecution immutability and that model_copy preserves original objects.
- Added unit tests for add_token_usage utility, checking correct summation, invariant maintenance, and zero usage.
- Added unit tests for logging events related to TaskExecution operations.
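As a rough picture of the token-usage helper listed in the changelog, the sketch below sums two usage records into a new one. The `TokenUsage` field names and the "total equals input plus output" invariant are assumptions for illustration; the real model lives in the providers package and may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    # Field names are assumed for this sketch.
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0


ZERO_TOKEN_USAGE = TokenUsage()


def add_token_usage(a: TokenUsage, b: TokenUsage) -> TokenUsage:
    """Return a new TokenUsage holding the field-wise sum of `a` and `b`."""
    input_tokens = a.input_tokens + b.input_tokens
    output_tokens = a.output_tokens + b.output_tokens
    return TokenUsage(
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        # Recompute the total so the assumed invariant holds on the result.
        total_tokens=input_tokens + output_tokens,
        cost_usd=a.cost_usd + b.cost_usd,
    )


turn_one = TokenUsage(input_tokens=100, output_tokens=20, total_tokens=120, cost_usd=0.25)
turn_two = TokenUsage(input_tokens=50, output_tokens=30, total_tokens=80, cost_usd=0.5)
running = add_token_usage(add_token_usage(ZERO_TOKEN_USAGE, turn_one), turn_two)
```

Starting the accumulation from a zero value keeps the first turn free of special cases.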

Activity

The pull request underwent a pre-review process by 9 automated agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency).
A total of 16 findings identified during the pre-review were addressed by the author.
Specific findings addressed include adding @pytest.mark.unit markers to tests, ensuring logger.error() calls before raising ExecutionStateError, adding tests for new error types and __all__ re-exports, using NotBlankStr for identifiers, implementing a model_validator for AgentContextSnapshot, promoting internal utilities to public, extracting DEFAULT_MAX_TURNS as a constant, improving docstring accuracy, and reorganizing test files.

gemini-code-assist

Code Review

This pull request introduces a well-designed separation between immutable agent configuration and mutable runtime state. The new TaskExecution and AgentContext models are implemented as frozen Pydantic models, correctly using model_copy(update=...) for state transitions, which is a robust pattern for immutability. The changes are supported by comprehensive unit tests covering functionality, edge cases, and logging, which significantly increases confidence in the new execution engine components. The documentation in DESIGN_SPEC.md and README.md has also been updated to reflect these architectural changes. I have one minor suggestion to improve code maintainability.

gemini-code-assist · 2026-03-05T20:36:13Z

src/ai_company/engine/__init__.py

 __all__ = [
+    "DEFAULT_MAX_TURNS",
+    "ZERO_TOKEN_USAGE",
+    "AgentContext",
+    "AgentContextSnapshot",
    "DefaultTokenEstimator",
    "EngineError",
+    "ExecutionStateError",
+    "MaxTurnsExceededError",
    "PromptBuildError",
    "PromptTokenEstimator",
+    "StatusTransition",
    "SystemPrompt",
+    "TaskExecution",
+    "add_token_usage",
    "build_system_prompt",
 ]


For better maintainability and readability, it's a good practice to keep the __all__ list sorted alphabetically. This makes it easier to find exports as the list grows. The previous __all__ list in this file was sorted, so this change aligns with the existing project convention.

Suggested change

-__all__ = [
-    "DEFAULT_MAX_TURNS",
-    "ZERO_TOKEN_USAGE",
-    "AgentContext",
-    "AgentContextSnapshot",
-    "DefaultTokenEstimator",
-    "EngineError",
-    "ExecutionStateError",
-    "MaxTurnsExceededError",
-    "PromptBuildError",
-    "PromptTokenEstimator",
-    "StatusTransition",
-    "SystemPrompt",
-    "TaskExecution",
-    "add_token_usage",
-    "build_system_prompt",
-]
+__all__ = [
+    "AgentContext",
+    "AgentContextSnapshot",
+    "DEFAULT_MAX_TURNS",
+    "DefaultTokenEstimator",
+    "EngineError",
+    "ExecutionStateError",
+    "MaxTurnsExceededError",
+    "PromptBuildError",
+    "PromptTokenEstimator",
+    "StatusTransition",
+    "SystemPrompt",
+    "TaskExecution",
+    "ZERO_TOKEN_USAGE",
+    "add_token_usage",
+    "build_system_prompt",
+]
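Whether a list counts as "sorted" depends on the ordering convention in use. The sketch below compares the two orderings in this thread: plain code-point order, which is what Python's `sorted()` produces, and a grouped order that places constants first, then classes, then functions, similar in spirit to the isort-style ordering some linters apply to `__all__`. The `grouped_key` helper is hypothetical and only approximates such a rule.

```python
pr_order = [
    "DEFAULT_MAX_TURNS",
    "ZERO_TOKEN_USAGE",
    "AgentContext",
    "AgentContextSnapshot",
    "DefaultTokenEstimator",
    "EngineError",
    "ExecutionStateError",
    "MaxTurnsExceededError",
    "PromptBuildError",
    "PromptTokenEstimator",
    "StatusTransition",
    "SystemPrompt",
    "TaskExecution",
    "add_token_usage",
    "build_system_prompt",
]


def grouped_key(name: str) -> tuple[int, str]:
    """Hypothetical key: constants, then classes, then everything else."""
    if name.isupper():
        group = 0  # SCREAMING_SNAKE_CASE
    elif name[0].isupper():
        group = 1  # CamelCase
    else:
        group = 2  # functions and other lowercase names
    return (group, name)


codepoint_order = sorted(pr_order)
grouped_order = sorted(pr_order, key=grouped_key)

# The PR's list is already sorted under the grouped convention,
# while the suggested list corresponds to plain code-point order.
pr_is_grouped_sorted = pr_order == grouped_order
pr_is_codepoint_sorted = pr_order == codepoint_order
```

Before reordering, it is worth checking which convention the project's linter enforces, since a list that satisfies one rule can fail the other.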

Copilot

Pull request overview

Implements a clear split between immutable agent/task configuration and runtime execution state by introducing frozen runtime models (TaskExecution, AgentContext) with copy-on-write transitions, plus supporting errors, events, exports, and tests.

Changes:

Added frozen runtime state models for task execution and agent context, including snapshots, transitions, logging, and token usage accumulation.
Introduced new engine errors/events and re-exported the public runtime API from ai_company.engine.
Added unit tests and updated docs/status to reflect the adopted runtime/config split.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

File	Description
tests/unit/engine/test_task_execution.py	Adds unit coverage for `TaskExecution`, transitions, cost accumulation, immutability, and logging events.
tests/unit/engine/test_context.py	Adds unit coverage for `AgentContext`/`AgentContextSnapshot`, turn tracking, transitions, invariants, and logging events.
tests/unit/engine/test_errors.py	Validates engine error inheritance for newly introduced error types.
tests/unit/engine/test_exports.py	Ensures `engine.__all__` re-exports remain importable.
tests/unit/engine/conftest.py	Provides shared fixtures for token usage, task execution, and agent context.
src/ai_company/observability/events.py	Introduces execution lifecycle event constants for structured logging.
src/ai_company/engine/task_execution.py	Implements `TaskExecution`, `StatusTransition`, and `add_token_usage` helper with lifecycle logging.
src/ai_company/engine/context.py	Implements `AgentContext`, `AgentContextSnapshot` with invariant validation, cost/turn tracking, and snapshotting.
src/ai_company/engine/errors.py	Adds `ExecutionStateError` and `MaxTurnsExceededError` to the engine error hierarchy.
src/ai_company/engine/__init__.py	Re-exports the runtime state API, errors, and utilities via `__all__`.
README.md	Updates milestone status to reflect M3 in progress.
DESIGN_SPEC.md	Updates runtime state documentation and marks the config/runtime split as adopted.

Copilot · 2026-03-05T20:37:31Z

src/ai_company/engine/task_execution.py

+    accumulated_cost: TokenUsage = Field(
+        default=ZERO_TOKEN_USAGE,
+        description="Running cost totals",
+    )


Using a TokenUsage model instance as a field default can create a shared mutable default across TaskExecution instances if TokenUsage itself is mutable (not frozen). Prefer default_factory to generate a fresh zero-usage value per instance (e.g., default_factory=lambda: ZERO_TOKEN_USAGE.model_copy() or a small zero_token_usage() factory) while still keeping ZERO_TOKEN_USAGE as an exported constant.
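The concern above hinges on whether the default object can be mutated. The sketch below uses frozen dataclasses as stand-ins for the Pydantic models (field names are assumptions) to show the benign case: two instances share one default object, yet neither can change it.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class TokenUsage:
    # Stand-in for the provider model; field names are assumptions.
    input_tokens: int = 0
    output_tokens: int = 0


ZERO_TOKEN_USAGE = TokenUsage()


@dataclass(frozen=True)
class TaskExecution:
    # A frozen (hashable) instance is accepted as a plain default value.
    accumulated_cost: TokenUsage = ZERO_TOKEN_USAGE


first = TaskExecution()
second = TaskExecution()

# Both instances point at the very same default object...
shared = first.accumulated_cost is second.accumulated_cost

# ...but a frozen object cannot be mutated through either of them.
try:
    first.accumulated_cost.input_tokens = 99
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

If `TokenUsage` were not frozen, the assignment would succeed and the change would be visible through `second` as well, which is the contamination the comment warns about.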

Copilot · 2026-03-05T20:37:31Z

src/ai_company/engine/context.py

+        description="Accumulated conversation messages",
+    )
+    accumulated_cost: TokenUsage = Field(
+        default=ZERO_TOKEN_USAGE,


Same shared-default issue as in TaskExecution: ZERO_TOKEN_USAGE is a TokenUsage instance, so if it’s mutable this default can be shared across all AgentContext instances. Switch this field to a default_factory that returns a new zero-usage object to avoid cross-instance contamination through nested mutation.

Suggested change

-        default=ZERO_TOKEN_USAGE,
+        default_factory=lambda: ZERO_TOKEN_USAGE.model_copy(deep=True),

Copilot · 2026-03-05T20:37:32Z

src/ai_company/engine/errors.py

+
+
+class MaxTurnsExceededError(EngineError):
+    """Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""


This docstring states the error is raised by AgentContext, but there’s no raising logic in AgentContext.with_turn_completed() (it only increments and exposes has_turns_remaining). Either implement the raise where the boundary is crossed (likely in with_turn_completed) or adjust the docstring to reflect where the error is actually intended to be raised (e.g., by the execution loop/engine using has_turns_remaining).
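The first option in the comment above, raising where the boundary is crossed, could look roughly like this. It is a sketch with frozen dataclasses standing in for the Pydantic model; the `max_turns` value of 3 and the `run_turns` driver loop are hypothetical.

```python
from dataclasses import dataclass, replace


class MaxTurnsExceededError(Exception):
    """Stand-in for the engine error of the same name."""


@dataclass(frozen=True)
class AgentContext:
    turn_count: int = 0
    max_turns: int = 3  # hypothetical limit for the example

    @property
    def has_turns_remaining(self) -> bool:
        return self.turn_count < self.max_turns

    def with_turn_completed(self) -> "AgentContext":
        # Enforce the limit at the point where a turn would be recorded.
        if not self.has_turns_remaining:
            raise MaxTurnsExceededError(
                f"turn_count={self.turn_count} reached max_turns={self.max_turns}"
            )
        return replace(self, turn_count=self.turn_count + 1)


def run_turns(ctx: AgentContext) -> AgentContext:
    """Hypothetical driver loop that stops cleanly at the limit."""
    while ctx.has_turns_remaining:
        ctx = ctx.with_turn_completed()
    return ctx


final = run_turns(AgentContext())
```

With the check inside `with_turn_completed`, a well-behaved loop never sees the error, and a caller that forgets to test `has_turns_remaining` fails loudly instead of silently running past the limit.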

Suggested change

-    """Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""
+    """Raised when an agent's ``turn_count`` reaches or exceeds its configured ``max_turns``."""

Copilot · 2026-03-05T20:37:32Z

DESIGN_SPEC.md

 |------------|--------|----------|-----------|
 | **Immutability strategy** | Adopted | `MappingProxyType` at construction for dict fields in registries and collections; `frozen=True` on all config/identity models | MappingProxyType is O(1) and prevents accidental mutation. Pydantic `frozen=True` is confirmed shallow (pydantic#7784). |
-| **Config vs runtime split** | Planned (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | Frozen models cannot represent evolving state without serialize/validate round-trips. Separate models keep config immutable while state is explicit. Currently only config layer exists (`AgentIdentity`). |
+| **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for O(1) state transitions, skipping all validators. Config layer (`AgentIdentity`, `Task`) remains unchanged. |


The claim that transitions are “O(1)” is inaccurate for the implemented code paths (e.g., transition_log uses tuple concatenation which grows with history). Also “skipping all validators” is a subtle behavior that depends on Pydantic model_copy semantics and may mislead readers about what constraints are/aren’t enforced. Consider rewording to something like “copy-on-write transitions without re-validation” and avoid the O(1) assertion (or qualify it precisely).

Suggested change

-| **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for O(1) state transitions, skipping all validators. Config layer (`AgentIdentity`, `Task`) remains unchanged. |
+| **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for copy-on-write style state transitions without re-running validators (per Pydantic `model_copy` semantics). Config layer (`AgentIdentity`, `Task`) remains unchanged. |
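The point about tuple concatenation can be made concrete by counting how many element references are written when a log grows one entry at a time. The helper below is a hypothetical illustration, not project code; it models the transition log as a tuple of strings.

```python
def elements_copied(n_transitions: int) -> int:
    """Count references written while growing a tuple log one entry at a time."""
    log: tuple[str, ...] = ()
    copied = 0
    for i in range(n_transitions):
        # Appending to an immutable tuple builds a brand-new tuple.
        new_log = (*log, f"transition-{i}")
        # Every element of the new tuple is written once during construction.
        copied += len(new_log)
        log = new_log
    return copied


small = elements_copied(4)     # 1 + 2 + 3 + 4
large = elements_copied(100)   # 100 * 101 / 2
```

Each append costs time proportional to the current history length, so n transitions write n(n+1)/2 references in total. For short task histories this is negligible, but it is linear per transition rather than constant.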

greptile-apps · 2026-03-05T20:39:22Z

Greptile Summary

This PR implements a clean runtime state / immutable config split for the agent execution engine: a frozen TaskExecution wrapper tracks status transitions, cost accumulation, and audit trail for a Task, while a frozen AgentContext wraps AgentIdentity + TaskExecution with conversation history, turn counting, and turn-limit enforcement — all via model_copy(update=...). The design is well-structured, consistently logged, and ships with excellent test coverage (95%+).

Key changes:

TaskExecution + StatusTransition frozen models with transition validation and terminal-state detection
AgentContext + AgentContextSnapshot with task_id/task_status pair invariant and UUID-generated execution_id
New ExecutionStateError / MaxTurnsExceededError under the EngineError hierarchy
6 new EXECUTION_* structured-log event constants
ZERO_TOKEN_USAGE, add_token_usage, and DEFAULT_MAX_TURNS promoted to the public engine package API
DESIGN_SPEC.md and README updated to reflect M3 adoption

Issues found:

AgentContext.with_turn_completed can raise an undocumented ExecutionStateError (in addition to MaxTurnsExceededError) when the embedded TaskExecution is already in a terminal state — the Raises: docstring is incomplete, and no test covers this path.

Confidence Score: 4/5

Safe to merge after resolving the undocumented ExecutionStateError raise path in with_turn_completed.
The implementation is architecturally clean, the immutable-copy pattern is applied consistently, and test coverage is thorough. The one issue — an undocumented exception path in with_turn_completed when the task is terminal — is a contract correctness problem that could surprise callers at runtime, but it does not break any existing functionality since the scenario (accumulating cost after task completion) has no callers yet in this milestone.
src/ai_company/engine/context.py — the with_turn_completed method's Raises: section and the missing test for the terminal-task path in tests/unit/engine/test_context.py.
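A test for the uncovered path might be shaped like the sketch below. It assumes, following the description above, that cost recording on a terminal TaskExecution raises ExecutionStateError and that the error propagates through `with_turn_completed`. The classes are simplified dataclass stand-ins with assumed fields, and the try/except form keeps the example dependency-free; in the project's pytest suite, `pytest.raises` would be the natural choice.

```python
from dataclasses import dataclass, replace
from typing import Optional


class ExecutionStateError(Exception):
    """Stand-in for the engine error of the same name."""


@dataclass(frozen=True)
class TaskExecution:
    is_terminal: bool = False  # simplified: a plain flag instead of a derived status
    turn_count: int = 0

    def with_cost(self) -> "TaskExecution":
        if self.is_terminal:
            raise ExecutionStateError("cannot record cost on a terminal task")
        return replace(self, turn_count=self.turn_count + 1)


@dataclass(frozen=True)
class AgentContext:
    task_execution: Optional[TaskExecution] = None
    turn_count: int = 0

    def with_turn_completed(self) -> "AgentContext":
        task_execution = self.task_execution
        if task_execution is not None:
            # May raise ExecutionStateError when the task is already terminal.
            task_execution = task_execution.with_cost()
        return replace(self, turn_count=self.turn_count + 1, task_execution=task_execution)


def test_turn_completed_on_terminal_task_raises() -> None:
    ctx = AgentContext(task_execution=TaskExecution(is_terminal=True))
    try:
        ctx.with_turn_completed()
    except ExecutionStateError:
        return
    raise AssertionError("expected ExecutionStateError")


test_turn_completed_on_terminal_task_raises()
```

Pinning this behavior in a test also documents the contract, so the Raises: section and the code cannot drift apart unnoticed.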

Important Files Changed

Filename	Overview
src/ai_company/engine/context.py	Introduces `AgentContext` and `AgentContextSnapshot` frozen Pydantic models. Core logic is sound, but `with_turn_completed` has an undocumented `ExecutionStateError` raise path when a terminal `TaskExecution` is present.
src/ai_company/engine/task_execution.py	Introduces `TaskExecution` and `StatusTransition` frozen models with clear immutable-state-via-copy semantics. `_TERMINAL_STATUSES` derivation from `VALID_TRANSITIONS`, `started_at`/`completed_at` lifecycle, and transition validation all look correct.
src/ai_company/engine/errors.py	Adds `ExecutionStateError` and `MaxTurnsExceededError` under the `EngineError` hierarchy. Clean and minimal; all new errors are correctly placed in the inheritance tree.
src/ai_company/engine/__init__.py	Re-exports all new public symbols (`DEFAULT_MAX_TURNS`, `ZERO_TOKEN_USAGE`, `add_token_usage`, `StatusTransition`, `TaskExecution`, error classes, and context models) correctly in `__all__`.
src/ai_company/observability/events.py	Adds 6 new `EXECUTION_*` event constants following the established `domain.noun.verb` naming convention. All constants are typed as `Final[str]` and grouped correctly.
tests/unit/engine/test_context.py	Comprehensive coverage of `AgentContext` factory, conversation, turn, transition, snapshot, immutability, and logging paths. Missing a test for `with_turn_completed` when `task_execution.is_terminal` is True, leaving the undocumented `ExecutionStateError` path uncovered.
tests/unit/engine/test_task_execution.py	Well-structured tests covering construction, valid/invalid transitions, cost accumulation, terminal-state guard, `to_task_snapshot`, immutability, and structured logging. All key paths covered.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant AgentContext
    participant TaskExecution
    participant StatusTransition

    Caller->>AgentContext: from_identity(identity, task)
    AgentContext->>TaskExecution: from_task(task)
    AgentContext-->>Caller: AgentContext (turn_count=0)

    Caller->>AgentContext: with_task_transition(IN_PROGRESS)
    AgentContext->>TaskExecution: with_transition(IN_PROGRESS)
    TaskExecution->>StatusTransition: create(ASSIGNED→IN_PROGRESS)
    TaskExecution-->>AgentContext: new TaskExecution (started_at set)
    AgentContext-->>Caller: new AgentContext

    Caller->>AgentContext: with_turn_completed(usage, msg)
    AgentContext->>AgentContext: check has_turns_remaining
    AgentContext->>TaskExecution: with_cost(usage)
    TaskExecution-->>AgentContext: new TaskExecution (turn_count+1)
    AgentContext-->>Caller: new AgentContext (turn_count+1)

    Caller->>AgentContext: with_task_transition(COMPLETED)
    AgentContext->>TaskExecution: with_transition(COMPLETED)
    TaskExecution->>StatusTransition: create(IN_REVIEW→COMPLETED)
    TaskExecution-->>AgentContext: new TaskExecution (completed_at set, is_terminal=True)
    AgentContext-->>Caller: new AgentContext

    Caller->>AgentContext: to_snapshot()
    AgentContext-->>Caller: AgentContextSnapshot
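
The copy-on-transition flow in the diagram can be approximated with the standard library alone. This is a minimal sketch, not the PR's code: frozen dataclasses stand in for the frozen Pydantic models, `dataclasses.replace` stands in for `model_copy(update=...)`, and the `*Sketch` class names and the reduced `Status` enum are hypothetical.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Status(Enum):
    """Reduced, hypothetical status set for illustration."""

    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


@dataclass(frozen=True)
class TaskExecutionSketch:
    """Hypothetical stand-in for TaskExecution."""

    status: Status = Status.ASSIGNED
    turn_count: int = 0

    def with_transition(self, target: Status) -> "TaskExecutionSketch":
        # Return a new object; the original is never mutated.
        return replace(self, status=target)

    def with_cost(self) -> "TaskExecutionSketch":
        return replace(self, turn_count=self.turn_count + 1)


@dataclass(frozen=True)
class AgentContextSketch:
    """Hypothetical stand-in for AgentContext."""

    task_execution: TaskExecutionSketch
    turn_count: int = 0

    def with_task_transition(self, target: Status) -> "AgentContextSketch":
        return replace(
            self, task_execution=self.task_execution.with_transition(target)
        )

    def with_turn_completed(self) -> "AgentContextSketch":
        # Both counters advance together, as in the diagram.
        return replace(
            self,
            turn_count=self.turn_count + 1,
            task_execution=self.task_execution.with_cost(),
        )


ctx0 = AgentContextSketch(TaskExecutionSketch())
ctx1 = ctx0.with_task_transition(Status.IN_PROGRESS)
ctx2 = ctx1.with_turn_completed()
```

Each call returns a new context, so `ctx0` still reports `ASSIGNED` and zero turns after `ctx2` has been produced.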

Last reviewed commit: 608a234


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/context.py`:
- Line 17: with_turn_completed currently doesn't enforce the state's max_turns
limit; update the with_turn_completed method (and the analogous block
around lines 181-214) to check state.turns (or state.turn_count) against
state.max_turns before allowing a new turn to complete and raise
MaxTurnsExceededError (or wrap it in ExecutionStateError if the codebase expects
that) when the hard limit would be exceeded; locate the function/method named
with_turn_completed and add a pre-commit check that prevents
incrementing/committing the turn if state.max_turns is not None and state.turns
>= state.max_turns, returning/raising the appropriate MaxTurnsExceededError so
the limit is enforced at the state boundary.

In `@src/ai_company/engine/task_execution.py`:
- Around line 208-213: The log emission that records cost/state changes uses
logger.debug for EXECUTION_COST_RECORDED but must be INFO because with_cost
updates execution state (accumulated_cost, turn_count); change the call from
logger.debug(...) to logger.info(...) in the task execution flow (the site where
logger.debug is invoked with EXECUTION_COST_RECORDED, task_id=self.task.id,
turn=result.turn_count, cost_usd=usage.cost_usd) so the lifecycle transition is
logged at INFO level.

In `@tests/unit/engine/test_context.py`:
- Around line 55-61: Update the test_defaults test to use the exported
DEFAULT_MAX_TURNS constant instead of the hardcoded 20: locate the test function
test_defaults and replace the assertion against the literal 20 with an assertion
that ctx.max_turns == DEFAULT_MAX_TURNS (import DEFAULT_MAX_TURNS at top of the
test module if not already present), so the test relies on the AgentContext
default via AgentContext.from_identity and remains correct if the default
changes.

In `@tests/unit/engine/test_errors.py`:
- Around line 28-29: Add an instance check to the
test_prompt_build_error_is_engine_error so it asserts both
issubclass(PromptBuildError, EngineError) and isinstance(PromptBuildError(),
EngineError); locate the test function named
test_prompt_build_error_is_engine_error and create a PromptBuildError instance
to verify it is an EngineError for consistency with
test_execution_state_error_is_engine_error and
test_max_turns_exceeded_error_is_engine_error.
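
The last finding asks for both a class-level and an instance-level check. A minimal sketch of that test shape follows; the two error classes here are local stand-ins defined only so the example runs on its own, and the real classes would be imported from the engine package instead.

```python
class EngineError(Exception):
    """Stand-in for the engine's base error (assumption for this sketch)."""


class PromptBuildError(EngineError):
    """Stand-in for the PR's PromptBuildError (assumption for this sketch)."""


def test_prompt_build_error_is_engine_error() -> None:
    # Class-level check: the type sits in the EngineError hierarchy.
    assert issubclass(PromptBuildError, EngineError)
    # Instance-level check: a constructed error is caught by EngineError handlers.
    assert isinstance(PromptBuildError("boom"), EngineError)


test_prompt_build_error_is_engine_error()
```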

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 50985699-4edf-49b0-bcba-c4cb78bf1cb6

📥 Commits

Reviewing files that changed from the base of the PR and between 99766ee and 8d01284.

📒 Files selected for processing (12)

DESIGN_SPEC.md
README.md
src/ai_company/engine/__init__.py
src/ai_company/engine/context.py
src/ai_company/engine/errors.py
src/ai_company/engine/task_execution.py
src/ai_company/observability/events.py
tests/unit/engine/conftest.py
tests/unit/engine/test_context.py
tests/unit/engine/test_errors.py
tests/unit/engine/test_exports.py
tests/unit/engine/test_task_execution.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (3)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: All public functions must have type hints; mypy strict mode is enforced
Docstrings must use Google style format; required on all public classes and functions (enforced by ruff D rules)
Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for multiple exception handling — PEP 758 enforcement by ruff on Python 3.14
Line length must not exceed 88 characters (enforced by ruff)
Functions must be fewer than 50 lines
Files must be fewer than 800 lines

Files:

src/ai_company/engine/errors.py
tests/unit/engine/test_context.py
tests/unit/engine/test_task_execution.py
tests/unit/engine/test_exports.py
tests/unit/engine/conftest.py
src/ai_company/engine/__init__.py
src/ai_company/engine/task_execution.py
src/ai_company/engine/context.py
src/ai_company/observability/events.py
tests/unit/engine/test_errors.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic must import and use the logger: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the structured logger from ai_company.observability
Logger variable name must always be logger (not _logger, not log)
Event names must always use constants from ai_company.observability.events (e.g., PROVIDER_CALL_START, BUDGET_RECORD_ADDED, TOOL_INVOKE_START); import directly: from ai_company.observability.events import EVENT_CONSTANT
Structured logging must use logger.info(EVENT, key=value) format — never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT require logging
Use immutability — create new objects, never mutate existing ones. For dict/list fields in frozen Pydantic models, use MappingProxyType wrapping at construction (not deepcopy on access). Deep-copy only at system boundaries (e.g., passing data to tool.execute(), serializing for persistence)
Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, ConfigDict). For new code: use @computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr (from core.types) for non-optional identifier/name fields instead of manual whitespace validators
For async concurrency in new code, prefer asyncio.TaskGroup for fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent cal...

Files:

src/ai_company/engine/errors.py
src/ai_company/engine/__init__.py
src/ai_company/engine/task_execution.py
src/ai_company/engine/context.py
src/ai_company/observability/events.py
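
The immutability guideline above (wrap `dict` fields in `MappingProxyType` at construction) can be illustrated with a small sketch. It uses a frozen dataclass rather than a Pydantic model so it runs with the standard library only; `SettingsSketch` and its `labels` field are hypothetical names.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass(frozen=True)
class SettingsSketch:
    """Hypothetical frozen model with one mapping field."""

    labels: Mapping[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy once at construction, then expose a read-only view.
        # object.__setattr__ is needed because the dataclass is frozen.
        object.__setattr__(self, "labels", MappingProxyType(dict(self.labels)))


source = {"team": "engine"}
settings = SettingsSketch(labels=source)
source["team"] = "changed"  # later mutation of the input does not leak in

try:
    settings.labels["team"] = "other"  # type: ignore[index]
    write_blocked = False
except TypeError:
    write_blocked = True
```

The copy happens once when the object is built, so reads return the same proxy without any per-access `deepcopy`.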

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test markers must be: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Coverage must be 80% minimum (enforced in CI)
Async test mode: use asyncio_mode = "auto" — no manual @pytest.mark.asyncio needed
Test timeout: 30 seconds per test
Use vendor-agnostic fixtures with fake model IDs/names in tests (e.g., test-haiku-001, test-provider), never real vendor model IDs — tests must not be coupled to external providers

Files:

tests/unit/engine/test_context.py
tests/unit/engine/test_task_execution.py
tests/unit/engine/test_exports.py
tests/unit/engine/conftest.py
tests/unit/engine/test_errors.py

🧠 Learnings (18)

📓 Common learnings

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains

Applied to files:

src/ai_company/engine/errors.py

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Handle errors explicitly, never silently swallow them

Applied to files:

src/ai_company/engine/errors.py

📚 Learning: 2026-01-24T09:54:45.426Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

src/ai_company/engine/errors.py

📚 Learning: 2026-01-24T09:54:45.426Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

tests/unit/engine/test_context.py

📚 Learning: 2026-01-24T09:54:45.426Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use `StoryState` from `memory/story_state.py` for context management and balance context size vs. token limits when passing story context

Applied to files:

tests/unit/engine/test_context.py
DESIGN_SPEC.md
src/ai_company/engine/context.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.

Applied to files:

tests/unit/engine/test_exports.py

📚 Learning: 2026-01-24T16:33:29.354Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention

Applied to files:

tests/unit/engine/test_exports.py

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fixtures with fake model IDs/names in tests (e.g., `test-haiku-001`, `test-provider`), never real vendor model IDs — tests must not be coupled to external providers

Applied to files:

tests/unit/engine/conftest.py

📚 Learning: 2026-01-26T08:59:32.818Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

tests/unit/engine/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

tests/unit/engine/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

tests/unit/engine/conftest.py

📚 Learning: 2026-01-24T09:54:56.100Z

Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

tests/unit/engine/conftest.py

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model

Applied to files:

src/ai_company/engine/task_execution.py
DESIGN_SPEC.md
src/ai_company/engine/context.py

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO level

Applied to files:

src/ai_company/engine/task_execution.py

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Use immutability — create new objects, never mutate existing ones. For `dict`/`list` fields in frozen Pydantic models, use `MappingProxyType` wrapping at construction (not `deepcopy` on access). Deep-copy only at system boundaries (e.g., passing data to `tool.execute()`, serializing for persistence)

Applied to files:

DESIGN_SPEC.md

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `ConfigDict`). For new code: use `computed_field` for derived values instead of storing + validating redundant fields; use `NotBlankStr` (from `core.types`) for non-optional identifier/name fields instead of manual whitespace validators

Applied to files:

DESIGN_SPEC.md

📚 Learning: 2026-03-05T19:47:27.026Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Event names must always use constants from `ai_company.observability.events` (e.g., `PROVIDER_CALL_START`, `BUDGET_RECORD_ADDED`, `TOOL_INVOKE_START`); import directly: `from ai_company.observability.events import EVENT_CONSTANT`

Applied to files:

src/ai_company/observability/events.py

🪛 LanguageTool

DESIGN_SPEC.md

[style] ~549-~549: Consider using the typographical ellipsis character here instead.
Context: ...odel that tracks status transitions via model_copy(update=...), accumulates TokenUsage cost, and r...

(ELLIPSIS)

[typographical] ~1440-~1440: In American English, use a period after an abbreviation.
Context: ...d shallow (pydantic#7784). | | Config vs runtime split | Adopted (M3) | Frozen...

(MISSING_PERIOD_AFTER_ABBREVIATION)

[style] ~1440-~1440: Consider using the typographical ellipsis character here instead.
Context: ...3) | Frozen models for config/identity; model_copy(update=...) for runtime state transitions | `Task...

(ELLIPSIS)

[style] ~1440-~1440: Consider using the typographical ellipsis character here instead.
Context: .../) are frozen Pydantic models that use model_copy(update=...)` for O(1) state transitions, skipping ...

(ELLIPSIS)

🔇 Additional comments (16)

src/ai_company/engine/errors.py (1)

12-17: New execution error types are cleanly integrated.

Hierarchy and docstrings are consistent with existing engine errors.

src/ai_company/observability/events.py (1)

142-150: Execution event constants look consistent and well-scoped.

Naming and structure align with the existing observability catalog.

src/ai_company/engine/context.py (1)

239-247: Missing-task transition handling is solid.

The path logs structured context and raises a domain-specific error cleanly.

src/ai_company/engine/task_execution.py (1)

35-55: add_token_usage correctly preserves the token invariant.

Computing total_tokens from summed input/output is the right approach.
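
A minimal sketch of the invariant being praised, assuming a simplified usage record: `TokenUsageSketch` is a hypothetical stand-in for `TokenUsage`, and here `total_tokens` is derived by a property rather than stored, which is one way to guarantee it always equals input plus output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsageSketch:
    """Hypothetical stand-in for TokenUsage."""

    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        # Derived, so it cannot drift from the two component counts.
        return self.input_tokens + self.output_tokens


def add_token_usage(a: TokenUsageSketch, b: TokenUsageSketch) -> TokenUsageSketch:
    """Sum two usage records field by field and return a new record."""
    return TokenUsageSketch(
        input_tokens=a.input_tokens + b.input_tokens,
        output_tokens=a.output_tokens + b.output_tokens,
        cost_usd=a.cost_usd + b.cost_usd,
    )


ZERO = TokenUsageSketch()
first = TokenUsageSketch(input_tokens=100, output_tokens=20, cost_usd=0.25)
second = TokenUsageSketch(input_tokens=50, output_tokens=5, cost_usd=0.5)
total = add_token_usage(first, second)
```

Adding `ZERO` to any record returns an equal record, which is what makes it usable as the starting value for accumulation.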

tests/unit/engine/conftest.py (1)

123-150: New runtime fixtures are clean and reusable.

These fixtures nicely support isolated testing of context and task execution flows.

src/ai_company/engine/__init__.py (1)

7-46: Engine public exports are updated coherently.

__all__ aligns with the newly introduced runtime and error APIs.

DESIGN_SPEC.md (1)

168-183: M3 runtime-state documentation updates are clear and aligned with implementation.

Also applies to: 549-550, 1304-1309, 1440-1440

README.md (1)

26-26: Status update is concise and correctly reflects current milestone progress.

tests/unit/engine/test_exports.py (1)

1-14: LGTM!

Clean and effective test for verifying that all names in __all__ are actually importable. Good use of the @pytest.mark.unit marker and proper type hints.
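
The idea behind such an export test can be sketched as a small helper. `missing_exports` is a hypothetical name, and the standard-library `json` module is used only as a stand-in target so the example runs anywhere; in this PR the module under test would be `ai_company.engine`.

```python
import importlib


def missing_exports(module_name: str) -> list[str]:
    """Return names listed in a module's __all__ that it does not define."""
    module = importlib.import_module(module_name)
    return [name for name in module.__all__ if not hasattr(module, name)]


# An empty list means every advertised name resolves on the module.
problems = missing_exports("json")
```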

tests/unit/engine/test_context.py (3)

1-32: LGTM!

Good test file structure with proper imports, type hints, and helper functions. The helper functions _make_assistant_msg and _make_user_msg follow the underscore convention for test-internal utilities.

34-162: LGTM!

Comprehensive test coverage for AgentContext factory, conversation handling, and turn management. Good use of boundary testing in test_has_turns_remaining_boundary and proper verification of immutability through test_original_unchanged.

165-298: LGTM!

Excellent coverage of transitions, snapshots, immutability, and logging. The tests properly verify:

Valid and invalid state transitions with appropriate error types

Snapshot generation with and without task binding

Frozen model behavior preventing direct mutation

Observability event emission for all key operations

tests/unit/engine/test_task_execution.py (4)

1-57: LGTM!

Well-structured test file with proper imports and comprehensive coverage of StatusTransition. The tests correctly verify construction, default values, and immutability (frozen model behavior).

83-156: LGTM!

Excellent coverage of the state machine transitions including:

Valid and invalid transition paths

Transition log accumulation

Timestamp management (started_at on first IN_PROGRESS, preservation on rework, completed_at on terminal states)

Both COMPLETED and CANCELLED terminal states

Full lifecycle end-to-end test

159-286: LGTM!

Comprehensive testing of cost accumulation and the add_token_usage helper:

Single and multiple cost accumulations with turn count verification

Token usage summation with proper float comparison using pytest.approx

Invariant verification that total_tokens == input_tokens + output_tokens

Edge case with ZERO_TOKEN_USAGE constant

189-315: LGTM!

Good coverage of snapshot generation, immutability guarantees, and logging events. The tests properly verify that:

Snapshots reflect updated status while preserving original task fields

Frozen models prevent direct mutation

Original objects remain unchanged after operations (key for the immutable-via-copy pattern)

All key operations emit the correct observability events

Based on learnings: "use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state" - the immutability tests properly validate this pattern.

src/ai_company/engine/context.py

coderabbitai · 2026-03-05T20:42:01Z

src/ai_company/engine/task_execution.py

+        logger.debug(
+            EXECUTION_COST_RECORDED,
+            task_id=self.task.id,
+            turn=result.turn_count,
+            cost_usd=usage.cost_usd,
+        )


⚠️ Potential issue | 🟠 Major

Log with_cost state updates at INFO, not DEBUG.

with_cost changes execution state (accumulated_cost, turn_count), so this emission should be INFO for lifecycle observability consistency.

As per coding guidelines: "All state transitions must log at INFO level."

💡 Suggested fix

-        logger.debug(
+        logger.info(
             EXECUTION_COST_RECORDED,
             task_id=self.task.id,
             turn=result.turn_count,
             cost_usd=usage.cost_usd,
         )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        logger.debug(
-            EXECUTION_COST_RECORDED,
-            task_id=self.task.id,
-            turn=result.turn_count,
-            cost_usd=usage.cost_usd,
-        )
+        logger.info(
+            EXECUTION_COST_RECORDED,
+            task_id=self.task.id,
+            turn=result.turn_count,
+            cost_usd=usage.cost_usd,
+        )


tests/unit/engine/test_context.py

tests/unit/engine/test_errors.py

…t, Gemini, and Greptile

- Enforce max_turns in with_turn_completed, raising MaxTurnsExceededError
- Guard with_cost against terminal TaskExecution states
- Derive _TERMINAL_STATUSES from VALID_TRANSITIONS (no manual duplication)
- Add execution-context logging on validate_transition failures
- Move ZERO_TOKEN_USAGE and add_token_usage to providers/models.py
- Use NotBlankStr | None for AgentContextSnapshot.task_id
- Fix O(1) claim in DESIGN_SPEC.md (now "copy-on-write without re-validation")
- Sort __all__ alphabetically (ruff RUF022)
- Add tests: _validate_task_pair error path, max_turns=0 rejection, terminal cost guard, transition failure logging, MaxTurnsExceededError
- Clarify MaxTurnsExceededError and started_at docstrings
- Add isinstance check for PromptBuildError test consistency
- Use DEFAULT_MAX_TURNS constant in test instead of magic number
- Add 4 new event constants for failure/boundary scenarios

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
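
Deriving the terminal set from the transition map, as the commit describes, might look like the sketch below. The map contents are an assumption made for illustration (only the terminal role of `COMPLETED` and `CANCELLED` and a few of the edges are mentioned in this thread); the real `VALID_TRANSITIONS` may differ.

```python
from enum import Enum


class Status(Enum):
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Hypothetical transition map: each status maps to the statuses it may move to.
VALID_TRANSITIONS: dict[Status, frozenset[Status]] = {
    Status.ASSIGNED: frozenset({Status.IN_PROGRESS, Status.CANCELLED}),
    Status.IN_PROGRESS: frozenset({Status.IN_REVIEW, Status.CANCELLED}),
    Status.IN_REVIEW: frozenset(
        {Status.IN_PROGRESS, Status.COMPLETED, Status.CANCELLED}
    ),
    Status.COMPLETED: frozenset(),
    Status.CANCELLED: frozenset(),
}

# A status is terminal exactly when it has no outgoing transitions, so the
# set never has to be maintained by hand alongside the map.
TERMINAL_STATUSES = frozenset(
    status for status, targets in VALID_TRANSITIONS.items() if not targets
)
```

With this derivation, adding a new dead-end status to the map makes it terminal automatically.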

greptile-apps · 2026-03-06T05:41:44Z

src/ai_company/engine/context.py

+    def with_turn_completed(
+        self,
+        usage: TokenUsage,
+        response_msg: ChatMessage,
+    ) -> AgentContext:
+        """Record a completed turn.
+
+        Increments turn count, appends the response message, and
+        accumulates cost on both the context and the task execution
+        (if present).
+
+        Args:
+            usage: Token usage from this turn's LLM call.
+            response_msg: The assistant's response message.
+
+        Returns:
+            New ``AgentContext`` with updated state.
+
+        Raises:
+            MaxTurnsExceededError: If ``max_turns`` has been reached.
+        """
+        if not self.has_turns_remaining:
+            msg = (
+                f"Agent {self.identity.id} exceeded max_turns "
+                f"({self.max_turns}) for execution {self.execution_id}"
+            )
+            logger.error(
+                EXECUTION_MAX_TURNS_EXCEEDED,
+                execution_id=self.execution_id,
+                agent_id=str(self.identity.id),
+                max_turns=self.max_turns,
+                turn_count=self.turn_count,
+            )
+            raise MaxTurnsExceededError(msg)
+        updates: dict[str, object] = {
+            "turn_count": self.turn_count + 1,
+            "conversation": (*self.conversation, response_msg),
+            "accumulated_cost": add_token_usage(self.accumulated_cost, usage),
+        }
+        if self.task_execution is not None:
+            updates["task_execution"] = self.task_execution.with_cost(usage)
+
+        result = self.model_copy(update=updates)
+        logger.info(
+            EXECUTION_CONTEXT_TURN,
+            execution_id=self.execution_id,
+            turn=result.turn_count,
+            cost_usd=usage.cost_usd,
+        )
+        return result


Undocumented ExecutionStateError raise path

with_turn_completed can raise ExecutionStateError in addition to MaxTurnsExceededError, but this is not documented in the Raises: section.

When task_execution is non-None and in a terminal state (e.g., COMPLETED or CANCELLED), the delegation to self.task_execution.with_cost(usage) at line 224 will raise ExecutionStateError("Cannot record cost on terminal task execution …"). This is a realistic scenario: the engine could transition the task to COMPLETED via with_task_transition, then still call with_turn_completed for a wrap-up/summary turn. Any caller who only catches MaxTurnsExceededError (as the docstring implies is the only exceptional path) will be surprised by the ExecutionStateError.

The Raises: block should at minimum document this:

Suggested change

-    def with_turn_completed(
-        self,
-        usage: TokenUsage,
-        response_msg: ChatMessage,
-    ) -> AgentContext:
-        """Record a completed turn.
-
-        Increments turn count, appends the response message, and
-        accumulates cost on both the context and the task execution
-        (if present).
-
-        Args:
-            usage: Token usage from this turn's LLM call.
-            response_msg: The assistant's response message.
-
-        Returns:
-            New ``AgentContext`` with updated state.
-
-        Raises:
-            MaxTurnsExceededError: If ``max_turns`` has been reached.
-        """
-        if not self.has_turns_remaining:
-            msg = (
-                f"Agent {self.identity.id} exceeded max_turns "
-                f"({self.max_turns}) for execution {self.execution_id}"
-            )
-            logger.error(
-                EXECUTION_MAX_TURNS_EXCEEDED,
-                execution_id=self.execution_id,
-                agent_id=str(self.identity.id),
-                max_turns=self.max_turns,
-                turn_count=self.turn_count,
-            )
-            raise MaxTurnsExceededError(msg)
-        updates: dict[str, object] = {
-            "turn_count": self.turn_count + 1,
-            "conversation": (*self.conversation, response_msg),
-            "accumulated_cost": add_token_usage(self.accumulated_cost, usage),
-        }
-        if self.task_execution is not None:
-            updates["task_execution"] = self.task_execution.with_cost(usage)
-
-        result = self.model_copy(update=updates)
-        logger.info(
-            EXECUTION_CONTEXT_TURN,
-            execution_id=self.execution_id,
-            turn=result.turn_count,
-            cost_usd=usage.cost_usd,
-        )
-        return result
+    def with_turn_completed(
+        self,
+        usage: TokenUsage,
+        response_msg: ChatMessage,
+    ) -> AgentContext:
+        """Record a completed turn.
+
+        Increments turn count, appends the response message, and
+        accumulates cost on both the context and the task execution
+        (if present).
+
+        Args:
+            usage: Token usage from this turn's LLM call.
+            response_msg: The assistant's response message.
+
+        Returns:
+            New ``AgentContext`` with updated state.
+
+        Raises:
+            MaxTurnsExceededError: If ``max_turns`` has been reached.
+            ExecutionStateError: If a task execution is present and is
+                already in a terminal state.
+        """

Alternatively, guard the with_cost call so a terminal task execution simply skips cost accumulation (if that matches the intended semantics), but the docstring discrepancy should be resolved either way.
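
The two options discussed above can be compared side by side in a small sketch. Everything here is a simplified stand-in under stated assumptions: `TaskExecutionSketch` models only a terminal flag and a turn counter, `ExecutionStateError` is redefined locally, and `record_turn_lenient` is a hypothetical helper showing the "skip accumulation" alternative.

```python
from dataclasses import dataclass, replace


class ExecutionStateError(Exception):
    """Local stand-in for the PR's ExecutionStateError."""


@dataclass(frozen=True)
class TaskExecutionSketch:
    """Hypothetical, reduced model of a task execution."""

    is_terminal: bool = False
    turn_count: int = 0

    def with_cost(self) -> "TaskExecutionSketch":
        # Strict semantics: recording cost on a finished task is an error.
        if self.is_terminal:
            raise ExecutionStateError(
                "Cannot record cost on terminal task execution"
            )
        return replace(self, turn_count=self.turn_count + 1)


def record_turn_lenient(execution: TaskExecutionSketch) -> TaskExecutionSketch:
    """Alternative semantics: leave a terminal execution untouched."""
    if execution.is_terminal:
        return execution
    return execution.with_cost()


done = TaskExecutionSketch(is_terminal=True, turn_count=3)

try:
    done.with_cost()
    raised = False
except ExecutionStateError:
    raised = True

unchanged = record_turn_lenient(done)
```

A unit test for the uncovered path would assert the strict behaviour (the `raised` flag here), while the lenient variant would instead assert that the returned execution is unchanged.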


🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Aureliolo and others added 2 commits March 5, 2026 21:18

feat: implement agent runtime state vs immutable config split (#106)

8d01284

Pre-reviewed by 9 agents, 16 findings addressed Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 5, 2026 20:33

gemini-code-assist bot reviewed Mar 5, 2026

Copilot AI reviewed Mar 5, 2026

greptile-apps bot reviewed Mar 5, 2026

tests/unit/engine/test_context.py Outdated

src/ai_company/engine/context.py Outdated

coderabbitai bot reviewed Mar 5, 2026

Aureliolo merged commit 4cb1ca5 into main Mar 6, 2026
9 of 10 checks passed

Aureliolo deleted the feat/agent-runtime-state-split branch March 6, 2026 05:39

greptile-apps bot reviewed Mar 6, 2026

This was referenced Mar 6, 2026

refactor: split events.py into per-domain event modules #136

Merged

feat: implement crash recovery with fail-and-reassign strategy #149

Merged

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release ai-company 0.1.1 #282

Merged

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release 0.1.0 #283

Merged

This was referenced Mar 15, 2026

chore(main): release 0.2.4 #431

Merged

chore(main): release 0.2.0 #442

Closed

chore(main): release 0.2.5 #447

Merged

chore(main): release 0.2.0 #460

Closed

chore(main): release 0.2.0 #471

Closed

	default=ZERO_TOKEN_USAGE,
	default_factory=lambda: ZERO_TOKEN_USAGE.model_copy(deep=True),
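The two fragments above read as a before/after pair: a field default that points at the shared `ZERO_TOKEN_USAGE` constant versus a factory that hands each instance its own copy. A minimal stdlib sketch of the difference follows, using frozen dataclasses in place of the Pydantic models. The `TokenUsage` field names and the two holder classes are hypothetical, and `dataclasses.replace` stands in for `model_copy(deep=True)`.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class TokenUsage:
    # Hypothetical fields; the real model is not shown in this section.
    input_tokens: int = 0
    output_tokens: int = 0


ZERO_TOKEN_USAGE = TokenUsage()


@dataclass(frozen=True)
class SharedDefault:
    # Every instance references the same module-level object.
    usage: TokenUsage = ZERO_TOKEN_USAGE


@dataclass(frozen=True)
class CopiedDefault:
    # Every instance gets a fresh, equal copy of the constant.
    usage: TokenUsage = field(default_factory=lambda: replace(ZERO_TOKEN_USAGE))


shared = SharedDefault()
copied = CopiedDefault()
print(shared.usage is ZERO_TOKEN_USAGE)
print(copied.usage is ZERO_TOKEN_USAGE, copied.usage == ZERO_TOKEN_USAGE)
```

For a fully frozen value object the shared default is usually harmless, so the copy is a defensive choice rather than a correctness fix. The fragments do not say which variant was finally adopted.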



		class MaxTurnsExceededError(EngineError):
		"""Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""

	"""Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""
	"""Raised when an agent's ``turn_count`` reaches or exceeds its configured ``max_turns``."""

Conversation

Aureliolo commented Mar 5, 2026

Summary

Pre-PR Review

Test Plan

github-actions bot commented Mar 5, 2026

Dependency Review

Scanned Files

coderabbitai bot commented Mar 5, 2026

Review failed

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

❌ Failed checks (1 warning)

gemini-code-assist bot commented Mar 5, 2026

Summary of Changes

Highlights

Footnotes

gemini-code-assist bot left a comment

Code Review

gemini-code-assist bot Mar 5, 2026

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI Mar 5, 2026

Copilot AI Mar 5, 2026

Copilot AI Mar 5, 2026

Copilot AI Mar 5, 2026

greptile-apps bot commented Mar 5, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Sequence Diagram

coderabbitai bot left a comment

coderabbitai bot Mar 5, 2026

greptile-apps bot Mar 6, 2026
