feat: implement agent runtime state vs immutable config split#115
Add TaskExecution and AgentContext frozen Pydantic models that use model_copy(update=...) for O(1) state transitions without re-running validators. TaskExecution wraps Task with status transitions, cost accumulation, and audit trail. AgentContext wraps AgentIdentity with conversation history, turn tracking, and configurable max turns.
- Add StatusTransition audit record and TaskExecution runtime wrapper
- Add AgentContext and AgentContextSnapshot for execution tracking
- Add ExecutionStateError and MaxTurnsExceededError to engine errors
- Add 6 execution-domain event constants to observability
- Update DESIGN_SPEC.md sections 3.1, 6.1, 15.3, 15.5
- 53 new tests with 100% coverage on both modules
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 9 agents, 16 findings addressed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Scanned Files: None
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Adds an M3 runtime layer: immutable, audited TaskExecution wraps Task for status transitions and cost tracking; AgentContext wraps AgentIdentity + TaskExecution for conversation, turn management, and snapshots; new engine errors, observability event names, provider token-usage helpers, exports, and tests accompany the changes.
Sequence Diagram(s)
sequenceDiagram
participant User
participant AgentContext
participant TaskExecution
participant Task
participant Observability
User->>AgentContext: from_identity(identity, task)
AgentContext->>TaskExecution: from_task(task)
TaskExecution->>Task: read snapshot/status
TaskExecution->>Observability: emit EXECUTION_TASK_CREATED
AgentContext->>Observability: emit EXECUTION_CONTEXT_CREATED
User->>AgentContext: with_message(user_msg)
AgentContext->>AgentContext: append conversation (immutable copy)
User->>AgentContext: with_turn_completed(usage, response_msg)
AgentContext->>AgentContext: increment turn_count, append response
AgentContext->>TaskExecution: with_cost(usage)
TaskExecution->>TaskExecution: update accumulated_cost
TaskExecution->>Observability: emit EXECUTION_COST_RECORDED
AgentContext->>Observability: emit EXECUTION_CONTEXT_TURN
User->>AgentContext: with_task_transition(target_status)
AgentContext->>TaskExecution: with_transition(target_status)
TaskExecution->>TaskExecution: append StatusTransition, set timestamps
TaskExecution->>Observability: emit EXECUTION_TASK_TRANSITION
User->>AgentContext: to_snapshot()
AgentContext->>Observability: emit EXECUTION_CONTEXT_SNAPSHOT
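The copy-on-write flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the PR's code: the field names follow the review comments (`accumulated_cost`, `turn_count`, `transition_log`), while the simplified `TaskStatus` enum, the tuple-based log entries, and the omission of transition validation and logging are assumptions made to keep the example short.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class TaskStatus(str, Enum):
    """Simplified stand-in for the project's status enum (assumption)."""

    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"


class TokenUsage(BaseModel):
    """Hypothetical, reduced version of the provider token-usage model."""

    model_config = ConfigDict(frozen=True)

    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0


def add_token_usage(a: TokenUsage, b: TokenUsage) -> TokenUsage:
    """Sum two usages; total_tokens is recomputed from the summed parts."""
    input_tokens = a.input_tokens + b.input_tokens
    output_tokens = a.output_tokens + b.output_tokens
    return TokenUsage(
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=input_tokens + output_tokens,
        cost_usd=a.cost_usd + b.cost_usd,
    )


class TaskExecution(BaseModel):
    """Frozen runtime state; every change returns a new instance."""

    model_config = ConfigDict(frozen=True)

    status: TaskStatus = TaskStatus.ASSIGNED
    accumulated_cost: TokenUsage = TokenUsage()
    turn_count: int = 0
    transition_log: tuple[tuple[TaskStatus, TaskStatus], ...] = ()

    def with_cost(self, usage: TokenUsage) -> "TaskExecution":
        # model_copy(update=...) builds the copy without re-running validators.
        return self.model_copy(
            update={
                "accumulated_cost": add_token_usage(self.accumulated_cost, usage),
                "turn_count": self.turn_count + 1,
            }
        )

    def with_transition(self, target: TaskStatus) -> "TaskExecution":
        # The audit trail is an immutable tuple that grows by one entry.
        return self.model_copy(
            update={
                "status": target,
                "transition_log": self.transition_log + ((self.status, target),),
            }
        )


start = TaskExecution()
running = start.with_transition(TaskStatus.IN_PROGRESS)
after_turn = running.with_cost(
    TokenUsage(input_tokens=10, output_tokens=5, total_tokens=15, cost_usd=0.01)
)
```

Each call leaves the earlier object untouched, so `start`, `running`, and `after_turn` can all be inspected afterwards, which is what makes snapshots and audit logs cheap to reason about.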
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Summary of Changes (Gemini Code Assist)
This pull request significantly refactors the agent execution engine by introducing a clear separation between immutable configuration models and mutable runtime state. It establishes new frozen Pydantic models,
Code Review
This pull request introduces a well-designed separation between immutable agent configuration and mutable runtime state. The new TaskExecution and AgentContext models are implemented as frozen Pydantic models, correctly using model_copy(update=...) for state transitions, which is a robust pattern for immutability. The changes are supported by comprehensive unit tests covering functionality, edge cases, and logging, which significantly increases confidence in the new execution engine components. The documentation in DESIGN_SPEC.md and README.md has also been updated to reflect these architectural changes. I have one minor suggestion to improve code maintainability.
__all__ = [
    "DEFAULT_MAX_TURNS",
    "ZERO_TOKEN_USAGE",
    "AgentContext",
    "AgentContextSnapshot",
    "DefaultTokenEstimator",
    "EngineError",
    "ExecutionStateError",
    "MaxTurnsExceededError",
    "PromptBuildError",
    "PromptTokenEstimator",
    "StatusTransition",
    "SystemPrompt",
    "TaskExecution",
    "add_token_usage",
    "build_system_prompt",
]
For better maintainability and readability, it's a good practice to keep the __all__ list sorted alphabetically. This makes it easier to find exports as the list grows. The previous __all__ list in this file was sorted, so this change aligns with the existing project convention.
- __all__ = [
-     "DEFAULT_MAX_TURNS",
-     "ZERO_TOKEN_USAGE",
-     "AgentContext",
-     "AgentContextSnapshot",
-     "DefaultTokenEstimator",
-     "EngineError",
-     "ExecutionStateError",
-     "MaxTurnsExceededError",
-     "PromptBuildError",
-     "PromptTokenEstimator",
-     "StatusTransition",
-     "SystemPrompt",
-     "TaskExecution",
-     "add_token_usage",
-     "build_system_prompt",
- ]
+ __all__ = [
+     "AgentContext",
+     "AgentContextSnapshot",
+     "DEFAULT_MAX_TURNS",
+     "DefaultTokenEstimator",
+     "EngineError",
+     "ExecutionStateError",
+     "MaxTurnsExceededError",
+     "PromptBuildError",
+     "PromptTokenEstimator",
+     "StatusTransition",
+     "SystemPrompt",
+     "TaskExecution",
+     "ZERO_TOKEN_USAGE",
+     "add_token_usage",
+     "build_system_prompt",
+ ]
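For reference, the two orderings in this suggestion can be reproduced with plain `sorted`. The sketch below is illustrative only: `convention_group` is a hypothetical key that groups constants, then classes, then functions, which is an assumption about the convention the existing list appears to follow and is not confirmed anywhere in the PR.

```python
# Names exactly as they appear in the PR's current __all__ list.
names = [
    "DEFAULT_MAX_TURNS",
    "ZERO_TOKEN_USAGE",
    "AgentContext",
    "AgentContextSnapshot",
    "DefaultTokenEstimator",
    "EngineError",
    "ExecutionStateError",
    "MaxTurnsExceededError",
    "PromptBuildError",
    "PromptTokenEstimator",
    "StatusTransition",
    "SystemPrompt",
    "TaskExecution",
    "add_token_usage",
    "build_system_prompt",
]


def convention_group(name: str) -> int:
    """Hypothetical grouping: constants first, then classes, then functions."""
    if name.isupper():
        return 0  # SCREAMING_SNAKE_CASE constants
    if name[0].isupper():
        return 1  # CamelCase classes
    return 2  # lowercase functions


# Plain sorted() compares code points, so uppercase letters sort before lowercase.
ascii_sorted = sorted(names)

# Grouped ordering: sort by group first, then by name within each group.
grouped_sorted = sorted(names, key=lambda n: (convention_group(n), n))
```

Under these assumptions `ascii_sorted` matches the suggested list and `grouped_sorted` matches the list already in the PR, so both versions are sorted, just by different keys.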
Pull request overview
Implements a clear split between immutable agent/task configuration and runtime execution state by introducing frozen runtime models (TaskExecution, AgentContext) with copy-on-write transitions, plus supporting errors, events, exports, and tests.
Changes:
- Added frozen runtime state models for task execution and agent context, including snapshots, transitions, logging, and token usage accumulation.
- Introduced new engine errors/events and re-exported the public runtime API from ai_company.engine.
- Added unit tests and updated docs/status to reflect the adopted runtime/config split.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/engine/test_task_execution.py | Adds unit coverage for TaskExecution, transitions, cost accumulation, immutability, and logging events. |
| tests/unit/engine/test_context.py | Adds unit coverage for AgentContext/AgentContextSnapshot, turn tracking, transitions, invariants, and logging events. |
| tests/unit/engine/test_errors.py | Validates engine error inheritance for newly introduced error types. |
| tests/unit/engine/test_exports.py | Ensures engine.__all__ re-exports remain importable. |
| tests/unit/engine/conftest.py | Provides shared fixtures for token usage, task execution, and agent context. |
| src/ai_company/observability/events.py | Introduces execution lifecycle event constants for structured logging. |
| src/ai_company/engine/task_execution.py | Implements TaskExecution, StatusTransition, and add_token_usage helper with lifecycle logging. |
| src/ai_company/engine/context.py | Implements AgentContext, AgentContextSnapshot with invariant validation, cost/turn tracking, and snapshotting. |
| src/ai_company/engine/errors.py | Adds ExecutionStateError and MaxTurnsExceededError to the engine error hierarchy. |
| src/ai_company/engine/__init__.py | Re-exports the runtime state API, errors, and utilities via __all__. |
| README.md | Updates milestone status to reflect M3 in progress. |
| DESIGN_SPEC.md | Updates runtime state documentation and marks the config/runtime split as adopted. |
    accumulated_cost: TokenUsage = Field(
        default=ZERO_TOKEN_USAGE,
        description="Running cost totals",
    )
Using a TokenUsage model instance as a field default can create a shared mutable default across TaskExecution instances if TokenUsage itself is mutable (not frozen). Prefer default_factory to generate a fresh zero-usage value per instance (e.g., default_factory=lambda: ZERO_TOKEN_USAGE.model_copy() or a small zero_token_usage() factory) while still keeping ZERO_TOKEN_USAGE as an exported constant.
        description="Accumulated conversation messages",
    )
    accumulated_cost: TokenUsage = Field(
        default=ZERO_TOKEN_USAGE,
Same shared-default issue as in TaskExecution: ZERO_TOKEN_USAGE is a TokenUsage instance, so if it’s mutable this default can be shared across all AgentContext instances. Switch this field to a default_factory that returns a new zero-usage object to avoid cross-instance contamination through nested mutation.
-         default=ZERO_TOKEN_USAGE,
+         default_factory=lambda: ZERO_TOKEN_USAGE.model_copy(deep=True),
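A small sketch of the `default_factory` pattern from this comment, assuming a frozen `TokenUsage`. The reduced `TokenUsage` fields, the `ExecutionSketch` model, and the `try_mutate` helper are hypothetical stand-ins and not the project's definitions.

```python
from pydantic import BaseModel, ConfigDict, Field, ValidationError


class TokenUsage(BaseModel):
    """Hypothetical reduced token-usage model, assumed frozen."""

    model_config = ConfigDict(frozen=True)

    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0


ZERO_TOKEN_USAGE = TokenUsage()


class ExecutionSketch(BaseModel):
    """Stand-in for a runtime model carrying an accumulated_cost field."""

    model_config = ConfigDict(frozen=True)

    # The factory runs once per instance, so each one gets its own copy.
    accumulated_cost: TokenUsage = Field(
        default_factory=lambda: ZERO_TOKEN_USAGE.model_copy(),
        description="Running cost totals",
    )


def try_mutate(usage: TokenUsage) -> bool:
    """Return True if in-place assignment succeeds, False if it is rejected."""
    try:
        usage.input_tokens = 99
    except ValidationError:
        return False
    return True


first = ExecutionSketch()
second = ExecutionSketch()
```

With the factory, `first.accumulated_cost` and `second.accumulated_cost` are distinct objects that still compare equal to `ZERO_TOKEN_USAGE`. If `TokenUsage` is frozen, as assumed here, attribute assignment is rejected either way, so the factory mainly guards against the case where the nested model is not frozen.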
src/ai_company/engine/errors.py
Outdated
class MaxTurnsExceededError(EngineError):
    """Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""
This docstring states the error is raised by AgentContext, but there’s no raising logic in AgentContext.with_turn_completed() (it only increments and exposes has_turns_remaining). Either implement the raise where the boundary is crossed (likely in with_turn_completed) or adjust the docstring to reflect where the error is actually intended to be raised (e.g., by the execution loop/engine using has_turns_remaining).
-     """Raised when ``turn_count`` reaches ``max_turns`` in ``AgentContext``."""
+     """Raised when an agent's ``turn_count`` reaches or exceeds its configured ``max_turns``."""
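One way to make the docstring true is to enforce the limit at the state boundary. The sketch below shows that option under stated assumptions: `ContextSketch` is a hypothetical, reduced model, the default of 20 turns is taken from the test comment later in this review, and the real `with_turn_completed` also takes usage and a response message, which are omitted here.

```python
from pydantic import BaseModel, ConfigDict


class EngineError(Exception):
    """Stand-in for the engine's base error (assumption)."""


class MaxTurnsExceededError(EngineError):
    """Raised when a turn is attempted with no turns remaining."""


class ContextSketch(BaseModel):
    """Hypothetical reduced AgentContext: only the turn-tracking fields."""

    model_config = ConfigDict(frozen=True)

    turn_count: int = 0
    max_turns: int = 20

    @property
    def has_turns_remaining(self) -> bool:
        return self.turn_count < self.max_turns

    def with_turn_completed(self) -> "ContextSketch":
        # Check before copying so an over-limit state is never created.
        if not self.has_turns_remaining:
            msg = f"turn_count={self.turn_count} reached max_turns={self.max_turns}"
            raise MaxTurnsExceededError(msg)
        return self.model_copy(update={"turn_count": self.turn_count + 1})


ctx = ContextSketch(max_turns=2)
exhausted = ctx.with_turn_completed().with_turn_completed()
```

Calling `exhausted.with_turn_completed()` raises `MaxTurnsExceededError` in this sketch. The alternative raised in the comment, leaving the check to the execution loop via `has_turns_remaining`, would keep the model a pure data holder and move the raise site elsewhere.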
DESIGN_SPEC.md
Outdated
  |------------|--------|----------|-----------|
  | **Immutability strategy** | Adopted | `MappingProxyType` at construction for dict fields in registries and collections; `frozen=True` on all config/identity models | MappingProxyType is O(1) and prevents accidental mutation. Pydantic `frozen=True` is confirmed shallow (pydantic#7784). |
- | **Config vs runtime split** | Planned (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | Frozen models cannot represent evolving state without serialize/validate round-trips. Separate models keep config immutable while state is explicit. Currently only config layer exists (`AgentIdentity`). |
+ | **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for O(1) state transitions, skipping all validators. Config layer (`AgentIdentity`, `Task`) remains unchanged. |
The claim that transitions are “O(1)” is inaccurate for the implemented code paths (e.g., transition_log uses tuple concatenation which grows with history). Also “skipping all validators” is a subtle behavior that depends on Pydantic model_copy semantics and may mislead readers about what constraints are/aren’t enforced. Consider rewording to something like “copy-on-write transitions without re-validation” and avoid the O(1) assertion (or qualify it precisely).
- | **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for O(1) state transitions, skipping all validators. Config layer (`AgentIdentity`, `Task`) remains unchanged. |
+ | **Config vs runtime split** | Adopted (M3) | Frozen models for config/identity; `model_copy(update=...)` for runtime state transitions | `TaskExecution` and `AgentContext` (in `engine/`) are frozen Pydantic models that use `model_copy(update=...)` for copy-on-write style state transitions without re-running validators (per Pydantic `model_copy` semantics). Config layer (`AgentIdentity`, `Task`) remains unchanged. |
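Both points in this comment can be checked with a short sketch. The `Counter` model and the `validated_update` helper are hypothetical and exist only for illustration; the behaviour shown, that `model_copy(update=...)` does not validate the updated values, is Pydantic v2's documented semantics.

```python
from pydantic import BaseModel, ConfigDict, Field


class Counter(BaseModel):
    """Hypothetical frozen model with a constraint and a growing log."""

    model_config = ConfigDict(frozen=True)

    turn_count: int = Field(default=0, ge=0)
    transition_log: tuple[str, ...] = ()


def validated_update(model: Counter, **changes: object) -> Counter:
    """Copy with re-validation: dump, merge the changes, validate again."""
    return type(model).model_validate({**model.model_dump(), **changes})


base = Counter()

# No validation on update: the ge=0 constraint is not enforced here.
unchecked = base.model_copy(update={"turn_count": -1})

# Tuple concatenation allocates a new tuple and copies every existing entry,
# so appending to the log costs time proportional to its current length.
grown = base.model_copy(
    update={"transition_log": base.transition_log + ("ASSIGNED->IN_PROGRESS",)}
)
```

In this sketch `validated_update(base, turn_count=-1)` raises a `ValidationError`, while the `model_copy` path silently accepts the same value. Callers relying on `model_copy` therefore need to enforce invariants themselves before building the update.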
Greptile Summary
This PR implements a clean runtime state / immutable config split for the agent execution engine: a frozen
Key changes:
Issues found:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant AgentContext
participant TaskExecution
participant StatusTransition
Caller->>AgentContext: from_identity(identity, task)
AgentContext->>TaskExecution: from_task(task)
AgentContext-->>Caller: AgentContext (turn_count=0)
Caller->>AgentContext: with_task_transition(IN_PROGRESS)
AgentContext->>TaskExecution: with_transition(IN_PROGRESS)
TaskExecution->>StatusTransition: create(ASSIGNED→IN_PROGRESS)
TaskExecution-->>AgentContext: new TaskExecution (started_at set)
AgentContext-->>Caller: new AgentContext
Caller->>AgentContext: with_turn_completed(usage, msg)
AgentContext->>AgentContext: check has_turns_remaining
AgentContext->>TaskExecution: with_cost(usage)
TaskExecution-->>AgentContext: new TaskExecution (turn_count+1)
AgentContext-->>Caller: new AgentContext (turn_count+1)
Caller->>AgentContext: with_task_transition(COMPLETED)
AgentContext->>TaskExecution: with_transition(COMPLETED)
TaskExecution->>StatusTransition: create(IN_REVIEW→COMPLETED)
TaskExecution-->>AgentContext: new TaskExecution (completed_at set, is_terminal=True)
AgentContext-->>Caller: new AgentContext
Caller->>AgentContext: to_snapshot()
AgentContext-->>Caller: AgentContextSnapshot
Last reviewed commit: 608a234
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/context.py`:
- Line 17: with_turn_completed currently doesn't enforce the state's max_turns
limit; update the with_turn_completed context manager (and the analogous block
around lines 181-214) to check state.turns (or state.turn_count) against
state.max_turns before allowing a new turn to complete and raise
MaxTurnsExceededError (or wrap it in ExecutionStateError if the codebase expects
that) when the hard limit would be exceeded; locate the function/method named
with_turn_completed and add a pre-commit check that prevents
incrementing/committing the turn if state.max_turns is not None and state.turns
>= state.max_turns, returning/raising the appropriate MaxTurnsExceededError so
the limit is enforced at the state boundary.
In `@src/ai_company/engine/task_execution.py`:
- Around line 208-213: The log emission that records cost/state changes uses
logger.debug for EXECUTION_COST_RECORDED but must be INFO because with_cost
updates execution state (accumulated_cost, turn_count); change the call from
logger.debug(...) to logger.info(...) in the task execution flow (the site where
logger.debug is invoked with EXECUTION_COST_RECORDED, task_id=self.task.id,
turn=result.turn_count, cost_usd=usage.cost_usd) so the lifecycle transition is
logged at INFO level.
In `@tests/unit/engine/test_context.py`:
- Around line 55-61: Update the test_defaults test to use the exported
DEFAULT_MAX_TURNS constant instead of the hardcoded 20: locate the test function
test_defaults and replace the assertion against the literal 20 with an assertion
that ctx.max_turns == DEFAULT_MAX_TURNS (import DEFAULT_MAX_TURNS at top of the
test module if not already present), so the test relies on the AgentContext
default via AgentContext.from_identity and remains correct if the default
changes.
In `@tests/unit/engine/test_errors.py`:
- Around line 28-29: Add an instance check to the
test_prompt_build_error_is_engine_error so it asserts both
issubclass(PromptBuildError, EngineError) and isinstance(PromptBuildError(),
EngineError); locate the test function named
test_prompt_build_error_is_engine_error and create a PromptBuildError instance
to verify it is an EngineError for consistency with
test_execution_state_error_is_engine_error and
test_max_turns_exceeded_error_is_engine_error.
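The last finding amounts to a two-line test. A minimal sketch follows, with stand-in error classes defined locally so it runs on its own; in the repository these would be imported from the engine's errors module, and the constructor message is an arbitrary example.

```python
class EngineError(Exception):
    """Stand-in for the engine's base error (assumption)."""


class PromptBuildError(EngineError):
    """Stand-in for the prompt-building error (assumption)."""


def test_prompt_build_error_is_engine_error() -> None:
    # Class-level check: the hierarchy is declared correctly.
    assert issubclass(PromptBuildError, EngineError)
    # Instance-level check: a constructed error is caught as EngineError.
    assert isinstance(PromptBuildError("bad template"), EngineError)
```

The instance check is redundant with `issubclass` for ordinary classes, so its value is mostly consistency with the neighbouring tests plus confirmation that the error can be constructed.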
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 50985699-4edf-49b0-bcba-c4cb78bf1cb6
📒 Files selected for processing (12)
DESIGN_SPEC.md
README.md
src/ai_company/engine/__init__.py
src/ai_company/engine/context.py
src/ai_company/engine/errors.py
src/ai_company/engine/task_execution.py
src/ai_company/observability/events.py
tests/unit/engine/conftest.py
tests/unit/engine/test_context.py
tests/unit/engine/test_errors.py
tests/unit/engine/test_exports.py
tests/unit/engine/test_task_execution.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: All public functions must have type hints; mypy strict mode is enforced
Docstrings must use Google style format; required on all public classes and functions (enforced by ruff D rules)
Do not use `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax (no parentheses) for multiple exception handling; PEP 758 enforcement by ruff on Python 3.14
Line length must not exceed 88 characters (enforced by ruff)
Functions must be fewer than 50 lines
Files must be fewer than 800 lines
Files:
src/ai_company/engine/errors.py
tests/unit/engine/test_context.py
tests/unit/engine/test_task_execution.py
tests/unit/engine/test_exports.py
tests/unit/engine/conftest.py
src/ai_company/engine/__init__.py
src/ai_company/engine/task_execution.py
src/ai_company/engine/context.py
src/ai_company/observability/events.py
tests/unit/engine/test_errors.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic must import and use the logger: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`, `logging.getLogger()`, or `print()` in application code; use the structured logger from `ai_company.observability`
Logger variable name must always be `logger` (not `_logger`, not `log`)
Event names must always use constants from `ai_company.observability.events` (e.g., `PROVIDER_CALL_START`, `BUDGET_RECORD_ADDED`, `TOOL_INVOKE_START`); import directly: `from ai_company.observability.events import EVENT_CONSTANT`
Structured logging must use `logger.info(EVENT, key=value)` format, never `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR level with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
Pure data models, enums, and re-exports do NOT require logging
Use immutability: create new objects, never mutate existing ones. For `dict`/`list` fields in frozen Pydantic models, use `MappingProxyType` wrapping at construction (not `deepcopy` on access). Deep-copy only at system boundaries (e.g., passing data to `tool.execute()`, serializing for persistence)
Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (`BaseModel`, `model_validator`, `ConfigDict`). For new code: use `@computed_field` for derived values instead of storing + validating redundant fields; use `NotBlankStr` (from `core.types`) for non-optional identifier/name fields instead of manual whitespace validators
For async concurrency in new code, prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent cal...
Files:
src/ai_company/engine/errors.py
src/ai_company/engine/__init__.py
src/ai_company/engine/task_execution.py
src/ai_company/engine/context.py
src/ai_company/observability/events.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers must be: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Coverage must be 80% minimum (enforced in CI)
Async test mode: use `asyncio_mode = "auto"`; no manual `@pytest.mark.asyncio` needed
Test timeout: 30 seconds per test
Use vendor-agnostic fixtures with fake model IDs/names in tests (e.g., `test-haiku-001`, `test-provider`), never real vendor model IDs; tests must not be coupled to external providers
Files:
tests/unit/engine/test_context.py
tests/unit/engine/test_task_execution.py
tests/unit/engine/test_exports.py
tests/unit/engine/conftest.py
tests/unit/engine/test_errors.py
🧠 Learnings (18)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains
Applied to files:
src/ai_company/engine/errors.py
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Handle errors explicitly, never silently swallow them
Applied to files:
src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
src/ai_company/engine/errors.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/engine/test_context.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use `StoryState` from `memory/story_state.py` for context management and balance context size vs. token limits when passing story context
Applied to files:
tests/unit/engine/test_context.py
DESIGN_SPEC.md
src/ai_company/engine/context.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.
Applied to files:
tests/unit/engine/test_exports.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest in `tests/unit/` with `test_*.py` naming convention
Applied to files:
tests/unit/engine/test_exports.py
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fixtures with fake model IDs/names in tests (e.g., `test-haiku-001`, `test-provider`), never real vendor model IDs — tests must not be coupled to external providers
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/engine/conftest.py
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model
Applied to files:
src/ai_company/engine/task_execution.py
DESIGN_SPEC.md
src/ai_company/engine/context.py
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : All state transitions must log at INFO level
Applied to files:
src/ai_company/engine/task_execution.py
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Use immutability — create new objects, never mutate existing ones. For `dict`/`list` fields in frozen Pydantic models, use `MappingProxyType` wrapping at construction (not `deepcopy` on access). Deep-copy only at system boundaries (e.g., passing data to `tool.execute()`, serializing for persistence)
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `ConfigDict`). For new code: use `computed_field` for derived values instead of storing + validating redundant fields; use `NotBlankStr` (from `core.types`) for non-optional identifier/name fields instead of manual whitespace validators
Applied to files:
DESIGN_SPEC.md
📚 Learning: 2026-03-05T19:47:27.026Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T19:47:27.026Z
Learning: Applies to src/ai_company/**/*.py : Event names must always use constants from `ai_company.observability.events` (e.g., `PROVIDER_CALL_START`, `BUDGET_RECORD_ADDED`, `TOOL_INVOKE_START`); import directly: `from ai_company.observability.events import EVENT_CONSTANT`
Applied to files:
src/ai_company/observability/events.py
🪛 LanguageTool
DESIGN_SPEC.md
[style] ~549-~549: Consider using the typographical ellipsis character here instead.
Context: ...odel that tracks status transitions via model_copy(update=...), accumulates TokenUsage cost, and r...
(ELLIPSIS)
[typographical] ~1440-~1440: In American English, use a period after an abbreviation.
Context: ...d shallow (pydantic#7784). | | Config vs runtime split | Adopted (M3) | Frozen...
(MISSING_PERIOD_AFTER_ABBREVIATION)
[style] ~1440-~1440: Consider using the typographical ellipsis character here instead.
Context: ...3) | Frozen models for config/identity; model_copy(update=...) for runtime state transitions | `Task...
(ELLIPSIS)
[style] ~1440-~1440: Consider using the typographical ellipsis character here instead.
Context: .../) are frozen Pydantic models that use model_copy(update=...)` for O(1) state transitions, skipping ...
(ELLIPSIS)
🔇 Additional comments (16)
src/ai_company/engine/errors.py (1)
12-17: New execution error types are cleanly integrated. Hierarchy and docstrings are consistent with existing engine errors.
src/ai_company/observability/events.py (1)
142-150: Execution event constants look consistent and well-scoped. Naming and structure align with the existing observability catalog.
src/ai_company/engine/context.py (1)
239-247: Missing-task transition handling is solid. The path logs structured context and raises a domain-specific error cleanly.
src/ai_company/engine/task_execution.py (1)
35-55: `add_token_usage` correctly preserves the token invariant. Computing `total_tokens` from summed input/output is the right approach.
tests/unit/engine/conftest.py (1)
123-150: New runtime fixtures are clean and reusable. These fixtures nicely support isolated testing of context and task execution flows.
src/ai_company/engine/__init__.py (1)
7-46: Engine public exports are updated coherently. `__all__` aligns with the newly introduced runtime and error APIs.
DESIGN_SPEC.md (1)
168-183: M3 runtime-state documentation updates are clear and aligned with implementation. Also applies to: 549-550, 1304-1309, 1440-1440
README.md (1)
26-26: Status update is concise and correctly reflects current milestone progress.
tests/unit/engine/test_exports.py (1)
1-14: LGTM! Clean and effective test for verifying that all names in `__all__` are actually importable. Good use of the `@pytest.mark.unit` marker and proper type hints.
tests/unit/engine/test_context.py (3)
1-32: LGTM! Good test file structure with proper imports, type hints, and helper functions. The helper functions `_make_assistant_msg` and `_make_user_msg` follow the underscore convention for test-internal utilities.
34-162: LGTM! Comprehensive test coverage for `AgentContext` factory, conversation handling, and turn management. Good use of boundary testing in `test_has_turns_remaining_boundary` and proper verification of immutability through `test_original_unchanged`.
165-298: LGTM! Excellent coverage of transitions, snapshots, immutability, and logging. The tests properly verify:
- Valid and invalid state transitions with appropriate error types
- Snapshot generation with and without task binding
- Frozen model behavior preventing direct mutation
- Observability event emission for all key operations
tests/unit/engine/test_task_execution.py (4)
1-57: LGTM! Well-structured test file with proper imports and comprehensive coverage of `StatusTransition`. The tests correctly verify construction, default values, and immutability (frozen model behavior).
83-156: LGTM! Excellent coverage of the state machine transitions including:
- Valid and invalid transition paths
- Transition log accumulation
- Timestamp management (`started_at` on first IN_PROGRESS, preservation on rework, `completed_at` on terminal states)
- Both COMPLETED and CANCELLED terminal states
- Full lifecycle end-to-end test
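The timestamp rules in that list can be sketched with a frozen dataclass standing in for the Pydantic model. This is an illustration only: the `Status` members, the `VALID_TRANSITIONS` table, and the `Execution.with_transition` name are assumptions made for the example and are not taken from the project's code.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):  # hypothetical subset of task statuses
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


# Assumed transition table, for illustration only.
VALID_TRANSITIONS: dict[Status, frozenset[Status]] = {
    Status.ASSIGNED: frozenset({Status.IN_PROGRESS, Status.CANCELLED}),
    Status.IN_PROGRESS: frozenset({Status.IN_REVIEW, Status.CANCELLED}),
    Status.IN_REVIEW: frozenset({Status.IN_PROGRESS, Status.COMPLETED}),
    Status.COMPLETED: frozenset(),
    Status.CANCELLED: frozenset(),
}


@dataclass(frozen=True)
class Execution:
    status: Status = Status.ASSIGNED
    started_at: datetime | None = None
    completed_at: datetime | None = None
    transition_log: tuple[tuple[Status, Status], ...] = ()

    def with_transition(self, target: Status) -> Execution:
        """Return a new Execution in `target`, or raise on an invalid move."""
        if target not in VALID_TRANSITIONS[self.status]:
            raise ValueError(f"invalid transition {self.status.name} -> {target.name}")
        now = datetime.now(timezone.utc)
        started = self.started_at
        if target is Status.IN_PROGRESS and started is None:
            started = now  # set once; a rework loop keeps the first value
        # A status with no outgoing transitions is treated as terminal.
        completed = now if not VALID_TRANSITIONS[target] else None
        return replace(
            self,
            status=target,
            started_at=started,
            completed_at=completed,
            transition_log=(*self.transition_log, (self.status, target)),
        )
```

Going through IN_PROGRESS, IN_REVIEW, and back to IN_PROGRESS leaves `started_at` at its first value, which is the "preservation on rework" behaviour the tests are described as checking.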
159-286: LGTM! Comprehensive testing of cost accumulation and the `add_token_usage` helper:
- Single and multiple cost accumulations with turn count verification
- Token usage summation with proper float comparison using `pytest.approx`
- Invariant verification that `total_tokens == input_tokens + output_tokens`
- Edge case with `ZERO_TOKEN_USAGE` constant
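A minimal sketch of the invariant being tested, assuming a simplified `TokenUsage` with only the three fields named in the review. The real model is a Pydantic class and may store `total_tokens` differently; here it is a derived property so it cannot drift from its parts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:  # simplified stand-in; field set is an assumption
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0

    @property
    def total_tokens(self) -> int:
        # Derived from the parts, so the invariant holds by construction.
        return self.input_tokens + self.output_tokens


ZERO_TOKEN_USAGE = TokenUsage()


def add_token_usage(a: TokenUsage, b: TokenUsage) -> TokenUsage:
    """Return a new TokenUsage with each component summed."""
    return TokenUsage(
        input_tokens=a.input_tokens + b.input_tokens,
        output_tokens=a.output_tokens + b.output_tokens,
        cost_usd=a.cost_usd + b.cost_usd,
    )
```

Adding `ZERO_TOKEN_USAGE` to any usage returns an equal value, which makes it a convenient starting point for accumulation.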
189-315: LGTM! Good coverage of snapshot generation, immutability guarantees, and logging events. The tests properly verify that:
- Snapshots reflect updated status while preserving original task fields
- Frozen models prevent direct mutation
- Original objects remain unchanged after operations (key for the immutable-via-copy pattern)
- All key operations emit the correct observability events
Based on learnings: "use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state" - the immutability tests properly validate this pattern.
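The pattern can be illustrated without Pydantic by using frozen dataclasses, with `dataclasses.replace` playing the role of `model_copy(update=...)`. This is a stdlib analogue, not the project's implementation: the field names are hypothetical, and unlike `model_copy`, `replace` re-runs `__init__`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class AgentIdentity:  # immutable config/identity (hypothetical fields)
    id: str
    name: str


@dataclass(frozen=True)
class AgentContext:  # runtime state, evolved only by copying
    identity: AgentIdentity
    turn_count: int = 0
    conversation: tuple[str, ...] = ()

    def with_message(self, msg: str) -> AgentContext:
        # Build a new instance; the receiver is never modified.
        return replace(
            self,
            turn_count=self.turn_count + 1,
            conversation=(*self.conversation, msg),
        )


ctx = AgentContext(identity=AgentIdentity(id="a1", name="dev"))
nxt = ctx.with_message("hello")

try:
    ctx.turn_count = 5  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    mutated = False  # direct mutation is rejected
```

After this runs, `ctx` still has zero turns while `nxt` has one, and both share the same `identity` object, which is the property the "original unchanged" tests rely on.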
        logger.debug(
            EXECUTION_COST_RECORDED,
            task_id=self.task.id,
            turn=result.turn_count,
            cost_usd=usage.cost_usd,
        )
Log with_cost state updates at INFO, not DEBUG.
with_cost changes execution state (accumulated_cost, turn_count), so this emission should be INFO for lifecycle observability consistency.
As per coding guidelines: "All state transitions must log at INFO level."
💡 Suggested fix
-        logger.debug(
+        logger.info(
             EXECUTION_COST_RECORDED,
             task_id=self.task.id,
             turn=result.turn_count,
             cost_usd=usage.cost_usd,
         )
📝 Committable suggestion
        logger.info(
            EXECUTION_COST_RECORDED,
            task_id=self.task.id,
            turn=result.turn_count,
            cost_usd=usage.cost_usd,
        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/task_execution.py` around lines 208 - 213, The log
emission that records cost/state changes uses logger.debug for
EXECUTION_COST_RECORDED but must be INFO because with_cost updates execution
state (accumulated_cost, turn_count); change the call from logger.debug(...) to
logger.info(...) in the task execution flow (the site where logger.debug is
invoked with EXECUTION_COST_RECORDED, task_id=self.task.id,
turn=result.turn_count, cost_usd=usage.cost_usd) so the lifecycle transition is
logged at INFO level.
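Why the level matters can be shown with the standard `logging` module: under an INFO threshold, a DEBUG emission is silently dropped. This sketch uses stdlib logging rather than the project's structured logger, and the event-name string and logger name are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted records so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


EXECUTION_COST_RECORDED = "execution.cost.recorded"  # hypothetical event name

logger = logging.getLogger("example.task_execution")
logger.setLevel(logging.INFO)  # assumed production threshold
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.debug(EXECUTION_COST_RECORDED)  # below the threshold: dropped
logger.info(EXECUTION_COST_RECORDED)   # at the threshold: recorded

seen = [(r.levelname, r.getMessage()) for r in handler.records]
```

Only the INFO record reaches the handler, so a state change logged at DEBUG would be invisible in any deployment that runs at INFO.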
…t, Gemini, and Greptile
- Enforce max_turns in with_turn_completed, raising MaxTurnsExceededError
- Guard with_cost against terminal TaskExecution states
- Derive _TERMINAL_STATUSES from VALID_TRANSITIONS (no manual duplication)
- Add execution-context logging on validate_transition failures
- Move ZERO_TOKEN_USAGE and add_token_usage to providers/models.py
- Use NotBlankStr | None for AgentContextSnapshot.task_id
- Fix O(1) claim in DESIGN_SPEC.md (now "copy-on-write without re-validation")
- Sort __all__ alphabetically (ruff RUF022)
- Add tests: _validate_task_pair error path, max_turns=0 rejection, terminal cost guard, transition failure logging, MaxTurnsExceededError
- Clarify MaxTurnsExceededError and started_at docstrings
- Add isinstance check for PromptBuildError test consistency
- Use DEFAULT_MAX_TURNS constant in test instead of magic number
- Add 4 new event constants for failure/boundary scenarios
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
    def with_turn_completed(
        self,
        usage: TokenUsage,
        response_msg: ChatMessage,
    ) -> AgentContext:
        """Record a completed turn.

        Increments turn count, appends the response message, and
        accumulates cost on both the context and the task execution
        (if present).

        Args:
            usage: Token usage from this turn's LLM call.
            response_msg: The assistant's response message.

        Returns:
            New ``AgentContext`` with updated state.

        Raises:
            MaxTurnsExceededError: If ``max_turns`` has been reached.
        """
        if not self.has_turns_remaining:
            msg = (
                f"Agent {self.identity.id} exceeded max_turns "
                f"({self.max_turns}) for execution {self.execution_id}"
            )
            logger.error(
                EXECUTION_MAX_TURNS_EXCEEDED,
                execution_id=self.execution_id,
                agent_id=str(self.identity.id),
                max_turns=self.max_turns,
                turn_count=self.turn_count,
            )
            raise MaxTurnsExceededError(msg)
        updates: dict[str, object] = {
            "turn_count": self.turn_count + 1,
            "conversation": (*self.conversation, response_msg),
            "accumulated_cost": add_token_usage(self.accumulated_cost, usage),
        }
        if self.task_execution is not None:
            updates["task_execution"] = self.task_execution.with_cost(usage)

        result = self.model_copy(update=updates)
        logger.info(
            EXECUTION_CONTEXT_TURN,
            execution_id=self.execution_id,
            turn=result.turn_count,
            cost_usd=usage.cost_usd,
        )
        return result
Undocumented ExecutionStateError raise path
with_turn_completed can raise ExecutionStateError in addition to MaxTurnsExceededError, but this is not documented in the Raises: section.
When task_execution is non-None and in a terminal state (e.g., COMPLETED or CANCELLED), the delegation to self.task_execution.with_cost(usage) at line 224 will raise ExecutionStateError("Cannot record cost on terminal task execution …"). This is a realistic scenario: the engine could transition the task to COMPLETED via with_task_transition, then still call with_turn_completed for a wrap-up/summary turn. Any caller who only catches MaxTurnsExceededError (as the docstring implies is the only exceptional path) will be surprised by the ExecutionStateError.
The Raises: block should at minimum document this:
    def with_turn_completed(
        self,
        usage: TokenUsage,
        response_msg: ChatMessage,
    ) -> AgentContext:
        """Record a completed turn.

        Increments turn count, appends the response message, and
        accumulates cost on both the context and the task execution
        (if present).

        Args:
            usage: Token usage from this turn's LLM call.
            response_msg: The assistant's response message.

        Returns:
            New ``AgentContext`` with updated state.

        Raises:
            MaxTurnsExceededError: If ``max_turns`` has been reached.
            ExecutionStateError: If a task execution is present and is
                already in a terminal state.
        """
Alternatively, guard the with_cost call so a terminal task execution simply skips cost accumulation (if that matches the intended semantics), but the docstring discrepancy should be resolved either way.
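The guard alternative could look roughly like the following. It is a sketch under stated assumptions: the classes are simplified frozen-dataclass stand-ins, cost is a plain float instead of `TokenUsage`, and `is_terminal` is a hypothetical flag replacing the real status check. Whether silently skipping cost on a terminal task is the desired semantics is a decision for the maintainers.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


class ExecutionStateError(Exception):
    """Stand-in for the engine error of the same name."""


@dataclass(frozen=True)
class TaskExecution:  # simplified stand-in
    is_terminal: bool = False  # hypothetical flag replacing the status check
    cost_usd: float = 0.0

    def with_cost(self, cost_usd: float) -> TaskExecution:
        if self.is_terminal:
            raise ExecutionStateError("Cannot record cost on terminal task execution")
        return replace(self, cost_usd=self.cost_usd + cost_usd)


@dataclass(frozen=True)
class AgentContext:  # simplified stand-in
    turn_count: int = 0
    cost_usd: float = 0.0
    task_execution: TaskExecution | None = None

    def with_turn_completed(self, cost_usd: float) -> AgentContext:
        task = self.task_execution
        # Guard: only delegate when the task can still accept cost, so a
        # wrap-up turn after completion does not raise.
        if task is not None and not task.is_terminal:
            task = task.with_cost(cost_usd)
        return replace(
            self,
            turn_count=self.turn_count + 1,
            cost_usd=self.cost_usd + cost_usd,
            task_execution=task,
        )
```

With this variant the context-level cost still grows on a wrap-up turn while the terminal task's total stays frozen, so the two totals can diverge; documenting the exception instead keeps them in lockstep.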
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) 
([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement 
meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
- `TaskExecution`: frozen Pydantic model wrapping `Task` with evolving execution state (status transitions via `model_copy(update=...)`), cost accumulation (`TokenUsage`), turn counting, and `StatusTransition` audit trail
- `AgentContext`: frozen runtime context wrapping `AgentIdentity` + optional `TaskExecution` with conversation history, accumulated cost, turn limits, and snapshot generation
- `AgentContextSnapshot`: compact frozen DTO for reporting/logging with `task_id`/`task_status` pair invariant validation
- `ExecutionStateError` and `MaxTurnsExceededError` under `EngineError` hierarchy
- `add_token_usage()`, `ZERO_TOKEN_USAGE`, `DEFAULT_MAX_TURNS` exported from `engine` package
- `errors.py`, `prompt_template.py`), §15.5 "Config vs runtime split" marked Adopted

Pre-PR Review
Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency). 16 findings addressed:
- `@pytest.mark.unit` markers to all test classes (was invisible to `-m unit` runs)
- `logger.error()` before raising `ExecutionStateError` (CLAUDE.md logging rule)
- `MaxTurnsExceededError`, error hierarchy, `__all__` re-exports, zero-value token usage
- `NotBlankStr` for `execution_id`/`agent_id` identifier fields
- `model_validator` for `task_id`/`task_status` pair invariant on `AgentContextSnapshot`
- `_ZERO_USAGE`/`_add_token_usage` to public (`ZERO_TOKEN_USAGE`/`add_token_usage`)
- `DEFAULT_MAX_TURNS` constant to avoid magic number duplication
- `_add_token_usage` docstring accuracy (Returns section)
- `MaxTurnsExceededError` docstring with field references
- `errors.py`, `prompt_template.py`)
- `test_errors.py`/`test_exports.py`

Test Plan
- `uv run ruff check src/ tests/`: clean
- `uv run ruff format src/ tests/`: no changes
- `uv run mypy src/ tests/`: no issues (199 files)
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80`: 1770 passed, 95.25% coverage

closes #106