
feat: implement single-task execution lifecycle (#21) #144

Merged
Aureliolo merged 3 commits into main from feat/single-task-lifecycle
Mar 6, 2026
Conversation

@Aureliolo (Owner)

Summary

  • AgentEngine single-task lifecycle: run() orchestrates identity validation, system prompt construction, context preparation, execution loop delegation (with optional wall-clock timeout via asyncio.wait_for), cost recording, and post-execution task transitions (ASSIGNED → IN_PROGRESS → IN_REVIEW → COMPLETED)
  • AgentRunResult: Frozen Pydantic model wrapping ExecutionResult with engine metadata and computed fields (termination_reason, total_turns, total_cost_usd, is_success, completion_summary)
  • TaskCompletionMetrics: Frozen Pydantic model for proxy overhead metrics (turns_per_task, tokens_per_task, cost_per_task, duration_seconds) with from_run_result() factory, logged at task completion
  • Execution events: Added EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT event constants (12 total engine events)
  • DESIGN_SPEC.md updates: §6.5 run() signature, pipeline steps, constants count, computed fields; §10.5 metric sources and TaskCompletionMetrics model; §15.3 project structure

Pre-PR Review Fixes

Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency). 18 findings addressed:

  • Guard except TimeoutError to re-raise non-wall-clock timeouts (silent-failure-hunter)
  • Wrap post-execution transitions in try/except to protect successful results from bookkeeping failures (silent-failure-hunter)
  • Move duration snapshot after cost recording + transitions for accurate wall-clock measurement (python-reviewer)
  • Change raise exc from None to raise exc from build_exc to preserve both exceptions in traceback chain (python-reviewer)
  • Remove redundant execution_result param from _log_completion (python-reviewer)
  • Update docstrings for timeout_seconds and _apply_post_execution_transitions (comment-analyzer)
  • Add TODO(M4) marker for auto-complete scaffolding (type-design-analyzer)
  • Split test_agent_engine.py (1042→733 lines) into two files under 800 (code-reviewer)
  • 8 DESIGN_SPEC.md drift fixes (docs-consistency)

Test Plan

  • 1997 tests pass (0 failures)
  • 95.40% coverage (≥80% threshold)
  • ruff lint clean
  • mypy strict clean
  • Pre-commit hooks pass
  • CI pipeline (lint + type-check + test)

Closes #21

Aureliolo and others added 2 commits March 6, 2026 23:18
- Add post-execution task transitions: COMPLETED runs auto-complete
  via two-hop IN_PROGRESS → IN_REVIEW → COMPLETED; non-completion
  reasons (MAX_TURNS, BUDGET_EXHAUSTED, ERROR) leave task IN_PROGRESS
- Add TaskCompletionMetrics model (engine/metrics.py) for proxy
  overhead metrics per DESIGN_SPEC §10.5: turns_per_task,
  tokens_per_task, cost_per_task, duration_seconds
- Add completion_summary computed field on AgentRunResult (last
  assistant message content)
- Add wall-clock timeout via asyncio.wait_for() with timeout_seconds
  parameter on AgentEngine.run()
- Add EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT
  event constants
- Log TaskCompletionMetrics at INFO on every run completion
- Update existing tests for COMPLETED final status
- Add comprehensive tests: post-execution transitions (7 tests),
  timeout (3 tests), metrics (9 tests), completion_summary (5 tests),
  full lifecycle integration test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Guard `except TimeoutError` to re-raise non-wall-clock timeouts
- Wrap post-execution transitions in try/except to protect successful
  results from bookkeeping failures
- Move duration snapshot after cost recording + transitions for
  accurate wall-clock measurement
- Remove redundant `execution_result` param from `_log_completion`
- Change `raise exc from None` to `raise exc from build_exc` to
  preserve both exceptions in traceback chain
- Update `timeout_seconds` and `_apply_post_execution_transitions`
  docstrings for accuracy
- Add TODO(M4) marker for auto-complete scaffolding
- Split test_agent_engine.py (1042 lines) into two files under 800
- Update DESIGN_SPEC.md: §6.5 run() signature, pipeline steps,
  constants count, computed fields; §10.5 metric sources and
  TaskCompletionMetrics model; §15.3 add metrics.py

Pre-reviewed by 9 agents, 18 findings addressed

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 22:34

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None


coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b83684b6-6012-4b12-93e9-2c44d3202a9e

📥 Commits

Reviewing files that changed from the base of the PR and between 24647bd and 73b7769.

📒 Files selected for processing (7)
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/run_result.py
  • tests/unit/engine/conftest.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Optional wall‑clock timeout for agent runs; timeouts mark runs as errored while preserving results and post‑run steps.
    • Automatic post‑execution state progression for successful runs (IN_PROGRESS → IN_REVIEW → COMPLETED).
    • Agent run results include a new completion_summary and produced artifacts.
    • New task‑level metrics model and emission (turns, tokens, cost, duration) and two observability event additions.
  • Tests

    • Expanded unit and integration tests covering lifecycle, timeout, metrics, and completion summary.

Walkthrough

Adds task-level timeout and post-execution state transitions, exposes per-task completion metrics and completion_summary on run results, introduces TaskCompletionMetrics and timeout/metrics observability events, and scaffolds a ReAct loop and recovery strategy while updating tests to assert the new lifecycle and metrics.

Changes

  • Engine public surface (src/ai_company/engine/__init__.py): Exports the new TaskCompletionMetrics from the engine package.
  • Agent orchestration (src/ai_company/engine/agent_engine.py): Adds optional timeout_seconds to AgentEngine.run, implements _run_loop_with_timeout, input and assignment validations, and _apply_post_execution_transitions (IN_PROGRESS → IN_REVIEW → COMPLETED on success), and emits timeout/metrics events.
  • Run result model (src/ai_company/engine/run_result.py): Adds a produced_artifacts field and a computed completion_summary property on AgentRunResult.
  • Metrics model (src/ai_company/engine/metrics.py): New public TaskCompletionMetrics Pydantic model and from_run_result() factory mapping run result fields into task-level metrics.
  • Observability events (src/ai_company/observability/events/execution.py): Adds EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT event identifiers.
  • Loop & recovery scaffolding (src/ai_company/engine/react_loop.py, (new recovery strategy files) ...): Adds a ReAct loop scaffold and RecoveryStrategy concepts (Fail-and-Reassign MVP) as new modules in the engine package.
  • Tests, integration (tests/integration/engine/test_agent_engine_integration.py): Updates integration expectations to assert the full ASSIGNED→IN_PROGRESS→IN_REVIEW→COMPLETED lifecycle and verifies completion_summary and task metrics.
  • Tests, unit engine (tests/unit/engine/test_agent_engine.py, tests/unit/engine/test_agent_engine_lifecycle.py): Updates lifecycle tests to reflect auto-complete transitions; adds comprehensive lifecycle/timeout/metrics unit tests.
  • Tests, metrics & run_result (tests/unit/engine/test_metrics.py, tests/unit/engine/test_run_result.py): Adds tests for TaskCompletionMetrics construction/from_run_result and for completion_summary behavior.
  • Test fixtures (tests/unit/engine/conftest.py): Updates a fixture signature so sample tasks use the sample agent's id for assigned_to.
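The TaskCompletionMetrics shape described above can be sketched as follows, using a frozen dataclass as a stand-in for the PR's frozen Pydantic model; the run-result key names are assumptions based on the descriptions in this thread:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class TaskCompletionMetrics:
    # Frozen dataclass standing in for the PR's frozen Pydantic model.
    turns_per_task: int
    tokens_per_task: int
    cost_per_task: float
    duration_seconds: float

    @classmethod
    def from_run_result(cls, result: dict) -> "TaskCompletionMetrics":
        # `result` stands in for AgentRunResult; key names are assumptions.
        return cls(
            turns_per_task=result["total_turns"],
            tokens_per_task=result["total_tokens"],
            cost_per_task=result["total_cost_usd"],
            duration_seconds=result["duration_seconds"],
        )

metrics = TaskCompletionMetrics.from_run_result(
    {"total_turns": 3, "total_tokens": 1200,
     "total_cost_usd": 0.04, "duration_seconds": 8.5}
)
```

Freezing the model makes the emitted metrics immutable after construction, which keeps logged values trustworthy.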

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant AgentEngine
    participant ReActLoop as ReAct Loop
    participant Observability
    participant TaskStore as Task State Store

    Client->>AgentEngine: run(identity, task, timeout_seconds?)
    AgentEngine->>TaskStore: validate assignment & set IN_PROGRESS
    AgentEngine->>ReActLoop: start execution loop (async)
    alt timeout provided
        ReActLoop-->>AgentEngine: running...
        AgentEngine->>ReActLoop: await with timeout (asyncio.wait_for)
        Note right of AgentEngine: on timeout -> cancel loop
        AgentEngine->>Observability: emit EXECUTION_ENGINE_TIMEOUT
        AgentEngine->>AgentEngine: build ERROR ExecutionResult
    else completes or errors
        ReActLoop-->>AgentEngine: ExecutionResult (COMPLETED/ERROR/OTHER)
    end
    AgentEngine->>AgentEngine: _apply_post_execution_transitions(result)
    AgentEngine->>TaskStore: transition IN_PROGRESS→IN_REVIEW→COMPLETED (if COMPLETED)
    AgentEngine->>Observability: emit EXECUTION_ENGINE_TASK_METRICS (TaskCompletionMetrics.from_run_result)
    AgentEngine-->>Client: return AgentRunResult (includes completion_summary, metrics)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes


🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 41.67%, which is below the required 100.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check ✅: The title 'feat: implement single-task execution lifecycle (#21)' accurately and concisely summarizes the main objective of the PR.
  • Description check ✅: The PR description comprehensively relates to the changeset, covering the AgentEngine lifecycle, new models (AgentRunResult, TaskCompletionMetrics), execution events, design spec updates, and test results.
  • Linked Issues check ✅: The PR implements all acceptance criteria from issue #21: it enforces task status transitions, supplies full context to agents, executes tasks with LLM+tools, includes completion summaries, handles timeouts, tracks proxy overhead metrics (turns, tokens, cost, duration), and provides comprehensive unit and integration tests.
  • Out of Scope Changes check ✅: All changes are within scope of issue #21: the AgentEngine.run() implementation, AgentRunResult and TaskCompletionMetrics models, timeout handling, post-execution transitions, metrics logging, event constants, design spec updates, and comprehensive test coverage. No extraneous features or unrelated modifications detected.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the AgentEngine by implementing a comprehensive single-task execution lifecycle. It introduces the ability to set wall-clock timeouts for agent runs, automates task status transitions from assignment through completion, and provides a new mechanism for logging detailed performance metrics. These changes aim to improve the robustness, observability, and control over agent operations, ensuring that task executions are managed efficiently and their outcomes are thoroughly reported.

Highlights

  • AgentEngine Single-Task Lifecycle: The run() method in AgentEngine now orchestrates identity validation, system prompt construction, context preparation, execution loop delegation (with optional wall-clock timeout), cost recording, and post-execution task transitions (ASSIGNED → IN_PROGRESS → IN_REVIEW → COMPLETED).
  • AgentRunResult Enhancements: The AgentRunResult Pydantic model was updated to wrap ExecutionResult with additional engine metadata and new computed fields, including termination_reason, total_turns, total_cost_usd, is_success, and completion_summary.
  • TaskCompletionMetrics Model: A new frozen Pydantic model, TaskCompletionMetrics, was introduced to capture proxy overhead metrics such as turns_per_task, tokens_per_task, cost_per_task, and duration_seconds. It includes a from_run_result() factory method and is logged upon task completion.
  • New Execution Events: Two new event constants, EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT, were added to the execution.engine.* namespace, bringing the total to 12 engine events.
  • DESIGN_SPEC.md Updates: The design specification document was updated to reflect changes in the run() signature, pipeline steps, event constant count, new computed fields, metric sources, and the project structure.
Changelog
  • DESIGN_SPEC.md
    • Updated the run() method signature to include timeout_seconds.
    • Revised pipeline steps to incorporate asyncio.wait_for for timeouts and a new 'Apply post-execution transitions' step.
    • Increased the reported count of execution engine event constants from 10 to 12.
    • Added completion_summary to the list of computed fields for AgentRunResult.
    • Detailed metric sources and the TaskCompletionMetrics model in the design specification.
    • Updated the project structure section to include the new metrics.py file.
  • src/ai_company/engine/__init__.py
    • Imported TaskCompletionMetrics into the engine's __init__.py.
    • Added TaskCompletionMetrics to the module's __all__ export list.
  • src/ai_company/engine/agent_engine.py
    • Imported the asyncio module for asynchronous operations.
    • Imported TaskCompletionMetrics for logging completion metrics.
    • Imported new event constants EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT.
    • Modified the run method signature to accept an optional timeout_seconds parameter.
    • Implemented validation for the timeout_seconds parameter, ensuring it is positive.
    • Wrapped the execution loop call with asyncio.wait_for to enforce wall-clock timeouts.
    • Introduced _apply_post_execution_transitions to manage task status changes (IN_PROGRESS → IN_REVIEW → COMPLETED).
    • Adjusted the _log_completion method to remove the execution_result parameter and log TaskCompletionMetrics.
    • Changed exception re-raising from raise exc from None to raise exc from build_exc in _handle_fatal_error for better traceback preservation.
  • src/ai_company/engine/metrics.py
    • Added a new file metrics.py to define the TaskCompletionMetrics Pydantic model.
    • Implemented the from_run_result class method to construct TaskCompletionMetrics from an AgentRunResult.
  • src/ai_company/engine/run_result.py
    • Imported MessageRole for conversation message handling.
    • Added a completion_summary computed field to AgentRunResult to extract the content of the last assistant message.
  • src/ai_company/observability/events/execution.py
    • Added EXECUTION_ENGINE_TASK_METRICS and EXECUTION_ENGINE_TIMEOUT constants to the execution events.
  • tests/integration/engine/test_agent_engine_integration.py
    • Updated assertions in test_full_tool_call_loop to reflect the new COMPLETED task status.
    • Added TestAgentEngineFullLifecycle with test_full_lifecycle_assigned_to_completed to verify the complete task lifecycle, including transitions, summary, and metrics.
  • tests/unit/engine/test_agent_engine.py
    • Updated assertions in test_assigned_transitions_to_in_progress and test_in_progress_accepted to expect COMPLETED task status.
    • Modified mock context preparation in test_completion_config_forwarded and test_custom_loop_used to correctly set IN_PROGRESS status before execution.
  • tests/unit/engine/test_agent_engine_lifecycle.py
    • Added a new file test_agent_engine_lifecycle.py for comprehensive unit tests.
    • Included tests for post-execution task transitions, verifying correct status changes based on termination reasons.
    • Added tests for wall-clock timeout functionality, ensuring proper error handling and result generation.
    • Implemented tests for completion metrics, confirming accurate logging of TaskCompletionMetrics.
  • tests/unit/engine/test_metrics.py
    • Added a new file test_metrics.py to provide unit tests for the TaskCompletionMetrics model.
    • Included tests for TaskCompletionMetrics construction, validation, and immutability.
    • Added tests for the from_run_result factory method, verifying correct data extraction from AgentRunResult.
  • tests/unit/engine/test_run_result.py
    • Imported ChatMessage and ToolCall for testing message-related functionalities.
    • Added a helper function _make_result_with_messages to create AgentRunResult instances with specific conversation messages.
    • Introduced TestCompletionSummary with unit tests for the completion_summary computed field, covering various message scenarios.
Activity
  • The pull request was pre-reviewed by 9 automated agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency).
  • 18 findings identified during pre-review were addressed, including guarding TimeoutError re-raises, wrapping post-execution transitions, moving duration snapshots, correcting exception chaining, removing redundant parameters, updating docstrings, adding TODO markers, splitting test files, and fixing DESIGN_SPEC.md drift.
  • The test plan indicates that 1997 tests pass with 0 failures, achieving 95.40% coverage (above the ≥80% threshold).
  • Ruff linting and MyPy strict checks are clean, and pre-commit hooks pass, indicating high code quality standards.

@greptile-apps

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR implements the full single-task execution lifecycle for AgentEngine: the run() method orchestrates identity/task validation, system prompt construction, execution loop delegation (with optional asyncio.wait-based wall-clock timeout), cost recording, post-execution status transitions (ASSIGNED→IN_PROGRESS→IN_REVIEW→COMPLETED), and proxy overhead metrics logging. Two new frozen Pydantic models are introduced — AgentRunResult gains a completion_summary computed field, and TaskCompletionMetrics captures per-run overhead indicators.

Key findings:

  • Spec/code drift on timeout mechanism (DESIGN_SPEC.md:928): The spec documents asyncio.wait_for for timeout control, but the implementation deliberately uses asyncio.wait (a better choice to avoid conflating internal TimeoutError with the engine's wall-clock deadline). The spec must be updated to match.
  • Partial state loss on timeout (agent_engine.py:316–320): When the wall-clock deadline fires, the method returns an ExecutionResult constructed from the pre-execution context, with an empty turns tuple. Any turns, tokens, or costs accumulated during partial execution before the timeout are irrecoverably lost, and subsequent cost recording does nothing. This behavior should either be preserved (with explicit documentation) or fixed to retain partial execution snapshots.
  • produced_artifacts field is never populated (run_result.py:51–54): The field defaults to an empty tuple and is never assigned a non-empty value by the engine. Either document it as a TODO for future scaffolding or defer its addition until extraction logic exists.
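The asyncio.wait-based deadline pattern Greptile describes can be sketched as follows; the names and string results are illustrative stand-ins for the engine's loop task and ExecutionResult. Because asyncio.wait never raises on timeout, a TimeoutError raised inside the loop cannot be mistaken for the wall-clock deadline:

```python
import asyncio

async def agent_loop() -> str:
    await asyncio.sleep(10)  # stand-in for a long-running agent loop
    return "completed"

async def run_with_deadline(timeout_seconds: float) -> str:
    loop_task = asyncio.create_task(agent_loop())
    # asyncio.wait returns (done, pending) instead of raising on timeout.
    done, _pending = await asyncio.wait({loop_task}, timeout=timeout_seconds)
    if loop_task in done:
        return loop_task.result()
    loop_task.cancel()  # deadline fired: cancel the still-running loop
    return "timed out"
```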

Confidence Score: 3/5

  • The timeout implementation silently discards partial execution state (turns and costs), which could lead to missing billing data for incomplete runs. The spec/code drift on the timeout mechanism needs correction. These are meaningful correctness and documentation issues.
  • The PR implements a well-structured execution lifecycle with clean separation of concerns, comprehensive test coverage, and good metrics. However, the timeout path has a silent correctness issue: when the wall-clock deadline fires, ExecutionResult is constructed from the pre-execution context with zero turns, discarding any partial state accumulated before the timeout. This means partial costs are never recorded. Additionally, the DESIGN_SPEC documents the wrong timeout mechanism (asyncio.wait_for instead of asyncio.wait). The produced_artifacts field is scaffolding that needs documentation. These are non-critical but meaningful issues that should be addressed before merge.
  • src/ai_company/engine/agent_engine.py (timeout partial-state loss), DESIGN_SPEC.md (asyncio.wait vs wait_for), src/ai_company/engine/run_result.py (produced_artifacts documentation).

Sequence Diagram

sequenceDiagram
    participant Caller
    participant AgentEngine
    participant _run_loop_with_timeout
    participant ExecutionLoop
    participant _record_costs
    participant _apply_post_execution_transitions
    participant _log_completion

    Caller->>AgentEngine: run(identity, task, timeout_seconds?)
    AgentEngine->>AgentEngine: _validate_run_inputs()
    AgentEngine->>AgentEngine: _validate_agent()
    AgentEngine->>AgentEngine: _validate_task()
    AgentEngine->>AgentEngine: _prepare_context() → ASSIGNED→IN_PROGRESS
    AgentEngine->>_run_loop_with_timeout: await (ctx, timeout_seconds)

    alt timeout_seconds is None
        _run_loop_with_timeout->>ExecutionLoop: await execute()
        ExecutionLoop-->>_run_loop_with_timeout: ExecutionResult (with turns)
    else timeout_seconds set
        _run_loop_with_timeout->>ExecutionLoop: asyncio.create_task(execute())
        _run_loop_with_timeout->>_run_loop_with_timeout: asyncio.wait({loop_task}, timeout)
        alt loop finishes in time
            ExecutionLoop-->>_run_loop_with_timeout: ExecutionResult (with turns)
        else wall-clock timeout fires
            _run_loop_with_timeout->>ExecutionLoop: loop_task.cancel()
            Note over _run_loop_with_timeout: Returns ExecutionResult(ctx=pre-exec ctx)<br/>⚠️ partial turns/cost are lost
        end
    end

    _run_loop_with_timeout-->>AgentEngine: ExecutionResult
    AgentEngine->>_record_costs: await (execution_result)
    _record_costs-->>AgentEngine: (costs recorded per turn, or nothing if empty)
    AgentEngine->>_apply_post_execution_transitions: (execution_result)
    alt TerminationReason.COMPLETED
        _apply_post_execution_transitions->>_apply_post_execution_transitions: IN_PROGRESS→IN_REVIEW→COMPLETED
    else other reason
        _apply_post_execution_transitions->>_apply_post_execution_transitions: no-op
    end
    _apply_post_execution_transitions-->>AgentEngine: ExecutionResult (updated ctx)
    AgentEngine->>AgentEngine: build AgentRunResult(duration_seconds)
    AgentEngine->>_log_completion: (result, duration)
    _log_completion->>_log_completion: TaskCompletionMetrics.from_run_result()
    _log_completion-->>AgentEngine: (metrics logged)
    AgentEngine-->>Caller: AgentRunResult

Last reviewed commit: 73b7769

start=start,
timeout_seconds=timeout_seconds,
)
except MemoryError, RecursionError:

Python 2 except comma syntax — not valid Python 3

except MemoryError, RecursionError: is Python 2 syntax and is a SyntaxError in Python 3. Python 2 parsed this as "catch MemoryError, bind it to the name RecursionError" — it does NOT catch both exception types. The Python 3 form to catch multiple exceptions requires parentheses. This same pattern appears on lines 584 and 669 as well.

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Copilot AI left a comment

Pull request overview

Implements the full single-task execution lifecycle in AgentEngine.run() (validation → prompt/context setup → execution loop with optional wall-clock timeout → cost recording → post-execution transitions), and adds per-run result summarization and proxy overhead metrics to support issue #21 and DESIGN_SPEC alignment.

Changes:

  • Add timeout_seconds support (via asyncio.wait_for) and post-execution task transitions (IN_PROGRESS → IN_REVIEW → COMPLETED on success) in AgentEngine.
  • Introduce TaskCompletionMetrics model with from_run_result() factory; log new execution events for timeout + task metrics.
  • Extend AgentRunResult with completion_summary and update/add unit + integration tests reflecting the new lifecycle behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file:

  • tests/unit/engine/test_run_result.py: Adds unit coverage for AgentRunResult.completion_summary behavior (assistant-only; skips tool-call-only/empty content).
  • tests/unit/engine/test_metrics.py: Adds unit tests for TaskCompletionMetrics construction, validation, and extraction from AgentRunResult.
  • tests/unit/engine/test_agent_engine_lifecycle.py: Adds unit tests for post-execution transitions, timeout behavior, and metrics computability.
  • tests/unit/engine/test_agent_engine.py: Updates existing unit tests to reflect auto-completion transitions and context expectations.
  • tests/integration/engine/test_agent_engine_integration.py: Updates an existing integration assertion and adds a full lifecycle integration test (ASSIGNED → COMPLETED) plus metrics/summary checks.
  • src/ai_company/observability/events/execution.py: Adds new engine event constants for task metrics and timeout.
  • src/ai_company/engine/run_result.py: Adds computed completion_summary derived from the last assistant message with non-empty content.
  • src/ai_company/engine/metrics.py: Introduces the TaskCompletionMetrics frozen model and from_run_result() factory.
  • src/ai_company/engine/agent_engine.py: Implements timeout support and post-execution transitions, logs task metrics, and updates completion logging.
  • src/ai_company/engine/__init__.py: Re-exports TaskCompletionMetrics from the engine package.
  • DESIGN_SPEC.md: Updates spec sections to reflect the new run() signature, pipeline steps, computed fields, metrics model, and repo structure.


)
else:
execution_result = await coro
except TimeoutError:
Copilot AI commented Mar 6, 2026

asyncio.wait_for() raises asyncio.TimeoutError, but this handler catches the broad built-in TimeoutError. If the underlying loop/provider/tooling raises its own TimeoutError while timeout_seconds is set, it will be misclassified as a wall-clock timeout and the original error context will be lost. Prefer catching asyncio.TimeoutError (or catching around only the wait_for call) so non-wall-clock timeouts propagate into the normal fatal-error path/logging.

Suggested change
except TimeoutError:
except asyncio.TimeoutError:

Comment on lines +497 to +501
metrics = TaskCompletionMetrics.from_run_result(result)
logger.info(
EXECUTION_ENGINE_TASK_METRICS,
agent_id=agent_id,
task_id=task_id,
Copilot AI commented Mar 6, 2026

EXECUTION_ENGINE_TASK_METRICS is logged unconditionally, even when the run ends with TerminationReason.ERROR / MAX_TURNS / BUDGET_EXHAUSTED. Given the naming (TaskCompletionMetrics) and spec wording (“logged at task completion”), this will emit misleading “completion” metrics for incomplete tasks. Consider either (a) only logging this event when result.is_success/termination_reason == COMPLETED, or (b) include termination_reason in the metrics event (and update naming/docs) to make it clear these are per-run metrics.
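Option (a) from this comment could look roughly like the following sketch; the event name, dict shapes, and list-based event sink are hypothetical:

```python
def emit_task_metrics(termination_reason: str, metrics: dict, events: list) -> None:
    # Option (a): only emit "completion" metrics for runs that actually
    # completed; ERROR / MAX_TURNS / BUDGET_EXHAUSTED runs are skipped.
    if termination_reason == "COMPLETED":
        events.append(("execution.engine.task_metrics", metrics))

events: list = []
emit_task_metrics("ERROR", {"turns": 2}, events)      # suppressed
emit_task_metrics("COMPLETED", {"turns": 3}, events)  # emitted
```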

@gemini-code-assist bot left a comment

Code Review

This pull request implements the single-task execution lifecycle within the AgentEngine, introducing features like wall-clock timeouts, post-execution task status transitions, and completion metrics. However, a high-severity security issue was identified: the run method lacks an ownership check to ensure that the agent executing the task is the one assigned to it, which could allow unauthorized agents to execute and transition tasks. It is recommended to add a check to verify that task.assigned_to matches the agent's ID before proceeding with execution. Additionally, a critical syntax issue prevents the code from running in Python 3, and there's a potential portability issue with exception handling.

start=start,
timeout_seconds=timeout_seconds,
)
except MemoryError, RecursionError:

critical

The except A, B: syntax is from Python 2 and will cause a SyntaxError in Python 3. The correct syntax is to use a tuple: except (MemoryError, RecursionError):.

        except (MemoryError, RecursionError):

Comment on lines +97 to 106
async def run( # noqa: PLR0913
self,
*,
identity: AgentIdentity,
task: Task,
completion_config: CompletionConfig | None = None,
max_turns: int = DEFAULT_MAX_TURNS,
memory_messages: tuple[ChatMessage, ...] = (),
timeout_seconds: float | None = None,
) -> AgentRunResult:

security-high

The run method orchestrates the agent execution lifecycle but fails to verify that the provided agent is actually assigned to the task. While it validates the agent's status and the task's status, it does not check the assigned_to field of the task against the agent's ID. This allows any active agent to execute any task that is in an ASSIGNED or IN_PROGRESS state, potentially leading to unauthorized access to task details and unauthorized state transitions. Although this check might be performed at a higher level, as the 'Top-level orchestrator', the AgentEngine should enforce this authorization boundary defensively.

)
else:
execution_result = await coro
except TimeoutError:

high

asyncio.wait_for raises asyncio.TimeoutError. While this is an alias for the built-in TimeoutError in Python 3.11+, using except TimeoutError: will not catch the exception on older Python 3 versions. For better portability, it's recommended to explicitly catch asyncio.TimeoutError.

        except asyncio.TimeoutError:


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)

97-211: 🛠️ Refactor suggestion | 🟠 Major

Split run() and _execute() back under the 50-line cap.

The timeout/telemetry additions leave both lifecycle methods responsible for validation, context prep, timeout orchestration, cost recording, transitions, and error translation in one flow. Please extract the timeout wrapper and post-processing helpers before this gets any harder to reason about safely.

As per coding guidelines "Functions should be less than 50 lines, files less than 800 lines".

Also applies to: 213-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 97 - 211, The run()
method and _execute() are too large—extract the timeout orchestration and
post-execution processing into separate helper functions so both run() and
_execute() are each under 50 lines; keep validation (calls to _validate_agent
and _validate_task) and context preparation (call to _prepare_context) in run(),
have run() call a new timeout_wrapper that invokes the core execution loop in
_execute_core (move the existing loop logic from _execute to _execute_core), and
factor out cost/transition/telemetry/post-processing into a helper (e.g.,
_finalize_run) used by _execute_core and by the fatal-error path; update
signatures of _execute/_execute_core/_finalize_run to accept ctx, system_prompt,
start, timeout_seconds and ensure _handle_fatal_error is retained for
exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 237-265: The try/except around asyncio.wait_for is conflating
internal TimeoutError raised by self._loop.execute with wall-clock timeouts;
change to create a Task from the coroutine returned by self._loop.execute (e.g.
task = asyncio.create_task(coro)) and use asyncio.wait({task},
timeout=timeout_seconds) to get (done, pending); if pending is non-empty treat
it as the engine boundary timeout (log EXECUTION_ENGINE_TIMEOUT with
agent_id/task_id, compute duration from start, cancel the task and handle
cleanup), otherwise call task.result() to retrieve the execution_result and let
any inner exceptions propagate normally; keep references to timeout_seconds,
start, EXECUTION_ENGINE_TIMEOUT, self._loop.execute, and execution_result in the
updated flow.
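The `asyncio.wait`-based disambiguation this prompt describes can be sketched in isolation (names like `run_with_wait` and the simulated inner failure are illustrative):

```python
import asyncio


async def _inner() -> str:
    # Simulates the execution loop raising its own TimeoutError mid-run.
    raise TimeoutError("provider deadline")


async def run_with_wait(timeout_seconds: float) -> str:
    # asyncio.wait never raises on timeout: a non-empty `pending` set
    # can only mean the engine's wall-clock deadline fired, while
    # exceptions raised inside the loop surface from task.result().
    task = asyncio.create_task(_inner())
    done, pending = await asyncio.wait({task}, timeout=timeout_seconds)
    if pending:
        for t in pending:
            t.cancel()
        return "engine-timeout"
    return done.pop().result()  # re-raises the inner TimeoutError here


try:
    asyncio.run(run_with_wait(5.0))
except TimeoutError as exc:
    print(f"inner timeout propagated: {exc}")
```

Because the inner task finishes (with an exception) well before the 5-second deadline, the `TimeoutError` escapes through `task.result()` rather than being misclassified as an engine timeout.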

In `@tests/unit/engine/test_agent_engine_lifecycle.py`:
- Around line 7-25: This test module is missing the repo-required 30s timeout
mark; add a module-level pytest mark by defining pytestmark =
pytest.mark.timeout(30) at top-level in
tests/unit/engine/test_agent_engine_lifecycle.py (near the existing import of
pytest) so all async lifecycle tests (e.g., those using AgentEngine,
AgentContext, ExecutionResult, TerminationReason) inherit the 30-second timeout.

---

Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 97-211: The run() method and _execute() are too large—extract the
timeout orchestration and post-execution processing into separate helper
functions so both run() and _execute() are each under 50 lines; keep validation
(calls to _validate_agent and _validate_task) and context preparation (call to
_prepare_context) in run(), have run() call a new timeout_wrapper that invokes
the core execution loop in _execute_core (move the existing loop logic from
_execute to _execute_core), and factor out
cost/transition/telemetry/post-processing into a helper (e.g., _finalize_run)
used by _execute_core and by the fatal-error path; update signatures of
_execute/_execute_core/_finalize_run to accept ctx, system_prompt, start,
timeout_seconds and ensure _handle_fatal_error is retained for exceptions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f9b49081-783e-4ff3-bf9b-a6ef344d7605

📥 Commits

Reviewing files that changed from the base of the PR and between f2eb73a and 24647bd.

📒 Files selected for processing (11)
  • DESIGN_SPEC.md
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/agent_engine.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/observability/events/execution.py
  • tests/integration/engine/test_agent_engine_integration.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py
  • tests/unit/engine/test_run_result.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations—Python 3.14 has PEP 649
Use except A, B: syntax (no parentheses) for exception handling on Python 3.14—ruff enforces this
Add type hints to all public functions in Python; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions—ruff D rules enforce this
Create new objects instead of mutating existing ones; use copy.deepcopy() at construction for non-Pydantic internal collections and MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with BaseModel, model_validator, computed_field, and ConfigDict
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Enforce line length of 88 characters (ruff enforces this)
Functions should be less than 50 lines, files less than 800 lines
Handle errors explicitly; never silently swallow errors in Python code
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/ai_company/observability/events/execution.py
  • tests/unit/engine/test_run_result.py
  • src/ai_company/engine/run_result.py
  • tests/unit/engine/test_agent_engine.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/agent_engine.py
  • tests/integration/engine/test_agent_engine_integration.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every module with business logic MUST import from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code
Always use logger as the variable name for loggers (not _logger, not log)
Use event name constants from domain-specific modules under ai_company.observability.events (e.g. PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured logging with logger.info(EVENT, key=value) format—never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG level logging should be used for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging

Files:

  • src/ai_company/observability/events/execution.py
  • src/ai_company/engine/run_result.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/agent_engine.py
{src/**/*.py,tests/**/*.py,src/**/*.yaml,src/**/*.yml,tests/**/*.yaml,tests/**/*.yml,examples/**/*.yaml,examples/**/*.yml}

📄 CodeRabbit inference engine (CLAUDE.md)

NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names may only appear in: (1) DESIGN_SPEC.md provider list, (2) .claude/ skill/agent files, (3) third-party import paths/module names (e.g. litellm.types.llms.openai). Tests must use test-provider, test-small-001, etc.

Files:

  • src/ai_company/observability/events/execution.py
  • tests/unit/engine/test_run_result.py
  • src/ai_company/engine/run_result.py
  • tests/unit/engine/test_agent_engine.py
  • src/ai_company/engine/__init__.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/agent_engine.py
  • tests/integration/engine/test_agent_engine_integration.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark unit tests with @pytest.mark.unit, integration tests with @pytest.mark.integration, e2e tests with @pytest.mark.e2e, and slow tests with @pytest.mark.slow
Use asyncio_mode = 'auto' for pytest async tests—no manual @pytest.mark.asyncio needed
Set a 30-second timeout per test

Files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py
  • tests/integration/engine/test_agent_engine_integration.py
src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains

Files:

  • src/ai_company/engine/run_result.py
  • src/ai_company/engine/__init__.py
  • src/ai_company/engine/metrics.py
  • src/ai_company/engine/agent_engine.py
🧠 Learnings (3)
📚 Learning: 2026-03-06T21:51:55.175Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T21:51:55.175Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g. `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/observability/events/execution.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/engine/test_run_result.py
  • tests/unit/engine/test_agent_engine.py
  • tests/unit/engine/test_agent_engine_lifecycle.py
  • tests/unit/engine/test_metrics.py
  • tests/integration/engine/test_agent_engine_integration.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • tests/unit/engine/test_agent_engine_lifecycle.py
  • DESIGN_SPEC.md
  • src/ai_company/engine/agent_engine.py
🔇 Additional comments (13)
src/ai_company/observability/events/execution.py (1)

35-36: LGTM!

The new event constants follow the established naming convention and integrate well with the existing EXECUTION_ENGINE_* event family.

src/ai_company/engine/__init__.py (1)

29-29: LGTM!

The TaskCompletionMetrics import and export are correctly integrated into the public API, maintaining alphabetical ordering in __all__.

Also applies to: 62-62

src/ai_company/engine/run_result.py (1)

87-104: LGTM!

The completion_summary computed field correctly handles all edge cases:

  • Skips None content (tool-call-only messages)
  • Skips empty string content
  • Returns the last qualifying assistant message or None

The reverse iteration is efficient for finding the most recent message.

tests/unit/engine/test_agent_engine.py (2)

118-121: LGTM!

Test assertions and comments correctly updated to reflect the new auto-completion behavior where successful runs transition through ASSIGNED → IN_PROGRESS → IN_REVIEW → COMPLETED.

Also applies to: 145-148
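The auto-completion chain these tests assert can be written down as an ordered transition path (the enum below is an assumed subset of the PR's `TaskStatus`, shown for illustration):

```python
from enum import Enum


class TaskStatus(Enum):
    # Assumed subset of the PR's TaskStatus enum.
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    IN_REVIEW = "in_review"
    COMPLETED = "completed"


SUCCESS_PATH = (
    TaskStatus.ASSIGNED,
    TaskStatus.IN_PROGRESS,
    TaskStatus.IN_REVIEW,
    TaskStatus.COMPLETED,
)


def advance(current: TaskStatus) -> TaskStatus:
    # One step along the auto-completion chain; COMPLETED is terminal.
    i = SUCCESS_PATH.index(current)
    return SUCCESS_PATH[min(i + 1, len(SUCCESS_PATH) - 1)]


print(advance(TaskStatus.IN_REVIEW).value)  # completed
```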


538-541: LGTM!

Mock contexts now correctly simulate the IN_PROGRESS state that _prepare_context establishes before handing control to the execution loop. This ensures the mock behavior aligns with real engine execution.

Also applies to: 655-660

tests/unit/engine/test_run_result.py (2)

428-452: LGTM!

The _make_result_with_messages helper provides a clean way to construct test fixtures with specific conversation content, enabling focused testing of the completion_summary computed field.


455-495: LGTM!

Comprehensive test coverage for completion_summary:

  • Returns last assistant content when present
  • Returns None for no assistant messages
  • Returns None for empty conversation
  • Skips tool-call-only messages (content=None)
  • Skips empty string content
tests/integration/engine/test_agent_engine_integration.py (2)

205-208: LGTM!

Existing test updated to verify the auto-completion path where successful runs transition to COMPLETED.


211-296: LGTM!

Comprehensive integration test validating the full task lifecycle:

  • Verifies all three transitions (ASSIGNED → IN_PROGRESS → IN_REVIEW → COMPLETED)
  • Confirms completed_at timestamp is set
  • Validates completion_summary is non-empty
  • Ensures TaskCompletionMetrics can be computed with positive values

This provides excellent end-to-end coverage for the single-task execution lifecycle feature.

src/ai_company/engine/metrics.py (1)

1-75: LGTM!

Well-designed frozen Pydantic model following established patterns:

  • Uses NotBlankStr for identifier fields as per coding guidelines
  • Proper ge=0 constraints on numeric fields
  • Clean factory method from_run_result for extraction from AgentRunResult
  • Good use of TYPE_CHECKING to avoid circular import
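The shape of this model can be approximated with a stdlib frozen dataclass (a stand-in for the actual Pydantic model; field names come from the PR summary, and the `ge=0` constraints are checked manually where Pydantic would use `Field(ge=0)`):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCompletionMetricsSketch:
    # Stdlib stand-in for the frozen Pydantic TaskCompletionMetrics.
    turns_per_task: int
    tokens_per_task: int
    cost_per_task: float
    duration_seconds: float

    def __post_init__(self) -> None:
        # Manual equivalent of Pydantic's Field(ge=0) constraints.
        for name in ("turns_per_task", "tokens_per_task",
                     "cost_per_task", "duration_seconds"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be >= 0")


m = TaskCompletionMetricsSketch(3, 1200, 0.012, 4.5)
print(m.turns_per_task)  # 3
```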
tests/unit/engine/test_metrics.py (2)

19-100: LGTM!

Comprehensive unit tests for TaskCompletionMetrics construction and validation:

  • Valid construction with all fields
  • task_id=None handling
  • Frozen immutability enforcement
  • Zero value acceptance
  • Negative value rejection for turns_per_task and tokens_per_task
  • Blank agent_id rejection via NotBlankStr

102-194: LGTM!

The from_run_result factory method tests thoroughly validate extraction from AgentRunResult:

  • Correctly extracts task_id, agent_id, turns_per_task, tokens_per_task, cost_per_task, and duration_seconds
  • Handles zero-turns edge case
  • Uses a well-designed helper method for fixture construction
src/ai_company/engine/agent_engine.py (1)

266-273: Timeout results currently discard partial progress.

The fallback at lines 266-273 reconstructs the result from the pre-loop ctx, leaving turns at the ExecutionResult default of (). This means _record_costs() has no turn data to persist in the timeout path. Since asyncio.wait_for() cancels the underlying task on timeout, timed-out work will be undercounted unless the loop itself catches CancelledError and returns a checkpointed ExecutionResult with accumulated context before this branch executes.

Verify that explicit cancellation/checkpoint logic exists in react_loop.py (or the applicable timeout handler) to preserve the latest AgentContext and turn data when the outer timeout fires. If absent, this path will always record zero cost/tokens for timed-out runs.

Comment on lines +7 to +25
import pytest

from ai_company.core.agent import AgentIdentity # noqa: TC001
from ai_company.core.enums import TaskStatus
from ai_company.core.task import Task # noqa: TC001
from ai_company.engine.agent_engine import AgentEngine
from ai_company.engine.context import AgentContext
from ai_company.engine.loop_protocol import (
ExecutionResult,
TerminationReason,
)

if TYPE_CHECKING:
from .conftest import MockCompletionProvider

from .conftest import make_completion_response as _make_completion_response


@pytest.mark.unit

🛠️ Refactor suggestion | 🟠 Major

Add the mandatory 30-second timeout mark to this module.

These async lifecycle tests are new, but none of them carries the repo-standard timeout. A module-level timeout mark is probably the least repetitive way to keep a hung provider/loop mock from stalling the suite.

♻️ Suggested change
 import pytest
 
+pytestmark = pytest.mark.timeout(30)
+
 from ai_company.core.agent import AgentIdentity  # noqa: TC001

As per coding guidelines "Set a 30-second timeout per test".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_agent_engine_lifecycle.py` around lines 7 - 25, This
test module is missing the repo-required 30s timeout mark; add a module-level
pytest mark by defining pytestmark = pytest.mark.timeout(30) at top-level in
tests/unit/engine/test_agent_engine_lifecycle.py (near the existing import of
pytest) so all async lifecycle tests (e.g., those using AgentEngine,
AgentContext, ExecutionResult, TerminationReason) inherit the 30-second timeout.

…t, Gemini, and greptile

- Add produced_artifacts field to AgentRunResult (#1)
- Wrap _log_completion in try/except to preserve valid results (#2)
- Add test for inner TimeoutError propagation without engine timeout (#3)
- Extract _run_loop_with_timeout from _execute (50-line limit) (#4)
- Extract _validate_run_inputs from run() (50-line limit) (#5)
- Rename metrics docstrings from "completed task" to "agent run" + add termination_reason to metrics event (#6)
- Fix raise exc from build_exc chain direction (#7)
- Replace asyncio.wait_for with asyncio.wait for timeout disambiguation (#8)
- Add test for _apply_post_execution_transitions failure resilience (#9)
- Add test for timeout cost recording behavior (#10)
- Fix hardcoded from_status in transition logs (#11)
- Add agent-task ownership check in _validate_task (#12)
- Split test_invalid_timeout_raises into two test methods (#13)
- Add negative validation tests for cost_per_task/duration_seconds (#14)
- Add test_blank_task_id_rejected (#15)
- Update _execute docstring to mention timeout, transitions, metrics (#16)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit c7e64e4 into main Mar 6, 2026
2 checks passed
@Aureliolo Aureliolo deleted the feat/single-task-lifecycle branch March 6, 2026 23:15
5. **Transition task** — `ASSIGNED` → `IN_PROGRESS` (pass-through if already `IN_PROGRESS`).
6. **Prepare tools and budget** — creates `ToolInvoker` from registry and `BudgetChecker` from task budget limit.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.

DESIGN_SPEC references asyncio.wait_for but implementation uses asyncio.wait

The spec description (line 928) states:

If timeout_seconds is set, wraps the call in asyncio.wait_for

However, the actual implementation in _run_loop_with_timeout (lines 278–300) deliberately uses asyncio.wait instead, with clear rationale: this prevents conflating an internal TimeoutError from the loop with the engine's wall-clock deadline (see the method's docstring at lines 280–282).

The implementation choice is correct and well-reasoned; the spec just needs to match:

Suggested change
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait_for`; on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.
7. **Delegate to loop** — calls `ExecutionLoop.execute()` with context, provider, tool invoker, budget checker, and completion config. If `timeout_seconds` is set, wraps the call in `asyncio.wait` (not `asyncio.wait_for`, to avoid conflating internal `TimeoutError` with the engine's wall-clock deadline); on expiry the run returns with `TerminationReason.ERROR` but cost recording and post-execution processing still occur.

Comment on lines +51 to +54
produced_artifacts: tuple[Artifact, ...] = Field(
default=(),
description="Artifacts produced during execution",
)

produced_artifacts field is declared but never populated

The produced_artifacts field defaults to an empty tuple and is not passed a non-empty value anywhere in AgentRunResult construction — neither in the normal execution path (_execute, line 248–254) nor in the error handler (_handle_fatal_error, line 736–741). The ExecutionResult returned from ExecutionLoop.execute() has no artifacts field to forward either.

Callers inspecting result.produced_artifacts will always receive an empty tuple, which could cause silent confusion about whether artifacts were actually extracted. Either mark this as an explicit TODO (e.g., # TODO(M?): populate from loop artifacts extraction logic) to signal it's intentional scaffolding, or defer adding the field until the extraction logic exists.


Comment on lines +316 to +320
return ExecutionResult(
context=ctx,
termination_reason=TerminationReason.ERROR,
error_message=error_msg,
)

Timeout path creates ExecutionResult with zero turns, silently losing partial execution state

When the wall-clock deadline expires, _run_loop_with_timeout constructs a new ExecutionResult with context=ctx (the pre-execution context passed in at line 270), termination_reason=TerminationReason.ERROR, and error_message. The turns field defaults to an empty tuple.

However, the cancelled loop_task may have completed partial execution — accumulating turns, tokens, and cost data in the context before the timeout fired. By creating a fresh ExecutionResult from the pre-execution context, all that partial state is dropped irretrievably.

The subsequent _record_costs call (line 240) then iterates over an empty turns tuple and records nothing. Any real spend incurred during the partial run is silently lost. Consequently, TaskCompletionMetrics reports zero turns, zero tokens, and zero cost even when the agent ran for multiple turns before the timeout.

If partial state loss is intentional (because the loop was forcibly cancelled and state may be unreliable), this behavior should be documented explicitly so operators understand that pre-timeout costs will not appear in billing records. If partial results should be preserved, the ExecutionLoop may need to expose a snapshot of the partial context upon cancellation so it can be included in the result.


Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Implement single-task execution lifecycle (assign, execute, complete)
