
feat: parallel tool execution in ToolInvoker.invoke_all (#137)

Merged
Aureliolo merged 3 commits into main from
feat/parallel-tool-execution
Mar 6, 2026

Conversation


Aureliolo (Owner) commented Mar 6, 2026

Summary

  • Concurrent tool execution: ToolInvoker.invoke_all now uses asyncio.TaskGroup for structured concurrency instead of sequential execution
  • Optional concurrency limit: max_concurrency parameter (via asyncio.Semaphore) bounds parallelism when needed
  • Robust error handling: Fatal errors (MemoryError, RecursionError) are collected via _run_guarded wrapper and re-raised after all tasks complete (bare exception for one, ExceptionGroup for multiple)
  • Type-safe results: Uses dict[int, ToolResult] instead of list[ToolResult | None] — eliminates # type: ignore and provides fail-loud KeyError on missing slots
  • Pre-reviewed: 10 automated review agents, 14 findings addressed

Changes

Source (src/ai_company/tools/invoker.py)

  • invoke_all migrated from sequential loop to asyncio.TaskGroup
  • New _run_guarded helper with contextlib.nullcontext pattern, extended docstring, orchestration-level logging
  • _no_remote_retrieve return type → Never
  • logger.exception → logger.error in helper methods (not in exception context)
  • Module docstring corrected (BaseException subclasses propagate uncaught, not "after logging")

Tests (tests/unit/tools/test_invoker.py, conftest.py)

  • New _DelayTool and _ConcurrencyTrackingTool fixtures
  • TestInvokeAllConcurrency: timing proof, order preservation, recoverable/fatal error handling, mixed fatal+success
  • TestInvokeAllBounded: semaphore enforcement (peak=1, peak≤2), unbounded peak≥3, validation
  • TestInvokeAllEdgeCases: single call, generator input, empty with max_concurrency

Docs (DESIGN_SPEC.md)

  • §11.1.1 updated to reflect concurrent execution model
  • §15.3 project structure comment updated
  • §15.5 "Parallel tool execution" row: Planned → Adopted (M2.5)

Test Plan

  • uv run ruff check src/ tests/ — all checks passed
  • uv run mypy src/ tests/ — no issues in 210 files
  • uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80 — 1830 passed, 94.97% coverage
  • Pre-commit hooks passed (ruff, gitleaks, commitizen)

Review Coverage

Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 14 findings implemented, 11 triaged as skip (pre-existing design decisions, over-engineering, correct-by-design behavior).

Closes #112

Aureliolo and others added 2 commits March 6, 2026 14:57
Replace sequential list comprehension with asyncio.TaskGroup for
concurrent tool execution. Add optional max_concurrency semaphore.
Fatal errors (MemoryError/RecursionError) are collected via guarded
wrapper and re-raised after all tasks complete.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 14 findings addressed:

Source (invoker.py):
- Fix module docstring: BaseException subclasses propagate uncaught, not "after logging"
- Replace list[ToolResult | None] with dict[int, ToolResult] to eliminate type: ignore
- Use contextlib.nullcontext to remove duplicated invoke call in _run_guarded
- Add logging in _run_guarded catch block for orchestration-level context
- Extend _run_guarded docstring to document BaseException propagation
- Clarify Raises docstring wording for MemoryError/RecursionError
- Change _no_remote_retrieve return type to Never
- Change logger.exception to logger.error in helper methods
- Add safety comment for shared mutable state in asyncio context

Tests (test_invoker.py):
- Add test for mixed fatal + successful tool calls
- Verify peak=1 in test_max_concurrency_one_sequential via tracking tool
- Strengthen peak > 1 assertion to peak >= 3
- Widen timing bound from 0.25s to 0.28s for CI resilience

Docs (DESIGN_SPEC.md):
- Fix stale "sequential execution" comment in §15.3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 6, 2026 14:16

github-actions bot commented Mar 6, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None


coderabbitai bot commented Mar 6, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bc967cb2-393c-46ef-835a-0d7307094ad5

📥 Commits

Reviewing files that changed from the base of the PR and between 29d79a8 and 7583a1c.

📒 Files selected for processing (3)
  • src/ai_company/tools/invoker.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_invoker.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Tool invocation now runs concurrently for better performance.
    • Added max_concurrency option to limit parallelism.
  • Behavior Changes

    • Non-recoverable errors are now collected and re-raised after all tasks finish (single error re-raises directly; multiple form an exception group).
    • Recoverable tool errors no longer abort sibling invocations.
  • Observability

    • Added start/complete events for batch tool invocation.
  • Tests

    • New concurrency-focused tests and fixtures validating parallelism and error behavior.

Walkthrough

ToolInvoker.invoke_all was changed from sequential to concurrent execution using asyncio.TaskGroup with an optional max_concurrency limit; recoverable ToolResult errors no longer cancel siblings, while non-recoverable errors are collected and re-raised (single as bare exception, multiple as ExceptionGroup). New observability events and concurrency tests were added.

Changes

Cohort / File(s) Summary
Core Concurrency Implementation
src/ai_company/tools/invoker.py
Replaced sequential invoke_all with concurrent TaskGroup-based implementation; added max_concurrency parameter and _run_guarded helper; collect and aggregate non-recoverable errors (MemoryError, RecursionError) and re-raise after tasks complete; updated logging and _no_remote_retrieve return type to Never.
Observability Events
src/ai_company/observability/events/tool.py
Added TOOL_INVOKE_ALL_START and TOOL_INVOKE_ALL_COMPLETE constants for batch invocation lifecycle events.
Documentation
DESIGN_SPEC.md
Updated public API docs to describe concurrent invoke_all behavior and fatal-error aggregation semantics.
Test Infrastructure
tests/unit/tools/conftest.py
Added _DelayTool and _ConcurrencyTrackingTool test utilities and fixtures concurrency_tracking_tool and concurrency_invoker.
Concurrency Tests
tests/unit/tools/test_invoker.py
Added multiple new tests for concurrent execution, ordering, bounded concurrency (max_concurrency), edge cases (single, generator, empty), and error propagation (recoverable vs non-recoverable).

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant TG as TaskGroup
    participant Sem as Semaphore
    participant Tools as Tools
    participant Collector as ErrorCollector

    Client->>Invoker: invoke_all(tool_calls, max_concurrency=2)
    Invoker->>Sem: create Semaphore(2) (if bounded)
    Invoker->>TG: create TaskGroup

    par Concurrent execution
        loop per tool_call
            TG->>Invoker: create _run_guarded task
            Invoker->>Sem: acquire (if sem present)
            Sem-->>Invoker: acquired
            Invoker->>Tools: invoke(tool_call)
            alt success
                Tools-->>Invoker: ToolResult(success)
                Invoker->>Invoker: store result[idx]
            else recoverable error
                Tools-->>Invoker: ToolResult(is_error=True)
                Invoker->>Invoker: store error result[idx]
            else non-recoverable error
                Tools-->>Collector: MemoryError/RecursionError
                Collector-->>Invoker: record fatal error
                Invoker->>Invoker: store placeholder result[idx]
            end
            Invoker->>Sem: release (if used)
        end
    end

    TG-->>Invoker: all tasks complete
    alt fatal errors collected
        Collector->>Invoker: fatal_errors
        alt single error
            Invoker-->>Client: raise bare exception
        else multiple errors
            Invoker-->>Client: raise ExceptionGroup(fatal_errors)
        end
    else no fatal errors
        Invoker-->>Client: return tuple(results in input order)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage (⚠️ Warning): docstring coverage is 80.65%, below the required 100.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
Title check (✅ Passed): The title 'feat: parallel tool execution in ToolInvoker.invoke_all' directly and accurately summarizes the main change: introducing parallel/concurrent execution to the invoke_all method.
Description check (✅ Passed): The description comprehensively documents the concurrent execution implementation, error handling changes, test additions, and documentation updates, all directly related to the changeset.
Linked Issues check (✅ Passed): The PR implements all acceptance criteria from issue #112: concurrent execution via TaskGroup, max_concurrency parameter with validation, input-order preservation in results, robust error handling for recoverable and fatal errors, comprehensive test coverage, and documentation updates.
Out of Scope Changes check (✅ Passed): All changes are directly scoped to issue #112 objectives: invoker.py modifications implement concurrent execution, test fixtures and tests validate behavior, documentation updates reflect the new model, and event constants support observability. No unrelated changes detected.


Copilot AI left a comment


Pull request overview

This PR migrates ToolInvoker.invoke_all from sequential execution to concurrent execution using asyncio.TaskGroup, with an optional max_concurrency semaphore parameter. Fatal errors (MemoryError, RecursionError) are collected by a _run_guarded wrapper and re-raised after all tasks complete.

Changes:

  • Replaced the sequential loop in invoke_all with asyncio.TaskGroup for structured concurrency, adding a _run_guarded helper to collect fatal errors without canceling siblings
  • Added max_concurrency parameter backed by asyncio.Semaphore with input validation
  • Added comprehensive test suites for concurrency behavior, bounded parallelism, and edge cases, plus updated design docs

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/ai_company/tools/invoker.py Core implementation: _run_guarded wrapper, concurrent invoke_all with TaskGroup and semaphore, _no_remote_retrieve return type fix, logging corrections
src/ai_company/observability/events/tool.py New event constants for invoke_all start/complete logging
tests/unit/tools/conftest.py New _DelayTool, _ConcurrencyTrackingTool fixtures and concurrency_invoker fixture
tests/unit/tools/test_invoker.py New test classes for concurrency, bounded parallelism, and edge cases
DESIGN_SPEC.md Updated docs to reflect concurrent execution model and "Adopted" status


Comment on lines +463 to +464
# Sequential would take >= 0.3s; generous bound for CI overhead
assert elapsed < 0.28

Copilot AI Mar 6, 2026


The timing assertion elapsed < 0.28 is unreliable: three tasks each sleeping 0.1s concurrently should take ~0.1s, but 0.28 is dangerously close to the 0.3s sequential threshold, and on a heavily loaded CI runner asyncio scheduling overhead could push a genuinely concurrent run past 0.28s. Since the theoretical concurrent time is ~0.1s, consider asserting elapsed < 0.25, which still distinguishes concurrent from sequential while leaving a clearer margin from the 0.3s boundary.

Suggested change
# Sequential would take >= 0.3s; generous bound for CI overhead
assert elapsed < 0.28
# Sequential would take >= 0.3s; concurrent should be well under this
assert elapsed < 0.25

@@ -1,14 +1,19 @@
"""Tests for ToolInvoker."""

import time

Copilot AI Mar 6, 2026


time is imported but only time.monotonic() is used in test_concurrent_faster_than_sequential. For async tests, consider using the event loop's time (asyncio.get_event_loop().time()) for consistency, though time.monotonic() works fine for wall-clock measurements. This is a minor point — just noting the import is used only in one test.

Comment on lines +347 to +355
except (MemoryError, RecursionError) as exc:
logger.warning(
TOOL_INVOKE_NON_RECOVERABLE,
tool_call_id=tool_call.id,
tool_name=tool_call.name,
index=index,
error=f"{type(exc).__name__}: {exc}",
)
fatal_errors.append(exc)

Copilot AI Mar 6, 2026


In _run_guarded, MemoryError and RecursionError are caught and silently collected, but self.invoke() already catches these same exceptions, logs them with logger.exception, and re-raises. This means the _run_guarded catch will see them after they've already been logged with a full traceback via logger.exception in _execute_tool / _validate_params. The logger.warning here then logs a second time at a lower severity. This is intentional dual-logging (once at the origin, once at the orchestration level), but the orchestration log at warning level is misleading — a non-recoverable error is more severe than a warning. Consider using logger.error to be consistent with the severity of the event.

Comment on lines +413 to +421
logger.info(
TOOL_INVOKE_ALL_COMPLETE,
count=len(calls),
fatal_count=len(fatal_errors),
)

if fatal_errors:
if len(fatal_errors) == 1:
raise fatal_errors[0]

Copilot AI Mar 6, 2026


When fatal_errors is non-empty, the TOOL_INVOKE_ALL_COMPLETE log at info level fires right before raising. This is fine for observability, but note that when fatal errors occur, results may be incomplete (missing entries for fatal slots). If a caller ever catches the raised exception and somehow tries to inspect partial results, they won't have access. This is acceptable given the current design, but consider documenting that no partial results are available on fatal error.

Comment on lines +387 to +390
@pytest.fixture
def concurrency_tracking_tool() -> _ConcurrencyTrackingTool:
"""Standalone tracking tool for direct peak inspection."""
return _ConcurrencyTrackingTool()

Copilot AI Mar 6, 2026


The concurrency_tracking_tool fixture creates a new instance, and the concurrency_invoker fixture receives it as a parameter and includes it in the tool list. However, the _ConcurrencyTrackingTool creates its asyncio.Lock() at __init__ time. If tests run under different event loops (e.g., with pytest-asyncio creating a new loop per test), the lock will be bound to the wrong loop in Python 3.9 (though this is fine in 3.10+ where locks are not bound to a loop). More importantly, the reset() method is never called between tests — since the fixture has no explicit scope and defaults to function scope, a new instance is created per test, so this is actually fine. The reset() method appears to be dead code.

for i in range(5)
]
await concurrency_invoker.invoke_all(calls)
assert concurrency_tracking_tool.peak >= 3

Copilot AI Mar 6, 2026


This assertion (peak >= 3) relies on the asyncio scheduler launching at least 3 of 5 tasks before any completes their 0.05s sleep. While this typically holds, it's not guaranteed — under extreme system load or slow CI, the scheduler might not start the third task before the first completes. This makes the test potentially flaky. Consider increasing the sleep duration (e.g., 0.1s) or lowering the threshold (e.g., peak >= 2) to make it more robust while still proving unbounded concurrency exceeds the bounded case.


coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/tools/test_invoker.py`:
- Around line 445-464: The timing assertion in
test_concurrent_faster_than_sequential is too tight and can flake under CI;
update the assertion on elapsed (measured around concurrency_invoker.invoke_all)
to a more generous bound (e.g., assert elapsed < 0.5) or use a ratio-based check
(e.g., elapsed < 0.3 * 0.9) so the test still verifies concurrency without
failing due to scheduling variance.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5a032285-ec51-49c6-b32c-6eaf04ce25fe

📥 Commits

Reviewing files that changed from the base of the PR and between e9cba89 and 29d79a8.

📒 Files selected for processing (5)
  • DESIGN_SPEC.md
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/invoker.py
  • tests/unit/tools/conftest.py
  • tests/unit/tools/test_invoker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations in Python code — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for multiple exception handling in Python 3.14 per PEP 758
All public functions and classes must have type hints; mypy strict mode is enforced
Docstrings must follow Google style and are required on all public classes and functions (enforced by ruff D rules)
Create new objects rather than mutating existing ones — use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement in non-Pydantic internal collections
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models for runtime state that evolves
Use Pydantic v2 BaseModel, model_validator, computed_field, and ConfigDict; use @computed_field for derived values instead of storing redundant fields
Use NotBlankStr from core.types for all identifier and name fields instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
Line length must not exceed 88 characters (ruff enforced)
Functions must be less than 50 lines and files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • tests/unit/tools/test_invoker.py
  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/invoker.py
  • tests/unit/tools/conftest.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use fake model IDs and names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs — tests must not be coupled to external providers
Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow

Files:

  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic must have: from ai_company.observability import get_logger then logger = get_logger(name); never use import logging or print()
Always use logger variable name (not _logger or log)
Use event name constants from ai_company.observability.events domain-specific modules instead of string literals
Always use structured logging with logger.info(EVENT, key=value) format; never use logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions

Files:

  • src/ai_company/observability/events/tool.py
  • src/ai_company/tools/invoker.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Use `threading.Event` instead of `time.sleep()` for thread synchronization in test code.

Applied to files:

  • tests/unit/tools/test_invoker.py
  • tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from ai_company.observability.events domain-specific modules instead of string literals

Applied to files:

  • src/ai_company/observability/events/tool.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task

Applied to files:

  • src/ai_company/tools/invoker.py
  • DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use `async def` for async test functions; pytest-asyncio is configured with `asyncio_mode = "auto"`; mark async tests with `pytest.mark.asyncio` if needed

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Set asyncio_mode = 'auto' in pytest configuration; do not use manual pytest.mark.asyncio

Applied to files:

  • tests/unit/tools/conftest.py
🔇 Additional comments (14)
src/ai_company/observability/events/tool.py (1)

20-21: LGTM!

The new event constants follow the established naming convention and are consistent with existing tool events.

tests/unit/tools/conftest.py (3)

316-340: LGTM!

The _DelayTool correctly implements the async execute method and uses asyncio.sleep for non-blocking delays, making it suitable for testing concurrent execution.


343-384: LGTM!

The _ConcurrencyTrackingTool correctly uses asyncio.Lock to protect the counter updates and track peak concurrency. The lock is appropriate here since asyncio.Lock is non-blocking within the async context.


387-405: LGTM!

The fixtures are well-structured. Using concurrency_tracking_tool as a dependency in concurrency_invoker ensures the same tool instance is used for both invocation and peak inspection in tests.

DESIGN_SPEC.md (2)

1476-1477: LGTM!

The documentation accurately describes the concurrent execution model, including the max_concurrency parameter, recoverable vs non-recoverable error handling, and the ExceptionGroup behavior for multiple fatal errors.


2248-2248: LGTM!

The engineering conventions table correctly reflects the adopted parallel tool execution pattern with asyncio.TaskGroup and optional max_concurrency semaphore.

src/ai_company/tools/invoker.py (4)

5-7: LGTM!

The module docstring accurately reflects the updated error handling semantics: recoverable errors return ToolResult(is_error=True), non-recoverable errors are logged and re-raised, and BaseException subclasses propagate uncaught.


47-49: LGTM!

The Never return type correctly documents that this function always raises and never returns normally.


328-355: LGTM!

The _run_guarded wrapper correctly:

  • Uses nullcontext() when no semaphore is provided, avoiding code duplication
  • Catches only MemoryError/RecursionError to prevent TaskGroup cancellation
  • Allows BaseException subclasses (KeyboardInterrupt, CancelledError) to propagate for proper cancellation semantics
  • Stores results by index for order preservation

384-425: Validates max_concurrency and properly re-raises fatal errors.

The implementation is correct:

  • Validates max_concurrency >= 1 with a clear error message
  • Early return for empty input avoids unnecessary TaskGroup creation
  • Uses dict[int, ToolResult] for type-safe result storage
  • Re-raises single fatal error bare, multiple as ExceptionGroup
  • Results tuple preserves input order via index lookup
tests/unit/tools/test_invoker.py (4)

3-15: LGTM!

The imports are appropriately organized with time for timing assertions, Iterator in TYPE_CHECKING block for type hints, and the tracking tool import for fixture type annotations.


489-557: LGTM!

Comprehensive test coverage for error handling:

  • Recoverable errors don't cancel siblings
  • Single fatal error raises bare exception
  • Mixed fatal and success still raises
  • Multiple fatal errors raise ExceptionGroup with correct exception count

564-627: LGTM!

The bounded concurrency tests correctly verify:

  • max_concurrency=1 enforces sequential execution (peak=1)
  • max_concurrency=2 bounds parallelism (peak≤2)
  • max_concurrency=None allows unbounded parallelism
  • Invalid values (0, negative) raise ValueError

630-667: LGTM!

Edge case coverage is thorough:

  • Single-element input
  • Generator input (non-list iterable)
  • Empty input with max_concurrency parameter


greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR migrates ToolInvoker.invoke_all from a sequential for loop to asyncio.TaskGroup-based concurrent execution, with an optional max_concurrency semaphore to bound parallelism. The change is well-motivated, correctly implemented, and thoroughly tested (94.97% coverage, 1830 tests passing). The _run_guarded wrapper cleanly isolates fatal exceptions from sibling tasks by collecting them rather than propagating them, which is the right pattern for structured concurrency.

Key observations:

  • Log level inconsistency: TOOL_INVOKE_ALL_COMPLETE is emitted at info level even when fatal_count > 0, immediately before re-raising a fatal exception. Dashboards and alerting systems that count this event at info as a clean-completion signal will miscount.
  • Asymmetric exception re-raise: One fatal error → bare MemoryError/RecursionError; multiple fatal errors → ExceptionGroup. Callers must guard both shapes to avoid silently swallowed errors in the multi-error case.
  • _ConcurrencyTrackingTool stale peak in content: The test fixture returns str(self._peak) as the result content, but this value is captured before sibling tasks finish, so it may under-report. No existing test asserts on this content so there's no failure today, but it could mislead if content is ever used as a concurrency measurement.

Confidence Score: 4/5

  • Safe to merge — the concurrency model is correct and the fatal-error collection pattern is sound; remaining issues are style/observability concerns.
  • The core implementation is correct: asyncio.TaskGroup + _run_guarded correctly prevents fatal exceptions from cancelling sibling tasks, order preservation via indexed dict is type-safe, and the semaphore correctly limits concurrency. The two deductions are: (1) TOOL_INVOKE_ALL_COMPLETE is logged at info regardless of fatal_count, which can confuse monitoring, and (2) the bare-vs-ExceptionGroup asymmetry for fatal re-raises can lead to incomplete exception handling in callers.
  • src/ai_company/tools/invoker.py lines 406–416 (log level + exception re-raise shape)

Important Files Changed

Filename Overview
src/ai_company/tools/invoker.py Core change — migrates invoke_all from sequential loop to asyncio.TaskGroup with optional Semaphore-based concurrency cap. Implementation is sound; _run_guarded correctly prevents fatal exceptions from cancelling sibling tasks. Two style issues: TOOL_INVOKE_ALL_COMPLETE is logged at info level even when fatal errors are about to be raised, and the bare-vs-ExceptionGroup asymmetry for re-raised fatals may surprise callers.
tests/unit/tools/test_invoker.py Comprehensive new test suites covering concurrency, bounded execution, edge cases, and fatal-error propagation. Timing thresholds are generous (0.5 s for 3×0.1 s tasks). The peak value that _ConcurrencyTrackingTool returns as result content may be stale at task completion time, though no test asserts on it, so correctness is unaffected today.
tests/unit/tools/conftest.py New _DelayTool and _ConcurrencyTrackingTool helpers and their fixtures are well-structured. concurrency_tracking_tool is correctly injected into concurrency_invoker so both fixtures share the same instance. Minor: _ConcurrencyTrackingTool.execute returns str(self._peak) as content at a point where the peak can still grow from sibling tasks.
src/ai_company/observability/events/tool.py Adds two new Final[str] event constants — TOOL_INVOKE_ALL_START and TOOL_INVOKE_ALL_COMPLETE — following the existing pattern. No issues.
DESIGN_SPEC.md Documentation updates accurately reflect the new concurrent model. §11.1.1 and §15.5 are updated consistently. No issues.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant invoke_all
    participant TaskGroup
    participant _run_guarded
    participant Semaphore
    participant invoke

    Caller->>invoke_all: invoke_all(tool_calls, max_concurrency=N)
    invoke_all->>invoke_all: validate max_concurrency, list(tool_calls)
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_START

    invoke_all->>TaskGroup: async with TaskGroup() as tg
    loop for each (idx, call)
        TaskGroup->>_run_guarded: tg.create_task(_run_guarded(idx, call, ...))
    end

    par concurrent execution
        _run_guarded->>Semaphore: async with semaphore (or nullcontext)
        Semaphore-->>_run_guarded: acquired
        _run_guarded->>invoke: await self.invoke(tool_call)
        alt recoverable error
            invoke-->>_run_guarded: ToolResult(is_error=True)
            _run_guarded->>_run_guarded: results[idx] = result
        else fatal error (MemoryError/RecursionError)
            invoke-->>_run_guarded: raises MemoryError/RecursionError
            _run_guarded->>_run_guarded: fatal_errors.append(exc)
        else BaseException (KeyboardInterrupt, CancelledError)
            invoke-->>_run_guarded: raises BaseException
            _run_guarded-->>TaskGroup: propagates → cancels siblings
        end
        _run_guarded->>Semaphore: release
    end

    TaskGroup-->>invoke_all: all tasks complete
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_COMPLETE (fatal_count)

    alt no fatal errors
        invoke_all-->>Caller: tuple(results[i] for i in range(len(calls)))
    else exactly 1 fatal error
        invoke_all-->>Caller: raise fatal_errors[0] (bare exception)
    else multiple fatal errors
        invoke_all-->>Caller: raise ExceptionGroup(fatal_errors)
    end

Last reviewed commit: 7583a1c

…, and Greptile

- Remove duplicate TOOL_INVOKE_NON_RECOVERABLE log in _run_guarded
  (upstream logger.exception already records full context + traceback)
- Use logger.exception in _unexpected_validation_result to preserve
  traceback for unexpected validation errors
- Relax timing assertion in test_concurrent_faster_than_sequential
  from 0.28s to 0.5s to prevent CI flakiness
- Rename test_empty_schema_skips_validation to
  test_no_schema_skips_validation (tests None, not empty dict)
- Fix misleading "atomic counter" docstring to "lock-guarded counter"
- Expand extended_invoker fixture docstring to enumerate tools
- Remove unused _ConcurrencyTrackingTool.reset() dead code
- Fix imprecise deepcopy test comment
- Fix misleading "generous bound" comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Aureliolo Aureliolo merged commit 58517ee into main Mar 6, 2026
10 of 11 checks passed
@Aureliolo Aureliolo deleted the feat/parallel-tool-execution branch March 6, 2026 14:28
Comment on lines +406 to +410
logger.info(
TOOL_INVOKE_ALL_COMPLETE,
count=len(calls),
fatal_count=len(fatal_errors),
)

info-level completion log emitted even when fatal errors will be raised

TOOL_INVOKE_ALL_COMPLETE is logged at info regardless of fatal_count. Any log aggregator or dashboard that counts TOOL_INVOKE_ALL_COMPLETE at info as a success metric will miscount — the event is emitted, and then an exception is immediately raised, making the operation an effective failure from the caller's perspective.

Consider using a conditional log level when fatal_errors is non-empty:

Suggested change
logger.info(
TOOL_INVOKE_ALL_COMPLETE,
count=len(calls),
fatal_count=len(fatal_errors),
)
if fatal_errors:
logger.warning(
TOOL_INVOKE_ALL_COMPLETE,
count=len(calls),
fatal_count=len(fatal_errors),
)
else:
logger.info(
TOOL_INVOKE_ALL_COMPLETE,
count=len(calls),
fatal_count=0,
)

This way, monitoring systems observing TOOL_INVOKE_ALL_COMPLETE at the info level see clean completions only.


Comment on lines +412 to +416
if fatal_errors:
if len(fatal_errors) == 1:
raise fatal_errors[0]
msg = "multiple non-recoverable tool errors"
raise ExceptionGroup(msg, fatal_errors)

Asymmetric exception type makes caller handling error-prone

When exactly one fatal error is collected, invoke_all re-raises a bare MemoryError or RecursionError. When two or more occur, it raises an ExceptionGroup. This means every caller that needs to handle fatal errors from invoke_all must cover two structurally different exception types:

try:
    results = await invoker.invoke_all(calls)
except (MemoryError, RecursionError):
    # single-error path
    ...
except ExceptionGroup as eg:
    # multi-error path  
    ...

If a caller only writes the first except clause (which is the obvious, natural thing to do), it silently misses the ExceptionGroup case and the errors go unhandled.

A consistent approach — always wrapping in ExceptionGroup, even for a single error — would let callers use a single except ExceptionGroup branch and introspect the contained exceptions uniformly. The PR description's own "Robust error handling" bullet even describes the asymmetry as a feature, but the caller-side complexity it creates is real. At minimum the docstring should warn that callers must handle both exception shapes.


Comment on lines +374 to +381
) -> ToolExecutionResult:
async with self._lock:
self._current += 1
self._peak = max(self._peak, self._current)
await asyncio.sleep(arguments["duration"])
async with self._lock:
self._current -= 1
return ToolExecutionResult(content=str(self._peak))

_ConcurrencyTrackingTool.peak value returned as content may be stale

Inside execute, the content returned is str(self._peak) captured at the moment the individual task finishes, not after all concurrent tasks have exited. Because sibling tasks are still running when a given task releases the lock, the _peak value written to content may be lower than the true final peak:

await asyncio.sleep(arguments["duration"])      # other tasks still running
async with self._lock:
    self._current -= 1
return ToolExecutionResult(content=str(self._peak))   # stale peak

The tests don't assert on result.content for the tracking tool (they correctly read concurrency_tracking_tool.peak after all invocations), so this doesn't affect test correctness today. However, if content is ever used as evidence of peak concurrency elsewhere, it will silently give wrong numbers. Consider capturing peak from the shared state after all tasks complete, or removing it from the result content altogether since the fixture exposes .peak directly.


Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Support parallel tool execution in ToolInvoker.invoke_all

2 participants