feat: parallel tool execution in ToolInvoker.invoke_all (#137)
Conversation
Replace sequential list comprehension with asyncio.TaskGroup for concurrent tool execution. Add optional max_concurrency semaphore. Fatal errors (MemoryError/RecursionError) are collected via guarded wrapper and re-raised after all tasks complete. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 14 findings addressed:

Source (invoker.py):
- Fix module docstring: BaseException subclasses propagate uncaught, not "after logging"
- Replace list[ToolResult | None] with dict[int, ToolResult] to eliminate type: ignore
- Use contextlib.nullcontext to remove duplicated invoke call in _run_guarded
- Add logging in _run_guarded catch block for orchestration-level context
- Extend _run_guarded docstring to document BaseException propagation
- Clarify Raises docstring wording for MemoryError/RecursionError
- Change _no_remote_retrieve return type to Never
- Change logger.exception to logger.error in helper methods
- Add safety comment for shared mutable state in asyncio context

Tests (test_invoker.py):
- Add test for mixed fatal + successful tool calls
- Verify peak=1 in test_max_concurrency_one_sequential via tracking tool
- Strengthen peak > 1 assertion to peak >= 3
- Widen timing bound from 0.25s to 0.28s for CI resilience

Docs (DESIGN_SPEC.md):
- Fix stale "sequential execution" comment in §15.3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (3)
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
ToolInvoker.invoke_all was changed from sequential to concurrent execution using asyncio.TaskGroup with an optional max_concurrency limit; recoverable ToolResult errors no longer cancel siblings, while non-recoverable errors are collected and re-raised (single as bare exception, multiple as ExceptionGroup). New observability events and concurrency tests were added.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant TG as TaskGroup
    participant Sem as Semaphore
    participant Tools as Tools
    participant Collector as ErrorCollector
    Client->>Invoker: invoke_all(tool_calls, max_concurrency=2)
    Invoker->>Sem: create Semaphore(2) (if bounded)
    Invoker->>TG: create TaskGroup
    par Concurrent execution
        loop per tool_call
            TG->>Invoker: create _run_guarded task
            Invoker->>Sem: acquire (if sem present)
            Sem-->>Invoker: acquired
            Invoker->>Tools: invoke(tool_call)
            alt success
                Tools-->>Invoker: ToolResult(success)
                Invoker->>Invoker: store result[idx]
            else recoverable error
                Tools-->>Invoker: ToolResult(is_error=True)
                Invoker->>Invoker: store error result[idx]
            else non-recoverable error
                Tools-->>Collector: MemoryError/RecursionError
                Collector-->>Invoker: record fatal error
                Invoker->>Invoker: store placeholder result[idx]
            end
            Invoker->>Sem: release (if used)
        end
    end
    TG-->>Invoker: all tasks complete
    alt fatal errors collected
        Collector->>Invoker: fatal_errors
        alt single error
            Invoker-->>Client: raise bare exception
        else multiple errors
            Invoker-->>Client: raise ExceptionGroup(fatal_errors)
        end
    else no fatal errors
        Invoker-->>Client: return tuple(results in input order)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR migrates ToolInvoker.invoke_all from sequential execution to concurrent execution using asyncio.TaskGroup, with an optional max_concurrency semaphore parameter. Fatal errors (MemoryError, RecursionError) are collected by a _run_guarded wrapper and re-raised after all tasks complete.
Changes:
- Replaced the sequential loop in `invoke_all` with `asyncio.TaskGroup` for structured concurrency, adding a `_run_guarded` helper to collect fatal errors without canceling siblings
- Added a `max_concurrency` parameter backed by `asyncio.Semaphore` with input validation
- Added comprehensive test suites for concurrency behavior, bounded parallelism, and edge cases, plus updated design docs
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/ai_company/tools/invoker.py` | Core implementation: `_run_guarded` wrapper, concurrent `invoke_all` with TaskGroup and semaphore, `_no_remote_retrieve` return type fix, logging corrections |
| `src/ai_company/observability/events/tool.py` | New event constants for `invoke_all` start/complete logging |
| `tests/unit/tools/conftest.py` | New `_DelayTool`, `_ConcurrencyTrackingTool` fixtures and `concurrency_invoker` fixture |
| `tests/unit/tools/test_invoker.py` | New test classes for concurrency, bounded parallelism, and edge cases |
| `DESIGN_SPEC.md` | Updated docs to reflect concurrent execution model and "Adopted" status |
tests/unit/tools/test_invoker.py (Outdated)
```python
# Sequential would take >= 0.3s; generous bound for CI overhead
assert elapsed < 0.28
```
The timing assertion `elapsed < 0.28` is unreliable: three tasks each sleeping 0.1s concurrently should take ~0.1s, but 0.28 is dangerously close to the 0.3s sequential threshold, so on a heavily loaded CI runner asyncio scheduling overhead could push a genuinely concurrent run past the bound. Consider asserting `elapsed < 0.25` instead: with a theoretical concurrent time of ~0.1s there is still ample headroom for overhead, and the bound sits further from the 0.3s sequential threshold, so it more convincingly proves concurrency.
```suggestion
# Sequential would take >= 0.3s; concurrent should be well under this
assert elapsed < 0.25
```
```diff
@@ -1,14 +1,19 @@
 """Tests for ToolInvoker."""

+import time
```
time is imported but only time.monotonic() is used in test_concurrent_faster_than_sequential. For async tests, consider using the event loop's time (asyncio.get_event_loop().time()) for consistency, though time.monotonic() works fine for wall-clock measurements. This is a minor point — just noting the import is used only in one test.
```python
except (MemoryError, RecursionError) as exc:
    logger.warning(
        TOOL_INVOKE_NON_RECOVERABLE,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        index=index,
        error=f"{type(exc).__name__}: {exc}",
    )
    fatal_errors.append(exc)
```
In _run_guarded, MemoryError and RecursionError are caught and silently collected, but self.invoke() already catches these same exceptions, logs them with logger.exception, and re-raises. This means the _run_guarded catch will see them after they've already been logged with a full traceback via logger.exception in _execute_tool / _validate_params. The logger.warning here then logs a second time at a lower severity. This is intentional dual-logging (once at the origin, once at the orchestration level), but the orchestration log at warning level is misleading — a non-recoverable error is more severe than a warning. Consider using logger.error to be consistent with the severity of the event.
```python
logger.info(
    TOOL_INVOKE_ALL_COMPLETE,
    count=len(calls),
    fatal_count=len(fatal_errors),
)

if fatal_errors:
    if len(fatal_errors) == 1:
        raise fatal_errors[0]
```
When fatal_errors is non-empty, the TOOL_INVOKE_ALL_COMPLETE log at info level fires right before raising. This is fine for observability, but note that when fatal errors occur, results may be incomplete (missing entries for fatal slots). If a caller ever catches the raised exception and somehow tries to inspect partial results, they won't have access. This is acceptable given the current design, but consider documenting that no partial results are available on fatal error.
```python
@pytest.fixture
def concurrency_tracking_tool() -> _ConcurrencyTrackingTool:
    """Standalone tracking tool for direct peak inspection."""
    return _ConcurrencyTrackingTool()
```
The concurrency_tracking_tool fixture creates a new instance, and the concurrency_invoker fixture receives it as a parameter and includes it in the tool list. However, the _ConcurrencyTrackingTool creates its asyncio.Lock() at __init__ time. If tests run under different event loops (e.g., with pytest-asyncio creating a new loop per test), the lock will be bound to the wrong loop in Python 3.9 (though this is fine in 3.10+ where locks are not bound to a loop). More importantly, the reset() method is never called between tests — since the fixture has no explicit scope and defaults to function scope, a new instance is created per test, so this is actually fine. The reset() method appears to be dead code.
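If lock-to-loop binding were ever a concern (for example on Python 3.9, or if a tool instance outlived an event loop), one workaround is creating the lock lazily inside the running loop. A sketch with an illustrative `LazyLockTool`, not the project's fixture:

```python
import asyncio


class LazyLockTool:
    """Creates its asyncio.Lock on first use, inside the running event loop."""

    def __init__(self) -> None:
        self._lock: asyncio.Lock | None = None

    @property
    def lock(self) -> asyncio.Lock:
        if self._lock is None:
            # instantiated lazily, so it binds to whichever loop is running now
            self._lock = asyncio.Lock()
        return self._lock

    async def execute(self) -> str:
        async with self.lock:
            return "ok"
```

`asyncio.run(LazyLockTool().execute())` returns `"ok"`; because the lock is built inside `execute`, each fresh instance works under whatever loop invokes it.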
```python
    for i in range(5)
]
await concurrency_invoker.invoke_all(calls)
assert concurrency_tracking_tool.peak >= 3
```
This assertion (peak >= 3) relies on the asyncio scheduler launching at least 3 of 5 tasks before any completes their 0.05s sleep. While this typically holds, it's not guaranteed — under extreme system load or slow CI, the scheduler might not start the third task before the first completes. This makes the test potentially flaky. Consider increasing the sleep duration (e.g., 0.1s) or lowering the threshold (e.g., peak >= 2) to make it more robust while still proving unbounded concurrency exceeds the bounded case.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/test_invoker.py`:
- Around line 445-464: The timing assertion in
test_concurrent_faster_than_sequential is too tight and can flake under CI;
update the assertion on elapsed (measured around concurrency_invoker.invoke_all)
to a more generous bound (e.g., assert elapsed < 0.5) or use a ratio-based check
(e.g., elapsed < 0.3 * 0.9) so the test still verifies concurrency without
failing due to scheduling variance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5a032285-ec51-49c6-b32c-6eaf04ce25fe
📒 Files selected for processing (5)
- DESIGN_SPEC.md
- src/ai_company/observability/events/tool.py
- src/ai_company/tools/invoker.py
- tests/unit/tools/conftest.py
- tests/unit/tools/test_invoker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations` in Python code — Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax (no parentheses) for multiple exception handling in Python 3.14 per PEP 758
All public functions and classes must have type hints; mypy strict mode is enforced
Docstrings must follow Google style and are required on all public classes and functions (enforced by ruff D rules)
Create new objects rather than mutating existing ones — use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement in non-Pydantic internal collections
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models for runtime state that evolves
Use Pydantic v2 BaseModel, model_validator, computed_field, and ConfigDict; use `@computed_field` for derived values instead of storing redundant fields
Use NotBlankStr from core.types for all identifier and name fields instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
Line length must not exceed 88 characters (ruff enforced)
Functions must be less than 50 lines and files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/tools/test_invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/tools/invoker.py, tests/unit/tools/conftest.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use fake model IDs and names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs — tests must not be coupled to external providers
Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Files:
tests/unit/tools/test_invoker.py, tests/unit/tools/conftest.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic must have: from ai_company.observability import get_logger then logger = get_logger(name); never use import logging or print()
Always use logger variable name (not _logger or log)
Use event name constants from ai_company.observability.events domain-specific modules instead of string literals
Always use structured logging with logger.info(EVENT, key=value) format; never use logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Files:
src/ai_company/observability/events/tool.py, src/ai_company/tools/invoker.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Use `threading.Event` instead of `time.sleep()` for thread synchronization in test code.
Applied to files:
tests/unit/tools/test_invoker.py, tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from ai_company.observability.events domain-specific modules instead of string literals
Applied to files:
src/ai_company/observability/events/tool.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
Applied to files:
src/ai_company/tools/invoker.py, DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use `async def` for async test functions; pytest-asyncio is configured with `asyncio_mode = "auto"`; mark async tests with `pytest.mark.asyncio` if needed
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Set asyncio_mode = 'auto' in pytest configuration; do not use manual pytest.mark.asyncio
Applied to files:
tests/unit/tools/conftest.py
🔇 Additional comments (14)
src/ai_company/observability/events/tool.py (1)
20-21: LGTM! The new event constants follow the established naming convention and are consistent with existing tool events.
tests/unit/tools/conftest.py (3)
316-340: LGTM! The `_DelayTool` correctly implements the async `execute` method and uses `asyncio.sleep` for non-blocking delays, making it suitable for testing concurrent execution.
343-384: LGTM! The `_ConcurrencyTrackingTool` correctly uses `asyncio.Lock` to protect the counter updates and track peak concurrency. The lock is appropriate here since `asyncio.Lock` is non-blocking within the async context.
387-405: LGTM! The fixtures are well-structured. Using `concurrency_tracking_tool` as a dependency in `concurrency_invoker` ensures the same tool instance is used for both invocation and peak inspection in tests.
DESIGN_SPEC.md (2)
1476-1477: LGTM! The documentation accurately describes the concurrent execution model, including the `max_concurrency` parameter, recoverable vs non-recoverable error handling, and the `ExceptionGroup` behavior for multiple fatal errors.
2248-2248: LGTM! The engineering conventions table correctly reflects the adopted parallel tool execution pattern with `asyncio.TaskGroup` and optional `max_concurrency` semaphore.
src/ai_company/tools/invoker.py (4)
5-7: LGTM! The module docstring accurately reflects the updated error handling semantics: recoverable errors return `ToolResult(is_error=True)`, non-recoverable errors are logged and re-raised, and `BaseException` subclasses propagate uncaught.
47-49: LGTM! The `Never` return type correctly documents that this function always raises and never returns normally.
328-355: LGTM! The `_run_guarded` wrapper correctly:
- Uses `nullcontext()` when no semaphore is provided, avoiding code duplication
- Catches only `MemoryError`/`RecursionError` to prevent `TaskGroup` cancellation
- Allows `BaseException` subclasses (`KeyboardInterrupt`, `CancelledError`) to propagate for proper cancellation semantics
- Stores results by index for order preservation
384-425: Validates `max_concurrency` and properly re-raises fatal errors. The implementation is correct:
- Validates `max_concurrency >= 1` with a clear error message
- Early return for empty input avoids unnecessary TaskGroup creation
- Uses `dict[int, ToolResult]` for type-safe result storage
- Re-raises a single fatal error bare, multiple as `ExceptionGroup`
- Results tuple preserves input order via index lookup
tests/unit/tools/test_invoker.py (4)
3-15: LGTM! The imports are appropriately organized, with `time` for timing assertions, `Iterator` in the `TYPE_CHECKING` block for type hints, and the tracking tool import for fixture type annotations.
489-557: LGTM! Comprehensive test coverage for error handling:
- Recoverable errors don't cancel siblings
- Single fatal error raises a bare exception
- Mixed fatal and success still raises
- Multiple fatal errors raise `ExceptionGroup` with the correct exception count
564-627: LGTM! The bounded concurrency tests correctly verify:
- `max_concurrency=1` enforces sequential execution (peak=1)
- `max_concurrency=2` bounds parallelism (peak≤2)
- `max_concurrency=None` allows unbounded parallelism
- Invalid values (0, negative) raise `ValueError`
630-667: LGTM! Edge case coverage is thorough:
- Single-element input
- Generator input (non-list iterable)
- Empty input with `max_concurrency` parameter
Greptile Summary
This PR migrates ToolInvoker.invoke_all from sequential to concurrent execution using asyncio.TaskGroup.
Key observations:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant invoke_all
    participant TaskGroup
    participant _run_guarded
    participant Semaphore
    participant invoke
    Caller->>invoke_all: invoke_all(tool_calls, max_concurrency=N)
    invoke_all->>invoke_all: validate max_concurrency, list(tool_calls)
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_START
    invoke_all->>TaskGroup: async with TaskGroup() as tg
    loop for each (idx, call)
        TaskGroup->>_run_guarded: tg.create_task(_run_guarded(idx, call, ...))
    end
    par concurrent execution
        _run_guarded->>Semaphore: async with semaphore (or nullcontext)
        Semaphore-->>_run_guarded: acquired
        _run_guarded->>invoke: await self.invoke(tool_call)
        alt recoverable error
            invoke-->>_run_guarded: ToolResult(is_error=True)
            _run_guarded->>_run_guarded: results[idx] = result
        else fatal error (MemoryError/RecursionError)
            invoke-->>_run_guarded: raises MemoryError/RecursionError
            _run_guarded->>_run_guarded: fatal_errors.append(exc)
        else BaseException (KeyboardInterrupt, CancelledError)
            invoke-->>_run_guarded: raises BaseException
            _run_guarded-->>TaskGroup: propagates → cancels siblings
        end
        _run_guarded->>Semaphore: release
    end
    TaskGroup-->>invoke_all: all tasks complete
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_COMPLETE (fatal_count)
    alt no fatal errors
        invoke_all-->>Caller: tuple(results[i] for i in range(len(calls)))
    else exactly 1 fatal error
        invoke_all-->>Caller: raise fatal_errors[0] (bare exception)
    else multiple fatal errors
        invoke_all-->>Caller: raise ExceptionGroup(fatal_errors)
    end
```
Last reviewed commit: 7583a1c
…, and Greptile

- Remove duplicate TOOL_INVOKE_NON_RECOVERABLE log in _run_guarded (upstream logger.exception already records full context + traceback)
- Use logger.exception in _unexpected_validation_result to preserve traceback for unexpected validation errors
- Relax timing assertion in test_concurrent_faster_than_sequential from 0.28s to 0.5s to prevent CI flakiness
- Rename test_empty_schema_skips_validation to test_no_schema_skips_validation (tests None, not empty dict)
- Fix misleading "atomic counter" docstring to "lock-guarded counter"
- Expand extended_invoker fixture docstring to enumerate tools
- Remove unused _ConcurrencyTrackingTool.reset() dead code
- Fix imprecise deepcopy test comment
- Fix misleading "generous bound" comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
logger.info(
    TOOL_INVOKE_ALL_COMPLETE,
    count=len(calls),
    fatal_count=len(fatal_errors),
)
```
info-level completion log emitted even when fatal errors will be raised
TOOL_INVOKE_ALL_COMPLETE is logged at info regardless of fatal_count. Any log aggregator or dashboard that counts TOOL_INVOKE_ALL_COMPLETE at info as a success metric will miscount — the event is emitted, and then an exception is immediately raised, making the operation an effective failure from the caller's perspective.
Consider using a conditional log level when fatal_errors is non-empty:
```suggestion
if fatal_errors:
    logger.warning(
        TOOL_INVOKE_ALL_COMPLETE,
        count=len(calls),
        fatal_count=len(fatal_errors),
    )
else:
    logger.info(
        TOOL_INVOKE_ALL_COMPLETE,
        count=len(calls),
        fatal_count=0,
    )
```
This ensures that monitoring systems observing TOOL_INVOKE_ALL_COMPLETE at info level accurately reflects clean completions only.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 406-410
How can I resolve this? If you propose a fix, please make it concise.

```python
if fatal_errors:
    if len(fatal_errors) == 1:
        raise fatal_errors[0]
    msg = "multiple non-recoverable tool errors"
    raise ExceptionGroup(msg, fatal_errors)
```
Asymmetric exception type makes caller handling error-prone
When exactly one fatal error is collected, invoke_all re-raises a bare MemoryError or RecursionError. When two or more occur, it raises an ExceptionGroup. This means every caller that needs to handle fatal errors from invoke_all must cover two structurally different exception types:
```python
try:
    results = await invoker.invoke_all(calls)
except (MemoryError, RecursionError):
    # single-error path
    ...
except ExceptionGroup as eg:
    # multi-error path
    ...
```
If a caller only writes the first `except` clause (which is the obvious, natural thing to do), it silently misses the `ExceptionGroup` case and the errors go unhandled.
A consistent approach — always wrapping in ExceptionGroup, even for a single error — would let callers use a single except ExceptionGroup branch and introspect the contained exceptions uniformly. The PR description's own "Robust error handling" bullet even describes the asymmetry as a feature, but the caller-side complexity it creates is real. At minimum the docstring should warn that callers must handle both exception shapes.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 412-416
A consistent approach — always wrapping in `ExceptionGroup`, even for a single error — would let callers use a single `except ExceptionGroup` branch and introspect the contained exceptions uniformly. The PR description's own "Robust error handling" bullet even describes the asymmetry as a feature, but the caller-side complexity it creates is real. At minimum the docstring should warn that callers must handle **both** exception shapes.
How can I resolve this? If you propose a fix, please make it concise.| ) -> ToolExecutionResult: | ||
| async with self._lock: | ||
| self._current += 1 | ||
| self._peak = max(self._peak, self._current) | ||
| await asyncio.sleep(arguments["duration"]) | ||
| async with self._lock: | ||
| self._current -= 1 | ||
| return ToolExecutionResult(content=str(self._peak)) |
_ConcurrencyTrackingTool.peak value returned as content may be stale
Inside execute, the content returned is str(self._peak) captured at the moment the individual task finishes, not after all concurrent tasks have exited. Because sibling tasks are still running when a given task releases the lock, the _peak value written to content may be lower than the true final peak:
```python
await asyncio.sleep(arguments["duration"])  # other tasks still running
async with self._lock:
    self._current -= 1
return ToolExecutionResult(content=str(self._peak))  # stale peak
```
The tests don't assert on `result.content` for the tracking tool (they correctly read `concurrency_tracking_tool.peak` after all invocations), so this doesn't affect test correctness today. However, if `content` is ever used as evidence of peak concurrency elsewhere, it will silently give wrong numbers. Consider capturing peak from the shared state after all tasks complete, or removing it from the result content altogether since the fixture exposes `.peak` directly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/tools/conftest.py
Line: 374-381
How can I resolve this? If you propose a fix, please make it concise.
Summary
- `ToolInvoker.invoke_all` now uses `asyncio.TaskGroup` for structured concurrency instead of sequential execution
- Optional `max_concurrency` parameter (via `asyncio.Semaphore`) bounds parallelism when needed
- Fatal errors (`MemoryError`, `RecursionError`) are collected via the `_run_guarded` wrapper and re-raised after all tasks complete (bare exception for one, `ExceptionGroup` for multiple)
- Results are assembled into `dict[int, ToolResult]` instead of `list[ToolResult | None]`, eliminating a `# type: ignore` and providing a fail-loud `KeyError` on missing slots

Changes

Source (`src/ai_company/tools/invoker.py`)

- `invoke_all` migrated from a sequential loop to `asyncio.TaskGroup`
- New `_run_guarded` helper using the `contextlib.nullcontext` pattern, with an extended docstring and orchestration-level logging
- `_no_remote_retrieve` return type changed to `Never`
- `logger.exception` changed to `logger.error` in helper methods (not called in an exception context)

Tests (`tests/unit/tools/test_invoker.py`, `conftest.py`)

- New `_DelayTool` and `_ConcurrencyTrackingTool` fixtures
- `TestInvokeAllConcurrency`: timing proof, order preservation, recoverable/fatal error handling, mixed fatal and successful calls
- `TestInvokeAllBounded`: semaphore enforcement (peak=1, peak≤2), unbounded peak≥3, validation
- `TestInvokeAllEdgeCases`: single call, generator input, empty input with `max_concurrency`

Docs (`DESIGN_SPEC.md`)

- Fixed the stale "sequential execution" comment in §15.3

Test Plan

- `uv run ruff check src/ tests/`: all checks passed
- `uv run mypy src/ tests/`: no issues in 210 files
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80`: 1830 passed, 94.97% coverage

Review Coverage
Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 14 findings implemented, 11 triaged as skip (pre-existing design decisions, over-engineering, correct-by-design behavior).
Closes #112