feat: parallel tool execution in ToolInvoker.invoke_all (#137)
Conversation
Replace sequential list comprehension with asyncio.TaskGroup for concurrent tool execution. Add optional max_concurrency semaphore. Fatal errors (MemoryError/RecursionError) are collected via guarded wrapper and re-raised after all tasks complete. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pre-reviewed by 10 agents, 14 findings addressed:

Source (invoker.py):
- Fix module docstring: BaseException subclasses propagate uncaught, not "after logging"
- Replace list[ToolResult | None] with dict[int, ToolResult] to eliminate type: ignore
- Use contextlib.nullcontext to remove duplicated invoke call in _run_guarded
- Add logging in _run_guarded catch block for orchestration-level context
- Extend _run_guarded docstring to document BaseException propagation
- Clarify Raises docstring wording for MemoryError/RecursionError
- Change _no_remote_retrieve return type to Never
- Change logger.exception to logger.error in helper methods
- Add safety comment for shared mutable state in asyncio context

Tests (test_invoker.py):
- Add test for mixed fatal + successful tool calls
- Verify peak=1 in test_max_concurrency_one_sequential via tracking tool
- Strengthen peak > 1 assertion to peak >= 3
- Widen timing bound from 0.25s to 0.28s for CI resilience

Docs (DESIGN_SPEC.md):
- Fix stale "sequential execution" comment in §15.3

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Caution: Review failed. The pull request is closed.
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (3)
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
ToolInvoker.invoke_all was changed from sequential to concurrent execution using asyncio.TaskGroup with an optional max_concurrency limit; recoverable ToolResult errors no longer cancel siblings, while non-recoverable errors are collected and re-raised (single as bare exception, multiple as ExceptionGroup). New observability events and concurrency tests were added.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant Invoker as ToolInvoker
    participant TG as TaskGroup
    participant Sem as Semaphore
    participant Tools as Tools
    participant Collector as ErrorCollector
    Client->>Invoker: invoke_all(tool_calls, max_concurrency=2)
    Invoker->>Sem: create Semaphore(2) (if bounded)
    Invoker->>TG: create TaskGroup
    par Concurrent execution
        loop per tool_call
            TG->>Invoker: create _run_guarded task
            Invoker->>Sem: acquire (if sem present)
            Sem-->>Invoker: acquired
            Invoker->>Tools: invoke(tool_call)
            alt success
                Tools-->>Invoker: ToolResult(success)
                Invoker->>Invoker: store result[idx]
            else recoverable error
                Tools-->>Invoker: ToolResult(is_error=True)
                Invoker->>Invoker: store error result[idx]
            else non-recoverable error
                Tools-->>Collector: MemoryError/RecursionError
                Collector-->>Invoker: record fatal error
                Invoker->>Invoker: store placeholder result[idx]
            end
            Invoker->>Sem: release (if used)
        end
    end
    TG-->>Invoker: all tasks complete
    alt fatal errors collected
        Collector->>Invoker: fatal_errors
        alt single error
            Invoker-->>Client: raise bare exception
        else multiple errors
            Invoker-->>Client: raise ExceptionGroup(fatal_errors)
        end
    else no fatal errors
        Invoker-->>Client: return tuple(results in input order)
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Pull request overview
This PR migrates ToolInvoker.invoke_all from sequential execution to concurrent execution using asyncio.TaskGroup, with an optional max_concurrency semaphore parameter. Fatal errors (MemoryError, RecursionError) are collected by a _run_guarded wrapper and re-raised after all tasks complete.
Changes:
- Replaced the sequential loop in `invoke_all` with `asyncio.TaskGroup` for structured concurrency, adding a `_run_guarded` helper to collect fatal errors without canceling siblings
- Added a `max_concurrency` parameter backed by `asyncio.Semaphore` with input validation
- Added comprehensive test suites for concurrency behavior, bounded parallelism, and edge cases, plus updated design docs
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| `src/ai_company/tools/invoker.py` | Core implementation: `_run_guarded` wrapper, concurrent `invoke_all` with TaskGroup and semaphore, `_no_remote_retrieve` return type fix, logging corrections |
| `src/ai_company/observability/events/tool.py` | New event constants for `invoke_all` start/complete logging |
| `tests/unit/tools/conftest.py` | New `_DelayTool`, `_ConcurrencyTrackingTool` fixtures and `concurrency_invoker` fixture |
| `tests/unit/tools/test_invoker.py` | New test classes for concurrency, bounded parallelism, and edge cases |
| `DESIGN_SPEC.md` | Updated docs to reflect concurrent execution model and "Adopted" status |
tests/unit/tools/test_invoker.py (Outdated)
```python
# Sequential would take >= 0.3s; generous bound for CI overhead
assert elapsed < 0.28
```
The timing assertion `elapsed < 0.28` is unreliable: three tasks each sleeping 0.1s concurrently should take ~0.1s, but 0.28 is dangerously close to the 0.3s sequential threshold, so on a heavily loaded CI runner asyncio scheduling overhead could push a genuinely concurrent run past the bound. Consider asserting `elapsed < 0.25` instead: with a theoretical concurrent time of ~0.1s there is still ample headroom for overhead, and the bound sits further from the 0.3s sequential threshold, so it more convincingly proves concurrency.
```suggestion
# Sequential would take >= 0.3s; concurrent should be well under this
assert elapsed < 0.25
```
```diff
@@ -1,14 +1,19 @@
 """Tests for ToolInvoker."""

+import time
```
time is imported but only time.monotonic() is used in test_concurrent_faster_than_sequential. For async tests, consider using the event loop's time (asyncio.get_event_loop().time()) for consistency, though time.monotonic() works fine for wall-clock measurements. This is a minor point — just noting the import is used only in one test.
```python
except (MemoryError, RecursionError) as exc:
    logger.warning(
        TOOL_INVOKE_NON_RECOVERABLE,
        tool_call_id=tool_call.id,
        tool_name=tool_call.name,
        index=index,
        error=f"{type(exc).__name__}: {exc}",
    )
    fatal_errors.append(exc)
```
In _run_guarded, MemoryError and RecursionError are caught and silently collected, but self.invoke() already catches these same exceptions, logs them with logger.exception, and re-raises. This means the _run_guarded catch will see them after they've already been logged with a full traceback via logger.exception in _execute_tool / _validate_params. The logger.warning here then logs a second time at a lower severity. This is intentional dual-logging (once at the origin, once at the orchestration level), but the orchestration log at warning level is misleading — a non-recoverable error is more severe than a warning. Consider using logger.error to be consistent with the severity of the event.
```python
logger.info(
    TOOL_INVOKE_ALL_COMPLETE,
    count=len(calls),
    fatal_count=len(fatal_errors),
)

if fatal_errors:
    if len(fatal_errors) == 1:
        raise fatal_errors[0]
```
When fatal_errors is non-empty, the TOOL_INVOKE_ALL_COMPLETE log at info level fires right before raising. This is fine for observability, but note that when fatal errors occur, results may be incomplete (missing entries for fatal slots). If a caller ever catches the raised exception and somehow tries to inspect partial results, they won't have access. This is acceptable given the current design, but consider documenting that no partial results are available on fatal error.
```python
@pytest.fixture
def concurrency_tracking_tool() -> _ConcurrencyTrackingTool:
    """Standalone tracking tool for direct peak inspection."""
    return _ConcurrencyTrackingTool()
```
The concurrency_tracking_tool fixture creates a new instance, and the concurrency_invoker fixture receives it as a parameter and includes it in the tool list. However, the _ConcurrencyTrackingTool creates its asyncio.Lock() at __init__ time. If tests run under different event loops (e.g., with pytest-asyncio creating a new loop per test), the lock will be bound to the wrong loop in Python 3.9 (though this is fine in 3.10+ where locks are not bound to a loop). More importantly, the reset() method is never called between tests — since the fixture has no explicit scope and defaults to function scope, a new instance is created per test, so this is actually fine. The reset() method appears to be dead code.
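If lock-to-loop binding were ever a concern (for example on Python 3.9, or if a tool instance outlived an event loop), one workaround is creating the lock lazily inside the running loop. A sketch with an illustrative `LazyLockTool`, not the project's fixture:

```python
import asyncio


class LazyLockTool:
    """Creates its asyncio.Lock on first use, inside the running event loop."""

    def __init__(self) -> None:
        self._lock: asyncio.Lock | None = None

    @property
    def lock(self) -> asyncio.Lock:
        if self._lock is None:
            # instantiated lazily, so it binds to whichever loop is running now
            self._lock = asyncio.Lock()
        return self._lock

    async def execute(self) -> str:
        async with self.lock:
            return "ok"
```

`asyncio.run(LazyLockTool().execute())` returns `"ok"`; because the lock is built inside `execute`, each fresh instance works under whatever loop invokes it.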
```python
    for i in range(5)
]
await concurrency_invoker.invoke_all(calls)
assert concurrency_tracking_tool.peak >= 3
```
This assertion (peak >= 3) relies on the asyncio scheduler launching at least 3 of 5 tasks before any completes their 0.05s sleep. While this typically holds, it's not guaranteed — under extreme system load or slow CI, the scheduler might not start the third task before the first completes. This makes the test potentially flaky. Consider increasing the sleep duration (e.g., 0.1s) or lowering the threshold (e.g., peak >= 2) to make it more robust while still proving unbounded concurrency exceeds the bounded case.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/tools/test_invoker.py`:
- Around line 445-464: The timing assertion in
test_concurrent_faster_than_sequential is too tight and can flake under CI;
update the assertion on elapsed (measured around concurrency_invoker.invoke_all)
to a more generous bound (e.g., assert elapsed < 0.5) or use a ratio-based check
(e.g., elapsed < 0.3 * 0.9) so the test still verifies concurrency without
failing due to scheduling variance.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5a032285-ec51-49c6-b32c-6eaf04ce25fe
📒 Files selected for processing (5)
- DESIGN_SPEC.md
- src/ai_company/observability/events/tool.py
- src/ai_company/tools/invoker.py
- tests/unit/tools/conftest.py
- tests/unit/tools/test_invoker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations` in Python code — Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax (no parentheses) for multiple exception handling in Python 3.14 per PEP 758
All public functions and classes must have type hints; mypy strict mode is enforced
Docstrings must follow Google style and are required on all public classes and functions (enforced by ruff D rules)
Create new objects rather than mutating existing ones — use copy.deepcopy() at construction and MappingProxyType wrapping for read-only enforcement in non-Pydantic internal collections
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models for runtime state that evolves
Use Pydantic v2 BaseModel, model_validator, computed_field, and ConfigDict; use `@computed_field` for derived values instead of storing redundant fields
Use NotBlankStr from core.types for all identifier and name fields instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
Line length must not exceed 88 characters (ruff enforced)
Functions must be less than 50 lines and files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/tools/test_invoker.py, src/ai_company/observability/events/tool.py, src/ai_company/tools/invoker.py, tests/unit/tools/conftest.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use fake model IDs and names in tests (e.g. test-haiku-001, test-provider), never real vendor model IDs — tests must not be coupled to external providers
Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Files:
tests/unit/tools/test_invoker.py, tests/unit/tools/conftest.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic must have: from ai_company.observability import get_logger then logger = get_logger(name); never use import logging or print()
Always use logger variable name (not _logger or log)
Use event name constants from ai_company.observability.events domain-specific modules instead of string literals
Always use structured logging with logger.info(EVENT, key=value) format; never use logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, and entry/exit of key functions
Files:
src/ai_company/observability/events/tool.py, src/ai_company/tools/invoker.py
🧠 Learnings (10)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Use `threading.Event` instead of `time.sleep()` for thread synchronization in test code.
Applied to files:
tests/unit/tools/test_invoker.py, tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from ai_company.observability.events domain-specific modules instead of string literals
Applied to files:
src/ai_company/observability/events/tool.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Applies to **/*.py : Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls) over bare create_task
Applied to files:
src/ai_company/tools/invoker.py, DESIGN_SPEC.md
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use `async def` for async test functions; pytest-asyncio is configured with `asyncio_mode = "auto"`; mark async tests with `pytest.mark.asyncio` if needed
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/tools/conftest.py
📚 Learning: 2026-03-06T13:28:37.851Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-06T13:28:37.851Z
Learning: Set asyncio_mode = 'auto' in pytest configuration; do not use manual pytest.mark.asyncio
Applied to files:
tests/unit/tools/conftest.py
🔇 Additional comments (14)
src/ai_company/observability/events/tool.py (1)
20-21: LGTM! The new event constants follow the established naming convention and are consistent with existing tool events.
tests/unit/tools/conftest.py (3)
316-340: LGTM! The `_DelayTool` correctly implements the async `execute` method and uses `asyncio.sleep` for non-blocking delays, making it suitable for testing concurrent execution.
343-384: LGTM! The `_ConcurrencyTrackingTool` correctly uses `asyncio.Lock` to protect the counter updates and track peak concurrency. The lock is appropriate here since `asyncio.Lock` is non-blocking within the async context.
387-405: LGTM! The fixtures are well-structured. Using `concurrency_tracking_tool` as a dependency in `concurrency_invoker` ensures the same tool instance is used for both invocation and peak inspection in tests.
DESIGN_SPEC.md (2)
1476-1477: LGTM! The documentation accurately describes the concurrent execution model, including the `max_concurrency` parameter, recoverable vs non-recoverable error handling, and the `ExceptionGroup` behavior for multiple fatal errors.
2248-2248: LGTM! The engineering conventions table correctly reflects the adopted parallel tool execution pattern with `asyncio.TaskGroup` and optional `max_concurrency` semaphore.
src/ai_company/tools/invoker.py (4)
5-7: LGTM! The module docstring accurately reflects the updated error handling semantics: recoverable errors return `ToolResult(is_error=True)`, non-recoverable errors are logged and re-raised, and `BaseException` subclasses propagate uncaught.
47-49: LGTM! The `Never` return type correctly documents that this function always raises and never returns normally.
328-355: LGTM! The `_run_guarded` wrapper correctly:
- Uses `nullcontext()` when no semaphore is provided, avoiding code duplication
- Catches only `MemoryError`/`RecursionError` to prevent `TaskGroup` cancellation
- Allows `BaseException` subclasses (`KeyboardInterrupt`, `CancelledError`) to propagate for proper cancellation semantics
- Stores results by index for order preservation
384-425: Validates `max_concurrency` and properly re-raises fatal errors. The implementation is correct:
- Validates `max_concurrency >= 1` with a clear error message
- Early return for empty input avoids unnecessary TaskGroup creation
- Uses `dict[int, ToolResult]` for type-safe result storage
- Re-raises a single fatal error bare, multiple as `ExceptionGroup`
- Results tuple preserves input order via index lookup
tests/unit/tools/test_invoker.py (4)
3-15: LGTM! The imports are appropriately organized, with `time` for timing assertions, `Iterator` in the `TYPE_CHECKING` block for type hints, and the tracking tool import for fixture type annotations.
489-557: LGTM! Comprehensive test coverage for error handling:
- Recoverable errors don't cancel siblings
- Single fatal error raises a bare exception
- Mixed fatal and success still raises
- Multiple fatal errors raise `ExceptionGroup` with the correct exception count
564-627: LGTM! The bounded concurrency tests correctly verify:
- `max_concurrency=1` enforces sequential execution (peak=1)
- `max_concurrency=2` bounds parallelism (peak≤2)
- `max_concurrency=None` allows unbounded parallelism
- Invalid values (0, negative) raise `ValueError`
630-667: LGTM! Edge case coverage is thorough:
- Single-element input
- Generator input (non-list iterable)
- Empty input with `max_concurrency` parameter
Greptile Summary
This PR migrates ToolInvoker.invoke_all from sequential to concurrent execution using asyncio.TaskGroup.
Key observations:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant invoke_all
    participant TaskGroup
    participant _run_guarded
    participant Semaphore
    participant invoke
    Caller->>invoke_all: invoke_all(tool_calls, max_concurrency=N)
    invoke_all->>invoke_all: validate max_concurrency, list(tool_calls)
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_START
    invoke_all->>TaskGroup: async with TaskGroup() as tg
    loop for each (idx, call)
        TaskGroup->>_run_guarded: tg.create_task(_run_guarded(idx, call, ...))
    end
    par concurrent execution
        _run_guarded->>Semaphore: async with semaphore (or nullcontext)
        Semaphore-->>_run_guarded: acquired
        _run_guarded->>invoke: await self.invoke(tool_call)
        alt recoverable error
            invoke-->>_run_guarded: ToolResult(is_error=True)
            _run_guarded->>_run_guarded: results[idx] = result
        else fatal error (MemoryError/RecursionError)
            invoke-->>_run_guarded: raises MemoryError/RecursionError
            _run_guarded->>_run_guarded: fatal_errors.append(exc)
        else BaseException (KeyboardInterrupt, CancelledError)
            invoke-->>_run_guarded: raises BaseException
            _run_guarded-->>TaskGroup: propagates → cancels siblings
        end
        _run_guarded->>Semaphore: release
    end
    TaskGroup-->>invoke_all: all tasks complete
    invoke_all->>invoke_all: log TOOL_INVOKE_ALL_COMPLETE (fatal_count)
    alt no fatal errors
        invoke_all-->>Caller: tuple(results[i] for i in range(len(calls)))
    else exactly 1 fatal error
        invoke_all-->>Caller: raise fatal_errors[0] (bare exception)
    else multiple fatal errors
        invoke_all-->>Caller: raise ExceptionGroup(fatal_errors)
    end
```
Last reviewed commit: 7583a1c
…, and Greptile

- Remove duplicate TOOL_INVOKE_NON_RECOVERABLE log in _run_guarded (upstream logger.exception already records full context + traceback)
- Use logger.exception in _unexpected_validation_result to preserve traceback for unexpected validation errors
- Relax timing assertion in test_concurrent_faster_than_sequential from 0.28s to 0.5s to prevent CI flakiness
- Rename test_empty_schema_skips_validation to test_no_schema_skips_validation (tests None, not empty dict)
- Fix misleading "atomic counter" docstring to "lock-guarded counter"
- Expand extended_invoker fixture docstring to enumerate tools
- Remove unused _ConcurrencyTrackingTool.reset() dead code
- Fix imprecise deepcopy test comment
- Fix misleading "generous bound" comment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
logger.info(
    TOOL_INVOKE_ALL_COMPLETE,
    count=len(calls),
    fatal_count=len(fatal_errors),
)
```
info-level completion log emitted even when fatal errors will be raised
TOOL_INVOKE_ALL_COMPLETE is logged at info regardless of fatal_count. Any log aggregator or dashboard that counts TOOL_INVOKE_ALL_COMPLETE at info as a success metric will miscount — the event is emitted, and then an exception is immediately raised, making the operation an effective failure from the caller's perspective.
Consider using a conditional log level when fatal_errors is non-empty:
```suggestion
if fatal_errors:
    logger.warning(
        TOOL_INVOKE_ALL_COMPLETE,
        count=len(calls),
        fatal_count=len(fatal_errors),
    )
else:
    logger.info(
        TOOL_INVOKE_ALL_COMPLETE,
        count=len(calls),
        fatal_count=0,
    )
```
This ensures that monitoring systems observing TOOL_INVOKE_ALL_COMPLETE at info level accurately reflects clean completions only.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 406-410
How can I resolve this? If you propose a fix, please make it concise.

```python
if fatal_errors:
    if len(fatal_errors) == 1:
        raise fatal_errors[0]
    msg = "multiple non-recoverable tool errors"
    raise ExceptionGroup(msg, fatal_errors)
```
Asymmetric exception type makes caller handling error-prone
When exactly one fatal error is collected, invoke_all re-raises a bare MemoryError or RecursionError. When two or more occur, it raises an ExceptionGroup. This means every caller that needs to handle fatal errors from invoke_all must cover two structurally different exception types:
```python
try:
    results = await invoker.invoke_all(calls)
except (MemoryError, RecursionError):
    # single-error path
    ...
except ExceptionGroup as eg:
    # multi-error path
    ...
```
If a caller only writes the first `except` clause (which is the obvious, natural thing to do), it silently misses the `ExceptionGroup` case and the errors go unhandled.
A consistent approach — always wrapping in ExceptionGroup, even for a single error — would let callers use a single except ExceptionGroup branch and introspect the contained exceptions uniformly. The PR description's own "Robust error handling" bullet even describes the asymmetry as a feature, but the caller-side complexity it creates is real. At minimum the docstring should warn that callers must handle both exception shapes.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/tools/invoker.py
Line: 412-416
A consistent approach — always wrapping in `ExceptionGroup`, even for a single error — would let callers use a single `except ExceptionGroup` branch and introspect the contained exceptions uniformly. The PR description's own "Robust error handling" bullet even describes the asymmetry as a feature, but the caller-side complexity it creates is real. At minimum the docstring should warn that callers must handle **both** exception shapes.
How can I resolve this? If you propose a fix, please make it concise.| ) -> ToolExecutionResult: | ||
| async with self._lock: | ||
| self._current += 1 | ||
| self._peak = max(self._peak, self._current) | ||
| await asyncio.sleep(arguments["duration"]) | ||
| async with self._lock: | ||
| self._current -= 1 | ||
| return ToolExecutionResult(content=str(self._peak)) |
_ConcurrencyTrackingTool.peak value returned as content may be stale
Inside execute, the content returned is str(self._peak) captured at the moment the individual task finishes, not after all concurrent tasks have exited. Because sibling tasks are still running when a given task releases the lock, the _peak value written to content may be lower than the true final peak:
```python
await asyncio.sleep(arguments["duration"])  # other tasks still running
async with self._lock:
    self._current -= 1
return ToolExecutionResult(content=str(self._peak))  # stale peak
```
The tests don't assert on `result.content` for the tracking tool (they correctly read `concurrency_tracking_tool.peak` after all invocations), so this doesn't affect test correctness today. However, if `content` is ever used as evidence of peak concurrency elsewhere, it will silently give wrong numbers. Consider capturing peak from the shared state after all tasks complete, or removing it from the result content altogether since the fixture exposes `.peak` directly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tests/unit/tools/conftest.py
Line: 374-381
How can I resolve this? If you propose a fix, please make it concise.
Summary
- `ToolInvoker.invoke_all` now uses `asyncio.TaskGroup` for structured concurrency instead of sequential execution
- Optional `max_concurrency` parameter (via `asyncio.Semaphore`) bounds parallelism when needed
- Fatal errors (`MemoryError`, `RecursionError`) are collected via the `_run_guarded` wrapper and re-raised after all tasks complete (bare exception for one, `ExceptionGroup` for multiple)
- Results are assembled into `dict[int, ToolResult]` instead of `list[ToolResult | None]`, eliminating a `# type: ignore` and providing a fail-loud `KeyError` on missing slots

Changes

Source (`src/ai_company/tools/invoker.py`)

- `invoke_all` migrated from a sequential loop to `asyncio.TaskGroup`
- New `_run_guarded` helper using the `contextlib.nullcontext` pattern, with an extended docstring and orchestration-level logging
- `_no_remote_retrieve` return type changed to `Never`
- `logger.exception` changed to `logger.error` in helper methods (not called in an exception context)

Tests (`tests/unit/tools/test_invoker.py`, `conftest.py`)

- New `_DelayTool` and `_ConcurrencyTrackingTool` fixtures
- `TestInvokeAllConcurrency`: timing proof, order preservation, recoverable/fatal error handling, mixed fatal and successful calls
- `TestInvokeAllBounded`: semaphore enforcement (peak=1, peak≤2), unbounded peak≥3, validation
- `TestInvokeAllEdgeCases`: single call, generator input, empty input with `max_concurrency`

Docs (`DESIGN_SPEC.md`)

- Fixed the stale "sequential execution" comment in §15.3

Test Plan

- `uv run ruff check src/ tests/`: all checks passed
- `uv run mypy src/ tests/`: no issues in 210 files
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80`: 1830 passed, 94.97% coverage

Review Coverage
Pre-reviewed by 10 agents: code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency. 14 findings implemented, 11 triaged as skip (pre-existing design decisions, over-engineering, correct-by-design behavior).
Closes #112