
test: add provider adapter integration tests #90

Merged
Aureliolo merged 2 commits into main from test/provider-integration-tests
Mar 1, 2026

Conversation

@Aureliolo
Owner

Summary

Test files

| File | Tests | Coverage |
| --- | --- | --- |
| test_anthropic_pipeline.py | 13 | Config → registry → complete/stream, alias resolution, cost computation, streaming |
| test_openrouter_pipeline.py | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| test_ollama_pipeline.py | 4 | No api_key, localhost base_url, zero-cost models |
| test_error_scenarios.py | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| test_tool_calling_pipeline.py | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| conftest.py | — | Config factories, real ModelResponse builders, stream helpers |

Verification

  • ruff check — all passed
  • ruff format — all formatted
  • mypy — 0 errors (7 files)
  • pytest — 1331 total tests pass, 94.49% coverage (80% required)

Closes #5

Test plan

  • CI passes (lint + type-check + test + coverage)
  • 39 integration tests pass under pytest -m integration
  • No regressions in existing 1292 unit tests
  • Coverage remains above 80% threshold

39 integration tests exercising the full provider pipeline
(config → registry → driver.complete/stream) with real litellm
ModelResponse objects mocked at the acompletion level.

- test_anthropic_pipeline: 13 tests (alias resolution, cost, streaming)
- test_openrouter_pipeline: 5 tests (base_url, model prefix, multi-model)
- test_ollama_pipeline: 4 tests (no api_key, localhost, zero-cost)
- test_error_scenarios: 9 tests (rate limit, auth, timeout, connection)
- test_tool_calling_pipeline: 8 tests (single/multi tool calls, streaming)
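The mocking approach described above, patching at the `acompletion` boundary so the full mapping pipeline still runs, can be sketched as follows. The `Driver` class and module-level `acompletion` stub here are illustrative stand-ins, not the project's actual code.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch


async def acompletion(**kwargs) -> SimpleNamespace:
    """Stand-in for litellm.acompletion; tests patch this, never call it."""
    raise RuntimeError("real network call; should be patched in tests")


class Driver:
    """Toy provider driver that awaits the async completion entry point."""

    async def complete(self, messages: list[dict]) -> str:
        response = await acompletion(messages=messages)
        return response.choices[0].message.content


async def run_test() -> str:
    fake = SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="Hi!"))]
    )
    # Patch the module-level symbol the driver looks up at call time.
    with patch(
        f"{__name__}.acompletion", new_callable=AsyncMock, return_value=fake
    ) as mock:
        result = await Driver().complete([{"role": "user", "content": "Hello"}])
    assert mock.await_count == 1  # pipeline reached the (mocked) API exactly once
    return result


print(asyncio.run(run_test()))  # Hi!
```

Patching only the outermost async call keeps everything between the public driver API and the provider boundary under test, which is what makes these integration tests rather than unit tests.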
Copilot AI review requested due to automatic review settings March 1, 2026 16:46
@coderabbitai

coderabbitai bot commented Mar 1, 2026


📥 Commits

Reviewing files that changed from the base of the PR and between 22ec82f and 0531eed.

📒 Files selected for processing (6)
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py


📝 Walkthrough

The PR introduces a comprehensive integration test suite for the provider adapter layer, covering end-to-end pipelines for multiple providers (Anthropic, OpenRouter, Ollama), error mapping scenarios, streaming behavior, and tool-calling functionality using mocked litellm responses.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Infrastructure: `tests/integration/providers/__init__.py`, `tests/integration/providers/conftest.py` | Module docstring and comprehensive pytest fixtures, including provider config factories (anthropic, openrouter, ollama), ModelResponse builders for streaming/non-streaming scenarios, async iterator helpers, and standard message/tool definition fixtures. |
| Provider Pipeline Tests: `test_anthropic_pipeline.py`, `test_ollama_pipeline.py`, `test_openrouter_pipeline.py` | End-to-end integration tests for provider drivers validating configuration resolution, API key/base URL forwarding, cost calculation, finish reason mapping, usage token propagation, and model alias resolution through patched litellm calls. |
| Error Mapping Tests: `test_error_scenarios.py` | Cross-provider error scenario tests validating LiteLLM exception mapping to provider errors (RateLimitError, AuthenticationError, Timeout, ConnectionError, InternalServerError) with metadata like is_retryable and retry_after, covering both non-streaming and streaming paths. |
| Tool Calling Tests: `test_tool_calling_pipeline.py` | Integration tests for end-to-end tool-calling behavior including single/multiple tool call extraction, argument parsing, streaming tool-call accumulation with TOOL_CALL_DELTA events, mixed content/tool-call streaming, and multi-turn conversations with tool results. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 96.72%, below the required threshold of 100.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test: add provider adapter integration tests' accurately summarizes the main change: adding integration tests for the provider adapter layer, as confirmed by the detailed changeset. |
| Description check | ✅ Passed | The description clearly relates to the changeset, providing a structured overview of 39 integration tests across 5 test files and conftest.py, verification results, and a reference to issue #5. |
| Linked Issues check | ✅ Passed | The PR fulfills the primary acceptance criterion from issue #5 by implementing integration tests with mock/recorded API responses, covering all major error scenarios and provider pipelines. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on adding integration tests for the provider adapter layer. No unrelated code modifications, refactoring, or feature changes are present. |



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing suite by introducing 39 new integration tests for the provider adapter layer. These tests ensure the robustness and correctness of the LLM integration across various providers and scenarios, including streaming, error handling, and tool calling. By mocking at the litellm.acompletion level with real ModelResponse objects, the tests accurately reflect the system's behavior, providing high confidence in the adapter's functionality.

Highlights

  • Comprehensive Integration Test Suite: Added 39 new integration tests for the provider adapter layer, fulfilling a key acceptance criterion from issue Implement provider adapter layer (direct API or via LiteLLM) #5.
  • Realistic Mocking Strategy: Implemented tests to mock at the litellm.acompletion level using real litellm.ModelResponse objects, ensuring realistic attribute access paths through core processing functions.
  • Provider-Specific Pipeline Coverage: Covered Anthropic, OpenRouter, and Ollama provider pipelines, including tests for configuration, model alias resolution, cost computation, and streaming behavior.
  • Robust Error Handling Verification: Introduced comprehensive error scenario tests for rate limits (429), authentication (401), timeouts, connection issues, internal server errors (500), and unknown exceptions, verifying correct mapping to custom ProviderError subclasses and retryable flags.
  • Advanced Tool Calling Integration: Developed integration tests for tool calling functionality, verifying tool definition forwarding, single/multiple tool call extraction, streaming accumulation of tool calls, and multi-turn tool conversations.
Changelog
  • tests/integration/providers/__init__.py
    • Initialized the providers directory as a Python package for integration tests.
  • tests/integration/providers/conftest.py
    • Introduced shared pytest fixtures and utility functions for building mock litellm.ModelResponse objects and provider configurations.
  • tests/integration/providers/test_anthropic_pipeline.py
    • Added integration tests for the Anthropic provider, covering configuration, model alias resolution, cost computation, and streaming.
  • tests/integration/providers/test_error_scenarios.py
    • Implemented integration tests to verify the mapping of various litellm exceptions to custom ProviderError subclasses, including retryable flags.
  • tests/integration/providers/test_ollama_pipeline.py
    • Added integration tests for the Ollama provider, focusing on no API key scenarios, localhost base URL forwarding, and zero-cost models.
  • tests/integration/providers/test_openrouter_pipeline.py
    • Included integration tests for the OpenRouter provider, validating custom base URL, model prefixing, and multi-model alias resolution.
  • tests/integration/providers/test_tool_calling_pipeline.py
    • Developed integration tests for tool calling functionality, covering single/multiple calls, streaming accumulation, and multi-turn conversations.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive suite of 39 integration tests for the provider adapter layer. The tests are well-structured, covering various providers (Anthropic, OpenRouter, Ollama), different features like streaming and tool calling, and a wide range of error scenarios. The approach of mocking at the litellm.acompletion level and using real litellm.ModelResponse objects is excellent for verifying the response mapping logic. Overall, this is a high-quality addition that significantly improves confidence in the provider layer. I have one suggestion to enhance the multi-turn tool conversation test to make it more robust and reflect a more standard interaction pattern.

Comment on lines +257 to +318
    async def test_multi_turn_tool_conversation(
        sample_tool_definitions: list[ToolDefinition],
    ) -> None:
        """Multi-turn: user -> assistant(tool_call) -> tool_result -> assistant."""
        driver = _make_driver()

        # Turn 1: user asks, model calls tool
        messages_t1 = [
            ChatMessage(role=MessageRole.USER, content="What's the weather?"),
        ]
        tc = build_tool_call_dict(
            call_id="call_w1",
            name="get_weather",
            arguments='{"location": "Tokyo"}',
        )
        mock_resp_t1 = build_model_response(
            content=None,
            tool_calls=[tc],
            finish_reason="tool_calls",
        )
        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t1):
            result_t1 = await driver.complete(
                messages_t1, "sonnet", tools=sample_tool_definitions
            )

        assert result_t1.finish_reason == FinishReason.TOOL_USE
        assert len(result_t1.tool_calls) == 1

        # Turn 2: include tool result, model responds with text
        messages_t2 = [
            ChatMessage(role=MessageRole.USER, content="What's the weather?"),
            ChatMessage(
                role=MessageRole.ASSISTANT,
                tool_calls=(
                    ToolCall(
                        id="call_w1",
                        name="get_weather",
                        arguments={"location": "Tokyo"},
                    ),
                ),
            ),
            ChatMessage(
                role=MessageRole.TOOL,
                tool_result=ToolResult(
                    tool_call_id="call_w1",
                    content="Sunny, 25°C",
                ),
            ),
            ChatMessage(role=MessageRole.USER, content="Tell me the result"),
        ]
        mock_resp_t2 = build_model_response(
            content="It's sunny and 25°C in Tokyo!",
            finish_reason="stop",
        )
        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t2):
            result_t2 = await driver.complete(
                messages_t2, "sonnet", tools=sample_tool_definitions
            )

        assert result_t2.content == "It's sunny and 25°C in Tokyo!"
        assert result_t2.finish_reason == FinishReason.STOP
        assert len(result_t2.tool_calls) == 0
Contributor


Severity: medium

This test is great for verifying a multi-turn tool conversation. However, it could be improved in two ways to be more robust and realistic:

  1. Simplify the conversation flow: The final user message, ChatMessage(role=MessageRole.USER, content="Tell me the result"), is not typical in a tool-use conversation. The assistant should be able to generate a response based on the tool result without an extra prompt. Removing this message would make the test case reflect a more standard interaction pattern.

  2. Assert on forwarded messages: The test currently only asserts on the final CompletionResponse. It would be more thorough to also assert that the conversation history (messages_t2) is correctly formatted and passed to the underlying litellm.acompletion call. This would ensure the message mapping logic for TOOL and ASSISTANT (with tool calls) roles is working as expected.

Here's a suggested implementation that incorporates these points:

async def test_multi_turn_tool_conversation(
    sample_tool_definitions: list[ToolDefinition],
) -> None:
    """Multi-turn: user -> assistant(tool_call) -> tool_result -> assistant."""
    driver = _make_driver()

    # Turn 1: user asks, model calls tool
    messages_t1 = [
        ChatMessage(role=MessageRole.USER, content="What's the weather in Tokyo?"),
    ]
    tc = build_tool_call_dict(
        call_id="call_w1",
        name="get_weather",
        arguments='{"location": "Tokyo"}',
    )
    mock_resp_t1 = build_model_response(
        content=None,
        tool_calls=[tc],
        finish_reason="tool_calls",
    )
    with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t1):
        result_t1 = await driver.complete(
            messages_t1, "sonnet", tools=sample_tool_definitions
        )

    assert result_t1.finish_reason == FinishReason.TOOL_USE
    assert len(result_t1.tool_calls) == 1
    assistant_message = ChatMessage(
        role=MessageRole.ASSISTANT,
        content=result_t1.content,
        tool_calls=result_t1.tool_calls,
    )

    # Turn 2: include tool result, model responds with text
    messages_t2 = [
        *messages_t1,
        assistant_message,
        ChatMessage(
            role=MessageRole.TOOL,
            tool_result=ToolResult(
                tool_call_id="call_w1",
                content="Sunny, 25°C",
            ),
        ),
    ]
    mock_resp_t2 = build_model_response(
        content="It's sunny and 25°C in Tokyo!",
        finish_reason="stop",
    )
    with patch(
        _PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t2
    ) as mock_call_t2:
        result_t2 = await driver.complete(
            messages_t2, "sonnet", tools=sample_tool_definitions
        )

    # Assert on final response
    assert result_t2.content == "It's sunny and 25°C in Tokyo!"
    assert result_t2.finish_reason == FinishReason.STOP
    assert not result_t2.tool_calls

    # Assert on messages forwarded to litellm
    kwargs = mock_call_t2.call_args.kwargs
    forwarded_messages = kwargs["messages"]
    assert len(forwarded_messages) == 3
    assert forwarded_messages[0]["role"] == "user"
    assert forwarded_messages[1]["role"] == "assistant"
    assert forwarded_messages[1]["tool_calls"][0]["id"] == "call_w1"
    assert forwarded_messages[2]["role"] == "tool"
    assert forwarded_messages[2]["tool_call_id"] == "call_w1"
    assert forwarded_messages[2]["content"] == "Sunny, 25°C"


Copilot AI left a comment


Pull request overview

Adds an integration test suite for the provider adapter layer (LiteLLM-backed drivers + ProviderRegistry), using real litellm.ModelResponse objects while mocking litellm.acompletion to exercise the full mapping pipeline and error translation.

Changes:

  • Adds end-to-end “pipeline” integration tests for Anthropic, OpenRouter, and Ollama adapters (config → registry → complete/stream).
  • Adds integration coverage for tool-calling (non-stream + streaming accumulation + multi-turn tool conversations).
  • Adds integration coverage for LiteLLM exception → ProviderError mapping, including retryability + retry-after parsing.
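The streaming tool-call accumulation mentioned above typically works by concatenating partial JSON argument strings keyed by tool-call index before parsing. A hedged sketch of that technique follows; the delta dict shape is an OpenAI-style assumption, not the project's exact event model.

```python
import json
from typing import Any


def accumulate_tool_calls(deltas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streaming tool-call deltas into complete, parsed tool calls."""
    calls: dict[int, dict[str, Any]] = {}
    for delta in deltas:
        slot = calls.setdefault(
            delta["index"], {"id": None, "name": None, "arguments": ""}
        )
        if delta.get("id"):
            slot["id"] = delta["id"]
        if delta.get("name"):
            slot["name"] = delta["name"]
        # Argument JSON arrives in fragments; concatenate, parse only at the end.
        slot["arguments"] += delta.get("arguments", "")
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"])}
        for c in calls.values()
    ]


deltas = [
    {"index": 0, "id": "call_w1", "name": "get_weather", "arguments": '{"loc'},
    {"index": 0, "arguments": 'ation": "Tokyo"}'},
]
print(accumulate_tool_calls(deltas))
```

The key property the tests verify is that parsing is deferred until the stream finishes, since any individual fragment is usually invalid JSON on its own.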

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/integration/providers/test_anthropic_pipeline.py | Anthropic pipeline coverage: config/aliasing, kwargs forwarding, cost computation, basic streaming behavior. |
| tests/integration/providers/test_openrouter_pipeline.py | OpenRouter behaviors: base_url/api_base forwarding, provider-prefixed model IDs, alias resolution, cost. |
| tests/integration/providers/test_ollama_pipeline.py | Ollama behaviors: no api_key, localhost api_base, zero-cost pricing, response mapping. |
| tests/integration/providers/test_tool_calling_pipeline.py | Tool definition forwarding + tool call extraction and streaming accumulation; multi-turn tool conversation. |
| tests/integration/providers/test_error_scenarios.py | Cross-provider error mapping coverage from LiteLLM exceptions to the internal ProviderError hierarchy. |
| tests/integration/providers/conftest.py | Shared integration fixtures and builders for real ModelResponse + streaming chunk helpers and provider configs. |
| tests/integration/providers/__init__.py | Package marker for integration provider tests. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/providers/conftest.py`:
- Around line 120-122: In build_model_response, avoid mutating the message dict
after construction; instead construct message in a single expression that
includes "role" and "content" plus an optional "tool_calls" key only when
tool_calls is not None (reference symbols: build_model_response, message,
tool_calls). Replace the two-step creation/mutation with a single immutable
construction (e.g., combining the base keys with a conditional small dict or
using a dict-union/merge expression) so no post-creation assignment to message
occurs.

In `@tests/integration/providers/test_error_scenarios.py`:
- Line 44: The pytest file currently sets only pytestmark =
pytest.mark.integration; add an explicit 30-second timeout marker for
consistency by changing pytestmark to include the timeout (e.g., set pytestmark
to a list or chained markers such as pytestmark = [pytest.mark.integration,
pytest.mark.timeout(30)]) so the file applies both the integration marker and
the 30s timeout.

In `@tests/integration/providers/test_ollama_pipeline.py`:
- Line 20: The module-level pytest marker only sets pytest.mark.integration and
is missing the required 30-second timeout; update the module-level pytestmark to
include the timeout marker so each test in this file has a 30s bound (e.g.,
change pytestmark to include pytest.mark.timeout(30) alongside
pytest.mark.integration) — look for the pytestmark symbol in
tests/integration/providers/test_ollama_pipeline.py and make it a list
containing both markers.

In `@tests/integration/providers/test_openrouter_pipeline.py`:
- Line 20: The module-level pytest marker declaration only sets pytestmark =
pytest.mark.integration but must also include an explicit 30-second timeout
marker; update the module-level pytestmark to include both markers (e.g., make
pytestmark a list containing pytest.mark.integration and
pytest.mark.timeout(30)) so the test module is marked as integration and has an
explicit 30s timeout.
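The fix all three of these comments ask for amounts to the same one-line change per module, sketched here. Note the `timeout` marker is only enforced when the pytest-timeout plugin is installed.

```python
import pytest

# Module-level markers: integration suite, bounded to 30 seconds per test.
pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)]
```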

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aeeff8c and 22ec82f.

📒 Files selected for processing (7)
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones — enforce immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' in pytest — no manual @pytest.mark.asyncio needed
Set 30-second timeout per test

Files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
🧠 Learnings (11)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory

Applied to files:

  • tests/integration/providers/__init__.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/integration/providers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.

Applied to files:

  • tests/integration/providers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Mock Ollama API calls in tests to avoid requiring a running Ollama instance

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Ollama Integration - all AI agents use Ollama for local LLM serving with default endpoint `http://localhost:11434`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : In agent tests, mock Ollama API calls using `unittest.mock` and patch `agents.base.ollama.Client`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Always mock Ollama API calls in tests - tests should not require a running Ollama instance; use `unittest.mock` for mocking (`patch`, `MagicMock`); mock the Ollama client with: `patch("agents.base.ollama.Client")`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/model_service.py} : Respect existing model configuration patterns in Ollama integration

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use BaseAgent methods for Ollama integration: `self.generate(prompt)` for LLM calls with built-in retry logic, `self.settings` for model configuration, and `self.client` for Ollama client instance

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
🧬 Code graph analysis (5)
tests/integration/providers/test_anthropic_pipeline.py (4)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (2)
  • ChatMessage (114-186)
  • CompletionConfig (189-230)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
tests/integration/providers/conftest.py (8)
  • async_iter_chunks (253-258)
  • build_content_chunk (158-176)
  • build_finish_chunk (232-250)
  • build_model_response (109-138)
  • build_usage_chunk (179-197)
  • make_anthropic_config (33-56)
  • user_messages (265-267)
  • multi_turn_messages (271-278)
tests/integration/providers/test_openrouter_pipeline.py (3)
src/ai_company/providers/enums.py (1)
  • FinishReason (15-22)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
tests/integration/providers/conftest.py (3)
  • build_model_response (109-138)
  • make_openrouter_config (59-83)
  • user_messages (265-267)
tests/integration/providers/test_tool_calling_pipeline.py (4)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (4)
  • ChatMessage (114-186)
  • ToolCall (73-95)
  • ToolDefinition (45-70)
  • ToolResult (98-111)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
src/ai_company/providers/base.py (1)
  • BaseCompletionProvider (26-273)
tests/integration/providers/conftest.py (3)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
src/ai_company/providers/enums.py (1)
  • MessageRole (6-12)
src/ai_company/providers/models.py (2)
  • ChatMessage (114-186)
  • ToolDefinition (45-70)
tests/integration/providers/test_error_scenarios.py (3)
src/ai_company/providers/errors.py (5)
  • AuthenticationError (72-75)
  • RateLimitError (78-104)
  • ProviderTimeoutError (125-128)
  • ProviderConnectionError (131-134)
  • ProviderInternalError (137-140)
tests/integration/providers/conftest.py (2)
  • build_content_chunk (158-176)
  • make_anthropic_config (33-56)
tests/integration/providers/test_tool_calling_pipeline.py (1)
  • _make_driver (40-44)
🔇 Additional comments (9)
tests/integration/providers/__init__.py (1)

1-1: Module scaffolding looks good.

Clean and appropriate package-level docstring for the integration test namespace.

tests/integration/providers/conftest.py (1)

33-307: Strong shared test infrastructure.

The config factories plus ModelResponse/stream builders give good integration-level realism while keeping tests deterministic.
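
The stream helpers can be sketched roughly like this — a minimal illustration, not the project's actual conftest.py; the name `async_iter_chunks` mirrors the fixture listed above, but the signature is an assumption:

```python
import asyncio
from collections.abc import AsyncIterator


async def async_iter_chunks(chunks: list[object]) -> AsyncIterator[object]:
    """Replay prepared chunks as if they arrived from a provider stream."""
    for chunk in chunks:
        yield chunk


async def collect(stream: AsyncIterator[object]) -> list[object]:
    """Drain an async stream into a list, which is what a streaming test asserts on."""
    return [chunk async for chunk in stream]


result = asyncio.run(collect(async_iter_chunks(["Hel", "lo"])))
print(result)  # → ['Hel', 'lo']
```

Replaying canned chunks through a real async iterator keeps the tests deterministic while still exercising the driver's `async for` consumption path.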

tests/integration/providers/test_ollama_pipeline.py (1)

25-97: Ollama pipeline checks are well targeted.

Good coverage of key local-provider behaviors: omitted API key, base URL forwarding, zero-cost usage mapping, and finish/model/request-id mapping.

tests/integration/providers/test_openrouter_pipeline.py (1)

25-124: OpenRouter integration scenarios are comprehensive.

Great coverage of base_url propagation, model transformation, API key forwarding, full mapping, and multi-model alias cost behavior.

tests/integration/providers/test_anthropic_pipeline.py (2)

36-303: Excellent end-to-end coverage for Anthropic adapter behavior.

The suite exercises aliasing, config forwarding, cost mapping, finish reasons, streaming events, and multi-turn message serialization in a realistic way.


28-28: No action needed. This module already complies with the timeout requirement through the global pytest configuration in pyproject.toml, which sets timeout = 30 for all tests. Adding an explicit pytest.mark.timeout(30) marker would be redundant.

Likely an incorrect or invalid review comment.

tests/integration/providers/test_error_scenarios.py (1)

131-301: Error-mapping integration coverage is strong.

Good depth across retryability flags, retry_after extraction, and streaming/non-streaming exception translation paths.

tests/integration/providers/test_tool_calling_pipeline.py (2)

50-318: Tool-calling integration coverage is excellent.

The suite validates extraction, forwarding, streaming accumulation, mixed event flows, and realistic multi-turn tool-result conversations.
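
The streaming-accumulation behavior under test — including the silent degradation of malformed tool-call arguments to `{}` — can be illustrated with a small sketch (hypothetical helper, not the adapter's actual code):

```python
import json


def accumulate_tool_args(fragments: list[str]) -> dict:
    """Join streamed argument fragments, then parse the completed JSON.

    Malformed JSON degrades silently to {}, mirroring the behavior the
    malformed-args test asserts.
    """
    raw = "".join(fragments)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}


complete = accumulate_tool_args(['{"city": ', '"Paris"}'])
broken = accumulate_tool_args(['{"broken'])
print(complete, broken)  # → {'city': 'Paris'} {}
```

Because providers stream tool-call arguments as partial JSON text, the adapter must buffer fragments per call and only parse once the stream finishes.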


35-35: No action required — 30-second timeout is already enforced globally.

The pytest configuration in pyproject.toml sets timeout = 30 globally under [tool.pytest.ini_options], which applies a 30-second timeout to every test and satisfies the coding guideline. Adding an explicit pytest.mark.timeout(30) to the pytestmark would be redundant; the file is in compliance.

Likely an incorrect or invalid review comment.
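
The global timeout both comments refer to would look roughly like this in pyproject.toml — a sketch of the relevant fragment only; the project's actual `[tool.pytest.ini_options]` table presumably carries additional keys, and the `integration` marker definition here is assumed from the PR's `pytest -m integration` usage:

```toml
[tool.pytest.ini_options]
timeout = 30  # pytest-timeout: 30-second cap applied to every test
markers = [
    "integration: full provider pipeline tests (config → registry → driver)",
]
```

With the timeout set once at this level, per-file `pytest.mark.timeout(30)` markers add nothing.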

…mini

- Add pytest.mark.timeout(30) to all 5 integration test files
- Build message dict immutably in conftest.py build_model_response
- Strengthen error context assertions (provider, model) across all error tests
- Add retry_after assertion to streaming rate limit test
- Add streaming AuthenticationError test (stream setup failure)
- Add streaming ConnectionError test (mid-stream failure)
- Add ModelNotFoundError test for unknown model alias
- Add ProviderError passthrough test (re-raise without double-wrapping)
- Strengthen unknown exception message assertion with full context
- Add cost_usd assertion to streaming usage test
- Add stop_sequences and top_p to CompletionConfig forwarding test
- Add malformed JSON tool call args test (silent degradation to {})
- Assert forwarded messages in multi-turn tool conversation test
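
The "build message dict immutably" item above can be illustrated with a hypothetical helper (the real `build_model_response` fixture takes more parameters; this only shows the merge-instead-of-mutate pattern):

```python
def build_message(role: str = "assistant", content: str = "", **extra: object) -> dict:
    """Construct a fresh message dict per call instead of mutating a shared default."""
    return {"role": role, "content": content, **extra}


plain = build_message(content="hi")
with_tools = build_message(content="hi", tool_calls=[])
print(plain)  # → {'role': 'assistant', 'content': 'hi'}
```

Each call returns a new dict, so one test's tool-call additions can never leak into another test's response.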
@Aureliolo Aureliolo merged commit 40a61f4 into main Mar 1, 2026
9 of 10 checks passed
@Aureliolo Aureliolo deleted the test/provider-integration-tests branch March 1, 2026 17:15
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Implement provider adapter layer (direct API or via LiteLLM)

2 participants