
test: add provider adapter integration tests #90

Merged
Aureliolo merged 2 commits into main from test/provider-integration-tests
Mar 1, 2026

Conversation

@Aureliolo
Owner

Summary

Test files

| File | Tests | Coverage |
| --- | --- | --- |
| test_anthropic_pipeline.py | 13 | Config → registry → complete/stream, alias resolution, cost computation, streaming |
| test_openrouter_pipeline.py | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| test_ollama_pipeline.py | 4 | No api_key, localhost base_url, zero-cost models |
| test_error_scenarios.py | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| test_tool_calling_pipeline.py | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| conftest.py | — | Config factories, real ModelResponse builders, stream helpers |

Verification

  • ruff check — all passed
  • ruff format — all formatted
  • mypy — 0 errors (7 files)
  • pytest — 1331 total tests pass, 94.49% coverage (80% required)

Closes #5

Test plan

  • CI passes (lint + type-check + test + coverage)
  • 39 integration tests pass under pytest -m integration
  • No regressions in existing 1292 unit tests
  • Coverage remains above 80% threshold

39 integration tests exercising the full provider pipeline
(config → registry → driver.complete/stream) with real litellm
ModelResponse objects mocked at the acompletion level.

- test_anthropic_pipeline: 13 tests (alias resolution, cost, streaming)
- test_openrouter_pipeline: 5 tests (base_url, model prefix, multi-model)
- test_ollama_pipeline: 4 tests (no api_key, localhost, zero-cost)
- test_error_scenarios: 9 tests (rate limit, auth, timeout, connection)
- test_tool_calling_pipeline: 8 tests (single/multi tool calls, streaming)
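The mocking approach described above, patching at the `acompletion` boundary so the full mapping pipeline still runs, can be sketched as follows. The `Driver` class and module-level `acompletion` stub here are illustrative stand-ins, not the project's actual code.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock, patch


async def acompletion(**kwargs) -> SimpleNamespace:
    """Stand-in for litellm.acompletion; tests patch this, never call it."""
    raise RuntimeError("real network call; should be patched in tests")


class Driver:
    """Toy provider driver that awaits the async completion entry point."""

    async def complete(self, messages: list[dict]) -> str:
        response = await acompletion(messages=messages)
        return response.choices[0].message.content


async def run_test() -> str:
    fake = SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="Hi!"))]
    )
    # Patch the module-level symbol the driver looks up at call time.
    with patch(
        f"{__name__}.acompletion", new_callable=AsyncMock, return_value=fake
    ) as mock:
        result = await Driver().complete([{"role": "user", "content": "Hello"}])
    assert mock.await_count == 1  # pipeline reached the (mocked) API exactly once
    return result


print(asyncio.run(run_test()))  # Hi!
```

Patching only the outermost async call keeps everything between the public driver API and the provider boundary under test, which is what makes these integration tests rather than unit tests.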
Copilot AI review requested due to automatic review settings March 1, 2026 16:46
@coderabbitai

coderabbitai bot commented Mar 1, 2026


📥 Commits

Reviewing files that changed from the base of the PR and between 22ec82f and 0531eed.

📒 Files selected for processing (6)
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py


📝 Walkthrough

The PR introduces a comprehensive integration test suite for the provider adapter layer, covering end-to-end pipelines for multiple providers (Anthropic, OpenRouter, Ollama), error mapping scenarios, streaming behavior, and tool-calling functionality using mocked litellm responses.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Infrastructure: `tests/integration/providers/__init__.py`, `tests/integration/providers/conftest.py` | Module docstring and comprehensive pytest fixtures, including provider config factories (anthropic, openrouter, ollama), ModelResponse builders for streaming/non-streaming scenarios, async iterator helpers, and standard message/tool definition fixtures. |
| Provider Pipeline Tests: `test_anthropic_pipeline.py`, `test_ollama_pipeline.py`, `test_openrouter_pipeline.py` | End-to-end integration tests for provider drivers validating configuration resolution, API key/base URL forwarding, cost calculation, finish reason mapping, usage token propagation, and model alias resolution through patched litellm calls. |
| Error Mapping Tests: `test_error_scenarios.py` | Cross-provider error scenario tests validating LiteLLM exception mapping to provider errors (RateLimitError, AuthenticationError, Timeout, ConnectionError, InternalServerError) with metadata like is_retryable and retry_after, covering both non-streaming and streaming paths. |
| Tool Calling Tests: `test_tool_calling_pipeline.py` | Integration tests for end-to-end tool-calling behavior including single/multiple tool call extraction, argument parsing, streaming tool-call accumulation with TOOL_CALL_DELTA events, mixed content/tool-call streaming, and multi-turn conversations with tool results. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 96.72%, below the required threshold of 100.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test: add provider adapter integration tests' accurately summarizes the main change: adding integration tests for the provider adapter layer, as confirmed by the detailed changeset. |
| Description check | ✅ Passed | The description clearly relates to the changeset, providing a structured overview of 39 integration tests across 5 test files and conftest.py, verification results, and a reference to issue #5. |
| Linked Issues check | ✅ Passed | The PR fulfills the primary acceptance criterion from issue #5 by implementing integration tests with mock/recorded API responses, covering all major error scenarios and provider pipelines. |
| Out of Scope Changes check | ✅ Passed | All changes are focused on adding integration tests for the provider adapter layer. No unrelated code modifications, refactoring, or feature changes are present. |



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the testing suite by introducing 39 new integration tests for the provider adapter layer. These tests ensure the robustness and correctness of the LLM integration across various providers and scenarios, including streaming, error handling, and tool calling. By mocking at the litellm.acompletion level with real ModelResponse objects, the tests accurately reflect the system's behavior, providing high confidence in the adapter's functionality.

Highlights

  • Comprehensive Integration Test Suite: Added 39 new integration tests for the provider adapter layer, fulfilling a key acceptance criterion from issue Implement provider adapter layer (direct API or via LiteLLM) #5.
  • Realistic Mocking Strategy: Implemented tests to mock at the litellm.acompletion level using real litellm.ModelResponse objects, ensuring realistic attribute access paths through core processing functions.
  • Provider-Specific Pipeline Coverage: Covered Anthropic, OpenRouter, and Ollama provider pipelines, including tests for configuration, model alias resolution, cost computation, and streaming behavior.
  • Robust Error Handling Verification: Introduced comprehensive error scenario tests for rate limits (429), authentication (401), timeouts, connection issues, internal server errors (500), and unknown exceptions, verifying correct mapping to custom ProviderError subclasses and retryable flags.
  • Advanced Tool Calling Integration: Developed integration tests for tool calling functionality, verifying tool definition forwarding, single/multiple tool call extraction, streaming accumulation of tool calls, and multi-turn tool conversations.
Changelog
  • tests/integration/providers/__init__.py
    • Initialized the providers directory as a Python package for integration tests.
  • tests/integration/providers/conftest.py
    • Introduced shared pytest fixtures and utility functions for building mock litellm.ModelResponse objects and provider configurations.
  • tests/integration/providers/test_anthropic_pipeline.py
    • Added integration tests for the Anthropic provider, covering configuration, model alias resolution, cost computation, and streaming.
  • tests/integration/providers/test_error_scenarios.py
    • Implemented integration tests to verify the mapping of various litellm exceptions to custom ProviderError subclasses, including retryable flags.
  • tests/integration/providers/test_ollama_pipeline.py
    • Added integration tests for the Ollama provider, focusing on no API key scenarios, localhost base URL forwarding, and zero-cost models.
  • tests/integration/providers/test_openrouter_pipeline.py
    • Included integration tests for the OpenRouter provider, validating custom base URL, model prefixing, and multi-model alias resolution.
  • tests/integration/providers/test_tool_calling_pipeline.py
    • Developed integration tests for tool calling functionality, covering single/multiple calls, streaming accumulation, and multi-turn conversations.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive suite of 39 integration tests for the provider adapter layer. The tests are well-structured, covering various providers (Anthropic, OpenRouter, Ollama), different features like streaming and tool calling, and a wide range of error scenarios. The approach of mocking at the litellm.acompletion level and using real litellm.ModelResponse objects is excellent for verifying the response mapping logic. Overall, this is a high-quality addition that significantly improves confidence in the provider layer. I have one suggestion to enhance the multi-turn tool conversation test to make it more robust and reflect a more standard interaction pattern.

Comment on lines +257 to +318
    async def test_multi_turn_tool_conversation(
        sample_tool_definitions: list[ToolDefinition],
    ) -> None:
        """Multi-turn: user -> assistant(tool_call) -> tool_result -> assistant."""
        driver = _make_driver()

        # Turn 1: user asks, model calls tool
        messages_t1 = [
            ChatMessage(role=MessageRole.USER, content="What's the weather?"),
        ]
        tc = build_tool_call_dict(
            call_id="call_w1",
            name="get_weather",
            arguments='{"location": "Tokyo"}',
        )
        mock_resp_t1 = build_model_response(
            content=None,
            tool_calls=[tc],
            finish_reason="tool_calls",
        )
        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t1):
            result_t1 = await driver.complete(
                messages_t1, "sonnet", tools=sample_tool_definitions
            )

        assert result_t1.finish_reason == FinishReason.TOOL_USE
        assert len(result_t1.tool_calls) == 1

        # Turn 2: include tool result, model responds with text
        messages_t2 = [
            ChatMessage(role=MessageRole.USER, content="What's the weather?"),
            ChatMessage(
                role=MessageRole.ASSISTANT,
                tool_calls=(
                    ToolCall(
                        id="call_w1",
                        name="get_weather",
                        arguments={"location": "Tokyo"},
                    ),
                ),
            ),
            ChatMessage(
                role=MessageRole.TOOL,
                tool_result=ToolResult(
                    tool_call_id="call_w1",
                    content="Sunny, 25°C",
                ),
            ),
            ChatMessage(role=MessageRole.USER, content="Tell me the result"),
        ]
        mock_resp_t2 = build_model_response(
            content="It's sunny and 25°C in Tokyo!",
            finish_reason="stop",
        )
        with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t2):
            result_t2 = await driver.complete(
                messages_t2, "sonnet", tools=sample_tool_definitions
            )

        assert result_t2.content == "It's sunny and 25°C in Tokyo!"
        assert result_t2.finish_reason == FinishReason.STOP
        assert len(result_t2.tool_calls) == 0
Contributor


Severity: medium

This test is great for verifying a multi-turn tool conversation. However, it could be improved in two ways to be more robust and realistic:

  1. Simplify the conversation flow: The final user message, ChatMessage(role=MessageRole.USER, content="Tell me the result"), is not typical in a tool-use conversation. The assistant should be able to generate a response based on the tool result without an extra prompt. Removing this message would make the test case reflect a more standard interaction pattern.

  2. Assert on forwarded messages: The test currently only asserts on the final CompletionResponse. It would be more thorough to also assert that the conversation history (messages_t2) is correctly formatted and passed to the underlying litellm.acompletion call. This would ensure the message mapping logic for TOOL and ASSISTANT (with tool calls) roles is working as expected.

Here's a suggested implementation that incorporates these points:

async def test_multi_turn_tool_conversation(
    sample_tool_definitions: list[ToolDefinition],
) -> None:
    """Multi-turn: user -> assistant(tool_call) -> tool_result -> assistant."""
    driver = _make_driver()

    # Turn 1: user asks, model calls tool
    messages_t1 = [
        ChatMessage(role=MessageRole.USER, content="What's the weather in Tokyo?"),
    ]
    tc = build_tool_call_dict(
        call_id="call_w1",
        name="get_weather",
        arguments='{"location": "Tokyo"}',
    )
    mock_resp_t1 = build_model_response(
        content=None,
        tool_calls=[tc],
        finish_reason="tool_calls",
    )
    with patch(_PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t1):
        result_t1 = await driver.complete(
            messages_t1, "sonnet", tools=sample_tool_definitions
        )

    assert result_t1.finish_reason == FinishReason.TOOL_USE
    assert len(result_t1.tool_calls) == 1
    assistant_message = ChatMessage(
        role=MessageRole.ASSISTANT,
        content=result_t1.content,
        tool_calls=result_t1.tool_calls,
    )

    # Turn 2: include tool result, model responds with text
    messages_t2 = [
        *messages_t1,
        assistant_message,
        ChatMessage(
            role=MessageRole.TOOL,
            tool_result=ToolResult(
                tool_call_id="call_w1",
                content="Sunny, 25°C",
            ),
        ),
    ]
    mock_resp_t2 = build_model_response(
        content="It's sunny and 25°C in Tokyo!",
        finish_reason="stop",
    )
    with patch(
        _PATCH_TARGET, new_callable=AsyncMock, return_value=mock_resp_t2
    ) as mock_call_t2:
        result_t2 = await driver.complete(
            messages_t2, "sonnet", tools=sample_tool_definitions
        )

    # Assert on final response
    assert result_t2.content == "It's sunny and 25°C in Tokyo!"
    assert result_t2.finish_reason == FinishReason.STOP
    assert not result_t2.tool_calls

    # Assert on messages forwarded to litellm
    kwargs = mock_call_t2.call_args.kwargs
    forwarded_messages = kwargs["messages"]
    assert len(forwarded_messages) == 3
    assert forwarded_messages[0]["role"] == "user"
    assert forwarded_messages[1]["role"] == "assistant"
    assert forwarded_messages[1]["tool_calls"][0]["id"] == "call_w1"
    assert forwarded_messages[2]["role"] == "tool"
    assert forwarded_messages[2]["tool_call_id"] == "call_w1"
    assert forwarded_messages[2]["content"] == "Sunny, 25°C"


Copilot AI left a comment


Pull request overview

Adds an integration test suite for the provider adapter layer (LiteLLM-backed drivers + ProviderRegistry), using real litellm.ModelResponse objects while mocking litellm.acompletion to exercise the full mapping pipeline and error translation.

Changes:

  • Adds end-to-end “pipeline” integration tests for Anthropic, OpenRouter, and Ollama adapters (config → registry → complete/stream).
  • Adds integration coverage for tool-calling (non-stream + streaming accumulation + multi-turn tool conversations).
  • Adds integration coverage for LiteLLM exception → ProviderError mapping, including retryability + retry-after parsing.
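The streaming tool-call accumulation mentioned above typically works by concatenating partial JSON argument strings keyed by tool-call index before parsing. A hedged sketch of that technique follows; the delta dict shape is an OpenAI-style assumption, not the project's exact event model.

```python
import json
from typing import Any


def accumulate_tool_calls(deltas: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge streaming tool-call deltas into complete, parsed tool calls."""
    calls: dict[int, dict[str, Any]] = {}
    for delta in deltas:
        slot = calls.setdefault(
            delta["index"], {"id": None, "name": None, "arguments": ""}
        )
        if delta.get("id"):
            slot["id"] = delta["id"]
        if delta.get("name"):
            slot["name"] = delta["name"]
        # Argument JSON arrives in fragments; concatenate, parse only at the end.
        slot["arguments"] += delta.get("arguments", "")
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"])}
        for c in calls.values()
    ]


deltas = [
    {"index": 0, "id": "call_w1", "name": "get_weather", "arguments": '{"loc'},
    {"index": 0, "arguments": 'ation": "Tokyo"}'},
]
print(accumulate_tool_calls(deltas))
```

The key property the tests verify is that parsing is deferred until the stream finishes, since any individual fragment is usually invalid JSON on its own.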

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/integration/providers/test_anthropic_pipeline.py | Anthropic pipeline coverage: config/aliasing, kwargs forwarding, cost computation, basic streaming behavior. |
| tests/integration/providers/test_openrouter_pipeline.py | OpenRouter behaviors: base_url/api_base forwarding, provider-prefixed model IDs, alias resolution, cost. |
| tests/integration/providers/test_ollama_pipeline.py | Ollama behaviors: no api_key, localhost api_base, zero-cost pricing, response mapping. |
| tests/integration/providers/test_tool_calling_pipeline.py | Tool definition forwarding + tool call extraction and streaming accumulation; multi-turn tool conversation. |
| tests/integration/providers/test_error_scenarios.py | Cross-provider error mapping coverage from LiteLLM exceptions to the internal ProviderError hierarchy. |
| tests/integration/providers/conftest.py | Shared integration fixtures and builders for real ModelResponse + streaming chunk helpers and provider configs. |
| tests/integration/providers/__init__.py | Package marker for integration provider tests. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/providers/conftest.py`:
- Around line 120-122: In build_model_response, avoid mutating the message dict
after construction; instead construct message in a single expression that
includes "role" and "content" plus an optional "tool_calls" key only when
tool_calls is not None (reference symbols: build_model_response, message,
tool_calls). Replace the two-step creation/mutation with a single immutable
construction (e.g., combining the base keys with a conditional small dict or
using a dict-union/merge expression) so no post-creation assignment to message
occurs.

In `@tests/integration/providers/test_error_scenarios.py`:
- Line 44: The pytest file currently sets only pytestmark =
pytest.mark.integration; add an explicit 30-second timeout marker for
consistency by changing pytestmark to include the timeout (e.g., set pytestmark
to a list or chained markers such as pytestmark = [pytest.mark.integration,
pytest.mark.timeout(30)]) so the file applies both the integration marker and
the 30s timeout.

In `@tests/integration/providers/test_ollama_pipeline.py`:
- Line 20: The module-level pytest marker only sets pytest.mark.integration and
is missing the required 30-second timeout; update the module-level pytestmark to
include the timeout marker so each test in this file has a 30s bound (e.g.,
change pytestmark to include pytest.mark.timeout(30) alongside
pytest.mark.integration) — look for the pytestmark symbol in
tests/integration/providers/test_ollama_pipeline.py and make it a list
containing both markers.

In `@tests/integration/providers/test_openrouter_pipeline.py`:
- Line 20: The module-level pytest marker declaration only sets pytestmark =
pytest.mark.integration but must also include an explicit 30-second timeout
marker; update the module-level pytestmark to include both markers (e.g., make
pytestmark a list containing pytest.mark.integration and
pytest.mark.timeout(30)) so the test module is marked as integration and has an
explicit 30s timeout.
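The fix all three of these comments ask for amounts to the same one-line change per module, sketched here. Note the `timeout` marker is only enforced when the pytest-timeout plugin is installed.

```python
import pytest

# Module-level markers: integration suite, bounded to 30 seconds per test.
pytestmark = [pytest.mark.integration, pytest.mark.timeout(30)]
```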

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aeeff8c and 22ec82f.

📒 Files selected for processing (7)
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones — enforce immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' in pytest — no manual @pytest.mark.asyncio needed
Set 30-second timeout per test

Files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/__init__.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
🧠 Learnings (11)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/integration/test_*.py : Place integration tests for component interactions in `tests/integration/` directory

Applied to files:

  • tests/integration/providers/__init__.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/integration/providers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.

Applied to files:

  • tests/integration/providers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Mock Ollama API calls in tests to avoid requiring a running Ollama instance

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Ollama Integration - all AI agents use Ollama for local LLM serving with default endpoint `http://localhost:11434`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : In agent tests, mock Ollama API calls using `unittest.mock` and patch `agents.base.ollama.Client`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Always mock Ollama API calls in tests - tests should not require a running Ollama instance; use `unittest.mock` for mocking (`patch`, `MagicMock`); mock the Ollama client with: `patch("agents.base.ollama.Client")`

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/model_service.py} : Respect existing model configuration patterns in Ollama integration

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use BaseAgent methods for Ollama integration: `self.generate(prompt)` for LLM calls with built-in retry logic, `self.settings` for model configuration, and `self.client` for Ollama client instance

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
🧬 Code graph analysis (5)
tests/integration/providers/test_anthropic_pipeline.py (4)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (2)
  • ChatMessage (114-186)
  • CompletionConfig (189-230)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
tests/integration/providers/conftest.py (8)
  • async_iter_chunks (253-258)
  • build_content_chunk (158-176)
  • build_finish_chunk (232-250)
  • build_model_response (109-138)
  • build_usage_chunk (179-197)
  • make_anthropic_config (33-56)
  • user_messages (265-267)
  • multi_turn_messages (271-278)
tests/integration/providers/test_openrouter_pipeline.py (3)
src/ai_company/providers/enums.py (1)
  • FinishReason (15-22)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
tests/integration/providers/conftest.py (3)
  • build_model_response (109-138)
  • make_openrouter_config (59-83)
  • user_messages (265-267)
tests/integration/providers/test_tool_calling_pipeline.py (4)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (4)
  • ChatMessage (114-186)
  • ToolCall (73-95)
  • ToolDefinition (45-70)
  • ToolResult (98-111)
src/ai_company/providers/registry.py (3)
  • ProviderRegistry (21-139)
  • from_config (96-139)
  • get (56-78)
src/ai_company/providers/base.py (1)
  • BaseCompletionProvider (26-273)
tests/integration/providers/conftest.py (3)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
src/ai_company/providers/enums.py (1)
  • MessageRole (6-12)
src/ai_company/providers/models.py (2)
  • ChatMessage (114-186)
  • ToolDefinition (45-70)
tests/integration/providers/test_error_scenarios.py (3)
src/ai_company/providers/errors.py (5)
  • AuthenticationError (72-75)
  • RateLimitError (78-104)
  • ProviderTimeoutError (125-128)
  • ProviderConnectionError (131-134)
  • ProviderInternalError (137-140)
tests/integration/providers/conftest.py (2)
  • build_content_chunk (158-176)
  • make_anthropic_config (33-56)
tests/integration/providers/test_tool_calling_pipeline.py (1)
  • _make_driver (40-44)
🔇 Additional comments (9)
tests/integration/providers/__init__.py (1)

1-1: Module scaffolding looks good.

Clean and appropriate package-level docstring for the integration test namespace.

tests/integration/providers/conftest.py (1)

33-307: Strong shared test infrastructure.

The config factories plus ModelResponse/stream builders give good integration-level realism while keeping tests deterministic.
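
The stream helpers can be sketched roughly like this — a minimal illustration, not the project's actual conftest.py; the name `async_iter_chunks` mirrors the fixture listed above, but the signature is an assumption:

```python
import asyncio
from collections.abc import AsyncIterator


async def async_iter_chunks(chunks: list[object]) -> AsyncIterator[object]:
    """Replay prepared chunks as if they arrived from a provider stream."""
    for chunk in chunks:
        yield chunk


async def collect(stream: AsyncIterator[object]) -> list[object]:
    """Drain an async stream into a list, which is what a streaming test asserts on."""
    return [chunk async for chunk in stream]


result = asyncio.run(collect(async_iter_chunks(["Hel", "lo"])))
print(result)  # → ['Hel', 'lo']
```

Replaying canned chunks through a real async iterator keeps the tests deterministic while still exercising the driver's `async for` consumption path.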

tests/integration/providers/test_ollama_pipeline.py (1)

25-97: Ollama pipeline checks are well targeted.

Good coverage of key local-provider behaviors: omitted API key, base URL forwarding, zero-cost usage mapping, and finish/model/request-id mapping.

tests/integration/providers/test_openrouter_pipeline.py (1)

25-124: OpenRouter integration scenarios are comprehensive.

Great coverage of base_url propagation, model transformation, API key forwarding, full mapping, and multi-model alias cost behavior.

tests/integration/providers/test_anthropic_pipeline.py (2)

36-303: Excellent end-to-end coverage for Anthropic adapter behavior.

The suite exercises aliasing, config forwarding, cost mapping, finish reasons, streaming events, and multi-turn message serialization in a realistic way.


28-28: No action needed. This module already complies with the timeout requirement through the global pytest configuration in pyproject.toml, which sets timeout = 30 for all tests. Adding an explicit pytest.mark.timeout(30) marker would be redundant.

Likely an incorrect or invalid review comment.

tests/integration/providers/test_error_scenarios.py (1)

131-301: Error-mapping integration coverage is strong.

Good depth across retryability flags, retry_after extraction, and streaming/non-streaming exception translation paths.

tests/integration/providers/test_tool_calling_pipeline.py (2)

50-318: Tool-calling integration coverage is excellent.

The suite validates extraction, forwarding, streaming accumulation, mixed event flows, and realistic multi-turn tool-result conversations.
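
The streaming-accumulation behavior under test — including the silent degradation of malformed tool-call arguments to `{}` — can be illustrated with a small sketch (hypothetical helper, not the adapter's actual code):

```python
import json


def accumulate_tool_args(fragments: list[str]) -> dict:
    """Join streamed argument fragments, then parse the completed JSON.

    Malformed JSON degrades silently to {}, mirroring the behavior the
    malformed-args test asserts.
    """
    raw = "".join(fragments)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {}


complete = accumulate_tool_args(['{"city": ', '"Paris"}'])
broken = accumulate_tool_args(['{"broken'])
print(complete, broken)  # → {'city': 'Paris'} {}
```

Because providers stream tool-call arguments as partial JSON text, the adapter must buffer fragments per call and only parse once the stream finishes.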


35-35: No action required — 30-second timeout is already enforced globally.

The pytest configuration in pyproject.toml sets timeout = 30 globally under [tool.pytest.ini_options], which applies a 30-second timeout to every test and satisfies the coding guideline. Adding an explicit pytest.mark.timeout(30) to the pytestmark would be redundant; the file is in compliance.

Likely an incorrect or invalid review comment.
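
The global timeout both comments refer to would look roughly like this in pyproject.toml — a sketch of the relevant fragment only; the project's actual `[tool.pytest.ini_options]` table presumably carries additional keys, and the `integration` marker definition here is assumed from the PR's `pytest -m integration` usage:

```toml
[tool.pytest.ini_options]
timeout = 30  # pytest-timeout: 30-second cap applied to every test
markers = [
    "integration: full provider pipeline tests (config → registry → driver)",
]
```

With the timeout set once at this level, per-file `pytest.mark.timeout(30)` markers add nothing.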

…mini

- Add pytest.mark.timeout(30) to all 5 integration test files
- Build message dict immutably in conftest.py build_model_response
- Strengthen error context assertions (provider, model) across all error tests
- Add retry_after assertion to streaming rate limit test
- Add streaming AuthenticationError test (stream setup failure)
- Add streaming ConnectionError test (mid-stream failure)
- Add ModelNotFoundError test for unknown model alias
- Add ProviderError passthrough test (re-raise without double-wrapping)
- Strengthen unknown exception message assertion with full context
- Add cost_usd assertion to streaming usage test
- Add stop_sequences and top_p to CompletionConfig forwarding test
- Add malformed JSON tool call args test (silent degradation to {})
- Assert forwarded messages in multi-turn tool conversation test
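
The "build message dict immutably" item above can be illustrated with a hypothetical helper (the real `build_model_response` fixture takes more parameters; this only shows the merge-instead-of-mutate pattern):

```python
def build_message(role: str = "assistant", content: str = "", **extra: object) -> dict:
    """Construct a fresh message dict per call instead of mutating a shared default."""
    return {"role": role, "content": content, **extra}


plain = build_message(content="hi")
with_tools = build_message(content="hi", tool_calls=[])
print(plain)  # → {'role': 'assistant', 'content': 'hi'}
```

Each call returns a new dict, so one test's tool-call additions can never leak into another test's response.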
@Aureliolo Aureliolo merged commit 40a61f4 into main Mar 1, 2026
9 of 10 checks passed
@Aureliolo Aureliolo deleted the test/provider-integration-tests branch March 1, 2026 17:15
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Implement provider adapter layer (direct API or via LiteLLM)

2 participants