Conversation
Implement the typed contract layer between the agent engine and LLM providers (issue #3). Defines CompletionProvider Protocol for loose coupling and BaseCompletionProvider ABC for shared validation/cost computation. Includes full model hierarchy (ChatMessage, ToolDefinition, ToolCall, CompletionResponse, StreamChunk, TokenUsage), error hierarchy with is_retryable flags, and ModelCapabilities for routing decisions. 118 unit tests, 100% coverage on providers module.
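The Protocol/ABC split described above can be sketched as follows. This is an illustrative reduction, not the PR's code: the method names (`complete`, `_do_complete`) come from the description and reviews, while the dict-based message shape and `EchoProvider` are invented for the example:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionProvider(Protocol):
    """Structural interface the engine depends on; no inheritance required."""

    async def complete(self, messages: list[dict], model: str) -> dict: ...


class BaseCompletionProvider(ABC):
    """Shared validation lives here; provider-specific work goes in _do_* hooks."""

    async def complete(self, messages: list[dict], model: str) -> dict:
        if not messages:
            # the PR raises InvalidRequestError here; ValueError keeps this self-contained
            raise ValueError("messages must not be empty")
        return await self._do_complete(messages, model)

    @abstractmethod
    async def _do_complete(self, messages: list[dict], model: str) -> dict: ...


class EchoProvider(BaseCompletionProvider):
    """Toy concrete provider: echoes the last message back."""

    async def _do_complete(self, messages: list[dict], model: str) -> dict:
        return {"model": model, "content": messages[-1]["content"]}


provider = EchoProvider()
# Structural typing: EchoProvider never mentions CompletionProvider, yet conforms.
print(isinstance(provider, CompletionProvider))  # True
response = asyncio.run(
    provider.complete([{"role": "user", "content": "hi"}], "test-model-1")
)
print(response)  # {'model': 'test-model-1', 'content': 'hi'}
```

The loose coupling comes from the Protocol: the engine can type against `CompletionProvider` while adapters are free to subclass the ABC or not.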
📝 Walkthrough

Adds a provider layer: a runtime-checkable CompletionProvider Protocol, a BaseCompletionProvider ABC, immutable provider-domain Pydantic models, a provider error hierarchy, package exports, and comprehensive unit tests and fixtures. No behavioral changes to existing runtime code outside this new provider surface.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Application/Client
    participant Base as BaseCompletionProvider
    participant Concrete as Concrete Provider
    participant Models as Pydantic Models
    participant Stream as StreamChunk / CompletionResponse
    Client->>Base: complete(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Models-->>Base: valid / raise InvalidRequestError
    Base->>Concrete: _do_complete(messages, model, tools, config)
    Concrete->>Concrete: call external provider API
    Concrete-->>Base: CompletionResponse (raw)
    Base->>Models: construct/validate CompletionResponse
    Models-->>Stream: validated CompletionResponse
    Stream-->>Client: CompletionResponse
    Client->>Base: stream(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Base->>Concrete: _do_stream(...)
    Concrete-->>Base: AsyncIterator[StreamChunk]
    Base-->>Client: AsyncIterator[StreamChunk]
    loop per chunk
        Client->>Stream: iterate
        Stream-->>Client: StreamChunk (validated)
    end
```
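The stream path in the diagram (validate, delegate to `_do_stream`, yield validated chunks) maps naturally onto async generators. A minimal sketch, with plain strings standing in for `StreamChunk` models:

```python
import asyncio
from collections.abc import AsyncIterator


class StreamingProviderSketch:
    """validate -> delegate -> yield, mirroring the stream() path in the diagram."""

    async def stream(self, messages: list[str], model: str) -> AsyncIterator[str]:
        if not messages:
            # the PR raises InvalidRequestError; ValueError keeps this self-contained
            raise ValueError("messages must not be empty")
        async for chunk in self._do_stream(messages, model):
            yield chunk  # in the PR each chunk is a validated StreamChunk model

    async def _do_stream(self, messages: list[str], model: str) -> AsyncIterator[str]:
        # stand-in for the external provider API call
        for part in ("Hel", "lo"):
            yield part


async def collect() -> list[str]:
    return [c async for c in StreamingProviderSketch().stream(["hi"], "test-model-1")]


chunks = asyncio.run(collect())
print(chunks)  # ['Hel', 'lo']
```

Note that in the PR `stream` is described as returning an `AsyncIterator[StreamChunk]`; here the method is itself an async generator, which is one common way to satisfy that contract.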
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes a foundational, type-safe interface for interacting with various Large Language Model (LLM) providers. It introduces a clear separation of concerns through protocols and abstract base classes, defines a comprehensive set of data models for chat completions and tool usage, and implements a structured error-handling mechanism. This work is crucial for building a flexible and extensible agent engine capable of integrating with diverse LLM services while maintaining consistency and reliability.
Code Review
This pull request introduces a well-designed and comprehensive unified provider interface for LLM completions. The use of a Protocol for loose coupling and an ABC for shared logic is a solid architectural choice. The data models are thorough and include robust validation, and the error hierarchy with is_retryable flags is a great feature for building resilient systems. The accompanying tests are extensive and cover many edge cases.
I have a couple of minor suggestions for improvement: one regarding a potentially confusing code convention in the documentation, and another to enhance the robustness of a model validator.
CLAUDE.md excerpt under review:

```markdown
## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
```
The documented syntax `except A, B:` for multiple exceptions is invalid on every Python release before 3.14; it is only re-allowed by PEP 758 (accepted for Python 3.14), and even then only when no `as` clause is present.
On earlier interpreters, or whenever an `as` clause is used, the tuple form `except (A, B):` is required.
To prevent confusion for developers working across Python versions, it would be good to clarify this convention.
```diff
- - **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
+ - **Multiple exceptions**: use `except (A, B):` to catch multiple exception types.
```
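For reference, a runnable illustration of the portable tuple form; the unparenthesized spelling appears only in comments because it parses solely on interpreters that implement PEP 758:

```python
def parse_int(value):
    """Return int(value), or None if the input cannot be converted."""
    try:
        return int(value)
    except (ValueError, TypeError):  # tuple form: valid on every Python 3 release
        return None


print(parse_int("42"), parse_int("x"), parse_int(None))  # 42 None None

# Under PEP 758 (Python 3.14+) the parentheses may be dropped when there is
# no `as` clause:
#     except ValueError, TypeError:
# With `as`, the parenthesized tuple remains required:
#     except (ValueError, TypeError) as exc:
```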
src/ai_company/providers/models.py (outdated)
```python
match self.event_type:
    case StreamEventType.CONTENT_DELTA:
        if self.content is None:
            msg = "content_delta event must include content"
            raise ValueError(msg)
    case StreamEventType.TOOL_CALL_DELTA:
        if self.tool_call_delta is None:
            msg = "tool_call_delta event must include tool_call_delta"
            raise ValueError(msg)
    case StreamEventType.USAGE:
        if self.usage is None:
            msg = "usage event must include usage"
            raise ValueError(msg)
    case StreamEventType.ERROR:
        if self.error_message is None:
            msg = "error event must include error_message"
            raise ValueError(msg)
```
The match statement in _validate_event_fields is not exhaustive. It's missing a case for StreamEventType.DONE. While DONE events currently require no extra fields and thus pass validation by falling through, making this explicit would improve readability and robustness.
Adding a case for DONE and a default case _: to handle any future unhandled event types would make the validator more maintainable.
Suggested change:

```python
match self.event_type:
    case StreamEventType.CONTENT_DELTA:
        if self.content is None:
            msg = "content_delta event must include content"
            raise ValueError(msg)
    case StreamEventType.TOOL_CALL_DELTA:
        if self.tool_call_delta is None:
            msg = "tool_call_delta event must include tool_call_delta"
            raise ValueError(msg)
    case StreamEventType.USAGE:
        if self.usage is None:
            msg = "usage event must include usage"
            raise ValueError(msg)
    case StreamEventType.ERROR:
        if self.error_message is None:
            msg = "error event must include error_message"
            raise ValueError(msg)
    case StreamEventType.DONE:
        pass
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
```text
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@tests/unit/providers/conftest.py`:
- Around line 215-239: Update the fixtures to avoid using real provider/model
  identifiers: in the sample_completion_response fixture replace
  model="claude-sonnet-4-6" with a fake model name (e.g., "test-model-1") and in
  sample_model_capabilities set model_id to a fake model id (e.g., "test-model-1")
  and provider to a fake provider string (e.g., "test-provider"); ensure you only
  change the model and provider fields in the sample_completion_response and
  sample_model_capabilities fixtures so tests use non-production identifiers.
```
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
- CLAUDE.md
- src/ai_company/providers/__init__.py
- src/ai_company/providers/base.py
- src/ai_company/providers/capabilities.py
- src/ai_company/providers/enums.py
- src/ai_company/providers/errors.py
- src/ai_company/providers/models.py
- src/ai_company/providers/protocol.py
- tests/unit/providers/__init__.py
- tests/unit/providers/conftest.py
- tests/unit/providers/test_capabilities.py
- tests/unit/providers/test_enums.py
- tests/unit/providers/test_errors.py
- tests/unit/providers/test_models.py
- tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, src/ai_company/providers/base.py, src/ai_company/providers/capabilities.py, tests/unit/providers/test_enums.py, src/ai_company/providers/enums.py, src/ai_company/providers/errors.py, tests/unit/providers/test_protocol.py, src/ai_company/providers/models.py, tests/unit/providers/conftest.py, src/ai_company/providers/protocol.py, src/ai_company/providers/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use asyncio_mode = 'auto' for async tests without manual `@pytest.mark.asyncio`
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, tests/unit/providers/test_enums.py, tests/unit/providers/test_protocol.py, tests/unit/providers/conftest.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, src/ai_company/providers/base.py, src/ai_company/providers/capabilities.py, tests/unit/providers/test_enums.py, src/ai_company/providers/enums.py, src/ai_company/providers/errors.py, tests/unit/providers/test_protocol.py, src/ai_company/providers/models.py, tests/unit/providers/conftest.py, src/ai_company/providers/protocol.py, src/ai_company/providers/__init__.py
🧠 Learnings (13)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_models.py, tests/unit/providers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Line length: 100 characters (enforced by Ruff)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff format .` to format code with Ruff
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Enforce 88-character line length (via ruff configuration)
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : All public functions and classes must include type hints; enforce via mypy strict mode
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Python version: 3.14+
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff check .` to lint code (use `ruff check --fix .` to auto-fix)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : NEVER add `from __future__ import annotations` - Python 3.14+ has native support for deferred annotation evaluation and does not require this import
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to **/*.py : Maintain line length at 100 characters per Ruff formatting rules.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (8)
tests/unit/providers/test_capabilities.py (2)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- tests/unit/providers/conftest.py (2): ModelCapabilitiesFactory (93-105), sample_model_capabilities (226-239)

tests/unit/providers/test_errors.py (1)
- src/ai_company/providers/errors.py (8): AuthenticationError (39-42), ContentFilterError (73-76), InvalidRequestError (67-70), ModelNotFoundError (61-64), ProviderConnectionError (85-88), ProviderError (11-36), ProviderInternalError (91-94), ProviderTimeoutError (79-82)

src/ai_company/providers/base.py (3)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/errors.py (1): InvalidRequestError (67-70)
- src/ai_company/providers/models.py (5): CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolDefinition (45-64)

tests/unit/providers/test_protocol.py (5)
- src/ai_company/providers/base.py (8): BaseCompletionProvider (23-185), complete (37-65), stream (67-95), get_model_capabilities (97-106), _do_complete (111-120), _do_stream (123-132), _do_get_model_capabilities (135-140), compute_cost (145-171)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/models.py (6): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolDefinition (45-64)
- src/ai_company/providers/protocol.py (4): CompletionProvider (21-81), complete (30-49), stream (51-70), get_model_capabilities (72-81)

src/ai_company/providers/models.py (1)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)

tests/unit/providers/conftest.py (3)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/models.py (8): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolCall (67-83), ToolDefinition (45-64), ToolResult (86-99)

src/ai_company/providers/protocol.py (4)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/models.py (5): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), ToolDefinition (45-64)
- src/ai_company/providers/base.py (3): complete (37-65), stream (67-95), get_model_capabilities (97-106)
- tests/unit/providers/conftest.py (3): complete (119-128), stream (130-149), get_model_capabilities (151-152)

src/ai_company/providers/__init__.py (6)
- src/ai_company/providers/base.py (1): BaseCompletionProvider (23-185)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/errors.py (9): AuthenticationError (39-42), ContentFilterError (73-76), InvalidRequestError (67-70), ModelNotFoundError (61-64), ProviderConnectionError (85-88), ProviderError (11-36), ProviderInternalError (91-94), ProviderTimeoutError (79-82), RateLimitError (45-58)
- src/ai_company/providers/models.py (8): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolCall (67-83), ToolDefinition (45-64), ToolResult (86-99)
- src/ai_company/providers/protocol.py (1): CompletionProvider (21-81)
🔇 Additional comments (16)
src/ai_company/providers/enums.py (1)
1-32: LGTM! Clean enum definitions with appropriate docstrings and string values that align with LLM provider conventions. Using `StrEnum` is ideal for JSON serialization compatibility.

CLAUDE.md (1)

51-51: LGTM! Documentation correctly reflects PEP 758 syntax for Python 3.14+. This aligns with the project's Python version requirements and ruff enforcement.
src/ai_company/providers/capabilities.py (1)
1-55: LGTM! Well-designed capability model with:
- Appropriate validation constraints (`gt=0` for token limits, `ge=0.0` for costs)
- Sensible defaults for capability flags
- `ConfigDict(frozen=True)` for immutability
- Comprehensive docstring following Google style
tests/unit/providers/test_enums.py (1)
32-81: LGTM! The remaining test classes (`TestFinishReason`, `TestStreamEventType`) are well-structured with proper markers and comprehensive coverage of enum behavior.

tests/unit/providers/test_errors.py (1)
1-144: LGTM! Comprehensive error hierarchy tests covering:
- Base `ProviderError` behavior (message, context, string representation, retryability)
- Subclass relationships via `issubclass` checks
- Parametrized `is_retryable` validation for all 8 error types
- `RateLimitError`-specific `retry_after` attribute
- String formatting consistency across all error types
tests/unit/providers/test_models.py (1)
1-424: LGTM! Excellent test coverage for all provider domain models:
- Validation rules (positive/negative cases, boundary conditions)
- Immutability via `frozen=True` enforcement
- Role-based constraints for `ChatMessage`
- Event-type-specific validation for `StreamChunk`
- Factory integration tests
- JSON round-trip serialization

Tests use appropriate fake model names (e.g., `"test"`) per coding guidelines.

tests/unit/providers/test_protocol.py (1)
1-194: LGTM! Comprehensive protocol and ABC tests covering:
- Structural typing with `isinstance` checks for `CompletionProvider`
- Async method behavior (complete, stream, get_model_capabilities)
- Delegation pattern from public API to `_do_*` hooks
- Input validation (empty messages rejection)
- `compute_cost` helper with appropriate floating-point tolerance (1e-9)
- Protocol conformance verification for concrete implementations
src/ai_company/providers/base.py (1)
1-185: LGTM! Well-designed abstract base class with:
- Clear separation between public API and provider-specific hooks
- Input validation via `_validate_messages` raising `InvalidRequestError`
- Static `compute_cost` helper with floating-point precision handling (`round(cost, 10)`)
- Comprehensive Google-style docstrings with Args, Returns, and Raises sections
- All methods under 50 lines per coding guidelines

The design cleanly implements the template method pattern, allowing concrete providers to focus on provider-specific logic while the base class handles common validation and cost computation.
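The `compute_cost` signature is not shown in the review, so the parameter names below are assumptions; only the `round(cost, 10)` detail and per-token pricing come from the comments above:

```python
def compute_cost(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_million: float,
    output_cost_per_million: float,
) -> float:
    """Dollar cost of a completion; round(..., 10) damps binary float noise."""
    cost = (
        input_tokens / 1_000_000 * input_cost_per_million
        + output_tokens / 1_000_000 * output_cost_per_million
    )
    return round(cost, 10)


# 1,000 input tokens at $3/M plus 500 output tokens at $15/M
print(compute_cost(1000, 500, 3.0, 15.0))  # 0.0105
```

The rounding explains why the tests can compare costs with a 1e-9 tolerance: accumulated binary representation error stays well below the rounded precision.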
tests/unit/providers/test_capabilities.py (2)
1-119: LGTM! The test structure is well-organized with comprehensive coverage of:
- Required field validation (empty model_id, empty provider)
- Numeric constraint validation (zero/negative tokens and costs)
- Boolean defaults verification
- Immutability enforcement
- Factory integration
- JSON round-trip serialization

The `@pytest.mark.unit` decorator and timeout configuration comply with coding guidelines.
17-21: ⚠️ Potential issue | 🟡 Minor: Test uses real model ID instead of fake model name.

The test asserts against `"claude-sonnet-4-6"`, which is a real Anthropic model identifier. The assertion depends on the `sample_model_capabilities` fixture, which also uses the real model ID. Based on learnings: "Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`." Consider updating the fixture in `conftest.py` and this assertion to use a fake model name like `"test-model:200k"`.

⛔ Skipped due to learnings

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

src/ai_company/providers/protocol.py (1)
1-81: LGTM! The `CompletionProvider` Protocol is well-designed:
- `@runtime_checkable` enables `isinstance` checks for structural typing
- Method signatures align with `BaseCompletionProvider` in `base.py`
- Google-style docstrings are complete with Args and Returns sections
- Type hints are comprehensive and use modern union syntax (`|`)

The `noqa` comments appropriately suppress type-checking import warnings for types needed in runtime-checkable Protocol signatures.

src/ai_company/providers/__init__.py (1)
1-74: LGTM! The module's public API is well-organized:
- Comprehensive module docstring categorizing exports (Protocols, Models, Capabilities, Enums, Errors)
- All 23 exports in `__all__` are alphabetically sorted for discoverability
- Imports and `__all__` entries are consistent

src/ai_company/providers/models.py (1)
1-281: LGTM! Comprehensive domain models with solid validation:
- TokenUsage: enforces the `total_tokens == input_tokens + output_tokens` invariant
- ChatMessage: role-specific constraints are thorough (tool messages require `tool_result`; assistant can have `tool_calls` but not `tool_result`, etc.)
- StreamChunk: event-type discriminator pattern with a `match` statement validates required fields per event type
- All models are immutable (`frozen=True`) per coding guidelines
- Pydantic v2 patterns used correctly (`ConfigDict`, `model_validator`, `Field`)

tests/unit/providers/conftest.py (2)
111-152: LGTM! `FakeProvider` is a well-designed test double:
- Structurally satisfies the `CompletionProvider` protocol
- Tracks `complete_calls` and `stream_calls` for test assertions
- The `stream` method correctly returns an `AsyncIterator` via an inner generator
- Properly handles the optional `capabilities` parameter with a factory fallback
74-81: ⚠️ Potential issue | 🟡 Minor: Factories use real model identifiers instead of fake names.

The `CompletionResponseFactory` and `ModelCapabilitiesFactory` use `"claude-sonnet-4-6"` and `"anthropic"`, which are real provider identifiers. Based on learnings: "Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`."

🧪 Proposed fix to use fake identifiers:

```diff
 class CompletionResponseFactory(ModelFactory):
     __model__ = CompletionResponse
     content = "Hello! How can I help you?"
     tool_calls = ()
     finish_reason = FinishReason.STOP
     usage = TokenUsageFactory
-    model = "claude-sonnet-4-6"
+    model = "fake-model:8b"
     provider_request_id = None

 class ModelCapabilitiesFactory(ModelFactory):
     __model__ = ModelCapabilities
-    model_id = "claude-sonnet-4-6"
-    provider = "anthropic"
+    model_id = "fake-model:8b"
+    provider = "fake-provider"
     max_context_tokens = 200_000
     max_output_tokens = 8_192
```

Also applies to: 93-105
⛔ Skipped due to learnings

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

src/ai_company/providers/errors.py (1)
1-94: LGTM! Well-structured error hierarchy with clear retry semantics:
- Retryable errors (transient): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`
- Non-retryable errors (permanent): `AuthenticationError`, `ModelNotFoundError`, `InvalidRequestError`, `ContentFilterError`

The `RateLimitError.retry_after` attribute supports respecting `Retry-After` headers from providers. The `ProviderError.__str__` method provides helpful context formatting for debugging.
Pull request overview
Implements the foundational “provider layer” abstraction in ai_company.providers, defining the typed interface and shared base logic that future LLM provider adapters (Anthropic/OpenAI/OpenRouter/Ollama/etc.) will conform to.
Changes:
- Added `CompletionProvider` Protocol plus `BaseCompletionProvider` ABC with shared request validation and token-cost computation helpers.
- Introduced provider-layer domain models (`ChatMessage`, tool types, streaming chunk types, config/response/usage) and capability descriptors for routing.
- Added a typed provider error hierarchy with `is_retryable` flags, plus comprehensive unit tests for the new module.
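The usage model's arithmetic invariant mentioned in the reviews (`total_tokens == input_tokens + output_tokens`) can be illustrated framework-free; the PR itself uses a frozen Pydantic v2 model with a `model_validator` rather than the dataclass shown here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Framework-free stand-in; the PR uses a frozen Pydantic v2 model."""

    input_tokens: int
    output_tokens: int
    total_tokens: int

    def __post_init__(self) -> None:
        # the same invariant the reviewed model_validator enforces
        if self.total_tokens != self.input_tokens + self.output_tokens:
            raise ValueError("total_tokens must equal input_tokens + output_tokens")


usage = TokenUsage(input_tokens=10, output_tokens=5, total_tokens=15)
print(usage)  # TokenUsage(input_tokens=10, output_tokens=5, total_tokens=15)

try:
    TokenUsage(input_tokens=10, output_tokens=5, total_tokens=99)
except ValueError as exc:
    print(exc)  # total_tokens must equal input_tokens + output_tokens
```

Validating the invariant at construction time means downstream cost computation can trust the totals without re-checking them.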
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ai_company/providers/protocol.py | Defines the structural provider interface (CompletionProvider) used for loose coupling. |
| src/ai_company/providers/base.py | Adds the shared ABC with validation and cost computation helper(s). |
| src/ai_company/providers/models.py | Introduces provider-layer request/response/tool/streaming models with Pydantic validation. |
| src/ai_company/providers/capabilities.py | Adds ModelCapabilities for routing and cost metadata. |
| src/ai_company/providers/errors.py | Adds provider error hierarchy with retryability flags. |
| src/ai_company/providers/enums.py | Adds provider-layer enums (roles, finish reasons, stream event types). |
| src/ai_company/providers/\_\_init\_\_.py | Exposes the providers public API surface via imports and __all__. |
| tests/unit/providers/conftest.py | Adds factories/fixtures and a fake provider for protocol/base tests. |
| tests/unit/providers/test_protocol.py | Tests structural typing of the Protocol and BaseCompletionProvider behavior. |
| tests/unit/providers/test_models.py | Tests validation/immutability/serialization behaviors for provider models. |
| tests/unit/providers/test_capabilities.py | Tests ModelCapabilities validation and JSON roundtrip. |
| tests/unit/providers/test_errors.py | Tests error hierarchy behavior, formatting, and retryability flags. |
| tests/unit/providers/test_enums.py | Tests provider-layer enums for members and string values. |
| CLAUDE.md | Updates contributor conventions (contains a syntax guidance change). |
CLAUDE.md excerpt under review:

```markdown
## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
```
The CLAUDE.md guidance on exception syntax needs a caveat: `except A, B:` is invalid on every Python release before 3.14, and PEP 758 (which re-allows the unparenthesized form in 3.14) still requires parentheses when an `as` clause is used. As written, the convention may mislead contributors working across Python versions; the tuple form `except (A, B):` is always safe.
```diff
- - **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
+ - **Exception syntax**: for multiple exceptions use `except (A, B):` (tuple) — portable across all Python 3 versions
```
src/ai_company/providers/models.py (outdated)
| """Ensure the populated fields match the event_type.""" | ||
| match self.event_type: | ||
| case StreamEventType.CONTENT_DELTA: | ||
| if self.content is None: | ||
| msg = "content_delta event must include content" | ||
| raise ValueError(msg) | ||
| case StreamEventType.TOOL_CALL_DELTA: | ||
| if self.tool_call_delta is None: | ||
| msg = "tool_call_delta event must include tool_call_delta" | ||
| raise ValueError(msg) | ||
| case StreamEventType.USAGE: | ||
| if self.usage is None: | ||
| msg = "usage event must include usage" | ||
| raise ValueError(msg) | ||
| case StreamEventType.ERROR: | ||
| if self.error_message is None: | ||
| msg = "error event must include error_message" | ||
| raise ValueError(msg) |
StreamChunk._validate_event_fields enforces required fields for some event_types, but it doesn't reject extraneous fields (e.g., a DONE chunk can still carry content/usage/etc.). This allows invalid/ambiguous stream states even though the docstring says the discriminator determines which fields are populated. Consider validating that non-relevant fields are None for each event type (especially DONE and ERROR).
| """Ensure the populated fields match the event_type.""" | |
| match self.event_type: | |
| case StreamEventType.CONTENT_DELTA: | |
| if self.content is None: | |
| msg = "content_delta event must include content" | |
| raise ValueError(msg) | |
| case StreamEventType.TOOL_CALL_DELTA: | |
| if self.tool_call_delta is None: | |
| msg = "tool_call_delta event must include tool_call_delta" | |
| raise ValueError(msg) | |
| case StreamEventType.USAGE: | |
| if self.usage is None: | |
| msg = "usage event must include usage" | |
| raise ValueError(msg) | |
| case StreamEventType.ERROR: | |
| if self.error_message is None: | |
| msg = "error event must include error_message" | |
| raise ValueError(msg) | |
| """Ensure the populated fields match the event_type. | |
| The ``event_type`` acts as a discriminator: for each event type, only | |
| the relevant field(s) may be non-None. | |
| """ | |
| # Determine which fields are required and which may be non-None for | |
| # this event type. All other optional fields must be None. | |
| match self.event_type: | |
| case StreamEventType.CONTENT_DELTA: | |
| allowed_non_null = {"content"} | |
| required_non_null = {"content"} | |
| case StreamEventType.TOOL_CALL_DELTA: | |
| allowed_non_null = {"tool_call_delta"} | |
| required_non_null = {"tool_call_delta"} | |
| case StreamEventType.USAGE: | |
| allowed_non_null = {"usage"} | |
| required_non_null = {"usage"} | |
| case StreamEventType.ERROR: | |
| allowed_non_null = {"error_message"} | |
| required_non_null = {"error_message"} | |
| case StreamEventType.DONE: | |
| # Terminal event: no additional payload fields should be set. | |
| allowed_non_null = set() | |
| required_non_null = set() | |
| case _: | |
| # For unknown or future event types, avoid over-validating to | |
| # preserve forward compatibility. | |
| return self | |
| field_values: dict[str, Any] = { | |
| "content": self.content, | |
| "tool_call_delta": self.tool_call_delta, | |
| "usage": self.usage, | |
| "error_message": self.error_message, | |
| } | |
| # Ensure all required fields are present. | |
| missing_required = sorted( | |
| name for name in required_non_null if field_values[name] is None | |
| ) | |
| if missing_required: | |
| required_list = ", ".join(missing_required) | |
| msg = ( | |
| f"{self.event_type.value} event missing required field(s): " | |
| f"{required_list}" | |
| ) | |
| raise ValueError(msg) | |
| # Ensure no extraneous fields are populated. | |
| extraneous = sorted( | |
| name | |
| for name, value in field_values.items() | |
| if name not in allowed_non_null and value is not None | |
| ) | |
| if extraneous: | |
| extraneous_list = ", ".join(extraneous) | |
| msg = ( | |
| f"{self.event_type.value} event must not include field(s): " | |
| f"{extraneous_list}" | |
| ) | |
| raise ValueError(msg) |
```python
cost = (input_tokens / 1000) * cost_per_1k_input + (
    output_tokens / 1000
) * cost_per_1k_output
return TokenUsage(
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    total_tokens=input_tokens + output_tokens,
    cost_usd=round(cost, 10),
)
```
compute_cost() hard-codes the rounding precision (round(cost, 10)), which makes the intent and consistency across the codebase harder to manage. There is already a shared BUDGET_ROUNDING_PRECISION = 10 constant used elsewhere to avoid float artifacts; consider reusing a shared constant (or introducing a provider-specific one) instead of a magic number here.
…, and Copilot

Source fixes:
- Make `ProviderError.context` immutable via `MappingProxyType`
- Sanitize sensitive keys in `ProviderError.__str__` output
- Add input validation to `compute_cost` (reject negative values)
- Add model parameter validation in `BaseCompletionProvider`
- Add `CompletionResponse` validator requiring content or tool_calls
- Make `StreamChunk` match exhaustive with negative constraints
- Add `ModelCapabilities` cross-field validators
- Extract cost rounding precision to named constant
- Fix misleading docstrings (cost computation, delegation, stubs)
- Standardize all docstrings to Google style

Test additions (23 new tests, 118 → 141):
- ABC instantiation and partial implementation rejection
- Model validation (blank model, error context metadata)
- `compute_cost` negative inputs and rounding precision
- `ChatMessage` role-field symmetry (system/tool_calls, user/tool_result)
- `CompletionResponse` empty response handling
- `StreamChunk` extraneous field rejection
- `__all__` exports importability
- `ModelCapabilities` cross-field constraint violations

Other:
- Replace real model identifiers with fake names in test fixtures
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/providers/errors.py`:
- Around line 11-13: The redaction set _REDACTED_KEYS should be matched
case-insensitively: normalize keys to lowercase (make _REDACTED_KEYS contain
lowercase strings) and update the __str__ logic to check k.lower() (or
equivalent) when deciding to redact values; ensure any comparisons that
currently use "k in _REDACTED_KEYS" are replaced with "k.lower() in
_REDACTED_KEYS" so keys like "API_KEY" or "Authorization" get redacted
correctly.
In `@tests/unit/providers/test_protocol.py`:
- Around line 271-282: Update the test to use the provider's rounding constant
instead of the hardcoded 10: import or reference
BaseCompletionProvider._COST_ROUNDING_PRECISION in
test_compute_cost_rounding_precision and replace the literal `10` with that
constant so the expected value is computed with the same precision as
BaseCompletionProvider.compute_cost; ensure the test still calls
BaseCompletionProvider.compute_cost(333, 777, cost_per_1k_input=0.003,
cost_per_1k_output=0.015) and compares usage.cost_usd to the rounded value using
_COST_ROUNDING_PRECISION.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
- `src/ai_company/providers/__init__.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
- `tests/unit/providers/test_capabilities.py`
- `src/ai_company/providers/errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/__init__.py`
- `tests/unit/providers/conftest.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/protocol.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use `asyncio_mode = 'auto'` for async tests without manual `@pytest.mark.asyncio`
Files:
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `tests/unit/providers/conftest.py`
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
- `tests/unit/providers/test_capabilities.py`
- `src/ai_company/providers/errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/__init__.py`
- `tests/unit/providers/conftest.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/protocol.py`
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/conftest.py`
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
Applied to files:
tests/unit/providers/conftest.py
🧬 Code graph analysis (7)
**tests/unit/providers/test_capabilities.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `tests/unit/providers/conftest.py` (2): `ModelCapabilitiesFactory` (93-105), `sample_model_capabilities` (226-239)

**tests/unit/providers/test_models.py (2)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)
- `src/ai_company/providers/models.py` (8): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolCall` (67-83), `ToolDefinition` (45-64), `ToolResult` (86-99)

**tests/unit/providers/test_protocol.py (6)**
- `src/ai_company/providers/base.py` (7): `complete` (42-72), `stream` (74-104), `get_model_capabilities` (106-119), `_do_complete` (124-133), `_do_stream` (136-145), `_do_get_model_capabilities` (148-153), `compute_cost` (158-211)
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)
- `src/ai_company/providers/errors.py` (1): `InvalidRequestError` (97-100)
- `src/ai_company/providers/models.py` (6): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolDefinition` (45-64)
- `src/ai_company/providers/protocol.py` (4): `CompletionProvider` (22-81), `complete` (30-49), `stream` (51-70), `get_model_capabilities` (72-81)

**src/ai_company/providers/models.py (1)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)

**tests/unit/providers/conftest.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (8): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolCall` (67-83), `ToolDefinition` (45-64), `ToolResult` (86-99)

**src/ai_company/providers/base.py (3)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/errors.py` (1): `InvalidRequestError` (97-100)
- `src/ai_company/providers/models.py` (6): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolDefinition` (45-64)

**src/ai_company/providers/protocol.py (3)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (5): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `ToolDefinition` (45-64)
- `src/ai_company/providers/base.py` (3): `complete` (42-72), `stream` (74-104), `get_model_capabilities` (106-119)
🔇 Additional comments (19)
**src/ai_company/providers/errors.py (1)**

`1-124`: **LGTM!** The error hierarchy is well-designed with:
- Immutable context via `MappingProxyType`
- Clear `is_retryable` semantics per error type
- Sensitive key redaction in `__str__`
- Proper inheritance and Google-style docstrings
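The `MappingProxyType` pattern the review praises can be sketched with stdlib pieces alone (the key names here are illustrative, not the repository's actual context keys):

```python
from types import MappingProxyType

context = {"request_id": "abc", "api_key": "secret"}
frozen = MappingProxyType(dict(context))  # defensive copy, then read-only view

try:
    frozen["api_key"] = "other"  # the proxy rejects all writes
except TypeError:
    print("proxy rejects writes")

context["api_key"] = "mutated"   # later edits to the original dict...
print(frozen["api_key"])         # ...do not leak through: secret
```

The `dict(context)` copy before wrapping is what makes the proxy a true snapshot; wrapping the caller's dict directly would still reflect the caller's later mutations.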
**tests/unit/providers/test_models.py (1)**

`1-512`: **LGTM!** Comprehensive test coverage for all provider domain models with:
- Proper `@pytest.mark.unit` markers on all test classes
- Module-level timeout of 30 seconds
- Validation tests (boundaries, constraints, cross-field rules)
- Immutability tests (`frozen` model config)
- Factory usage tests
- JSON round-trip verification
- Uses fake model names (`"test"`, `"test-model"`) per repository guidelines

**tests/unit/providers/test_capabilities.py (1)**
`1-157`: **LGTM!** Well-structured test suite for `ModelCapabilities` with:
- Proper `@pytest.mark.unit` marker and 30-second timeout
- Coverage of all field validations (empty/negative/zero constraints)
- Cross-field constraint tests (`max_output_tokens` vs `max_context_tokens`, streaming tool call dependencies)
- Immutability and JSON round-trip verification
- Uses fake model names (`"test-model"`, `"basic-model"`) per repository guidelines

**tests/unit/providers/test_protocol.py (1)**
`1-282`: **LGTM!** Comprehensive test coverage for the provider protocol and base ABC:
- Protocol structural typing with `isinstance` checks
- Async method delegation tests for `complete`, `stream`, `get_model_capabilities`
- Validation tests for empty messages and blank model IDs
- `compute_cost` edge cases (zero, large, negative inputs, rounding)
- ABC instantiation prevention and partial implementation rejection
- Uses fake model names per repository guidelines
**src/ai_company/providers/protocol.py (1)**

`1-81`: **LGTM!** Clean protocol definition with:
- `@runtime_checkable` decorator for structural typing with `isinstance` checks
- Well-documented async methods with Google-style docstrings
- Proper type hints using domain models from the provider layer
- Correct use of `AsyncIterator[StreamChunk]` for streaming

**src/ai_company/providers/models.py (2)**
`80-83`: **Note: Mutable default values in frozen models.** The `arguments` field uses `default_factory=dict`, which creates a new mutable dict for each instance. While the model is frozen (preventing reassignment of `arguments`), the dict itself is still mutable: callers modifying `arguments` after construction could inadvertently mutate state. This is likely acceptable for the current use case, since the model is meant to be immutable and callers should treat it as such, but be aware that Pydantic's `frozen=True` only prevents attribute reassignment, not deep immutability.
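The shallow-immutability caveat is easy to demonstrate with a stdlib frozen dataclass; Pydantic's `frozen=True` models behave analogously (the class below is a toy stand-in, not the repository's `ToolCall`):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolCallSketch:
    name: str
    arguments: dict = field(default_factory=dict)

call = ToolCallSketch(name="search")
try:
    call.name = "other"           # attribute reassignment is rejected
except Exception as exc:
    print(type(exc).__name__)     # FrozenInstanceError

call.arguments["query"] = "hi"    # ...but the nested dict is still mutable
print(call.arguments)             # {'query': 'hi'}
```

This is why the review recommends callers treat such dict fields as read-only (or copy before modifying).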
`1-346`: **LGTM!** Excellent domain model design with:
- Strict immutability via `ConfigDict(frozen=True)` on all models
- Comprehensive cross-field validation (e.g., `TokenUsage` total check, `ChatMessage` role constraints, `StreamChunk` event-type validation)
- Clear Google-style docstrings with attribute documentation
- Proper use of Pydantic v2 features (`model_validator`, `Field` constraints)
- `NotBlankStr` for enforcing non-empty identifiers

**src/ai_company/providers/capabilities.py (1)**
`1-83`: **LGTM!** Well-designed capability descriptor with:
- Comprehensive field set for routing decisions (context limits, feature flags, costs)
- Sensible defaults (`supports_streaming=True`, `supports_tools=False`)
- Cross-field validation ensuring consistency (`max_output_tokens <= max_context_tokens`, streaming tool calls require both tools and streaming)
- Immutability via `frozen=True`
- Google-style docstrings with clear attribute documentation
src/ai_company/providers/__init__.py (1)
`1-57`: **LGTM!** Clean public API consolidation with:
- Complete export of all public types (protocols, base classes, models, enums, errors)
- Alphabetically sorted `__all__` for discoverability
- Clear module docstring describing the package purpose
tests/unit/providers/conftest.py (4)
`1-20`: **LGTM!** Imports are correctly organized, with the appropriate `# noqa: TC003` comment for `AsyncIterator` since it is used in runtime type hints.

`24-105`: **Well-structured factories following polyfactory patterns.** The factories are correctly configured:
- `TokenUsageFactory` has consistent values (100 + 50 = 150 total_tokens)
- `CompletionResponseFactory.usage = TokenUsageFactory` correctly references the sub-factory class (polyfactory will call `.build()` automatically)
- `StreamChunkFactory` defaults align with `CONTENT_DELTA` event type requirements
- `ModelCapabilitiesFactory` uses fake identifiers (`"test-model"`, `"test-provider"`) as required

The `# type: ignore[var-annotated]` and `# noqa: RUF012` comments are appropriate for mutable class-level attributes in the ModelFactory pattern.

`111-152`: **FakeProvider correctly implements the protocol structure.** The implementation properly:
- Records calls for test verification via `complete_calls` and `stream_calls`
- Returns factory-built responses from `complete`
- Yields properly-shaped `StreamChunk` objects from `stream` (CONTENT_DELTA followed by DONE)
- Returns injectable capabilities from `get_model_capabilities`

The async generator pattern in `stream` (wrapping an inner `_gen()` function) correctly produces an `AsyncIterator[StreamChunk]`.

`215-239`: **Fixtures now use fake model identifiers.** The `sample_completion_response` and `sample_model_capabilities` fixtures correctly use `"test-model"` and `"test-provider"` instead of real provider identifiers, addressing the previous review feedback.

**src/ai_company/providers/base.py (6)**
`1-23`: **LGTM!** Module docstring clearly explains the purpose, and the module-level `_COST_ROUNDING_PRECISION` constant is well-documented.

`26-72`: **Well-designed public API with proper validation.** The `complete` method correctly:
- Validates inputs before delegating to the hook
- Documents the validation behavior in the Raises section
- Returns the hook result unmodified (no unnecessary wrapping)

`74-104`: **`stream` method follows the same validation pattern.** The `return await self._do_stream(...)` pattern is correct — it expects subclass `_do_stream` implementations to be async methods returning an `AsyncIterator` (e.g., via an inner generator function), which aligns with the `FakeProvider` pattern in tests.

`123-153`: **Abstract hooks properly defined.** The three abstract methods have clear signatures matching the public API, with brief but sufficient docstrings for implementers.

`157-211`: **Robust cost computation with input validation.** The `compute_cost` method:
- Validates all four parameters for non-negative values
- Includes context in error messages for debugging
- Uses the module-level precision constant for rounding
- Correctly computes total_tokens as `input_tokens + output_tokens`

`213-239`: **Validation helpers are correct.** `_validate_messages` properly rejects empty lists, and `_validate_model` handles both empty strings and whitespace-only strings with `if not model or not model.strip()`.
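The whitespace-aware model-id check can be sketched in a few lines (this is an illustrative helper, not the repository's exact `_validate_model`):

```python
def validate_model(model: object) -> str:
    """Reject None, non-string, empty, and whitespace-only model ids."""
    if not isinstance(model, str) or not model.strip():
        raise ValueError(f"model must be a non-empty string, got {model!r}")
    return model

print(validate_model("test-model"))  # test-model
for bad in (None, "", "   "):
    try:
        validate_model(bad)
    except ValueError:
        print("rejected", bad)
```

The `isinstance` guard matters: without it, `model.strip()` would raise `AttributeError` on `None` instead of a clear validation error.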
…al reviewers

- Case-insensitive secret redaction in `ProviderError.__str__` with tests
- Context dict shallow-copy before `MappingProxyType` wrap (immutability)
- `RateLimitError.retry_after` validation (rejects negative/NaN/infinity)
- `_validate_model` guards against None input with isinstance check
- `compute_cost` rejects non-finite (inf/NaN) cost rates
- Hook docstrings specify `ProviderError` wrapping contract
- `_do_stream` docstring clarifies return-not-yield pattern
- `ChatMessage` validator refactored to exhaustive match/case
- Removed forward reference to nonexistent LiteLLMProvider
- Fixed `tool_call_delta` docstring from "Partial" to accurate description
- Temperature docstring notes provider-specific range caveat
- Removed fragile issue #9 reference from errors.py docstring
- Added `ProviderError` docstring note about redaction behavior
- FakeProvider now records tools/config parameters
- Tests: redaction, context immutability, retry_after validation, tools/config forwarding, None model, inf/NaN cost rates, MAX_TOKENS/TOOL_USE empty response, assistant with content+tool_calls
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
src/ai_company/providers/base.py
Outdated
```python
        cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
        cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).

    Returns:
        Populated ``TokenUsage`` with computed cost.

    Raises:
        InvalidRequestError: If any parameter is negative.
```
compute_cost() rejects non-finite rates (NaN/inf) in addition to negatives, but the docstring says it only raises InvalidRequestError when a parameter is negative. Please update the docstring to reflect the actual validation behavior (non-negative and finite).
Suggested change:

```diff
-        cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
-        cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).
+        cost_per_1k_input: Cost per 1 000 input tokens in USD (finite and >= 0).
+        cost_per_1k_output: Cost per 1 000 output tokens in USD (finite and >= 0).

     Returns:
         Populated ``TokenUsage`` with computed cost.

     Raises:
-        InvalidRequestError: If any parameter is negative.
+        InvalidRequestError: If any parameter is negative or non-finite
+            (for example, NaN or infinity).
```
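Putting the validation and rounding together, a hedged sketch of a `compute_cost`-style helper — the signature and precision value mirror the review discussion but are assumptions, not the repository's exact code:

```python
import math

def compute_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
    precision: int = 10,  # assumed, mirroring BUDGET_ROUNDING_PRECISION
) -> float:
    """Validate inputs, then compute USD cost from per-1k-token rates."""
    params = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_per_1k_input": cost_per_1k_input,
        "cost_per_1k_output": cost_per_1k_output,
    }
    for name, value in params.items():
        if value < 0 or not math.isfinite(value):
            raise ValueError(f"{name} must be finite and >= 0, got {value!r}")
    cost = (input_tokens / 1000) * cost_per_1k_input + (
        output_tokens / 1000
    ) * cost_per_1k_output
    return round(cost, precision)  # rounding strips binary float dust

print(compute_cost(333, 777, 0.003, 0.015))  # 0.012654
```

`math.isfinite` returns `False` for both NaN and ±infinity, so the single check covers all the cases the updated docstring lists.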
src/ai_company/providers/base.py
Outdated
```python
_COST_ROUNDING_PRECISION: int = 10
"""Decimal places for cost rounding to avoid floating-point dust."""
```
This introduces a new rounding-precision constant with the same value/purpose as ai_company.constants.BUDGET_ROUNDING_PRECISION. To avoid duplicated “10” precision definitions drifting over time, consider reusing a shared constant (or promoting a generic rounding constant in ai_company.constants) instead of defining _COST_ROUNDING_PRECISION locally.
```python
import pytest

from ai_company.providers.base import (
    _COST_ROUNDING_PRECISION,
```
The test imports _COST_ROUNDING_PRECISION, which is a private (underscore-prefixed) implementation detail. Either make the constant part of the public API (no leading underscore) or avoid importing it in the test by asserting on the rounded result directly (e.g., use the expected precision inline).
Suggested change:

```diff
-    _COST_ROUNDING_PRECISION,
```
- Reuse shared `BUDGET_ROUNDING_PRECISION` from `ai_company.constants` instead of defining local `_COST_ROUNDING_PRECISION` duplicate
- Update `compute_cost` docstring to reflect finite+non-negative validation
- Import public constant in tests instead of private symbol
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/providers/models.py`:
- Around line 61-64: Add a short note to the model docstrings explaining that
dict fields like parameters_schema (in the model that defines Field(...)) and
the arguments field in ToolCall are shallowly immutable under the frozen
Pydantic model—reassignment is prevented but their contents can still be
mutated—so callers should treat these dicts as read-only (or copy them before
modifying) to avoid surprising side effects; update the docstring for the class
that declares parameters_schema and the ToolCall docstring to include this
guidance.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/base.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/protocol.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use `asyncio_mode = 'auto'` for async tests without manual `@pytest.mark.asyncio`
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/base.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/protocol.py`
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/conftest.py`
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
Applied to files:
tests/unit/providers/conftest.py
🧬 Code graph analysis (3)
**src/ai_company/providers/models.py (1)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)

**tests/unit/providers/test_errors.py (1)**
- `src/ai_company/providers/errors.py` (7): `ContentFilterError` (119-122), `InvalidRequestError` (113-116), `ModelNotFoundError` (107-110), `ProviderError` (22-69), `ProviderInternalError` (137-140), `ProviderTimeoutError` (125-128), `RateLimitError` (78-104)

**src/ai_company/providers/protocol.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (5): `ChatMessage` (102-174), `CompletionConfig` (177-218), `CompletionResponse` (221-270), `StreamChunk` (273-351), `ToolDefinition` (45-64)
🔇 Additional comments (8)
src/ai_company/providers/protocol.py (1)
`1-80`: **LGTM!** The protocol definition is well-structured with a proper runtime-checkable decorator, clear method signatures, and comprehensive Google-style docstrings. The interface cleanly separates the engine-facing contract from concrete implementations.
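The structural-typing behavior a `@runtime_checkable` protocol enables can be shown with a toy example (the names below are invented for illustration, not the repository's `CompletionProvider`):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Greeter(Protocol):
    def greet(self, name: str) -> str: ...

class Dutch:
    # Never subclasses Greeter — conformance is purely structural.
    def greet(self, name: str) -> str:
        return f"hallo {name}"

print(isinstance(Dutch(), Greeter))   # True: the required method exists
print(isinstance(object(), Greeter))  # False: no greet method
```

Note that runtime checks only verify method *presence*, not signatures or return types — mypy strict mode covers the rest, which is why the tests pair `isinstance` checks with typed fakes.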
src/ai_company/providers/errors.py (1)
`1-140`: **LGTM!** The error hierarchy is well-designed with:
- Proper immutability via `MappingProxyType` with defensive copy
- Case-insensitive sensitive key redaction
- Appropriate `is_retryable` flags for retry logic
- Thorough validation of `retry_after` in `RateLimitError`

**src/ai_company/providers/models.py (1)**
`1-351`: **Well-structured domain models with comprehensive validation.** The models correctly implement:
- Frozen configs for immutability
- Cross-field validation via `model_validator(mode="after")`
- Exhaustive role/event-type matching with fallback error handling
- Clear separation between token usage, tools, messages, and streaming concerns
src/ai_company/providers/base.py (1)
`1-273`: **LGTM!** The ABC design is clean with:
- Clear separation between public API (with validation) and abstract hooks
- Comprehensive input validation raising `InvalidRequestError` with context
- Proper finite/non-negative checks in `compute_cost`
- Use of the shared `BUDGET_ROUNDING_PRECISION` constant for consistency

The streaming design correctly documents that `_do_stream` implementations must return an `AsyncIterator` rather than yielding directly, which matches the `await` pattern in the public `stream` method.

**tests/unit/providers/test_errors.py (1)**
`1-212`: **LGTM!** Comprehensive test coverage for the error hierarchy including:
- Base class behavior and inheritance
- `is_retryable` flags for all error types
- Redaction of sensitive keys (case-insensitive)
- Context immutability via `MappingProxyType`
- `RateLimitError` validation for negative/NaN/inf `retry_after`

**tests/unit/providers/test_models.py (1)**
`1-545`: **LGTM!** Thorough test coverage for all domain models with:
- Validation rule enforcement (cross-field constraints, bounds)
- Immutability verification via frozen model mutations
- Factory-based instance creation
- JSON round-trip serialization
- `__all__` export verification

All tests use fake model names like `"test"` as required.

**tests/unit/providers/test_protocol.py (1)**
`1-369`: **LGTM!** Excellent test coverage for the protocol and base class, including:
- Structural typing verification via `isinstance` checks
- Input validation (empty messages, blank/None model)
- Tools and config forwarding through hooks
- `compute_cost` validation (negative tokens, negative/inf/NaN rates)
- Rounding precision using the `BUDGET_ROUNDING_PRECISION` constant
- ABC instantiation prevention and partial implementation rejection
tests/unit/providers/conftest.py (1)
1-258: LGTM! Well-organized test infrastructure with:
- Comprehensive factories for all domain models
- `FakeProvider` correctly satisfying the `CompletionProvider` protocol
- Call recording for verification in protocol tests
- All fixtures using fake model names (`"test-model"`, `"test-provider"`)

Based on learnings: Tests correctly use fake model names as required.
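The structural typing these fixtures exercise can be sketched as follows. This is a trimmed stand-in, not the project's actual code: the real `CompletionProvider` protocol has more methods and typed Pydantic models, while this sketch uses one method and plain dicts.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionProvider(Protocol):
    """Trimmed stand-in for the real protocol (single method, plain types)."""

    async def complete(self, messages: list[dict], model: str) -> str: ...


class FakeProvider:
    """Test double: no inheritance needed, satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.calls: list[tuple[list[dict], str]] = []  # record calls for assertions

    async def complete(self, messages: list[dict], model: str) -> str:
        self.calls.append((messages, model))
        return "fake response"


fake = FakeProvider()
# runtime_checkable makes isinstance() check method *presence*, not signatures.
print(isinstance(fake, CompletionProvider))  # True
print(asyncio.run(fake.complete([{"role": "user", "content": "hi"}], "test-model")))
```

Note the caveat in the comment: `runtime_checkable` verifies only that the methods exist, which is why signature correctness still needs the kind of forwarding tests described above.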
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
```python
cost_per_1k_input: float = Field(
    ge=0.0,
    description="Cost per 1k input tokens in USD",
)
cost_per_1k_output: float = Field(
    ge=0.0,
    description="Cost per 1k output tokens in USD",
)
```
`cost_per_1k_input` / `cost_per_1k_output` currently only enforce `>= 0.0`, which allows `float('inf')` values. Since downstream cost computation explicitly requires finite rates (see `BaseCompletionProvider.compute_cost`), these fields should also validate as finite to fail fast on invalid model metadata.
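A stdlib sketch of why finiteness must be checked explicitly. The `compute_cost` below is a simplified stand-in for `BaseCompletionProvider.compute_cost`, and the precision of 6 is an assumed placeholder for the `BUDGET_ROUNDING_PRECISION` constant, whose actual value is not shown here.

```python
import math

# float("inf") >= 0.0 is True, so a ge=0.0 bound alone admits infinity.
assert float("inf") >= 0.0


def compute_cost(
    input_tokens: int,
    output_tokens: int,
    rate_in: float,
    rate_out: float,
    precision: int = 6,  # assumed placeholder for BUDGET_ROUNDING_PRECISION
) -> float:
    """Per-1k-token cost with explicit finite/non-negative checks."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    for rate in (rate_in, rate_out):
        if rate < 0.0 or not math.isfinite(rate):
            raise ValueError(f"rates must be finite and non-negative: {rate!r}")
    return round(
        input_tokens / 1000 * rate_in + output_tokens / 1000 * rate_out,
        precision,
    )


print(compute_cost(1000, 500, 0.003, 0.015))  # 0.0105
```

Validating finiteness at the model boundary (as the comment suggests) would let this downstream check become a pure assertion rather than a runtime failure path.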
```python
    and self.content is None
    and not self.tool_calls
):
    msg = f"{self.role} messages must have content or tool_calls"
    raise ValueError(msg)
```
For system/user messages, `tool_calls` are forbidden, but the validation error says they "must have content or tool_calls". That message is misleading because `tool_calls` can never satisfy the requirement for those roles; consider emitting a role-specific message (e.g., "system messages must have content").
```python
r"""Provider-specific streaming completion.

Implementations must *return* an ``AsyncIterator`` (not ``yield``
directly), since the caller ``await``\s this coroutine to obtain
```
The docstring contains the text await\s, which renders literally as "await\s" and looks like an escaped typo. Replace it with normal wording (e.g., "awaits") to avoid confusing readers and doc tooling.
Suggested change:
```diff
- directly), since the caller ``await``\s this coroutine to obtain
+ directly), since the caller awaits this coroutine to obtain
```
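The return-an-iterator convention the docstring describes can be illustrated with a minimal runnable sketch. Class and method names are simplified, and string chunks stand in for `StreamChunk`.

```python
import asyncio
from collections.abc import AsyncIterator


class Driver:
    async def _do_stream(self, model: str) -> AsyncIterator[str]:
        # Return the iterator instead of yielding here: the caller awaits
        # this coroutine to *obtain* the iterator, then iterates over it.
        async def _gen() -> AsyncIterator[str]:
            for chunk in ("Hel", "lo"):
                yield chunk

        return _gen()

    async def stream(self, model: str) -> AsyncIterator[str]:
        iterator = await self._do_stream(model)  # coroutine -> AsyncIterator
        async for chunk in iterator:
            yield chunk


async def collect() -> str:
    return "".join([c async for c in Driver().stream("test-model")])


print(asyncio.run(collect()))  # Hello
```

If `_do_stream` were itself an async generator (i.e. used `yield` directly), `await self._do_stream(...)` would fail, since async generators are iterated, not awaited; that is the distinction the docstring is guarding.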
Add the "Employment Agency" — a swappable driver system with LiteLLM as the default backend. Implements the concrete provider layer behind the contracts designed in #86.

- LiteLLMDriver wrapping litellm.acompletion for streaming and non-streaming completions, model capability queries, and full exception mapping (10 LiteLLM exception types → ProviderError hierarchy)
- ProviderRegistry (immutable) mapping provider names to driver instances, built from config via from_config() with factory_overrides for testing
- Pure mapping functions (messages, tools, finish reasons, tool calls)
- 3 new driver error classes (DriverNotRegistered, DriverAlreadyRegistered, DriverFactoryNotFound)
- driver field on ProviderConfig (defaults to "litellm")
- Upgraded litellm 1.67.2 → 1.82.0 (fixes Python 3.14 compat, removes need for PYTHONUTF8 env var and deprecation warning filters)
- Removed unused F403 ruff ignore for __init__.py
- 235 new unit tests, all 1273 tests pass at 94% coverage
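The exception-mapping idea can be sketched as follows. The exception classes here are local fakes standing in for LiteLLM's real exception types, and the two-entry map is illustrative; the real driver maps ten exception types.

```python
class ProviderError(Exception):
    """Base of the typed error hierarchy; callers branch on is_retryable."""

    is_retryable = False


class RateLimitError(ProviderError):
    is_retryable = True


class AuthenticationError(ProviderError):
    is_retryable = False


# Local fakes standing in for litellm's exception types.
class FakeLiteLLMRateLimitError(Exception): ...
class FakeLiteLLMAuthenticationError(Exception): ...


_EXCEPTION_MAP: dict[type[Exception], type[ProviderError]] = {
    FakeLiteLLMRateLimitError: RateLimitError,
    FakeLiteLLMAuthenticationError: AuthenticationError,
}


def map_exception(exc: Exception) -> ProviderError:
    """Translate a backend exception into the typed hierarchy (default: base)."""
    mapped_type = _EXCEPTION_MAP.get(type(exc), ProviderError)
    return mapped_type(str(exc))


err = map_exception(FakeLiteLLMRateLimitError("429 Too Many Requests"))
print(type(err).__name__, err.is_retryable)  # RateLimitError True
```

Keeping the mapping as a flat dict makes the fallback explicit: anything unrecognized becomes the non-retryable base `ProviderError` rather than leaking a backend exception to callers.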
## Summary
- Implements the **Employment Agency** — a swappable driver system with LiteLLM as the default backend behind the contracts from #86
- `LiteLLMDriver(BaseCompletionProvider)` wrapping `litellm.acompletion` for streaming/non-streaming completions, model capabilities, and full exception mapping (10 LiteLLM exception types → `ProviderError` hierarchy)
- `ProviderRegistry` (immutable, `MappingProxyType`) mapping provider names → driver instances, built from config via `from_config()` with `factory_overrides` for testing/native SDK swaps
- Pure mapping functions for messages, tools, finish reasons, and tool call extraction
- Upgraded litellm `1.67.2` → `1.82.0`, removing all Python 3.14 compatibility workarounds (warning filters, `PYTHONUTF8` env var, stale `type: ignore`)
- Removed unused `F403` ruff ignore for `__init__.py`

## Architecture
```
Engine ──> ProviderRegistry ──────> LiteLLMDriver (anthropic)
           ("Employment Agency")    LiteLLMDriver (openrouter)
                                    LiteLLMDriver (ollama)
                                    # Future: NativeAnthropicDriver, etc.
```

## New files
| File | Lines | Purpose |
|------|-------|---------|
| `providers/drivers/litellm_driver.py` | ~530 | LiteLLM-backed completion driver |
| `providers/drivers/mappers.py` | ~170 | Pure message/tool/reason mapping functions |
| `providers/registry.py` | ~185 | Immutable provider name → driver registry |
| `providers/drivers/__init__.py` | ~20 | Sub-package exports |

## Modified files
| File | Change |
|------|--------|
| `config/schema.py` | Added `driver` field to `ProviderConfig` (default `"litellm"`) |
| `providers/errors.py` | Added 3 driver error classes |
| `providers/__init__.py` | Added new exports |
| `pyproject.toml` | litellm `1.82.0`, removed warning filters and `F403` ignore |

## Test plan
- [x] 235 new unit tests covering driver, mappers, registry, and exception mapping
- [x] All 1273 tests pass (`uv run pytest tests/ -n auto`)
- [x] 94% coverage (`--cov-fail-under=80`)
- [x] Lint clean (`uv run ruff check src/ tests/`)
- [x] Format clean (`uv run ruff format --check src/ tests/`)
- [x] Type-check clean (`uv run mypy src/` — 0 errors in 64 files)
- [x] All pre-commit hooks pass

Closes #4
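The immutable-registry idea can be sketched with `MappingProxyType`. This is a simplified sketch: drivers are placeholder strings, and the error message is illustrative rather than the real `DriverNotRegistered` behavior.

```python
from types import MappingProxyType


class ProviderRegistry:
    """Sketch: immutable mapping of provider name -> driver instance."""

    def __init__(self, drivers: dict[str, object]) -> None:
        # Copy, then wrap: later mutation of the caller's dict has no effect,
        # and the proxy itself rejects item assignment.
        self._drivers = MappingProxyType(dict(drivers))

    def get(self, name: str) -> object:
        try:
            return self._drivers[name]
        except KeyError:
            raise KeyError(f"no driver registered for provider {name!r}") from None


registry = ProviderRegistry({"anthropic": "litellm-driver-placeholder"})
print(registry.get("anthropic"))  # litellm-driver-placeholder

try:
    registry._drivers["ollama"] = "x"  # type: ignore[index]
except TypeError as exc:
    print("immutable:", exc)
```

`MappingProxyType` gives read-only *views*, so copying the input dict first matters: without the copy, the caller could still mutate the registry through the original dict.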
## Summary
- Adds **39 integration tests** for the provider adapter layer, completing the final unchecked acceptance criterion from #5: _"Integration tests with mock/recorded API responses"_
- All source code was already implemented in PRs #86 and #88 — this PR covers only the integration test suite
- Mocks at `litellm.acompletion` level using **real `litellm.ModelResponse`** objects (not MagicMock), exercising actual attribute access paths through `_map_response`, `_process_chunk`, and `extract_tool_calls`

### Test files
| File | Tests | Coverage |
|------|-------|----------|
| `test_anthropic_pipeline.py` | 13 | Config→registry→complete/stream, alias resolution, cost computation, streaming |
| `test_openrouter_pipeline.py` | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| `test_ollama_pipeline.py` | 4 | No api_key, localhost base_url, zero-cost models |
| `test_error_scenarios.py` | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| `test_tool_calling_pipeline.py` | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| `conftest.py` | — | Config factories, real ModelResponse builders, stream helpers |

### Verification
- `ruff check` — all passed
- `ruff format` — all formatted
- `mypy` — 0 errors (7 files)
- `pytest` — 1331 total tests pass, **94.49% coverage** (80% required)

Closes #5

## Test plan
- [ ] CI passes (lint + type-check + test + coverage)
- [ ] 39 integration tests pass under `pytest -m integration`
- [ ] No regressions in existing 1292 unit tests
- [ ] Coverage remains above 80% threshold
🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

--------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Closes #3
Summary
- `CompletionProvider` Protocol for loose coupling + `BaseCompletionProvider` ABC for shared validation and cost computation
- Full model hierarchy: `ChatMessage`, `ToolDefinition`, `ToolCall`, `ToolResult`, `CompletionConfig`, `CompletionResponse`, `StreamChunk`, `TokenUsage`
- Error hierarchy with `is_retryable` flags (8 typed subclasses of `ProviderError`)
- `ModelCapabilities` for routing decisions (context window, tool support, vision, streaming, costs)

Design Decisions
- `parameters_schema: dict[str, Any]` — no custom ToolParameter model; every provider consumes JSON Schema natively
- Stream chunks carry a `StreamEventType` discriminator
- Errors only expose `is_retryable`; retry logic (issue #9, "Implement retry logic, rate limiting, and provider error handling") will use this flag

Files
Source (7):
`providers/enums.py`, `errors.py`, `models.py`, `capabilities.py`, `protocol.py`, `base.py`, `__init__.py`

Tests (5+2): `test_enums.py`, `test_errors.py`, `test_models.py`, `test_capabilities.py`, `test_protocol.py` + `conftest.py`, `__init__.py`

Test plan
- `uv run ruff check` — passes
- `uv run ruff format --check` — passes
- `uv run mypy src/ai_company/providers/` — passes (strict)
- 118 unit tests pass (`pytest tests/unit/providers/ -v -m unit`)

🤖 Generated with Claude Code