
feat: design unified provider interface #86

Merged
Aureliolo merged 5 commits into main from feat/provider-interface
Mar 1, 2026

Conversation


@Aureliolo Aureliolo commented Mar 1, 2026

Closes #3

Summary

  • Implements the typed contract layer between the agent engine and LLM providers
  • CompletionProvider Protocol for loose coupling + BaseCompletionProvider ABC for shared validation and cost computation
  • Full model hierarchy: ChatMessage, ToolDefinition, ToolCall, ToolResult, CompletionConfig, CompletionResponse, StreamChunk, TokenUsage
  • Error hierarchy with is_retryable flags (8 typed subclasses of ProviderError)
  • ModelCapabilities for routing decisions (context window, tool support, vision, streaming, costs)

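The is_retryable contract above can be sketched as follows; the class names come from this PR's description, but the bodies and the `should_retry` helper are illustrative stand-ins, not the real implementation:

```python
class ProviderError(Exception):
    """Base exception for provider-layer errors (sketch)."""

    # Class-level flag; subclasses override it.
    is_retryable: bool = False


class RateLimitError(ProviderError):
    """Transient: the provider asked us to slow down."""

    is_retryable = True


class AuthenticationError(ProviderError):
    """Permanent: retrying with the same credentials cannot succeed."""

    is_retryable = False


def should_retry(exc: Exception) -> bool:
    # Retry logic (issue #9) can branch on the class-level flag
    # without inspecting provider-specific error payloads.
    return isinstance(exc, ProviderError) and exc.is_retryable
```
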
Design Decisions

  • Protocol + ABC two-tier: engine type-hints against Protocol (loose coupling), concrete adapters subclass ABC (shared logic)
  • Raw JSON Schema for tools: parameters_schema: dict[str, Any] — no custom ToolParameter model, every provider consumes JSON Schema natively
  • TokenUsage separate from CostRecord: lightweight provider-layer record; budget layer adds agent/task context
  • Streaming via AsyncIterator[StreamChunk]: typed chunks with StreamEventType discriminator
  • Error retryability: each error type declares is_retryable; retry logic (issue Implement retry logic, rate limiting, and provider error handling #9) will use this flag

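A minimal sketch of the Protocol + ABC two-tier shape (signatures heavily simplified; only the class names come from this PR). The engine type-hints against the Protocol, while concrete adapters inherit shared validation from the ABC:

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionProvider(Protocol):
    """Structural interface the engine type-hints against."""

    def complete(self, messages: list[str], model: str) -> str: ...


class BaseCompletionProvider(ABC):
    """Shared validation lives here; adapters implement _do_complete."""

    def complete(self, messages: list[str], model: str) -> str:
        if not messages:
            raise ValueError("messages must not be empty")
        return self._do_complete(messages, model)

    @abstractmethod
    def _do_complete(self, messages: list[str], model: str) -> str: ...


class EchoProvider(BaseCompletionProvider):
    """Toy adapter: echoes the last message back."""

    def _do_complete(self, messages: list[str], model: str) -> str:
        return messages[-1]
```

Because the Protocol is runtime-checkable and structural, `isinstance(EchoProvider(), CompletionProvider)` holds without EchoProvider ever naming the Protocol.
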
Files

Source (7): providers/enums.py, errors.py, models.py, capabilities.py, protocol.py, base.py, __init__.py
Tests (5+2): test_enums.py, test_errors.py, test_models.py, test_capabilities.py, test_protocol.py + conftest.py, __init__.py

Test plan

  • uv run ruff check — passes
  • uv run ruff format --check — passes
  • uv run mypy src/ai_company/providers/ — passes (strict)
  • 118 unit tests pass (pytest tests/unit/providers/ -v -m unit)
  • 100% coverage on providers module
  • Full suite: 1156 tests pass, 94.64% overall coverage (80% required)
  • CI pipeline (lint + type-check + test)

🤖 Generated with Claude Code

Implement the typed contract layer between the agent engine and LLM
providers (issue #3). Defines CompletionProvider Protocol for loose
coupling and BaseCompletionProvider ABC for shared validation/cost
computation. Includes full model hierarchy (ChatMessage, ToolDefinition,
ToolCall, CompletionResponse, StreamChunk, TokenUsage), error hierarchy
with is_retryable flags, and ModelCapabilities for routing decisions.

118 unit tests, 100% coverage on providers module.
Copilot AI review requested due to automatic review settings (March 1, 2026, 08:52)

coderabbitai bot commented Mar 1, 2026

Warning

Rate limit exceeded

@Aureliolo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 45 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between f9ae175 and 520e14e.

📒 Files selected for processing (1)
  • src/ai_company/providers/models.py
📝 Walkthrough


Adds a provider layer: runtime-checkable CompletionProvider Protocol, BaseCompletionProvider ABC, immutable provider-domain Pydantic models, provider error hierarchy, package exports, and comprehensive unit tests and fixtures. No behavioral changes to existing runtime code outside this new provider surface.

Changes

Cohort / File(s) / Summary:
  • Documentation (CLAUDE.md): Updates guidance: replaces the from __future__ import annotations bullet with a PEP 758 except-syntax note and adds style/engineering guidance for models, line/function length, errors, and validation.
  • Package Exports (src/ai_company/providers/__init__.py): New package-level exports consolidating the provider public API (base, protocol, models, enums, errors) and an explicit __all__.
  • Provider Base & Protocol (src/ai_company/providers/base.py, src/ai_company/providers/protocol.py): Adds BaseCompletionProvider ABC with validation, compute_cost, and abstract hooks; adds a runtime-checkable CompletionProvider Protocol. Review hook contracts, validation, and cost rounding/edge cases.
  • Domain Models (src/ai_company/providers/models.py, src/ai_company/providers/capabilities.py, src/ai_company/providers/enums.py): Introduces frozen Pydantic models (TokenUsage, ChatMessage, CompletionConfig, CompletionResponse, StreamChunk, ToolDefinition/Call/Result) and ModelCapabilities with cross-field validators; adds MessageRole, FinishReason, StreamEventType enums. Inspect cross-field validators and immutability constraints.
  • Errors (src/ai_company/providers/errors.py): Adds ProviderError base and subclasses (AuthenticationError, RateLimitError, ModelNotFoundError, InvalidRequestError, ContentFilterError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError) with is_retryable flags, immutable context via MappingProxyType, and sensitive-key redaction.
  • Tests & Fixtures (tests/unit/providers/conftest.py, tests/unit/providers/test_*.py): Adds factories, a FakeProvider, and extensive tests for enums, errors, models, capabilities, protocol, and BaseCompletionProvider. Expect many new unit tests covering validation, immutability, protocol conformance, and cost logic.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Application/Client
    participant Base as BaseCompletionProvider
    participant Concrete as Concrete Provider
    participant Models as Pydantic Models
    participant Stream as StreamChunk / CompletionResponse

    Client->>Base: complete(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Models-->>Base: valid / raise InvalidRequestError
    Base->>Concrete: _do_complete(messages, model, tools, config)
    Concrete->>Concrete: call external provider API
    Concrete-->>Base: CompletionResponse (raw)
    Base->>Models: construct/validate CompletionResponse
    Models-->>Stream: validated CompletionResponse
    Stream-->>Client: CompletionResponse

    Client->>Base: stream(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Base->>Concrete: _do_stream(...)
    Concrete-->>Base: AsyncIterator[StreamChunk]
    Base-->>Client: AsyncIterator[StreamChunk]
    loop per chunk
        Client->>Stream: iterate
        Stream-->>Client: StreamChunk (validated)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 10.66%, which is below the required 100.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)
  • Title check: ✅ Passed. Title clearly and specifically describes the main change: design of a unified provider interface with typed contracts.
  • Description check: ✅ Passed. Description comprehensively explains the PR's purpose, design decisions, files changed, and test results with clear relevance to the changeset.
  • Linked Issues check: ✅ Passed. PR fully implements all acceptance criteria from issue #3: Protocol + ABC interface, completion signature, streaming support, model capabilities, cost information, and comprehensive unit tests.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to implementing the unified provider interface defined in issue #3; the CLAUDE.md update is a minor documentation-only change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request establishes a foundational, type-safe interface for interacting with various Large Language Model (LLM) providers. It introduces a clear separation of concerns through protocols and abstract base classes, defines a comprehensive set of data models for chat completions and tool usage, and implements a structured error handling mechanism. This work is crucial for building a flexible and extensible agent engine capable of integrating with diverse LLM services while maintaining consistency and reliability.

Highlights

  • Unified Provider Interface: Implemented a typed contract layer between the agent engine and LLM providers, including CompletionProvider Protocol for loose coupling and BaseCompletionProvider ABC for shared validation and cost computation.
  • Comprehensive Model Hierarchy: Defined a full suite of Pydantic models for LLM interactions, including ChatMessage, ToolDefinition, ToolCall, ToolResult, CompletionConfig, CompletionResponse, StreamChunk, and TokenUsage.
  • Robust Error Handling: Introduced a detailed error hierarchy with 8 typed subclasses of ProviderError, each including an is_retryable flag to facilitate intelligent retry logic.
  • Model Capabilities for Routing: Added ModelCapabilities to describe LLM features such as context window size, tool support, vision capabilities, streaming, and cost, enabling informed routing decisions.
  • Streaming Support: Designed streaming functionality using AsyncIterator[StreamChunk] with a StreamEventType discriminator for typed chunks.
Changelog
  • CLAUDE.md
    • Updated code conventions to include PEP 758 except syntax for Python 3.14.
  • src/ai_company/providers/init.py
    • Added module-level docstring outlining the public API for the unified provider interface.
    • Exported all new provider-related classes, enums, and protocols via __all__.
  • src/ai_company/providers/base.py
    • Added BaseCompletionProvider abstract base class to provide shared logic for validation and cost computation.
    • Implemented complete, stream, and get_model_capabilities public methods that delegate to abstract _do_* hooks.
    • Included a static method compute_cost for calculating token usage costs.
    • Added a private static method _validate_messages to ensure message lists are not empty.
  • src/ai_company/providers/capabilities.py
    • Added ModelCapabilities Pydantic model to define static metadata and cost information for LLM models.
  • src/ai_company/providers/enums.py
    • Added MessageRole StrEnum for chat message participants (system, user, assistant, tool).
    • Added FinishReason StrEnum for reasons a model stops generating tokens.
    • Added StreamEventType StrEnum to discriminate types of streaming response chunks.
  • src/ai_company/providers/errors.py
    • Added ProviderError as the base exception for all provider-layer errors, including an is_retryable flag.
    • Defined specific error subclasses: AuthenticationError, RateLimitError, ModelNotFoundError, InvalidRequestError, ContentFilterError, ProviderTimeoutError, ProviderConnectionError, and ProviderInternalError.
  • src/ai_company/providers/models.py
    • Added TokenUsage Pydantic model for tracking token counts and estimated costs.
    • Added ToolDefinition Pydantic model for describing tools the model can invoke, using raw JSON Schema.
    • Added ToolCall Pydantic model for representing a tool invocation requested by the model.
    • Added ToolResult Pydantic model for the outcome of executing a tool call.
    • Added ChatMessage Pydantic model for a single message in a conversation, with role-based validation.
    • Added CompletionConfig Pydantic model for optional completion parameters.
    • Added CompletionResponse Pydantic model for non-streaming completion results.
    • Added StreamChunk Pydantic model for individual chunks in a streaming response, with event-type validation.
  • src/ai_company/providers/protocol.py
    • Added CompletionProvider Protocol to define the structural interface for LLM provider adapters.
  • tests/unit/providers/conftest.py
    • Added Pydantic factories for TokenUsage, ToolDefinition, ToolCall, ToolResult, ChatMessage, CompletionConfig, CompletionResponse, StreamChunk, and ModelCapabilities.
    • Added FakeProvider class to structurally satisfy CompletionProvider for testing purposes.
    • Provided sample fixtures for various provider models and a fake provider instance.
  • tests/unit/providers/test_capabilities.py
    • Added unit tests for ModelCapabilities validation, including checks for empty/invalid fields and immutability.
  • tests/unit/providers/test_enums.py
    • Added unit tests to verify the existence, values, and string subclass nature of MessageRole, FinishReason, and StreamEventType enums.
  • tests/unit/providers/test_errors.py
    • Added unit tests for the ProviderError hierarchy, covering message storage, context handling, and is_retryable flags for each error type.
  • tests/unit/providers/test_models.py
    • Added extensive unit tests for all new Pydantic models (TokenUsage, ToolDefinition, ToolCall, ToolResult, ChatMessage, CompletionConfig, CompletionResponse, StreamChunk), validating field constraints, defaults, and immutability.
  • tests/unit/providers/test_protocol.py
    • Added unit tests to confirm that FakeProvider correctly implements the CompletionProvider protocol and that BaseCompletionProvider's validation and delegation work as expected.
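The compute_cost hook described above might look roughly like this; the actual signature in base.py may differ, and per-million-token Decimal pricing is an assumption of this sketch:

```python
from decimal import Decimal


class CostSketch:
    """Illustrative holder for a compute_cost-style static method."""

    @staticmethod
    def compute_cost(
        input_tokens: int,
        output_tokens: int,
        input_cost_per_mtok: Decimal,
        output_cost_per_mtok: Decimal,
    ) -> Decimal:
        """Return total cost in dollars, rounded to 6 decimal places."""
        mtok = Decimal(1_000_000)
        cost = (
            Decimal(input_tokens) / mtok * input_cost_per_mtok
            + Decimal(output_tokens) / mtok * output_cost_per_mtok
        )
        # Quantize to avoid accumulating sub-micro-dollar noise.
        return cost.quantize(Decimal("0.000001"))
```
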
Activity
  • The author has performed ruff check and ruff format --check, both passing.
  • The author has run mypy src/ai_company/providers/ with strict mode, which passed.
  • 118 unit tests specifically for the providers module passed.
  • The overall test suite (1156 tests) passed with 94.64% coverage, exceeding the 80% requirement.
  • The CI pipeline (lint + type-check + test) is pending.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • /gemini review (Code Review): Performs a code review for the current pull request in its current state.
  • /gemini summary (Pull Request Summary): Provides a summary of the current pull request in its current state.
  • @gemini-code-assist (Comment): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • /gemini help (Help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed and comprehensive unified provider interface for LLM completions. The use of a Protocol for loose coupling and an ABC for shared logic is a solid architectural choice. The data models are thorough and include robust validation, and the error hierarchy with is_retryable flags is a great feature for building resilient systems. The accompanying tests are extensive and cover many edge cases.

I have a couple of minor suggestions for improvement: one regarding a potentially confusing code convention in the documentation, and another to enhance the robustness of a model validator.

## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14


Severity: medium

The documented syntax except A, B: for multiple exceptions is from Python 2 and is invalid in Python 3. The correct syntax in Python 3 is to use a tuple: except (A, B):.

Additionally, I couldn't find a reference to PEP 758. This might be a typo.

To prevent confusion for developers, it would be good to update this convention.

Suggested change
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
- **Multiple exceptions**: use `except (A, B):` to catch multiple exception types.

Comment on lines +264 to +280
match self.event_type:
case StreamEventType.CONTENT_DELTA:
if self.content is None:
msg = "content_delta event must include content"
raise ValueError(msg)
case StreamEventType.TOOL_CALL_DELTA:
if self.tool_call_delta is None:
msg = "tool_call_delta event must include tool_call_delta"
raise ValueError(msg)
case StreamEventType.USAGE:
if self.usage is None:
msg = "usage event must include usage"
raise ValueError(msg)
case StreamEventType.ERROR:
if self.error_message is None:
msg = "error event must include error_message"
raise ValueError(msg)

Severity: medium

The match statement in _validate_event_fields is not exhaustive. It's missing a case for StreamEventType.DONE. While DONE events currently require no extra fields and thus pass validation by falling through, making this explicit would improve readability and robustness.

Adding a case for DONE and a default case _: to handle any future unhandled event types would make the validator more maintainable.

Suggested change
match self.event_type:
case StreamEventType.CONTENT_DELTA:
if self.content is None:
msg = "content_delta event must include content"
raise ValueError(msg)
case StreamEventType.TOOL_CALL_DELTA:
if self.tool_call_delta is None:
msg = "tool_call_delta event must include tool_call_delta"
raise ValueError(msg)
case StreamEventType.USAGE:
if self.usage is None:
msg = "usage event must include usage"
raise ValueError(msg)
case StreamEventType.ERROR:
if self.error_message is None:
msg = "error event must include error_message"
raise ValueError(msg)
match self.event_type:
case StreamEventType.CONTENT_DELTA:
if self.content is None:
msg = "content_delta event must include content"
raise ValueError(msg)
case StreamEventType.TOOL_CALL_DELTA:
if self.tool_call_delta is None:
msg = "tool_call_delta event must include tool_call_delta"
raise ValueError(msg)
case StreamEventType.USAGE:
if self.usage is None:
msg = "usage event must include usage"
raise ValueError(msg)
case StreamEventType.ERROR:
if self.error_message is None:
msg = "error event must include error_message"
raise ValueError(msg)
case StreamEventType.DONE:
pass


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `tests/unit/providers/conftest.py`:
- Around line 215-239: Update the fixtures to avoid using real provider/model
identifiers: in the sample_completion_response fixture replace
model="claude-sonnet-4-6" with a fake model name (e.g., "test-model-1") and in
sample_model_capabilities set model_id to a fake model id (e.g., "test-model-1")
and provider to a fake provider string (e.g., "test-provider"); ensure you only
change the model and provider fields in the sample_completion_response and
sample_model_capabilities fixtures so tests use non-production identifiers.
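A hypothetical version of the suggested fixture change, using only fake identifiers ("test-model-1", "test-provider"). The real conftest uses Pydantic factories, so this stdlib dataclass stand-in is purely illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCapabilities:
    """Simplified stand-in for the real Pydantic ModelCapabilities."""

    model_id: str
    provider: str
    context_window: int


def make_model_capabilities() -> ModelCapabilities:
    # In the real conftest this builder would back a @pytest.fixture
    # named sample_model_capabilities; the point is that only fake,
    # non-production identifiers appear in test data.
    return ModelCapabilities(
        model_id="test-model-1",
        provider="test-provider",
        context_window=8192,
    )
```
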

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cbf1496 and 76f3b4c.

📒 Files selected for processing (15)
  • CLAUDE.md
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/capabilities.py
  • src/ai_company/providers/enums.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/protocol.py
  • tests/unit/providers/__init__.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_enums.py
  • tests/unit/providers/test_errors.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use PEP 649 native lazy annotations instead of from __future__ import annotations (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)

Files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_errors.py
  • tests/unit/providers/test_models.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/capabilities.py
  • tests/unit/providers/test_enums.py
  • src/ai_company/providers/enums.py
  • src/ai_company/providers/errors.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/models.py
  • tests/unit/providers/conftest.py
  • src/ai_company/providers/protocol.py
  • src/ai_company/providers/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use asyncio_mode = 'auto' for async tests without manual @pytest.mark.asyncio

Files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_errors.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_enums.py
  • tests/unit/providers/test_protocol.py
  • tests/unit/providers/conftest.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch

Files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_errors.py
  • tests/unit/providers/test_models.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/capabilities.py
  • tests/unit/providers/test_enums.py
  • src/ai_company/providers/enums.py
  • src/ai_company/providers/errors.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/models.py
  • tests/unit/providers/conftest.py
  • src/ai_company/providers/protocol.py
  • src/ai_company/providers/__init__.py
🧠 Learnings (13)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Line length: 100 characters (enforced by Ruff)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff format .` to format code with Ruff

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Enforce 88-character line length (via ruff configuration)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : All public functions and classes must include type hints; enforce via mypy strict mode

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Python version: 3.14+

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff check .` to lint code (use `ruff check --fix .` to auto-fix)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : NEVER add `from __future__ import annotations` - Python 3.14+ has native support for deferred annotation evaluation and does not require this import

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to **/*.py : Maintain line length at 100 characters per Ruff formatting rules.

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (8)
tests/unit/providers/test_capabilities.py (2)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
tests/unit/providers/conftest.py (2)
  • ModelCapabilitiesFactory (93-105)
  • sample_model_capabilities (226-239)
tests/unit/providers/test_errors.py (1)
src/ai_company/providers/errors.py (8)
  • AuthenticationError (39-42)
  • ContentFilterError (73-76)
  • InvalidRequestError (67-70)
  • ModelNotFoundError (61-64)
  • ProviderConnectionError (85-88)
  • ProviderError (11-36)
  • ProviderInternalError (91-94)
  • ProviderTimeoutError (79-82)
src/ai_company/providers/base.py (3)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
src/ai_company/providers/errors.py (1)
  • InvalidRequestError (67-70)
src/ai_company/providers/models.py (5)
  • CompletionConfig (159-199)
  • CompletionResponse (202-227)
  • StreamChunk (230-281)
  • TokenUsage (12-42)
  • ToolDefinition (45-64)
tests/unit/providers/test_protocol.py (5)
src/ai_company/providers/base.py (8)
  • BaseCompletionProvider (23-185)
  • complete (37-65)
  • stream (67-95)
  • get_model_capabilities (97-106)
  • _do_complete (111-120)
  • _do_stream (123-132)
  • _do_get_model_capabilities (135-140)
  • compute_cost (145-171)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (6)
  • ChatMessage (102-156)
  • CompletionConfig (159-199)
  • CompletionResponse (202-227)
  • StreamChunk (230-281)
  • TokenUsage (12-42)
  • ToolDefinition (45-64)
src/ai_company/providers/protocol.py (4)
  • CompletionProvider (21-81)
  • complete (30-49)
  • stream (51-70)
  • get_model_capabilities (72-81)
src/ai_company/providers/models.py (1)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
tests/unit/providers/conftest.py (3)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (8)
  • ChatMessage (102-156)
  • CompletionConfig (159-199)
  • CompletionResponse (202-227)
  • StreamChunk (230-281)
  • TokenUsage (12-42)
  • ToolCall (67-83)
  • ToolDefinition (45-64)
  • ToolResult (86-99)
src/ai_company/providers/protocol.py (4)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
src/ai_company/providers/models.py (5)
  • ChatMessage (102-156)
  • CompletionConfig (159-199)
  • CompletionResponse (202-227)
  • StreamChunk (230-281)
  • ToolDefinition (45-64)
src/ai_company/providers/base.py (3)
  • complete (37-65)
  • stream (67-95)
  • get_model_capabilities (97-106)
tests/unit/providers/conftest.py (3)
  • complete (119-128)
  • stream (130-149)
  • get_model_capabilities (151-152)
src/ai_company/providers/__init__.py (6)
src/ai_company/providers/base.py (1)
  • BaseCompletionProvider (23-185)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (8-55)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/errors.py (9)
  • AuthenticationError (39-42)
  • ContentFilterError (73-76)
  • InvalidRequestError (67-70)
  • ModelNotFoundError (61-64)
  • ProviderConnectionError (85-88)
  • ProviderError (11-36)
  • ProviderInternalError (91-94)
  • ProviderTimeoutError (79-82)
  • RateLimitError (45-58)
src/ai_company/providers/models.py (8)
  • ChatMessage (102-156)
  • CompletionConfig (159-199)
  • CompletionResponse (202-227)
  • StreamChunk (230-281)
  • TokenUsage (12-42)
  • ToolCall (67-83)
  • ToolDefinition (45-64)
  • ToolResult (86-99)
src/ai_company/providers/protocol.py (1)
  • CompletionProvider (21-81)
🔇 Additional comments (16)
src/ai_company/providers/enums.py (1)

1-32: LGTM!

Clean enum definitions with appropriate docstrings and string values that align with LLM provider conventions. Using StrEnum is ideal for JSON serialization compatibility.

CLAUDE.md (1)

51-51: LGTM!

Documentation correctly reflects PEP 758 syntax for Python 3.14+. This aligns with the project's Python version requirements and ruff enforcement.

src/ai_company/providers/capabilities.py (1)

1-55: LGTM!

Well-designed capability model with:

  • Appropriate validation constraints (gt=0 for token limits, ge=0.0 for costs)
  • Sensible defaults for capability flags
  • ConfigDict(frozen=True) for immutability
  • Comprehensive docstring following Google style
tests/unit/providers/test_enums.py (1)

32-81: LGTM!

The remaining test classes (TestFinishReason, TestStreamEventType) are well-structured with proper markers and comprehensive coverage of enum behavior.

tests/unit/providers/test_errors.py (1)

1-144: LGTM!

Comprehensive error hierarchy tests covering:

  • Base ProviderError behavior (message, context, string representation, retryability)
  • Subclass relationships via issubclass checks
  • Parametrized is_retryable validation for all 8 error types
  • RateLimitError-specific retry_after attribute
  • String formatting consistency across all error types
tests/unit/providers/test_models.py (1)

1-424: LGTM!

Excellent test coverage for all provider domain models:

  • Validation rules (positive/negative cases, boundary conditions)
  • Immutability via frozen=True enforcement
  • Role-based constraints for ChatMessage
  • Event-type-specific validation for StreamChunk
  • Factory integration tests
  • JSON round-trip serialization

Tests use appropriate fake model names (e.g., "test") per coding guidelines.

tests/unit/providers/test_protocol.py (1)

1-194: LGTM!

Comprehensive protocol and ABC tests covering:

  • Structural typing with isinstance checks for CompletionProvider
  • Async method behavior (complete, stream, get_model_capabilities)
  • Delegation pattern from public API to _do_* hooks
  • Input validation (empty messages rejection)
  • compute_cost helper with appropriate floating-point tolerance (1e-9)
  • Protocol conformance verification for concrete implementations
src/ai_company/providers/base.py (1)

1-185: LGTM!

Well-designed abstract base class with:

  • Clear separation between public API and provider-specific hooks
  • Input validation via _validate_messages raising InvalidRequestError
  • Static compute_cost helper with floating-point precision handling (round(cost, 10))
  • Comprehensive Google-style docstrings with Args, Returns, and Raises sections
  • All methods under 50 lines per coding guidelines

The design cleanly implements the template method pattern, allowing concrete providers to focus on provider-specific logic while the base class handles common validation and cost computation.
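The shape of that template-method split can be sketched in isolation (names here are hypothetical; the real class is async and validates far more than this):

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Toy template-method sketch of the public-API / _do_* hook split."""

    def complete(self, messages: list[str]) -> str:
        # Shared validation lives in the base class...
        if not messages:
            raise ValueError("messages must not be empty")
        # ...then the provider-specific hook does the real work.
        return self._do_complete(messages)

    @abstractmethod
    def _do_complete(self, messages: list[str]) -> str: ...


class EchoProvider(BaseProvider):
    def _do_complete(self, messages: list[str]) -> str:
        return messages[-1]
```

Concrete adapters only override the `_do_*` hooks; validation stays in one place.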

tests/unit/providers/test_capabilities.py (2)

1-119: LGTM!

The test structure is well-organized with comprehensive coverage of:

  • Required field validation (empty model_id, empty provider)
  • Numeric constraint validation (zero/negative tokens and costs)
  • Boolean defaults verification
  • Immutability enforcement
  • Factory integration
  • JSON round-trip serialization

The @pytest.mark.unit decorator and timeout configuration comply with coding guidelines.


17-21: ⚠️ Potential issue | 🟡 Minor

Test uses real model ID instead of fake model name.

The test asserts against "claude-sonnet-4-6" which is a real Anthropic model identifier. The assertion depends on the sample_model_capabilities fixture which also uses the real model ID.

Based on learnings: "Tests must use fake model names (e.g., test-model:8b, fake-writer:latest)—never use real model IDs from RECOMMENDED_MODELS."

Consider updating the fixture in conftest.py and this assertion to use a fake model name like "test-model:200k".

⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
src/ai_company/providers/protocol.py (1)

1-81: LGTM!

The CompletionProvider Protocol is well-designed:

  • @runtime_checkable enables isinstance checks for structural typing
  • Method signatures align with BaseCompletionProvider in base.py
  • Google-style docstrings are complete with Args and Returns sections
  • Type hints are comprehensive and use modern union syntax (|)

The noqa comments appropriately suppress type-checking import warnings for types needed in runtime-checkable Protocol signatures.
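A minimal analogue of the structural-typing idea (this is not the actual `CompletionProvider` signature): `@runtime_checkable` lets `isinstance` accept any object with the right method names, with no inheritance required. Note the check only verifies method *presence*, not signatures; full conformance is still the type checker's job.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Completer(Protocol):
    """Hypothetical one-method analogue of the provider Protocol."""

    def complete(self, prompt: str) -> str: ...


class StubProvider:
    # No inheritance from Completer: conformance is purely structural.
    def complete(self, prompt: str) -> str:
        return prompt.upper()


ok = isinstance(StubProvider(), Completer)
```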

src/ai_company/providers/__init__.py (1)

1-74: LGTM!

The module's public API is well-organized:

  • Comprehensive module docstring categorizing exports (Protocols, Models, Capabilities, Enums, Errors)
  • All 23 exports in __all__ are alphabetically sorted for discoverability
  • Imports and __all__ entries are consistent
src/ai_company/providers/models.py (1)

1-281: LGTM!

Comprehensive domain models with solid validation:

  • TokenUsage: Enforces total_tokens == input_tokens + output_tokens invariant
  • ChatMessage: Role-specific constraints are thorough (tool messages require tool_result, assistant can have tool_calls but not tool_result, etc.)
  • StreamChunk: Event-type discriminator pattern with match statement validates required fields per event type
  • All models are immutable (frozen=True) per coding guidelines
  • Pydantic v2 patterns used correctly (ConfigDict, model_validator, Field)
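The `TokenUsage` invariant can be illustrated with a stdlib stand-in (the real model uses a Pydantic v2 `model_validator(mode="after")` rather than `__post_init__`, and this sketch omits the cost field):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Stdlib sketch of the total == input + output invariant."""

    input_tokens: int
    output_tokens: int
    total_tokens: int

    def __post_init__(self) -> None:
        # Reject inconsistent totals at construction time.
        if self.total_tokens != self.input_tokens + self.output_tokens:
            msg = "total_tokens must equal input_tokens + output_tokens"
            raise ValueError(msg)
```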
tests/unit/providers/conftest.py (2)

111-152: LGTM!

FakeProvider is a well-designed test double:

  • Structurally satisfies CompletionProvider protocol
  • Tracks complete_calls and stream_calls for test assertions
  • stream method correctly returns an AsyncIterator via inner generator
  • Properly handles optional capabilities parameter with factory fallback

74-81: ⚠️ Potential issue | 🟡 Minor

Factories use real model identifiers instead of fake names.

The CompletionResponseFactory and ModelCapabilitiesFactory use "claude-sonnet-4-6" and "anthropic" which are real provider identifiers.

Based on learnings: "Tests must use fake model names (e.g., test-model:8b, fake-writer:latest)—never use real model IDs from RECOMMENDED_MODELS."

🧪 Proposed fix to use fake identifiers
 class CompletionResponseFactory(ModelFactory):
     __model__ = CompletionResponse
     content = "Hello! How can I help you?"
     tool_calls = ()
     finish_reason = FinishReason.STOP
     usage = TokenUsageFactory
-    model = "claude-sonnet-4-6"
+    model = "fake-model:8b"
     provider_request_id = None


 class ModelCapabilitiesFactory(ModelFactory):
     __model__ = ModelCapabilities
-    model_id = "claude-sonnet-4-6"
-    provider = "anthropic"
+    model_id = "fake-model:8b"
+    provider = "fake-provider"
     max_context_tokens = 200_000
     max_output_tokens = 8_192

Also applies to: 93-105

⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
src/ai_company/providers/errors.py (1)

1-94: LGTM!

Well-structured error hierarchy with clear retry semantics:

  • Retryable errors (transient): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError
  • Non-retryable errors (permanent): AuthenticationError, ModelNotFoundError, InvalidRequestError, ContentFilterError

The RateLimitError.retry_after attribute supports respecting Retry-After headers from providers. The ProviderError.__str__ method provides helpful context formatting for debugging.


Copilot AI left a comment


Pull request overview

Implements the foundational “provider layer” abstraction in ai_company.providers, defining the typed interface and shared base logic that future LLM provider adapters (Anthropic/OpenAI/OpenRouter/Ollama/etc.) will conform to.

Changes:

  • Added CompletionProvider Protocol plus BaseCompletionProvider ABC with shared request validation and token-cost computation helpers.
  • Introduced provider-layer domain models (ChatMessage, tool types, streaming chunk types, config/response/usage) and capability descriptors for routing.
  • Added a typed provider error hierarchy with is_retryable flags, plus comprehensive unit tests for the new module.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/ai_company/providers/protocol.py Defines the structural provider interface (CompletionProvider) used for loose coupling.
src/ai_company/providers/base.py Adds the shared ABC with validation and cost computation helper(s).
src/ai_company/providers/models.py Introduces provider-layer request/response/tool/streaming models with Pydantic validation.
src/ai_company/providers/capabilities.py Adds ModelCapabilities for routing and cost metadata.
src/ai_company/providers/errors.py Adds provider error hierarchy with retryability flags.
src/ai_company/providers/enums.py Adds provider-layer enums (roles, finish reasons, stream event types).
src/ai_company/providers/__init__.py Exposes the providers public API surface via imports and __all__.
tests/unit/providers/conftest.py Adds factories/fixtures and a fake provider for protocol/base tests.
tests/unit/providers/test_protocol.py Tests structural typing of the Protocol and BaseCompletionProvider behavior.
tests/unit/providers/test_models.py Tests validation/immutability/serialization behaviors for provider models.
tests/unit/providers/test_capabilities.py Tests ModelCapabilities validation and JSON roundtrip.
tests/unit/providers/test_errors.py Tests error hierarchy behavior, formatting, and retryability flags.
tests/unit/providers/test_enums.py Tests provider-layer enums for members and string values.
CLAUDE.md Updates contributor conventions (contains a syntax guidance change).


## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14

Copilot AI Mar 1, 2026


The CLAUDE.md guidance on exception syntax appears incorrect: except A, B: is not valid Python 3 syntax for catching multiple exceptions (it should use a tuple, e.g. except (A, B):). As written this will mislead contributors and may cause invalid code to be introduced.

Suggested change
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
- **Exception syntax**: for multiple exceptions use `except (A, B):` (tuple) — Python 3 only supports the tuple form

Copilot uses AI. Check for mistakes.
Comment on lines +263 to +280
"""Ensure the populated fields match the event_type."""
match self.event_type:
case StreamEventType.CONTENT_DELTA:
if self.content is None:
msg = "content_delta event must include content"
raise ValueError(msg)
case StreamEventType.TOOL_CALL_DELTA:
if self.tool_call_delta is None:
msg = "tool_call_delta event must include tool_call_delta"
raise ValueError(msg)
case StreamEventType.USAGE:
if self.usage is None:
msg = "usage event must include usage"
raise ValueError(msg)
case StreamEventType.ERROR:
if self.error_message is None:
msg = "error event must include error_message"
raise ValueError(msg)

Copilot AI Mar 1, 2026


StreamChunk._validate_event_fields enforces required fields for some event_types, but it doesn't reject extraneous fields (e.g., a DONE chunk can still carry content/usage/etc.). This allows invalid/ambiguous stream states even though the docstring says the discriminator determines which fields are populated. Consider validating that non-relevant fields are None for each event type (especially DONE and ERROR).

Suggested change
"""Ensure the populated fields match the event_type."""
match self.event_type:
case StreamEventType.CONTENT_DELTA:
if self.content is None:
msg = "content_delta event must include content"
raise ValueError(msg)
case StreamEventType.TOOL_CALL_DELTA:
if self.tool_call_delta is None:
msg = "tool_call_delta event must include tool_call_delta"
raise ValueError(msg)
case StreamEventType.USAGE:
if self.usage is None:
msg = "usage event must include usage"
raise ValueError(msg)
case StreamEventType.ERROR:
if self.error_message is None:
msg = "error event must include error_message"
raise ValueError(msg)
"""Ensure the populated fields match the event_type.
The ``event_type`` acts as a discriminator: for each event type, only
the relevant field(s) may be non-None.
"""
# Determine which fields are required and which may be non-None for
# this event type. All other optional fields must be None.
match self.event_type:
case StreamEventType.CONTENT_DELTA:
allowed_non_null = {"content"}
required_non_null = {"content"}
case StreamEventType.TOOL_CALL_DELTA:
allowed_non_null = {"tool_call_delta"}
required_non_null = {"tool_call_delta"}
case StreamEventType.USAGE:
allowed_non_null = {"usage"}
required_non_null = {"usage"}
case StreamEventType.ERROR:
allowed_non_null = {"error_message"}
required_non_null = {"error_message"}
case StreamEventType.DONE:
# Terminal event: no additional payload fields should be set.
allowed_non_null = set()
required_non_null = set()
case _:
# For unknown or future event types, avoid over-validating to
# preserve forward compatibility.
return self
field_values: dict[str, Any] = {
"content": self.content,
"tool_call_delta": self.tool_call_delta,
"usage": self.usage,
"error_message": self.error_message,
}
# Ensure all required fields are present.
missing_required = sorted(
name for name in required_non_null if field_values[name] is None
)
if missing_required:
required_list = ", ".join(missing_required)
msg = (
f"{self.event_type.value} event missing required field(s): "
f"{required_list}"
)
raise ValueError(msg)
# Ensure no extraneous fields are populated.
extraneous = sorted(
name
for name, value in field_values.items()
if name not in allowed_non_null and value is not None
)
if extraneous:
extraneous_list = ", ".join(extraneous)
msg = (
f"{self.event_type.value} event must not include field(s): "
f"{extraneous_list}"
)
raise ValueError(msg)

Comment on lines +163 to +171
    cost = (input_tokens / 1000) * cost_per_1k_input + (
        output_tokens / 1000
    ) * cost_per_1k_output
    return TokenUsage(
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=input_tokens + output_tokens,
        cost_usd=round(cost, 10),
    )

Copilot AI Mar 1, 2026


compute_cost() hard-codes the rounding precision (round(cost, 10)), which makes the intent and consistency across the codebase harder to manage. There is already a shared BUDGET_ROUNDING_PRECISION = 10 constant used elsewhere to avoid float artifacts; consider reusing a shared constant (or introducing a provider-specific one) instead of a magic number here.
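The intent of extracting the magic number can be sketched like this (constant and function names here are invented; the repo reportedly uses `BUDGET_ROUNDING_PRECISION = 10` in its budget layer):

```python
# One named constant instead of a literal keeps provider-layer and
# budget-layer rounding in sync.
COST_ROUNDING_PRECISION = 10


def compute_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
) -> float:
    cost = (input_tokens / 1000) * cost_per_1k_input + (
        output_tokens / 1000
    ) * cost_per_1k_output
    # Rounding trims float artifacts from the per-1k division.
    return round(cost, COST_ROUNDING_PRECISION)
```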

…, and Copilot

Source fixes:
- Make ProviderError.context immutable via MappingProxyType
- Sanitize sensitive keys in ProviderError.__str__ output
- Add input validation to compute_cost (reject negative values)
- Add model parameter validation in BaseCompletionProvider
- Add CompletionResponse validator requiring content or tool_calls
- Make StreamChunk match exhaustive with negative constraints
- Add ModelCapabilities cross-field validators
- Extract cost rounding precision to named constant
- Fix misleading docstrings (cost computation, delegation, stubs)
- Standardize all docstrings to Google style

Test additions (23 new tests, 118 → 141):
- ABC instantiation and partial implementation rejection
- Model validation (blank model, error context metadata)
- compute_cost negative inputs and rounding precision
- ChatMessage role-field symmetry (system/tool_calls, user/tool_result)
- CompletionResponse empty response handling
- StreamChunk extraneous field rejection
- __all__ exports importability
- ModelCapabilities cross-field constraint violations

Other:
- Replace real model identifiers with fake names in test fixtures

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/providers/errors.py`:
- Around line 11-13: The redaction set _REDACTED_KEYS should be matched
case-insensitively: normalize keys to lowercase (make _REDACTED_KEYS contain
lowercase strings) and update the __str__ logic to check k.lower() (or
equivalent) when deciding to redact values; ensure any comparisons that
currently use "k in _REDACTED_KEYS" are replaced with "k.lower() in
_REDACTED_KEYS" so keys like "API_KEY" or "Authorization" get redacted
correctly.

In `@tests/unit/providers/test_protocol.py`:
- Around line 271-282: Update the test to use the provider's rounding constant
instead of the hardcoded 10: import or reference
BaseCompletionProvider._COST_ROUNDING_PRECISION in
test_compute_cost_rounding_precision and replace the literal `10` with that
constant so the expected value is computed with the same precision as
BaseCompletionProvider.compute_cost; ensure the test still calls
BaseCompletionProvider.compute_cost(333, 777, cost_per_1k_input=0.003,
cost_per_1k_output=0.015) and compares usage.cost_usd to the rounded value using
_COST_ROUNDING_PRECISION.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76f3b4c and 5af5a7f.

📒 Files selected for processing (10)
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/capabilities.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/protocol.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use PEP 649 native lazy annotations instead of from __future__ import annotations (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)

Files:

  • tests/unit/providers/test_capabilities.py
  • src/ai_company/providers/errors.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/capabilities.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/__init__.py
  • tests/unit/providers/conftest.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/protocol.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use asyncio_mode = 'auto' for async tests without manual @pytest.mark.asyncio

Files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • tests/unit/providers/conftest.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch

Files:

  • tests/unit/providers/test_capabilities.py
  • src/ai_company/providers/errors.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/capabilities.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/__init__.py
  • tests/unit/providers/conftest.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/protocol.py
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/providers/test_capabilities.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/conftest.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Applied to files:

  • tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/unit/providers/conftest.py
🧬 Code graph analysis (7)
tests/unit/providers/test_capabilities.py (2)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
tests/unit/providers/conftest.py (2)
  • ModelCapabilitiesFactory (93-105)
  • sample_model_capabilities (226-239)
tests/unit/providers/test_models.py (2)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (8)
  • ChatMessage (102-170)
  • CompletionConfig (173-213)
  • CompletionResponse (216-265)
  • StreamChunk (268-346)
  • TokenUsage (12-42)
  • ToolCall (67-83)
  • ToolDefinition (45-64)
  • ToolResult (86-99)
tests/unit/providers/test_protocol.py (6)
src/ai_company/providers/base.py (7)
  • complete (42-72)
  • stream (74-104)
  • get_model_capabilities (106-119)
  • _do_complete (124-133)
  • _do_stream (136-145)
  • _do_get_model_capabilities (148-153)
  • compute_cost (158-211)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/errors.py (1)
  • InvalidRequestError (97-100)
src/ai_company/providers/models.py (6)
  • ChatMessage (102-170)
  • CompletionConfig (173-213)
  • CompletionResponse (216-265)
  • StreamChunk (268-346)
  • TokenUsage (12-42)
  • ToolDefinition (45-64)
src/ai_company/providers/protocol.py (4)
  • CompletionProvider (22-81)
  • complete (30-49)
  • stream (51-70)
  • get_model_capabilities (72-81)
src/ai_company/providers/models.py (1)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
tests/unit/providers/conftest.py (2)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/models.py (8)
  • ChatMessage (102-170)
  • CompletionConfig (173-213)
  • CompletionResponse (216-265)
  • StreamChunk (268-346)
  • TokenUsage (12-42)
  • ToolCall (67-83)
  • ToolDefinition (45-64)
  • ToolResult (86-99)
src/ai_company/providers/base.py (3)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/errors.py (1)
  • InvalidRequestError (97-100)
src/ai_company/providers/models.py (6)
  • ChatMessage (102-170)
  • CompletionConfig (173-213)
  • CompletionResponse (216-265)
  • StreamChunk (268-346)
  • TokenUsage (12-42)
  • ToolDefinition (45-64)
src/ai_company/providers/protocol.py (3)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/models.py (5)
  • ChatMessage (102-170)
  • CompletionConfig (173-213)
  • CompletionResponse (216-265)
  • StreamChunk (268-346)
  • ToolDefinition (45-64)
src/ai_company/providers/base.py (3)
  • complete (42-72)
  • stream (74-104)
  • get_model_capabilities (106-119)
🔇 Additional comments (19)
src/ai_company/providers/errors.py (1)

1-124: LGTM!

The error hierarchy is well-designed with:

  • Immutable context via MappingProxyType
  • Clear is_retryable semantics per error type
  • Sensitive key redaction in __str__
  • Proper inheritance and Google-style docstrings
tests/unit/providers/test_models.py (1)

1-512: LGTM!

Comprehensive test coverage for all provider domain models with:

  • Proper @pytest.mark.unit markers on all test classes
  • Module-level timeout of 30 seconds
  • Validation tests (boundaries, constraints, cross-field rules)
  • Immutability tests (frozen model config)
  • Factory usage tests
  • JSON round-trip verification
  • Uses fake model names ("test", "test-model") per repository guidelines
tests/unit/providers/test_capabilities.py (1)

1-157: LGTM!

Well-structured test suite for ModelCapabilities with:

  • Proper @pytest.mark.unit marker and 30-second timeout
  • Coverage of all field validations (empty/negative/zero constraints)
  • Cross-field constraint tests (max_output_tokens vs max_context_tokens, streaming tool call dependencies)
  • Immutability and JSON round-trip verification
  • Uses fake model names ("test-model", "basic-model") per repository guidelines
tests/unit/providers/test_protocol.py (1)

1-282: LGTM!

Comprehensive test coverage for the provider protocol and base ABC:

  • Protocol structural typing with isinstance checks
  • Async method delegation tests for complete, stream, get_model_capabilities
  • Validation tests for empty messages and blank model IDs
  • compute_cost edge cases (zero, large, negative inputs, rounding)
  • ABC instantiation prevention and partial implementation rejection
  • Uses fake model names per repository guidelines
src/ai_company/providers/protocol.py (1)

1-81: LGTM!

Clean protocol definition with:

  • @runtime_checkable decorator for structural typing with isinstance checks
  • Well-documented async methods with Google-style docstrings
  • Proper type hints using domain models from the provider layer
  • Correct use of AsyncIterator[StreamChunk] for streaming
src/ai_company/providers/models.py (2)

80-83: Note: Mutable default values in frozen models.

The arguments field uses default_factory=dict, which creates a new mutable dict for each instance. While the model is frozen (preventing reassignment of arguments), the dict itself is still mutable. If callers attempt to modify arguments after construction, they could inadvertently mutate state.

This is likely acceptable for the current use case since the model is meant to be immutable and callers should treat it as such, but be aware that Pydantic's frozen=True only prevents attribute reassignment, not deep immutability.
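The shallowness can be demonstrated with a stdlib stand-in (a frozen dataclass rather than a Pydantic model; Pydantic raises a `ValidationError` on reassignment where dataclasses raise `FrozenInstanceError`, but the shallow-freeze behavior is the same):

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(frozen=True)
class ToolCall:
    """Illustrative stand-in for the frozen Pydantic model."""

    name: str
    arguments: dict[str, object] = field(default_factory=dict)


call = ToolCall(name="search")

blocked = False
try:
    call.arguments = {"q": "x"}  # type: ignore[misc]
except FrozenInstanceError:
    blocked = True  # attribute *reassignment* is rejected

# ...but the dict object itself is still mutable: frozen is not deep.
call.arguments["q"] = "hello"
```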


1-346: LGTM!

Excellent domain model design with:

  • Strict immutability via ConfigDict(frozen=True) on all models
  • Comprehensive cross-field validation (e.g., TokenUsage total check, ChatMessage role constraints, StreamChunk event-type validation)
  • Clear Google-style docstrings with attribute documentation
  • Proper use of Pydantic v2 features (model_validator, Field constraints)
  • NotBlankStr for enforcing non-empty identifiers
src/ai_company/providers/capabilities.py (1)

1-83: LGTM!

Well-designed capability descriptor with:

  • Comprehensive field set for routing decisions (context limits, feature flags, costs)
  • Sensible defaults (supports_streaming=True, supports_tools=False)
  • Cross-field validation ensuring consistency (max_output_tokens <= max_context_tokens, streaming tool calls require both tools and streaming)
  • Immutability via frozen=True
  • Google-style docstrings with clear attribute documentation
src/ai_company/providers/__init__.py (1)

1-57: LGTM!

Clean public API consolidation with:

  • Complete export of all public types (protocols, base classes, models, enums, errors)
  • Alphabetically sorted __all__ for discoverability
  • Clear module docstring describing the package purpose
tests/unit/providers/conftest.py (4)

1-20: LGTM!

Imports are correctly organized with the appropriate # noqa: TC003 comment for AsyncIterator since it's used in runtime type hints.


24-105: Well-structured factories following polyfactory patterns.

The factories are correctly configured:

  • TokenUsageFactory has consistent values (100 + 50 = 150 total_tokens)
  • CompletionResponseFactory.usage = TokenUsageFactory correctly references the sub-factory class (polyfactory will call .build() automatically)
  • StreamChunkFactory defaults align with CONTENT_DELTA event type requirements
  • ModelCapabilitiesFactory uses fake identifiers ("test-model", "test-provider") as required

The # type: ignore[var-annotated] and # noqa: RUF012 comments are appropriate for mutable class-level attributes in the ModelFactory pattern.


111-152: FakeProvider correctly implements the protocol structure.

The implementation properly:

  • Records calls for test verification via complete_calls and stream_calls
  • Returns factory-built responses from complete
  • Yields properly-shaped StreamChunk objects from stream (CONTENT_DELTA followed by DONE)
  • Returns injectable capabilities from get_model_capabilities

The async generator pattern in stream (wrapping an inner _gen() function) correctly produces an AsyncIterator[StreamChunk].


215-239: Fixtures now use fake model identifiers.

The sample_completion_response and sample_model_capabilities fixtures correctly use "test-model" and "test-provider" instead of real provider identifiers, addressing the previous review feedback.

src/ai_company/providers/base.py (6)

1-23: LGTM!

Module docstring clearly explains the purpose and the module-level _COST_ROUNDING_PRECISION constant is well-documented.


26-72: Well-designed public API with proper validation.

The complete method correctly:

  • Validates inputs before delegating to the hook
  • Documents the validation behavior in Raises section
  • Returns the hook result unmodified (no unnecessary wrapping)

74-104: stream method follows the same validation pattern.

The return await self._do_stream(...) pattern is correct—it expects subclass _do_stream implementations to be async methods returning an AsyncIterator (e.g., via an inner generator function), which aligns with the FakeProvider pattern in tests.
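The return-not-yield pattern can be shown in isolation (names here are illustrative, not the project's actual signatures): the coroutine builds an inner async generator and returns it, so the caller `await`s the coroutine to obtain the iterator.

```python
import asyncio
from collections.abc import AsyncIterator


async def _do_stream(text: str) -> AsyncIterator[str]:
    # Build and *return* an async generator rather than yielding directly;
    # the caller awaits this coroutine to get the iterator.
    async def _gen() -> AsyncIterator[str]:
        for chunk in text.split():
            yield chunk

    return _gen()


async def main() -> list[str]:
    # `await` yields the iterator; `async for` then consumes it.
    return [c async for c in await _do_stream("a b c")]


print(asyncio.run(main()))  # ['a', 'b', 'c']
```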


123-153: Abstract hooks properly defined.

The three abstract methods have clear signatures matching the public API, with brief but sufficient docstrings for implementers.


157-211: Robust cost computation with input validation.

The compute_cost method:

  • Validates all four parameters for non-negative values
  • Includes context in error messages for debugging
  • Uses the module-level precision constant for rounding
  • Correctly computes total_tokens as input_tokens + output_tokens
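A hedged sketch of that computation (the constant name, signature, and error type here are assumptions based on the summary, not the actual source):

```python
import math

COST_ROUNDING_PRECISION = 10  # decimal places, mirroring the shared constant's intent


def compute_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
) -> float:
    # Validate all four parameters: non-negative and finite.
    for name, value in {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_per_1k_input": cost_per_1k_input,
        "cost_per_1k_output": cost_per_1k_output,
    }.items():
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name} must be finite and >= 0, got {value}")
    cost = (input_tokens / 1000) * cost_per_1k_input
    cost += (output_tokens / 1000) * cost_per_1k_output
    # Round to avoid floating-point dust in accumulated totals.
    return round(cost, COST_ROUNDING_PRECISION)


print(compute_cost(100, 50, 0.003, 0.015))  # 0.00105
```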

213-239: Validation helpers are correct.

_validate_messages properly rejects empty lists, and _validate_model handles both empty strings and whitespace-only strings with if not model or not model.strip().

…al reviewers

- Case-insensitive secret redaction in ProviderError.__str__ with tests
- Context dict shallow-copy before MappingProxyType wrap (immutability)
- RateLimitError.retry_after validation (rejects negative/NaN/infinity)
- _validate_model guards against None input with isinstance check
- compute_cost rejects non-finite (inf/NaN) cost rates
- Hook docstrings specify ProviderError wrapping contract
- _do_stream docstring clarifies return-not-yield pattern
- ChatMessage validator refactored to exhaustive match/case
- Removed forward reference to nonexistent LiteLLMProvider
- Fixed tool_call_delta docstring from "Partial" to accurate description
- Temperature docstring notes provider-specific range caveat
- Removed fragile issue #9 reference from errors.py docstring
- Added ProviderError docstring note about redaction behavior
- FakeProvider now records tools/config parameters
- Tests: redaction, context immutability, retry_after validation,
  tools/config forwarding, None model, inf/NaN cost rates,
  MAX_TOKENS/TOOL_USE empty response, assistant with content+tool_calls
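The case-insensitive redaction in the first bullet might look roughly like this (the key list and helper name are assumptions, not the project's actual implementation):

```python
# Keys whose values should never appear in error strings (assumed set).
_SENSITIVE_KEYS = {"api_key", "authorization", "token"}


def redact(context: dict[str, object]) -> dict[str, object]:
    # Compare keys case-insensitively so "API_KEY" is caught like "api_key".
    return {
        k: "***REDACTED***" if k.lower() in _SENSITIVE_KEYS else v
        for k, v in context.items()
    }


print(redact({"API_KEY": "sk-123", "model": "test-model"}))
```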
Copilot AI review requested due to automatic review settings March 1, 2026 09:40

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.



Comment on lines +195 to +202
cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).

Returns:
Populated ``TokenUsage`` with computed cost.

Raises:
InvalidRequestError: If any parameter is negative.

Copilot AI Mar 1, 2026


compute_cost() rejects non-finite rates (NaN/inf) in addition to negatives, but the docstring says it only raises InvalidRequestError when a parameter is negative. Please update the docstring to reflect the actual validation behavior (non-negative and finite).

Suggested change
cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).
Returns:
Populated ``TokenUsage`` with computed cost.
Raises:
InvalidRequestError: If any parameter is negative.
cost_per_1k_input: Cost per 1 000 input tokens in USD (finite and >= 0).
cost_per_1k_output: Cost per 1 000 output tokens in USD (finite and >= 0).
Returns:
Populated ``TokenUsage`` with computed cost.
Raises:
InvalidRequestError: If any parameter is negative or non-finite
(for example, NaN or infinity).

Comment on lines +23 to +24
_COST_ROUNDING_PRECISION: int = 10
"""Decimal places for cost rounding to avoid floating-point dust."""

Copilot AI Mar 1, 2026


This introduces a new rounding-precision constant with the same value/purpose as ai_company.constants.BUDGET_ROUNDING_PRECISION. To avoid duplicated “10” precision definitions drifting over time, consider reusing a shared constant (or promoting a generic rounding constant in ai_company.constants) instead of defining _COST_ROUNDING_PRECISION locally.

import pytest

from ai_company.providers.base import (
_COST_ROUNDING_PRECISION,

Copilot AI Mar 1, 2026


The test imports _COST_ROUNDING_PRECISION, which is a private (underscore-prefixed) implementation detail. Either make the constant part of the public API (no leading underscore) or avoid importing it in the test by asserting on the rounded result directly (e.g., use the expected precision inline).

Suggested change
_COST_ROUNDING_PRECISION,

- Reuse shared BUDGET_ROUNDING_PRECISION from ai_company.constants
  instead of defining local _COST_ROUNDING_PRECISION duplicate
- Update compute_cost docstring to reflect finite+non-negative validation
- Import public constant in tests instead of private symbol

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/providers/models.py`:
- Around line 61-64: Add a short note to the model docstrings explaining that
dict fields like parameters_schema (in the model that defines Field(...)) and
the arguments field in ToolCall are shallowly immutable under the frozen
Pydantic model—reassignment is prevented but their contents can still be
mutated—so callers should treat these dicts as read-only (or copy them before
modifying) to avoid surprising side effects; update the docstring for the class
that declares parameters_schema and the ToolCall docstring to include this
guidance.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5af5a7f and f9ae175.

📒 Files selected for processing (8)
  • src/ai_company/providers/base.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/protocol.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_errors.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use PEP 649 native lazy annotations instead of from __future__ import annotations (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)

Files:

  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/base.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_errors.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/protocol.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Mark tests with @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, or @pytest.mark.slow
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use asyncio_mode = 'auto' for async tests without manual @pytest.mark.asyncio

Files:

  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_errors.py
{src/**/*.py,tests/**/*.py}

📄 CodeRabbit inference engine (CLAUDE.md)

Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch

Files:

  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_protocol.py
  • src/ai_company/providers/models.py
  • src/ai_company/providers/base.py
  • tests/unit/providers/conftest.py
  • tests/unit/providers/test_errors.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/protocol.py
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/unit/providers/test_models.py
  • tests/unit/providers/conftest.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Applied to files:

  • tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/unit/providers/conftest.py
🧬 Code graph analysis (3)
src/ai_company/providers/models.py (1)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
tests/unit/providers/test_errors.py (1)
src/ai_company/providers/errors.py (7)
  • ContentFilterError (119-122)
  • InvalidRequestError (113-116)
  • ModelNotFoundError (107-110)
  • ProviderError (22-69)
  • ProviderInternalError (137-140)
  • ProviderTimeoutError (125-128)
  • RateLimitError (78-104)
src/ai_company/providers/protocol.py (2)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/models.py (5)
  • ChatMessage (102-174)
  • CompletionConfig (177-218)
  • CompletionResponse (221-270)
  • StreamChunk (273-351)
  • ToolDefinition (45-64)
🔇 Additional comments (8)
src/ai_company/providers/protocol.py (1)

1-80: LGTM!

The protocol definition is well-structured with proper runtime-checkable decorator, clear method signatures, and comprehensive Google-style docstrings. The interface cleanly separates the engine-facing contract from concrete implementations.

src/ai_company/providers/errors.py (1)

1-140: LGTM!

The error hierarchy is well-designed with:

  • Proper immutability via MappingProxyType with defensive copy
  • Case-insensitive sensitive key redaction
  • Appropriate is_retryable flags for retry logic
  • Thorough validation of retry_after in RateLimitError
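The "defensive copy + `MappingProxyType`" pattern from the first bullet can be sketched as follows (helper name and shape are illustrative):

```python
from types import MappingProxyType


def freeze_context(context: dict[str, object]) -> MappingProxyType:
    # Shallow-copy first so later mutation of the caller's dict cannot
    # leak through, then wrap in a read-only view.
    return MappingProxyType(dict(context))


original: dict[str, object] = {"provider": "test-provider"}
frozen = freeze_context(original)

original["provider"] = "changed"
print(frozen["provider"])  # test-provider (the copy is isolated)

try:
    frozen["provider"] = "x"  # the proxy rejects writes
except TypeError:
    print("read-only")
```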
src/ai_company/providers/models.py (1)

1-351: Well-structured domain models with comprehensive validation.

The models correctly implement:

  • Frozen configs for immutability
  • Cross-field validation via model_validator(mode="after")
  • Exhaustive role/event-type matching with fallback error handling
  • Clear separation between token usage, tools, messages, and streaming concerns
src/ai_company/providers/base.py (1)

1-273: LGTM!

The ABC design is clean with:

  • Clear separation between public API (with validation) and abstract hooks
  • Comprehensive input validation raising InvalidRequestError with context
  • Proper finite/non-negative checks in compute_cost
  • Use of shared BUDGET_ROUNDING_PRECISION constant for consistency

The streaming design correctly documents that _do_stream implementations must return an AsyncIterator rather than yielding directly, which matches the await pattern in the public stream method.

tests/unit/providers/test_errors.py (1)

1-212: LGTM!

Comprehensive test coverage for the error hierarchy including:

  • Base class behavior and inheritance
  • is_retryable flags for all error types
  • Redaction of sensitive keys (case-insensitive)
  • Context immutability via MappingProxyType
  • RateLimitError validation for negative/NaN/inf retry_after
tests/unit/providers/test_models.py (1)

1-545: LGTM!

Thorough test coverage for all domain models with:

  • Validation rule enforcement (cross-field constraints, bounds)
  • Immutability verification via frozen model mutations
  • Factory-based instance creation
  • JSON round-trip serialization
  • __all__ export verification

All tests use fake model names like "test" as required.

tests/unit/providers/test_protocol.py (1)

1-369: LGTM!

Excellent test coverage for protocol and base class including:

  • Structural typing verification via isinstance checks
  • Input validation (empty messages, blank/None model)
  • Tools and config forwarding through hooks
  • compute_cost validation (negative tokens, negative/inf/NaN rates)
  • Rounding precision using BUDGET_ROUNDING_PRECISION constant
  • ABC instantiation prevention and partial implementation rejection
tests/unit/providers/conftest.py (1)

1-258: LGTM!

Well-organized test infrastructure with:

  • Comprehensive factories for all domain models
  • FakeProvider correctly satisfying the CompletionProvider protocol
  • Call recording for verification in protocol tests
  • All fixtures using fake model names ("test-model", "test-provider")

Based on learnings: Tests correctly use fake model names as required.

Copilot AI review requested due to automatic review settings March 1, 2026 10:07
@Aureliolo Aureliolo merged commit 3e23d64 into main Mar 1, 2026
11 of 12 checks passed
@Aureliolo Aureliolo deleted the feat/provider-interface branch March 1, 2026 10:09

Copilot AI left a comment


Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.



Comment on lines +50 to +57
cost_per_1k_input: float = Field(
ge=0.0,
description="Cost per 1k input tokens in USD",
)
cost_per_1k_output: float = Field(
ge=0.0,
description="Cost per 1k output tokens in USD",
)

Copilot AI Mar 1, 2026


cost_per_1k_input / cost_per_1k_output currently only enforce >= 0.0, which allows float('inf') values. Since downstream cost computation explicitly requires finite rates (see BaseCompletionProvider.compute_cost), these fields should also validate as finite to fail fast on invalid model metadata.
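One way to enforce that in Pydantic v2 (a sketch of the suggested fix; the class name here is shortened for illustration) is `allow_inf_nan=False` on the float fields:

```python
from pydantic import BaseModel, Field, ValidationError


class Rates(BaseModel):
    """Sketch of the finite + non-negative constraint the comment asks for."""

    cost_per_1k_input: float = Field(ge=0.0, allow_inf_nan=False)
    cost_per_1k_output: float = Field(ge=0.0, allow_inf_nan=False)


Rates(cost_per_1k_input=0.003, cost_per_1k_output=0.015)  # accepted

try:
    Rates(cost_per_1k_input=float("inf"), cost_per_1k_output=0.0)
except ValidationError:
    print("non-finite rate rejected")
```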

Comment on lines +180 to +184
and self.content is None
and not self.tool_calls
):
msg = f"{self.role} messages must have content or tool_calls"
raise ValueError(msg)

Copilot AI Mar 1, 2026


For system/user messages, tool_calls are forbidden, but the validation error says they "must have content or tool_calls". That message is misleading because tool_calls can never satisfy the requirement for those roles; consider emitting a role-specific message (e.g., "system messages must have content").

r"""Provider-specific streaming completion.

Implementations must *return* an ``AsyncIterator`` (not ``yield``
directly), since the caller ``await``\s this coroutine to obtain

Copilot AI Mar 1, 2026


The docstring contains the text await\s, which renders literally as "await\s" and looks like an escaped typo. Replace it with normal wording (e.g., "awaits") to avoid confusing readers and doc tooling.

Suggested change
directly), since the caller ``await``\s this coroutine to obtain
directly), since the caller awaits this coroutine to obtain

Aureliolo added a commit that referenced this pull request Mar 1, 2026
Add the "Employment Agency" — a swappable driver system with LiteLLM as
the default backend. Implements the concrete provider layer behind the
contracts designed in #86.

- LiteLLMDriver wrapping litellm.acompletion for streaming and
  non-streaming completions, model capability queries, and full
  exception mapping (10 LiteLLM exception types → ProviderError hierarchy)
- ProviderRegistry (immutable) mapping provider names to driver instances,
  built from config via from_config() with factory_overrides for testing
- Pure mapping functions (messages, tools, finish reasons, tool calls)
- 3 new driver error classes (DriverNotRegistered, DriverAlreadyRegistered,
  DriverFactoryNotFound)
- driver field on ProviderConfig (defaults to "litellm")
- Upgraded litellm 1.67.2 → 1.82.0 (fixes Python 3.14 compat, removes
  need for PYTHONUTF8 env var and deprecation warning filters)
- Removed unused F403 ruff ignore for __init__.py
- 235 new unit tests, all 1273 tests pass at 94% coverage
Aureliolo added a commit that referenced this pull request Mar 1, 2026
## Summary

- Implements the **Employment Agency** — a swappable driver system with
LiteLLM as the default backend behind the contracts from #86
- `LiteLLMDriver(BaseCompletionProvider)` wrapping `litellm.acompletion`
for streaming/non-streaming completions, model capabilities, and full
exception mapping (10 LiteLLM exception types → `ProviderError`
hierarchy)
- `ProviderRegistry` (immutable, `MappingProxyType`) mapping provider
names → driver instances, built from config via `from_config()` with
`factory_overrides` for testing/native SDK swaps
- Pure mapping functions for messages, tools, finish reasons, and tool
call extraction
- Upgraded litellm `1.67.2` → `1.82.0`, removing all Python 3.14
compatibility workarounds (warning filters, `PYTHONUTF8` env var, stale
`type: ignore`)
- Removed unused `F403` ruff ignore for `__init__.py`

## Architecture

```
Engine  ──>  ProviderRegistry  ──>  LiteLLMDriver (anthropic)
             ("Employment Agency")  LiteLLMDriver (openrouter)
                                    LiteLLMDriver (ollama)
                                    # Future: NativeAnthropicDriver, etc.
```
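An immutable name-to-driver registry in this shape might be sketched as follows (a minimal sketch: the real `from_config()`/`factory_overrides` machinery is omitted and the constructor signature is an assumption):

```python
from types import MappingProxyType


class ProviderRegistry:
    """Minimal sketch of an immutable provider-name -> driver mapping."""

    def __init__(self, drivers: dict[str, object]) -> None:
        # Shallow-copy, then wrap read-only so registrations cannot change.
        self._drivers = MappingProxyType(dict(drivers))

    def get(self, name: str) -> object:
        try:
            return self._drivers[name]
        except KeyError:
            msg = f"no driver registered for {name!r}"
            raise KeyError(msg) from None


registry = ProviderRegistry({"anthropic": "litellm-driver-instance"})
print(registry.get("anthropic"))  # litellm-driver-instance
```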

## New files

| File | Lines | Purpose |
|------|-------|---------|
| `providers/drivers/litellm_driver.py` | ~530 | LiteLLM-backed completion driver |
| `providers/drivers/mappers.py` | ~170 | Pure message/tool/reason mapping functions |
| `providers/registry.py` | ~185 | Immutable provider name → driver registry |
| `providers/drivers/__init__.py` | ~20 | Sub-package exports |

## Modified files

| File | Change |
|------|--------|
| `config/schema.py` | Added `driver` field to `ProviderConfig` (default `"litellm"`) |
| `providers/errors.py` | Added 3 driver error classes |
| `providers/__init__.py` | Added new exports |
| `pyproject.toml` | litellm `1.82.0`, removed warning filters and `F403` ignore |

## Test plan

- [x] 235 new unit tests covering driver, mappers, registry, and
exception mapping
- [x] All 1273 tests pass (`uv run pytest tests/ -n auto`)
- [x] 94% coverage (`--cov-fail-under=80`)
- [x] Lint clean (`uv run ruff check src/ tests/`)
- [x] Format clean (`uv run ruff format --check src/ tests/`)
- [x] Type-check clean (`uv run mypy src/` — 0 errors in 64 files)
- [x] All pre-commit hooks pass

Closes #4
@coderabbitai coderabbitai bot mentioned this pull request Mar 1, 2026
4 tasks
Aureliolo added a commit that referenced this pull request Mar 1, 2026
## Summary

- Adds **39 integration tests** for the provider adapter layer,
completing the final unchecked acceptance criterion from #5:
_"Integration tests with mock/recorded API responses"_
- All source code was already implemented in PRs #86 and #88 — this PR
covers only the integration test suite
- Mocks at `litellm.acompletion` level using **real
`litellm.ModelResponse`** objects (not MagicMock), exercising actual
attribute access paths through `_map_response`, `_process_chunk`, and
`extract_tool_calls`

### Test files

| File | Tests | Coverage |
|------|-------|---------|
| `test_anthropic_pipeline.py` | 13 | Config→registry→complete/stream, alias resolution, cost computation, streaming |
| `test_openrouter_pipeline.py` | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| `test_ollama_pipeline.py` | 4 | No api_key, localhost base_url, zero-cost models |
| `test_error_scenarios.py` | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| `test_tool_calling_pipeline.py` | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| `conftest.py` | — | Config factories, real ModelResponse builders, stream helpers |

### Verification

- `ruff check` — all passed
- `ruff format` — all formatted
- `mypy` — 0 errors (7 files)
- `pytest` — 1331 total tests pass, **94.49% coverage** (80% required)

Closes #5

## Test plan

- [ ] CI passes (lint + type-check + test + coverage)
- [ ] 39 integration tests pass under `pytest -m integration`
- [ ] No regressions in existing 1292 unit tests
- [ ] Coverage remains above 80% threshold
@coderabbitai coderabbitai bot mentioned this pull request Mar 1, 2026
5 tasks
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32),
[#180](#180))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59),
[#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>