Conversation
Implement the typed contract layer between the agent engine and LLM providers (issue #3). Defines CompletionProvider Protocol for loose coupling and BaseCompletionProvider ABC for shared validation/cost computation. Includes full model hierarchy (ChatMessage, ToolDefinition, ToolCall, CompletionResponse, StreamChunk, TokenUsage), error hierarchy with is_retryable flags, and ModelCapabilities for routing decisions. 118 unit tests, 100% coverage on providers module.
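The Protocol/ABC split described above can be sketched as follows. This is an illustrative reduction, not the PR's code: the method names (`complete`, `_do_complete`) come from the description and reviews, while the dict-based message shape and `EchoProvider` are invented for the example:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionProvider(Protocol):
    """Structural interface the engine depends on; no inheritance required."""

    async def complete(self, messages: list[dict], model: str) -> dict: ...


class BaseCompletionProvider(ABC):
    """Shared validation lives here; provider-specific work goes in _do_* hooks."""

    async def complete(self, messages: list[dict], model: str) -> dict:
        if not messages:
            # the PR raises InvalidRequestError here; ValueError keeps this self-contained
            raise ValueError("messages must not be empty")
        return await self._do_complete(messages, model)

    @abstractmethod
    async def _do_complete(self, messages: list[dict], model: str) -> dict: ...


class EchoProvider(BaseCompletionProvider):
    """Toy concrete provider: echoes the last message back."""

    async def _do_complete(self, messages: list[dict], model: str) -> dict:
        return {"model": model, "content": messages[-1]["content"]}


provider = EchoProvider()
# Structural typing: EchoProvider never mentions CompletionProvider, yet conforms.
print(isinstance(provider, CompletionProvider))  # True
response = asyncio.run(
    provider.complete([{"role": "user", "content": "hi"}], "test-model-1")
)
print(response)  # {'model': 'test-model-1', 'content': 'hi'}
```

The loose coupling comes from the Protocol: the engine can type against `CompletionProvider` while adapters are free to subclass the ABC or not.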
📝 Walkthrough

Adds a provider layer: a runtime-checkable CompletionProvider Protocol, a BaseCompletionProvider ABC, immutable provider-domain Pydantic models, a provider error hierarchy, package exports, and comprehensive unit tests and fixtures. No behavioral changes to existing runtime code outside this new provider surface.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Application/Client
    participant Base as BaseCompletionProvider
    participant Concrete as Concrete Provider
    participant Models as Pydantic Models
    participant Stream as StreamChunk / CompletionResponse
    Client->>Base: complete(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Models-->>Base: valid / raise InvalidRequestError
    Base->>Concrete: _do_complete(messages, model, tools, config)
    Concrete->>Concrete: call external provider API
    Concrete-->>Base: CompletionResponse (raw)
    Base->>Models: construct/validate CompletionResponse
    Models-->>Stream: validated CompletionResponse
    Stream-->>Client: CompletionResponse
    Client->>Base: stream(messages, model, tools?, config?)
    Base->>Models: _validate_messages(messages)
    Base->>Concrete: _do_stream(...)
    Concrete-->>Base: AsyncIterator[StreamChunk]
    Base-->>Client: AsyncIterator[StreamChunk]
    loop per chunk
        Client->>Stream: iterate
        Stream-->>Client: StreamChunk (validated)
    end
```
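The stream path in the diagram (validate, delegate to `_do_stream`, yield validated chunks) maps naturally onto async generators. A minimal sketch, with plain strings standing in for `StreamChunk` models:

```python
import asyncio
from collections.abc import AsyncIterator


class StreamingProviderSketch:
    """validate -> delegate -> yield, mirroring the stream() path in the diagram."""

    async def stream(self, messages: list[str], model: str) -> AsyncIterator[str]:
        if not messages:
            # the PR raises InvalidRequestError; ValueError keeps this self-contained
            raise ValueError("messages must not be empty")
        async for chunk in self._do_stream(messages, model):
            yield chunk  # in the PR each chunk is a validated StreamChunk model

    async def _do_stream(self, messages: list[str], model: str) -> AsyncIterator[str]:
        # stand-in for the external provider API call
        for part in ("Hel", "lo"):
            yield part


async def collect() -> list[str]:
    return [c async for c in StreamingProviderSketch().stream(["hi"], "test-model-1")]


chunks = asyncio.run(collect())
print(chunks)  # ['Hel', 'lo']
```

Note that in the PR `stream` is described as returning an `AsyncIterator[StreamChunk]`; here the method is itself an async generator, which is one common way to satisfy that contract.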
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request establishes a foundational, type-safe interface for interacting with various Large Language Model (LLM) providers. It introduces a clear separation of concerns through protocols and abstract base classes, defines a comprehensive set of data models for chat completions and tool usage, and implements a structured error-handling mechanism. This work is crucial for building a flexible and extensible agent engine capable of integrating with diverse LLM services while maintaining consistency and reliability.
Code Review
This pull request introduces a well-designed and comprehensive unified provider interface for LLM completions. The use of a Protocol for loose coupling and an ABC for shared logic is a solid architectural choice. The data models are thorough and include robust validation, and the error hierarchy with is_retryable flags is a great feature for building resilient systems. The accompanying tests are extensive and cover many edge cases.
I have a couple of minor suggestions for improvement: one regarding a potentially confusing code convention in the documentation, and another to enhance the robustness of a model validator.
CLAUDE.md excerpt under review:

```markdown
## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
```
The documented syntax `except A, B:` for multiple exceptions is invalid on every Python release before 3.14; it is only re-allowed by PEP 758 (accepted for Python 3.14), and even then only when no `as` clause is present.
On earlier interpreters, or whenever an `as` clause is used, the tuple form `except (A, B):` is required.
To prevent confusion for developers working across Python versions, it would be good to clarify this convention.
```diff
- - **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
+ - **Multiple exceptions**: use `except (A, B):` to catch multiple exception types.
```
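For reference, a runnable illustration of the portable tuple form; the unparenthesized spelling appears only in comments because it parses solely on interpreters that implement PEP 758:

```python
def parse_int(value):
    """Return int(value), or None if the input cannot be converted."""
    try:
        return int(value)
    except (ValueError, TypeError):  # tuple form: valid on every Python 3 release
        return None


print(parse_int("42"), parse_int("x"), parse_int(None))  # 42 None None

# Under PEP 758 (Python 3.14+) the parentheses may be dropped when there is
# no `as` clause:
#     except ValueError, TypeError:
# With `as`, the parenthesized tuple remains required:
#     except (ValueError, TypeError) as exc:
```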
src/ai_company/providers/models.py (outdated)
```python
match self.event_type:
    case StreamEventType.CONTENT_DELTA:
        if self.content is None:
            msg = "content_delta event must include content"
            raise ValueError(msg)
    case StreamEventType.TOOL_CALL_DELTA:
        if self.tool_call_delta is None:
            msg = "tool_call_delta event must include tool_call_delta"
            raise ValueError(msg)
    case StreamEventType.USAGE:
        if self.usage is None:
            msg = "usage event must include usage"
            raise ValueError(msg)
    case StreamEventType.ERROR:
        if self.error_message is None:
            msg = "error event must include error_message"
            raise ValueError(msg)
```
The match statement in _validate_event_fields is not exhaustive. It's missing a case for StreamEventType.DONE. While DONE events currently require no extra fields and thus pass validation by falling through, making this explicit would improve readability and robustness.
Adding a case for DONE and a default case _: to handle any future unhandled event types would make the validator more maintainable.
Suggested change:

```python
match self.event_type:
    case StreamEventType.CONTENT_DELTA:
        if self.content is None:
            msg = "content_delta event must include content"
            raise ValueError(msg)
    case StreamEventType.TOOL_CALL_DELTA:
        if self.tool_call_delta is None:
            msg = "tool_call_delta event must include tool_call_delta"
            raise ValueError(msg)
    case StreamEventType.USAGE:
        if self.usage is None:
            msg = "usage event must include usage"
            raise ValueError(msg)
    case StreamEventType.ERROR:
        if self.error_message is None:
            msg = "error event must include error_message"
            raise ValueError(msg)
    case StreamEventType.DONE:
        pass
```
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
```text
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@tests/unit/providers/conftest.py`:
- Around line 215-239: Update the fixtures to avoid using real provider/model
  identifiers: in the sample_completion_response fixture replace
  model="claude-sonnet-4-6" with a fake model name (e.g., "test-model-1") and in
  sample_model_capabilities set model_id to a fake model id (e.g., "test-model-1")
  and provider to a fake provider string (e.g., "test-provider"); ensure you only
  change the model and provider fields in the sample_completion_response and
  sample_model_capabilities fixtures so tests use non-production identifiers.
```
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
- CLAUDE.md
- src/ai_company/providers/__init__.py
- src/ai_company/providers/base.py
- src/ai_company/providers/capabilities.py
- src/ai_company/providers/enums.py
- src/ai_company/providers/errors.py
- src/ai_company/providers/models.py
- src/ai_company/providers/protocol.py
- tests/unit/providers/__init__.py
- tests/unit/providers/conftest.py
- tests/unit/providers/test_capabilities.py
- tests/unit/providers/test_enums.py
- tests/unit/providers/test_errors.py
- tests/unit/providers/test_models.py
- tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, src/ai_company/providers/base.py, src/ai_company/providers/capabilities.py, tests/unit/providers/test_enums.py, src/ai_company/providers/enums.py, src/ai_company/providers/errors.py, tests/unit/providers/test_protocol.py, src/ai_company/providers/models.py, tests/unit/providers/conftest.py, src/ai_company/providers/protocol.py, src/ai_company/providers/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use asyncio_mode = 'auto' for async tests without manual `@pytest.mark.asyncio`
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, tests/unit/providers/test_enums.py, tests/unit/providers/test_protocol.py, tests/unit/providers/conftest.py
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_errors.py, tests/unit/providers/test_models.py, src/ai_company/providers/base.py, src/ai_company/providers/capabilities.py, tests/unit/providers/test_enums.py, src/ai_company/providers/enums.py, src/ai_company/providers/errors.py, tests/unit/providers/test_protocol.py, src/ai_company/providers/models.py, tests/unit/providers/conftest.py, src/ai_company/providers/protocol.py, src/ai_company/providers/__init__.py
🧠 Learnings (13)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/providers/test_capabilities.py, tests/unit/providers/test_models.py, tests/unit/providers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Line length: 100 characters (enforced by Ruff)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff format .` to format code with Ruff
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Enforce 88-character line length (via ruff configuration)
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : All public functions and classes must include type hints; enforce via mypy strict mode
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-28T13:41:23.377Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-28T13:41:23.377Z
Learning: Applies to **/*.py : Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : Python version: 3.14+
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : After making code changes, run `ruff check .` to lint code (use `ruff check --fix .` to auto-fix)
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to **/*.py : NEVER add `from __future__ import annotations` - Python 3.14+ has native support for deferred annotation evaluation and does not require this import
Applied to files:
CLAUDE.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to **/*.py : Maintain line length at 100 characters per Ruff formatting rules.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (8)
tests/unit/providers/test_capabilities.py (2)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- tests/unit/providers/conftest.py (2): ModelCapabilitiesFactory (93-105), sample_model_capabilities (226-239)

tests/unit/providers/test_errors.py (1)
- src/ai_company/providers/errors.py (8): AuthenticationError (39-42), ContentFilterError (73-76), InvalidRequestError (67-70), ModelNotFoundError (61-64), ProviderConnectionError (85-88), ProviderError (11-36), ProviderInternalError (91-94), ProviderTimeoutError (79-82)

src/ai_company/providers/base.py (3)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/errors.py (1): InvalidRequestError (67-70)
- src/ai_company/providers/models.py (5): CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolDefinition (45-64)

tests/unit/providers/test_protocol.py (5)
- src/ai_company/providers/base.py (8): BaseCompletionProvider (23-185), complete (37-65), stream (67-95), get_model_capabilities (97-106), _do_complete (111-120), _do_stream (123-132), _do_get_model_capabilities (135-140), compute_cost (145-171)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/models.py (6): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolDefinition (45-64)
- src/ai_company/providers/protocol.py (4): CompletionProvider (21-81), complete (30-49), stream (51-70), get_model_capabilities (72-81)

src/ai_company/providers/models.py (1)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)

tests/unit/providers/conftest.py (3)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/models.py (8): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolCall (67-83), ToolDefinition (45-64), ToolResult (86-99)

src/ai_company/providers/protocol.py (4)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/models.py (5): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), ToolDefinition (45-64)
- src/ai_company/providers/base.py (3): complete (37-65), stream (67-95), get_model_capabilities (97-106)
- tests/unit/providers/conftest.py (3): complete (119-128), stream (130-149), get_model_capabilities (151-152)

src/ai_company/providers/__init__.py (6)
- src/ai_company/providers/base.py (1): BaseCompletionProvider (23-185)
- src/ai_company/providers/capabilities.py (1): ModelCapabilities (8-55)
- src/ai_company/providers/enums.py (3): FinishReason (15-22), MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/errors.py (9): AuthenticationError (39-42), ContentFilterError (73-76), InvalidRequestError (67-70), ModelNotFoundError (61-64), ProviderConnectionError (85-88), ProviderError (11-36), ProviderInternalError (91-94), ProviderTimeoutError (79-82), RateLimitError (45-58)
- src/ai_company/providers/models.py (8): ChatMessage (102-156), CompletionConfig (159-199), CompletionResponse (202-227), StreamChunk (230-281), TokenUsage (12-42), ToolCall (67-83), ToolDefinition (45-64), ToolResult (86-99)
- src/ai_company/providers/protocol.py (1): CompletionProvider (21-81)
🔇 Additional comments (16)
src/ai_company/providers/enums.py (1)
1-32: LGTM! Clean enum definitions with appropriate docstrings and string values that align with LLM provider conventions. Using `StrEnum` is ideal for JSON serialization compatibility.

CLAUDE.md (1)

51-51: LGTM! Documentation correctly reflects PEP 758 syntax for Python 3.14+. This aligns with the project's Python version requirements and ruff enforcement.
src/ai_company/providers/capabilities.py (1)
1-55: LGTM! Well-designed capability model with:
- Appropriate validation constraints (`gt=0` for token limits, `ge=0.0` for costs)
- Sensible defaults for capability flags
- `ConfigDict(frozen=True)` for immutability
- Comprehensive docstring following Google style
tests/unit/providers/test_enums.py (1)
32-81: LGTM! The remaining test classes (`TestFinishReason`, `TestStreamEventType`) are well-structured with proper markers and comprehensive coverage of enum behavior.

tests/unit/providers/test_errors.py (1)
1-144: LGTM! Comprehensive error hierarchy tests covering:
- Base `ProviderError` behavior (message, context, string representation, retryability)
- Subclass relationships via `issubclass` checks
- Parametrized `is_retryable` validation for all 8 error types
- `RateLimitError`-specific `retry_after` attribute
- String formatting consistency across all error types
tests/unit/providers/test_models.py (1)
1-424: LGTM! Excellent test coverage for all provider domain models:
- Validation rules (positive/negative cases, boundary conditions)
- Immutability via `frozen=True` enforcement
- Role-based constraints for `ChatMessage`
- Event-type-specific validation for `StreamChunk`
- Factory integration tests
- JSON round-trip serialization

Tests use appropriate fake model names (e.g., `"test"`) per coding guidelines.

tests/unit/providers/test_protocol.py (1)
1-194: LGTM! Comprehensive protocol and ABC tests covering:
- Structural typing with `isinstance` checks for `CompletionProvider`
- Async method behavior (complete, stream, get_model_capabilities)
- Delegation pattern from public API to `_do_*` hooks
- Input validation (empty messages rejection)
- `compute_cost` helper with appropriate floating-point tolerance (1e-9)
- Protocol conformance verification for concrete implementations
src/ai_company/providers/base.py (1)
1-185: LGTM! Well-designed abstract base class with:
- Clear separation between public API and provider-specific hooks
- Input validation via `_validate_messages` raising `InvalidRequestError`
- Static `compute_cost` helper with floating-point precision handling (`round(cost, 10)`)
- Comprehensive Google-style docstrings with Args, Returns, and Raises sections
- All methods under 50 lines per coding guidelines

The design cleanly implements the template method pattern, allowing concrete providers to focus on provider-specific logic while the base class handles common validation and cost computation.
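The `compute_cost` signature is not shown in the review, so the parameter names below are assumptions; only the `round(cost, 10)` detail and per-token pricing come from the comments above:

```python
def compute_cost(
    input_tokens: int,
    output_tokens: int,
    input_cost_per_million: float,
    output_cost_per_million: float,
) -> float:
    """Dollar cost of a completion; round(..., 10) damps binary float noise."""
    cost = (
        input_tokens / 1_000_000 * input_cost_per_million
        + output_tokens / 1_000_000 * output_cost_per_million
    )
    return round(cost, 10)


# 1,000 input tokens at $3/M plus 500 output tokens at $15/M
print(compute_cost(1000, 500, 3.0, 15.0))  # 0.0105
```

The rounding explains why the tests can compare costs with a 1e-9 tolerance: accumulated binary representation error stays well below the rounded precision.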
tests/unit/providers/test_capabilities.py (2)
1-119: LGTM! The test structure is well-organized with comprehensive coverage of:
- Required field validation (empty model_id, empty provider)
- Numeric constraint validation (zero/negative tokens and costs)
- Boolean defaults verification
- Immutability enforcement
- Factory integration
- JSON round-trip serialization

The `@pytest.mark.unit` decorator and timeout configuration comply with coding guidelines.
17-21: ⚠️ Potential issue | 🟡 Minor: Test uses real model ID instead of fake model name.

The test asserts against `"claude-sonnet-4-6"`, which is a real Anthropic model identifier. The assertion depends on the `sample_model_capabilities` fixture, which also uses the real model ID. Based on learnings: "Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`." Consider updating the fixture in `conftest.py` and this assertion to use a fake model name like `"test-model:200k"`.

⛔ Skipped due to learnings

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

src/ai_company/providers/protocol.py (1)
1-81: LGTM! The `CompletionProvider` Protocol is well-designed:
- `@runtime_checkable` enables `isinstance` checks for structural typing
- Method signatures align with `BaseCompletionProvider` in `base.py`
- Google-style docstrings are complete with Args and Returns sections
- Type hints are comprehensive and use modern union syntax (`|`)

The `noqa` comments appropriately suppress type-checking import warnings for types needed in runtime-checkable Protocol signatures.

src/ai_company/providers/__init__.py (1)
1-74: LGTM! The module's public API is well-organized:
- Comprehensive module docstring categorizing exports (Protocols, Models, Capabilities, Enums, Errors)
- All 23 exports in `__all__` are alphabetically sorted for discoverability
- Imports and `__all__` entries are consistent

src/ai_company/providers/models.py (1)
1-281: LGTM! Comprehensive domain models with solid validation:
- TokenUsage: enforces the `total_tokens == input_tokens + output_tokens` invariant
- ChatMessage: role-specific constraints are thorough (tool messages require `tool_result`; assistant can have `tool_calls` but not `tool_result`, etc.)
- StreamChunk: event-type discriminator pattern with a `match` statement validates required fields per event type
- All models are immutable (`frozen=True`) per coding guidelines
- Pydantic v2 patterns used correctly (`ConfigDict`, `model_validator`, `Field`)

tests/unit/providers/conftest.py (2)
111-152: LGTM! `FakeProvider` is a well-designed test double:
- Structurally satisfies the `CompletionProvider` protocol
- Tracks `complete_calls` and `stream_calls` for test assertions
- The `stream` method correctly returns an `AsyncIterator` via an inner generator
- Properly handles the optional `capabilities` parameter with a factory fallback
74-81: ⚠️ Potential issue | 🟡 Minor: Factories use real model identifiers instead of fake names.

The `CompletionResponseFactory` and `ModelCapabilitiesFactory` use `"claude-sonnet-4-6"` and `"anthropic"`, which are real provider identifiers. Based on learnings: "Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`."

🧪 Proposed fix to use fake identifiers:

```diff
 class CompletionResponseFactory(ModelFactory):
     __model__ = CompletionResponse
     content = "Hello! How can I help you?"
     tool_calls = ()
     finish_reason = FinishReason.STOP
     usage = TokenUsageFactory
-    model = "claude-sonnet-4-6"
+    model = "fake-model:8b"
     provider_request_id = None

 class ModelCapabilitiesFactory(ModelFactory):
     __model__ = ModelCapabilities
-    model_id = "claude-sonnet-4-6"
-    provider = "anthropic"
+    model_id = "fake-model:8b"
+    provider = "fake-provider"
     max_context_tokens = 200_000
     max_output_tokens = 8_192
```

Also applies to: 93-105
⛔ Skipped due to learnings

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Learnt from: CR | Repo: Aureliolo/story-factory PR: 0 | File: CLAUDE.md:0-0 | Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

src/ai_company/providers/errors.py (1)
1-94: LGTM! Well-structured error hierarchy with clear retry semantics:
- Retryable errors (transient): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`
- Non-retryable errors (permanent): `AuthenticationError`, `ModelNotFoundError`, `InvalidRequestError`, `ContentFilterError`

The `RateLimitError.retry_after` attribute supports respecting `Retry-After` headers from providers. The `ProviderError.__str__` method provides helpful context formatting for debugging.
Pull request overview
Implements the foundational “provider layer” abstraction in ai_company.providers, defining the typed interface and shared base logic that future LLM provider adapters (Anthropic/OpenAI/OpenRouter/Ollama/etc.) will conform to.
Changes:
- Added `CompletionProvider` Protocol plus `BaseCompletionProvider` ABC with shared request validation and token-cost computation helpers.
- Introduced provider-layer domain models (`ChatMessage`, tool types, streaming chunk types, config/response/usage) and capability descriptors for routing.
- Added a typed provider error hierarchy with `is_retryable` flags, plus comprehensive unit tests for the new module.
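The usage model's arithmetic invariant mentioned in the reviews (`total_tokens == input_tokens + output_tokens`) can be illustrated framework-free; the PR itself uses a frozen Pydantic v2 model with a `model_validator` rather than the dataclass shown here:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenUsage:
    """Framework-free stand-in; the PR uses a frozen Pydantic v2 model."""

    input_tokens: int
    output_tokens: int
    total_tokens: int

    def __post_init__(self) -> None:
        # the same invariant the reviewed model_validator enforces
        if self.total_tokens != self.input_tokens + self.output_tokens:
            raise ValueError("total_tokens must equal input_tokens + output_tokens")


usage = TokenUsage(input_tokens=10, output_tokens=5, total_tokens=15)
print(usage)  # TokenUsage(input_tokens=10, output_tokens=5, total_tokens=15)

try:
    TokenUsage(input_tokens=10, output_tokens=5, total_tokens=99)
except ValueError as exc:
    print(exc)  # total_tokens must equal input_tokens + output_tokens
```

Validating the invariant at construction time means downstream cost computation can trust the totals without re-checking them.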
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ai_company/providers/protocol.py | Defines the structural provider interface (CompletionProvider) used for loose coupling. |
| src/ai_company/providers/base.py | Adds the shared ABC with validation and cost computation helper(s). |
| src/ai_company/providers/models.py | Introduces provider-layer request/response/tool/streaming models with Pydantic validation. |
| src/ai_company/providers/capabilities.py | Adds ModelCapabilities for routing and cost metadata. |
| src/ai_company/providers/errors.py | Adds provider error hierarchy with retryability flags. |
| src/ai_company/providers/enums.py | Adds provider-layer enums (roles, finish reasons, stream event types). |
| src/ai_company/providers/\_\_init\_\_.py | Exposes the providers public API surface via imports and __all__. |
| tests/unit/providers/conftest.py | Adds factories/fixtures and a fake provider for protocol/base tests. |
| tests/unit/providers/test_protocol.py | Tests structural typing of the Protocol and BaseCompletionProvider behavior. |
| tests/unit/providers/test_models.py | Tests validation/immutability/serialization behaviors for provider models. |
| tests/unit/providers/test_capabilities.py | Tests ModelCapabilities validation and JSON roundtrip. |
| tests/unit/providers/test_errors.py | Tests error hierarchy behavior, formatting, and retryability flags. |
| tests/unit/providers/test_enums.py | Tests provider-layer enums for members and string values. |
| CLAUDE.md | Updates contributor conventions (contains a syntax guidance change). |
CLAUDE.md excerpt under review:

```markdown
## Code Conventions

- **No `from __future__ import annotations`** — Python 3.14 has PEP 649
- **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
```
The CLAUDE.md guidance on exception syntax needs a caveat: `except A, B:` is invalid on every Python release before 3.14, and PEP 758 (which re-allows the unparenthesized form in 3.14) still requires parentheses when an `as` clause is used. As written, the convention may mislead contributors working across Python versions; the tuple form `except (A, B):` is always safe.
```diff
- - **PEP 758 except syntax**: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
+ - **Exception syntax**: for multiple exceptions use `except (A, B):` (tuple) — portable across all Python 3 versions
```
src/ai_company/providers/models.py (outdated)
| """Ensure the populated fields match the event_type.""" | ||
| match self.event_type: | ||
| case StreamEventType.CONTENT_DELTA: | ||
| if self.content is None: | ||
| msg = "content_delta event must include content" | ||
| raise ValueError(msg) | ||
| case StreamEventType.TOOL_CALL_DELTA: | ||
| if self.tool_call_delta is None: | ||
| msg = "tool_call_delta event must include tool_call_delta" | ||
| raise ValueError(msg) | ||
| case StreamEventType.USAGE: | ||
| if self.usage is None: | ||
| msg = "usage event must include usage" | ||
| raise ValueError(msg) | ||
| case StreamEventType.ERROR: | ||
| if self.error_message is None: | ||
| msg = "error event must include error_message" | ||
| raise ValueError(msg) |
StreamChunk._validate_event_fields enforces required fields for some event_types, but it doesn't reject extraneous fields (e.g., a DONE chunk can still carry content/usage/etc.). This allows invalid/ambiguous stream states even though the docstring says the discriminator determines which fields are populated. Consider validating that non-relevant fields are None for each event type (especially DONE and ERROR).
| """Ensure the populated fields match the event_type.""" | |
| match self.event_type: | |
| case StreamEventType.CONTENT_DELTA: | |
| if self.content is None: | |
| msg = "content_delta event must include content" | |
| raise ValueError(msg) | |
| case StreamEventType.TOOL_CALL_DELTA: | |
| if self.tool_call_delta is None: | |
| msg = "tool_call_delta event must include tool_call_delta" | |
| raise ValueError(msg) | |
| case StreamEventType.USAGE: | |
| if self.usage is None: | |
| msg = "usage event must include usage" | |
| raise ValueError(msg) | |
| case StreamEventType.ERROR: | |
| if self.error_message is None: | |
| msg = "error event must include error_message" | |
| raise ValueError(msg) | |
| """Ensure the populated fields match the event_type. | |
| The ``event_type`` acts as a discriminator: for each event type, only | |
| the relevant field(s) may be non-None. | |
| """ | |
| # Determine which fields are required and which may be non-None for | |
| # this event type. All other optional fields must be None. | |
| match self.event_type: | |
| case StreamEventType.CONTENT_DELTA: | |
| allowed_non_null = {"content"} | |
| required_non_null = {"content"} | |
| case StreamEventType.TOOL_CALL_DELTA: | |
| allowed_non_null = {"tool_call_delta"} | |
| required_non_null = {"tool_call_delta"} | |
| case StreamEventType.USAGE: | |
| allowed_non_null = {"usage"} | |
| required_non_null = {"usage"} | |
| case StreamEventType.ERROR: | |
| allowed_non_null = {"error_message"} | |
| required_non_null = {"error_message"} | |
| case StreamEventType.DONE: | |
| # Terminal event: no additional payload fields should be set. | |
| allowed_non_null = set() | |
| required_non_null = set() | |
| case _: | |
| # For unknown or future event types, avoid over-validating to | |
| # preserve forward compatibility. | |
| return self | |
| field_values: dict[str, Any] = { | |
| "content": self.content, | |
| "tool_call_delta": self.tool_call_delta, | |
| "usage": self.usage, | |
| "error_message": self.error_message, | |
| } | |
| # Ensure all required fields are present. | |
| missing_required = sorted( | |
| name for name in required_non_null if field_values[name] is None | |
| ) | |
| if missing_required: | |
| required_list = ", ".join(missing_required) | |
| msg = ( | |
| f"{self.event_type.value} event missing required field(s): " | |
| f"{required_list}" | |
| ) | |
| raise ValueError(msg) | |
| # Ensure no extraneous fields are populated. | |
| extraneous = sorted( | |
| name | |
| for name, value in field_values.items() | |
| if name not in allowed_non_null and value is not None | |
| ) | |
| if extraneous: | |
| extraneous_list = ", ".join(extraneous) | |
| msg = ( | |
| f"{self.event_type.value} event must not include field(s): " | |
| f"{extraneous_list}" | |
| ) | |
| raise ValueError(msg) |
```python
cost = (input_tokens / 1000) * cost_per_1k_input + (
    output_tokens / 1000
) * cost_per_1k_output
return TokenUsage(
    input_tokens=input_tokens,
    output_tokens=output_tokens,
    total_tokens=input_tokens + output_tokens,
    cost_usd=round(cost, 10),
)
```
compute_cost() hard-codes the rounding precision (round(cost, 10)), which makes the intent and consistency across the codebase harder to manage. There is already a shared BUDGET_ROUNDING_PRECISION = 10 constant used elsewhere to avoid float artifacts; consider reusing a shared constant (or introducing a provider-specific one) instead of a magic number here.
…, and Copilot

Source fixes:
- Make `ProviderError.context` immutable via `MappingProxyType`
- Sanitize sensitive keys in `ProviderError.__str__` output
- Add input validation to `compute_cost` (reject negative values)
- Add model parameter validation in `BaseCompletionProvider`
- Add `CompletionResponse` validator requiring content or tool_calls
- Make `StreamChunk` match exhaustive with negative constraints
- Add `ModelCapabilities` cross-field validators
- Extract cost rounding precision to named constant
- Fix misleading docstrings (cost computation, delegation, stubs)
- Standardize all docstrings to Google style

Test additions (23 new tests, 118 → 141):
- ABC instantiation and partial implementation rejection
- Model validation (blank model, error context metadata)
- `compute_cost` negative inputs and rounding precision
- `ChatMessage` role-field symmetry (system/tool_calls, user/tool_result)
- `CompletionResponse` empty response handling
- `StreamChunk` extraneous field rejection
- `__all__` exports importability
- `ModelCapabilities` cross-field constraint violations

Other:
- Replace real model identifiers with fake names in test fixtures
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/providers/errors.py`:
- Around line 11-13: The redaction set _REDACTED_KEYS should be matched
case-insensitively: normalize keys to lowercase (make _REDACTED_KEYS contain
lowercase strings) and update the __str__ logic to check k.lower() (or
equivalent) when deciding to redact values; ensure any comparisons that
currently use "k in _REDACTED_KEYS" are replaced with "k.lower() in
_REDACTED_KEYS" so keys like "API_KEY" or "Authorization" get redacted
correctly.
In `@tests/unit/providers/test_protocol.py`:
- Around line 271-282: Update the test to use the provider's rounding constant
instead of the hardcoded 10: import or reference
BaseCompletionProvider._COST_ROUNDING_PRECISION in
test_compute_cost_rounding_precision and replace the literal `10` with that
constant so the expected value is computed with the same precision as
BaseCompletionProvider.compute_cost; ensure the test still calls
BaseCompletionProvider.compute_cost(333, 777, cost_per_1k_input=0.003,
cost_per_1k_output=0.015) and compares usage.cost_usd to the rounded value using
_COST_ROUNDING_PRECISION.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
- `src/ai_company/providers/__init__.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
- `tests/unit/providers/test_capabilities.py`
- `src/ai_company/providers/errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/__init__.py`
- `tests/unit/providers/conftest.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/protocol.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use `asyncio_mode = 'auto'` for async tests without manual `@pytest.mark.asyncio`
Files:
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `tests/unit/providers/conftest.py`
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
- `tests/unit/providers/test_capabilities.py`
- `src/ai_company/providers/errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/capabilities.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/__init__.py`
- `tests/unit/providers/conftest.py`
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/protocol.py`
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
- `tests/unit/providers/test_capabilities.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/conftest.py`
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
Applied to files:
tests/unit/providers/conftest.py
🧬 Code graph analysis (7)
**tests/unit/providers/test_capabilities.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `tests/unit/providers/conftest.py` (2): `ModelCapabilitiesFactory` (93-105), `sample_model_capabilities` (226-239)

**tests/unit/providers/test_models.py (2)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)
- `src/ai_company/providers/models.py` (8): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolCall` (67-83), `ToolDefinition` (45-64), `ToolResult` (86-99)

**tests/unit/providers/test_protocol.py (6)**
- `src/ai_company/providers/base.py` (7): `complete` (42-72), `stream` (74-104), `get_model_capabilities` (106-119), `_do_complete` (124-133), `_do_stream` (136-145), `_do_get_model_capabilities` (148-153), `compute_cost` (158-211)
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)
- `src/ai_company/providers/errors.py` (1): `InvalidRequestError` (97-100)
- `src/ai_company/providers/models.py` (6): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolDefinition` (45-64)
- `src/ai_company/providers/protocol.py` (4): `CompletionProvider` (22-81), `complete` (30-49), `stream` (51-70), `get_model_capabilities` (72-81)

**src/ai_company/providers/models.py (1)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)

**tests/unit/providers/conftest.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (8): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolCall` (67-83), `ToolDefinition` (45-64), `ToolResult` (86-99)

**src/ai_company/providers/base.py (3)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/errors.py` (1): `InvalidRequestError` (97-100)
- `src/ai_company/providers/models.py` (6): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `TokenUsage` (12-42), `ToolDefinition` (45-64)

**src/ai_company/providers/protocol.py (3)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (5): `ChatMessage` (102-170), `CompletionConfig` (173-213), `CompletionResponse` (216-265), `StreamChunk` (268-346), `ToolDefinition` (45-64)
- `src/ai_company/providers/base.py` (3): `complete` (42-72), `stream` (74-104), `get_model_capabilities` (106-119)
🔇 Additional comments (19)
**src/ai_company/providers/errors.py (1)**

`1-124`: **LGTM!** The error hierarchy is well-designed with:
- Immutable context via `MappingProxyType`
- Clear `is_retryable` semantics per error type
- Sensitive key redaction in `__str__`
- Proper inheritance and Google-style docstrings
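The `MappingProxyType` pattern the review praises can be sketched with stdlib pieces alone (the key names here are illustrative, not the repository's actual context keys):

```python
from types import MappingProxyType

context = {"request_id": "abc", "api_key": "secret"}
frozen = MappingProxyType(dict(context))  # defensive copy, then read-only view

try:
    frozen["api_key"] = "other"  # the proxy rejects all writes
except TypeError:
    print("proxy rejects writes")

context["api_key"] = "mutated"   # later edits to the original dict...
print(frozen["api_key"])         # ...do not leak through: secret
```

The `dict(context)` copy before wrapping is what makes the proxy a true snapshot; wrapping the caller's dict directly would still reflect the caller's later mutations.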
**tests/unit/providers/test_models.py (1)**

`1-512`: **LGTM!** Comprehensive test coverage for all provider domain models with:
- Proper `@pytest.mark.unit` markers on all test classes
- Module-level timeout of 30 seconds
- Validation tests (boundaries, constraints, cross-field rules)
- Immutability tests (`frozen` model config)
- Factory usage tests
- JSON round-trip verification
- Uses fake model names (`"test"`, `"test-model"`) per repository guidelines

**tests/unit/providers/test_capabilities.py (1)**
`1-157`: **LGTM!** Well-structured test suite for `ModelCapabilities` with:
- Proper `@pytest.mark.unit` marker and 30-second timeout
- Coverage of all field validations (empty/negative/zero constraints)
- Cross-field constraint tests (`max_output_tokens` vs `max_context_tokens`, streaming tool call dependencies)
- Immutability and JSON round-trip verification
- Uses fake model names (`"test-model"`, `"basic-model"`) per repository guidelines

**tests/unit/providers/test_protocol.py (1)**
`1-282`: **LGTM!** Comprehensive test coverage for the provider protocol and base ABC:
- Protocol structural typing with `isinstance` checks
- Async method delegation tests for `complete`, `stream`, `get_model_capabilities`
- Validation tests for empty messages and blank model IDs
- `compute_cost` edge cases (zero, large, negative inputs, rounding)
- ABC instantiation prevention and partial implementation rejection
- Uses fake model names per repository guidelines
**src/ai_company/providers/protocol.py (1)**

`1-81`: **LGTM!** Clean protocol definition with:
- `@runtime_checkable` decorator for structural typing with `isinstance` checks
- Well-documented async methods with Google-style docstrings
- Proper type hints using domain models from the provider layer
- Correct use of `AsyncIterator[StreamChunk]` for streaming

**src/ai_company/providers/models.py (2)**
`80-83`: **Note: Mutable default values in frozen models.** The `arguments` field uses `default_factory=dict`, which creates a new mutable dict for each instance. While the model is frozen (preventing reassignment of `arguments`), the dict itself is still mutable: callers modifying `arguments` after construction could inadvertently mutate state. This is likely acceptable for the current use case, since the model is meant to be immutable and callers should treat it as such, but be aware that Pydantic's `frozen=True` only prevents attribute reassignment, not deep immutability.
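The shallow-immutability caveat is easy to demonstrate with a stdlib frozen dataclass; Pydantic's `frozen=True` models behave analogously (the class below is a toy stand-in, not the repository's `ToolCall`):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolCallSketch:
    name: str
    arguments: dict = field(default_factory=dict)

call = ToolCallSketch(name="search")
try:
    call.name = "other"           # attribute reassignment is rejected
except Exception as exc:
    print(type(exc).__name__)     # FrozenInstanceError

call.arguments["query"] = "hi"    # ...but the nested dict is still mutable
print(call.arguments)             # {'query': 'hi'}
```

This is why the review recommends callers treat such dict fields as read-only (or copy before modifying).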
`1-346`: **LGTM!** Excellent domain model design with:
- Strict immutability via `ConfigDict(frozen=True)` on all models
- Comprehensive cross-field validation (e.g., `TokenUsage` total check, `ChatMessage` role constraints, `StreamChunk` event-type validation)
- Clear Google-style docstrings with attribute documentation
- Proper use of Pydantic v2 features (`model_validator`, `Field` constraints)
- `NotBlankStr` for enforcing non-empty identifiers

**src/ai_company/providers/capabilities.py (1)**
`1-83`: **LGTM!** Well-designed capability descriptor with:
- Comprehensive field set for routing decisions (context limits, feature flags, costs)
- Sensible defaults (`supports_streaming=True`, `supports_tools=False`)
- Cross-field validation ensuring consistency (`max_output_tokens <= max_context_tokens`, streaming tool calls require both tools and streaming)
- Immutability via `frozen=True`
- Google-style docstrings with clear attribute documentation
src/ai_company/providers/__init__.py (1)
`1-57`: **LGTM!** Clean public API consolidation with:
- Complete export of all public types (protocols, base classes, models, enums, errors)
- Alphabetically sorted `__all__` for discoverability
- Clear module docstring describing the package purpose
tests/unit/providers/conftest.py (4)
`1-20`: **LGTM!** Imports are correctly organized, with the appropriate `# noqa: TC003` comment for `AsyncIterator` since it is used in runtime type hints.

`24-105`: **Well-structured factories following polyfactory patterns.** The factories are correctly configured:
- `TokenUsageFactory` has consistent values (100 + 50 = 150 total_tokens)
- `CompletionResponseFactory.usage = TokenUsageFactory` correctly references the sub-factory class (polyfactory will call `.build()` automatically)
- `StreamChunkFactory` defaults align with `CONTENT_DELTA` event type requirements
- `ModelCapabilitiesFactory` uses fake identifiers (`"test-model"`, `"test-provider"`) as required

The `# type: ignore[var-annotated]` and `# noqa: RUF012` comments are appropriate for mutable class-level attributes in the ModelFactory pattern.

`111-152`: **FakeProvider correctly implements the protocol structure.** The implementation properly:
- Records calls for test verification via `complete_calls` and `stream_calls`
- Returns factory-built responses from `complete`
- Yields properly-shaped `StreamChunk` objects from `stream` (CONTENT_DELTA followed by DONE)
- Returns injectable capabilities from `get_model_capabilities`

The async generator pattern in `stream` (wrapping an inner `_gen()` function) correctly produces an `AsyncIterator[StreamChunk]`.

`215-239`: **Fixtures now use fake model identifiers.** The `sample_completion_response` and `sample_model_capabilities` fixtures correctly use `"test-model"` and `"test-provider"` instead of real provider identifiers, addressing the previous review feedback.

**src/ai_company/providers/base.py (6)**
`1-23`: **LGTM!** Module docstring clearly explains the purpose, and the module-level `_COST_ROUNDING_PRECISION` constant is well-documented.

`26-72`: **Well-designed public API with proper validation.** The `complete` method correctly:
- Validates inputs before delegating to the hook
- Documents the validation behavior in the Raises section
- Returns the hook result unmodified (no unnecessary wrapping)

`74-104`: **`stream` method follows the same validation pattern.** The `return await self._do_stream(...)` pattern is correct — it expects subclass `_do_stream` implementations to be async methods returning an `AsyncIterator` (e.g., via an inner generator function), which aligns with the `FakeProvider` pattern in tests.

`123-153`: **Abstract hooks properly defined.** The three abstract methods have clear signatures matching the public API, with brief but sufficient docstrings for implementers.

`157-211`: **Robust cost computation with input validation.** The `compute_cost` method:
- Validates all four parameters for non-negative values
- Includes context in error messages for debugging
- Uses the module-level precision constant for rounding
- Correctly computes total_tokens as `input_tokens + output_tokens`

`213-239`: **Validation helpers are correct.** `_validate_messages` properly rejects empty lists, and `_validate_model` handles both empty strings and whitespace-only strings with `if not model or not model.strip()`.
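The whitespace-aware model-id check can be sketched in a few lines (this is an illustrative helper, not the repository's exact `_validate_model`):

```python
def validate_model(model: object) -> str:
    """Reject None, non-string, empty, and whitespace-only model ids."""
    if not isinstance(model, str) or not model.strip():
        raise ValueError(f"model must be a non-empty string, got {model!r}")
    return model

print(validate_model("test-model"))  # test-model
for bad in (None, "", "   "):
    try:
        validate_model(bad)
    except ValueError:
        print("rejected", bad)
```

The `isinstance` guard matters: without it, `model.strip()` would raise `AttributeError` on `None` instead of a clear validation error.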
…al reviewers

- Case-insensitive secret redaction in `ProviderError.__str__` with tests
- Context dict shallow-copy before `MappingProxyType` wrap (immutability)
- `RateLimitError.retry_after` validation (rejects negative/NaN/infinity)
- `_validate_model` guards against None input with isinstance check
- `compute_cost` rejects non-finite (inf/NaN) cost rates
- Hook docstrings specify `ProviderError` wrapping contract
- `_do_stream` docstring clarifies return-not-yield pattern
- `ChatMessage` validator refactored to exhaustive match/case
- Removed forward reference to nonexistent LiteLLMProvider
- Fixed `tool_call_delta` docstring from "Partial" to accurate description
- Temperature docstring notes provider-specific range caveat
- Removed fragile issue #9 reference from errors.py docstring
- Added `ProviderError` docstring note about redaction behavior
- FakeProvider now records tools/config parameters
- Tests: redaction, context immutability, retry_after validation, tools/config forwarding, None model, inf/NaN cost rates, MAX_TOKENS/TOOL_USE empty response, assistant with content+tool_calls
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
src/ai_company/providers/base.py
Outdated
```python
        cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
        cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).

    Returns:
        Populated ``TokenUsage`` with computed cost.

    Raises:
        InvalidRequestError: If any parameter is negative.
```
compute_cost() rejects non-finite rates (NaN/inf) in addition to negatives, but the docstring says it only raises InvalidRequestError when a parameter is negative. Please update the docstring to reflect the actual validation behavior (non-negative and finite).
Suggested change:

```diff
-        cost_per_1k_input: Cost per 1 000 input tokens in USD (>= 0).
-        cost_per_1k_output: Cost per 1 000 output tokens in USD (>= 0).
+        cost_per_1k_input: Cost per 1 000 input tokens in USD (finite and >= 0).
+        cost_per_1k_output: Cost per 1 000 output tokens in USD (finite and >= 0).

     Returns:
         Populated ``TokenUsage`` with computed cost.

     Raises:
-        InvalidRequestError: If any parameter is negative.
+        InvalidRequestError: If any parameter is negative or non-finite
+            (for example, NaN or infinity).
```
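Putting the validation and rounding together, a hedged sketch of a `compute_cost`-style helper — the signature and precision value mirror the review discussion but are assumptions, not the repository's exact code:

```python
import math

def compute_cost(
    input_tokens: int,
    output_tokens: int,
    cost_per_1k_input: float,
    cost_per_1k_output: float,
    precision: int = 10,  # assumed, mirroring BUDGET_ROUNDING_PRECISION
) -> float:
    """Validate inputs, then compute USD cost from per-1k-token rates."""
    params = {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_per_1k_input": cost_per_1k_input,
        "cost_per_1k_output": cost_per_1k_output,
    }
    for name, value in params.items():
        if value < 0 or not math.isfinite(value):
            raise ValueError(f"{name} must be finite and >= 0, got {value!r}")
    cost = (input_tokens / 1000) * cost_per_1k_input + (
        output_tokens / 1000
    ) * cost_per_1k_output
    return round(cost, precision)  # rounding strips binary float dust

print(compute_cost(333, 777, 0.003, 0.015))  # 0.012654
```

`math.isfinite` returns `False` for both NaN and ±infinity, so the single check covers all the cases the updated docstring lists.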
src/ai_company/providers/base.py
Outdated
```python
_COST_ROUNDING_PRECISION: int = 10
"""Decimal places for cost rounding to avoid floating-point dust."""
```
This introduces a new rounding-precision constant with the same value/purpose as ai_company.constants.BUDGET_ROUNDING_PRECISION. To avoid duplicated “10” precision definitions drifting over time, consider reusing a shared constant (or promoting a generic rounding constant in ai_company.constants) instead of defining _COST_ROUNDING_PRECISION locally.
```python
import pytest

from ai_company.providers.base import (
    _COST_ROUNDING_PRECISION,
```
The test imports _COST_ROUNDING_PRECISION, which is a private (underscore-prefixed) implementation detail. Either make the constant part of the public API (no leading underscore) or avoid importing it in the test by asserting on the rounded result directly (e.g., use the expected precision inline).
Suggested change:

```diff
-    _COST_ROUNDING_PRECISION,
```
- Reuse shared `BUDGET_ROUNDING_PRECISION` from `ai_company.constants` instead of defining local `_COST_ROUNDING_PRECISION` duplicate
- Update `compute_cost` docstring to reflect finite+non-negative validation
- Import public constant in tests instead of private symbol
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/providers/models.py`:
- Around line 61-64: Add a short note to the model docstrings explaining that
dict fields like parameters_schema (in the model that defines Field(...)) and
the arguments field in ToolCall are shallowly immutable under the frozen
Pydantic model—reassignment is prevented but their contents can still be
mutated—so callers should treat these dicts as read-only (or copy them before
modifying) to avoid surprising side effects; update the docstring for the class
that declares parameters_schema and the ToolCall docstring to include this
guidance.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (8)
- `src/ai_company/providers/base.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
📜 Review details
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`**/*.py`: Use PEP 649 native lazy annotations instead of `from __future__ import annotations` (Python 3.14+)
All public functions and classes must include type hints; enforce via mypy strict mode
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Create new objects instead of mutating existing ones; enforce immutability patterns
Use Pydantic v2 models (BaseModel, model_validator, ConfigDict) for data validation
Validate at system boundaries (user input, external APIs, config files); never silently swallow errors
Enforce 88-character line length (via ruff configuration)
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/base.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/protocol.py`
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`tests/**/*.py`: Mark tests with `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, or `@pytest.mark.slow`
Maintain minimum 80% code coverage, enforced in CI via pytest-cov
Set test timeout to 30 seconds per test; use `asyncio_mode = 'auto'` for async tests without manual `@pytest.mark.asyncio`
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
{src/**/*.py,tests/**/*.py}
📄 CodeRabbit inference engine (CLAUDE.md)
Run ruff check and ruff format as pre-commit hooks on all Python files; enforce no commits to main branch
Files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/test_protocol.py`
- `src/ai_company/providers/models.py`
- `src/ai_company/providers/base.py`
- `tests/unit/providers/conftest.py`
- `tests/unit/providers/test_errors.py`
- `src/ai_company/providers/errors.py`
- `src/ai_company/providers/protocol.py`
🧠 Learnings (3)
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
- `tests/unit/providers/test_models.py`
- `tests/unit/providers/conftest.py`
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/unit/providers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
Applied to files:
tests/unit/providers/conftest.py
🧬 Code graph analysis (3)
**src/ai_company/providers/models.py (1)**
- `src/ai_company/providers/enums.py` (3): `FinishReason` (15-22), `MessageRole` (6-12), `StreamEventType` (25-32)

**tests/unit/providers/test_errors.py (1)**
- `src/ai_company/providers/errors.py` (7): `ContentFilterError` (119-122), `InvalidRequestError` (113-116), `ModelNotFoundError` (107-110), `ProviderError` (22-69), `ProviderInternalError` (137-140), `ProviderTimeoutError` (125-128), `RateLimitError` (78-104)

**src/ai_company/providers/protocol.py (2)**
- `src/ai_company/providers/capabilities.py` (1): `ModelCapabilities` (10-83)
- `src/ai_company/providers/models.py` (5): `ChatMessage` (102-174), `CompletionConfig` (177-218), `CompletionResponse` (221-270), `StreamChunk` (273-351), `ToolDefinition` (45-64)
🔇 Additional comments (8)
src/ai_company/providers/protocol.py (1)
`1-80`: **LGTM!** The protocol definition is well-structured with a proper runtime-checkable decorator, clear method signatures, and comprehensive Google-style docstrings. The interface cleanly separates the engine-facing contract from concrete implementations.
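The structural-typing behavior a `@runtime_checkable` protocol enables can be shown with a toy example (the names below are invented for illustration, not the repository's `CompletionProvider`):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Greeter(Protocol):
    def greet(self, name: str) -> str: ...

class Dutch:
    # Never subclasses Greeter — conformance is purely structural.
    def greet(self, name: str) -> str:
        return f"hallo {name}"

print(isinstance(Dutch(), Greeter))   # True: the required method exists
print(isinstance(object(), Greeter))  # False: no greet method
```

Note that runtime checks only verify method *presence*, not signatures or return types — mypy strict mode covers the rest, which is why the tests pair `isinstance` checks with typed fakes.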
src/ai_company/providers/errors.py (1)
`1-140`: **LGTM!** The error hierarchy is well-designed with:
- Proper immutability via `MappingProxyType` with defensive copy
- Case-insensitive sensitive key redaction
- Appropriate `is_retryable` flags for retry logic
- Thorough validation of `retry_after` in `RateLimitError`

**src/ai_company/providers/models.py (1)**
`1-351`: **Well-structured domain models with comprehensive validation.** The models correctly implement:
- Frozen configs for immutability
- Cross-field validation via `model_validator(mode="after")`
- Exhaustive role/event-type matching with fallback error handling
- Clear separation between token usage, tools, messages, and streaming concerns
src/ai_company/providers/base.py (1)
`1-273`: **LGTM!** The ABC design is clean with:
- Clear separation between public API (with validation) and abstract hooks
- Comprehensive input validation raising `InvalidRequestError` with context
- Proper finite/non-negative checks in `compute_cost`
- Use of the shared `BUDGET_ROUNDING_PRECISION` constant for consistency

The streaming design correctly documents that `_do_stream` implementations must return an `AsyncIterator` rather than yielding directly, which matches the `await` pattern in the public `stream` method.

**tests/unit/providers/test_errors.py (1)**
`1-212`: **LGTM!** Comprehensive test coverage for the error hierarchy including:
- Base class behavior and inheritance
- `is_retryable` flags for all error types
- Redaction of sensitive keys (case-insensitive)
- Context immutability via `MappingProxyType`
- `RateLimitError` validation for negative/NaN/inf `retry_after`

**tests/unit/providers/test_models.py (1)**
`1-545`: **LGTM!** Thorough test coverage for all domain models with:
- Validation rule enforcement (cross-field constraints, bounds)
- Immutability verification via frozen model mutations
- Factory-based instance creation
- JSON round-trip serialization
- `__all__` export verification

All tests use fake model names like `"test"` as required.

**tests/unit/providers/test_protocol.py (1)**
`1-369`: **LGTM!** Excellent test coverage for the protocol and base class, including:
- Structural typing verification via `isinstance` checks
- Input validation (empty messages, blank/None model)
- Tools and config forwarding through hooks
- `compute_cost` validation (negative tokens, negative/inf/NaN rates)
- Rounding precision using the `BUDGET_ROUNDING_PRECISION` constant
- ABC instantiation prevention and partial implementation rejection
tests/unit/providers/conftest.py (1)
1-258: LGTM! Well-organized test infrastructure with:
- Comprehensive factories for all domain models
- `FakeProvider` correctly satisfying the `CompletionProvider` protocol
- Call recording for verification in protocol tests
- All fixtures using fake model names (`"test-model"`, `"test-provider"`)

Based on learnings: Tests correctly use fake model names as required.
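The structural typing these fixtures exercise can be sketched as follows. This is a trimmed stand-in, not the project's actual code: the real `CompletionProvider` protocol has more methods and typed Pydantic models, while this sketch uses one method and plain dicts.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class CompletionProvider(Protocol):
    """Trimmed stand-in for the real protocol (single method, plain types)."""

    async def complete(self, messages: list[dict], model: str) -> str: ...


class FakeProvider:
    """Test double: no inheritance needed, satisfies the protocol structurally."""

    def __init__(self) -> None:
        self.calls: list[tuple[list[dict], str]] = []  # record calls for assertions

    async def complete(self, messages: list[dict], model: str) -> str:
        self.calls.append((messages, model))
        return "fake response"


fake = FakeProvider()
# runtime_checkable makes isinstance() check method *presence*, not signatures.
print(isinstance(fake, CompletionProvider))  # True
print(asyncio.run(fake.complete([{"role": "user", "content": "hi"}], "test-model")))
```

Note the caveat in the comment: `runtime_checkable` verifies only that the methods exist, which is why signature correctness still needs the kind of forwarding tests described above.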
Pull request overview
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
```python
cost_per_1k_input: float = Field(
    ge=0.0,
    description="Cost per 1k input tokens in USD",
)
cost_per_1k_output: float = Field(
    ge=0.0,
    description="Cost per 1k output tokens in USD",
)
```
`cost_per_1k_input` / `cost_per_1k_output` currently only enforce `>= 0.0`, which allows `float('inf')` values. Since downstream cost computation explicitly requires finite rates (see `BaseCompletionProvider.compute_cost`), these fields should also validate as finite to fail fast on invalid model metadata.
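A stdlib sketch of why finiteness must be checked explicitly. The `compute_cost` below is a simplified stand-in for `BaseCompletionProvider.compute_cost`, and the precision of 6 is an assumed placeholder for the `BUDGET_ROUNDING_PRECISION` constant, whose actual value is not shown here.

```python
import math

# float("inf") >= 0.0 is True, so a ge=0.0 bound alone admits infinity.
assert float("inf") >= 0.0


def compute_cost(
    input_tokens: int,
    output_tokens: int,
    rate_in: float,
    rate_out: float,
    precision: int = 6,  # assumed placeholder for BUDGET_ROUNDING_PRECISION
) -> float:
    """Per-1k-token cost with explicit finite/non-negative checks."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    for rate in (rate_in, rate_out):
        if rate < 0.0 or not math.isfinite(rate):
            raise ValueError(f"rates must be finite and non-negative: {rate!r}")
    return round(
        input_tokens / 1000 * rate_in + output_tokens / 1000 * rate_out,
        precision,
    )


print(compute_cost(1000, 500, 0.003, 0.015))  # 0.0105
```

Validating finiteness at the model boundary (as the comment suggests) would let this downstream check become a pure assertion rather than a runtime failure path.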
```python
    and self.content is None
    and not self.tool_calls
):
    msg = f"{self.role} messages must have content or tool_calls"
    raise ValueError(msg)
```
For system/user messages, `tool_calls` are forbidden, but the validation error says they "must have content or tool_calls". That message is misleading because `tool_calls` can never satisfy the requirement for those roles; consider emitting a role-specific message (e.g., "system messages must have content").
```python
r"""Provider-specific streaming completion.

Implementations must *return* an ``AsyncIterator`` (not ``yield``
directly), since the caller ``await``\s this coroutine to obtain
```
The docstring contains the text await\s, which renders literally as "await\s" and looks like an escaped typo. Replace it with normal wording (e.g., "awaits") to avoid confusing readers and doc tooling.
Suggested change:
```diff
- directly), since the caller ``await``\s this coroutine to obtain
+ directly), since the caller awaits this coroutine to obtain
```
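The return-an-iterator convention the docstring describes can be illustrated with a minimal runnable sketch. Class and method names are simplified, and string chunks stand in for `StreamChunk`.

```python
import asyncio
from collections.abc import AsyncIterator


class Driver:
    async def _do_stream(self, model: str) -> AsyncIterator[str]:
        # Return the iterator instead of yielding here: the caller awaits
        # this coroutine to *obtain* the iterator, then iterates over it.
        async def _gen() -> AsyncIterator[str]:
            for chunk in ("Hel", "lo"):
                yield chunk

        return _gen()

    async def stream(self, model: str) -> AsyncIterator[str]:
        iterator = await self._do_stream(model)  # coroutine -> AsyncIterator
        async for chunk in iterator:
            yield chunk


async def collect() -> str:
    return "".join([c async for c in Driver().stream("test-model")])


print(asyncio.run(collect()))  # Hello
```

If `_do_stream` were itself an async generator (i.e. used `yield` directly), `await self._do_stream(...)` would fail, since async generators are iterated, not awaited; that is the distinction the docstring is guarding.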
Add the "Employment Agency" — a swappable driver system with LiteLLM as the default backend. Implements the concrete provider layer behind the contracts designed in #86.

- LiteLLMDriver wrapping litellm.acompletion for streaming and non-streaming completions, model capability queries, and full exception mapping (10 LiteLLM exception types → ProviderError hierarchy)
- ProviderRegistry (immutable) mapping provider names to driver instances, built from config via from_config() with factory_overrides for testing
- Pure mapping functions (messages, tools, finish reasons, tool calls)
- 3 new driver error classes (DriverNotRegistered, DriverAlreadyRegistered, DriverFactoryNotFound)
- driver field on ProviderConfig (defaults to "litellm")
- Upgraded litellm 1.67.2 → 1.82.0 (fixes Python 3.14 compat, removes need for PYTHONUTF8 env var and deprecation warning filters)
- Removed unused F403 ruff ignore for __init__.py
- 235 new unit tests, all 1273 tests pass at 94% coverage
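The exception-mapping idea can be sketched as follows. The exception classes here are local fakes standing in for LiteLLM's real exception types, and the two-entry map is illustrative; the real driver maps ten exception types.

```python
class ProviderError(Exception):
    """Base of the typed error hierarchy; callers branch on is_retryable."""

    is_retryable = False


class RateLimitError(ProviderError):
    is_retryable = True


class AuthenticationError(ProviderError):
    is_retryable = False


# Local fakes standing in for litellm's exception types.
class FakeLiteLLMRateLimitError(Exception): ...
class FakeLiteLLMAuthenticationError(Exception): ...


_EXCEPTION_MAP: dict[type[Exception], type[ProviderError]] = {
    FakeLiteLLMRateLimitError: RateLimitError,
    FakeLiteLLMAuthenticationError: AuthenticationError,
}


def map_exception(exc: Exception) -> ProviderError:
    """Translate a backend exception into the typed hierarchy (default: base)."""
    mapped_type = _EXCEPTION_MAP.get(type(exc), ProviderError)
    return mapped_type(str(exc))


err = map_exception(FakeLiteLLMRateLimitError("429 Too Many Requests"))
print(type(err).__name__, err.is_retryable)  # RateLimitError True
```

Keeping the mapping as a flat dict makes the fallback explicit: anything unrecognized becomes the non-retryable base `ProviderError` rather than leaking a backend exception to callers.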
## Summary
- Implements the **Employment Agency** — a swappable driver system with LiteLLM as the default backend behind the contracts from #86
- `LiteLLMDriver(BaseCompletionProvider)` wrapping `litellm.acompletion` for streaming/non-streaming completions, model capabilities, and full exception mapping (10 LiteLLM exception types → `ProviderError` hierarchy)
- `ProviderRegistry` (immutable, `MappingProxyType`) mapping provider names → driver instances, built from config via `from_config()` with `factory_overrides` for testing/native SDK swaps
- Pure mapping functions for messages, tools, finish reasons, and tool call extraction
- Upgraded litellm `1.67.2` → `1.82.0`, removing all Python 3.14 compatibility workarounds (warning filters, `PYTHONUTF8` env var, stale `type: ignore`)
- Removed unused `F403` ruff ignore for `__init__.py`

## Architecture
```
Engine ──> ProviderRegistry ──────> LiteLLMDriver (anthropic)
           ("Employment Agency")    LiteLLMDriver (openrouter)
                                    LiteLLMDriver (ollama)
                                    # Future: NativeAnthropicDriver, etc.
```

## New files
| File | Lines | Purpose |
|------|-------|---------|
| `providers/drivers/litellm_driver.py` | ~530 | LiteLLM-backed completion driver |
| `providers/drivers/mappers.py` | ~170 | Pure message/tool/reason mapping functions |
| `providers/registry.py` | ~185 | Immutable provider name → driver registry |
| `providers/drivers/__init__.py` | ~20 | Sub-package exports |

## Modified files
| File | Change |
|------|--------|
| `config/schema.py` | Added `driver` field to `ProviderConfig` (default `"litellm"`) |
| `providers/errors.py` | Added 3 driver error classes |
| `providers/__init__.py` | Added new exports |
| `pyproject.toml` | litellm `1.82.0`, removed warning filters and `F403` ignore |

## Test plan
- [x] 235 new unit tests covering driver, mappers, registry, and exception mapping
- [x] All 1273 tests pass (`uv run pytest tests/ -n auto`)
- [x] 94% coverage (`--cov-fail-under=80`)
- [x] Lint clean (`uv run ruff check src/ tests/`)
- [x] Format clean (`uv run ruff format --check src/ tests/`)
- [x] Type-check clean (`uv run mypy src/` — 0 errors in 64 files)
- [x] All pre-commit hooks pass

Closes #4
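The immutable-registry idea can be sketched with `MappingProxyType`. This is a simplified sketch: drivers are placeholder strings, and the error message is illustrative rather than the real `DriverNotRegistered` behavior.

```python
from types import MappingProxyType


class ProviderRegistry:
    """Sketch: immutable mapping of provider name -> driver instance."""

    def __init__(self, drivers: dict[str, object]) -> None:
        # Copy, then wrap: later mutation of the caller's dict has no effect,
        # and the proxy itself rejects item assignment.
        self._drivers = MappingProxyType(dict(drivers))

    def get(self, name: str) -> object:
        try:
            return self._drivers[name]
        except KeyError:
            raise KeyError(f"no driver registered for provider {name!r}") from None


registry = ProviderRegistry({"anthropic": "litellm-driver-placeholder"})
print(registry.get("anthropic"))  # litellm-driver-placeholder

try:
    registry._drivers["ollama"] = "x"  # type: ignore[index]
except TypeError as exc:
    print("immutable:", exc)
```

`MappingProxyType` gives read-only *views*, so copying the input dict first matters: without the copy, the caller could still mutate the registry through the original dict.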
## Summary
- Adds **39 integration tests** for the provider adapter layer, completing the final unchecked acceptance criterion from #5: _"Integration tests with mock/recorded API responses"_
- All source code was already implemented in PRs #86 and #88 — this PR covers only the integration test suite
- Mocks at `litellm.acompletion` level using **real `litellm.ModelResponse`** objects (not MagicMock), exercising actual attribute access paths through `_map_response`, `_process_chunk`, and `extract_tool_calls`

### Test files
| File | Tests | Coverage |
|------|-------|----------|
| `test_anthropic_pipeline.py` | 13 | Config→registry→complete/stream, alias resolution, cost computation, streaming |
| `test_openrouter_pipeline.py` | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| `test_ollama_pipeline.py` | 4 | No api_key, localhost base_url, zero-cost models |
| `test_error_scenarios.py` | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| `test_tool_calling_pipeline.py` | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| `conftest.py` | — | Config factories, real ModelResponse builders, stream helpers |

### Verification
- `ruff check` — all passed
- `ruff format` — all formatted
- `mypy` — 0 errors (7 files)
- `pytest` — 1331 total tests pass, **94.49% coverage** (80% required)

Closes #5

## Test plan
- [ ] CI passes (lint + type-check + test + coverage)
- [ ] 39 integration tests pass under `pytest -m integration`
- [ ] No regressions in existing 1292 unit tests
- [ ] Coverage remains above 80% threshold
🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

--------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Closes #3
Summary
- `CompletionProvider` Protocol for loose coupling + `BaseCompletionProvider` ABC for shared validation and cost computation
- Full model hierarchy: `ChatMessage`, `ToolDefinition`, `ToolCall`, `ToolResult`, `CompletionConfig`, `CompletionResponse`, `StreamChunk`, `TokenUsage`
- Error hierarchy with `is_retryable` flags (8 typed subclasses of `ProviderError`)
- `ModelCapabilities` for routing decisions (context window, tool support, vision, streaming, costs)

Design Decisions
- `parameters_schema: dict[str, Any]` — no custom ToolParameter model; every provider consumes JSON Schema natively
- Stream chunks carry a `StreamEventType` discriminator
- Errors only expose `is_retryable`; retry logic (issue #9, "Implement retry logic, rate limiting, and provider error handling") will use this flag

Files
Source (7):
`providers/enums.py`, `errors.py`, `models.py`, `capabilities.py`, `protocol.py`, `base.py`, `__init__.py`

Tests (5+2): `test_enums.py`, `test_errors.py`, `test_models.py`, `test_capabilities.py`, `test_protocol.py` + `conftest.py`, `__init__.py`

Test plan
- `uv run ruff check` — passes
- `uv run ruff format --check` — passes
- `uv run mypy src/ai_company/providers/` — passes (strict)
- 118 unit tests pass (`pytest tests/unit/providers/ -v -m unit`)

🤖 Generated with Claude Code