
feat: implement LiteLLM driver and provider registry#88

Merged
Aureliolo merged 3 commits into main from feat/litellm-driver
Mar 1, 2026

Conversation

@Aureliolo
Owner

Summary

  • Implements the Employment Agency, a swappable driver system with LiteLLM as the default backend, behind the contracts from #86 (feat: design unified provider interface)
  • LiteLLMDriver(BaseCompletionProvider) wrapping litellm.acompletion for streaming/non-streaming completions, model capabilities, and full exception mapping (10 LiteLLM exception types → ProviderError hierarchy)
  • ProviderRegistry (immutable, MappingProxyType) mapping provider names → driver instances, built from config via from_config() with factory_overrides for testing/native SDK swaps
  • Pure mapping functions for messages, tools, finish reasons, and tool call extraction
  • Upgraded litellm 1.67.2 → 1.82.0, removing all Python 3.14 compatibility workarounds (warning filters, PYTHONUTF8 env var, stale type: ignore)
  • Removed unused F403 ruff ignore for __init__.py

Architecture

Engine  ──>  ProviderRegistry  ──>  LiteLLMDriver (anthropic)
             ("Employment Agency")  LiteLLMDriver (openrouter)
                                    LiteLLMDriver (ollama)
                                    # Future: NativeAnthropicDriver, etc.

New files

  • providers/drivers/litellm_driver.py (~530 lines): LiteLLM-backed completion driver
  • providers/drivers/mappers.py (~170 lines): Pure message/tool/reason mapping functions
  • providers/registry.py (~185 lines): Immutable provider name → driver registry
  • providers/drivers/__init__.py (~20 lines): Sub-package exports

Modified files

  • config/schema.py: Added driver field to ProviderConfig (default "litellm")
  • providers/errors.py: Added 3 driver error classes
  • providers/__init__.py: Added new exports
  • pyproject.toml: litellm → 1.82.0, removed warning filters and F403 ignore

Test plan

  • 235 new unit tests covering driver, mappers, registry, and exception mapping
  • All 1273 tests pass (uv run pytest tests/ -n auto)
  • 94% coverage (--cov-fail-under=80)
  • Lint clean (uv run ruff check src/ tests/)
  • Format clean (uv run ruff format --check src/ tests/)
  • Type-check clean (uv run mypy src/ — 0 errors in 64 files)
  • All pre-commit hooks pass

Closes #4

Add the "Employment Agency" — a swappable driver system with LiteLLM as
the default backend. Implements the concrete provider layer behind the
contracts designed in #86.

- LiteLLMDriver wrapping litellm.acompletion for streaming and
  non-streaming completions, model capability queries, and full
  exception mapping (10 LiteLLM exception types → ProviderError hierarchy)
- ProviderRegistry (immutable) mapping provider names to driver instances,
  built from config via from_config() with factory_overrides for testing
- Pure mapping functions (messages, tools, finish reasons, tool calls)
- 3 new driver error classes (DriverNotRegistered, DriverAlreadyRegistered,
  DriverFactoryNotFound)
- driver field on ProviderConfig (defaults to "litellm")
- Upgraded litellm 1.67.2 → 1.82.0 (fixes Python 3.14 compat, removes
  need for PYTHONUTF8 env var and deprecation warning filters)
- Removed unused F403 ruff ignore for __init__.py
- 235 new unit tests, all 1273 tests pass at 94% coverage
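
The exception-mapping bullet above follows a common translation pattern: a lookup table from backend exception types to the domain hierarchy, with a fallback for anything unrecognized. A minimal sketch under assumed names (the Backend* classes below stand in for litellm exceptions such as RateLimitError; the Provider* names are illustrative, not the PR's actual classes):

```python
# Domain error hierarchy (illustrative names).
class ProviderError(Exception): ...
class ProviderRateLimitError(ProviderError): ...
class ProviderAuthError(ProviderError): ...

# Stand-ins for backend exceptions such as litellm.RateLimitError.
class BackendRateLimitError(Exception): ...
class BackendAuthError(Exception): ...

# One table drives the mapping: backend exception type -> domain error type.
_EXCEPTION_MAP: dict[type[Exception], type[ProviderError]] = {
    BackendRateLimitError: ProviderRateLimitError,
    BackendAuthError: ProviderAuthError,
}

def map_exception(exc: Exception) -> ProviderError:
    """Translate a backend exception into the domain hierarchy."""
    for source, target in _EXCEPTION_MAP.items():
        if isinstance(exc, source):
            return target(str(exc))
    # Unknown exceptions still surface as the base ProviderError
    # rather than leaking raw backend types to callers.
    return ProviderError(str(exc))
```

The fallback branch is what keeps callers from ever having to catch litellm-specific types directly.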
Copilot AI review requested due to automatic review settings March 1, 2026 11:15
@coderabbitai

coderabbitai bot commented Mar 1, 2026

Warning

Rate limit exceeded

@Aureliolo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 53 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.


📥 Commits

Reviewing files that changed from the base of the PR and between 63f0774 and c8c7300.

📒 Files selected for processing (5)
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/registry.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/test_registry.py
📝 Walkthrough

Adds a LiteLLM-backed provider layer: new LiteLLMDriver, mapping utilities, driver registry and errors, ProviderConfig.driver field, package export updates, pyproject/mypy adjustments, and comprehensive unit tests for drivers, mappers, and the registry.

Changes

  • Project config (pyproject.toml): Added the litellm dependency, adjusted mypy overrides (litellm.*, tests.*), and wrapped filterwarnings as a list.
  • Schema (src/ai_company/config/schema.py): Added driver: NotBlankStr field to ProviderConfig (default "litellm").
  • Provider package exports (src/ai_company/providers/__init__.py): Exported LiteLLMDriver, ProviderRegistry, and the driver error classes (DriverAlreadyRegisteredError, DriverFactoryNotFoundError, DriverNotRegisteredError).
  • Drivers package init (src/ai_company/providers/drivers/__init__.py): New drivers package initializer exporting LiteLLMDriver.
  • LiteLLM driver (src/ai_company/providers/drivers/litellm_driver.py): New LiteLLMDriver: model resolution, request construction, streaming support (with tool-call accumulation), error mapping, response mapping, cost computation, and capability discovery.
  • Driver mappers (src/ai_company/providers/drivers/mappers.py): New mapping utilities: messages_to_dicts, tools_to_dicts, extract_tool_calls, map_finish_reason, and helpers for converting domain models to OpenAI-style dicts.
  • Provider errors (src/ai_company/providers/errors.py): Added DriverNotRegisteredError, DriverAlreadyRegisteredError, DriverFactoryNotFoundError (subclassing ProviderError).
  • Provider registry (src/ai_company/providers/registry.py): New ProviderRegistry with from_config factory construction, driver instantiation/validation, an immutable mapping, and lookup APIs.
  • Driver test fixtures (tests/unit/providers/drivers/conftest.py): New test fixtures and helpers for provider configs, mock LiteLLM responses, stream chunks, and tool-call deltas.
  • LiteLLMDriver tests (tests/unit/providers/drivers/test_litellm_driver.py): Comprehensive unit tests for non-streaming/streaming flows, exception mapping, model capabilities, tool handling, and cost/provenance assertions.
  • Mapper tests (tests/unit/providers/drivers/test_mappers.py): Unit tests for message/tool conversion, finish-reason mapping, and tool-call extraction.
  • Registry tests (tests/unit/providers/test_registry.py): Tests for registry retrieval, listing, membership, factory overrides, error cases, and immutability.

Sequence Diagram

sequenceDiagram
    actor Client
    participant ProviderRegistry
    participant LiteLLMDriver
    participant Mappers
    participant litellm

    Client->>ProviderRegistry: get(provider_name)
    ProviderRegistry-->>Client: LiteLLMDriver instance

    Client->>LiteLLMDriver: complete(messages, tools, config)
    LiteLLMDriver->>LiteLLMDriver: resolve model & apply config
    LiteLLMDriver->>Mappers: messages_to_dicts(messages)
    Mappers-->>LiteLLMDriver: message dicts
    LiteLLMDriver->>Mappers: tools_to_dicts(tools)
    Mappers-->>LiteLLMDriver: tool dicts
    LiteLLMDriver->>litellm: acompletion(model, messages, tools, ...)
    litellm-->>LiteLLMDriver: ModelResponse / stream
    LiteLLMDriver->>Mappers: extract_tool_calls(raw_tool_calls)
    Mappers-->>LiteLLMDriver: ToolCall tuples
    LiteLLMDriver-->>Client: CompletionResponse / stream chunks
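
The Mappers participant in the diagram above is described as pure functions from domain models to OpenAI-style dicts. A minimal sketch of that shape, with an illustrative message type (the field names are assumptions, not the PR's actual Pydantic models):

```python
from dataclasses import dataclass

# Illustrative domain message; the real PR uses its own models.
@dataclass(frozen=True)
class Message:
    role: str       # e.g. "system", "user", "assistant", "tool"
    content: str

def messages_to_dicts(messages: list[Message]) -> list[dict[str, str]]:
    """Pure mapper: domain messages -> OpenAI-style chat dicts."""
    # Builds new dicts rather than mutating inputs, matching the
    # "pure mapping functions" framing in the PR description.
    return [{"role": m.role, "content": m.content} for m in messages]
```

Keeping the mappers free of I/O is what lets the PR unit-test them in isolation and reuse them for future native SDK drivers.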

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 42.97%, which is below the required 100.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The PR title 'feat: implement LiteLLM driver and provider registry' directly summarizes the main changes: adding a LiteLLM-backed driver and a provider registry system.
  • Description check: ✅ Passed. The PR description comprehensively relates to the changeset, detailing the implementation of the Employment Agency pattern with the LiteLLM driver, registry, mapping functions, dependencies, and test coverage.
  • Linked Issues check: ✅ Passed. The PR implements core requirements from #4: a LiteLLM integration prototype with Anthropic/OpenRouter/Ollama support, cost tracking, exception mapping, tool calling, streaming, and error propagation, demonstrated through 235+ tests.
  • Out of Scope Changes check: ✅ Passed. All changes are in scope: LiteLLMDriver and ProviderRegistry (core components), mappers (required for message/tool conversion), error classes (supporting the exception-mapping requirement), config schema update (driver field), and the litellm upgrade with cleanup.


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational architectural shift by implementing a pluggable driver system for LLM providers. It centralizes the management and interaction with various LLM backends through a ProviderRegistry and provides a robust LiteLLMDriver as the initial default. This change significantly improves the system's flexibility, maintainability, and error handling capabilities, paving the way for easier integration of diverse AI models and services.

Highlights

  • Modular Driver System: Implemented a swappable driver system, referred to as the 'Employment Agency', with LiteLLM serving as the default backend for LLM providers. This architecture allows for easy integration of different LLM backends.
  • LiteLLM Driver Implementation: Introduced the LiteLLMDriver which wraps litellm.acompletion to handle both streaming and non-streaming completions, manage model capabilities, and provide comprehensive exception mapping for 10 LiteLLM exception types to a custom ProviderError hierarchy.
  • Provider Registry: Created a ProviderRegistry (an immutable MappingProxyType) that maps provider names to their respective driver instances. This registry is built from configuration via from_config(), supporting factory_overrides for testing or native SDK integrations.
  • Mapping Functions: Added pure mapping functions to convert between internal domain models and OpenAI-compatible dict formats for messages, tools, finish reasons, and tool call extraction, ensuring reusability for future native SDK drivers.
  • Dependency and Tooling Updates: Upgraded the litellm dependency from 1.67.2 to 1.82.0, which allowed for the removal of Python 3.14 compatibility workarounds (warning filters, PYTHONUTF8 env var, stale type: ignore). Also removed an unused F403 ruff ignore for __init__.py files.
  • Configuration Schema Enhancement: Modified the ProviderConfig schema to include a new driver field, defaulting to "litellm", enabling configuration-driven selection of provider backends.
  • New Error Classes: Added three new driver-specific error classes: DriverNotRegisteredError, DriverAlreadyRegisteredError, and DriverFactoryNotFoundError to enhance error handling within the provider layer.
Changelog
  • pyproject.toml
    • Added litellm==1.82.0 to project dependencies.
    • Removed F403 from the ruff ignore list for __init__.py files.
    • Configured mypy to ignore missing imports for litellm.* modules.
    • Updated filterwarnings to use a multi-line list format.
  • src/ai_company/config/schema.py
    • Added a new driver field to the ProviderConfig class, defaulting to "litellm".
  • src/ai_company/providers/__init__.py
    • Updated the module docstring to reflect new exports including drivers and the registry.
    • Imported LiteLLMDriver and ProviderRegistry.
    • Imported new driver-related error classes: DriverAlreadyRegisteredError, DriverFactoryNotFoundError, and DriverNotRegisteredError.
    • Added newly imported classes and modules to the __all__ export list.
  • src/ai_company/providers/drivers/__init__.py
    • Added a new file to define the drivers sub-package and export LiteLLMDriver.
  • src/ai_company/providers/drivers/litellm_driver.py
    • Added a new file implementing the LiteLLMDriver class, which integrates with LiteLLM for LLM completions and streaming.
  • src/ai_company/providers/drivers/mappers.py
    • Added a new file containing utility functions for mapping between internal data models and LiteLLM's OpenAI-compatible API formats.
  • src/ai_company/providers/errors.py
    • Added DriverNotRegisteredError for unregistered drivers.
    • Added DriverAlreadyRegisteredError for duplicate driver registrations.
    • Added DriverFactoryNotFoundError for missing driver factories.
  • src/ai_company/providers/registry.py
    • Added a new file implementing the ProviderRegistry class for managing and retrieving LLM provider drivers.
  • tests/unit/providers/drivers/conftest.py
    • Added a new file providing shared fixtures and mock factories for unit tests of provider drivers.
  • tests/unit/providers/drivers/test_litellm_driver.py
    • Added a new file containing unit tests for the LiteLLMDriver functionality.
  • tests/unit/providers/drivers/test_mappers.py
    • Added a new file containing unit tests for the mapping functions in mappers.py.
  • tests/unit/providers/test_registry.py
    • Added a new file containing unit tests for the ProviderRegistry.
Activity
  • 235 new unit tests were added, specifically covering the driver, mappers, registry, and exception mapping logic.
  • All 1273 existing tests passed successfully.
  • Code coverage was maintained at 94%, exceeding the 80% minimum threshold.
  • Linting checks passed without issues.
  • Formatting checks passed without issues.
  • Type-checking completed with 0 errors across 64 files.
  • All pre-commit hooks passed.


Copilot AI left a comment


Pull request overview

This PR introduces a swappable provider “driver” layer (defaulting to LiteLLM) plus an immutable ProviderRegistry that builds provider instances from config, aligning with the unified provider contracts introduced in #86.

Changes:

  • Added LiteLLMDriver implementing BaseCompletionProvider with streaming/non-streaming completions, model capability lookup, and provider error mapping.
  • Added ProviderRegistry to construct and expose provider-name → driver mappings from ProviderConfig.driver.
  • Added pure mapping utilities (messages_to_dicts, tools_to_dicts, finish-reason/tool-call extraction) and comprehensive unit tests; bumped litellm to 1.82.0.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
  • pyproject.toml: Adds litellm==1.82.0, a mypy override for litellm.*, and adjusts the ruff per-file ignore for __init__.py.
  • uv.lock: Locks litellm and its transitive dependencies.
  • src/ai_company/config/schema.py: Adds the ProviderConfig.driver field (default "litellm").
  • src/ai_company/providers/errors.py: Introduces registry/driver-related error types.
  • src/ai_company/providers/registry.py: Implements the immutable ProviderRegistry plus factory-based construction from config.
  • src/ai_company/providers/drivers/mappers.py: Adds message/tool/finish-reason/tool-call mapping helpers.
  • src/ai_company/providers/drivers/litellm_driver.py: Implements the LiteLLM-backed driver with streaming support and exception mapping.
  • src/ai_company/providers/drivers/__init__.py: Exposes driver(s) from the subpackage.
  • src/ai_company/providers/__init__.py: Exports the registry, driver, and new error types from the top-level providers package.
  • tests/unit/providers/test_registry.py: Unit tests for ProviderRegistry.
  • tests/unit/providers/drivers/test_mappers.py: Unit tests for mapping helpers.
  • tests/unit/providers/drivers/test_litellm_driver.py: Unit tests for LiteLLMDriver (mocked LiteLLM calls).
  • tests/unit/providers/drivers/conftest.py: Shared mock factories/fixtures for driver tests.


if isinstance(raw, str):
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError, ValueError:

Copilot AI Mar 1, 2026


Invalid exception syntax: except json.JSONDecodeError, ValueError: is Python 2 syntax and raises a SyntaxError on import under Python 3. Use except (json.JSONDecodeError, ValueError): instead.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):

return None
try:
    return float(raw)
except ValueError, TypeError:

Copilot AI Mar 1, 2026


Invalid exception syntax: except ValueError, TypeError: is Python 2 syntax and will raise a SyntaxError under Python 3. Use except (ValueError, TypeError): instead.

Suggested change
except ValueError, TypeError:
except (ValueError, TypeError):

return None
try:
    parsed = json.loads(self.arguments) if self.arguments else {}
except json.JSONDecodeError, ValueError:

Copilot AI Mar 1, 2026


Invalid exception syntax: except json.JSONDecodeError, ValueError: is Python 2 syntax and will raise a SyntaxError under Python 3. Use except (json.JSONDecodeError, ValueError): (or just json.JSONDecodeError) instead.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed and extensible driver system for LLM providers, with LiteLLM as the default implementation. The architecture is clean, with good separation of concerns between the driver, mappers, and registry. The code is extensively tested and the exception handling is robust. I've found a few critical syntax issues related to exception handling that appear to be from Python 2, which will cause errors in the target Python 3.14 environment. Once these are addressed, this will be an excellent addition to the codebase.

return None
try:
    return float(raw)
except ValueError, TypeError:
Contributor


critical

This except syntax is from Python 2 and will raise a SyntaxError in Python 3. To catch multiple exception types, they must be enclosed in a tuple.

        except (ValueError, TypeError):

return None
try:
    parsed = json.loads(self.arguments) if self.arguments else {}
except json.JSONDecodeError, ValueError:
Contributor


critical

This except syntax is from Python 2 and will raise a SyntaxError in Python 3. To catch multiple exception types, they must be enclosed in a tuple.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):

if isinstance(raw, str):
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError, ValueError:
Contributor


critical

This except syntax is from Python 2 and will raise a SyntaxError in Python 3. To catch multiple exception types, they must be enclosed in a tuple.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/providers/drivers/litellm_driver.py`:
- Around line 205-208: The loop that builds _model_lookup (the for m in models
block creating lookup[m.id] and lookup[m.alias]) can silently overwrite entries
when an alias equals another model's id or alias; update the builder to detect
collisions: when about to assign lookup[key] check if key already exists and if
so raise or log a clear validation error referencing the conflicting model
ids/aliases (include both existing and new m.id/m.alias) and skip or abort
loading as appropriate per policy; ensure checks cover both m.id and m.alias and
run before any assignment so _model_lookup cannot be silently remapped.
- Around line 360-363: The streaming branch currently drops usage events when
usage_obj.prompt_tokens is zero; change the condition in the streaming path
(where usage_obj is retrieved from chunk and _make_usage_chunk is called) to
emit usage whenever usage_obj is not None (i.e., check usage_obj is not None)
rather than requiring prompt_tokens to be truthy, so
result.append(self._make_usage_chunk(usage_obj, model_config)) runs for valid
usage objects even if prompt_tokens == 0.
- Around line 276-277: The code converts usage_obj prompt/completion token
attributes with int(getattr(...)) which throws TypeError if the attribute exists
but is None; change the conversions to coerce None to 0 (e.g., input_tok =
int(getattr(usage_obj, "prompt_tokens", 0) or 0) and output_tok =
int(getattr(usage_obj, "completion_tokens", 0) or 0)) and make the same change
for the other occurrence referenced (the conversions at the second location
around lines with input_tok/output_tok in the later block) so None values are
safely treated as 0 before int().
- Around line 421-425: The code currently does a case-sensitive lookup raw =
headers.get("retry-after") which breaks HTTP semantics; update the lookup in
litellm_driver.py (around the getattr(exc, "headers", None) handling) to perform
a case-insensitive search (for example, by normalizing keys or iterating
headers.items() and matching k.lower() == "retry-after") and assign the found
value to raw; keep the existing isinstance(headers, dict) guard and ensure the
new lookup still returns None when no Retry-After header is present.
- Around line 265-266: In _map_response, avoid direct indexing of
response.choices[0]; instead retrieve choices via getattr(response, "choices",
[]) and check for emptiness—if empty, raise ProviderInternalError with a clear
message; otherwise use the first choice (e.g., choice = choices[0]) and continue
mapping as before. Ensure the change is made inside the _map_response method and
mirror the defensive pattern used in _process_chunk.
- Around line 186-189: The supports_streaming flag is hard-coded True; change it
to read the model info like the other capabilities (e.g., supports_streaming =
bool(info.get("supports_streaming", False))) and set
supports_streaming_tool_calls to the logical AND of streaming and
function-calling (e.g., bool(info.get("supports_function_calling", False)) and
supports_streaming) so non-streaming models aren't routed to streaming
endpoints; update the assignment locations where supports_streaming and
supports_streaming_tool_calls are set (the dict building that currently contains
supports_streaming=True and supports_streaming_tool_calls=...) to use these
extracted values.

In `@src/ai_company/providers/drivers/mappers.py`:
- Around line 73-79: The mapper currently returns payloads containing mutable
dicts by reference (e.g., using tool.parameters_schema in the function payload
and other dicts around lines 153-161); fix by returning defensive copies of any
mutable objects before including them in the returned dicts (use shallow or deep
copy as appropriate for nested structures) so callers cannot mutate the original
tool or schema objects — update the mapper return that constructs the "function"
payload (references: tool.name, tool.description, tool.parameters_schema) and
the other dict-producing mapper(s) around lines 153-161 to clone their dict/list
values before returning.
- Around line 83-103: The _FINISH_REASON_MAP in map_finish_reason is missing
Anthropic-specific keys so Anthropic finish reasons like "end_turn",
"stop_sequence", and "tool_use" currently fall back to FinishReason.ERROR;
update _FINISH_REASON_MAP to include "end_turn" -> FinishReason.STOP,
"stop_sequence" -> FinishReason.STOP, and "tool_use" -> FinishReason.TOOL_USE
(optionally normalize incoming reason with .lower() inside map_finish_reason
before lookup) so these provider-native values map correctly instead of
defaulting to ERROR.

In `@src/ai_company/providers/registry.py`:
- Around line 84-86: The __contains__ method currently does "name in
self._drivers" which raises TypeError for unhashable inputs; update the
Registry.__contains__ implementation to handle unhashable objects by performing
the membership test inside a try/except TypeError block (or using a safe lookup)
and return False when a TypeError occurs so unhashable probes (e.g., lists) do
not propagate exceptions; reference the __contains__ method and the
self._drivers attribute when making the change.
- Around line 174-175: The call to factory(name, config) can raise raw
exceptions which bypass the registry's driver error type; wrap the invocation of
factory inside a try/except in the registry code where driver = factory(name,
config) is executed, catch any Exception, and re-raise the registry's driver
error type (including contextual information: provider name and config) while
preserving the original exception as the cause; then proceed to the isinstance
check for BaseCompletionProvider as before.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e23d64 and 49710c6.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • pyproject.toml
  • src/ai_company/config/schema.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/drivers/__init__.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/registry.py
  • tests/unit/providers/drivers/__init__.py
  • tests/unit/providers/drivers/conftest.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/drivers/test_mappers.py
  • tests/unit/providers/test_registry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones — enforce immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/drivers/test_mappers.py
  • src/ai_company/providers/registry.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/drivers/__init__.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • tests/unit/providers/test_registry.py
  • src/ai_company/providers/drivers/mappers.py
  • tests/unit/providers/drivers/conftest.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' in pytest — no manual @pytest.mark.asyncio needed
Set 30-second timeout per test

Files:

  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/drivers/test_mappers.py
  • tests/unit/providers/test_registry.py
  • tests/unit/providers/drivers/conftest.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain 80% minimum code coverage, enforced in CI

Files:

  • src/ai_company/providers/registry.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/drivers/__init__.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
pyproject.toml

📄 CodeRabbit inference engine (CLAUDE.md)

Pin all dependency versions using == in pyproject.toml

Files:

  • pyproject.toml
🧠 Learnings (8)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Mock Ollama API calls in tests to avoid requiring a running Ollama instance

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Ollama API response mocks must support both dict pattern (`models.get('models')`) and object pattern (`response.models`) to match actual API behavior.

Applied to files:

  • tests/unit/providers/drivers/conftest.py
🧬 Code graph analysis (5)
tests/unit/providers/drivers/test_mappers.py (3)
src/ai_company/providers/drivers/mappers.py (4)
  • extract_tool_calls (106-134)
  • map_finish_reason (93-103)
  • messages_to_dicts (15-25)
  • tools_to_dicts (58-68)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • MessageRole (6-12)
src/ai_company/providers/models.py (4)
  • ChatMessage (114-186)
  • ToolCall (73-95)
  • ToolDefinition (45-70)
  • ToolResult (98-111)
src/ai_company/providers/registry.py (3)
src/ai_company/providers/base.py (1)
  • BaseCompletionProvider (26-273)
src/ai_company/providers/errors.py (2)
  • DriverFactoryNotFoundError (155-158)
  • DriverNotRegisteredError (143-146)
src/ai_company/config/schema.py (1)
  • ProviderConfig (52-95)
src/ai_company/providers/drivers/__init__.py (1)
src/ai_company/providers/drivers/litellm_driver.py (1)
  • LiteLLMDriver (86-447)
src/ai_company/providers/drivers/litellm_driver.py (6)
src/ai_company/providers/errors.py (4)
  • AuthenticationError (72-75)
  • RateLimitError (78-104)
  • ProviderError (22-69)
  • InvalidRequestError (113-116)
src/ai_company/providers/base.py (2)
  • BaseCompletionProvider (26-273)
  • compute_cost (182-238)
src/ai_company/providers/capabilities.py (1)
  • ModelCapabilities (10-83)
src/ai_company/providers/enums.py (1)
  • StreamEventType (25-32)
src/ai_company/providers/models.py (5)
  • CompletionResponse (233-282)
  • StreamChunk (285-363)
  • ToolCall (73-95)
  • ChatMessage (114-186)
  • ToolDefinition (45-70)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
tests/unit/providers/drivers/conftest.py (1)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
🔇 Additional comments (14)
pyproject.toml (4)

125-125: LGTM!

Removing the F403 (star imports) ignore tightens lint rules appropriately. Retaining F401 for __init__.py is correct since these files commonly re-export symbols.


149-152: LGTM!

The MyPy override to ignore missing imports for litellm.* is appropriate since the library lacks complete type stubs.


195-197: LGTM!

The list format for filterwarnings is valid and more extensible. Treating warnings as errors during tests ensures deprecation issues are caught early.


17-17: No changes needed—all dependencies are correctly pinned.

The litellm==1.82.0 dependency is properly pinned with == and version 1.82.0 exists on PyPI. All other dependencies in pyproject.toml (lines 15-21 and 34-52) are also pinned with == per coding guidelines.

src/ai_company/config/schema.py (1)

64-67: Good default driver wiring.

This adds a typed, immutable config selector with a safe default and keeps existing configs working.

src/ai_company/providers/drivers/__init__.py (1)

7-9: Clean public export surface.

Explicitly exporting LiteLLMDriver here keeps driver imports stable and discoverable.

src/ai_company/providers/errors.py (1)

143-158: Error taxonomy extension looks solid.

The new driver-registry errors are specific and keep retry semantics explicit.

tests/unit/providers/test_registry.py (1)

130-202: Nice coverage for construction and immutability paths.

The from_config and source-dict mutation cases are especially valuable for guarding registry behavior.

tests/unit/providers/drivers/test_mappers.py (1)

22-278: Mapper tests are thorough and well-structured.

Good balance of nominal and edge-case coverage, especially for tool-call parsing variants.

src/ai_company/providers/__init__.py (1)

9-59: Public API exports are coherent with the new driver architecture.

LiteLLMDriver, ProviderRegistry, and driver-related errors are surfaced cleanly.

tests/unit/providers/drivers/test_litellm_driver.py (2)

338-440: Exception-path coverage is excellent.

The mapped-exception and stream-iteration failure tests are strong and directly exercise resilience behavior.


80-505: No action needed — 30-second timeout is already configured globally.

The pyproject.toml file already sets timeout = 30 in [tool.pytest.ini_options] (line 187), and pytest-timeout is installed as a test dependency. This global configuration automatically applies to all tests in the repository, including the async tests in this file, preventing hung streams from stalling CI.

src/ai_company/providers/drivers/mappers.py (1)

158-159: PEP 758 syntax is correctly applied for Python 3.14+.

Line 158 uses the PEP 758 except A, B: syntax, which is valid since the project requires Python 3.14+ (pinned in pyproject.toml), ruff targets py314, and CI runs on Python 3.14.
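As a minimal illustration of the PEP 758 point (the function below is hypothetical, not code from this PR), the two spellings catch exactly the same exceptions; the parenthesized form works on any Python 3, while the bare form requires 3.14+:

```python
def parse_port(raw):
    """Return ``raw`` as an int port, or None when it cannot be parsed."""
    try:
        return int(raw)
    # Portable spelling; on Python 3.14+ PEP 758 also permits the
    # unparenthesized form: `except ValueError, TypeError:`
    except (ValueError, TypeError):
        return None
```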

tests/unit/providers/drivers/conftest.py (1)

16-176: Fixture utilities look solid.

The helpers are deterministic and keep driver tests isolated from real provider/network behavior.

…pilot

Source fixes:
- Add collision detection in _build_model_lookup for alias/ID conflicts
- Defensive check for empty choices in _map_response
- Wrap response mapping in try/except to keep ProviderError hierarchy
- Read supports_streaming from model info instead of hard-coding True
- Case-insensitive retry-after header lookup per HTTP semantics
- None-safe int conversion for usage token counts (or 0 pattern)
- Fix streaming usage drop when prompt_tokens is zero
- Replace bare except Exception with targeted catches + logging
- Add warning logging for silent JSON parse failures in tool calls
- Add warning logging for dropped/incomplete tool calls
- Add warning logging for unknown finish reasons and skipped items
- Add Anthropic-specific finish reason keys (end_turn, stop_sequence, tool_use)
- Deep copy parameters_schema in tool mapper for immutability
- Handle unhashable inputs in ProviderRegistry.__contains__
- Wrap factory call in _build_driver to catch construction errors
- Document DriverAlreadyRegisteredError as reserved for future use
- Remove unused mock_acompletion fixture
- Multiple docstring improvements across driver, mappers, registry

New tests (18 added, 253 total):
- Stream exception before iteration
- Response mapping error wrapped as ProviderError
- Incomplete tool call accumulator dropped
- Multiple concurrent streaming tool calls
- Usage-only chunk with empty choices
- Usage emitted when prompt_tokens is zero
- Case-insensitive retry-after header
- No headers / non-numeric retry-after edge cases
- supports_streaming from model info
- supports_streaming_tool_calls requires both
- Non-callable factory, non-provider return, factory exception
- Unhashable __contains__ returns False
- Anthropic finish reasons (end_turn, stop_sequence, tool_use)
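The Anthropic-specific finish-reason keys listed above amount to extra entries in a normalization table. A rough sketch of the idea (the enum members and the STOP fallback are illustrative assumptions, not the repository's actual mapper):

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class FinishReason(Enum):
    """Illustrative unified finish reasons (not the repo's actual enum)."""

    STOP = "stop"
    LENGTH = "length"
    TOOL_CALLS = "tool_calls"
    CONTENT_FILTER = "content_filter"


# Provider-specific raw strings normalized to one enum, including
# Anthropic's end_turn / stop_sequence / tool_use variants.
_FINISH_REASONS = {
    "stop": FinishReason.STOP,
    "length": FinishReason.LENGTH,
    "tool_calls": FinishReason.TOOL_CALLS,
    "content_filter": FinishReason.CONTENT_FILTER,
    "end_turn": FinishReason.STOP,
    "stop_sequence": FinishReason.STOP,
    "tool_use": FinishReason.TOOL_CALLS,
}


def map_finish_reason(raw: str) -> FinishReason:
    """Map a provider finish-reason string; warn and default on unknowns."""
    reason = _FINISH_REASONS.get(raw)
    if reason is None:
        logger.warning("Unknown finish reason %r, defaulting to STOP", raw)
        return FinishReason.STOP
    return reason
```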
@Aureliolo Aureliolo requested a review from Copilot March 1, 2026 11:34
@Aureliolo
Owner Author

/gemini review


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/ai_company/providers/drivers/mappers.py (2)

194-196: 🧹 Nitpick | 🔵 Trivial

Return a defensive copy for parsed dict arguments.

Same concern as above—the parsed dict is returned directly without copying.

♻️ Proposed fix
         if isinstance(parsed, dict):
-            return parsed
+            return dict(parsed)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/providers/drivers/mappers.py` around lines 194 - 196, The code
returns the input dict object directly from the branch checking "if
isinstance(parsed, dict): return parsed", which can lead to callers mutating
internal state; change that to return a defensive shallow copy (e.g., return
parsed.copy() or dict(parsed)) instead, updating the return in the function in
mappers.py that uses the "parsed" variable so callers receive a copy rather than
the original dict.

183-184: 🧹 Nitpick | 🔵 Trivial

Return a defensive copy for dict arguments to enforce immutability.

When raw is already a dict, returning it directly allows callers to mutate the original. This is inconsistent with the deep copy applied to parameters_schema at line 82.

♻️ Proposed fix
     if isinstance(raw, dict):
-        return raw
+        return dict(raw)

As per coding guidelines "Create new objects instead of mutating existing ones — enforce immutability".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/providers/drivers/mappers.py` around lines 183 - 184, Replace
the direct return of the dict `raw` with a defensive deep copy to enforce
immutability (consistent with the `parameters_schema` deep copy at line 82);
specifically, where `if isinstance(raw, dict): return raw` appears in
mappers.py, return a deep copy of `raw` instead (use copy.deepcopy(raw)) and add
the necessary `import copy` if not already present so callers cannot mutate the
original dict.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/ai_company/providers/drivers/mappers.py`:
- Around line 194-196: The code returns the input dict object directly from the
branch checking "if isinstance(parsed, dict): return parsed", which can lead to
callers mutating internal state; change that to return a defensive shallow copy
(e.g., return parsed.copy() or dict(parsed)) instead, updating the return in the
function in mappers.py that uses the "parsed" variable so callers receive a copy
rather than the original dict.
- Around line 183-184: Replace the direct return of the dict `raw` with a
defensive deep copy to enforce immutability (consistent with the
`parameters_schema` deep copy at line 82); specifically, where `if
isinstance(raw, dict): return raw` appears in mappers.py, return a deep copy of
`raw` instead (use copy.deepcopy(raw)) and add the necessary `import copy` if
not already present so callers cannot mutate the original dict.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 49710c6 and 63f0774.

📒 Files selected for processing (8)
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/registry.py
  • tests/unit/providers/drivers/conftest.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/drivers/test_mappers.py
  • tests/unit/providers/test_registry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do not use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, enforced by mypy strict mode
Use Google style docstrings on all public classes and functions, enforced by ruff D rules
Create new objects instead of mutating existing ones — enforce immutability
Use Pydantic v2 with BaseModel, model_validator, and ConfigDict
Keep line length to 88 characters, enforced by ruff
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files

Files:

  • src/ai_company/providers/drivers/mappers.py
  • tests/unit/providers/test_registry.py
  • tests/unit/providers/drivers/test_mappers.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • src/ai_company/providers/registry.py
  • tests/unit/providers/drivers/conftest.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Maintain 80% minimum code coverage, enforced in CI

Files:

  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/errors.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/registry.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' in pytest — no manual @pytest.mark.asyncio needed
Set 30-second timeout per test

Files:

  • tests/unit/providers/test_registry.py
  • tests/unit/providers/drivers/test_mappers.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/drivers/conftest.py
🧠 Learnings (12)
📚 Learning: 2026-03-01T10:09:25.209Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T10:09:25.209Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-03-01T10:09:25.209Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-01T10:09:25.209Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — ruff enforces this on Python 3.14

Applied to files:

  • src/ai_company/providers/drivers/mappers.py
  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Applied to files:

  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Mock Ollama API calls in tests to avoid requiring a running Ollama instance

Applied to files:

  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/unit/providers/drivers/conftest.py
🧬 Code graph analysis (5)
tests/unit/providers/test_registry.py (3)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
src/ai_company/providers/errors.py (2)
  • DriverFactoryNotFoundError (159-162)
  • DriverNotRegisteredError (143-146)
src/ai_company/providers/registry.py (4)
  • ProviderRegistry (21-139)
  • get (56-78)
  • list_providers (80-82)
  • from_config (96-139)
tests/unit/providers/drivers/test_mappers.py (2)
src/ai_company/providers/enums.py (2)
  • FinishReason (15-22)
  • MessageRole (6-12)
src/ai_company/providers/models.py (4)
  • ChatMessage (114-186)
  • ToolCall (73-95)
  • ToolDefinition (45-70)
  • ToolResult (98-111)
tests/unit/providers/drivers/test_litellm_driver.py (4)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
src/ai_company/providers/drivers/litellm_driver.py (1)
  • LiteLLMDriver (89-514)
src/ai_company/providers/enums.py (3)
  • FinishReason (15-22)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/errors.py (8)
  • AuthenticationError (72-75)
  • ContentFilterError (119-122)
  • InvalidRequestError (113-116)
  • ModelNotFoundError (107-110)
  • ProviderError (22-69)
  • ProviderInternalError (137-140)
  • ProviderTimeoutError (125-128)
  • RateLimitError (78-104)
src/ai_company/providers/registry.py (3)
src/ai_company/providers/base.py (1)
  • BaseCompletionProvider (26-273)
src/ai_company/providers/errors.py (2)
  • DriverFactoryNotFoundError (159-162)
  • DriverNotRegisteredError (143-146)
src/ai_company/config/schema.py (1)
  • ProviderConfig (52-95)
tests/unit/providers/drivers/conftest.py (1)
src/ai_company/config/schema.py (2)
  • ProviderConfig (52-95)
  • ProviderModelConfig (17-49)
🔇 Additional comments (13)
src/ai_company/providers/errors.py (1)

141-162: LGTM!

The new driver error classes follow the established pattern, have appropriate docstrings, and correctly set is_retryable = False for configuration-time errors.
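The pattern the comment describes, configuration-time errors that opt out of retries via a class attribute, can be sketched roughly like this (class names match the review, but the bodies and base class are simplified assumptions):

```python
class ProviderError(Exception):
    """Base of the provider error hierarchy (sketch, not the repo's classes)."""

    is_retryable: bool = False


class DriverNotRegisteredError(ProviderError):
    """Config references a provider name with no registered driver."""

    # Configuration-time errors should fail fast rather than be retried.
    is_retryable = False


class DriverFactoryNotFoundError(ProviderError):
    """Config names a driver for which no factory is known."""

    is_retryable = False
```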

src/ai_company/providers/drivers/mappers.py (1)

1-124: LGTM!

The mapping module is well-structured with:

  • Proper type hints and Google-style docstrings
  • Comprehensive finish reason mapping including Anthropic-specific values
  • Defensive deep copy for parameters_schema
  • Appropriate logging for unknown finish reasons
tests/unit/providers/drivers/conftest.py (1)

1-170: LGTM!

The conftest module provides well-designed test utilities:

  • Reusable mock factories with sensible defaults
  • Support for both dict and attribute-access patterns (per learnings about Ollama API mocks)
  • Configurable fixtures for various test scenarios
  • Proper async generator for streaming tests
tests/unit/providers/test_registry.py (1)

1-249: LGTM!

Comprehensive test coverage for the ProviderRegistry including:

  • Core operations (get, list, contains, len)
  • Factory configuration with overrides
  • Error cases (unknown driver, non-callable factory, non-provider return, factory exceptions)
  • Immutability verification
  • Default LiteLLM driver resolution

All tests properly marked with @pytest.mark.unit.

tests/unit/providers/drivers/test_mappers.py (1)

1-281: LGTM!

Thorough test coverage for mapper functions including:

  • Message conversion for all role types
  • Tool definition conversion
  • Finish reason mapping with Anthropic-specific values
  • Tool call extraction with edge cases (None, empty, invalid JSON, missing fields)

All tests properly marked with @pytest.mark.unit.

src/ai_company/providers/registry.py (1)

1-196: LGTM!

Well-designed registry implementation:

  • Immutable driver mapping using MappingProxyType
  • Defensive copy on construction
  • Robust __contains__ handling unhashable inputs
  • Comprehensive factory validation (callable check, type check, exception wrapping)
  • Clear error messages listing available providers/drivers
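The properties listed above can be sketched in a few lines (a simplified illustration, not the actual registry; error types and method names are assumptions):

```python
from types import MappingProxyType


class ProviderRegistry:
    """Minimal sketch of an immutable provider-name -> driver registry."""

    def __init__(self, drivers: dict):
        # Defensive copy, then a read-only view: later mutation of the
        # source dict cannot affect the registry.
        self._drivers = MappingProxyType(dict(drivers))

    def get(self, name: str):
        try:
            return self._drivers[name]
        except KeyError:
            available = ", ".join(sorted(self._drivers)) or "<none>"
            raise KeyError(
                f"Provider {name!r} not registered (available: {available})"
            ) from None

    def __contains__(self, name: object) -> bool:
        try:
            return name in self._drivers
        except TypeError:  # unhashable lookup keys are simply "not present"
            return False

    def __len__(self) -> int:
        return len(self._drivers)
```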
tests/unit/providers/drivers/test_litellm_driver.py (1)

1-759: LGTM!

Excellent test coverage for LiteLLMDriver including:

  • Non-streaming completion with various configurations
  • Streaming with tool call deltas and usage tracking
  • Exception mapping for all LiteLLM exception types
  • Model capabilities with fallbacks
  • Edge cases (empty choices, zero prompt_tokens, case-insensitive headers)

All tests properly marked with @pytest.mark.unit and use mocked LiteLLM calls.

src/ai_company/providers/drivers/litellm_driver.py (6)

1-106: LGTM!

The driver module is well-structured with:

  • Clean separation of concerns (hooks, mapping, streaming, exception handling)
  • Comprehensive LiteLLM exception mapping table
  • Proper TYPE_CHECKING pattern for type-only imports

219-261: LGTM!

Model resolution properly validates:

  • Duplicate model IDs
  • Alias collisions with existing keys
  • Clear error messages with conflicting identifiers
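The alias-collision validation described above can be sketched as a single lookup-building pass (a hypothetical helper operating on plain dicts; the PR's `_build_model_lookup` may differ in shape):

```python
def build_model_lookup(models: list[dict]) -> dict:
    """Index models by ID and alias, rejecting duplicate identifiers."""
    lookup: dict = {}
    for model in models:
        # Every identifier (canonical ID plus aliases) must be unique.
        for key in (model["id"], *model.get("aliases", [])):
            if key in lookup:
                raise ValueError(f"Conflicting model identifier {key!r}")
            lookup[key] = model
    return lookup
```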

293-337: LGTM!

Response mapping implements defensive patterns:

  • Guard against empty choices with descriptive error
  • Null-safe token extraction with or 0 coercion
  • Proper use of getattr for attribute access

370-431: LGTM!

Streaming implementation handles edge cases:

  • Usage-only chunks with empty choices
  • Usage emission regardless of prompt_tokens value
  • Null-safe token conversion
  • Proper delta accumulation for tool calls

464-485: LGTM!

Retry-after extraction properly implements:

  • Case-insensitive header lookup per HTTP semantics
  • Graceful handling of non-numeric values
  • PEP 758 compliant exception syntax

573-622: LGTM!

The _ToolCallAccumulator class properly:

  • Accumulates streaming deltas incrementally
  • Handles incomplete tool calls with appropriate logging
  • Gracefully handles JSON parse failures
  • Uses PEP 758 compliant exception syntax

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 4 comments.



Comment on lines +186 to +193
try:
parsed = json.loads(raw)
except json.JSONDecodeError, ValueError:
_logger.warning(
"Failed to parse tool call arguments: %r",
raw[:200],
)
return {}

Copilot AI Mar 1, 2026


Invalid exception syntax: except json.JSONDecodeError, ValueError: is not valid in Python 3. Use except (json.JSONDecodeError, ValueError): (or just json.JSONDecodeError) so this file parses and the fallback {} path works.

Comment on lines +478 to +485
try:
return float(raw)
except ValueError, TypeError:
_logger.debug(
"Could not parse retry-after header as seconds: %r",
raw,
)
return None

Copilot AI Mar 1, 2026


Invalid exception syntax: except ValueError, TypeError: is not valid in Python 3. Wrap the exception types in parentheses (e.g., except (ValueError, TypeError):) so retry-after parsing doesn’t cause a SyntaxError.

Comment on lines +498 to +506
try:
raw = _litellm.get_model_info(model=litellm_model)
info: dict[str, Any] = dict(raw) if raw else {}
except KeyError, ValueError:
_logger.info(
"No LiteLLM metadata for model %r, using config defaults",
litellm_model,
)
return {}

Copilot AI Mar 1, 2026


Invalid exception syntax: except KeyError, ValueError: is not valid in Python 3. Use except (KeyError, ValueError): so model-info fallback works and the module can be imported.

Comment on lines +611 to +620
try:
parsed = json.loads(self.arguments) if self.arguments else {}
except json.JSONDecodeError, ValueError:
_logger.warning(
"Failed to parse tool call arguments for tool %r (id=%r): %r",
self.name,
self.id,
self.arguments[:200] if self.arguments else "",
)
parsed = {}

Copilot AI Mar 1, 2026


Invalid exception syntax: except json.JSONDecodeError, ValueError: is not valid in Python 3. Use except (json.JSONDecodeError, ValueError): (or just json.JSONDecodeError) so streamed tool-call argument parsing doesn’t break module import.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a well-designed and extensible driver system for LLM providers, with LiteLLM as the default backend. The new ProviderRegistry provides a clean, immutable way to manage provider drivers, and the pure mapping functions in mappers.py create a strong separation of concerns. The exception handling is particularly robust, mapping a wide range of provider-specific errors to a unified hierarchy, and the code is accompanied by a comprehensive suite of unit tests.

However, potential security issues have been identified, specifically the leakage of sensitive information in error logs and a resource-exhaustion vulnerability in the streaming tool-call accumulation logic. Additionally, there are critical syntax errors in the exception handling blocks, which use Python 2-style syntax (e.g., except A, B:) that is invalid in Python 3 and will cause a SyntaxError. Addressing these concerns will improve the production readiness and security posture of the provider layer.

return None
try:
return float(raw)
except ValueError, TypeError:

critical

This except syntax is for Python 2. For Python 3, multiple exceptions must be caught as a tuple. This will raise a SyntaxError at runtime.

        except (ValueError, TypeError):

try:
raw = _litellm.get_model_info(model=litellm_model)
info: dict[str, Any] = dict(raw) if raw else {}
except KeyError, ValueError:

critical

This except syntax is from Python 2. In Python 3, multiple exception types must be caught as a parenthesized tuple; as written, this raises a SyntaxError as soon as the module is imported, not at runtime.

        except (KeyError, ValueError):

return None
try:
parsed = json.loads(self.arguments) if self.arguments else {}
except json.JSONDecodeError, ValueError:
critical

This except syntax is from Python 2. In Python 3, multiple exception types must be caught as a parenthesized tuple; as written, this raises a SyntaxError as soon as the module is imported, not at runtime.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):

if isinstance(raw, str):
try:
parsed = json.loads(raw)
except json.JSONDecodeError, ValueError:
critical

This except syntax is from Python 2. In Python 3, multiple exception types must be caught as a parenthesized tuple; as written, this raises a SyntaxError as soon as the module is imported, not at runtime.

Suggested change
except json.JSONDecodeError, ValueError:
except (json.JSONDecodeError, ValueError):

retry_after=self._extract_retry_after(exc),
context=ctx,
)
return our_type(str(exc), context=ctx)
security-medium

Raw exception strings from LiteLLM are included directly in the error message. These strings can contain sensitive information like API keys (e.g., in authentication errors). Since these messages are often logged, this can lead to secret leakage. It is safer to use a generic message and put the exception details in the context dictionary, which handles redaction.

return exc

return errors.ProviderInternalError(
f"Unexpected error from {self._provider_name}: {exc}",
security-medium

Including the raw exception exc in the error message can leak sensitive information if the exception string contains secrets. Consider using a generic message and moving the exception details to the context dictionary.

self.name = str(name)
args = getattr(func, "arguments", None)
if args:
self.arguments += str(args)
security-medium

Tool call arguments are accumulated here without any length limit. A malicious LLM provider or a prompt injection attack could send an infinite stream of deltas, leading to memory exhaustion and a Denial of Service (DoS). Please implement a maximum length check for self.arguments.
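The follow-up commit adds a 1 MiB cap for exactly this reason; a standalone sketch of that guard (the class name and error type are illustrative):

```python
MAX_TOOL_ARGS_LEN = 1 << 20  # 1 MiB cap on accumulated argument deltas


class ToolCallAccumulator:
    """Accumulates streamed tool-call argument deltas with a hard size limit."""

    def __init__(self) -> None:
        self.arguments = ""

    def add_delta(self, delta: str) -> None:
        # Reject before concatenating, so a hostile stream cannot grow
        # self.arguments without bound and exhaust memory.
        if len(self.arguments) + len(delta) > MAX_TOOL_ARGS_LEN:
            raise ValueError("tool-call arguments exceed 1 MiB limit")
        self.arguments += delta


acc = ToolCallAccumulator()
acc.add_delta('{"query": ')
acc.add_delta('"status"}')
print(acc.arguments)  # {"query": "status"}
```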

except Exception as exc:
msg = (
f"Failed to instantiate driver {driver_type!r} for provider {name!r}: {exc}"
)
security-medium

The raw exception exc is included in the error message, which may leak sensitive configuration data if the exception string contains secrets. Use a generic error message instead.

…copies

- Move raw exception strings from error messages to context dicts to prevent
  potential API key leakage in logs (Gemini security-medium)
- Add 1 MiB length limit to _ToolCallAccumulator to prevent DoS via infinite
  streaming deltas (Gemini security-medium)
- Return defensive copies from _parse_arguments for immutability consistency
  (CodeRabbit nitpick)
- Update registry factory error to use generic message with detail in context
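The defensive-copy change from that list can be illustrated with a cached-parse sketch — the class and method names are hypothetical, chosen to mirror the described _parse_arguments behavior:

```python
import copy
import json


class ToolCall:
    """Parses JSON arguments once, then hands out defensive copies."""

    def __init__(self, arguments: str) -> None:
        self.arguments = arguments
        self._parsed: dict | None = None

    def parse_arguments(self) -> dict:
        if self._parsed is None:
            try:
                raw = json.loads(self.arguments) if self.arguments else {}
            except (json.JSONDecodeError, ValueError):
                raw = {}
            self._parsed = raw if isinstance(raw, dict) else {}
        # Deep copy so callers cannot mutate the cached parse result.
        return copy.deepcopy(self._parsed)


tc = ToolCall('{"ids": [1, 2]}')
tc.parse_arguments()["ids"].append(3)  # mutates only the returned copy
print(tc.parse_arguments())            # {'ids': [1, 2]}
```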
@Aureliolo Aureliolo merged commit ae3f18b into main Mar 1, 2026
7 of 8 checks passed
@Aureliolo Aureliolo deleted the feat/litellm-driver branch March 1, 2026 12:07
@coderabbitai coderabbitai bot mentioned this pull request Mar 1, 2026
4 tasks
Aureliolo added a commit that referenced this pull request Mar 1, 2026
## Summary

- Adds **39 integration tests** for the provider adapter layer,
completing the final unchecked acceptance criterion from #5:
_"Integration tests with mock/recorded API responses"_
- All source code was already implemented in PRs #86 and #88 — this PR
covers only the integration test suite
- Mocks at `litellm.acompletion` level using **real
`litellm.ModelResponse`** objects (not MagicMock), exercising actual
attribute access paths through `_map_response`, `_process_chunk`, and
`extract_tool_calls`

### Test files

| File | Tests | Coverage |
|------|-------|----------|
| `test_anthropic_pipeline.py` | 13 | Config→registry→complete/stream, alias resolution, cost computation, streaming |
| `test_openrouter_pipeline.py` | 5 | Custom base_url forwarding, model prefixing, multi-model alias resolution |
| `test_ollama_pipeline.py` | 4 | No api_key, localhost base_url, zero-cost models |
| `test_error_scenarios.py` | 9 | Rate limit (429 + retry-after), auth (401), timeout, connection, internal, unknown |
| `test_tool_calling_pipeline.py` | 8 | Single/multiple tool calls, streaming accumulation, mixed text+tools, multi-turn |
| `conftest.py` | — | Config factories, real ModelResponse builders, stream helpers |
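Mocking at the acompletion boundary can be sketched with an injected async callable. This simplified wrapper and the dict-shaped response are assumptions for illustration — the real suite builds actual litellm.ModelResponse objects rather than plain dicts:

```python
import asyncio
from unittest.mock import AsyncMock


async def complete(acompletion, model: str, messages: list[dict]) -> str:
    """Driver-style wrapper: awaits the injected completion callable."""
    resp = await acompletion(model=model, messages=messages)
    return resp["choices"][0]["message"]["content"]


mock = AsyncMock(return_value={"choices": [{"message": {"content": "pong"}}]})
out = asyncio.run(
    complete(mock, "claude-test", [{"role": "user", "content": "ping"}])
)
print(out)  # pong
mock.assert_awaited_once_with(
    model="claude-test", messages=[{"role": "user", "content": "ping"}]
)
```

Because the mock sits at the same seam the drivers call through, the test still exercises the real attribute-access paths in the code under test.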

### Verification

- `ruff check` — all passed
- `ruff format` — all formatted
- `mypy` — 0 errors (7 files)
- `pytest` — 1331 total tests pass, **94.49% coverage** (80% required)

Closes #5

## Test plan

- [ ] CI passes (lint + type-check + test + coverage)
- [ ] 39 integration tests pass under `pytest -m integration`
- [ ] No regressions in existing 1292 unit tests
- [ ] Coverage remains above 80% threshold
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
This was referenced Mar 15, 2026


Development

Successfully merging this pull request may close these issues.

Evaluate LiteLLM: integration prototype, limitations, alternatives

2 participants