
feat: add retry handler, rate limiter, and provider resilience#100

Merged
Aureliolo merged 4 commits into main from feat/provider-resilience
Mar 5, 2026
Conversation

@Aureliolo (Owner)

Summary

  • Add RetryHandler with exponential backoff, jitter, and retry-after header respect for transient provider errors
  • Add RateLimiter with sliding-window RPM tracking and asyncio semaphore concurrency control
  • Add RetryExhaustedError as a signal for M3 fallback chain orchestration
  • Wire retry + rate limiting into BaseCompletionProvider — all drivers inherit resilience automatically
  • Add RetryConfig and RateLimiterConfig fields to ProviderConfig
  • Add 5 observability event constants for retry/rate-limit tracking
  • Add resilience-audit agent to PR review skill for ongoing enforcement
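
As a rough sketch of the new configuration surface: the PR adds these models to `config/schema.py` as Pydantic models; only `max_retries`, `max_requests_per_minute`, and `max_concurrent` appear verbatim in this PR, so the remaining field names and defaults below are illustrative assumptions, shown here with plain dataclasses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    """Retry policy for transient provider errors (illustrative fields)."""

    max_retries: int = 3          # 0 disables retries entirely
    base_delay: float = 1.0       # seconds before the first retry (assumed name)
    backoff_factor: float = 2.0   # exponential multiplier per attempt (assumed name)
    max_delay: float = 60.0       # cap on any single backoff sleep (assumed name)


@dataclass(frozen=True)
class RateLimiterConfig:
    """Client-side rate-limiting knobs (illustrative fields)."""

    max_requests_per_minute: int = 0  # 0 disables the RPM sliding window
    max_concurrent: int = 0           # 0 disables the concurrency semaphore

    @property
    def enabled(self) -> bool:
        # Mirrors the is_enabled check quoted in the review below:
        # enabled when either limit is positive.
        return self.max_requests_per_minute > 0 or self.max_concurrent > 0
```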

Architecture

LiteLLMDriver.__init__()
  ├── RetryHandler(config.retry)        # if max_retries > 0
  └── RateLimiter(config.rate_limiter)  # if enabled
        ↓ passed to
BaseCompletionProvider.__init__(retry_handler=..., rate_limiter=...)
        ↓ used in
complete() / stream()
  └── _resilient_execute(attempt_fn)
        └── RetryHandler.execute() → _rate_limited_call() → _do_complete()
  • Retry: wraps the full _rate_limited_call → _do_complete/_do_stream cycle
  • Rate limiter: acquire/release around each attempt; pauses on retry-after
  • Streaming: only connection setup is retried, not mid-stream errors
  • Non-retryable errors (is_retryable=False) raise immediately without retry

New Files

File | Purpose
src/ai_company/providers/resilience/__init__.py | Package re-exports
src/ai_company/providers/resilience/config.py | Re-exports from config/schema.py
src/ai_company/providers/resilience/errors.py | RetryExhaustedError
src/ai_company/providers/resilience/retry.py | RetryHandler (backoff + jitter)
src/ai_company/providers/resilience/rate_limiter.py | RateLimiter (RPM + concurrency)
tests/unit/providers/resilience/test_*.py | 50 unit tests
tests/integration/providers/test_retry_integration.py | 7 integration tests

Test plan

  • 16 config validation tests (defaults, bounds, immutability)
  • 6 RetryExhaustedError tests (attributes, non-retryable flag)
  • 17 RetryHandler tests (success, exhaust, non-retryable, backoff, jitter, retry-after, logging)
  • 11 RateLimiter tests (disabled, concurrent, RPM, pause, logging)
  • 7 integration tests (retry with mocked LiteLLM: succeed-after-transient, exhaust, non-retryable, rate-limit, stream)
  • All 1540 existing tests still pass
  • 94.70% coverage (threshold: 80%)
  • ruff, mypy, pre-commit all clean
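
The succeed-after-transient pattern exercised by the integration tests can be illustrated with a self-contained stub. `execute_with_retries` and `TransientError` below are stand-ins for the PR's `RetryHandler.execute()` and retryable `ProviderError`, not the actual API.

```python
import asyncio


class TransientError(Exception):
    """Stand-in for a retryable ProviderError."""


async def execute_with_retries(func, max_retries: int = 3):
    # Minimal stand-in for RetryHandler.execute(): retry on
    # TransientError up to max_retries times (backoff sleeps omitted).
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except TransientError:
            if attempt == max_retries:
                raise


calls = 0


async def flaky_provider():
    # Fails once with a transient error, then succeeds.
    global calls
    calls += 1
    if calls < 2:
        raise TransientError("temporary outage")
    return "ok"


result = asyncio.run(execute_with_retries(flaky_provider))
```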

Closes #9

Implement automatic retry with exponential backoff + jitter and
client-side rate limiting (RPM + concurrency) for the provider layer.

- Add RetryHandler with configurable backoff, retry-after respect
- Add RateLimiter with sliding-window RPM and semaphore concurrency
- Add RetryExhaustedError for fallback chain signaling
- Wire resilience into BaseCompletionProvider (retry + rate limiting)
- Add RetryConfig and RateLimiterConfig to ProviderConfig
- Add 5 observability event constants for retry/rate-limit tracking
- Add resilience-audit agent to PR review skill
- 57 new tests (unit + integration), 94.70% overall coverage
Copilot AI review requested due to automatic review settings March 2, 2026 21:00

github-actions bot commented Mar 2, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None


coderabbitai bot commented Mar 2, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2d7c8332-64e6-44ca-b6e5-1a4cfce3e0be

📥 Commits

Reviewing files that changed from the base of the PR and between 4bf337b and d555773.

📒 Files selected for processing (1)
  • tests/unit/config/conftest.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Provider-level automatic retries with configurable backoff, jitter, and exhaustion signaling
    • Per-provider rate limiting with concurrency controls and pause-on-retry-after behavior
    • New observability events for retry attempts, skips, exhaustion, throttling, and pauses
    • New provider config options for retry and rate-limiting
  • Bug Fixes

    • Streaming tool calls now drop malformed JSON tool-call chunks instead of emitting empty arguments
  • Testing

    • Extensive new unit and integration tests and resilience fixtures
  • Documentation

    • Added provider resilience docs and updated testing requirements

Walkthrough

Adds provider-level resilience: configurable retry and rate-limiting models, runtime implementations (RetryHandler, RateLimiter), wiring into BaseCompletionProvider and LiteLLM driver, new observability events, extensive unit/integration tests, and docs/reviewer rules for resilience audits.

Changes

Cohort / File(s) Summary
Config schema
src/ai_company/config/schema.py
Adds RetryConfig and RateLimiterConfig and exposes ProviderConfig.retry and ProviderConfig.rate_limiter.
Resilience package
src/ai_company/providers/resilience/__init__.py, src/ai_company/providers/resilience/config.py, src/ai_company/providers/resilience/errors.py, src/ai_company/providers/resilience/rate_limiter.py, src/ai_company/providers/resilience/retry.py
New package: re-exports config types, adds RetryExhaustedError, RateLimiter implementation, and RetryHandler with exponential backoff, jitter, RateLimitError handling, and observability events.
Provider base & exports
src/ai_company/providers/__init__.py, src/ai_company/providers/base.py
Exports resilience types; BaseCompletionProvider accepts retry_handler and rate_limiter, adds _resilient_execute and _rate_limited_call helpers and integrates them into complete/stream flows.
Driver changes
src/ai_company/providers/drivers/litellm_driver.py
Wires retry_handler and rate_limiter into the driver, refactors completion config merging, strengthens model lookup validation/logging, adjusts streaming types and tool-call accumulation/error handling.
Observability
src/ai_company/observability/events.py
Adds provider resilience event constants: PROVIDER_RETRY_ATTEMPT, PROVIDER_RETRY_EXHAUSTED, PROVIDER_RETRY_SKIPPED, PROVIDER_RATE_LIMITER_THROTTLED, PROVIDER_RATE_LIMITER_PAUSED.
Docs & review rules
CLAUDE.md, .claude/skills/aurelio-review-pr/skill.md
Documents provider-level retry and rate-limiting behavior and adds a resilience-audit review agent with hard/soft rules for PR reviews.
Tests: unit & integration
tests/unit/providers/resilience/*, tests/integration/providers/test_retry_integration.py, tests/unit/providers/drivers/conftest.py, tests/** (many)
Adds unit tests for RetryHandler/RateLimiter/RetryExhaustedError, integration tests for retries/rate limits; updates many fixtures/tests to provide RetryConfig/RateLimiterConfig and replace model IDs with test-model-*.
Test data updates
tests/** (multiple files)
Replaces real model identifiers with test-model-* placeholders and adjusts assertions/mocks accordingly.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant BaseProv as BaseCompletionProvider
    participant RateLimiter
    participant RetryHandler
    participant Driver as ProviderDriver
    participant Observ as Observability

    Client->>BaseProv: complete()/stream() request
    BaseProv->>RateLimiter: acquire() (if enabled)
    alt RateLimiter pause or RateLimitError with retry_after
        RateLimiter-->>BaseProv: raises RateLimitError / enforces pause
        BaseProv->>RetryHandler: execute(func) (honors retry_after)
    else granted
        RateLimiter-->>BaseProv: granted
        BaseProv->>RetryHandler: execute(func)
    end
    RetryHandler->>Driver: call provider API (wrapped)
    alt transient retryable error
        Driver-->>RetryHandler: raises ProviderError (retryable)
        RetryHandler->>Observ: emit PROVIDER_RETRY_ATTEMPT
        RetryHandler-->>RetryHandler: wait backoff (compute_delay)
        RetryHandler->>Driver: retry call
    else success
        Driver-->>RetryHandler: response
    end
    RetryHandler-->>BaseProv: final result or RetryExhaustedError
    BaseProv->>RateLimiter: release() (if acquired)
    BaseProv->>Client: return/stream result
    Note right of Observ: emits attempt/exhausted/skipped/throttled/paused events

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: Docstring coverage is 30.99%, below the required threshold of 100.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Title check ✅ Passed: The title clearly summarizes the main change: adding retry handler, rate limiter, and provider resilience features.
Description check ✅ Passed: The description is comprehensive and directly related to the changeset, explaining the new resilience layer, architecture, and test coverage.
Linked Issues check ✅ Passed: The PR addresses the primary objectives from issue #9: exponential backoff for transient errors, configurable retry count, rate limiting, and RetryExhaustedError signal for fallback orchestration [#9].
Out of Scope Changes check ✅ Passed: All changes are within scope of adding retry/rate-limit resilience and updating test fixtures with new model IDs to support the feature.



@gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and reliability of the system's interactions with external completion providers. By introducing a centralized and configurable resilience layer, it ensures that transient network issues, provider timeouts, and rate limits are handled gracefully through automatic retries and intelligent throttling. This change reduces the likelihood of service disruptions and improves the overall robustness of the provider integration, making the system more resilient to external fluctuations.

Highlights

  • Provider Resilience: Introduced a robust resilience layer for completion providers, incorporating automatic retry mechanisms and client-side rate limiting to handle transient errors and manage request throughput effectively.
  • Retry Handler: Implemented a RetryHandler with exponential backoff, jitter, and support for retry-after headers, ensuring intelligent retries for transient provider errors.
  • Rate Limiter: Developed a RateLimiter featuring sliding-window RPM tracking and asyncio semaphore-based concurrency control to prevent overwhelming providers and respect rate limits.
  • Error Signaling: Added RetryExhaustedError to explicitly signal when all retry attempts have failed, enabling upstream systems (like M3 fallback chains) to orchestrate alternative actions.
  • Automatic Integration: Wired the new retry and rate-limiting logic directly into BaseCompletionProvider, ensuring all derived drivers automatically inherit these resilience features without needing explicit implementation.
  • Configuration and Observability: Introduced RetryConfig and RateLimiterConfig models within ProviderConfig for granular control, and added five new observability event constants for tracking resilience-related activities.
  • Resilience Audit Agent: Integrated a resilience-audit agent into the PR review skill to enforce best practices and identify violations related to provider resilience.
Changelog
  • .claude/skills/aurelio-review-pr/skill.md
    • Added a resilience-audit agent to the PR review skill, along with a set of hard and soft rules for auditing provider-layer changes.
  • CLAUDE.md
    • Added a new 'Resilience' section outlining guidelines for implementing and using provider resilience features.
  • src/ai_company/config/schema.py
    • Added RetryConfig and RateLimiterConfig Pydantic models to define retry and rate limiting parameters.
    • Integrated RetryConfig and RateLimiterConfig as fields within the ProviderConfig model.
  • src/ai_company/observability/events.py
    • Added five new constant strings for observability events related to provider retry and rate limiting.
  • src/ai_company/providers/__init__.py
    • Imported and re-exported the new RateLimiter, RateLimiterConfig, RetryConfig, RetryExhaustedError, and RetryHandler components.
  • src/ai_company/providers/base.py
    • Updated the BaseCompletionProvider's __init__ method to accept optional RetryHandler and RateLimiter instances.
    • Modified complete and stream methods to wrap calls to _do_complete and _do_stream with _resilient_execute and _rate_limited_call.
    • Added _resilient_execute method to apply retry logic if a RetryHandler is configured.
    • Added _rate_limited_call method to apply rate limiting logic if a RateLimiter is configured, handling RateLimitError.retry_after.
  • src/ai_company/providers/drivers/litellm_driver.py
    • Imported RateLimiter and RetryHandler.
    • Modified the LiteLLMDriver's __init__ to instantiate and pass RetryHandler and RateLimiter instances to the super().__init__() call based on ProviderConfig.
  • src/ai_company/providers/resilience/__init__.py
    • Added a new package resilience and its __init__.py to re-export resilience-related modules and classes.
  • src/ai_company/providers/resilience/config.py
    • Added a new file to re-export RateLimiterConfig and RetryConfig from ai_company.config.schema to avoid circular dependencies.
  • src/ai_company/providers/resilience/errors.py
    • Added a new file defining RetryExhaustedError, a custom ProviderError subclass for signaling retry exhaustion.
  • src/ai_company/providers/resilience/rate_limiter.py
    • Added a new file implementing the RateLimiter class, which manages client-side rate limiting using a sliding window for RPM and an asyncio.Semaphore for concurrency.
  • src/ai_company/providers/resilience/retry.py
    • Added a new file implementing the RetryHandler class, which provides retry logic with exponential backoff, jitter, and respect for retry-after headers.
  • tests/integration/providers/conftest.py
    • Updated make_anthropic_config, make_openrouter_config, and make_ollama_config fixtures to explicitly disable retries by setting retry=RetryConfig(max_retries=0).
  • tests/integration/providers/test_retry_integration.py
    • Added a new file containing integration tests for the retry and rate limiting functionality within the LiteLLM driver, covering success after transient failures, retry exhaustion, non-retryable errors, rate limit handling, and streaming retries.
  • tests/unit/providers/drivers/conftest.py
    • Updated the make_provider_config function to accept retry and rate_limiter configurations, defaulting max_retries to 0 for driver tests.
  • tests/unit/providers/resilience/conftest.py
    • Added a new file providing shared pytest fixtures for resilience unit tests, including various RetryConfig and RateLimiterConfig instances, and different types of ProviderError.
  • tests/unit/providers/resilience/test_config.py
    • Added a new file with unit tests for the RetryConfig and RateLimiterConfig models, verifying default values, custom settings, and validation rules.
  • tests/unit/providers/resilience/test_errors.py
    • Added a new file with unit tests for the RetryExhaustedError class, confirming its properties and behavior.
  • tests/unit/providers/resilience/test_rate_limiter.py
    • Added a new file with unit tests for the RateLimiter class, covering disabled state, concurrency limits, RPM tracking, pause functionality, and logging.
  • tests/unit/providers/resilience/test_retry.py
    • Added a new file with unit tests for the RetryHandler class, verifying success scenarios, retry exhaustion, non-retryable errors, disabled retries, backoff calculation, jitter, retry-after handling, and logging.
  • tests/unit/providers/test_protocol.py
    • Updated _ConcreteProvider and _RecordingProvider classes to explicitly call super().__init__() in their constructors.
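
The driver wiring described in the changelog roughly follows this shape. All classes below are simplified stand-ins: the real constructor signatures and attribute names are not shown in the PR, and only the conditional construction ("if max_retries > 0" / "if enabled") is taken from the architecture diagram above.

```python
from types import SimpleNamespace


class RetryHandler:
    """Stand-in for the real RetryHandler."""

    def __init__(self, config: object) -> None:
        self.config = config


class RateLimiter:
    """Stand-in for the real RateLimiter."""

    def __init__(self, config: object) -> None:
        self.config = config


class BaseCompletionProvider:
    def __init__(self, *, retry_handler=None, rate_limiter=None) -> None:
        self._retry_handler = retry_handler
        self._rate_limiter = rate_limiter


class LiteLLMDriver(BaseCompletionProvider):
    def __init__(self, config) -> None:
        # Build the helpers only when the config actually enables them,
        # then hand them to the base class so complete()/stream()
        # inherit resilience automatically.
        retry = RetryHandler(config.retry) if config.retry.max_retries > 0 else None
        limiter = RateLimiter(config.rate_limiter) if config.rate_limiter.enabled else None
        super().__init__(retry_handler=retry, rate_limiter=limiter)


# Example config object (the real one is a Pydantic ProviderConfig).
cfg = SimpleNamespace(
    retry=SimpleNamespace(max_retries=3),
    rate_limiter=SimpleNamespace(enabled=False),
)
driver = LiteLLMDriver(cfg)
```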
Activity
  • The pull request introduces a comprehensive resilience framework for provider interactions.
  • A total of 50 new unit tests and 7 new integration tests were added to validate the new resilience features.
  • All 1540 existing tests continue to pass, ensuring no regressions were introduced.
  • Code coverage stands at 94.70%, exceeding the 80% threshold.
  • The codebase remains clean with ruff, mypy, and pre-commit checks passing.

Copilot AI left a comment

Pull request overview

Adds a provider-layer resilience stack (retry + rate limiting) and wires it into BaseCompletionProvider so all drivers inherit consistent backoff/throttling behavior.

Changes:

  • Introduces RetryHandler, RateLimiter, and RetryExhaustedError, plus new observability event constants.
  • Extends ProviderConfig with retry and rate_limiter configuration models.
  • Adds comprehensive unit + integration test coverage and updates test helpers/docs for the new provider initialization pattern.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
tests/unit/providers/test_protocol.py Updates test providers to call BaseCompletionProvider.__init__ for new resilience fields
tests/unit/providers/resilience/test_retry.py Adds unit tests for retry behavior, backoff/jitter, and logging events
tests/unit/providers/resilience/test_rate_limiter.py Adds unit tests for RPM/concurrency limiting, pause behavior, and logging events
tests/unit/providers/resilience/test_errors.py Adds tests for RetryExhaustedError behavior and propagation
tests/unit/providers/resilience/test_config.py Adds validation/default/frozen-model tests for resilience config models
tests/unit/providers/resilience/conftest.py Adds shared fixtures for resilience tests
tests/unit/providers/drivers/conftest.py Extends provider config factory to set explicit retry/rate-limiter defaults for driver tests
tests/integration/providers/test_retry_integration.py Adds integration tests validating end-to-end retry + rate limiting wiring via LiteLLM
tests/integration/providers/conftest.py Ensures integration provider configs disable retries by default to preserve existing expectations
src/ai_company/providers/resilience/retry.py Implements RetryHandler (exponential backoff, jitter, retry_after, logging)
src/ai_company/providers/resilience/rate_limiter.py Implements RateLimiter (sliding-window RPM, semaphore concurrency, pause support, logging)
src/ai_company/providers/resilience/errors.py Introduces RetryExhaustedError wrapper error for fallback-chain signaling
src/ai_company/providers/resilience/config.py Re-exports RetryConfig / RateLimiterConfig from config.schema
src/ai_company/providers/resilience/__init__.py Package exports for resilience components
src/ai_company/providers/drivers/litellm_driver.py Wires RetryHandler and RateLimiter into BaseCompletionProvider
src/ai_company/providers/base.py Adds resilience execution helpers and wraps complete()/stream() with retry/limiting
src/ai_company/providers/__init__.py Re-exports resilience public API from ai_company.providers
src/ai_company/observability/events.py Adds new resilience observability event constants
src/ai_company/config/schema.py Adds RetryConfig / RateLimiterConfig models and new fields on ProviderConfig
CLAUDE.md Documents resilience invariants and rules for provider-layer code
.claude/skills/aurelio-review-pr/skill.md Adds “resilience-audit” agent trigger and rule set for PR review automation


Comment on lines +159 to +166
        return await self._rate_limited_call(
            self._do_stream,
            messages,
            model,
            tools=tools,
            config=config,
        )

Copilot AI Mar 2, 2026

stream() acquires/releases the rate limiter around the coroutine that returns an AsyncIterator, so the concurrency slot is released immediately after connection setup—before the stream is consumed. This makes max_concurrent ineffective for long-lived streams. Consider adding a stream-specific wrapper that holds the limiter slot until iteration completes (release in a finally inside the generator), and only releases early if connection setup fails.

Suggested change
        # For streaming, we must hold any rate-limiter slot for the
        # lifetime of the stream, not just for the connection setup
        # coroutine that returns an AsyncIterator.
        rate_limiter: RateLimiter | None = getattr(self, "_rate_limiter", None)
        if rate_limiter is None:
            # Fallback: no rate limiting configured.
            return await self._do_stream(
                messages,
                model,
                tools=tools,
                config=config,
            )
        # Acquire the limiter before establishing the stream.
        await rate_limiter.acquire()
        try:
            inner_stream = await self._do_stream(
                messages,
                model,
                tools=tools,
                config=config,
            )
        except Exception:
            # Connection/setup failed; release the slot immediately
            # so that retries or other callers can proceed.
            rate_limiter.release()
            raise

        async def _rate_limited_stream() -> AsyncIterator[StreamChunk]:
            try:
                async for chunk in inner_stream:
                    yield chunk
            finally:
                # Ensure the limiter slot is released once streaming
                # completes or if the consumer stops early.
                rate_limiter.release()

        return _rate_limited_stream()

Comment on lines +276 to +293
        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func*.
        """
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
        try:
            return await func(*args, **kwargs)
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            if self._rate_limiter is not None:
Copilot AI Mar 2, 2026

stream() acquires/releases the rate limiter around the coroutine that returns an AsyncIterator, so the concurrency slot is released immediately after connection setup—before the stream is consumed. This makes max_concurrent ineffective for long-lived streams. Consider adding a stream-specific wrapper that holds the limiter slot until iteration completes (release in a finally inside the generator), and only releases early if connection setup fails.

Suggested change
        For streaming calls (where *func* returns an ``AsyncIterator``),
        the rate limiter slot is held for the entire lifetime of the
        stream and only released when iteration completes.

        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func*. If *func* returns an
            ``AsyncIterator``, a wrapped ``AsyncIterator`` is returned
            that releases the rate limiter when it is exhausted or
            iteration is otherwise terminated.
        """
        acquired = False
        is_streaming_result = False
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
            acquired = True
        try:
            result = await func(*args, **kwargs)
            # If the call returns an AsyncIterator, wrap it so the rate
            # limiter is held for the lifetime of the stream and released
            # only after iteration completes.
            if acquired and isinstance(result, AsyncIterator):
                is_streaming_result = True

                async def _wrapped_stream() -> AsyncIterator[Any]:
                    try:
                        async for item in result:
                            yield item
                    finally:
                        if self._rate_limiter is not None:
                            self._rate_limiter.release()

                return _wrapped_stream()  # type: ignore[return-value]
            return result
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            # For non-streaming results, release immediately after the call.
            # For streaming results, the wrapped iterator will release when
            # iteration completes.
            if self._rate_limiter is not None and acquired and not is_streaming_result:
Comment on lines +83 to +86
    def release(self) -> None:
        """Release a concurrency slot."""
        if self._semaphore is not None:
            self._semaphore.release()
Copilot AI Mar 2, 2026

asyncio.Semaphore.release() can be over-called and increase the counter beyond the intended max_concurrent, which can silently disable concurrency limiting if a caller releases without acquiring (including due to future refactors/edge paths). Prefer preventing over-release by tracking acquisition count (and making extra releases a no-op with a debug log), or using a bounded mechanism and explicitly handling the error without letting the internal counter grow.
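
A guarded wrapper along the lines the reviewer suggests might look like this. This is an illustrative sketch, not the PR's actual RateLimiter code: it tracks outstanding acquisitions so an unmatched release() becomes a logged no-op instead of inflating the semaphore counter.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class GuardedSemaphore:
    """Semaphore wrapper that makes extra release() calls a no-op.

    Tracks outstanding acquisitions so a release without a matching
    acquire cannot silently raise the effective concurrency limit.
    """

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._held = 0  # number of slots currently held

    async def acquire(self) -> None:
        await self._semaphore.acquire()
        self._held += 1

    def release(self) -> None:
        if self._held <= 0:
            # Over-release: log and ignore rather than growing the counter.
            logger.debug("release() without matching acquire(); ignoring")
            return
        self._held -= 1
        self._semaphore.release()
```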

            self._config.max_requests_per_minute > 0 or self._config.max_concurrent > 0
        )

    async def acquire(self) -> None:
Copilot AI Mar 2, 2026

pause_until is only checked once at the start of acquire(). Because acquire() awaits (pause sleep, RPM waits, semaphore waits), another task can extend the pause during those awaits and this task may proceed without re-checking the pause. Restructure acquire() to re-check _pause_until after any await (commonly by looping: check pause → sleep if needed → then attempt rpm/semaphore; after rpm/semaphore waits, re-check pause before returning).

Copilot uses AI. Check for mistakes.
Comment on lines +60 to +82
if not self.is_enabled and self._pause_until <= 0.0:
    return

# Respect pause-until from retry_after
now = time.monotonic()
if self._pause_until > now:
    wait = self._pause_until - now
    logger.info(
        PROVIDER_RATE_LIMITER_THROTTLED,
        provider=self._provider_name,
        wait_seconds=round(wait, 2),
        reason="pause_active",
    )
    await asyncio.sleep(wait)

# RPM sliding window
if self._config.max_requests_per_minute > 0:
    await self._wait_for_rpm_slot()

# Concurrency semaphore
if self._semaphore is not None:
    await self._semaphore.acquire()

Copilot AI Mar 2, 2026

pause_until is only checked once at the start of acquire(). Because acquire() awaits (pause sleep, RPM waits, semaphore waits), another task can extend the pause during those awaits and this task may proceed without re-checking the pause. Restructure acquire() to re-check _pause_until after any await (commonly by looping: check pause → sleep if needed → then attempt rpm/semaphore; after rpm/semaphore waits, re-check pause before returning).

Suggested change (replacing the single-pass acquire() body shown above):

# Fast path: if nothing is enabled and no pause is scheduled, return immediately.
if not self.is_enabled and self._pause_until <= 0.0:
    return

while True:
    # Respect pause-until from retry_after
    now = time.monotonic()
    if self._pause_until > now:
        wait = self._pause_until - now
        logger.info(
            PROVIDER_RATE_LIMITER_THROTTLED,
            provider=self._provider_name,
            wait_seconds=round(wait, 2),
            reason="pause_active",
        )
        await asyncio.sleep(wait)
        # After sleeping, re-check pause and limits in the next loop iteration.
        continue

    # No active pause: if limiting is disabled, we can return.
    if not self.is_enabled:
        return

    # RPM sliding window
    if self._config.max_requests_per_minute > 0:
        await self._wait_for_rpm_slot()

    # Concurrency semaphore
    if self._semaphore is not None:
        await self._semaphore.acquire()

    # After waiting for RPM and concurrency, re-check for a new pause
    now = time.monotonic()
    if self._pause_until > now:
        # A pause became active while we were waiting; release the semaphore
        # (if acquired) and honor the pause before proceeding.
        if self._semaphore is not None:
            self._semaphore.release()
        # Loop back to handle the active pause (which will sleep as needed).
        continue

    # No pause is active; we have an RPM slot and a concurrency slot.
    return

Comment on lines +75 to +82
# RPM sliding window
if self._config.max_requests_per_minute > 0:
    await self._wait_for_rpm_slot()

# Concurrency semaphore
if self._semaphore is not None:
    await self._semaphore.acquire()

Copilot AI Mar 2, 2026

RPM tokens are reserved before the concurrency semaphore is acquired. Under high contention, tasks can consume RPM capacity while blocked on concurrency, reducing throughput and making throttling more aggressive than intended. Consider acquiring the semaphore first (if enabled) and only appending the RPM timestamp once the call is actually allowed to proceed (or otherwise ensure a reserved RPM slot is ‘returned’ if the task later blocks/cancels before making the request).

Suggested change

Before:

# RPM sliding window
if self._config.max_requests_per_minute > 0:
    await self._wait_for_rpm_slot()

# Concurrency semaphore
if self._semaphore is not None:
    await self._semaphore.acquire()

After:

# Concurrency semaphore
if self._semaphore is not None:
    await self._semaphore.acquire()

# RPM sliding window
if self._config.max_requests_per_minute > 0:
    await self._wait_for_rpm_slot()

Comment on lines +115 to +118
# Prune timestamps outside the window
self._request_timestamps = [
    t for t in self._request_timestamps if t > cutoff
]
Copilot AI Mar 2, 2026

The RPM sliding-window implementation rebuilds the full timestamp list on every loop iteration, which can become allocation-heavy at higher RPM limits. A collections.deque with incremental popleft() pruning (while oldest <= cutoff) avoids repeated list allocations and keeps the operation closer to O(k) for expired entries.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive resilience layer for provider interactions, incorporating retry mechanisms with exponential backoff and jitter, along with client-side rate limiting. The design is well-integrated into the BaseCompletionProvider, ensuring all drivers automatically inherit these capabilities. The addition of RetryConfig and RateLimiterConfig provides flexible control over these features. New observability event constants and a dedicated RetryExhaustedError enhance monitoring and fallback orchestration. The accompanying unit and integration tests are thorough, demonstrating the robustness and correctness of the new functionality.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/ai_company/config/schema.py (1)

128-133: 🧹 Nitpick | 🔵 Trivial

Document newly added ProviderConfig fields in the class docstring.

retry and rate_limiter were added (Lines 154-161), but the Attributes list (Lines 128-133) still omits them.

📝 Proposed docstring update
 class ProviderConfig(BaseModel):
@@
     Attributes:
         driver: Driver backend name (e.g. ``"litellm"``).
         api_key: API key (typically injected by secret management).
         base_url: Base URL for the provider API.
         models: Available models for this provider.
+        retry: Retry policy for transient provider errors.
+        rate_limiter: Client-side rate limiting policy.

As per coding guidelines, "Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)".

Also applies to: 154-161

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/config/schema.py` around lines 128 - 133, The class
ProviderConfig docstring is missing the newly added fields retry and
rate_limiter; update the ProviderConfig class docstring Attributes section to
document both fields (e.g., describe retry as retry policy/configuration for
requests and rate_limiter as rate-limiting configuration or token bucket
settings), ensuring descriptions follow Google-style docstring format used
elsewhere so linters (ruff D) pass and readers can find these attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/config/schema.py`:
- Around line 37-59: The RetryConfig Pydantic model currently sets model_config
= ConfigDict(frozen=True) but does not disable infinities/NaNs, so float
constraints on base_delay, max_delay, and exponential_base can accept
float('inf'); update the model_config for RetryConfig to include
allow_inf_nan=False (in addition to frozen=True) so Pydantic rejects
infinities/NaNs and enforces the gt/le constraints on base_delay, max_delay,
exponential_base and any other float fields.
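A hedged sketch of what the suggested RetryConfig change might look like — the field names come from the review text, and the defaults and bounds here are illustrative, not copied from the actual schema:

```python
from pydantic import BaseModel, ConfigDict, Field


class RetryConfig(BaseModel):
    """Retry policy for transient provider errors (illustrative subset of fields)."""

    # frozen=True keeps the model immutable; allow_inf_nan=False makes Pydantic
    # reject float("inf") and float("nan"), which gt-only constraints would
    # otherwise let through (inf > 0 is True).
    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    max_retries: int = Field(default=3, ge=0)
    base_delay: float = Field(default=1.0, gt=0.0)
    max_delay: float = Field(default=30.0, gt=0.0)
    exponential_base: float = Field(default=2.0, gt=1.0)
```

With this config, `RetryConfig(base_delay=float("inf"))` fails validation instead of silently producing an unbounded backoff delay.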

In `@src/ai_company/providers/base.py`:
- Around line 158-166: The current _attempt uses
self._rate_limited_call(self._do_stream, ...) which acquires the concurrency
token but releases it as soon as _do_stream returns an async iterator, letting
long-lived streams run outside the limiter; fix by changing the stream path so
the concurrency token is held until the stream is fully consumed: have
_rate_limited_call (or _attempt) wrap the iterator returned by _do_stream in an
async generator that yields items and only releases max_concurrent in a finally
block after iteration completes (i.e., acquire in _rate_limited_call/_attempt,
call self._do_stream to get the iterator, then iterate and yield each
StreamChunk, releasing the token in finally). Reference symbols: _attempt,
_rate_limited_call, _do_stream, and max_concurrent/semaphore release logic.

In `@src/ai_company/providers/resilience/rate_limiter.py`:
- Around line 88-99: The pause(seconds: float) method can accept negative or
non-finite values causing _pause_until to be set incorrectly; add validation at
the start of pause to ensure seconds is a finite, non-negative number (use
math.isfinite or equivalent) and raise a ValueError (or TypeError) for invalid
inputs before computing new_until and mutating self._pause_until so only valid
pauses update the limiter state; refer to function pause and attribute
_pause_until when making the change.
- Around line 63-74: The acquire() method currently checks self._pause_until
once and sleeps, but if pause() extends self._pause_until while sleeping the
method can resume too early; change acquire() to re-check the pause state in a
loop (e.g., while self._pause_until > time.monotonic()) so it recalculates wait
and logs PROVIDER_RATE_LIMITER_THROTTLED (using self._provider_name) each
iteration and only proceeds once the pause window truly expired; reference the
acquire(), pause(), and _pause_until symbols when making this change.
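The pause() validation asked for in the first rate_limiter.py item could look like this — a minimal stand-in class modeling only the `_pause_until` bookkeeping, not the full limiter:

```python
import math
import time


class PausableLimiter:
    """Minimal stand-in modeling only the pause bookkeeping of a rate limiter."""

    def __init__(self) -> None:
        self._pause_until = 0.0  # time.monotonic() deadline; 0.0 means no pause

    def pause(self, seconds: float) -> None:
        """Block new acquisitions for `seconds`; reject invalid durations."""
        # Guard against negative, inf, and nan inputs before mutating state.
        if not math.isfinite(seconds) or seconds < 0.0:
            raise ValueError(f"pause duration must be finite and >= 0, got {seconds!r}")
        new_until = time.monotonic() + seconds
        # Never shorten an already-scheduled pause.
        self._pause_until = max(self._pause_until, new_until)
```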

In `@src/ai_company/providers/resilience/retry.py`:
- Around line 67-73: The non-retryable branch in retry logic currently logs with
logger.debug then re-raises; change that to logger.warning so the immediate
raise is recorded at warning level; update the call that uses
PROVIDER_RETRY_SKIPPED and the same context args (error_type=type(exc).__name__,
reason="non_retryable") in the function/method where exc is checked (the
non-retryable branch in src/ai_company/providers/resilience/retry.py) so the log
reflects a WARNING before the raise.
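The severity change for retry.py, sketched with stdlib logging as a stand-in for the project's structured logger (the real code uses `ai_company.observability` with keyword-style event logging, and the constant's value here is invented):

```python
import logging

logger = logging.getLogger(__name__)
PROVIDER_RETRY_SKIPPED = "provider.retry.skipped"  # stand-in for the events constant


def raise_if_non_retryable(exc: Exception) -> None:
    """Re-raise non-retryable errors, logging at WARNING (not DEBUG) first."""
    if not getattr(exc, "is_retryable", True):
        logger.warning(
            "%s error_type=%s reason=%s",
            PROVIDER_RETRY_SKIPPED,
            type(exc).__name__,
            "non_retryable",
        )
        raise exc
```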

In `@tests/unit/providers/resilience/test_rate_limiter.py`:
- Around line 72-77: The test allows calling RateLimiter.release() without a
prior acquire which can over-increment the underlying asyncio.Semaphore; modify
the RateLimiter class to track in-flight acquisitions (e.g., add a private
counter like self._in_flight initialized in __init__) and update acquire() to
increment this counter after a successful await self._semaphore.acquire(), and
update release() to only call self._semaphore.release() and decrement
self._in_flight when self._in_flight > 0 (ignore extra releases); ensure you
reference the RateLimiter class, its __init__, acquire, release methods, and the
existing self._semaphore in your changes and update the test to assert extra
release() calls are no-ops rather than increasing available slots.

---

Outside diff comments:
In `@src/ai_company/config/schema.py`:
- Around line 128-133: The class ProviderConfig docstring is missing the newly
added fields retry and rate_limiter; update the ProviderConfig class docstring
Attributes section to document both fields (e.g., describe retry as retry
policy/configuration for requests and rate_limiter as rate-limiting
configuration or token bucket settings), ensuring descriptions follow
Google-style docstring format used elsewhere so linters (ruff D) pass and
readers can find these attributes.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d3c250b and 4f9a64c.

📒 Files selected for processing (22)
  • .claude/skills/aurelio-review-pr/skill.md
  • CLAUDE.md
  • src/ai_company/config/schema.py
  • src/ai_company/observability/events.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/resilience/__init__.py
  • src/ai_company/providers/resilience/config.py
  • src/ai_company/providers/resilience/errors.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/resilience/retry.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_retry_integration.py
  • tests/unit/providers/drivers/conftest.py
  • tests/unit/providers/resilience/__init__.py
  • tests/unit/providers/resilience/conftest.py
  • tests/unit/providers/resilience/test_config.py
  • tests/unit/providers/resilience/test_errors.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax with except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, verify with mypy strict mode
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Follow immutability principle: create new objects instead of mutating existing ones
Use Pydantic v2 models with BaseModel, model_validator, and ConfigDict for data validation
Enforce 88 character line length limit (ruff enforces this)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly and never silently swallow exceptions
Validate at system boundaries: user input, external API responses, and configuration files

Files:

  • tests/unit/providers/test_protocol.py
  • src/ai_company/observability/events.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/resilience/conftest.py
  • tests/unit/providers/resilience/test_config.py
  • src/ai_company/providers/resilience/errors.py
  • src/ai_company/providers/resilience/retry.py
  • tests/integration/providers/conftest.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • src/ai_company/providers/base.py
  • tests/unit/providers/resilience/test_errors.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/resilience/__init__.py
  • tests/unit/providers/drivers/conftest.py
  • src/ai_company/providers/resilience/config.py
  • tests/integration/providers/test_retry_integration.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use asyncio_mode = "auto" configuration — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider) — never use real vendor model IDs to avoid coupling tests to external providers

Files:

  • tests/unit/providers/test_protocol.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/resilience/conftest.py
  • tests/unit/providers/resilience/test_config.py
  • tests/integration/providers/conftest.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • tests/unit/providers/resilience/test_errors.py
  • tests/unit/providers/drivers/conftest.py
  • tests/integration/providers/test_retry_integration.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST import and use: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging or logging.getLogger() or print() in application code — use the structured logger from observability
Use logger as the variable name for the structured logger (not _logger, not log)
Use event name constants from ai_company.observability.events for all logging calls
Use structured logging format: logger.info(EVENT, key=value) — never use format strings like logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions

Files:

  • src/ai_company/observability/events.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/resilience/errors.py
  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/resilience/__init__.py
  • src/ai_company/providers/resilience/config.py
src/ai_company/**/{core,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Pure data models, enums, and re-exports do NOT require logging

Files:

  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/resilience/errors.py
  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/__init__.py
  • src/ai_company/providers/resilience/__init__.py
  • src/ai_company/providers/resilience/config.py
🧠 Learnings (17)
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` for all logging calls

Applied to files:

  • src/ai_company/observability/events.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/base.py : Use `threading.Semaphore` for LLM rate limiting in BaseAgent to limit concurrent requests to max 2.

Applied to files:

  • src/ai_company/providers/resilience/rate_limiter.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Don't implement custom rate limiting - BaseAgent handles rate limiting with max 2 concurrent calls and respects configured timeout

Applied to files:

  • src/ai_company/providers/resilience/rate_limiter.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures

Applied to files:

  • tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`

Applied to files:

  • tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings

Applied to files:

  • tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit. The CI enforces 100% coverage on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`).

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : Core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`) must maintain 100% test coverage

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : RAG retrieval failures are non-fatal—agents proceed with empty context if retrieval fails rather than raising errors.

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state

Applied to files:

  • .claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Use decorators `handle_ollama_errors` and `retry_with_fallback` for error handling

Applied to files:

  • src/ai_company/providers/resilience/retry.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.

Applied to files:

  • CLAUDE.md
🧬 Code graph analysis (14)
src/ai_company/providers/resilience/rate_limiter.py (2)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/config/schema.py (1)
  • RateLimiterConfig (66-87)
tests/unit/providers/resilience/conftest.py (2)
src/ai_company/providers/errors.py (4)
  • AuthenticationError (72-75)
  • ProviderConnectionError (131-134)
  • ProviderTimeoutError (125-128)
  • RateLimitError (78-104)
src/ai_company/config/schema.py (2)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
tests/unit/providers/resilience/test_config.py (1)
src/ai_company/config/schema.py (2)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
src/ai_company/providers/resilience/errors.py (1)
src/ai_company/providers/errors.py (1)
  • ProviderError (22-69)
src/ai_company/providers/resilience/retry.py (3)
src/ai_company/providers/errors.py (2)
  • ProviderError (22-69)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/config/schema.py (1)
  • RetryConfig (26-63)
tests/integration/providers/conftest.py (1)
src/ai_company/config/schema.py (3)
  • ProviderConfig (125-176)
  • ProviderModelConfig (90-122)
  • RetryConfig (26-63)
src/ai_company/providers/drivers/litellm_driver.py (2)
src/ai_company/providers/resilience/rate_limiter.py (2)
  • RateLimiter (19-134)
  • is_enabled (48-52)
src/ai_company/providers/resilience/retry.py (1)
  • RetryHandler (27-120)
tests/unit/providers/resilience/test_rate_limiter.py (3)
src/ai_company/config/schema.py (1)
  • RateLimiterConfig (66-87)
src/ai_company/providers/resilience/rate_limiter.py (5)
  • RateLimiter (19-134)
  • is_enabled (48-52)
  • acquire (54-81)
  • release (83-86)
  • pause (88-104)
src/ai_company/providers/registry.py (1)
  • get (66-94)
src/ai_company/providers/base.py (3)
src/ai_company/providers/errors.py (2)
  • InvalidRequestError (113-116)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/rate_limiter.py (4)
  • RateLimiter (19-134)
  • acquire (54-81)
  • pause (88-104)
  • release (83-86)
src/ai_company/providers/resilience/retry.py (2)
  • RetryHandler (27-120)
  • execute (41-96)
tests/unit/providers/resilience/test_errors.py (2)
src/ai_company/providers/errors.py (2)
  • ProviderError (22-69)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/providers/__init__.py (4)
src/ai_company/providers/resilience/rate_limiter.py (1)
  • RateLimiter (19-134)
src/ai_company/config/schema.py (2)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/providers/resilience/retry.py (1)
  • RetryHandler (27-120)
src/ai_company/providers/resilience/__init__.py (4)
src/ai_company/config/schema.py (2)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/providers/resilience/rate_limiter.py (1)
  • RateLimiter (19-134)
src/ai_company/providers/resilience/retry.py (1)
  • RetryHandler (27-120)
tests/unit/providers/drivers/conftest.py (1)
src/ai_company/config/schema.py (4)
  • ProviderConfig (125-176)
  • ProviderModelConfig (90-122)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
src/ai_company/providers/resilience/config.py (1)
src/ai_company/config/schema.py (2)
  • RateLimiterConfig (66-87)
  • RetryConfig (26-63)
🪛 LanguageTool
.claude/skills/aurelio-review-pr/skill.md

[style] ~127-~127: Since ownership is already implied, this phrasing may be redundant.
Context: ... rules:** 1. Driver subclass implements its own retry/backoff logic instead of relying ...

(PRP_OWN)


[style] ~131-~131: This phrase is redundant. Consider using “outside”.
Context: .... asyncio.sleep used for retry delays outside of RetryHandler (MAJOR) **Soft rules (S...

(OUTSIDE_OF)

🔇 Additional comments (17)
src/ai_company/providers/resilience/config.py (1)

1-11: Clean re-export module; API surface is explicit and consistent.

This keeps canonical config definitions centralized while providing a stable resilience import path.

tests/unit/providers/test_protocol.py (1)

86-88: Good fix: explicit base initialization in test providers.

Calling super().__init__() here prevents base-state drift as BaseCompletionProvider evolves.

Also applies to: 129-131

src/ai_company/providers/drivers/litellm_driver.py (1)

128-138: Resilience components are wired correctly into the base provider.

The conditional retry handler + optional rate limiter injection matches the intended execution path.

src/ai_company/providers/__init__.py (1)

37-43: Public exports look consistent and complete for resilience API exposure.

__all__ was updated in sync with imported resilience symbols.

Also applies to: 102-107

tests/unit/providers/resilience/test_errors.py (1)

11-43: Solid behavioral coverage for RetryExhaustedError.

The tests validate the key contract surfaces (retryability, original error retention, and context/message propagation).

tests/unit/providers/resilience/test_config.py (1)

11-103: Good validation matrix for resilience config models.

Defaults, boundary checks, and immutability assertions are well covered.

tests/integration/providers/conftest.py (1)

20-20: Nice test hardening: explicit retry disablement in provider fixtures.

Setting max_retries=0 here keeps integration tests deterministic and faster.

Also applies to: 55-55, 83-83, 104-104

.claude/skills/aurelio-review-pr/skill.md (1)

96-96: Resilience audit integration is well-scoped and actionable.

The new trigger and hard/soft rule set are clear and aligned with provider-resilience review goals.

Also applies to: 124-136

CLAUDE.md (1)

73-82: Good documentation update for resilience behavior.

This section clearly captures retry ownership, retryability policy, and fallback semantics for RetryExhaustedError.

Based on learnings: "When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes."

src/ai_company/observability/events.py (1)

57-63: Event taxonomy extension looks solid.

These constants provide the right hooks for retry and rate-limiter telemetry without changing runtime behavior.

tests/integration/providers/test_retry_integration.py (1)

73-236: Integration coverage is comprehensive and targeted.

The suite exercises core retry/rate-limit paths, exhaustion behavior, and disabled-retry semantics with clear assertions.

src/ai_company/providers/resilience/errors.py (1)

6-31: RetryExhaustedError design is clean and useful.

Preserving original_error plus immutable copied context gives good debugging/fallback ergonomics.

tests/unit/providers/resilience/test_rate_limiter.py (1)

20-71: The rest of the RateLimiter test matrix is strong.

Concurrency blocking, pause extension semantics, and observability assertions are well covered.

Also applies to: 79-168

tests/unit/providers/drivers/conftest.py (1)

8-13: Fixture update improves test isolation.

Defaulting to RetryConfig(max_retries=0) in this helper makes driver-level error mapping tests more deterministic.

Also applies to: 21-34, 57-58

tests/unit/providers/resilience/test_retry.py (1)

40-246: Excellent RetryHandler behavioral coverage.

This suite validates execution flow, backoff policy, and observability events across the most important resilience edge cases.

src/ai_company/providers/resilience/__init__.py (1)

1-18: Clean and consistent resilience re-export surface.

Package-level exports and __all__ are aligned and make the resilience API easy to consume.

tests/unit/providers/resilience/conftest.py (1)

15-72: Fixture coverage is solid and test-friendly.

Good reuse surface for deterministic resilience tests (configs + retryable/non-retryable error variants).

Comment on lines +72 to +77
async def test_release_without_acquire_does_not_crash(self) -> None:
config = RateLimiterConfig(max_concurrent=2)
limiter = RateLimiter(config, provider_name="test-provider")
# Extra release (semaphore goes above initial count, but doesn't crash)
limiter.release()


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify current limiter release semantics and the test expectation that permits over-release.
echo "=== Checking RateLimiter release() implementation ==="
rg -n -C3 'def release\(|_semaphore\.release\(' src/ai_company/providers/resilience/rate_limiter.py

echo -e "\n=== Checking test file for release_without_acquire test ==="
rg -n -C3 'test_release_without_acquire|semaphore goes above initial count|limiter.release\(\)' tests/unit/providers/resilience/test_rate_limiter.py

Repository: Aureliolo/ai-company

Length of output: 2140


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check RateLimiter class initialization and imports
echo "=== RateLimiter class initialization ==="
rg -n -B5 -A15 'class RateLimiter|def __init__' src/ai_company/providers/resilience/rate_limiter.py | head -50

echo -e "\n=== Checking imports in rate_limiter.py ==="
head -30 src/ai_company/providers/resilience/rate_limiter.py

echo -e "\n=== Full acquire/release methods ==="
rg -n -B2 -A10 'async def acquire|def release' src/ai_company/providers/resilience/rate_limiter.py

Repository: Aureliolo/ai-company

Length of output: 2879


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Verify asyncio.Semaphore behavior documentation to confirm over-release capability
echo "=== Checking if there are any guards against over-release in the codebase ==="
rg -n 'self\._semaphore|_acquire_count|_release_count|_acquired|bounded' src/ai_company/providers/resilience/rate_limiter.py

echo -e "\n=== Check full rate_limiter.py file to see if there's state tracking ==="
wc -l src/ai_company/providers/resilience/rate_limiter.py

echo -e "\n=== Check if release() appears in __init__ or elsewhere ==="
rg -n 'def __init__|def release|def acquire|self\._' src/ai_company/providers/resilience/rate_limiter.py | head -30

Repository: Aureliolo/ai-company

Length of output: 1987


Guard release() against over-release to maintain max_concurrent guarantees.

The test at lines 72–76 accepts release() without prior acquire(), which allows the internal semaphore counter to exceed the configured limit. With asyncio.Semaphore, this permits more concurrent acquisitions than max_concurrent specifies, breaking the concurrency guarantee.

Add a guard in release() to track in-flight requests or prevent releases beyond acquired slots. Update the test to verify that extra release() calls are safely ignored rather than merely tolerated.

Suggested changes
-    async def test_release_without_acquire_does_not_crash(self) -> None:
+    async def test_release_without_acquire_is_noop(self) -> None:
         config = RateLimiterConfig(max_concurrent=2)
         limiter = RateLimiter(config, provider_name="test-provider")
-        # Extra release (semaphore goes above initial count, but doesn't crash)
+        # Extra release should not increase effective capacity
         limiter.release()

In src/ai_company/providers/resilience/rate_limiter.py, track acquired slots:

def __init__(self, ...):
    # ... existing code ...
    self._in_flight: int = 0

async def acquire(self) -> None:
    # ... existing code ...
    if self._semaphore is not None:
        await self._semaphore.acquire()
        self._in_flight += 1

def release(self) -> None:
    if self._semaphore is not None and self._in_flight > 0:
        self._semaphore.release()
        self._in_flight -= 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/resilience/test_rate_limiter.py` around lines 72 - 77,
The test allows calling RateLimiter.release() without a prior acquire which can
over-increment the underlying asyncio.Semaphore; modify the RateLimiter class to
track in-flight acquisitions (e.g., add a private counter like self._in_flight
initialized in __init__) and update acquire() to increment this counter after a
successful await self._semaphore.acquire(), and update release() to only call
self._semaphore.release() and decrement self._in_flight when self._in_flight > 0
(ignore extra releases); ensure you reference the RateLimiter class, its
__init__, acquire, release methods, and the existing self._semaphore in your
changes and update the test to assert extra release() calls are no-ops rather
than increasing available slots.

…and Gemini

- C1: add asyncio.Lock around RPM sliding-window to prevent race condition
- C2: fix early-return in RateLimiter.acquire (use monotonic clock comparison)
- C3: _ToolCallAccumulator.build() returns None on JSON parse failure
- M1: add acquired flag in _rate_limited_call to prevent double-release
- M2: wrap stream in _hold_slot_for_stream to hold concurrency slot for stream lifetime
- M3: remove dead *args/**kwargs from RetryHandler.execute; tighten to Callable[[], Coroutine]
- M4: add _truncated flag to _ToolCallAccumulator; update() no-ops after truncation
- M5: _apply_completion_config returns new dict instead of mutating in-place
- M6: replace all vendor model IDs in test fixtures with fake test-model-XXX IDs
- M7: add TestBaseCompletionProviderResilience tests for rate limiter wiring
- M8: add catch-all except in RetryHandler to log and re-raise non-ProviderErrors
- M9: validate pause() argument is finite non-negative; raise ValueError otherwise
- M10: change pause check in acquire() from single check to while loop
- D1: replace assert with explicit RuntimeError guard in RetryHandler
- D2: add model_validator to RetryConfig ensuring base_delay <= max_delay
- D3: add test for zero-retries non-retryable error passes through unwrapped
- D4: add TestRateLimiterRPMThrottling with time-mocked RPM throttle tests
- D5: add TestStreamRetryIntegration for stream retry on connection error
- D6: fix resource leak in test_concurrent_limit with try/finally cancel
- D7: add test_internal_error_is_retried to verify ProviderInternalError retries
- D9/D11: _wrap_stream returns AsyncGenerator type annotation + logging before raise
- D10: add Args sections to abstract method docstrings in BaseCompletionProvider
- D12: change non-retryable branch from logger.debug to logger.warning
- D13: add allow_inf_nan=False to RetryConfig and RateLimiterConfig ConfigDict
- D14: update ProviderConfig docstring to include retry and rate_limiter attributes
- D15: increase timing tolerances in pause tests to reduce flakiness
- N1: add explicit class-level type annotations to _ToolCallAccumulator
- N2: update _compute_delay docstring to clarify zero-based retry counter
- N3/N5: add logger.warning before raises in _build_model_lookup
- N4: add logger.error before raise in _map_response
- N6: change TestRetryHandlerBackoff async def to def (_compute_delay is sync)
- Move RateLimiterConfig import out of TYPE_CHECKING to fix PEP 649 inspection
- Update test_streaming_malformed_json_tool_call to expect dropped tool call (C3)
@Aureliolo Aureliolo requested a review from Copilot March 3, 2026 20:04
@Aureliolo
Owner Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive resilience layer with retry and rate-limiting capabilities for provider communications. The implementation is well-structured, with clear separation of concerns into RetryHandler and RateLimiter classes, which are seamlessly integrated into the BaseCompletionProvider. The addition of configuration models and extensive tests is commendable. My review includes a few suggestions to enhance type safety, robustness in tool call argument parsing, and performance in the rate limiter.

Note: Security Review did not run due to the size of the PR.

Comment on lines +317 to +326
async def _hold_slot_for_stream(
inner: AsyncIterator[Any],
) -> AsyncIterator[Any]:
try:
async for chunk in inner:
yield chunk
finally:
rate_limiter.release() # type: ignore[union-attr]

return _hold_slot_for_stream(result) # type: ignore[return-value]

medium

To improve type safety and remove the type: ignore comments, you can make _hold_slot_for_stream generic. This requires adding _ChunkT = TypeVar("_ChunkT") at the module level near the definition of _T. The type: ignore on rate_limiter.release() can be replaced with an assert, and the one on the return statement will no longer be needed.

                async def _hold_slot_for_stream(
                    inner: AsyncIterator[_ChunkT],
                ) -> AsyncIterator[_ChunkT]:
                    # The logic to reach here ensures rate_limiter is not None.
                    assert rate_limiter is not None
                    try:
                        async for chunk in inner:
                            yield chunk
                    finally:
                        rate_limiter.release()

                return _hold_slot_for_stream(result)
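The generic wrapper pattern suggested above can be exercised standalone. In this sketch, `FakeLimiter` is a stand-in for the real RateLimiter, and `hold_slot` mirrors `_hold_slot_for_stream`; all names outside the suggestion are assumed for illustration:

```python
import asyncio
from collections.abc import AsyncIterator
from typing import TypeVar

_ChunkT = TypeVar("_ChunkT")


class FakeLimiter:
    """Minimal stand-in recording whether the slot was released."""

    def __init__(self) -> None:
        self.released = False

    def release(self) -> None:
        self.released = True


async def hold_slot(
    inner: AsyncIterator[_ChunkT], limiter: FakeLimiter
) -> AsyncIterator[_ChunkT]:
    try:
        async for chunk in inner:
            yield chunk
    finally:
        # The finally clause runs on exhaustion, error, or generator close,
        # so the slot is released for the stream's whole lifetime.
        limiter.release()


async def main() -> None:
    async def chunks() -> AsyncIterator[int]:
        for i in range(3):
            yield i

    limiter = FakeLimiter()
    received = [c async for c in hold_slot(chunks(), limiter)]
    assert received == [0, 1, 2]
    assert limiter.released


asyncio.run(main())
```

Because the wrapper is parameterized over `_ChunkT`, mypy preserves the chunk type end to end and neither `type: ignore` is needed.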

Comment on lines 725 to 726
args: dict[str, Any] = parsed if isinstance(parsed, dict) else {}
return ToolCall(id=self.id, name=self.name, arguments=args)

medium

If json.loads produces a valid JSON value that is not an object (e.g., a list or string), it's silently converted to an empty dictionary. This could hide issues with the data from the provider, as tool arguments are expected to be a JSON object. For better robustness, consider treating this as a failure case, similar to a JSONDecodeError.

        if not isinstance(parsed, dict):
            logger.warning(
                PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
                tool_name=self.name,
                tool_id=self.id,
                reason="arguments_not_a_json_object",
                args_type=type(parsed).__name__,
            )
            return None
        return ToolCall(id=self.id, name=self.name, arguments=parsed)

Comment on lines +130 to +133
# Prune timestamps outside the window
self._request_timestamps = [
t for t in self._request_timestamps if t > cutoff
]

medium

The current implementation of pruning timestamps by creating a new list can be inefficient under high RPMs. For better performance, consider using a collections.deque for _request_timestamps. This allows for efficient O(1) removal of old timestamps.

This would require three changes:

  1. Import deque from collections.
  2. Change the type of _request_timestamps in __init__ to deque[float].
  3. Replace this list comprehension with a loop that uses popleft():
# In _wait_for_rpm_slot
cutoff = now - window
while self._request_timestamps and self._request_timestamps[0] <= cutoff:
    self._request_timestamps.popleft()


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/config/schema.py`:
- Around line 65-74: The _validate_delay_ordering model validator in RetryConfig
currently raises ValueError without logging; update the method to log the
failure at WARNING or ERROR level with structured context (include base_delay
and max_delay and any identifying fields on self) immediately before raising
ValueError — call the module/class logger (e.g., logger.error or logger.warning)
with a clear message like "RetryConfig validation failed: base_delay X >
max_delay Y" and include self or relevant attributes for debugging, then raise
the same ValueError as before.

In `@src/ai_company/providers/resilience/rate_limiter.py`:
- Around line 104-106: In the pause() validation branch inside the RateLimiter
(the if block that checks "if not math.isfinite(seconds) or seconds < 0"), add a
log call at WARNING or ERROR level before raising the ValueError; include the
invalid seconds value and relevant context (e.g., method name RateLimiter.pause
and any identifying instance info such as self.name or self._id if available) so
the message clearly shows what input failed, then raise the same ValueError as
before.

In `@src/ai_company/providers/resilience/retry.py`:
- Around line 118-120: The current code clamps provider-specified retry_after
down with min(exc.retry_after, self._config.max_delay) which can make us retry
earlier than the provider requested; change that to return max(exc.retry_after,
self._config.max_delay) so we never choose a value smaller than the provider
hint (keep the RateLimitError check and reference self._config.max_delay when
updating the return expression in the retry logic in retry.py).

In `@tests/integration/providers/test_retry_integration.py`:
- Line 32: Update the patch target string constant _PATCH_ACOMPLETION to the
driver-qualified name used by the code under test (e.g.
"ai_company.providers.drivers.litellm_driver._litellm.acompletion") so the test
patches the symbol as imported by the driver (replace the current value
"litellm.acompletion" with the fully qualified import path the driver uses).
- Around line 207-210: The empty async generator _empty_stream currently uses an
unreachable "return" then "yield" pattern which is valid but unclear; replace
the body with a clearer pattern such as "if False: yield  # pragma: no cover"
(keeping the async def _empty_stream() signature and type ignore) so Python
still recognizes it as an async generator while removing the misleading return
statement.

In `@tests/unit/providers/resilience/test_retry.py`:
- Around line 137-145: The test contains a redundant local import of
AuthenticationError inside test_zero_retries_non_retryable_raises_unwrapped;
remove that inner "from ai_company.providers.errors import AuthenticationError"
so the test uses the AuthenticationError already imported at the top of the
module, leaving the rest of the test (handler = RetryHandler(...), AsyncMock
side_effect, pytest.raises assertion, and func.assert_awaited_once()) unchanged.

In `@tests/unit/providers/test_protocol.py`:
- Around line 428-442: The test
test_rate_limit_error_with_retry_after_triggers_pause should also assert that
the rate limiter's semaphore is released on the exception path: after invoking
await provider.complete([...]) and catching RateLimitError, add assertions that
mock_limiter.release (and optionally mock_limiter.acquire) were called to ensure
no semaphore leak; update the assertions to call
mock_limiter.release.assert_called_once_with() (and/or check acquire was
awaited) on the _RateLimitProvider instance's mocked _rate_limiter.
- Around line 419-421: The test mutates provider._rate_limiter directly; instead
modify the constructors so _ConcreteProvider and _RateLimitProvider accept a
rate_limiter kw-only arg and forward it to BaseCompletionProvider's rate_limiter
parameter; update their __init__ signatures to accept rate_limiter:
Optional[...] and call super().__init__(..., rate_limiter=rate_limiter) (or pass
it along from _RateLimitProvider to its parent) and change test to construct
_ConcreteProvider(rate_limiter=mock_limiter) rather than assigning to the
private attribute.

ℹ️ Review info

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4f9a64c and 15d8adc.

📒 Files selected for processing (19)
  • src/ai_company/config/schema.py
  • src/ai_company/providers/base.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/resilience/retry.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/integration/providers/test_retry_integration.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/unit/budget/conftest.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/providers/drivers/conftest.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax with except A, B: (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, verify with mypy strict mode
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Follow immutability principle: create new objects instead of mutating existing ones
Use Pydantic v2 models with BaseModel, model_validator, and ConfigDict for data validation
Enforce 88 character line length limit (ruff enforces this)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly and never silently swallow exceptions
Validate at system boundaries: user input, external API responses, and configuration files

Files:

  • src/ai_company/providers/resilience/retry.py
  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/unit/budget/test_cost_record.py
  • tests/integration/providers/test_retry_integration.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/unit/budget/conftest.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • tests/unit/providers/drivers/conftest.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/base.py
  • tests/unit/providers/test_protocol.py
src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST import and use: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging or logging.getLogger() or print() in application code — use the structured logger from observability
Use logger as the variable name for the structured logger (not _logger, not log)
Use event name constants from ai_company.observability.events for all logging calls
Use structured logging format: logger.info(EVENT, key=value) — never use format strings like logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions

Files:

  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/config/schema.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/base.py
src/ai_company/**/{core,providers}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Pure data models, enums, and re-exports do NOT require logging

Files:

  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/drivers/litellm_driver.py
  • src/ai_company/providers/base.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use asyncio_mode = "auto" configuration — no manual @pytest.mark.asyncio needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider) — never use real vendor model IDs to avoid coupling tests to external providers

Files:

  • tests/integration/providers/test_tool_calling_pipeline.py
  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/conftest.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/unit/budget/test_cost_record.py
  • tests/integration/providers/test_retry_integration.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/unit/budget/conftest.py
  • tests/unit/providers/resilience/test_rate_limiter.py
  • tests/unit/providers/drivers/conftest.py
  • tests/unit/providers/test_protocol.py
🧠 Learnings (16)
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions

Applied to files:

  • src/ai_company/providers/resilience/retry.py
  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions

Applied to files:

  • src/ai_company/providers/resilience/retry.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers

Applied to files:

  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/conftest.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state

Applied to files:

  • tests/integration/providers/test_error_scenarios.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.

Applied to files:

  • tests/integration/providers/test_error_scenarios.py
  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.

Applied to files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)

Applied to files:

  • tests/integration/providers/test_anthropic_pipeline.py
  • tests/integration/providers/test_ollama_pipeline.py
  • tests/unit/providers/drivers/test_litellm_driver.py
  • tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/model_service.py} : Respect existing model configuration patterns in Ollama integration

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
  • tests/integration/providers/test_openrouter_pipeline.py
  • tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Ollama API response mocks must support both dict pattern (`models.get('models')`) and object pattern (`response.models`) to match actual API behavior.

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
  • tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/services/model_service.py : Default Ollama endpoint is `http://localhost:11434`. Respect existing model configuration patterns.

Applied to files:

  • tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/base.py : Use `threading.Semaphore` for LLM rate limiting in BaseAgent to limit concurrent requests to max 2.

Applied to files:

  • tests/unit/providers/resilience/test_rate_limiter.py
  • src/ai_company/providers/resilience/rate_limiter.py
  • src/ai_company/providers/base.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` for all logging calls

Applied to files:

  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST import and use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging` or `logging.getLogger()` or `print()` in application code — use the structured logger from observability

Applied to files:

  • src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Don't implement custom rate limiting - BaseAgent handles rate limiting with max 2 concurrent calls and respects configured timeout

Applied to files:

  • src/ai_company/providers/base.py
🧬 Code graph analysis (9)
src/ai_company/providers/resilience/retry.py (4)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/providers/errors.py (2)
  • ProviderError (22-69)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/config/schema.py (1)
  • RetryConfig (26-74)
tests/integration/providers/test_anthropic_pipeline.py (1)
tests/integration/providers/conftest.py (1)
  • user_messages (270-272)
tests/integration/providers/conftest.py (1)
src/ai_company/config/schema.py (3)
  • ProviderConfig (136-189)
  • ProviderModelConfig (101-133)
  • RetryConfig (26-74)
tests/integration/providers/test_retry_integration.py (6)
src/ai_company/config/schema.py (4)
  • ProviderConfig (136-189)
  • ProviderModelConfig (101-133)
  • RateLimiterConfig (77-98)
  • RetryConfig (26-74)
src/ai_company/providers/enums.py (2)
  • MessageRole (6-12)
  • StreamEventType (25-32)
src/ai_company/providers/errors.py (3)
  • AuthenticationError (72-75)
  • ProviderTimeoutError (125-128)
  • RateLimitError (78-104)
src/ai_company/providers/models.py (1)
  • ChatMessage (114-186)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
tests/integration/providers/conftest.py (1)
  • build_model_response (112-143)
tests/unit/providers/resilience/test_retry.py (4)
src/ai_company/providers/errors.py (4)
  • AuthenticationError (72-75)
  • ProviderConnectionError (131-134)
  • ProviderInternalError (137-140)
  • RateLimitError (78-104)
src/ai_company/config/schema.py (1)
  • RetryConfig (26-74)
src/ai_company/providers/resilience/errors.py (1)
  • RetryExhaustedError (6-31)
src/ai_company/providers/resilience/retry.py (3)
  • RetryHandler (28-127)
  • execute (42-102)
  • _compute_delay (104-127)
tests/unit/providers/drivers/conftest.py (1)
src/ai_company/config/schema.py (4)
  • ProviderConfig (136-189)
  • ProviderModelConfig (101-133)
  • RateLimiterConfig (77-98)
  • RetryConfig (26-74)
src/ai_company/providers/drivers/litellm_driver.py (4)
src/ai_company/providers/resilience/rate_limiter.py (2)
  • RateLimiter (18-151)
  • is_enabled (48-52)
src/ai_company/providers/resilience/retry.py (1)
  • RetryHandler (28-127)
src/ai_company/config/schema.py (1)
  • ProviderModelConfig (101-133)
src/ai_company/providers/models.py (2)
  • StreamChunk (285-363)
  • ToolCall (73-95)
src/ai_company/providers/base.py (4)
src/ai_company/providers/errors.py (2)
  • InvalidRequestError (113-116)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/rate_limiter.py (4)
  • RateLimiter (18-151)
  • acquire (54-84)
  • release (86-89)
  • pause (91-114)
src/ai_company/providers/resilience/retry.py (2)
  • RetryHandler (28-127)
  • execute (42-102)
src/ai_company/providers/models.py (2)
  • CompletionResponse (233-282)
  • StreamChunk (285-363)
tests/unit/providers/test_protocol.py (3)
src/ai_company/providers/errors.py (1)
  • RateLimitError (78-104)
src/ai_company/providers/resilience/rate_limiter.py (5)
  • RateLimiter (18-151)
  • acquire (54-84)
  • is_enabled (48-52)
  • release (86-89)
  • pause (91-114)
tests/unit/providers/conftest.py (1)
  • ModelCapabilitiesFactory (93-105)
🔇 Additional comments (31)
tests/unit/budget/conftest.py (1)

120-120: Model fixture update is aligned and safe.

Using "test-model-001" keeps this fixture consistent with the suite’s test-model convention.

As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider) — never use real vendor model IDs to avoid coupling tests to external providers".

tests/unit/budget/test_cost_record.py (1)

24-24: Expectation update is correct.

The assertion now matches the updated fixture model value and keeps tests decoupled from real vendor IDs.

As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider) — never use real vendor model IDs to avoid coupling tests to external providers".

src/ai_company/config/schema.py (1)

167-174: Provider-level resilience defaults are wired cleanly.

Using default_factory for both retry and rate_limiter avoids shared mutable defaults and keeps provider config construction safe.

src/ai_company/providers/resilience/retry.py (1)

64-70: Non-retryable failure path is correctly surfaced.

Logging at warning level before re-raise improves production visibility of terminal provider errors.

Based on learnings: "Applies to src/ai_company/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions".

src/ai_company/providers/resilience/rate_limiter.py (1)

65-77: Pause-window re-check loop is a solid fix.

The looped pause check prevents early resume when pause windows are extended concurrently.

tests/integration/providers/conftest.py (1)

20-240: Integration fixtures are consistently updated for resilience and test-model naming.

RetryConfig(max_retries=0) wiring and the test-model-* defaults are coherent across provider builders and streaming/non-streaming helpers.

As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. test-haiku-001, test-provider) — never use real vendor model IDs to avoid coupling tests to external providers".

tests/integration/providers/test_ollama_pipeline.py (1)

96-96: Updated model assertion is correct.

This aligns the pipeline expectation with the shared Ollama integration fixture data.

tests/integration/providers/test_tool_calling_pipeline.py (1)

261-280: Malformed streamed tool-call behavior is now tested clearly.

The updated expectation (len(tc_chunks) == 0) cleanly validates drop-on-malformed-JSON behavior.

tests/integration/providers/test_openrouter_pipeline.py (1)

58-58: LGTM!

The updates to use vendor-agnostic model identifiers (test-model-openrouter-001, test-model-openrouter-002) align with the coding guidelines for decoupling tests from real provider model IDs.

Also applies to: 112-122

src/ai_company/providers/base.py (3)

60-67: LGTM!

The constructor properly accepts optional retry_handler and rate_limiter dependencies, enabling resilience features to be injected by concrete drivers.


263-277: LGTM!

The _resilient_execute helper cleanly delegates to the retry handler when configured, falling back to direct execution otherwise.


303-335: Stream slot ownership is now correctly managed.

The implementation properly transfers concurrency slot ownership to a wrapper generator for streaming results, ensuring the slot is held for the full lifetime of the iterator. The _hold_slot_for_stream wrapper releases the slot in its finally block only after iteration completes or fails. This addresses the previous concern about premature slot release.

src/ai_company/providers/drivers/litellm_driver.py (5)

129-139: LGTM!

The resilience wiring correctly instantiates RetryHandler only when retries are configured (max_retries > 0) and RateLimiter only when enabled, then passes them to the base class constructor.


261-282: LGTM!

Enhanced validation with proper error logging for duplicate model IDs and alias collisions before raising ValueError. This improves debuggability.


603-621: LGTM!

Refactored to return a new merged dict instead of mutating the input, aligning with the immutability principle from the coding guidelines.


669-724: LGTM!

The _ToolCallAccumulator improvements properly track truncation state and return None on JSON parse failures, preventing malformed tool calls from propagating.


563-568: PEP 758 except syntax is valid and correctly implemented.

The except ValueError, TypeError: syntax is confirmed to be valid in Python 3.14 per PEP 758 (final). The code correctly uses this syntax without an as clause, which aligns with the project's coding guidelines. No changes needed.

tests/integration/providers/test_error_scenarios.py (1)

72-72: LGTM!

All LiteLLM exception constructors now use the vendor-agnostic anthropic/test-model-001 identifier, consistent with the coding guidelines.

Also applies to: 89-89, 99-99, 108-108, 122-122

tests/integration/providers/test_anthropic_pipeline.py (1)

50-50: LGTM!

Model assertions updated to use vendor-agnostic identifiers (test-model-001, test-model-002), consistent with the test helper defaults and coding guidelines.

Also applies to: 70-70, 85-88, 178-188

tests/unit/providers/resilience/test_retry.py (2)

1-39: LGTM!

Comprehensive test setup with proper markers and a _fast_config helper that creates quick RetryConfig instances with minimal delays suitable for testing.


41-259: LGTM!

Thorough test coverage for RetryHandler including success paths, exhaustion behavior, non-retryable error handling, backoff calculations (with and without jitter), retry-after semantics, and logging verification.

tests/unit/providers/resilience/test_rate_limiter.py (4)

1-36: LGTM!

Test setup properly covers disabled mode with appropriate assertions for is_enabled and no-op behavior.


74-78: Guard against over-release already flagged.

This test documents that release() without prior acquire() doesn't crash, but allows the semaphore counter to exceed the configured limit. This breaks the max_concurrent guarantee. A previous review already flagged this concern with a suggested fix to track in-flight requests.


38-72: LGTM!

Concurrency limit tests properly verify that the third acquire blocks until a slot is released, with appropriate cleanup handling.


81-266: LGTM!

Comprehensive test coverage for RPM controls, pause semantics (including extension and rejection of invalid values), throttling behavior with mocked time, and logging verification.

tests/integration/providers/test_retry_integration.py (2)

35-71: LGTM!

Well-structured test helpers with vendor-agnostic identifiers and configurable retry/rate-limiter settings.


73-188: LGTM!

Comprehensive integration tests covering retry success after transient failure, retry exhaustion, non-retryable error handling, rate-limit with retry-after, streaming path retries, and disabled-retry behavior.

Also applies to: 190-273

tests/unit/providers/drivers/conftest.py (2)

21-34: Resilience defaults are wired cleanly for driver tests.

The retry/rate_limiter fixture parameters and defaults keep tests deterministic while still allowing explicit override paths.

Also applies to: 57-59


38-46: Model fixture IDs are now consistently decoupled from vendor names.

Using test-model-* in both provider config and mock responses keeps resolution tests stable and provider-agnostic.

Also applies to: 79-79

tests/unit/providers/drivers/test_litellm_driver.py (1)

98-99: Model-resolution assertions are consistently updated across all relevant tests.

Alias resolution, direct ID resolution, and capabilities checks all align with the new test-model-* fixture contract.

Also applies to: 130-131, 140-145, 708-709, 726-727

tests/unit/providers/test_protocol.py (1)

376-407: The dedicated rate-limit test provider improves resilience test clarity.

_RateLimitProvider cleanly isolates retry-after behavior and keeps these tests focused on base-provider resilience flow.

Comment on lines +65 to +74
@model_validator(mode="after")
def _validate_delay_ordering(self) -> Self:
    """Ensure base_delay does not exceed max_delay."""
    if self.base_delay > self.max_delay:
        msg = (
            f"base_delay ({self.base_delay}) must be"
            f" <= max_delay ({self.max_delay})"
        )
        raise ValueError(msg)
    return self

⚠️ Potential issue | 🟡 Minor

Log RetryConfig validation failures before raising.

_validate_delay_ordering raises ValueError directly without emitting a warning/error log. Add structured context before the raise.

🔧 Suggested patch
     @model_validator(mode="after")
     def _validate_delay_ordering(self) -> Self:
         """Ensure base_delay does not exceed max_delay."""
         if self.base_delay > self.max_delay:
             msg = (
                 f"base_delay ({self.base_delay}) must be"
                 f" <= max_delay ({self.max_delay})"
             )
+            logger.warning(
+                CONFIG_VALIDATION_FAILED,
+                model="RetryConfig",
+                error=msg,
+                base_delay=self.base_delay,
+                max_delay=self.max_delay,
+            )
             raise ValueError(msg)
         return self
As per coding guidelines: "Log all error paths at WARNING or ERROR level with context before raising exceptions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/config/schema.py` around lines 65 - 74, The
_validate_delay_ordering model validator in RetryConfig currently raises
ValueError without logging; update the method to log the failure at WARNING or
ERROR level with structured context (include base_delay and max_delay and any
identifying fields on self) immediately before raising ValueError — call the
module/class logger (e.g., logger.error or logger.warning) with a clear message
like "RetryConfig validation failed: base_delay X > max_delay Y" and include
self or relevant attributes for debugging, then raise the same ValueError as
before.

Comment on lines +104 to +106
if not math.isfinite(seconds) or seconds < 0:
    msg = f"pause seconds must be a finite non-negative number, got {seconds!r}"
    raise ValueError(msg)

⚠️ Potential issue | 🟡 Minor

Log invalid pause() input before raising.

This validation path raises ValueError without warning/error logging. Add contextual logging first.

🔧 Suggested patch
         if not math.isfinite(seconds) or seconds < 0:
             msg = f"pause seconds must be a finite non-negative number, got {seconds!r}"
+            logger.warning(
+                PROVIDER_RATE_LIMITER_THROTTLED,
+                provider=self._provider_name,
+                reason="invalid_pause_seconds",
+                seconds=seconds,
+            )
             raise ValueError(msg)
As per coding guidelines: "Log all error paths at WARNING or ERROR level with context before raising exceptions".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/providers/resilience/rate_limiter.py` around lines 104 - 106,
In the pause() validation branch inside the RateLimiter (the if block that
checks "if not math.isfinite(seconds) or seconds < 0"), add a log call at
WARNING or ERROR level before raising the ValueError; include the invalid
seconds value and relevant context (e.g., method name RateLimiter.pause and any
identifying instance info such as self.name or self._id if available) so the
message clearly shows what input failed, then raise the same ValueError as
before.

Comment on lines +118 to +120
if isinstance(exc, RateLimitError) and exc.retry_after is not None:
    return min(exc.retry_after, self._config.max_delay)


⚠️ Potential issue | 🟠 Major

Do not clamp retry_after below the provider hint.

Using min(retry_after, max_delay) can retry earlier than requested by the provider, which can cause repeated 429s and slower recovery.

🔧 Suggested patch
         if isinstance(exc, RateLimitError) and exc.retry_after is not None:
-            return min(exc.retry_after, self._config.max_delay)
+            # Provider Retry-After is authoritative; do not under-wait.
+            return exc.retry_after
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        if isinstance(exc, RateLimitError) and exc.retry_after is not None:
-            return min(exc.retry_after, self._config.max_delay)
+        if isinstance(exc, RateLimitError) and exc.retry_after is not None:
+            # Provider Retry-After is authoritative; do not under-wait.
+            return exc.retry_after
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/providers/resilience/retry.py` around lines 118 - 120, The
current code clamps the provider-specified retry_after down with
min(exc.retry_after, self._config.max_delay), which can make us retry earlier
than the provider requested; return exc.retry_after unchanged instead, so the
delay is never reduced below the provider hint (keep the RateLimitError check
when updating the return expression in the retry logic in retry.py).


pytestmark = pytest.mark.timeout(30)

_PATCH_ACOMPLETION = "litellm.acompletion"

⚠️ Potential issue | 🟡 Minor

Patch target may not intercept driver calls.

The patch target "litellm.acompletion" patches the original module, but the driver imports it as import litellm as _litellm. Other integration tests use "ai_company.providers.drivers.litellm_driver._litellm.acompletion" which patches the name as looked up by the driver.

🔧 Suggested fix
-_PATCH_ACOMPLETION = "litellm.acompletion"
+_PATCH_ACOMPLETION = "ai_company.providers.drivers.litellm_driver._litellm.acompletion"
📝 Committable suggestion


Suggested change
-_PATCH_ACOMPLETION = "litellm.acompletion"
+_PATCH_ACOMPLETION = "ai_company.providers.drivers.litellm_driver._litellm.acompletion"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/providers/test_retry_integration.py` at line 32, Update the
patch target string constant _PATCH_ACOMPLETION to the driver-qualified name
used by the code under test (e.g.
"ai_company.providers.drivers.litellm_driver._litellm.acompletion") so the test
patches the symbol as imported by the driver (replace the current value
"litellm.acompletion" with the fully qualified import path the driver uses).
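The patch-target rule behind this finding can be demonstrated with a toy module. `toy_driver` and `_dumps` here are invented stand-ins for the driver and its SDK alias; the sketch only shows that `mock.patch` must target the name where it is looked up when the importing module rebinds it:

```python
import sys
import types
from unittest import mock

# Toy "driver" module that rebinds its dependency into its own namespace,
# analogous to a driver-local alias of a provider SDK call.
driver = types.ModuleType("toy_driver")
exec(
    "from json import dumps as _dumps\n"
    "def encode(obj):\n"
    "    return _dumps(obj)",
    driver.__dict__,
)
sys.modules["toy_driver"] = driver

# Patching the original module does NOT intercept the driver's call,
# because toy_driver._dumps still points at the original function...
with mock.patch("json.dumps", return_value="patched"):
    assert driver.encode({}) != "patched"

# ...but patching the name as the driver looks it up does.
with mock.patch("toy_driver._dumps", return_value="patched"):
    assert driver.encode({}) == "patched"
```

Whether `"litellm.acompletion"` intercepts the real driver depends on exactly how the driver binds the name; patching the driver-qualified path is the convention that works regardless.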

Comment on lines +207 to +210
async def _empty_stream() -> None:  # type: ignore[misc]
    # Async generator that yields nothing (makes it an async iterable)
    return
    yield  # pragma: no cover

🧹 Nitpick | 🔵 Trivial

Async generator pattern is correct but could be clearer.

The return followed by yield is a valid pattern to create an empty async generator. The yield is unreachable but necessary for Python to recognize this as an async generator. Consider using an alternative pattern for clarity.

♻️ Alternative pattern
-        async def _empty_stream() -> None:  # type: ignore[misc]
-            # Async generator that yields nothing (makes it an async iterable)
-            return
-            yield  # pragma: no cover
+        async def _empty_stream():
+            # Async generator that yields nothing
+            if False:
+                yield  # pragma: no cover
📝 Committable suggestion


Suggested change
-        async def _empty_stream() -> None:  # type: ignore[misc]
-            # Async generator that yields nothing (makes it an async iterable)
-            return
-            yield  # pragma: no cover
+        async def _empty_stream():
+            # Async generator that yields nothing
+            if False:
+                yield  # pragma: no cover
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/providers/test_retry_integration.py` around lines 207 -
210, The empty async generator _empty_stream currently uses an unreachable
"return" then "yield" pattern which is valid but unclear; replace the body with
a clearer pattern such as "if False: yield  # pragma: no cover" (keeping the
async def _empty_stream() signature and type ignore) so Python still recognizes
it as an async generator while removing the misleading return statement.
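Both empty-async-generator spellings behave identically; a quick standalone check (not the PR's code) confirms either yields nothing:

```python
import asyncio


async def empty_via_return():
    # `return` before an unreachable `yield`: the yield still marks this
    # function as an async generator at compile time.
    return
    yield


async def empty_via_guard():
    # The reviewer's alternative: the yield is syntactically present but
    # guarded so it can never execute.
    if False:
        yield


async def collect():
    a = [chunk async for chunk in empty_via_return()]
    b = [chunk async for chunk in empty_via_guard()]
    return a, b


a, b = asyncio.run(collect())
```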

Comment on lines +137 to +145
async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None:
    handler = RetryHandler(_fast_config(max_retries=0))
    from ai_company.providers.errors import AuthenticationError

    error = AuthenticationError("bad key")
    func = AsyncMock(side_effect=error)
    with pytest.raises(AuthenticationError):
        await handler.execute(func)
    func.assert_awaited_once()

🧹 Nitpick | 🔵 Trivial

Remove redundant import.

AuthenticationError is already imported at line 14. The local import at line 139 is unnecessary.

♻️ Suggested fix
     async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None:
         handler = RetryHandler(_fast_config(max_retries=0))
-        from ai_company.providers.errors import AuthenticationError
-
         error = AuthenticationError("bad key")
         func = AsyncMock(side_effect=error)
         with pytest.raises(AuthenticationError):
📝 Committable suggestion


Suggested change
-    async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None:
-        handler = RetryHandler(_fast_config(max_retries=0))
-        from ai_company.providers.errors import AuthenticationError
-
-        error = AuthenticationError("bad key")
-        func = AsyncMock(side_effect=error)
-        with pytest.raises(AuthenticationError):
-            await handler.execute(func)
-        func.assert_awaited_once()
+    async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None:
+        handler = RetryHandler(_fast_config(max_retries=0))
+        error = AuthenticationError("bad key")
+        func = AsyncMock(side_effect=error)
+        with pytest.raises(AuthenticationError):
+            await handler.execute(func)
+        func.assert_awaited_once()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/resilience/test_retry.py` around lines 137 - 145, The
test contains a redundant local import of AuthenticationError inside
test_zero_retries_non_retryable_raises_unwrapped; remove that inner "from
ai_company.providers.errors import AuthenticationError" so the test uses the
AuthenticationError already imported at the top of the module, leaving the rest
of the test (handler = RetryHandler(...), AsyncMock side_effect, pytest.raises
assertion, and func.assert_awaited_once()) unchanged.

Comment on lines +419 to +421
provider = _ConcreteProvider()
provider._rate_limiter = mock_limiter


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Inspect BaseCompletionProvider to see if it supports rate_limiter parameter
find . -type f -name "*.py" | xargs grep -l "class BaseCompletionProvider" | head -5

Repository: Aureliolo/ai-company

Length of output: 98


🏁 Script executed:

# Check the test file at the specific lines mentioned
sed -n '415,440p' tests/unit/providers/test_protocol.py

Repository: Aureliolo/ai-company

Length of output: 1061


🏁 Script executed:

# Get the full BaseCompletionProvider class definition
rg -A 20 "class BaseCompletionProvider" --type py

Repository: Aureliolo/ai-company

Length of output: 1529


🏁 Script executed:

# Get the complete __init__ signature of BaseCompletionProvider
sed -n '1,100p' src/ai_company/providers/base.py | grep -A 15 "def __init__"

Repository: Aureliolo/ai-company

Length of output: 487


🏁 Script executed:

# Find _ConcreteProvider and _RateLimitProvider class definitions in test file
rg -B 3 -A 8 "class _ConcreteProvider|class _RateLimitProvider" tests/unit/providers/test_protocol.py

Repository: Aureliolo/ai-company

Length of output: 724


Inject the rate limiter through constructor instead of mutating private state.

Direct assignment to provider._rate_limiter violates the immutability principle. The parent BaseCompletionProvider already accepts rate_limiter as a keyword-only parameter, so both _ConcreteProvider and _RateLimitProvider should accept and forward it:

♻️ Proposed refactor
class _ConcreteProvider(BaseCompletionProvider):
     """Concrete subclass of BaseCompletionProvider for testing."""

-    def __init__(self) -> None:
-        super().__init__()
+    def __init__(self, *, rate_limiter: RateLimiter | None = None) -> None:
+        super().__init__(rate_limiter=rate_limiter)
         self._caps = ModelCapabilitiesFactory.build()


class _RateLimitProvider(BaseCompletionProvider):
     """Provider that always raises RateLimitError with retry_after."""

-    def __init__(self, retry_after: float | None = None) -> None:
-        super().__init__()
+    def __init__(
+        self,
+        retry_after: float | None = None,
+        *,
+        rate_limiter: RateLimiter | None = None,
+    ) -> None:
+        super().__init__(rate_limiter=rate_limiter)
         self._retry_after = retry_after


-        provider = _ConcreteProvider()
-        provider._rate_limiter = mock_limiter
+        provider = _ConcreteProvider(rate_limiter=mock_limiter)

-        provider = _RateLimitProvider(retry_after=5.0)
-        provider._rate_limiter = mock_limiter
+        provider = _RateLimitProvider(retry_after=5.0, rate_limiter=mock_limiter)
📝 Committable suggestion


Suggested change
-        provider = _ConcreteProvider()
-        provider._rate_limiter = mock_limiter
+        provider = _ConcreteProvider(rate_limiter=mock_limiter)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/test_protocol.py` around lines 419 - 421, The test
mutates provider._rate_limiter directly; instead modify the constructors so
_ConcreteProvider and _RateLimitProvider accept a rate_limiter kw-only arg and
forward it to BaseCompletionProvider's rate_limiter parameter; update their
__init__ signatures to accept rate_limiter: Optional[...] and call
super().__init__(..., rate_limiter=rate_limiter) (or pass it along from
_RateLimitProvider to its parent) and change test to construct
_ConcreteProvider(rate_limiter=mock_limiter) rather than assigning to the
private attribute.

Comment on lines +428 to +442
async def test_rate_limit_error_with_retry_after_triggers_pause(self) -> None:
    """RateLimitError with retry_after triggers rate_limiter.pause."""
    mock_limiter = MagicMock(spec=RateLimiter)
    mock_limiter.acquire = AsyncMock()
    mock_limiter.is_enabled = True

    provider = _RateLimitProvider(retry_after=5.0)
    provider._rate_limiter = mock_limiter

    msg = ChatMessage(role=MessageRole.USER, content="Hi")
    with pytest.raises(RateLimitError):
        await provider.complete([msg], "test-model")

    mock_limiter.pause.assert_called_once_with(5.0)


🧹 Nitpick | 🔵 Trivial

Assert limiter release on the exception path, not only pause behavior.

This test should also check release() (and optionally acquire()) to guard against semaphore leaks when rate-limited calls fail.

✅ Test hardening patch
         with pytest.raises(RateLimitError):
             await provider.complete([msg], "test-model")

+        mock_limiter.acquire.assert_awaited_once()
         mock_limiter.pause.assert_called_once_with(5.0)
+        mock_limiter.release.assert_called_once()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/test_protocol.py` around lines 428 - 442, The test
test_rate_limit_error_with_retry_after_triggers_pause should also assert that
the rate limiter's semaphore is released on the exception path: after invoking
await provider.complete([...]) and catching RateLimitError, add assertions that
mock_limiter.release (and optionally mock_limiter.acquire) were called to ensure
no semaphore leak; update the assertions to call
mock_limiter.release.assert_called_once_with() (and/or check acquire was
awaited) on the _RateLimitProvider instance's mocked _rate_limiter.
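The leak this hardened test guards against comes down to releasing the concurrency slot on every path. A minimal sketch of the acquire/try/finally shape (hypothetical names, not the PR's `_rate_limited_call`):

```python
import asyncio


async def rate_limited_call(sem: asyncio.Semaphore, fn):
    """Hold a concurrency slot for the call; always release, even on error."""
    await sem.acquire()
    try:
        return await fn()
    finally:
        sem.release()  # no slot leak when fn() raises


async def main() -> int:
    sem = asyncio.Semaphore(1)

    async def boom():
        raise RuntimeError("provider 429")

    try:
        await rate_limited_call(sem, boom)
    except RuntimeError:
        pass

    # The slot was released on the exception path, so a second call
    # can still acquire it instead of deadlocking.
    async def ok():
        return 42

    return await rate_limited_call(sem, ok)


result = asyncio.run(main())
```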

Copilot AI left a comment

Pull request overview

Copilot reviewed 29 out of 30 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/ai_company/providers/drivers/litellm_driver.py:86

  • The TYPE_CHECKING imports here (AsyncGenerator, AsyncIterator, ProviderConfig, ProviderModelConfig) are referenced in non-quoted runtime annotations elsewhere in this module (e.g., method return types and __init__(..., config: ProviderConfig)). On Python >=3.14 (PEP 649), annotations can be evaluated at runtime, so these names need to be available at runtime to avoid NameError. Please import these types unconditionally (with # noqa: TC00x as needed) or switch the annotations to strings.
if TYPE_CHECKING:
    from collections.abc import AsyncGenerator, AsyncIterator

    from ai_company.config.schema import ProviderConfig, ProviderModelConfig
    from ai_company.providers.models import (
        ChatMessage,
        CompletionConfig,
        ToolDefinition,
    )

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +39 to +44
def __init__(self, config: RetryConfig) -> None:
    self._config = config

async def execute(
    self,
    func: Callable[[], Coroutine[object, object, T]],

Copilot AI Mar 3, 2026


Callable/Coroutine and RetryConfig are imported under TYPE_CHECKING but used in runtime type annotations (__init__, execute). With Python >=3.14 (PEP 649) these names must be importable at runtime when annotations are evaluated/inspected, otherwise you'll get NameError. Import these types unconditionally (e.g., from collections.abc import Callable, Coroutine and from .config import RetryConfig with the appropriate # noqa: TC00x), or alternatively string-quote the annotations.

Suggested change
-    def __init__(self, config: RetryConfig) -> None:
-        self._config = config
-
-    async def execute(
-        self,
-        func: Callable[[], Coroutine[object, object, T]],
+    def __init__(self, config: "RetryConfig") -> None:
+        self._config = config
+
+    async def execute(
+        self,
+        func: "Callable[[], Coroutine[object, object, T]]",

@greptile-apps

greptile-apps bot commented Mar 4, 2026

Greptile Summary

This PR introduces a production-grade resilience layer (RetryHandler, RateLimiter, RetryExhaustedError) and wires it automatically into BaseCompletionProvider, so every current and future driver inherits retry-with-backoff and sliding-window rate limiting without any per-driver code. The architecture is clean: config lives in config/schema.py to break circular imports, the base class exposes two internal helpers (_resilient_execute and _rate_limited_call), and the engine layer can distinguish exhausted retries via RetryExhaustedError for fallback-chain orchestration.

Key changes:

  • RetryHandler: exponential backoff + jitter, retry-after header respect, non-retryable short-circuit, correct 1+max_retries loop semantics.
  • RateLimiter: sliding-window RPM with an asyncio lock, asyncio semaphore concurrency cap, pause-until from retry_after hints; interaction between retry sleep and rate-limiter pause is correctly designed so server-mandated wait times are always honoured.
  • base.py: wires retry/rate-limiting into complete() and stream() via _resilient_execute and _rate_limited_call; note, however, that non-retryable errors produce duplicate log entries when a RetryHandler is configured (one from the retry handler, one from the exception handler).
  • litellm_driver.py: additional logging gaps filled; _apply_completion_config refactored from mutating to returning a new dict (immutability improvement); _ToolCallAccumulator gains a _truncated guard to stop accumulating fragments after the 1 MiB limit is hit; tool calls with unparseable JSON arguments now return None (dropped) instead of being passed with empty arguments — a behavior change not reflected in the PR description.
  • Tests: 50 unit tests + 7 integration tests; existing test fixtures updated to use vendor-agnostic model IDs per CLAUDE.md convention.

Notable findings:

  1. Double-logging in error path: Non-retryable ProviderErrors are logged twice when retry is configured — once by RetryHandler and again by the exception handler in complete().
  2. Unreachable guard in RateLimiter.acquire(): The early-return condition can never execute from normal provider usage since disabled limiters are never stored.
  3. Undocumented behavior change: Tool calls with invalid JSON now drop instead of passing through with empty args — safety improvement but should be documented.

Confidence Score: 4/5

  • Safe to merge — core resilience logic is correct and well-tested; concerns are observability and code quality, not correctness.
  • The PR implements well-designed retry and rate-limiting logic with correct loop bounds, proper interaction between retry sleep and rate-limiter pause, and 94.7% test coverage. Three findings remain: (1) non-retryable errors are double-logged when retry is configured, creating noisy logs; (2) an unreachable early-return guard exists in RateLimiter but is harmless dead code; (3) tool calls with unparseable JSON now drop instead of passing with empty args — a safety improvement but undocumented. None are correctness bugs, and all 1540 existing tests still pass.
  • src/ai_company/providers/base.py (double-logging), src/ai_company/providers/resilience/rate_limiter.py (unreachable guard), src/ai_company/providers/drivers/litellm_driver.py (undocumented behavior change)

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Base as BaseCompletionProvider
    participant Retry as RetryHandler
    participant RL as RateLimiter
    participant Driver as _do_complete / _do_stream

    Caller->>Base: complete(messages, model)
    Base->>Base: _validate_messages / _validate_model
    Base->>Retry: execute(_attempt)

    loop up to 1 + max_retries attempts
        Retry->>Base: call _attempt()
        Base->>RL: acquire() [pause check → RPM window → semaphore]
        RL-->>Base: slot granted

        Base->>Driver: _do_complete(...)
        alt success
            Driver-->>Base: CompletionResponse
            Base->>RL: release() [via finally]
            Base-->>Retry: result
            Retry-->>Base: result
            Base-->>Caller: CompletionResponse
        else RateLimitError (retry_after)
            Driver-->>Base: RateLimitError
            Base->>RL: pause(retry_after)
            Base->>RL: release() [via finally]
            Base-->>Retry: RateLimitError
            Retry->>Retry: sleep(min(retry_after, max_delay))
        else transient ProviderError
            Driver-->>Base: ProviderError(is_retryable=True)
            Base->>RL: release() [via finally]
            Base-->>Retry: ProviderError
            Retry->>Retry: sleep(exponential backoff + jitter)
        else non-retryable ProviderError
            Driver-->>Base: ProviderError(is_retryable=False)
            Base->>RL: release() [via finally]
            Base-->>Retry: ProviderError
            Retry-->>Base: re-raise immediately
            Base-->>Caller: ProviderError
        end
    end

    Retry-->>Base: RetryExhaustedError
    Base-->>Caller: RetryExhaustedError

Comments Outside Diff (1)

  1. General comment

    Behavior change in tool call handling with unparseable JSON

    This change (from parsed = {} to return None) means tool calls with invalid JSON arguments are now silently dropped rather than passed through with empty arguments. This is a behavior change that wasn't mentioned in the PR description.

    Previous behavior: When JSON parsing failed, the function would set parsed = {} and continue to create a ToolCall with empty arguments.

    Current behavior: When JSON parsing fails, the function returns None, dropping the tool call entirely.

    Consider: If this is intentional (and safer), document it in the PR description or commit message. If not intentional, revert to the previous behavior.


Last reviewed commit: d555773

Comment on lines 721 to +724
```diff
         tool_id=self.id,
         args_length=len(self.arguments) if self.arguments else 0,
     )
-    parsed = {}
+    return None
```

behavior change: tool calls with unparseable JSON now return None instead of empty dict {}

this change (from parsed = {} to return None) means tool calls with invalid JSON are now silently dropped rather than passed through with empty arguments. this is safer but represents a behavior change that wasn't mentioned in the PR description.

consider: if this is intentional, document it in the PR description or commit message. if not, revert to parsed = {}


Aureliolo and others added 2 commits March 5, 2026 07:52
- Non-ProviderError exceptions raise immediately without retry (3 tests)
- Rate limiter semaphore release on non-RateLimitError
- Streaming rate limiter slot held until consumed / released on close
- RetryConfig _validate_delay_ordering rejects base_delay > max_delay
- RetryConfig/RateLimiterConfig allow_inf_nan=False enforced (3 tests)
- Integration test for concurrent retry + rate-limit together

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Polyfactory generates random values that can violate the
base_delay <= max_delay cross-field constraint added in the previous
commit. Pin both configs to their defaults so the factory test is
deterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 5, 2026 06:56
@Aureliolo Aureliolo merged commit b890545 into main Mar 5, 2026
12 checks passed
@Aureliolo Aureliolo deleted the feat/provider-resilience branch March 5, 2026 06:58

Copilot AI left a comment


Pull request overview

Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/ai_company/providers/drivers/litellm_driver.py:86

  • ProviderConfig/ProviderModelConfig, ChatMessage/CompletionConfig/ToolDefinition, and AsyncIterator/AsyncGenerator are imported only under TYPE_CHECKING but are used in runtime annotations throughout this module (e.g. __init__, _do_stream, _wrap_stream). This is inconsistent with other provider modules that keep annotation types importable at runtime via # noqa: TC00x (see src/ai_company/providers/routing/resolver.py:11). Consider moving these imports out of TYPE_CHECKING and adding the appropriate # noqa: TC00x so annotation evaluation can’t fail at runtime.
if TYPE_CHECKING:
    from collections.abc import AsyncGenerator, AsyncIterator

    from ai_company.config.schema import ProviderConfig, ProviderModelConfig
    from ai_company.providers.models import (
        ChatMessage,
        CompletionConfig,
        ToolDefinition,
    )


Comment on lines +5 to +22
from typing import TYPE_CHECKING, TypeVar

from ai_company.observability import get_logger
from ai_company.observability.events import (
    PROVIDER_CALL_ERROR,
    PROVIDER_RETRY_ATTEMPT,
    PROVIDER_RETRY_EXHAUSTED,
    PROVIDER_RETRY_SKIPPED,
)
from ai_company.providers.errors import ProviderError, RateLimitError

from .errors import RetryExhaustedError

if TYPE_CHECKING:
    from collections.abc import Callable, Coroutine

    from .config import RetryConfig



Copilot AI Mar 5, 2026


Callable, Coroutine, and RetryConfig are imported only under TYPE_CHECKING, but they’re referenced in runtime annotations on RetryHandler.__init__/execute. In this repo, annotation types are typically imported at runtime with # noqa: TC00x (e.g., src/ai_company/providers/protocol.py:7, src/ai_company/providers/routing/resolver.py:11) to avoid NameError if annotations are evaluated (PEP 649). Import these names at runtime (and suppress TC00x) instead of guarding them with TYPE_CHECKING.

Comment on lines +113 to 117
try:
result = await self._resilient_execute(_attempt)
except Exception:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise

Double-logging for non-retryable errors when retry is configured

When max_retries > 0 (i.e., a RetryHandler is wired in) and a non-retryable ProviderError is raised, two separate log entries are emitted for the same error:

  1. RetryHandler.execute() logs PROVIDER_RETRY_SKIPPED at WARNING level (reason="non_retryable").
  2. The except Exception: block here logs PROVIDER_CALL_ERROR at ERROR level.

This creates noisy, duplicate log entries for every non-retryable failure (e.g. AuthError, InvalidRequestError). Callers reading logs see two separate events for what is a single failure.

Consider suppressing the PROVIDER_CALL_ERROR log here when the exception is a ProviderError with is_retryable=False — those are already fully described by PROVIDER_RETRY_SKIPPED. Alternatively, only log PROVIDER_CALL_ERROR for RetryExhaustedError and unexpected non-provider exceptions, and leave non-retryable ProviderErrors to RetryHandler's logging.

        try:
            result = await self._resilient_execute(_attempt)
        except RetryExhaustedError:
            logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
            raise
        except ProviderError as exc:
            if exc.is_retryable or self._retry_handler is None:
                logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
            raise
        except Exception:
            logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
            raise

Comment on lines +60 to +61
if not self.is_enabled and self._pause_until <= time.monotonic():
return

Early-return guard condition is unreachable in practice

if not self.is_enabled and self._pause_until <= time.monotonic():
    return

is_enabled is determined by the frozen RateLimiterConfig. In LiteLLMDriver.__init__(), the limiter is only passed to super().__init__() when rate_limiter.is_enabled is True:

rate_limiter=rate_limiter if rate_limiter.is_enabled else None,

So when self._rate_limiter is not None inside _rate_limited_call, is_enabled is always True, making not self.is_enabled always False. The early-return guard can therefore never trigger from normal provider usage.

Additionally, since a disabled limiter is never stored on the provider, pause() is never called on it, so _pause_until will always be 0.0. The compound condition is doubly unreachable.

If the guard is intended for direct consumers of RateLimiter, add a comment explaining that — or simplify acquire() to just skip the pause/RPM/semaphore steps individually when each sub-feature is disabled (which the code already does below the guard), removing the need for the compound early return.


Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.1](ai-company-v0.1.0...ai-company-v0.1.1)
(2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.1.0](v0.0.0...v0.1.0)
(2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>