feat: add retry handler, rate limiter, and provider resilience #100
Conversation
Implement automatic retry with exponential backoff + jitter and client-side rate limiting (RPM + concurrency) for the provider layer.

- Add RetryHandler with configurable backoff that respects Retry-After
- Add RateLimiter with sliding-window RPM and semaphore concurrency
- Add RetryExhaustedError for fallback-chain signaling
- Wire resilience into BaseCompletionProvider (retry + rate limiting)
- Add RetryConfig and RateLimiterConfig to ProviderConfig
- Add 5 observability event constants for retry/rate-limit tracking
- Add a resilience-audit agent to the PR review skill
- 57 new tests (unit + integration), 94.70% overall coverage
Dependency Review: ✅ No vulnerability, license, or OpenSSF Scorecard issues found. Scanned files: none.
⚠️ Caution: review failed; the pull request is closed.
📝 Summary by CodeRabbit

Walkthrough

Adds provider-level resilience: configurable retry and rate-limiting models, runtime implementations (RetryHandler, RateLimiter), wiring into BaseCompletionProvider and the LiteLLM driver, new observability events, extensive unit/integration tests, and docs/reviewer rules for resilience audits.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant BaseProv as BaseCompletionProvider
    participant RateLimiter
    participant RetryHandler
    participant Driver as ProviderDriver
    participant Observ as Observability
    Client->>BaseProv: complete()/stream() request
    BaseProv->>RateLimiter: acquire() (if enabled)
    alt RateLimiter pause or RateLimitError with retry_after
        RateLimiter-->>BaseProv: raises RateLimitError / enforces pause
        BaseProv->>RetryHandler: execute(func) (honors retry_after)
    else granted
        RateLimiter-->>BaseProv: granted
        BaseProv->>RetryHandler: execute(func)
    end
    RetryHandler->>Driver: call provider API (wrapped)
    alt transient retryable error
        Driver-->>RetryHandler: raises ProviderError (retryable)
        RetryHandler->>Observ: emit PROVIDER_RETRY_ATTEMPT
        RetryHandler-->>RetryHandler: wait backoff (compute_delay)
        RetryHandler->>Driver: retry call
    else success
        Driver-->>RetryHandler: response
    end
    RetryHandler-->>BaseProv: final result or RetryExhaustedError
    BaseProv->>RateLimiter: release() (if acquired)
    BaseProv->>Client: return/stream result
    Note right of Observ: emits attempt/exhausted/skipped/throttled/paused events
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the stability and reliability of the system's interactions with external completion providers. By introducing a centralized, configurable resilience layer, it ensures that transient network issues, provider timeouts, and rate limits are handled gracefully through automatic retries and intelligent throttling. This reduces the likelihood of service disruptions and makes the provider integration more robust against external fluctuations.
Pull request overview
Adds a provider-layer resilience stack (retry + rate limiting) and wires it into BaseCompletionProvider so all drivers inherit consistent backoff/throttling behavior.
Changes:
- Introduces `RetryHandler`, `RateLimiter`, and `RetryExhaustedError`, plus new observability event constants.
- Extends `ProviderConfig` with `retry` and `rate_limiter` configuration models.
- Adds comprehensive unit + integration test coverage and updates test helpers/docs for the new provider initialization pattern.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/providers/test_protocol.py | Updates test providers to call BaseCompletionProvider.__init__ for new resilience fields |
| tests/unit/providers/resilience/test_retry.py | Adds unit tests for retry behavior, backoff/jitter, and logging events |
| tests/unit/providers/resilience/test_rate_limiter.py | Adds unit tests for RPM/concurrency limiting, pause behavior, and logging events |
| tests/unit/providers/resilience/test_errors.py | Adds tests for RetryExhaustedError behavior and propagation |
| tests/unit/providers/resilience/test_config.py | Adds validation/default/frozen-model tests for resilience config models |
| tests/unit/providers/resilience/conftest.py | Adds shared fixtures for resilience tests |
| tests/unit/providers/drivers/conftest.py | Extends provider config factory to set explicit retry/rate-limiter defaults for driver tests |
| tests/integration/providers/test_retry_integration.py | Adds integration tests validating end-to-end retry + rate limiting wiring via LiteLLM |
| tests/integration/providers/conftest.py | Ensures integration provider configs disable retries by default to preserve existing expectations |
| src/ai_company/providers/resilience/retry.py | Implements RetryHandler (exponential backoff, jitter, retry_after, logging) |
| src/ai_company/providers/resilience/rate_limiter.py | Implements RateLimiter (sliding-window RPM, semaphore concurrency, pause support, logging) |
| src/ai_company/providers/resilience/errors.py | Introduces RetryExhaustedError wrapper error for fallback-chain signaling |
| src/ai_company/providers/resilience/config.py | Re-exports RetryConfig / RateLimiterConfig from config.schema |
| src/ai_company/providers/resilience/\_\_init\_\_.py | Package exports for resilience components |
| src/ai_company/providers/drivers/litellm_driver.py | Wires RetryHandler and RateLimiter into BaseCompletionProvider |
| src/ai_company/providers/base.py | Adds resilience execution helpers and wraps complete()/stream() with retry/limiting |
| src/ai_company/providers/\_\_init\_\_.py | Re-exports resilience public API from ai_company.providers |
| src/ai_company/observability/events.py | Adds new resilience observability event constants |
| src/ai_company/config/schema.py | Adds RetryConfig / RateLimiterConfig models and new fields on ProviderConfig |
| CLAUDE.md | Documents resilience invariants and rules for provider-layer code |
| .claude/skills/aurelio-review-pr/skill.md | Adds “resilience-audit” agent trigger and rule set for PR review automation |
```python
        return await self._rate_limited_call(
            self._do_stream,
            messages,
            model,
            tools=tools,
            config=config,
        )
```
stream() acquires/releases the rate limiter around the coroutine that returns an AsyncIterator, so the concurrency slot is released immediately after connection setup—before the stream is consumed. This makes max_concurrent ineffective for long-lived streams. Consider adding a stream-specific wrapper that holds the limiter slot until iteration completes (release in a finally inside the generator), and only releases early if connection setup fails.
Suggested change:

```python
        # For streaming, we must hold any rate-limiter slot for the
        # lifetime of the stream, not just for the connection setup
        # coroutine that returns an AsyncIterator.
        rate_limiter: RateLimiter | None = getattr(self, "_rate_limiter", None)
        if rate_limiter is None:
            # Fallback: no rate limiting configured.
            return await self._do_stream(
                messages,
                model,
                tools=tools,
                config=config,
            )
        # Acquire the limiter before establishing the stream.
        await rate_limiter.acquire()
        try:
            inner_stream = await self._do_stream(
                messages,
                model,
                tools=tools,
                config=config,
            )
        except Exception:
            # Connection/setup failed; release the slot immediately
            # so that retries or other callers can proceed.
            rate_limiter.release()
            raise

        async def _rate_limited_stream() -> AsyncIterator[StreamChunk]:
            try:
                async for chunk in inner_stream:
                    yield chunk
            finally:
                # Ensure the limiter slot is released once streaming
                # completes or if the consumer stops early.
                rate_limiter.release()

        return _rate_limited_stream()
```
src/ai_company/providers/base.py (Outdated)
```python
        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func*.
        """
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
        try:
            return await func(*args, **kwargs)
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            if self._rate_limiter is not None:
```
stream() acquires/releases the rate limiter around the coroutine that returns an AsyncIterator, so the concurrency slot is released immediately after connection setup—before the stream is consumed. This makes max_concurrent ineffective for long-lived streams. Consider adding a stream-specific wrapper that holds the limiter slot until iteration completes (release in a finally inside the generator), and only releases early if connection setup fails.
Suggested change:

```python
        For streaming calls (where *func* returns an ``AsyncIterator``),
        the rate limiter slot is held for the entire lifetime of the
        stream and only released when iteration completes.

        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func*. If *func* returns an
            ``AsyncIterator``, a wrapped ``AsyncIterator`` is returned
            that releases the rate limiter when it is exhausted or
            iteration is otherwise terminated.
        """
        acquired = False
        is_streaming_result = False
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
            acquired = True
        try:
            result = await func(*args, **kwargs)
            # If the call returns an AsyncIterator, wrap it so the rate
            # limiter is held for the lifetime of the stream and released
            # only after iteration completes.
            if acquired and isinstance(result, AsyncIterator):
                is_streaming_result = True

                async def _wrapped_stream() -> AsyncIterator[Any]:
                    try:
                        async for item in result:
                            yield item
                    finally:
                        if self._rate_limiter is not None:
                            self._rate_limiter.release()

                return _wrapped_stream()  # type: ignore[return-value]
            return result
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            # For non-streaming results, release immediately after the call.
            # For streaming results, the wrapped iterator will release when
            # iteration completes.
            if self._rate_limiter is not None and acquired and not is_streaming_result:
```
src/ai_company/providers/base.py (Outdated)
```python
        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func*.
        """
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
        try:
            return await func(*args, **kwargs)
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            if self._rate_limiter is not None:
```
stream() acquires/releases the rate limiter around the coroutine that returns an AsyncIterator, so the concurrency slot is released immediately after connection setup—before the stream is consumed. This makes max_concurrent ineffective for long-lived streams. Consider adding a stream-specific wrapper that holds the limiter slot until iteration completes (release in a finally inside the generator), and only releases early if connection setup fails.
Suggested change:

```python
        For streaming calls where *func* returns an ``AsyncIterator``,
        the rate limiter slot is held for the duration of stream
        consumption and released when iteration completes.

        Args:
            func: Async callable to invoke.
            *args: Positional arguments for *func*.
            **kwargs: Keyword arguments for *func*.

        Returns:
            The return value of *func* or a wrapped ``AsyncIterator``
            for streaming responses.
        """
        release_immediately = True
        if self._rate_limiter is not None:
            await self._rate_limiter.acquire()
        try:
            result = await func(*args, **kwargs)
            # If this is a streaming response, defer release until
            # the stream has been fully consumed.
            if self._rate_limiter is not None and isinstance(result, AsyncIterator):
                rate_limiter = self._rate_limiter

                async def _wrapped_stream() -> AsyncIterator[Any]:
                    try:
                        async for item in result:
                            yield item
                    finally:
                        # Ensure the limiter slot is released once
                        # the consumer finishes iterating the stream.
                        rate_limiter.release()

                release_immediately = False
                return _wrapped_stream()
            return result
        except RateLimitError as exc:
            if self._rate_limiter is not None and exc.retry_after is not None:
                self._rate_limiter.pause(exc.retry_after)
            raise
        finally:
            if self._rate_limiter is not None and release_immediately:
```
```python
    def release(self) -> None:
        """Release a concurrency slot."""
        if self._semaphore is not None:
            self._semaphore.release()
```
asyncio.Semaphore.release() can be over-called and increase the counter beyond the intended max_concurrent, which can silently disable concurrency limiting if a caller releases without acquiring (including due to future refactors/edge paths). Prefer preventing over-release by tracking acquisition count (and making extra releases a no-op with a debug log), or using a bounded mechanism and explicitly handling the error without letting the internal counter grow.
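One way to implement the guarded release this comment suggests is a small in-flight counter; the class and attribute names below are illustrative assumptions, not the PR's actual `RateLimiter`:

```python
import asyncio


class GuardedLimiter:
    """Illustrative limiter whose release() ignores calls without a matching acquire()."""

    def __init__(self, max_concurrent: int) -> None:
        self._semaphore = asyncio.Semaphore(max_concurrent)
        self._in_flight = 0

    async def acquire(self) -> None:
        await self._semaphore.acquire()
        self._in_flight += 1

    def release(self) -> None:
        # Only release when there is a matching acquire; extra calls are
        # no-ops, so the semaphore counter can never exceed max_concurrent.
        if self._in_flight > 0:
            self._in_flight -= 1
            self._semaphore.release()
```

Python's `asyncio.BoundedSemaphore` is an alternative, but it raises `ValueError` on over-release rather than ignoring it, so the counter approach matches the "no-op with a debug log" suggestion more closely.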
```python
            self._config.max_requests_per_minute > 0 or self._config.max_concurrent > 0
        )

    async def acquire(self) -> None:
```
pause_until is only checked once at the start of acquire(). Because acquire() awaits (pause sleep, RPM waits, semaphore waits), another task can extend the pause during those awaits and this task may proceed without re-checking the pause. Restructure acquire() to re-check _pause_until after any await (commonly by looping: check pause → sleep if needed → then attempt rpm/semaphore; after rpm/semaphore waits, re-check pause before returning).
```python
        if not self.is_enabled and self._pause_until <= 0.0:
            return

        # Respect pause-until from retry_after
        now = time.monotonic()
        if self._pause_until > now:
            wait = self._pause_until - now
            logger.info(
                PROVIDER_RATE_LIMITER_THROTTLED,
                provider=self._provider_name,
                wait_seconds=round(wait, 2),
                reason="pause_active",
            )
            await asyncio.sleep(wait)

        # RPM sliding window
        if self._config.max_requests_per_minute > 0:
            await self._wait_for_rpm_slot()

        # Concurrency semaphore
        if self._semaphore is not None:
            await self._semaphore.acquire()
```
pause_until is only checked once at the start of acquire(). Because acquire() awaits (pause sleep, RPM waits, semaphore waits), another task can extend the pause during those awaits and this task may proceed without re-checking the pause. Restructure acquire() to re-check _pause_until after any await (commonly by looping: check pause → sleep if needed → then attempt rpm/semaphore; after rpm/semaphore waits, re-check pause before returning).
Suggested change:

```python
        # Fast path: if nothing is enabled and no pause is scheduled, return immediately.
        if not self.is_enabled and self._pause_until <= 0.0:
            return

        while True:
            # Respect pause-until from retry_after
            now = time.monotonic()
            if self._pause_until > now:
                wait = self._pause_until - now
                logger.info(
                    PROVIDER_RATE_LIMITER_THROTTLED,
                    provider=self._provider_name,
                    wait_seconds=round(wait, 2),
                    reason="pause_active",
                )
                await asyncio.sleep(wait)
                # After sleeping, re-check pause and limits in the next loop iteration.
                continue

            # No active pause: if limiting is disabled, we can return.
            if not self.is_enabled:
                return

            # RPM sliding window
            if self._config.max_requests_per_minute > 0:
                await self._wait_for_rpm_slot()

            # Concurrency semaphore
            if self._semaphore is not None:
                await self._semaphore.acquire()

            # After waiting for RPM and concurrency, re-check for a new pause
            now = time.monotonic()
            if self._pause_until > now:
                # A pause became active while we were waiting; release the semaphore
                # (if acquired) and honor the pause before proceeding.
                if self._semaphore is not None:
                    self._semaphore.release()
                # Loop back to handle the active pause (which will sleep as needed).
                continue

            # No pause is active; we have an RPM slot and a concurrency slot.
            return
```
```python
            await asyncio.sleep(wait)

        # RPM sliding window
        if self._config.max_requests_per_minute > 0:
            await self._wait_for_rpm_slot()

        # Concurrency semaphore
        if self._semaphore is not None:
            await self._semaphore.acquire()
```
pause_until is only checked once at the start of acquire(). Because acquire() awaits (pause sleep, RPM waits, semaphore waits), another task can extend the pause during those awaits and this task may proceed without re-checking the pause. Restructure acquire() to re-check _pause_until after any await (commonly by looping: check pause → sleep if needed → then attempt rpm/semaphore; after rpm/semaphore waits, re-check pause before returning).
```python
        # RPM sliding window
        if self._config.max_requests_per_minute > 0:
            await self._wait_for_rpm_slot()

        # Concurrency semaphore
        if self._semaphore is not None:
            await self._semaphore.acquire()
```
RPM tokens are reserved before the concurrency semaphore is acquired. Under high contention, tasks can consume RPM capacity while blocked on concurrency, reducing throughput and making throttling more aggressive than intended. Consider acquiring the semaphore first (if enabled) and only appending the RPM timestamp once the call is actually allowed to proceed (or otherwise ensure a reserved RPM slot is ‘returned’ if the task later blocks/cancels before making the request).
Suggested change:

```python
        # Concurrency semaphore
        if self._semaphore is not None:
            await self._semaphore.acquire()

        # RPM sliding window
        if self._config.max_requests_per_minute > 0:
            await self._wait_for_rpm_slot()
```
```python
            if len(self._request_timestamps) < rpm:
                self._request_timestamps.append(now)
                return
```
RPM tokens are reserved before the concurrency semaphore is acquired. Under high contention, tasks can consume RPM capacity while blocked on concurrency, reducing throughput and making throttling more aggressive than intended. Consider acquiring the semaphore first (if enabled) and only appending the RPM timestamp once the call is actually allowed to proceed (or otherwise ensure a reserved RPM slot is ‘returned’ if the task later blocks/cancels before making the request).
```python
            # Prune timestamps outside the window
            self._request_timestamps = [
                t for t in self._request_timestamps if t > cutoff
            ]
```
The RPM sliding-window implementation rebuilds the full timestamp list on every loop iteration, which can become allocation-heavy at higher RPM limits. A collections.deque with incremental popleft() pruning (while oldest <= cutoff) avoids repeated list allocations and keeps the operation closer to O(k) for expired entries.
Code Review
This pull request introduces a comprehensive resilience layer for provider interactions, incorporating retry mechanisms with exponential backoff and jitter, along with client-side rate limiting. The design is well-integrated into the BaseCompletionProvider, ensuring all drivers automatically inherit these capabilities. The addition of RetryConfig and RateLimiterConfig provides flexible control over these features. New observability event constants and a dedicated RetryExhaustedError enhance monitoring and fallback orchestration. The accompanying unit and integration tests are thorough, demonstrating the robustness and correctness of the new functionality.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/config/schema.py (1)
128-133: 🧹 Nitpick | 🔵 Trivial

Document the newly added `ProviderConfig` fields in the class docstring.

`retry` and `rate_limiter` were added (lines 154-161), but the `Attributes` list (lines 128-133) still omits them.

📝 Proposed docstring update:

```diff
 class ProviderConfig(BaseModel):
 @@
     Attributes:
         driver: Driver backend name (e.g. ``"litellm"``).
         api_key: API key (typically injected by secret management).
         base_url: Base URL for the provider API.
         models: Available models for this provider.
+        retry: Retry policy for transient provider errors.
+        rate_limiter: Client-side rate limiting policy.
```

As per coding guidelines, "Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)".
Also applies to: 154-161
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/config/schema.py` around lines 128 - 133, The class ProviderConfig docstring is missing the newly added fields retry and rate_limiter; update the ProviderConfig class docstring Attributes section to document both fields (e.g., describe retry as retry policy/configuration for requests and rate_limiter as rate-limiting configuration or token bucket settings), ensuring descriptions follow Google-style docstring format used elsewhere so linters (ruff D) pass and readers can find these attributes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/config/schema.py`:
- Around line 37-59: The RetryConfig Pydantic model currently sets model_config
= ConfigDict(frozen=True) but does not disable infinities/NaNs, so float
constraints on base_delay, max_delay, and exponential_base can accept
float('inf'); update the model_config for RetryConfig to include
allow_inf_nan=False (in addition to frozen=True) so Pydantic rejects
infinities/NaNs and enforces the gt/le constraints on base_delay, max_delay,
exponential_base and any other float fields.
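The fix described in this prompt can be sketched with a toy Pydantic v2 model; the field names mirror the PR, but the defaults and the helper function are assumptions:

```python
import math

from pydantic import BaseModel, ConfigDict, ValidationError


class RetryConfig(BaseModel):
    """Illustrative model; fields mirror the PR's names, defaults are assumed."""

    # frozen=True keeps the model immutable; allow_inf_nan=False makes
    # Pydantic reject float('inf') and float('nan') on all float fields.
    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    base_delay: float = 1.0
    max_delay: float = 30.0
    exponential_base: float = 2.0


def accepts(value: float) -> bool:
    """Return True if the value passes validation for max_delay."""
    try:
        RetryConfig(max_delay=value)
    except ValidationError:
        return False
    return True
```

Without `allow_inf_nan=False`, `gt`/`le` constraints alone would still let `float('inf')` through on unconstrained float fields.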
In `@src/ai_company/providers/base.py`:
- Around line 158-166: The current _attempt uses
self._rate_limited_call(self._do_stream, ...) which acquires the concurrency
token but releases it as soon as _do_stream returns an async iterator, letting
long-lived streams run outside the limiter; fix by changing the stream path so
the concurrency token is held until the stream is fully consumed: have
_rate_limited_call (or _attempt) wrap the iterator returned by _do_stream in an
async generator that yields items and only releases max_concurrent in a finally
block after iteration completes (i.e., acquire in _rate_limited_call/_attempt,
call self._do_stream to get the iterator, then iterate and yield each
StreamChunk, releasing the token in finally). Reference symbols: _attempt,
_rate_limited_call, _do_stream, and max_concurrent/semaphore release logic.
In `@src/ai_company/providers/resilience/rate_limiter.py`:
- Around line 88-99: The pause(seconds: float) method can accept negative or
non-finite values causing _pause_until to be set incorrectly; add validation at
the start of pause to ensure seconds is a finite, non-negative number (use
math.isfinite or equivalent) and raise a ValueError (or TypeError) for invalid
inputs before computing new_until and mutating self._pause_until so only valid
pauses update the limiter state; refer to function pause and attribute
_pause_until when making the change.
- Around line 63-74: The acquire() method currently checks self._pause_until
once and sleeps, but if pause() extends self._pause_until while sleeping the
method can resume too early; change acquire() to re-check the pause state in a
loop (e.g., while self._pause_until > time.monotonic()) so it recalculates wait
and logs PROVIDER_RATE_LIMITER_THROTTLED (using self._provider_name) each
iteration and only proceeds once the pause window truly expired; reference the
acquire(), pause(), and _pause_until symbols when making this change.
In `@src/ai_company/providers/resilience/retry.py`:
- Around line 67-73: The non-retryable branch in retry logic currently logs with
logger.debug then re-raises; change that to logger.warning so the immediate
raise is recorded at warning level; update the call that uses
PROVIDER_RETRY_SKIPPED and the same context args (error_type=type(exc).__name__,
reason="non_retryable") in the function/method where exc is checked (the
non-retryable branch in src/ai_company/providers/resilience/retry.py) so the log
reflects a WARNING before the raise.
In `@tests/unit/providers/resilience/test_rate_limiter.py`:
- Around line 72-77: The test allows calling RateLimiter.release() without a
prior acquire which can over-increment the underlying asyncio.Semaphore; modify
the RateLimiter class to track in-flight acquisitions (e.g., add a private
counter like self._in_flight initialized in __init__) and update acquire() to
increment this counter after a successful await self._semaphore.acquire(), and
update release() to only call self._semaphore.release() and decrement
self._in_flight when self._in_flight > 0 (ignore extra releases); ensure you
reference the RateLimiter class, its __init__, acquire, release methods, and the
existing self._semaphore in your changes and update the test to assert extra
release() calls are no-ops rather than increasing available slots.
---
Outside diff comments:
In `@src/ai_company/config/schema.py`:
- Around line 128-133: The class ProviderConfig docstring is missing the newly
added fields retry and rate_limiter; update the ProviderConfig class docstring
Attributes section to document both fields (e.g., describe retry as retry
policy/configuration for requests and rate_limiter as rate-limiting
configuration or token bucket settings), ensuring descriptions follow
Google-style docstring format used elsewhere so linters (ruff D) pass and
readers can find these attributes.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (22)
- .claude/skills/aurelio-review-pr/skill.md
- CLAUDE.md
- src/ai_company/config/schema.py
- src/ai_company/observability/events.py
- src/ai_company/providers/\_\_init\_\_.py
- src/ai_company/providers/base.py
- src/ai_company/providers/drivers/litellm_driver.py
- src/ai_company/providers/resilience/\_\_init\_\_.py
- src/ai_company/providers/resilience/config.py
- src/ai_company/providers/resilience/errors.py
- src/ai_company/providers/resilience/rate_limiter.py
- src/ai_company/providers/resilience/retry.py
- tests/integration/providers/conftest.py
- tests/integration/providers/test_retry_integration.py
- tests/unit/providers/drivers/conftest.py
- tests/unit/providers/resilience/\_\_init\_\_.py
- tests/unit/providers/resilience/conftest.py
- tests/unit/providers/resilience/test_config.py
- tests/unit/providers/resilience/test_errors.py
- tests/unit/providers/resilience/test_rate_limiter.py
- tests/unit/providers/resilience/test_retry.py
- tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not use `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax with `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, verify with mypy strict mode
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Follow immutability principle: create new objects instead of mutating existing ones
Use Pydantic v2 models with `BaseModel`, `model_validator`, and `ConfigDict` for data validation
Enforce 88 character line length limit (ruff enforces this)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly and never silently swallow exceptions
Validate at system boundaries: user input, external API responses, and configuration files
Files:
- tests/unit/providers/test_protocol.py
- src/ai_company/observability/events.py
- src/ai_company/config/schema.py
- src/ai_company/providers/resilience/rate_limiter.py
- tests/unit/providers/resilience/test_retry.py
- tests/unit/providers/resilience/conftest.py
- tests/unit/providers/resilience/test_config.py
- src/ai_company/providers/resilience/errors.py
- src/ai_company/providers/resilience/retry.py
- tests/integration/providers/conftest.py
- src/ai_company/providers/drivers/litellm_driver.py
- tests/unit/providers/resilience/test_rate_limiter.py
- src/ai_company/providers/base.py
- tests/unit/providers/resilience/test_errors.py
- src/ai_company/providers/__init__.py
- src/ai_company/providers/resilience/__init__.py
- tests/unit/providers/drivers/conftest.py
- src/ai_company/providers/resilience/config.py
- tests/integration/providers/test_retry_integration.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow` to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use `asyncio_mode = "auto"` configuration — no manual `@pytest.mark.asyncio` needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers
Files:
- tests/unit/providers/test_protocol.py
- tests/unit/providers/resilience/test_retry.py
- tests/unit/providers/resilience/conftest.py
- tests/unit/providers/resilience/test_config.py
- tests/integration/providers/conftest.py
- tests/unit/providers/resilience/test_rate_limiter.py
- tests/unit/providers/resilience/test_errors.py
- tests/unit/providers/drivers/conftest.py
- tests/integration/providers/test_retry_integration.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST import and use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging` or `logging.getLogger()` or `print()` in application code — use the structured logger from observability
Use `logger` as the variable name for the structured logger (not `_logger`, not `log`)
Use event name constants from `ai_company.observability.events` for all logging calls
Use structured logging format: `logger.info(EVENT, key=value)` — never use format strings like `logger.info("msg %s", val)`
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Files:
- src/ai_company/observability/events.py
- src/ai_company/config/schema.py
- src/ai_company/providers/resilience/rate_limiter.py
- src/ai_company/providers/resilience/errors.py
- src/ai_company/providers/resilience/retry.py
- src/ai_company/providers/drivers/litellm_driver.py
- src/ai_company/providers/base.py
- src/ai_company/providers/__init__.py
- src/ai_company/providers/resilience/__init__.py
- src/ai_company/providers/resilience/config.py
src/ai_company/**/{core,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Pure data models, enums, and re-exports do NOT require logging
Files:
- src/ai_company/providers/resilience/rate_limiter.py
- src/ai_company/providers/resilience/errors.py
- src/ai_company/providers/resilience/retry.py
- src/ai_company/providers/drivers/litellm_driver.py
- src/ai_company/providers/base.py
- src/ai_company/providers/__init__.py
- src/ai_company/providers/resilience/__init__.py
- src/ai_company/providers/resilience/config.py
🧠 Learnings (17)
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` for all logging calls
Applied to files:
src/ai_company/observability/events.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/base.py : Use `threading.Semaphore` for LLM rate limiting in BaseAgent to limit concurrent requests to max 2.
Applied to files:
src/ai_company/providers/resilience/rate_limiter.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Don't implement custom rate limiting - BaseAgent handles rate limiting with max 2 concurrent calls and respects configured timeout
Applied to files:
src/ai_company/providers/resilience/rate_limiter.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/tests/conftest.py : Place shared pytest fixtures in `tests/conftest.py`
Applied to files:
tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Applies to **/test_*.py : Use appropriate fixture scopes (`function`, `class`, `module`, `session`) and document complex fixtures with docstrings
Applied to files:
tests/unit/providers/resilience/conftest.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`), CI enforces this coverage requirement
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : 100% test coverage is MANDATORY for every commit. The CI enforces 100% coverage on core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`).
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : Agents must extend `BaseAgent`, use retry logic, and implement configurable timeout via settings.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py,src/memory/**/*.py,src/utils/**/*.py,src/settings.py} : Core modules (`src/agents/`, `src/services/`, `src/memory/`, `src/utils/`, `src/settings.py`) must maintain 100% test coverage
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/**/*.py : RAG retrieval failures are non-fatal—agents proceed with empty context if retrieval fails rather than raising errors.
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Use structured prompts with clear instructions including role definition, constraints, output format (JSON when needed), and context from story state
Applied to files:
.claude/skills/aurelio-review-pr/skill.md
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/**/*.py} : Use decorators `handle_ollama_errors` and `retry_with_fallback` for error handling
Applied to files:
src/ai_company/providers/resilience/retry.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
🧬 Code graph analysis (14)
src/ai_company/providers/resilience/rate_limiter.py (2)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/config/schema.py (1): RateLimiterConfig (66-87)

tests/unit/providers/resilience/conftest.py (2)
- src/ai_company/providers/errors.py (4): AuthenticationError (72-75), ProviderConnectionError (131-134), ProviderTimeoutError (125-128), RateLimitError (78-104)
- src/ai_company/config/schema.py (2): RateLimiterConfig (66-87), RetryConfig (26-63)

tests/unit/providers/resilience/test_config.py (1)
- src/ai_company/config/schema.py (2): RateLimiterConfig (66-87), RetryConfig (26-63)

src/ai_company/providers/resilience/errors.py (1)
- src/ai_company/providers/errors.py (1): ProviderError (22-69)

src/ai_company/providers/resilience/retry.py (3)
- src/ai_company/providers/errors.py (2): ProviderError (22-69), RateLimitError (78-104)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- src/ai_company/config/schema.py (1): RetryConfig (26-63)

tests/integration/providers/conftest.py (1)
- src/ai_company/config/schema.py (3): ProviderConfig (125-176), ProviderModelConfig (90-122), RetryConfig (26-63)

src/ai_company/providers/drivers/litellm_driver.py (2)
- src/ai_company/providers/resilience/rate_limiter.py (2): RateLimiter (19-134), is_enabled (48-52)
- src/ai_company/providers/resilience/retry.py (1): RetryHandler (27-120)

tests/unit/providers/resilience/test_rate_limiter.py (3)
- src/ai_company/config/schema.py (1): RateLimiterConfig (66-87)
- src/ai_company/providers/resilience/rate_limiter.py (5): RateLimiter (19-134), is_enabled (48-52), acquire (54-81), release (83-86), pause (88-104)
- src/ai_company/providers/registry.py (1): get (66-94)

src/ai_company/providers/base.py (3)
- src/ai_company/providers/errors.py (2): InvalidRequestError (113-116), RateLimitError (78-104)
- src/ai_company/providers/resilience/rate_limiter.py (4): RateLimiter (19-134), acquire (54-81), pause (88-104), release (83-86)
- src/ai_company/providers/resilience/retry.py (2): RetryHandler (27-120), execute (41-96)

tests/unit/providers/resilience/test_errors.py (2)
- src/ai_company/providers/errors.py (2): ProviderError (22-69), RateLimitError (78-104)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)

src/ai_company/providers/__init__.py (4)
- src/ai_company/providers/resilience/rate_limiter.py (1): RateLimiter (19-134)
- src/ai_company/config/schema.py (2): RateLimiterConfig (66-87), RetryConfig (26-63)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- src/ai_company/providers/resilience/retry.py (1): RetryHandler (27-120)

src/ai_company/providers/resilience/__init__.py (4)
- src/ai_company/config/schema.py (2): RateLimiterConfig (66-87), RetryConfig (26-63)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- src/ai_company/providers/resilience/rate_limiter.py (1): RateLimiter (19-134)
- src/ai_company/providers/resilience/retry.py (1): RetryHandler (27-120)

tests/unit/providers/drivers/conftest.py (1)
- src/ai_company/config/schema.py (4): ProviderConfig (125-176), ProviderModelConfig (90-122), RateLimiterConfig (66-87), RetryConfig (26-63)

src/ai_company/providers/resilience/config.py (1)
- src/ai_company/config/schema.py (2): RateLimiterConfig (66-87), RetryConfig (26-63)
🪛 LanguageTool
.claude/skills/aurelio-review-pr/skill.md
[style] ~127-~127: Since ownership is already implied, this phrasing may be redundant.
Context: ... rules:** 1. Driver subclass implements its own retry/backoff logic instead of relying ...
(PRP_OWN)
[style] ~131-~131: This phrase is redundant. Consider using “outside”.
Context: .... asyncio.sleep used for retry delays outside of RetryHandler (MAJOR) **Soft rules (S...
(OUTSIDE_OF)
🔇 Additional comments (17)
src/ai_company/providers/resilience/config.py (1)
1-11: Clean re-export module; API surface is explicit and consistent. This keeps canonical config definitions centralized while providing a stable resilience import path.

tests/unit/providers/test_protocol.py (1)
86-88: Good fix: explicit base initialization in test providers. Calling `super().__init__()` here prevents base-state drift as `BaseCompletionProvider` evolves. Also applies to: 129-131

src/ai_company/providers/drivers/litellm_driver.py (1)
128-138: Resilience components are wired correctly into the base provider. The conditional retry handler + optional rate limiter injection matches the intended execution path.

src/ai_company/providers/__init__.py (1)
37-43: Public exports look consistent and complete for resilience API exposure. `__all__` was updated in sync with imported resilience symbols. Also applies to: 102-107

tests/unit/providers/resilience/test_errors.py (1)
11-43: Solid behavioral coverage for `RetryExhaustedError`. The tests validate the key contract surfaces (retryability, original error retention, and context/message propagation).

tests/unit/providers/resilience/test_config.py (1)
11-103: Good validation matrix for resilience config models. Defaults, boundary checks, and immutability assertions are well covered.

tests/integration/providers/conftest.py (1)
20-20: Nice test hardening: explicit retry disablement in provider fixtures. Setting `max_retries=0` here keeps integration tests deterministic and faster. Also applies to: 55-55, 83-83, 104-104

.claude/skills/aurelio-review-pr/skill.md (1)
96-96: Resilience audit integration is well-scoped and actionable. The new trigger and hard/soft rule set are clear and aligned with provider-resilience review goals. Also applies to: 124-136

CLAUDE.md (1)
73-82: Good documentation update for resilience behavior. This section clearly captures retry ownership, retryability policy, and fallback semantics for `RetryExhaustedError`. Based on learnings: "When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes."

src/ai_company/observability/events.py (1)
57-63: Event taxonomy extension looks solid. These constants provide the right hooks for retry and rate-limiter telemetry without changing runtime behavior.

tests/integration/providers/test_retry_integration.py (1)
73-236: Integration coverage is comprehensive and targeted. The suite exercises core retry/rate-limit paths, exhaustion behavior, and disabled-retry semantics with clear assertions.

src/ai_company/providers/resilience/errors.py (1)
6-31: `RetryExhaustedError` design is clean and useful. Preserving `original_error` plus immutable copied context gives good debugging/fallback ergonomics.

tests/unit/providers/resilience/test_rate_limiter.py (1)
20-71: The rest of the RateLimiter test matrix is strong. Concurrency blocking, pause extension semantics, and observability assertions are well covered. Also applies to: 79-168

tests/unit/providers/drivers/conftest.py (1)
8-13: Fixture update improves test isolation. Defaulting to `RetryConfig(max_retries=0)` in this helper makes driver-level error mapping tests more deterministic. Also applies to: 21-34, 57-58

tests/unit/providers/resilience/test_retry.py (1)
40-246: Excellent RetryHandler behavioral coverage. This suite validates execution flow, backoff policy, and observability events across the most important resilience edge cases.

src/ai_company/providers/resilience/__init__.py (1)
1-18: Clean and consistent resilience re-export surface. Package-level exports and `__all__` are aligned and make the resilience API easy to consume.

tests/unit/providers/resilience/conftest.py (1)
15-72: Fixture coverage is solid and test-friendly. Good reuse surface for deterministic resilience tests (configs + retryable/non-retryable error variants).
async def test_release_without_acquire_does_not_crash(self) -> None:
    config = RateLimiterConfig(max_concurrent=2)
    limiter = RateLimiter(config, provider_name="test-provider")
    # Extra release (semaphore goes above initial count, but doesn't crash)
    limiter.release()
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify current limiter release semantics and the test expectation that permits over-release.
echo "=== Checking RateLimiter release() implementation ==="
rg -n -C3 'def release\(|_semaphore\.release\(' src/ai_company/providers/resilience/rate_limiter.py
echo -e "\n=== Checking test file for release_without_acquire test ==="
rg -n -C3 'test_release_without_acquire|semaphore goes above initial count|limiter.release\(\)' tests/unit/providers/resilience/test_rate_limiter.py

Repository: Aureliolo/ai-company
Length of output: 2140
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check RateLimiter class initialization and imports
echo "=== RateLimiter class initialization ==="
rg -n -B5 -A15 'class RateLimiter|def __init__' src/ai_company/providers/resilience/rate_limiter.py | head -50
echo -e "\n=== Checking imports in rate_limiter.py ==="
head -30 src/ai_company/providers/resilience/rate_limiter.py
echo -e "\n=== Full acquire/release methods ==="
rg -n -B2 -A10 'async def acquire|def release' src/ai_company/providers/resilience/rate_limiter.py

Repository: Aureliolo/ai-company
Length of output: 2879
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify asyncio.Semaphore behavior documentation to confirm over-release capability
echo "=== Checking if there are any guards against over-release in the codebase ==="
rg -n 'self\._semaphore|_acquire_count|_release_count|_acquired|bounded' src/ai_company/providers/resilience/rate_limiter.py
echo -e "\n=== Check full rate_limiter.py file to see if there's state tracking ==="
wc -l src/ai_company/providers/resilience/rate_limiter.py
echo -e "\n=== Check if release() appears in __init__ or elsewhere ==="
rg -n 'def __init__|def release|def acquire|self\._' src/ai_company/providers/resilience/rate_limiter.py | head -30

Repository: Aureliolo/ai-company
Length of output: 1987
Guard release() against over-release to maintain max_concurrent guarantees.
The test at lines 72–76 accepts release() without prior acquire(), which allows the internal semaphore counter to exceed the configured limit. With asyncio.Semaphore, this permits more concurrent acquisitions than max_concurrent specifies, breaking the concurrency guarantee.
Add a guard in release() to track in-flight requests or prevent releases beyond acquired slots. Update the test to verify that extra release() calls are safely ignored rather than normalized.
Suggested changes
- async def test_release_without_acquire_does_not_crash(self) -> None:
+ async def test_release_without_acquire_is_noop(self) -> None:
config = RateLimiterConfig(max_concurrent=2)
limiter = RateLimiter(config, provider_name="test-provider")
- # Extra release (semaphore goes above initial count, but doesn't crash)
+ # Extra release should not increase effective capacity
      limiter.release()

In src/ai_company/providers/resilience/rate_limiter.py, track acquired slots:
def __init__(self, ...):
    # ... existing code ...
    self._in_flight: int = 0

async def acquire(self) -> None:
    # ... existing code ...
    if self._semaphore is not None:
        await self._semaphore.acquire()
        self._in_flight += 1

def release(self) -> None:
    if self._semaphore is not None and self._in_flight > 0:
        self._semaphore.release()
        self._in_flight -= 1

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/resilience/test_rate_limiter.py` around lines 72 - 77,
The test allows calling RateLimiter.release() without a prior acquire which can
over-increment the underlying asyncio.Semaphore; modify the RateLimiter class to
track in-flight acquisitions (e.g., add a private counter like self._in_flight
initialized in __init__) and update acquire() to increment this counter after a
successful await self._semaphore.acquire(), and update release() to only call
self._semaphore.release() and decrement self._in_flight when self._in_flight > 0
(ignore extra releases); ensure you reference the RateLimiter class, its
__init__, acquire, release methods, and the existing self._semaphore in your
changes and update the test to assert extra release() calls are no-ops rather
than increasing available slots.
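The hazard this prompt describes is easy to reproduce with a bare `asyncio.Semaphore`: an unmatched `release()` silently grows the counter past its initial value, so more tasks can acquire than the configured limit allows.

```python
import asyncio


async def over_release_demo() -> int:
    sem = asyncio.Semaphore(2)
    sem.release()  # no matching acquire(): internal counter becomes 3
    granted = 0
    for _ in range(3):
        # All three acquires succeed even though the limit was meant to be 2.
        await asyncio.wait_for(sem.acquire(), timeout=0.1)
        granted += 1
    return granted


print(asyncio.run(over_release_demo()))  # → 3
```

This is why the suggested fix tracks in-flight acquisitions and ignores extra releases rather than forwarding them to the semaphore.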
…and Gemini

- C1: add asyncio.Lock around RPM sliding-window to prevent race condition
- C2: fix early-return in RateLimiter.acquire (use monotonic clock comparison)
- C3: _ToolCallAccumulator.build() returns None on JSON parse failure
- M1: add acquired flag in _rate_limited_call to prevent double-release
- M2: wrap stream in _hold_slot_for_stream to hold concurrency slot for stream lifetime
- M3: remove dead *args/**kwargs from RetryHandler.execute; tighten to Callable[[], Coroutine]
- M4: add _truncated flag to _ToolCallAccumulator; update() no-ops after truncation
- M5: _apply_completion_config returns new dict instead of mutating in-place
- M6: replace all vendor model IDs in test fixtures with fake test-model-XXX IDs
- M7: add TestBaseCompletionProviderResilience tests for rate limiter wiring
- M8: add catch-all except in RetryHandler to log and re-raise non-ProviderErrors
- M9: validate pause() argument is finite non-negative; raise ValueError otherwise
- M10: change pause check in acquire() from single check to while loop
- D1: replace assert with explicit RuntimeError guard in RetryHandler
- D2: add model_validator to RetryConfig ensuring base_delay <= max_delay
- D3: add test for zero-retries non-retryable error passes through unwrapped
- D4: add TestRateLimiterRPMThrottling with time-mocked RPM throttle tests
- D5: add TestStreamRetryIntegration for stream retry on connection error
- D6: fix resource leak in test_concurrent_limit with try/finally cancel
- D7: add test_internal_error_is_retried to verify ProviderInternalError retries
- D9/D11: _wrap_stream returns AsyncGenerator type annotation + logging before raise
- D10: add Args sections to abstract method docstrings in BaseCompletionProvider
- D12: change non-retryable branch from logger.debug to logger.warning
- D13: add allow_inf_nan=False to RetryConfig and RateLimiterConfig ConfigDict
- D14: update ProviderConfig docstring to include retry and rate_limiter attributes
- D15: increase timing tolerances in pause tests to reduce flakiness
- N1: add explicit class-level type annotations to _ToolCallAccumulator
- N2: update _compute_delay docstring to clarify zero-based retry counter
- N3/N5: add logger.warning before raises in _build_model_lookup
- N4: add logger.error before raise in _map_response
- N6: change TestRetryHandlerBackoff async def to def (_compute_delay is sync)
- Move RateLimiterConfig import out of TYPE_CHECKING to fix PEP 649 inspection
- Update test_streaming_malformed_json_tool_call to expect dropped tool call (C3)
/gemini review
Code Review
This pull request introduces a comprehensive resilience layer with retry and rate-limiting capabilities for provider communications. The implementation is well-structured, with clear separation of concerns into RetryHandler and RateLimiter classes, which are seamlessly integrated into the BaseCompletionProvider. The addition of configuration models and extensive tests is commendable. My review includes a few suggestions to enhance type safety, robustness in tool call argument parsing, and performance in the rate limiter.
Note: Security Review did not run due to the size of the PR.
async def _hold_slot_for_stream(
    inner: AsyncIterator[Any],
) -> AsyncIterator[Any]:
    try:
        async for chunk in inner:
            yield chunk
    finally:
        rate_limiter.release()  # type: ignore[union-attr]

return _hold_slot_for_stream(result)  # type: ignore[return-value]
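As a sanity check on this wrapper pattern, an async generator's `finally` block runs once the wrapped stream is exhausted (or the consumer errors or is cancelled), so the slot is released exactly once. A standalone demonstration with stand-in names (none of these are the project's real API):

```python
import asyncio
from typing import AsyncIterator

releases: list[str] = []


async def fake_stream() -> AsyncIterator[int]:
    for i in range(3):
        yield i


async def hold_slot(inner: AsyncIterator[int]) -> AsyncIterator[int]:
    try:
        async for chunk in inner:
            yield chunk
    finally:
        # Runs when the stream ends, for whatever reason.
        releases.append("released")


async def consume() -> list[int]:
    return [c async for c in hold_slot(fake_stream())]


print(asyncio.run(consume()))  # → [0, 1, 2]
print(releases)                # → ['released']
```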
To improve type safety and remove the type: ignore comments, you can make _hold_slot_for_stream generic. This requires adding _ChunkT = TypeVar("_ChunkT") at the module level near the definition of _T. The type: ignore on rate_limiter.release() can be replaced with an assert, and the one on the return statement will no longer be needed.
async def _hold_slot_for_stream(
    inner: AsyncIterator[_ChunkT],
) -> AsyncIterator[_ChunkT]:
    # The logic to reach here ensures rate_limiter is not None.
    assert rate_limiter is not None
    try:
        async for chunk in inner:
            yield chunk
    finally:
        rate_limiter.release()

return _hold_slot_for_stream(result)

args: dict[str, Any] = parsed if isinstance(parsed, dict) else {}
return ToolCall(id=self.id, name=self.name, arguments=args)
If json.loads produces a valid JSON value that is not an object (e.g., a list or string), it's silently converted to an empty dictionary. This could hide issues with the data from the provider, as tool arguments are expected to be a JSON object. For better robustness, consider treating this as a failure case, similar to a JSONDecodeError.
if not isinstance(parsed, dict):
    logger.warning(
        PROVIDER_TOOL_CALL_ARGUMENTS_PARSE_FAILED,
        tool_name=self.name,
        tool_id=self.id,
        reason="arguments_not_a_json_object",
        args_type=type(parsed).__name__,
    )
    return None
return ToolCall(id=self.id, name=self.name, arguments=parsed)

# Prune timestamps outside the window
self._request_timestamps = [
    t for t in self._request_timestamps if t > cutoff
]
The current implementation of pruning timestamps by creating a new list can be inefficient under high RPMs. For better performance, consider using a collections.deque for _request_timestamps. This allows for efficient O(1) removal of old timestamps.
This would require two changes:
- Import
dequefromcollections. - Change the type of
_request_timestampsin__init__todeque[float]. - Replace this list comprehension with a loop that uses
popleft():
# In _wait_for_rpm_slot
cutoff = now - window
while self._request_timestamps and self._request_timestamps[0] <= cutoff:
    self._request_timestamps.popleft()
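A quick standalone check of the deque-based pruning suggested above (the function name and 60-second default are illustrative assumptions):

```python
from collections import deque


def prune_window(timestamps: deque, now: float, window: float = 60.0) -> None:
    # O(1) popleft per expired entry instead of rebuilding the whole list.
    cutoff = now - window
    while timestamps and timestamps[0] <= cutoff:
        timestamps.popleft()


ts: deque = deque([0.0, 10.0, 55.0, 61.0])
prune_window(ts, now=65.0)  # cutoff = 5.0, so only 0.0 is dropped
print(list(ts))  # → [10.0, 55.0, 61.0]
```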
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/config/schema.py`:
- Around line 65-74: The _validate_delay_ordering model validator in RetryConfig
currently raises ValueError without logging; update the method to log the
failure at WARNING or ERROR level with structured context (include base_delay
and max_delay and any identifying fields on self) immediately before raising
ValueError — call the module/class logger (e.g., logger.error or logger.warning)
with a clear message like "RetryConfig validation failed: base_delay X >
max_delay Y" and include self or relevant attributes for debugging, then raise
the same ValueError as before.
In `@src/ai_company/providers/resilience/rate_limiter.py`:
- Around line 104-106: In the pause() validation branch inside the RateLimiter
(the if block that checks "if not math.isfinite(seconds) or seconds < 0"), add a
log call at WARNING or ERROR level before raising the ValueError; include the
invalid seconds value and relevant context (e.g., method name RateLimiter.pause
and any identifying instance info such as self.name or self._id if available) so
the message clearly shows what input failed, then raise the same ValueError as
before.
In `@src/ai_company/providers/resilience/retry.py`:
- Around line 118-120: The current code clamps provider-specified retry_after
down with min(exc.retry_after, self._config.max_delay) which can make us retry
earlier than the provider requested; change that to return max(exc.retry_after,
self._config.max_delay) so we never choose a value smaller than the provider
hint (keep the RateLimitError check and reference self._config.max_delay when
updating the return expression in the retry logic in retry.py).
In `@tests/integration/providers/test_retry_integration.py`:
- Line 32: Update the patch target string constant _PATCH_ACOMPLETION to the
driver-qualified name used by the code under test (e.g.
"ai_company.providers.drivers.litellm_driver._litellm.acompletion") so the test
patches the symbol as imported by the driver (replace the current value
"litellm.acompletion" with the fully qualified import path the driver uses).
- Around line 207-210: The empty async generator _empty_stream currently uses an
unreachable "return" then "yield" pattern which is valid but unclear; replace
the body with a clearer pattern such as "if False: yield # pragma: no cover"
(keeping the async def _empty_stream() signature and type ignore) so Python
still recognizes it as an async generator while removing the misleading return
statement.
In `@tests/unit/providers/resilience/test_retry.py`:
- Around line 137-145: The test contains a redundant local import of
AuthenticationError inside test_zero_retries_non_retryable_raises_unwrapped;
remove that inner "from ai_company.providers.errors import AuthenticationError"
so the test uses the AuthenticationError already imported at the top of the
module, leaving the rest of the test (handler = RetryHandler(...), AsyncMock
side_effect, pytest.raises assertion, and func.assert_awaited_once()) unchanged.
In `@tests/unit/providers/test_protocol.py`:
- Around line 428-442: The test
test_rate_limit_error_with_retry_after_triggers_pause should also assert that
the rate limiter's semaphore is released on the exception path: after invoking
await provider.complete([...]) and catching RateLimitError, add assertions that
mock_limiter.release (and optionally mock_limiter.acquire) were called to ensure
no semaphore leak; update the assertions to call
mock_limiter.release.assert_called_once_with() (and/or check acquire was
awaited) on the _RateLimitProvider instance's mocked _rate_limiter.
- Around line 419-421: The test mutates provider._rate_limiter directly; instead
modify the constructors so _ConcreteProvider and _RateLimitProvider accept a
rate_limiter kw-only arg and forward it to BaseCompletionProvider's rate_limiter
parameter; update their __init__ signatures to accept rate_limiter:
Optional[...] and call super().__init__(..., rate_limiter=rate_limiter) (or pass
it along from _RateLimitProvider to its parent) and change test to construct
_ConcreteProvider(rate_limiter=mock_limiter) rather than assigning to the
private attribute.
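The semaphore-leak concern in the last prompt reduces to releasing in a `finally` block; `SlotGuard` below is a hypothetical stand-in for the rate limiter's acquire/release pair (not the PR's actual class), showing that the slot is returned on the exception path too:

```python
import asyncio
from collections.abc import Awaitable, Callable


class SlotGuard:
    """Acquire a concurrency slot around a call; always release it."""

    def __init__(self, max_concurrent: int = 2) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.released = 0

    async def run(self, func: Callable[[], Awaitable[str]]) -> str:
        await self._sem.acquire()
        try:
            return await func()
        finally:
            # Runs on both success and exception paths: no slot leak.
            self._sem.release()
            self.released += 1


async def _demo() -> tuple[int, bool]:
    guard = SlotGuard()

    async def boom() -> str:
        raise RuntimeError("rate limited")

    raised = False
    try:
        await guard.run(boom)
    except RuntimeError:
        raised = True
    return guard.released, raised


released_count, saw_error = asyncio.run(_demo())
```

A test along the lines the prompt asks for would assert on exactly this: `release` called once even though the wrapped call raised.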
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (19)
src/ai_company/config/schema.py
src/ai_company/providers/base.py
src/ai_company/providers/drivers/litellm_driver.py
src/ai_company/providers/resilience/rate_limiter.py
src/ai_company/providers/resilience/retry.py
tests/integration/providers/conftest.py
tests/integration/providers/test_anthropic_pipeline.py
tests/integration/providers/test_error_scenarios.py
tests/integration/providers/test_ollama_pipeline.py
tests/integration/providers/test_openrouter_pipeline.py
tests/integration/providers/test_retry_integration.py
tests/integration/providers/test_tool_calling_pipeline.py
tests/unit/budget/conftest.py
tests/unit/budget/test_cost_record.py
tests/unit/providers/drivers/conftest.py
tests/unit/providers/drivers/test_litellm_driver.py
tests/unit/providers/resilience/test_rate_limiter.py
tests/unit/providers/resilience/test_retry.py
tests/unit/providers/test_protocol.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not use `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax with `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Add type hints to all public functions, verify with mypy strict mode
Use Google-style docstrings, required on all public classes and functions (enforced by ruff D rules)
Follow immutability principle: create new objects instead of mutating existing ones
Use Pydantic v2 models with `BaseModel`, `model_validator`, and `ConfigDict` for data validation
Enforce 88 character line length limit (ruff enforces this)
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly and never silently swallow exceptions
Validate at system boundaries: user input, external API responses, and configuration files
Files:
src/ai_company/providers/resilience/retry.py, tests/integration/providers/test_tool_calling_pipeline.py, tests/integration/providers/test_error_scenarios.py, tests/integration/providers/test_anthropic_pipeline.py, tests/integration/providers/conftest.py, tests/integration/providers/test_ollama_pipeline.py, tests/unit/budget/test_cost_record.py, tests/integration/providers/test_retry_integration.py, tests/unit/providers/drivers/test_litellm_driver.py, tests/unit/providers/resilience/test_retry.py, tests/integration/providers/test_openrouter_pipeline.py, tests/unit/budget/conftest.py, tests/unit/providers/resilience/test_rate_limiter.py, src/ai_company/config/schema.py, src/ai_company/providers/resilience/rate_limiter.py, tests/unit/providers/drivers/conftest.py, src/ai_company/providers/drivers/litellm_driver.py, src/ai_company/providers/base.py, tests/unit/providers/test_protocol.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST import and use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging` or `logging.getLogger()` or `print()` in application code — use the structured logger from observability
Use `logger` as the variable name for the structured logger (not `_logger`, not `log`)
Use event name constants from `ai_company.observability.events` for all logging calls
Use structured logging format: `logger.info(EVENT, key=value)` — never use format strings like `logger.info("msg %s", val)`
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Files:
src/ai_company/providers/resilience/retry.py, src/ai_company/config/schema.py, src/ai_company/providers/resilience/rate_limiter.py, src/ai_company/providers/drivers/litellm_driver.py, src/ai_company/providers/base.py
src/ai_company/**/{core,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Pure data models, enums, and re-exports do NOT require logging
Files:
src/ai_company/providers/resilience/retry.py, src/ai_company/providers/resilience/rate_limiter.py, src/ai_company/providers/drivers/litellm_driver.py, src/ai_company/providers/base.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow` to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use `asyncio_mode = "auto"` configuration — no manual `@pytest.mark.asyncio` needed
Enforce 30 second timeout per test
Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers
Files:
tests/integration/providers/test_tool_calling_pipeline.py, tests/integration/providers/test_error_scenarios.py, tests/integration/providers/test_anthropic_pipeline.py, tests/integration/providers/conftest.py, tests/integration/providers/test_ollama_pipeline.py, tests/unit/budget/test_cost_record.py, tests/integration/providers/test_retry_integration.py, tests/unit/providers/drivers/test_litellm_driver.py, tests/unit/providers/resilience/test_retry.py, tests/integration/providers/test_openrouter_pipeline.py, tests/unit/budget/conftest.py, tests/unit/providers/resilience/test_rate_limiter.py, tests/unit/providers/drivers/conftest.py, tests/unit/providers/test_protocol.py
🧠 Learnings (16)
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions
Applied to files:
src/ai_company/providers/resilience/retry.py, src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Implement graceful error recovery: retry with different prompts if needed, fall back to simpler approaches on failure, and don't fail silently - log and raise appropriate exceptions
Applied to files:
src/ai_company/providers/resilience/retry.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to tests/**/*.py : Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers
Applied to files:
tests/integration/providers/test_error_scenarios.py, tests/integration/providers/test_anthropic_pipeline.py, tests/integration/providers/conftest.py, tests/unit/providers/drivers/test_litellm_driver.py, tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/integration/providers/test_error_scenarios.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/integration/providers/test_error_scenarios.py, tests/integration/providers/test_anthropic_pipeline.py, tests/unit/providers/drivers/test_litellm_driver.py, tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Mock models in tests must use a name from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`) - fake model names cause `ValueError: No model tagged for role`.
Applied to files:
tests/integration/providers/test_anthropic_pipeline.py, tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to tests/**/*.py : Mock Ollama in tests to avoid requiring running instance - use model names from `RECOMMENDED_MODELS` (e.g., `huihui_ai/dolphin3-abliterated:8b`)
Applied to files:
tests/integration/providers/test_anthropic_pipeline.py, tests/integration/providers/test_ollama_pipeline.py, tests/unit/providers/drivers/test_litellm_driver.py, tests/integration/providers/test_openrouter_pipeline.py
📚 Learning: 2026-01-24T16:33:29.354Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-24T16:33:29.354Z
Learning: Applies to {src/agents/**/*.py,src/services/model_service.py} : Respect existing model configuration patterns in Ollama integration
Applied to files:
tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Mock Ollama API responses to support both dict (`models.get("models")`) and object (`response.models`) patterns in test mocks.
Applied to files:
tests/integration/providers/test_ollama_pipeline.py, tests/integration/providers/test_openrouter_pipeline.py, tests/unit/providers/drivers/conftest.py
📚 Learning: 2026-01-31T13:51:16.868Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-01-31T13:51:16.868Z
Learning: Applies to tests/**/*.py : Ollama API response mocks must support both dict pattern (`models.get('models')`) and object pattern (`response.models`) to match actual API behavior.
Applied to files:
tests/integration/providers/test_ollama_pipeline.py, tests/unit/providers/drivers/test_litellm_driver.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to src/services/model_service.py : Default Ollama endpoint is `http://localhost:11434`. Respect existing model configuration patterns.
Applied to files:
tests/integration/providers/test_ollama_pipeline.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to src/agents/base.py : Use `threading.Semaphore` for LLM rate limiting in BaseAgent to limit concurrent requests to max 2.
Applied to files:
tests/unit/providers/resilience/test_rate_limiter.py, src/ai_company/providers/resilience/rate_limiter.py, src/ai_company/providers/base.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Use event name constants from `ai_company.observability.events` for all logging calls
Applied to files:
src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Every module with business logic MUST import and use: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-03-02T19:44:07.518Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-02T19:44:07.518Z
Learning: Applies to src/ai_company/**/*.py : Never use `import logging` or `logging.getLogger()` or `print()` in application code — use the structured logger from observability
Applied to files:
src/ai_company/providers/drivers/litellm_driver.py
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/*.py : Don't implement custom rate limiting - BaseAgent handles rate limiting with max 2 concurrent calls and respects configured timeout
Applied to files:
src/ai_company/providers/base.py
🧬 Code graph analysis (9)
src/ai_company/providers/resilience/retry.py (4)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/providers/errors.py (2): ProviderError (22-69), RateLimitError (78-104)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- src/ai_company/config/schema.py (1): RetryConfig (26-74)

tests/integration/providers/test_anthropic_pipeline.py (1)
- tests/integration/providers/conftest.py (1): user_messages (270-272)

tests/integration/providers/conftest.py (1)
- src/ai_company/config/schema.py (3): ProviderConfig (136-189), ProviderModelConfig (101-133), RetryConfig (26-74)

tests/integration/providers/test_retry_integration.py (6)
- src/ai_company/config/schema.py (4): ProviderConfig (136-189), ProviderModelConfig (101-133), RateLimiterConfig (77-98), RetryConfig (26-74)
- src/ai_company/providers/enums.py (2): MessageRole (6-12), StreamEventType (25-32)
- src/ai_company/providers/errors.py (3): AuthenticationError (72-75), ProviderTimeoutError (125-128), RateLimitError (78-104)
- src/ai_company/providers/models.py (1): ChatMessage (114-186)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- tests/integration/providers/conftest.py (1): build_model_response (112-143)

tests/unit/providers/resilience/test_retry.py (4)
- src/ai_company/providers/errors.py (4): AuthenticationError (72-75), ProviderConnectionError (131-134), ProviderInternalError (137-140), RateLimitError (78-104)
- src/ai_company/config/schema.py (1): RetryConfig (26-74)
- src/ai_company/providers/resilience/errors.py (1): RetryExhaustedError (6-31)
- src/ai_company/providers/resilience/retry.py (3): RetryHandler (28-127), execute (42-102), _compute_delay (104-127)

tests/unit/providers/drivers/conftest.py (1)
- src/ai_company/config/schema.py (4): ProviderConfig (136-189), ProviderModelConfig (101-133), RateLimiterConfig (77-98), RetryConfig (26-74)

src/ai_company/providers/drivers/litellm_driver.py (4)
- src/ai_company/providers/resilience/rate_limiter.py (2): RateLimiter (18-151), is_enabled (48-52)
- src/ai_company/providers/resilience/retry.py (1): RetryHandler (28-127)
- src/ai_company/config/schema.py (1): ProviderModelConfig (101-133)
- src/ai_company/providers/models.py (2): StreamChunk (285-363), ToolCall (73-95)

src/ai_company/providers/base.py (4)
- src/ai_company/providers/errors.py (2): InvalidRequestError (113-116), RateLimitError (78-104)
- src/ai_company/providers/resilience/rate_limiter.py (4): RateLimiter (18-151), acquire (54-84), release (86-89), pause (91-114)
- src/ai_company/providers/resilience/retry.py (2): RetryHandler (28-127), execute (42-102)
- src/ai_company/providers/models.py (2): CompletionResponse (233-282), StreamChunk (285-363)

tests/unit/providers/test_protocol.py (3)
- src/ai_company/providers/errors.py (1): RateLimitError (78-104)
- src/ai_company/providers/resilience/rate_limiter.py (5): RateLimiter (18-151), acquire (54-84), is_enabled (48-52), release (86-89), pause (91-114)
- tests/unit/providers/conftest.py (1): ModelCapabilitiesFactory (93-105)
🔇 Additional comments (31)
tests/unit/budget/conftest.py (1)
120-120: Model fixture update is aligned and safe. Using `"test-model-001"` keeps this fixture consistent with the suite's test-model convention. As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers".

tests/unit/budget/test_cost_record.py (1)
24-24: Expectation update is correct. The assertion now matches the updated fixture model value and keeps tests decoupled from real vendor IDs. As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers".

src/ai_company/config/schema.py (1)
167-174: Provider-level resilience defaults are wired cleanly. Using `default_factory` for both `retry` and `rate_limiter` avoids shared mutable defaults and keeps provider config construction safe.

src/ai_company/providers/resilience/retry.py (1)
64-70: Non-retryable failure path is correctly surfaced. Logging at warning level before re-raise improves production visibility of terminal provider errors. Based on learnings: "Applies to src/ai_company/**/*.py : Log all error paths at WARNING or ERROR level with context before raising exceptions".

src/ai_company/providers/resilience/rate_limiter.py (1)
65-77: Pause-window re-check loop is a solid fix. The looped pause check prevents early resume when pause windows are extended concurrently.

tests/integration/providers/conftest.py (1)
20-240: Integration fixtures are consistently updated for resilience and test-model naming. `RetryConfig(max_retries=0)` wiring and the `test-model-*` defaults are coherent across provider builders and streaming/non-streaming helpers. As per coding guidelines: "Use vendor-agnostic fake model IDs/names in tests (e.g. `test-haiku-001`, `test-provider`) — never use real vendor model IDs to avoid coupling tests to external providers".

tests/integration/providers/test_ollama_pipeline.py (1)
96-96: Updated model assertion is correct. This aligns the pipeline expectation with the shared Ollama integration fixture data.

tests/integration/providers/test_tool_calling_pipeline.py (1)
261-280: Malformed streamed tool-call behavior is now tested clearly. The updated expectation (`len(tc_chunks) == 0`) cleanly validates drop-on-malformed-JSON behavior.

tests/integration/providers/test_openrouter_pipeline.py (1)
58-58: LGTM! The updates to use vendor-agnostic model identifiers (`test-model-openrouter-001`, `test-model-openrouter-002`) align with the coding guidelines for decoupling tests from real provider model IDs. Also applies to: 112-122
src/ai_company/providers/base.py (3)
60-67: LGTM! The constructor properly accepts optional `retry_handler` and `rate_limiter` dependencies, enabling resilience features to be injected by concrete drivers.

263-277: LGTM! The `_resilient_execute` helper cleanly delegates to the retry handler when configured, falling back to direct execution otherwise.

303-335: Stream slot ownership is now correctly managed. The implementation properly transfers concurrency slot ownership to a wrapper generator for streaming results, ensuring the slot is held for the full lifetime of the iterator. The `_hold_slot_for_stream` wrapper releases the slot in its `finally` block only after iteration completes or fails. This addresses the previous concern about premature slot release.

src/ai_company/providers/drivers/litellm_driver.py (5)
129-139: LGTM! The resilience wiring correctly instantiates `RetryHandler` only when retries are configured (`max_retries > 0`) and `RateLimiter` only when enabled, then passes them to the base class constructor.

261-282: LGTM! Enhanced validation with proper error logging for duplicate model IDs and alias collisions before raising `ValueError`. This improves debuggability.

603-621: LGTM! Refactored to return a new merged dict instead of mutating the input, aligning with the immutability principle from the coding guidelines.

669-724: LGTM! The `_ToolCallAccumulator` improvements properly track truncation state and return `None` on JSON parse failures, preventing malformed tool calls from propagating.

563-568: PEP 758 except syntax is valid and correctly implemented. The `except ValueError, TypeError:` syntax is confirmed to be valid in Python 3.14 per PEP 758 (final). The code correctly uses this syntax without an `as` clause, which aligns with the project's coding guidelines. No changes needed.

tests/integration/providers/test_error_scenarios.py (1)
72-72: LGTM! All LiteLLM exception constructors now use the vendor-agnostic `anthropic/test-model-001` identifier, consistent with the coding guidelines. Also applies to: 89-89, 99-99, 108-108, 122-122
tests/integration/providers/test_anthropic_pipeline.py (1)
50-50: LGTM! Model assertions updated to use vendor-agnostic identifiers (`test-model-001`, `test-model-002`), consistent with the test helper defaults and coding guidelines. Also applies to: 70-70, 85-88, 178-188

tests/unit/providers/resilience/test_retry.py (2)
1-39: LGTM! Comprehensive test setup with proper markers and a `_fast_config` helper that creates quick RetryConfig instances with minimal delays suitable for testing.

41-259: LGTM! Thorough test coverage for RetryHandler including success paths, exhaustion behavior, non-retryable error handling, backoff calculations (with and without jitter), retry-after semantics, and logging verification.
tests/unit/providers/resilience/test_rate_limiter.py (4)
1-36: LGTM! Test setup properly covers disabled mode with appropriate assertions for `is_enabled` and no-op behavior.

74-78: Guard against over-release already flagged. This test documents that `release()` without prior `acquire()` doesn't crash, but allows the semaphore counter to exceed the configured limit. This breaks the `max_concurrent` guarantee. A previous review already flagged this concern with a suggested fix to track in-flight requests.

38-72: LGTM! Concurrency limit tests properly verify that the third acquire blocks until a slot is released, with appropriate cleanup handling.

81-266: LGTM! Comprehensive test coverage for RPM controls, pause semantics (including extension and rejection of invalid values), throttling behavior with mocked time, and logging verification.

tests/integration/providers/test_retry_integration.py (2)
35-71: LGTM! Well-structured test helpers with vendor-agnostic identifiers and configurable retry/rate-limiter settings.

73-188: LGTM! Comprehensive integration tests covering retry success after transient failure, retry exhaustion, non-retryable error handling, rate-limit with retry-after, streaming path retries, and disabled-retry behavior.
Also applies to: 190-273
tests/unit/providers/drivers/conftest.py (2)
21-34: Resilience defaults are wired cleanly for driver tests. The `retry`/`rate_limiter` fixture parameters and defaults keep tests deterministic while still allowing explicit override paths. Also applies to: 57-59

38-46: Model fixture IDs are now consistently decoupled from vendor names. Using `test-model-*` in both provider config and mock responses keeps resolution tests stable and provider-agnostic. Also applies to: 79-79

tests/unit/providers/drivers/test_litellm_driver.py (1)
98-99: Model-resolution assertions are consistently updated across all relevant tests. Alias resolution, direct ID resolution, and capabilities checks all align with the new `test-model-*` fixture contract. Also applies to: 130-131, 140-145, 708-709, 726-727

tests/unit/providers/test_protocol.py (1)
376-407: The dedicated rate-limit test provider improves resilience test clarity. `_RateLimitProvider` cleanly isolates retry-after behavior and keeps these tests focused on base-provider resilience flow.
@model_validator(mode="after")
def _validate_delay_ordering(self) -> Self:
    """Ensure base_delay does not exceed max_delay."""
    if self.base_delay > self.max_delay:
        msg = (
            f"base_delay ({self.base_delay}) must be"
            f" <= max_delay ({self.max_delay})"
        )
        raise ValueError(msg)
    return self
Log RetryConfig validation failures before raising.
_validate_delay_ordering raises ValueError directly without emitting a warning/error log. Add structured context before the raise.
🔧 Suggested patch
 @model_validator(mode="after")
 def _validate_delay_ordering(self) -> Self:
     """Ensure base_delay does not exceed max_delay."""
     if self.base_delay > self.max_delay:
         msg = (
             f"base_delay ({self.base_delay}) must be"
             f" <= max_delay ({self.max_delay})"
         )
+        logger.warning(
+            CONFIG_VALIDATION_FAILED,
+            model="RetryConfig",
+            error=msg,
+            base_delay=self.base_delay,
+            max_delay=self.max_delay,
+        )
         raise ValueError(msg)
     return self

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/config/schema.py` around lines 65 - 74, The
_validate_delay_ordering model validator in RetryConfig currently raises
ValueError without logging; update the method to log the failure at WARNING or
ERROR level with structured context (include base_delay and max_delay and any
identifying fields on self) immediately before raising ValueError — call the
module/class logger (e.g., logger.error or logger.warning) with a clear message
like "RetryConfig validation failed: base_delay X > max_delay Y" and include
self or relevant attributes for debugging, then raise the same ValueError as
before.
if not math.isfinite(seconds) or seconds < 0:
    msg = f"pause seconds must be a finite non-negative number, got {seconds!r}"
    raise ValueError(msg)
Log invalid pause() input before raising.
This validation path raises ValueError without warning/error logging. Add contextual logging first.
🔧 Suggested patch
 if not math.isfinite(seconds) or seconds < 0:
     msg = f"pause seconds must be a finite non-negative number, got {seconds!r}"
+    logger.warning(
+        PROVIDER_RATE_LIMITER_THROTTLED,
+        provider=self._provider_name,
+        reason="invalid_pause_seconds",
+        seconds=seconds,
+    )
     raise ValueError(msg)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/providers/resilience/rate_limiter.py` around lines 104 - 106,
In the pause() validation branch inside the RateLimiter (the if block that
checks "if not math.isfinite(seconds) or seconds < 0"), add a log call at
WARNING or ERROR level before raising the ValueError; include the invalid
seconds value and relevant context (e.g., method name RateLimiter.pause and any
identifying instance info such as self.name or self._id if available) so the
message clearly shows what input failed, then raise the same ValueError as
before.
if isinstance(exc, RateLimitError) and exc.retry_after is not None:
    return min(exc.retry_after, self._config.max_delay)
Do not clamp retry_after below the provider hint.
Using min(retry_after, max_delay) can retry earlier than requested by the provider, which can cause repeated 429s and slower recovery.
🔧 Suggested patch
 if isinstance(exc, RateLimitError) and exc.retry_after is not None:
-    return min(exc.retry_after, self._config.max_delay)
+    # Provider Retry-After is authoritative; do not under-wait.
+    return exc.retry_after

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
    if isinstance(exc, RateLimitError) and exc.retry_after is not None:
        return min(exc.retry_after, self._config.max_delay)
After:
    if isinstance(exc, RateLimitError) and exc.retry_after is not None:
        # Provider Retry-After is authoritative; do not under-wait.
        return exc.retry_after
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/providers/resilience/retry.py` around lines 118 - 120, The
current code clamps the provider-specified retry_after down with
min(exc.retry_after, self._config.max_delay), which can make us retry earlier
than the provider requested; change the return expression to return
exc.retry_after directly so the provider hint is honored and we never choose a
value smaller than it (keep the RateLimitError check in the retry logic in
retry.py).
pytestmark = pytest.mark.timeout(30)

_PATCH_ACOMPLETION = "litellm.acompletion"
Patch target may not intercept driver calls.
The patch target "litellm.acompletion" patches the original module, but the driver imports it as import litellm as _litellm. Other integration tests use "ai_company.providers.drivers.litellm_driver._litellm.acompletion" which patches the name as looked up by the driver.
🔧 Suggested fix
-_PATCH_ACOMPLETION = "litellm.acompletion"
+_PATCH_ACOMPLETION = "ai_company.providers.drivers.litellm_driver._litellm.acompletion"

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
    _PATCH_ACOMPLETION = "litellm.acompletion"
After:
    _PATCH_ACOMPLETION = "ai_company.providers.drivers.litellm_driver._litellm.acompletion"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/providers/test_retry_integration.py` at line 32, Update the
patch target string constant _PATCH_ACOMPLETION to the driver-qualified name
used by the code under test (e.g.
"ai_company.providers.drivers.litellm_driver._litellm.acompletion") so the test
patches the symbol as imported by the driver (replace the current value
"litellm.acompletion" with the fully qualified import path the driver uses).
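One caveat worth noting: with a module-alias import (`import litellm as _litellm`) both patch targets hit the same module object, so either usually works; the from-import case is where the "patch where it is looked up" rule bites hardest. All module names below are fabricated purely for the demonstration:

```python
import sys
import types
from unittest import mock

# Fabricate a provider library and a driver that binds the function
# into its own namespace via a from-import.
lib = types.ModuleType("fakelib")
lib.acompletion = lambda: "real"
sys.modules["fakelib"] = lib

driver = types.ModuleType("fakedriver")
exec(
    "from fakelib import acompletion\n"
    "def call():\n"
    "    return acompletion()\n",
    driver.__dict__,
)
sys.modules["fakedriver"] = driver

# Patching the source module misses the driver's already-bound name...
with mock.patch("fakelib.acompletion", return_value="patched"):
    via_source = driver.call()

# ...while patching the name where the driver looks it up intercepts the call.
with mock.patch("fakedriver.acompletion", return_value="patched"):
    via_driver = driver.call()
```

Using the driver-qualified target, as the review suggests, is therefore the safe default regardless of which import style the driver currently uses.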
async def _empty_stream() -> None:  # type: ignore[misc]
    # Async generator that yields nothing (makes it an async iterable)
    return
    yield  # pragma: no cover
🧹 Nitpick | 🔵 Trivial
Async generator pattern is correct but could be clearer.
The return followed by yield is a valid pattern to create an empty async generator. The yield is unreachable but necessary for Python to recognize this as an async generator. Consider using an alternative pattern for clarity.
♻️ Alternative pattern
- async def _empty_stream() -> None: # type: ignore[misc]
- # Async generator that yields nothing (makes it an async iterable)
- return
- yield # pragma: no cover
+ async def _empty_stream():
+ # Async generator that yields nothing
+ if False:
+ yield # pragma: no cover
| async def _empty_stream() -> None: # type: ignore[misc] | |
| # Async generator that yields nothing (makes it an async iterable) | |
| return | |
| yield # pragma: no cover | |
| async def _empty_stream(): | |
| # Async generator that yields nothing | |
| if False: | |
| yield # pragma: no cover |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integration/providers/test_retry_integration.py` around lines 207 -
210, The empty async generator _empty_stream currently uses a "return" followed by an unreachable "yield", which is valid but unclear; replace the body with
a clearer pattern such as "if False: yield # pragma: no cover" (keeping the
async def _empty_stream() signature and type ignore) so Python still recognizes
it as an async generator while removing the misleading return statement.
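Both shapes behave identically; a standalone sketch (not the test file itself) confirming each produces an empty async iterable:

```python
import asyncio

async def empty_stream_return():
    # `return` before `yield`: the (unreachable) yield makes this an async generator
    return
    yield

async def empty_stream_if_false():
    # alternative shape suggested in the review: a conditionally-unreachable yield
    if False:
        yield

async def collect(gen):
    # drain an async iterable into a list
    return [item async for item in gen]

print(asyncio.run(collect(empty_stream_return())))   # []
print(asyncio.run(collect(empty_stream_if_false()))) # []
```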
| async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None: | ||
| handler = RetryHandler(_fast_config(max_retries=0)) | ||
| from ai_company.providers.errors import AuthenticationError | ||
|
|
||
| error = AuthenticationError("bad key") | ||
| func = AsyncMock(side_effect=error) | ||
| with pytest.raises(AuthenticationError): | ||
| await handler.execute(func) | ||
| func.assert_awaited_once() |
🧹 Nitpick | 🔵 Trivial
Remove redundant import.
AuthenticationError is already imported at line 14. The local import at line 139 is unnecessary.
♻️ Suggested fix
async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None:
handler = RetryHandler(_fast_config(max_retries=0))
- from ai_company.providers.errors import AuthenticationError
-
error = AuthenticationError("bad key")
func = AsyncMock(side_effect=error)
with pytest.raises(AuthenticationError):
| async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None: | |
| handler = RetryHandler(_fast_config(max_retries=0)) | |
| from ai_company.providers.errors import AuthenticationError | |
| error = AuthenticationError("bad key") | |
| func = AsyncMock(side_effect=error) | |
| with pytest.raises(AuthenticationError): | |
| await handler.execute(func) | |
| func.assert_awaited_once() | |
| async def test_zero_retries_non_retryable_raises_unwrapped(self) -> None: | |
| handler = RetryHandler(_fast_config(max_retries=0)) | |
| error = AuthenticationError("bad key") | |
| func = AsyncMock(side_effect=error) | |
| with pytest.raises(AuthenticationError): | |
| await handler.execute(func) | |
| func.assert_awaited_once() |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/resilience/test_retry.py` around lines 137 - 145, The
test contains a redundant local import of AuthenticationError inside
test_zero_retries_non_retryable_raises_unwrapped; remove that inner "from
ai_company.providers.errors import AuthenticationError" so the test uses the
AuthenticationError already imported at the top of the module, leaving the rest
of the test (handler = RetryHandler(...), AsyncMock side_effect, pytest.raises
assertion, and func.assert_awaited_once()) unchanged.
| provider = _ConcreteProvider() | ||
| provider._rate_limiter = mock_limiter | ||
|
|
🧩 Analysis chain
🏁 Script executed:
# Inspect BaseCompletionProvider to see if it supports rate_limiter parameter
find . -type f -name "*.py" | xargs grep -l "class BaseCompletionProvider" | head -5

Repository: Aureliolo/ai-company
Length of output: 98
🏁 Script executed:
# Check the test file at the specific lines mentioned
sed -n '415,440p' tests/unit/providers/test_protocol.py

Repository: Aureliolo/ai-company
Length of output: 1061
🏁 Script executed:
# Get the full BaseCompletionProvider class definition
rg -A 20 "class BaseCompletionProvider" --type py

Repository: Aureliolo/ai-company
Length of output: 1529
🏁 Script executed:
# Get the complete __init__ signature of BaseCompletionProvider
sed -n '1,100p' src/ai_company/providers/base.py | grep -A 15 "def __init__"

Repository: Aureliolo/ai-company
Length of output: 487
🏁 Script executed:
# Find _ConcreteProvider and _RateLimitProvider class definitions in test file
rg -B 3 -A 8 "class _ConcreteProvider|class _RateLimitProvider" tests/unit/providers/test_protocol.py

Repository: Aureliolo/ai-company
Length of output: 724
Inject the rate limiter through constructor instead of mutating private state.
Direct assignment to provider._rate_limiter violates the immutability principle. The parent BaseCompletionProvider already accepts rate_limiter as a keyword-only parameter, so both _ConcreteProvider and _RateLimitProvider should accept and forward it:
♻️ Proposed refactor
class _ConcreteProvider(BaseCompletionProvider):
"""Concrete subclass of BaseCompletionProvider for testing."""
- def __init__(self) -> None:
- super().__init__()
+ def __init__(self, *, rate_limiter: RateLimiter | None = None) -> None:
+ super().__init__(rate_limiter=rate_limiter)
self._caps = ModelCapabilitiesFactory.build()
class _RateLimitProvider(BaseCompletionProvider):
"""Provider that always raises RateLimitError with retry_after."""
- def __init__(self, retry_after: float | None = None) -> None:
- super().__init__()
+ def __init__(
+ self,
+ retry_after: float | None = None,
+ *,
+ rate_limiter: RateLimiter | None = None,
+ ) -> None:
+ super().__init__(rate_limiter=rate_limiter)
self._retry_after = retry_after
- provider = _ConcreteProvider()
- provider._rate_limiter = mock_limiter
+ provider = _ConcreteProvider(rate_limiter=mock_limiter)
- provider = _RateLimitProvider(retry_after=5.0)
- provider._rate_limiter = mock_limiter
+ provider = _RateLimitProvider(retry_after=5.0, rate_limiter=mock_limiter)

📝 Committable suggestion
| provider = _ConcreteProvider() | |
| provider._rate_limiter = mock_limiter | |
| provider = _ConcreteProvider(rate_limiter=mock_limiter) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/test_protocol.py` around lines 419 - 421, The test
mutates provider._rate_limiter directly; instead modify the constructors so
_ConcreteProvider and _RateLimitProvider accept a rate_limiter kw-only arg and
forward it to BaseCompletionProvider's rate_limiter parameter; update their
__init__ signatures to accept rate_limiter: Optional[...] and call
super().__init__(..., rate_limiter=rate_limiter) (or pass it along from
_RateLimitProvider to its parent) and change test to construct
_ConcreteProvider(rate_limiter=mock_limiter) rather than assigning to the
private attribute.
| async def test_rate_limit_error_with_retry_after_triggers_pause(self) -> None: | ||
| """RateLimitError with retry_after triggers rate_limiter.pause.""" | ||
| mock_limiter = MagicMock(spec=RateLimiter) | ||
| mock_limiter.acquire = AsyncMock() | ||
| mock_limiter.is_enabled = True | ||
|
|
||
| provider = _RateLimitProvider(retry_after=5.0) | ||
| provider._rate_limiter = mock_limiter | ||
|
|
||
| msg = ChatMessage(role=MessageRole.USER, content="Hi") | ||
| with pytest.raises(RateLimitError): | ||
| await provider.complete([msg], "test-model") | ||
|
|
||
| mock_limiter.pause.assert_called_once_with(5.0) | ||
|
|
🧹 Nitpick | 🔵 Trivial
Assert limiter release on the exception path, not only pause behavior.
This test should also check release() (and optionally acquire()) to guard against semaphore leaks when rate-limited calls fail.
✅ Test hardening patch
with pytest.raises(RateLimitError):
await provider.complete([msg], "test-model")
+ mock_limiter.acquire.assert_awaited_once()
mock_limiter.pause.assert_called_once_with(5.0)
+ mock_limiter.release.assert_called_once()

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/test_protocol.py` around lines 428 - 442, The test
test_rate_limit_error_with_retry_after_triggers_pause should also assert that
the rate limiter's semaphore is released on the exception path: after invoking
await provider.complete([...]) and catching RateLimitError, add assertions that
mock_limiter.release (and optionally mock_limiter.acquire) were called to ensure
no semaphore leak; update the assertions to call
mock_limiter.release.assert_called_once_with() (and/or check acquire was
awaited) on the _RateLimitProvider instance's mocked _rate_limiter.
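The leak this hardening guards against comes down to whether `release()` sits in a `finally` block. A minimal sketch with a hypothetical `TinyLimiter` (not the repo's `RateLimiter`) shows why releasing on the exception path matters:

```python
import asyncio

class TinyLimiter:
    """Concurrency-only stand-in for a rate limiter (hypothetical)."""

    def __init__(self, concurrency: int) -> None:
        self._sem = asyncio.Semaphore(concurrency)

    async def acquire(self) -> None:
        await self._sem.acquire()

    def release(self) -> None:
        self._sem.release()

async def guarded_call(limiter: TinyLimiter, func):
    await limiter.acquire()
    try:
        return await func()
    finally:
        limiter.release()  # must run on the exception path too

async def main() -> str:
    limiter = TinyLimiter(1)

    async def boom() -> None:
        raise RuntimeError("rate limited")

    # With concurrency=1, a leaked slot would deadlock the second iteration.
    for _ in range(3):
        try:
            await guarded_call(limiter, boom)
        except RuntimeError:
            pass
    return "no leak"

print(asyncio.run(main()))  # no leak
```

Asserting `release()` in the test pins down exactly this `finally` behavior.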
Pull request overview
Copilot reviewed 29 out of 30 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/ai_company/providers/drivers/litellm_driver.py:86
- The `TYPE_CHECKING` imports here (`AsyncGenerator`, `AsyncIterator`, `ProviderConfig`, `ProviderModelConfig`) are referenced in non-quoted runtime annotations elsewhere in this module (e.g., method return types and `__init__(..., config: ProviderConfig)`). On Python >= 3.14 (PEP 649), annotations can be evaluated at runtime, so these names need to be available at runtime to avoid `NameError`. Please import these types unconditionally (with `# noqa: TC00x` as needed) or switch the annotations to strings.
if TYPE_CHECKING:
from collections.abc import AsyncGenerator, AsyncIterator
from ai_company.config.schema import ProviderConfig, ProviderModelConfig
from ai_company.providers.models import (
ChatMessage,
CompletionConfig,
ToolDefinition,
)
| def __init__(self, config: RetryConfig) -> None: | ||
| self._config = config | ||
|
|
||
| async def execute( | ||
| self, | ||
| func: Callable[[], Coroutine[object, object, T]], |
Callable/Coroutine and RetryConfig are imported under TYPE_CHECKING but used in runtime type annotations (__init__, execute). With Python >=3.14 (PEP 649) these names must be importable at runtime when annotations are evaluated/inspected, otherwise you'll get NameError. Import these types unconditionally (e.g., from collections.abc import Callable, Coroutine and from .config import RetryConfig with the appropriate # noqa: TC00x), or alternatively string-quote the annotations.
| def __init__(self, config: RetryConfig) -> None: | |
| self._config = config | |
| async def execute( | |
| self, | |
| func: Callable[[], Coroutine[object, object, T]], | |
| def __init__(self, config: "RetryConfig") -> None: | |
| self._config = config | |
| async def execute( | |
| self, | |
| func: "Callable[[], Coroutine[object, object, T]]", |
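The failure mode both reviewers describe can be reproduced in a few lines. This is a standalone sketch; `get_type_hints` stands in for any runtime annotation evaluation (including PEP 649 lazy evaluation), and the names are illustrative:

```python
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    from collections.abc import Callable  # absent at runtime

def run(func: "Callable[[], int]") -> int:
    return func()

print(run(lambda: 42))  # → 42: annotations are not evaluated on a normal call

try:
    get_type_hints(run)  # forces evaluation of the string annotation
    print("resolved")
except NameError as exc:
    # Callable is not defined at runtime, so evaluation fails here
    print("NameError:", exc)
```

Importing `Callable` unconditionally (with the appropriate `# noqa: TC00x`) makes the second path succeed.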
Greptile Summary

This PR introduces a production-grade resilience layer (`ai_company.providers.resilience`) for the provider layer.

Key changes:
Notable findings:
Confidence Score: 4/5
Sequence Diagram

sequenceDiagram
participant Caller
participant Base as BaseCompletionProvider
participant Retry as RetryHandler
participant RL as RateLimiter
participant Driver as _do_complete / _do_stream
Caller->>Base: complete(messages, model)
Base->>Base: _validate_messages / _validate_model
Base->>Retry: execute(_attempt)
loop up to 1 + max_retries attempts
Retry->>Base: call _attempt()
Base->>RL: acquire() [pause check → RPM window → semaphore]
RL-->>Base: slot granted
Base->>Driver: _do_complete(...)
alt success
Driver-->>Base: CompletionResponse
Base->>RL: release() [via finally]
Base-->>Retry: result
Retry-->>Base: result
Base-->>Caller: CompletionResponse
else RateLimitError (retry_after)
Driver-->>Base: RateLimitError
Base->>RL: pause(retry_after)
Base->>RL: release() [via finally]
Base-->>Retry: RateLimitError
Retry->>Retry: sleep(min(retry_after, max_delay))
else transient ProviderError
Driver-->>Base: ProviderError(is_retryable=True)
Base->>RL: release() [via finally]
Base-->>Retry: ProviderError
Retry->>Retry: sleep(exponential backoff + jitter)
else non-retryable ProviderError
Driver-->>Base: ProviderError(is_retryable=False)
Base->>RL: release() [via finally]
Base-->>Retry: ProviderError
Retry-->>Base: re-raise immediately
Base-->>Caller: ProviderError
end
end
Retry-->>Base: RetryExhaustedError
Base-->>Caller: RetryExhaustedError
|
| tool_id=self.id, | ||
| args_length=len(self.arguments) if self.arguments else 0, | ||
| ) | ||
| parsed = {} | ||
| return None |
behavior change: tool calls with unparseable JSON now return None instead of empty dict {}
this change (from parsed = {} to return None) means tool calls with invalid JSON are now silently dropped rather than passed through with empty arguments. this is safer but represents a behavior change that wasn't mentioned in the PR description.
consider: if this is intentional, document it in the PR description or commit message. if not, revert to parsed = {}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/providers/drivers/litellm_driver.py
Line: 721-724
Comment:
behavior change: tool calls with unparseable JSON now return `None` instead of empty dict `{}`
this change (from `parsed = {}` to `return None`) means tool calls with invalid JSON are now silently dropped rather than passed through with empty arguments. this is safer but represents a behavior change that wasn't mentioned in the PR description.
consider: if this is intentional, document it in the PR description or commit message. if not, revert to `parsed = {}`
How can I resolve this? If you propose a fix, please make it concise.

- Non-ProviderError exceptions raise immediately without retry (3 tests)
- Rate limiter semaphore release on non-RateLimitError
- Streaming rate limiter slot held until consumed / released on close
- RetryConfig _validate_delay_ordering rejects base_delay > max_delay
- RetryConfig/RateLimiterConfig allow_inf_nan=False enforced (3 tests)
- Integration test for concurrent retry + rate-limit together

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Polyfactory generates random values that can violate the `base_delay <= max_delay` cross-field constraint added in the previous commit. Pin both configs to their defaults so the factory test is deterministic.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 30 out of 31 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/ai_company/providers/drivers/litellm_driver.py:86
`ProviderConfig`/`ProviderModelConfig`, `ChatMessage`/`CompletionConfig`/`ToolDefinition`, and `AsyncIterator`/`AsyncGenerator` are imported only under `TYPE_CHECKING` but are used in runtime annotations throughout this module (e.g. `__init__`, `_do_stream`, `_wrap_stream`). This is inconsistent with other provider modules that keep annotation types importable at runtime via `# noqa: TC00x` (see `src/ai_company/providers/routing/resolver.py:11`). Consider moving these imports out of `TYPE_CHECKING` and adding the appropriate `# noqa: TC00x` so annotation evaluation can't fail at runtime.
if TYPE_CHECKING:
from collections.abc import AsyncGenerator, AsyncIterator
from ai_company.config.schema import ProviderConfig, ProviderModelConfig
from ai_company.providers.models import (
ChatMessage,
CompletionConfig,
ToolDefinition,
)
| from typing import TYPE_CHECKING, TypeVar | ||
|
|
||
| from ai_company.observability import get_logger | ||
| from ai_company.observability.events import ( | ||
| PROVIDER_CALL_ERROR, | ||
| PROVIDER_RETRY_ATTEMPT, | ||
| PROVIDER_RETRY_EXHAUSTED, | ||
| PROVIDER_RETRY_SKIPPED, | ||
| ) | ||
| from ai_company.providers.errors import ProviderError, RateLimitError | ||
|
|
||
| from .errors import RetryExhaustedError | ||
|
|
||
| if TYPE_CHECKING: | ||
| from collections.abc import Callable, Coroutine | ||
|
|
||
| from .config import RetryConfig | ||
|
|
Callable, Coroutine, and RetryConfig are imported only under TYPE_CHECKING, but they’re referenced in runtime annotations on RetryHandler.__init__/execute. In this repo, annotation types are typically imported at runtime with # noqa: TC00x (e.g., src/ai_company/providers/protocol.py:7, src/ai_company/providers/routing/resolver.py:11) to avoid NameError if annotations are evaluated (PEP 649). Import these names at runtime (and suppress TC00x) instead of guarding them with TYPE_CHECKING.
| try: | ||
| result = await self._resilient_execute(_attempt) | ||
| except Exception: | ||
| logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True) | ||
| raise |
Double-logging for non-retryable errors when retry is configured
When max_retries > 0 (i.e., a RetryHandler is wired in) and a non-retryable ProviderError is raised, two separate log entries are emitted for the same error:
1. `RetryHandler.execute()` logs `PROVIDER_RETRY_SKIPPED` at `WARNING` level (`reason="non_retryable"`).
2. The `except Exception:` block here logs `PROVIDER_CALL_ERROR` at `ERROR` level.
This creates noisy, duplicate log entries for every non-retryable failure (e.g. AuthError, InvalidRequestError). Callers reading logs see two separate events for what is a single failure.
Consider suppressing the PROVIDER_CALL_ERROR log here when the exception is a ProviderError with is_retryable=False — those are already fully described by PROVIDER_RETRY_SKIPPED. Alternatively, only log PROVIDER_CALL_ERROR for RetryExhaustedError and unexpected non-provider exceptions, and leave non-retryable ProviderErrors to RetryHandler's logging.
try:
result = await self._resilient_execute(_attempt)
except RetryExhaustedError:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise
except ProviderError as exc:
if exc.is_retryable or self._retry_handler is None:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise
except Exception:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/providers/base.py
Line: 113-117
Comment:
**Double-logging for non-retryable errors when retry is configured**
When `max_retries > 0` (i.e., a `RetryHandler` is wired in) and a **non-retryable** `ProviderError` is raised, two separate log entries are emitted for the same error:
1. `RetryHandler.execute()` logs `PROVIDER_RETRY_SKIPPED` at `WARNING` level (`reason="non_retryable"`).
2. The `except Exception:` block here logs `PROVIDER_CALL_ERROR` at `ERROR` level.
This creates noisy, duplicate log entries for every non-retryable failure (e.g. `AuthError`, `InvalidRequestError`). Callers reading logs see two separate events for what is a single failure.
Consider suppressing the `PROVIDER_CALL_ERROR` log here when the exception is a `ProviderError` with `is_retryable=False` — those are already fully described by `PROVIDER_RETRY_SKIPPED`. Alternatively, only log `PROVIDER_CALL_ERROR` for `RetryExhaustedError` and unexpected non-provider exceptions, and leave non-retryable `ProviderError`s to `RetryHandler`'s logging.
```python
try:
result = await self._resilient_execute(_attempt)
except RetryExhaustedError:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise
except ProviderError as exc:
if exc.is_retryable or self._retry_handler is None:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise
except Exception:
logger.error(PROVIDER_CALL_ERROR, model=model, exc_info=True)
raise
```
How can I resolve this? If you propose a fix, please make it concise.
| if not self.is_enabled and self._pause_until <= time.monotonic(): | ||
| return |
Early-return guard condition is unreachable in practice
if not self.is_enabled and self._pause_until <= time.monotonic():
    return

`is_enabled` is determined by the frozen `RateLimiterConfig`. In `LiteLLMDriver.__init__()`, the limiter is only passed to `super().__init__()` when `rate_limiter.is_enabled` is `True`:
rate_limiter=rate_limiter if rate_limiter.is_enabled else None,

So when `self._rate_limiter` is not `None` inside `_rate_limited_call`, `is_enabled` is always `True`, making `not self.is_enabled` always `False`. The early-return guard can therefore never trigger from normal provider usage.
Additionally, since a disabled limiter is never stored on the provider, pause() is never called on it, so _pause_until will always be 0.0. The compound condition is doubly unreachable.
If the guard is intended for direct consumers of RateLimiter, add a comment explaining that — or simplify acquire() to just skip the pause/RPM/semaphore steps individually when each sub-feature is disabled (which the code already does below the guard), removing the need for the compound early return.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/providers/resilience/rate_limiter.py
Line: 60-61
Comment:
**Early-return guard condition is unreachable in practice**
```python
if not self.is_enabled and self._pause_until <= time.monotonic():
return
```
`is_enabled` is determined by the frozen `RateLimiterConfig`. In `LiteLLMDriver.__init__()`, the limiter is only passed to `super().__init__()` when `rate_limiter.is_enabled` is `True`:
```python
rate_limiter=rate_limiter if rate_limiter.is_enabled else None,
```
So when `self._rate_limiter` is not `None` inside `_rate_limited_call`, `is_enabled` is always `True`, making `not self.is_enabled` always `False`. The early-return guard can therefore never trigger from normal provider usage.
Additionally, since a disabled limiter is never stored on the provider, `pause()` is never called on it, so `_pause_until` will always be `0.0`. The compound condition is doubly unreachable.
If the guard is intended for direct consumers of `RateLimiter`, add a comment explaining that — or simplify `acquire()` to just skip the pause/RPM/semaphore steps individually when each sub-feature is disabled (which the code already does below the guard), removing the need for the compound early return.
How can I resolve this? If you propose a fix, please make it concise.🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) 
([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement 
meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
🤖 I have created a release *beep* *boop*

---

## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
## Summary
- `RetryHandler` with exponential backoff, jitter, and `retry-after` header respect for transient provider errors
- `RateLimiter` with sliding-window RPM tracking and asyncio semaphore concurrency control
- `RetryExhaustedError` as a signal for M3 fallback chain orchestration
- Resilience wired into `BaseCompletionProvider` — all drivers inherit resilience automatically
- `RetryConfig` and `RateLimiterConfig` fields added to `ProviderConfig`

## Architecture
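As a rough illustration of the retry path (exponential backoff with full jitter, `retry-after` respect, and immediate propagation of non-retryable errors), the pattern looks like the sketch below. This is not the merged code: the constructor parameters are assumptions, and the `is_retryable`/`retry_after` exception attributes are read defensively via `getattr`.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class RetryExhaustedError(Exception):
    """All attempts failed; signals the fallback chain to try another provider."""


class RetryHandler:
    # parameter names and defaults are illustrative, not the PR's actual API
    def __init__(
        self,
        max_attempts: int = 3,
        base_delay: float = 0.5,
        max_delay: float = 30.0,
    ) -> None:
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.max_delay = max_delay

    async def execute(self, func: Callable[[], Awaitable[T]]) -> T:
        last_exc: Exception | None = None
        for attempt in range(self.max_attempts):
            try:
                return await func()
            except Exception as exc:
                # permanent errors (is_retryable=False) propagate without retry
                if not getattr(exc, "is_retryable", True):
                    raise
                last_exc = exc
                if attempt == self.max_attempts - 1:
                    break
                retry_after = getattr(exc, "retry_after", None)
                if retry_after is not None:
                    # the provider told us when to come back
                    delay = float(retry_after)
                else:
                    # exponential backoff, capped, with full jitter
                    cap = min(self.max_delay, self.base_delay * 2**attempt)
                    delay = random.uniform(0.0, cap)
                await asyncio.sleep(delay)
        raise RetryExhaustedError(str(last_exc)) from last_exc
```

Full jitter (a uniform draw over the capped backoff) is a common choice here because it spreads retries from many concurrent agents instead of synchronizing them.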
- Each call flows through the `_rate_limited_call` → `_do_complete`/`_do_stream` cycle
- Provider `retry-after` hints are respected when backing off
- Non-retryable errors (`is_retryable=False`) raise immediately without retry

## New Files
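For context on the new config surface, `RetryConfig` and `RateLimiterConfig` plausibly look something like the sketch below. Plain dataclasses stand in for the project's Pydantic models, and every field name and default here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    # field names and defaults are illustrative, not the PR's actual schema
    enabled: bool = True
    max_attempts: int = 3
    base_delay_seconds: float = 0.5
    max_delay_seconds: float = 30.0


@dataclass(frozen=True)
class RateLimiterConfig:
    # client-side limits applied before the request leaves the process
    enabled: bool = True
    requests_per_minute: int = 60
    max_concurrency: int = 8
```

In the PR these fields hang off `ProviderConfig`, so each provider can carry its own retry and rate-limit policy.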
- `src/ai_company/providers/resilience/__init__.py`
- `src/ai_company/providers/resilience/config.py`
- `config/schema.py`
- `src/ai_company/providers/resilience/errors.py`: `RetryExhaustedError`
- `src/ai_company/providers/resilience/retry.py`: `RetryHandler` (backoff + jitter)
- `src/ai_company/providers/resilience/rate_limiter.py`: `RateLimiter` (RPM + concurrency)
- `tests/unit/providers/resilience/test_*.py`
- `tests/integration/providers/test_retry_integration.py`

## Test plan
Closes #9
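For reference, the sliding-window RPM plus semaphore pattern named in the summary can be sketched as follows. This is a simplified single-coroutine stand-in, not the merged implementation; the parameter names and the `acquire`/`release` surface are assumptions.

```python
import asyncio
import time
from collections import deque


class RateLimiter:
    """Sliding-window requests-per-minute cap plus a semaphore on in-flight calls."""

    def __init__(
        self,
        requests_per_minute: int,
        max_concurrency: int,
        window_seconds: float = 60.0,
    ) -> None:
        self.requests_per_minute = requests_per_minute
        self.window_seconds = window_seconds
        self._timestamps: deque[float] = deque()
        self._semaphore = asyncio.Semaphore(max_concurrency)

    async def acquire(self) -> None:
        await self._semaphore.acquire()
        now = time.monotonic()
        # drop request timestamps that have fallen out of the window
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.requests_per_minute:
            # wait until the oldest request exits the window
            # (a production version would re-check under contention)
            wait = self.window_seconds - (now - self._timestamps[0])
            if wait > 0:
                await asyncio.sleep(wait)
            self._timestamps.popleft()
        self._timestamps.append(time.monotonic())

    def release(self) -> None:
        self._semaphore.release()
```

A caller would typically wrap the provider request in `await limiter.acquire()` / `try: ... finally: limiter.release()` so slots are returned even on failure.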