refactor: harden BudgetEnforcer with error handling and review fixes by Aureliolo · Pull Request #182 · Aureliolo/synthorg

Aureliolo · 2026-03-09T08:12:19Z

Summary

Dedicated budget error handling in AgentEngine: Added specific except (BudgetExhaustedError, DailyLimitExceededError) clause before the broad except Exception, returning TerminationReason.BUDGET_EXHAUSTED instead of misclassifying as TerminationReason.ERROR
Extracted validation to engine/_validation.py: Moved validate_run_inputs, validate_agent, validate_task out of agent_engine.py to keep it under the 800-line limit (876 → 792 lines)
Graceful degradation for CostTracker failures: Added try/except in resolve_model and make_budget_checker so CostTracker errors don't crash task execution; they fall back to the unchanged identity and to no budget checker, respectively
Fixed exception chain inversion: Changed raise exc from build_exc to raise exc from None so the original error propagates cleanly
Input validation hardening: Added reset_day boundary check in billing.py, allow_inf_nan=False on BudgetConfig
Code simplification: Consolidated _apply_downgrade/_resolve_and_apply into single function, reordered _AlertThresholds fields to natural ascending order, standardized imports
5 new tests: Edge cases for exact-threshold downgrade, unresolvable aliases, task-limit-only checker, daily limit pre-flight error
Docs updated: DESIGN_SPEC.md §10.3/§10.4/§15.3, CLAUDE.md budget/ description, README.md feature status
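
The exception-chain fix in the summary can be illustrated with a small, self-contained sketch. `handle_fatal_error` and `build_result` are hypothetical stand-ins for the engine's fatal-error path, not its actual code. When building the error result itself fails, the original exception is re-raised `from None`. Its `__cause__` therefore stays `None`, and the secondary failure survives only as suppressed `__context__`.

```python
from collections.abc import Callable


def handle_fatal_error(
    exc: Exception,
    build_result: Callable[[Exception], object],
) -> object:
    """Build an error result for `exc`; re-raise `exc` if that fails.

    Hypothetical stand-in for the engine's fatal-error path.
    """
    try:
        return build_result(exc)
    except Exception:
        # Real code would log the secondary failure here before re-raising.
        # `from None` keeps that failure out of `__cause__`, so the traceback
        # no longer claims it caused the original error.
        raise exc from None
```

With the previous `raise exc from build_exc`, the traceback would have presented the result-building failure as the direct cause of the original error. That is the inversion the fix removes.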

Closes #44

Test plan

All 4176 tests pass (0 failures)
96.24% coverage (above 80% threshold)
Ruff lint: all checks passed
Ruff format: all files formatted
mypy strict: no issues in 494 source files
Pre-commit hooks: all passed

Review coverage

Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 30 findings consolidated, all 30 implemented.

coderabbitai · 2026-03-09T08:12:36Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Summary by CodeRabbit

Release Notes

New Features
- Implemented comprehensive budget enforcement system with pre-flight checks, in-flight monitoring, and automatic model downgrade when approaching spending limits
- Added per-agent daily spending limits and monthly hard stops for cost control
- Enhanced error handling for budget exhaustion scenarios with granular error types
Documentation
- Updated design specification and README to reflect the new budget enforcement architecture and capabilities

Walkthrough

Introduces a BudgetEnforcer service with pre‑flight, in‑flight, and task‑boundary auto‑downgrade enforcement; adds billing/daily period utilities and config fields (reset_day, auto‑downgrade boundary); integrates BudgetEnforcer into AgentEngine (pre‑flight checks, model resolution, budget-aware checker); adds new errors, observability events, and comprehensive unit tests.

Changes

Cohort / File(s)	Summary
Docs & top-level: `CLAUDE.md`, `DESIGN_SPEC.md`, `README.md`	Document and spec updates describing BudgetEnforcer, the new budget modules, and the updated architecture/narrative (surface moved from `limits.py` to the new budget modules).
Budget package public API: `src/ai_company/budget/__init__.py`	Re-exports added: `BudgetEnforcer`, `billing_period_start`, and `daily_period_start` are exposed in the package's `__all__`.
Billing utilities & tests: `src/ai_company/budget/billing.py`, `tests/unit/budget/test_billing.py`	New UTC-aware `billing_period_start(reset_day, now=None)` and `daily_period_start(now=None)` utilities, with unit tests covering boundaries, rollovers, and tzinfo.
Budget config & tests: `src/ai_company/budget/config.py`, `tests/unit/budget/test_config.py`	Added `BudgetConfig.reset_day` (int 1–28) and `AutoDowngradeConfig.boundary = "task_assignment"`; tightened model_config; added tests for validation and defaults.
Budget enforcement impl & tests: `src/ai_company/budget/enforcer.py`, `tests/unit/budget/test_enforcer.py`	New `BudgetEnforcer` class composing `CostTracker` + `BudgetConfig`: pre‑flight `check_can_execute`, `make_budget_checker` (closure enforcing task/monthly/daily limits and alerts), `resolve_model` (task‑boundary downgrade), and helpers; extensive unit tests.
Engine integration, validation & tests: `src/ai_company/engine/agent_engine.py`, `src/ai_company/engine/_validation.py`, `tests/unit/engine/test_agent_engine_budget.py`, `tests/unit/engine/test_agent_engine_errors.py`	AgentEngine gains an optional `budget_enforcer` constructor parameter and enforcer integration: pre‑flight checks, model resolution, a budget-aware checker, and a new `_handle_budget_error`; centralized validation helpers added.
Errors & exports: `src/ai_company/engine/errors.py`, `src/ai_company/engine/__init__.py`	Added `DailyLimitExceededError` (subclass of `BudgetExhaustedError`) and exported it from the engine package; updated the BudgetExhaustedError docstring.
Observability events: `src/ai_company/observability/events/budget.py`, `src/ai_company/observability/events/execution.py`	Added budget-related event constants (thresholds, hard stop, downgrades, baseline/resolve errors, task/daily hits) and the `EXECUTION_ENGINE_BUDGET_STOPPED` execution event.

Sequence Diagram(s)

sequenceDiagram
    participant Agent as Agent/User
    participant Engine as AgentEngine
    participant Enforcer as BudgetEnforcer
    participant Tracker as CostTracker
    participant Resolver as ModelResolver
    participant Provider as LLM Provider

    Agent->>Engine: run(agent_id, task_id, ...)
    Engine->>Enforcer: check_can_execute(agent_id)
    Enforcer->>Tracker: query_spending(agent_id, period)
    Tracker-->>Enforcer: monthly_spent, daily_spent
    alt Pre‑flight exhausted
        Enforcer-->>Engine: raise BudgetExhaustedError
        Engine-->>Agent: return AgentRunResult(termination_reason=BUDGET_EXHAUSTED)
    else Pre‑flight ok
        Engine->>Enforcer: resolve_model(identity)
        Enforcer->>Resolver: resolve(identity, thresholds)
        Resolver-->>Enforcer: possibly_downgraded_identity
        Enforcer-->>Engine: identity (maybe downgraded)
        Engine->>Enforcer: make_budget_checker(task, agent_id)
        Enforcer-->>Engine: checker_fn
        loop Per LLM call
            Engine->>checker_fn: check(cost_estimate, usage)
            alt Checker signals stop
                Engine-->>Agent: return AgentRunResult(termination_reason=BUDGET_EXHAUSTED)
            else OK
                Engine->>Provider: generate(model, prompt...)
                Provider-->>Engine: completion + cost
            end
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

feat: implement AgentEngine core orchestrator (#11) #143: Directly related — prior AgentEngine changes that this PR extends with budget_enforcer integration and error mapping.
refactor: pre-PR review improvements for ExecutionLoop + ReAct loop (#124) #141: Related — intersects engine execution/termination and BudgetChecker/TerminationReason surfaces used here.
feat: implement model routing engine #99: Related — ModelResolver/model-routing work that BudgetEnforcer.resolve_model relies on.

Suggested reviewers

Copilot

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main changes: hardening BudgetEnforcer with improved error handling and addressing review findings.
Description check	✅ Passed	The description is directly related to the changeset, detailing the budget enforcement implementation, error handling improvements, code refactoring, and validation hardening.
Linked Issues check	✅ Passed	The PR implements all acceptance criteria from issue `#44`: budget controls (monthly/task/daily limits), enforcement (checks before LLM calls, alerts, hard stops), task-boundary auto-downgrade, tracking/reset, and comprehensive testing with >80% coverage.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to budget enforcement implementation per issue `#44`: BudgetEnforcer service, validation extraction, error handling, billing utilities, configuration updates, and test coverage.
Docstring Coverage	✅ Passed	Docstring coverage is 89.11% which is sufficient. The required threshold is 80.00%.

gemini-code-assist · 2026-03-09T08:12:49Z

Summary of Changes

This pull request significantly enhances the robustness and functionality of the budget enforcement system. It introduces a dedicated BudgetEnforcer service to manage spending limits, model downgrades, and billing periods, ensuring that agent operations adhere to defined financial constraints. The changes also improve error handling for budget-related issues and refactor core validation logic, contributing to a more stable and maintainable codebase.

Highlights

Dedicated Budget Error Handling: Implemented specific exception handling for BudgetExhaustedError and DailyLimitExceededError in AgentEngine, ensuring correct termination reasons instead of generic errors.
Validation Logic Extraction: Moved validate_run_inputs, validate_agent, and validate_task functions from agent_engine.py to a new engine/_validation.py module, reducing agent_engine.py's line count from 876 to 792.
Graceful Degradation for CostTracker Failures: Added try/except blocks in resolve_model and make_budget_checker within BudgetEnforcer to prevent task crashes due to CostTracker errors, falling back to the unchanged identity or to no budget checking, respectively.
Exception Chain Inversion Fix: Corrected exception chaining from raise exc from build_exc to raise exc from None to ensure the original error propagates cleanly without masking.
Input Validation Hardening: Introduced a reset_day boundary check in billing.py and set allow_inf_nan=False on BudgetConfig for stricter input validation.
New Budget Enforcement Service: Introduced BudgetEnforcer to centralize pre-flight checks, in-flight budget monitoring, and automatic model downgrades.
New Unit Tests: Added 5 new tests covering edge cases for exact-threshold downgrade, unresolvable aliases, task-limit-only checkers, and daily limit pre-flight errors.
Documentation Updates: Updated DESIGN_SPEC.md, CLAUDE.md, and README.md to reflect the new budget enforcement features and architecture.
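
The task-boundary downgrade with the graceful-degradation fallback described above might look roughly like this. It is a simplified sketch. The real `resolve_model` works on `AgentIdentity` and `ModelResolver`, whereas here models are plain alias strings. `spent_fraction`, `downgrade_map`, `known_models`, and the `0.8` default threshold are all hypothetical. The control flow is the point:

- Downgrade at or above the threshold.
- Skip aliases that cannot be resolved.
- Keep the model unchanged if the spending lookup fails.

```python
from collections.abc import Callable, Mapping


def resolve_model(
    model: str,
    spent_fraction: Callable[[], float],
    downgrade_map: Mapping[str, str],
    known_models: frozenset[str],
    threshold: float = 0.8,
) -> str:
    """Return a cheaper model once spending reaches `threshold` (sketch)."""
    try:
        fraction = spent_fraction()
    except Exception:
        # A tracker failure must not crash the task: real code would log a
        # warning event here and keep the original model.
        return model
    if fraction < threshold:
        return model
    target = downgrade_map.get(model)
    if target is None or target not in known_models:
        # Nothing to downgrade to, or the alias cannot be resolved: skip.
        return model
    return target
```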

Changelog

CLAUDE.md
- Updated the description for the budget/ directory to reflect expanded responsibilities including enforcement and billing periods.
DESIGN_SPEC.md
- Updated the description of CFO Agent responsibilities to clarify that cost controls are enforced by BudgetEnforcer.
- Added reset_day to the budget configuration example.
- Included an implementation note detailing the three enforcement layers of BudgetEnforcer.
- Updated the file structure to reflect the new billing.py and enforcer.py modules.
README.md
- Updated the feature status for Budget Enforcement, marking it as implemented with BudgetEnforcer.
src/ai_company/budget/__init__.py
- Imported billing_period_start, daily_period_start, and BudgetEnforcer.
- Added BudgetEnforcer, billing_period_start, and daily_period_start to the module's __all__ export list.
src/ai_company/budget/billing.py
- Added a new module containing billing_period_start and daily_period_start functions for calculating billing period boundaries.
src/ai_company/budget/config.py
- Imported Literal from typing.
- Added a boundary field to AutoDowngradeConfig with a default of 'task_assignment'.
- Set allow_inf_nan=False in BudgetConfig's model_config for stricter float validation.
- Added a reset_day field to BudgetConfig with validation for a range of 1-28.
src/ai_company/budget/enforcer.py
- Added a new module implementing the BudgetEnforcer service, which provides pre-flight checks, in-flight budget checking, and auto-downgrade functionality.
src/ai_company/engine/__init__.py
- Imported DailyLimitExceededError.
- Added DailyLimitExceededError to the module's __all__ export list.
src/ai_company/engine/_validation.py
- Added a new module containing validate_run_inputs, validate_agent, and validate_task functions, extracted from agent_engine.py.
src/ai_company/engine/agent_engine.py
- Imported validation functions from the new _validation.py module.
- Imported BudgetExhaustedError and DailyLimitExceededError.
- Removed internal validation methods (_validate_run_inputs, _validate_agent, _validate_task).
- Added budget_enforcer as an optional dependency in the constructor and used its cost_tracker if no explicit cost_tracker is provided.
- Integrated BudgetEnforcer for pre-flight budget checks and model resolution before task execution.
- Added specific exception handling for BudgetExhaustedError and DailyLimitExceededError, returning TerminationReason.BUDGET_EXHAUSTED.
- Modified _execute to use BudgetEnforcer's make_budget_checker if available, otherwise falling back to the default.
- Changed exception chaining from raise exc from build_exc to raise exc from None in _handle_fatal_error.
src/ai_company/engine/errors.py
- Defined DailyLimitExceededError as a subclass of BudgetExhaustedError.
- Updated the docstring for BudgetExhaustedError to clarify its usage contexts.
src/ai_company/observability/events/budget.py
- Added new constants for budget-related events, including alerts, hard stops, daily limits, downgrades, and enforcement checks.
tests/unit/budget/test_billing.py
- Added a new test file with unit tests for billing_period_start and daily_period_start functions.
tests/unit/budget/test_config.py
- Added tests for the boundary field in AutoDowngradeConfig, including default value and rejection of invalid values.
- Added tests for the reset_day field in BudgetConfig, covering valid range and rejection of invalid values.
tests/unit/budget/test_enforcer.py
- Added a new test file with comprehensive unit tests for the BudgetEnforcer service, covering pre-flight checks, auto-downgrade logic, and budget checker creation.
tests/unit/engine/test_agent_engine_budget.py
- Added a new test file with unit tests for AgentEngine's integration with BudgetEnforcer, verifying pre-flight error handling, model downgrades, and cost tracker provision.
tests/unit/engine/test_agent_engine_errors.py
- Updated import for make_completion_response.
- Modified tests to reflect the change in exception chaining (raise exc from None), ensuring the original exception propagates cleanly.

Activity

The pull request was pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency).
30 findings were consolidated from the agent reviews.
All 30 consolidated findings were implemented in this pull request.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 208-218: Replace the multi-exception parentheses form with a
single exception since DailyLimitExceededError subclasses BudgetExhaustedError:
change the except block catching "(BudgetExhaustedError,
DailyLimitExceededError) as exc" to "except BudgetExhaustedError as exc" and
leave the call to self._handle_budget_error(...) unchanged so the handler
receives exc, identity, task, agent_id, task_id, duration_seconds, ctx, and
system_prompt.
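
A minimal illustration of why a single clause suffices. Catching the base class also catches its subclass, and placing that clause before the broad `except Exception` keeps budget stops from being reported as generic errors. `TerminationReason` and `run_step` are simplified stand-ins; in the real engine the clause calls `self._handle_budget_error(...)`.

```python
from collections.abc import Callable
from enum import Enum


class TerminationReason(Enum):
    # Stand-in for the engine's enum; only the members used here.
    COMPLETED = "completed"
    BUDGET_EXHAUSTED = "budget_exhausted"
    ERROR = "error"


class BudgetExhaustedError(Exception):
    pass


class DailyLimitExceededError(BudgetExhaustedError):
    pass


def run_step(step: Callable[[], None]) -> TerminationReason:
    try:
        step()
    except BudgetExhaustedError:
        # Also catches DailyLimitExceededError, so no second type is needed.
        return TerminationReason.BUDGET_EXHAUSTED
    except Exception:
        # The broad fallback must come after the budget clause.
        return TerminationReason.ERROR
    return TerminationReason.COMPLETED
```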

In `@tests/unit/engine/test_agent_engine_budget.py`:
- Around line 204-205: The test couples to an internal attribute by asserting
engine._cost_tracker; change the test to avoid accessing the private field:
either add a public accessor on AgentEngine (e.g., a property like cost_tracker
or get_cost_tracker) and assert engine.cost_tracker is tracker, or replace the
assertion with a behavior-based check that uses AgentEngine's public API (invoke
the operation that records cost and then verify that the provided tracker
instance has the expected cost record). Reference AgentEngine and the private
attribute _cost_tracker when updating code or tests.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: de49c3a1-d352-4cfe-9142-9ae3967476cf

📥 Commits

Reviewing files that changed from the base of the PR and between f753779 and a6675cd.

📒 Files selected for processing (17)

CLAUDE.md
DESIGN_SPEC.md
README.md
src/ai_company/budget/__init__.py
src/ai_company/budget/billing.py
src/ai_company/budget/config.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/__init__.py
src/ai_company/engine/_validation.py
src/ai_company/engine/agent_engine.py
src/ai_company/engine/errors.py
src/ai_company/observability/events/budget.py
tests/unit/budget/test_billing.py
tests/unit/budget/test_config.py
tests/unit/budget/test_enforcer.py
tests/unit/engine/test_agent_engine_budget.py
tests/unit/engine/test_agent_engine_errors.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14

Files:

src/ai_company/engine/errors.py
src/ai_company/engine/_validation.py
src/ai_company/engine/__init__.py
src/ai_company/budget/billing.py
tests/unit/budget/test_enforcer.py
src/ai_company/budget/config.py
src/ai_company/observability/events/budget.py
tests/unit/budget/test_billing.py
src/ai_company/budget/__init__.py
src/ai_company/budget/enforcer.py
tests/unit/budget/test_config.py
tests/unit/engine/test_agent_engine_errors.py
tests/unit/engine/test_agent_engine_budget.py
src/ai_company/engine/agent_engine.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
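
The immutability rule for non-Pydantic internal collections can be illustrated with the pattern it names. `AlertRegistry` and its data are hypothetical. The input is deep-copied at construction and exposed through `types.MappingProxyType`. Callers therefore get a read-only view that later changes to their own data cannot affect.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType


class AlertRegistry:
    """Holds per-agent alert thresholds without exposing a mutable dict."""

    def __init__(self, thresholds: Mapping[str, list[int]]) -> None:
        # Deep copy so later mutation of the caller's data cannot leak in,
        # then wrap so the mapping itself cannot be modified through us.
        self._thresholds = MappingProxyType(copy.deepcopy(dict(thresholds)))

    @property
    def thresholds(self) -> Mapping[str, list[int]]:
        return self._thresholds
```

`MappingProxyType` only blocks changes to the mapping itself. The nested lists are shielded from the caller by the deep copy, not frozen, so storing tuples instead of lists would close that gap.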

Files:

src/ai_company/engine/errors.py
src/ai_company/engine/_validation.py
src/ai_company/engine/__init__.py
src/ai_company/budget/billing.py
src/ai_company/budget/config.py
src/ai_company/observability/events/budget.py
src/ai_company/budget/__init__.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the structured logger from ai_company.observability
Always use variable name logger (not _logger, not log) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases

Files:

src/ai_company/engine/errors.py
src/ai_company/engine/_validation.py
src/ai_company/engine/__init__.py
src/ai_company/budget/billing.py
src/ai_company/budget/config.py
src/ai_company/observability/events/budget.py
src/ai_company/budget/__init__.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Files:

src/ai_company/engine/errors.py
src/ai_company/engine/_validation.py
src/ai_company/engine/__init__.py
src/ai_company/engine/agent_engine.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual @pytest.mark.asyncio needed
Set test timeout to 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests

Files:

tests/unit/budget/test_enforcer.py
tests/unit/budget/test_billing.py
tests/unit/budget/test_config.py
tests/unit/engine/test_agent_engine_errors.py
tests/unit/engine/test_agent_engine_budget.py

🧠 Learnings (4)

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

CLAUDE.md
src/ai_company/engine/__init__.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/{providers,engine}/**/*.py : RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Applied to files:

src/ai_company/engine/errors.py
src/ai_company/engine/__init__.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/providers/**/*.py : Mark retryable errors with is_retryable=True: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry

Applied to files:

src/ai_company/engine/__init__.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

src/ai_company/observability/events/budget.py

🧬 Code graph analysis (9)

src/ai_company/engine/_validation.py (5)

src/ai_company/core/enums.py (2)

AgentStatus (67-72)

TaskStatus (165-191)

src/ai_company/engine/errors.py (1)

ExecutionStateError (12-13)

src/ai_company/observability/_logger.py (1)

get_logger (8-28)

src/ai_company/core/agent.py (1)

AgentIdentity (246-304)

src/ai_company/core/task.py (1)

Task (45-261)

src/ai_company/engine/__init__.py (1)

src/ai_company/engine/errors.py (1)

DailyLimitExceededError (37-38)

tests/unit/budget/test_enforcer.py (7)

src/ai_company/budget/config.py (2)

AutoDowngradeConfig (65-148)

BudgetAlertConfig (15-62)

src/ai_company/budget/enforcer.py (4)

cost_tracker (74-76)

check_can_execute (78-104)

resolve_model (164-222)

make_budget_checker (224-281)

src/ai_company/core/agent.py (2)

AgentIdentity (246-304)

ModelConfig (145-174)

src/ai_company/engine/context.py (2)

AgentContext (87-307)

from_identity (140-171)

src/ai_company/providers/models.py (1)

TokenUsage (12-35)

src/ai_company/providers/routing/models.py (1)

ResolvedModel (9-52)

src/ai_company/providers/routing/resolver.py (1)

ModelResolver (25-205)

tests/unit/budget/test_billing.py (1)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

src/ai_company/budget/__init__.py (2)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

src/ai_company/budget/enforcer.py (1)

BudgetEnforcer (42-334)

tests/unit/budget/test_config.py (1)

src/ai_company/budget/config.py (2)

AutoDowngradeConfig (65-148)

BudgetConfig (151-227)

tests/unit/engine/test_agent_engine_errors.py (1)

tests/unit/engine/conftest.py (1)

make_completion_response (272-290)

tests/unit/engine/test_agent_engine_budget.py (6)

src/ai_company/budget/config.py (2)

BudgetAlertConfig (15-62)

BudgetConfig (151-227)

src/ai_company/budget/enforcer.py (2)

BudgetEnforcer (42-334)

cost_tracker (74-76)

src/ai_company/budget/tracker.py (1)

CostTracker (67-412)

src/ai_company/engine/errors.py (2)

BudgetExhaustedError (24-34)

DailyLimitExceededError (37-38)

src/ai_company/engine/loop_protocol.py (1)

TerminationReason (28-35)

src/ai_company/engine/run_result.py (1)

termination_reason (64-66)

src/ai_company/engine/agent_engine.py (7)

src/ai_company/core/enums.py (1)

TaskStatus (165-191)

src/ai_company/engine/_validation.py (3)

validate_agent (61-73)

validate_run_inputs (33-58)

validate_task (76-105)

src/ai_company/engine/cost_recording.py (1)

record_execution_costs (27-85)

src/ai_company/engine/errors.py (3)

BudgetExhaustedError (24-34)

DailyLimitExceededError (37-38)

ExecutionStateError (12-13)

src/ai_company/budget/enforcer.py (3)

BudgetEnforcer (42-334)

resolve_model (164-222)

make_budget_checker (224-281)

src/ai_company/engine/loop_protocol.py (3)

ExecutionLoop (146-184)

make_budget_checker (187-202)

ExecutionResult (78-135)

src/ai_company/engine/prompt.py (2)

SystemPrompt (56-82)

build_error_prompt (646-673)

🪛 LanguageTool

README.md

[typographical] ~24-~24: To join two clauses or introduce examples, consider using an em dash.
Context: ...migrations - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...

(DASH_RULE)

🔇 Additional comments (32)

CLAUDE.md (1)

47-47: LGTM!

The updated description accurately reflects the new budget enforcement capabilities (pre-flight/in-flight checks, auto-downgrade, billing periods) introduced by the BudgetEnforcer service.

src/ai_company/engine/errors.py (2)

27-34: LGTM!

The updated docstring clearly documents the two usage contexts for BudgetExhaustedError: direct raising from pre-flight checks and conversion from TerminationReason.BUDGET_EXHAUSTED results.

37-39: LGTM!

The DailyLimitExceededError exception is appropriately subclassed from BudgetExhaustedError, enabling callers to catch either the specific daily limit error or the broader budget exhaustion category. The docstring is concise and clear.

src/ai_company/engine/__init__.py (2)

53-53: LGTM!

The import correctly exposes DailyLimitExceededError from the errors module.

171-171: LGTM!

The export is correctly added to __all__ in alphabetical order, maintaining consistency with the existing list.

src/ai_company/budget/billing.py (2)

11-45: LGTM!

The billing_period_start function correctly:

Validates reset_day bounds to avoid invalid dates (28 max avoids Feb issues)

Handles same-month vs previous-month logic

Handles year rollover when month == 1

Type hints and Google-style docstring are present and well-structured.

48-59: LGTM!

The daily_period_start function is a clean, simple utility for computing midnight UTC. Implementation is correct.

src/ai_company/observability/events/budget.py (1)

17-26: LGTM!

The new event constants comprehensively cover the budget enforcement lifecycle:

Threshold and limit events (ALERT_THRESHOLD_CROSSED, HARD_STOP_TRIGGERED, DAILY_LIMIT_EXCEEDED/HIT, TASK_LIMIT_HIT)

Downgrade events (DOWNGRADE_APPLIED, DOWNGRADE_SKIPPED)

Operational events (ENFORCEMENT_CHECK, BASELINE_ERROR, RESOLVE_MODEL_ERROR)

Naming follows the established BUDGET_<CATEGORY>_<ACTION> convention.

README.md (1)

24-24: LGTM!

The new feature entry accurately documents the BudgetEnforcer service capabilities (pre-flight checks, in-flight budget checking, auto-downgrade) and notes pending items (CFO agent, advanced reporting). This aligns with the PR objectives and implementation.

tests/unit/engine/test_agent_engine_errors.py (4)

30-30: LGTM!

The import correctly references make_completion_response directly from conftest, aligning with the helper function's actual name.

176-177: LGTM!

The test correctly uses the make_completion_response helper with a custom cost_usd to simulate a completion with cost data for the cost recording test.

197-198: LGTM!

Consistent usage of make_completion_response(cost_usd=0.05) for the RecursionError cost recording test.

264-266: LGTM!

The updated assertion correctly verifies the raise exc from None exception chaining behavior introduced in this PR. By asserting __cause__ is None, the test confirms that secondary failures during error handling don't pollute the exception chain, allowing the original error to propagate cleanly.

src/ai_company/budget/__init__.py (4)

8-8: LGTM!

The import correctly exposes the new billing period utilities from the billing module.

32-32: LGTM!

The BudgetEnforcer import correctly exposes the new enforcement service from the enforcer module.

53-53: LGTM!

BudgetEnforcer is correctly added to __all__ in alphabetical order.

77-78: LGTM!

The billing utility functions are correctly added to __all__ in alphabetical order, completing the public API surface for the budget enforcement feature.

tests/unit/budget/test_billing.py (1)

1-132: LGTM!

Comprehensive test coverage for billing period utilities with:

Well-structured parametrized tests covering edge cases (year boundary, February 28, exact boundaries)

Proper pytest markers (@pytest.mark.unit, pytest.mark.timeout(30))

UTC timezone verification in default behavior tests

Clear and descriptive test IDs

src/ai_company/budget/config.py (1)

98-103: LGTM!

Good additions to the budget configuration:

boundary: Literal["task_assignment"] enforces the single valid boundary type per DESIGN_SPEC §10.4

reset_day with ge=1, le=28, strict=True prevents month-length edge cases

allow_inf_nan=False on BudgetConfig prevents invalid float values (NaN/Inf) from being accepted

Also applies to: 167-167, 192-200
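The three constraints can be reproduced with plain Pydantic v2 features. The sketch below folds them into one hypothetical model, `BudgetConfigSketch`; in the real code they are spread across the budget config models (the boundary appears to belong to the auto-downgrade config) and may carry additional fields and defaults.

```python
from typing import Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError


class BudgetConfigSketch(BaseModel):
    """Hypothetical model combining the constraints described above."""

    # Frozen config; NaN and +/-Inf are rejected for every float field.
    model_config = ConfigDict(frozen=True, allow_inf_nan=False)

    total_monthly: float = Field(default=0.0, ge=0.0)
    # strict=True rejects floats such as 1.5 (and even 1.0) instead of coercing.
    reset_day: int = Field(default=1, ge=1, le=28, strict=True)
    # A single-member Literal: any other string fails validation.
    boundary: Literal["task_assignment"] = "task_assignment"


def is_valid(**kwargs: object) -> bool:
    """Return True if the given field values pass validation."""
    try:
        BudgetConfigSketch(**kwargs)
    except ValidationError:
        return False
    return True
```

`strict=True` is what makes `reset_day=1.0` fail rather than being silently coerced to `1`.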

tests/unit/budget/test_config.py (1)

106-106: LGTM!

Good test coverage additions:

Tests verify boundary default is "task_assignment" and rejects other values

Tests verify reset_day default (1), valid range (1-28), and rejection of invalid inputs (0, 29, float)

Type ignore comments appropriately used for intentional type mismatches in negative tests

Also applies to: 187-195, 224-224, 273-293

src/ai_company/engine/_validation.py (1)

1-105: LGTM!

Well-structured validation module with:

Consistent logging before raising exceptions on all error paths

Proper use of structured logging with event constants

frozenset for immutable _EXECUTABLE_STATUSES

Clear separation of concerns (scalar inputs vs agent state vs task state)

The check at line 94 (task.assigned_to is not None and task.assigned_to != agent_id) is defensive against potential edge cases even though the Task model enforces assigned_to for ASSIGNED status.

tests/unit/engine/test_agent_engine_budget.py (1)

1-203: LGTM!

Solid integration tests for AgentEngine budget enforcer:

Proper use of AsyncMock and patch.object for mocking enforcer methods

Good coverage of pre-flight errors, model downgrade flow, and fallback behavior

Uses generic test provider names (test-small-001)

Clear test structure with descriptive docstrings

tests/unit/budget/test_enforcer.py (1)

1-908: LGTM!

Excellent test coverage for BudgetEnforcer:

Pre-flight checks: Tests boundary conditions (under, at exact, over limits) for both monthly and daily limits

Auto-downgrade: Comprehensive coverage including threshold boundaries, alias matching, chain downgrade, and graceful degradation paths

Budget checker: Tests task limit, monthly hard stop, daily limit, and alert deduplication

Well-organized helper functions (_make_budget_config, _make_identity, _patch_periods) reduce test boilerplate

Good use of @pytest.mark.parametrize for alert threshold tests

src/ai_company/budget/enforcer.py (2)

198-199: LGTM! Correct PEP 758 except syntax.

The unparenthesized form except MemoryError, RecursionError: is valid on Python 3.14+ (PEP 758 allows omitting the parentheses when there is no as clause) and matches the coding guidelines. Re-raising critical errors while gracefully degrading on recoverable exceptions is a good pattern.

Also applies to: 300-301

42-596: LGTM!

Well-designed budget enforcement service:

Pre-flight checks (check_can_execute): Clear separation of monthly and daily limit checks with proper logging

Auto-downgrade (resolve_model): Graceful degradation when CostTracker fails or alias not found

In-flight checker (make_budget_checker): Efficient pre-computed baselines with alert deduplication

Mutable cell pattern (last_alert: list[BudgetAlertLevel]) for closure state is documented and appropriate

TOCTOU limitation is clearly documented in the class docstring
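The closure-with-mutable-cell approach for the in-flight checker can be sketched as below. This is a simplified, hypothetical version: `AlertLevel`, the 75%/90% thresholds, the `on_alert` callback, and the keyword arguments are assumptions; the daily limit is omitted for brevity, and this sketch enforces the task limit independently of the monthly budget. The monthly baseline is computed once up front, so each call only does arithmetic.

```python
from collections.abc import Callable
from enum import IntEnum


class AlertLevel(IntEnum):
    """Hypothetical stand-in for BudgetAlertLevel, ordered by severity."""

    NORMAL = 0
    WARNING = 1
    CRITICAL = 2
    HARD_STOP = 3


def make_budget_checker(
    *,
    monthly_budget: float,
    monthly_baseline: float,
    task_limit: float,
    warn_at: float = 0.75,
    critical_at: float = 0.90,
    on_alert: Callable[[AlertLevel], None] = lambda level: None,
) -> Callable[[float], bool]:
    """Return a checker; it returns True when execution must stop."""
    # Mutable cell: the closure updates last_alert[0] without `nonlocal`.
    last_alert: list[AlertLevel] = [AlertLevel.NORMAL]

    def check(task_cost: float) -> bool:
        if task_limit > 0 and task_cost >= task_limit:
            return True
        if monthly_budget <= 0:
            return False
        used = (monthly_baseline + task_cost) / monthly_budget
        if used >= 1.0:
            level = AlertLevel.HARD_STOP
        elif used >= critical_at:
            level = AlertLevel.CRITICAL
        elif used >= warn_at:
            level = AlertLevel.WARNING
        else:
            level = AlertLevel.NORMAL
        if level > last_alert[0]:  # report each escalation exactly once
            last_alert[0] = level
            on_alert(level)
        return level is AlertLevel.HARD_STOP

    return check
```

Each escalation is reported once: moving up a level fires the callback again, but repeated calls at the same level do not.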

DESIGN_SPEC.md (1)

1676-1728: LGTM — Documentation accurately reflects the BudgetEnforcer architecture.

The updated implementation note clearly documents the three enforcement layers (pre-flight checks, in-flight budget checking, task-boundary auto-downgrade) and the composition of CostTracker + BudgetConfig. The billing period scoping via billing_period_start(reset_day) is also correctly referenced.

src/ai_company/engine/agent_engine.py (6)

12-24: LGTM — Clean import organization.

The validation function extraction and budget error imports align well with the refactoring objectives.

103-129: LGTM — Well-designed constructor with proper fallback logic.

The cost tracker fallback chain (cost_tracker → budget_enforcer.cost_tracker → None) is documented in the docstring and correctly implemented. The initialization log captures the new has_budget_enforcer flag for observability.

152-159: LGTM — Clean validation extraction.

The validation logic is appropriately delegated to the dedicated _validation.py module, maintaining the same error semantics (ExecutionStateError, ValueError) while reducing the file's line count.

668-708: LGTM — Well-designed budget error handler.

Good design decisions:

Logs at WARNING level (budget exhaustion is an expected boundary, not a crash).

Uses TerminationReason.BUDGET_EXHAUSTED enabling the orchestration layer to distinguish budget stops from errors.

No recovery applied (intentional — budget exhaustion is not a failure to recover from).

Properly handles the case where context or system_prompt may not have been built yet.
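The clause ordering that makes this work can be sketched with stand-in types (the `TerminationReason` members and both error classes below are hypothetical minimal versions, not the engine's definitions). Because `DailyLimitExceededError` subclasses `BudgetExhaustedError`, one clause catches both, and placing it before the broad `except Exception` is what keeps budget stops from being classified as errors.

```python
from collections.abc import Callable
from enum import Enum, auto


class TerminationReason(Enum):
    """Hypothetical subset of the engine's termination reasons."""

    COMPLETED = auto()
    BUDGET_EXHAUSTED = auto()
    ERROR = auto()


class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in for the daily-limit subclass."""


def run_step(step: Callable[[], None]) -> TerminationReason:
    """Run ``step`` and classify how it ended."""
    try:
        step()
    except BudgetExhaustedError:
        # Expected boundary (also matches DailyLimitExceededError): no recovery.
        return TerminationReason.BUDGET_EXHAUSTED
    except Exception:
        # Listed second; if it came first, budget stops would become ERROR.
        return TerminationReason.ERROR
    return TerminationReason.COMPLETED
```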

756-772: LGTM — Appropriate exception handling refinements.

logger.exception correctly captures the traceback for non-recoverable errors.

raise exc from None intentionally suppresses exception chaining so that if building the error result fails, the original exception is raised cleanly without confusing the user with the secondary build failure.
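A minimal sketch of the pattern, using a hypothetical `build_error_result` that always fails. `raise exc from None` sets `__cause__` to `None` and `__suppress_context__` to `True`, so the caller sees the original exception without the secondary failure in the printed chain; this is also what the `__cause__ is None` assertion in the updated error test checks.

```python
def build_error_result(exc: Exception) -> str:
    """Hypothetical result builder that itself fails."""
    raise RuntimeError(f"could not build result for {exc!r}")


def handle_failure(exc: Exception) -> str:
    """Return an error result, or re-raise ``exc`` if building it fails."""
    try:
        return build_error_result(exc)
    except Exception:
        # Re-raise the original error; `from None` hides the builder failure
        # from the displayed traceback chain.
        raise exc from None
```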

88-100: LGTM — Comprehensive docstring update.

The updated docstring clearly documents:

The cost tracker fallback chain

The budget enforcer's role in pre-flight checks, auto-downgrade, and enhanced in-flight budget checking

greptile-apps · 2026-03-09T08:16:43Z

Greptile Summary

This PR introduces BudgetEnforcer, a new service for pre-flight budget checks, in-flight enforcement, and task-boundary auto-downgrade. It extracts validation helpers into engine/_validation.py, adds billing.py for billing-period math, and integrates dedicated BudgetExhaustedError handling into AgentEngine.

Critical issues identified:

Python 2 except syntax — bare-comma exception handlers (except X, Y:) at multiple locations (enforcer.py lines 102, 104, 213, 321; agent_engine.py lines 210, 333, 363, 724, 781) cause compile-time SyntaxError in Python 3 and prevent module import.
Pre-flight daily limit skip — when total_monthly <= 0, check_can_execute returns early without checking per_agent_daily_limit, creating an enforcement gap inconsistent with in-flight make_budget_checker logic.
Module-level assert — the exhaustiveness check for _ALERT_LEVEL_ORDER uses an assert that is eliminated under python -O, risking silent runtime failures in optimized builds.

The billing.py, config.py, errors.py, and _validation.py modules are clean. Pre-flight integration in agent_engine.py is sound. Test coverage is comprehensive.

Confidence Score: 0/5

Not safe to merge — multiple Python 2 syntax errors prevent code from being imported in Python 3.
Multiple bare-comma except clauses (Python 2 syntax) are present across enforcer.py and agent_engine.py. These cause compile-time SyntaxError in Python 3 and prevent the modules from being imported at all, making execution impossible. Additionally, pre-flight daily limit enforcement is inconsistent with in-flight behavior when monthly budget is disabled, and module-level assert can be silently eliminated in optimized builds.
src/ai_company/budget/enforcer.py and src/ai_company/engine/agent_engine.py require fixes to all bare-comma except clauses before code can be imported.

Last reviewed commit: 8b20a75

gemini-code-assist

Code Review

This pull request significantly hardens budget enforcement by introducing a dedicated BudgetEnforcer service, improving error handling in AgentEngine, and enhancing input validation. The refactoring and comprehensive tests are excellent. However, a critical vulnerability exists due to the use of outdated Python 2 except syntax (e.g., except MemoryError, RecursionError:) in enforcer.py and agent_engine.py. This invalid syntax in Python 3 will cause a SyntaxError at runtime, potentially leading to a Denial of Service or improper handling of non-recoverable errors, allowing the system to fail in an insecure or inconsistent state. Other hardening measures, such as allow_inf_nan=False and improved input validation, are well-implemented.

gemini-code-assist · 2026-03-09T08:17:22Z

src/ai_company/budget/enforcer.py

+            monthly_cost = await self._cost_tracker.get_total_cost(
+                start=period_start,
+            )
+        except MemoryError, RecursionError:


The except syntax except MemoryError, RecursionError: is invalid in Python 3, causing a SyntaxError at runtime. This can lead to a Denial of Service (DoS) as the module will fail to load. If interpreted with Python 2 semantics, it would improperly catch MemoryError and assign RecursionError as the exception object, allowing actual RecursionError exceptions to be caught by a broader except Exception: block, leading to insecure "fail open" behavior. The correct Python 3 syntax uses a tuple of exception types.

except (MemoryError, RecursionError):

gemini-code-assist · 2026-03-09T08:17:22Z

src/ai_company/budget/enforcer.py

+                daily_limit,
+                agent_id,
+            )
+        except MemoryError, RecursionError:


The except syntax except MemoryError, RecursionError: is invalid in Python 3 and will cause a SyntaxError at runtime. This can lead to a complete failure to load the module or improper error handling where non-recoverable exceptions are swallowed by generic handlers, potentially resulting in an inconsistent system state. The correct Python 3 syntax uses a tuple of exception types.

Suggested change

- except MemoryError, RecursionError:
+ except (MemoryError, RecursionError):

Copilot

Pull request overview

This PR hardens and refactors budget enforcement in the engine by introducing a dedicated BudgetEnforcer service, improving termination classification for budget stops, extracting AgentEngine input validation, and adding/expanding tests and documentation to cover budget edge cases.

Changes:

Integrate BudgetEnforcer into AgentEngine for pre-flight checks, task-boundary model auto-downgrade, and enhanced in-flight budget checking; classify budget stops as TerminationReason.BUDGET_EXHAUSTED.
Extract AgentEngine validation helpers into engine/_validation.py and tighten budget config validation (reset_day bounds, allow_inf_nan=False, downgrade boundary).
Add new budget/billing utilities, observability event constants, and comprehensive unit tests + docs updates.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.

File	Description
tests/unit/engine/test_agent_engine_errors.py	Updates tests around error-chain behavior and cost-recording propagation.
tests/unit/engine/test_agent_engine_budget.py	Adds engine-level integration tests for BudgetEnforcer wiring and termination reasons.
tests/unit/budget/test_enforcer.py	Adds extensive unit coverage for BudgetEnforcer pre-flight checks, downgrade logic, and checker behavior.
tests/unit/budget/test_config.py	Adds coverage for new config fields/validation (`boundary`, `reset_day`).
tests/unit/budget/test_billing.py	Adds tests for new billing-period boundary utilities.
src/ai_company/observability/events/budget.py	Introduces budget-enforcement event name constants.
src/ai_company/engine/errors.py	Clarifies budget error semantics and adds `DailyLimitExceededError`.
src/ai_company/engine/agent_engine.py	Wires in BudgetEnforcer, refactors validation usage, and adds budget-specific termination handling.
src/ai_company/engine/_validation.py	New module with extracted run/agent/task validation helpers.
src/ai_company/engine/__init__.py	Re-exports `DailyLimitExceededError`.
src/ai_company/budget/enforcer.py	New BudgetEnforcer service implementing pre-flight checks, in-flight checker factory, and task-boundary downgrade.
src/ai_company/budget/config.py	Adds downgrade boundary field and validates `reset_day`; disallows inf/nan floats.
src/ai_company/budget/billing.py	New pure functions for monthly billing period start and daily period start.
src/ai_company/budget/__init__.py	Exposes BudgetEnforcer and billing utilities from the budget package.
README.md	Updates feature-status messaging to reflect budget enforcement availability.
DESIGN_SPEC.md	Updates spec/docs to reflect BudgetEnforcer implementation details and config.
CLAUDE.md	Updates repository overview to include budget enforcement and billing periods.

Copilot · 2026-03-09T08:18:22Z

src/ai_company/budget/enforcer.py

+                agent_id=agent_id,
+                reason="falling_back_to_task_only_checker",
+            )
+            return None


This exception fallback logs reason="falling_back_to_task_only_checker", but then returns None, which disables all in-flight checks (including per-task limits) if monthly/daily baseline queries fail. Consider returning a task-only checker when task_limit > 0 and only dropping the baseline-dependent monthly/daily enforcement.

Suggested change

- return None
+ # Fall back to neutral baselines so that task-level enforcement
+ # can still proceed even if monthly/daily baseline queries fail.
+ return 0.0, 0.0

Copilot · 2026-03-09T08:18:22Z

src/ai_company/engine/agent_engine.py

+        logger.warning(
+            EXECUTION_ENGINE_ERROR,
+            agent_id=agent_id,
+            task_id=task_id,


Budget exhaustion is a controlled termination (TerminationReason.BUDGET_EXHAUSTED), but this logs under the generic EXECUTION_ENGINE_ERROR event name. That can inflate error metrics/alerts. Consider emitting a budget-specific event (or adding an EXECUTION_ENGINE_BUDGET_EXHAUSTED event) and logging at an appropriate level.

Suggested change

- logger.warning(
-     EXECUTION_ENGINE_ERROR,
-     agent_id=agent_id,
-     task_id=task_id,
+ logger.info(
+     "EXECUTION_ENGINE_BUDGET_EXHAUSTED",
+     agent_id=agent_id,
+     task_id=task_id,
+     termination_reason=TerminationReason.BUDGET_EXHAUSTED.value,

Copilot · 2026-03-09T08:18:22Z

src/ai_company/engine/agent_engine.py

@@ -182,14 +198,24 @@ async def run(  # noqa: PLR0913
                tool_invoker=tool_invoker,
            )
        except MemoryError, RecursionError:


except MemoryError, RecursionError: is invalid Python syntax (will raise SyntaxError). Use the tuple form except (MemoryError, RecursionError): (or separate except blocks) to catch both exceptions.

Suggested change

- except MemoryError, RecursionError:
+ except (MemoryError, RecursionError):

Copilot · 2026-03-09T08:18:22Z

src/ai_company/budget/enforcer.py

+            monthly_cost = await self._cost_tracker.get_total_cost(
+                start=period_start,
+            )
+        except MemoryError, RecursionError:


except MemoryError, RecursionError: is invalid Python syntax (will raise SyntaxError). Use except (MemoryError, RecursionError): (or separate except blocks).

Suggested change

- except MemoryError, RecursionError:
+ except (MemoryError, RecursionError):

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/ai_company/engine/agent_engine.py (1)
152-218: 🛠️ Refactor suggestion | 🟠 Major

Extract the new budget flow out of AgentEngine.

The added pre-flight branches plus _handle_budget_error() push run() well past the 50-line cap and leave src/ai_company/engine/agent_engine.py over the 800-line limit again. Moving the budget-specific orchestration/result building into helpers would keep the entrypoint readable.

As per coding guidelines: "Keep functions to less than 50 lines and files to less than 800 lines".

Also applies to: 668-722
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 152 - 218, The run()
method has grown past 50 lines due to pre-flight budget checks and budget-error
orchestration; extract the budget-specific flow into one or two helpers to
reduce length: move the pre-flight steps (the block that uses
self._budget_enforcer, calls to check_can_execute and resolve_model, and
creation of tool_invoker) and context preparation (calls to
self._make_tool_invoker and self._prepare_context) into a new helper like
_prepare_for_execution(identity, agent_id, task, task_id, max_turns,
memory_messages) that returns (identity, tool_invoker, ctx, system_prompt), and
move the BudgetExhaustedError handling and result-building currently invoking
_handle_budget_error into a separate helper like _run_handle_budget_error(exc,
identity, task, agent_id, task_id, start, ctx, system_prompt) and have run()
call these helpers before/after _execute; ensure you update run() to call the
new helpers and keep all original symbols (_handle_budget_error,
_prepare_context, _execute, _make_tool_invoker, self._budget_enforcer) intact so
behavior is unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 987-989: Update the spec sentence that says "12 constants in
`events/execution.py`" to reflect the current count of engine events (13)
because `src/ai_company/observability/events/execution.py` now includes the new
`EXECUTION_ENGINE_BUDGET_STOPPED` constant; change the text to mention 13 engine
events (or rephrase to avoid hardcoding a number) and optionally note the added
`EXECUTION_ENGINE_BUDGET_STOPPED` event for clarity.

In `@src/ai_company/budget/enforcer.py`:
- Around line 253-260: The code that decides whether to build a budget checker
must treat BudgetConfig.total_monthly == 0 as a global "budgets disabled"
switch; change the early-return logic in the method that uses
self._budget_config (variables cfg, monthly_budget, task_limit, daily_limit) so
that if monthly_budget == 0 it immediately returns None regardless of
task.budget_limit or cfg.per_agent_daily_limit, ensuring consistency with
check_can_execute() and resolve_model().

In `@src/ai_company/engine/agent_engine.py`:
- Around line 117-120: The constructor sets self._budget_enforcer and
self._cost_tracker separately which can leave budget_enforcer.cost_tracker and
self._cost_tracker as different instances, breaking enforcement in
_post_execution_pipeline and pre/in-flight checks; fix by ensuring they point to
the same CostTracker instance whenever both are provided — e.g., after assigning
self._budget_enforcer and self._cost_tracker, if self._budget_enforcer and
self._cost_tracker are both truthy and self._budget_enforcer.cost_tracker is not
self._cost_tracker, set self._budget_enforcer.cost_tracker = self._cost_tracker
(or vice versa) so _post_execution_pipeline, pre-flight, and in-flight
enforcement all share the same tracker.
- Around line 688-699: The ExecutionResult returned in the budget-stop path
drops the exception details; update the code around
logger.warning/EXECUTION_ENGINE_BUDGET_STOPPED so the constructed
ExecutionResult (created with error_ctx and TerminationReason.BUDGET_EXHAUSTED)
also sets an informative error_message (e.g., f"{type(exc).__name__}: {exc}" or
a normalized budget-stop reason) so callers can distinguish daily vs monthly
hard stops; modify the ExecutionResult creation in this block (referencing
AgentContext.from_identity, ctx, ExecutionResult, and
TerminationReason.BUDGET_EXHAUSTED) to include that error text.

In `@tests/unit/engine/test_agent_engine_budget.py`:
- Around line 49-113: Combine the two duplicated tests into a single
parametrized test using pytest.mark.parametrize that iterates over the injected
exception types (BudgetExhaustedError and DailyLimitExceededError) and their
messages, replace the two functions
test_preflight_budget_exhausted_returns_error and
test_preflight_daily_limit_returns_budget_exhausted with one test (e.g.,
test_preflight_prevents_execution_for_preflight_errors) that patches
enforcer.check_can_execute with AsyncMock(side_effect=exception) and calls
AgentEngine.run, then assert result.termination_reason ==
TerminationReason.BUDGET_EXHAUSTED and provider.call_count == 0; keep existing
use of MockCompletionProvider, BudgetEnforcer, CostTracker, and sample fixtures.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8350002a-f876-4975-a4e8-ddbe4499997e

📥 Commits

Reviewing files that changed from the base of the PR and between a6675cd and 5948c87.

📒 Files selected for processing (9)

DESIGN_SPEC.md
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py
src/ai_company/observability/events/execution.py
tests/unit/budget/test_billing.py
tests/unit/budget/test_config.py
tests/unit/budget/test_enforcer.py
tests/unit/engine/test_agent_engine_budget.py
tests/unit/engine/test_agent_engine_errors.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14

Files:

tests/unit/engine/test_agent_engine_budget.py
tests/unit/budget/test_billing.py
tests/unit/engine/test_agent_engine_errors.py
tests/unit/budget/test_config.py
src/ai_company/observability/events/execution.py
tests/unit/budget/test_enforcer.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual @pytest.mark.asyncio needed
Set test timeout to 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests

Files:

tests/unit/engine/test_agent_engine_budget.py
tests/unit/budget/test_billing.py
tests/unit/engine/test_agent_engine_errors.py
tests/unit/budget/test_config.py
tests/unit/budget/test_enforcer.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

src/ai_company/observability/events/execution.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the structured logger from ai_company.observability
Always use variable name logger (not _logger, not log) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases

Files:

src/ai_company/observability/events/execution.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Files:

src/ai_company/engine/agent_engine.py

🧠 Learnings (3)

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

src/ai_company/observability/events/execution.py
src/ai_company/engine/agent_engine.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14

Applied to files:

src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

src/ai_company/budget/enforcer.py

🧬 Code graph analysis (5)

tests/unit/budget/test_billing.py (1)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

tests/unit/engine/test_agent_engine_errors.py (2)

tests/unit/engine/conftest.py (2)

make_completion_response (272-290)

sample_agent_with_personality (60-87)

src/ai_company/engine/agent_engine.py (1)

run (131-229)

tests/unit/budget/test_config.py (1)

src/ai_company/budget/config.py (2)

AutoDowngradeConfig (65-148)

BudgetConfig (151-227)

src/ai_company/budget/enforcer.py (7)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

src/ai_company/budget/enums.py (1)

BudgetAlertLevel (6-16)

src/ai_company/engine/errors.py (2)

BudgetExhaustedError (24-34)

DailyLimitExceededError (37-38)

src/ai_company/budget/config.py (1)

BudgetConfig (151-227)

src/ai_company/budget/tracker.py (1)

CostTracker (67-412)

src/ai_company/core/agent.py (2)

AgentIdentity (246-304)

ModelConfig (145-174)

src/ai_company/providers/routing/resolver.py (2)

ModelResolver (25-205)

resolve_safe (154-172)

src/ai_company/engine/agent_engine.py (4)

src/ai_company/engine/_validation.py (3)

validate_agent (61-73)

validate_run_inputs (33-58)

validate_task (76-105)

src/ai_company/budget/enforcer.py (4)

BudgetEnforcer (42-338)

cost_tracker (74-76)

check_can_execute (78-104)

make_budget_checker (224-279)

src/ai_company/engine/loop_protocol.py (1)

make_budget_checker (187-202)

src/ai_company/engine/prompt.py (2)

SystemPrompt (56-82)

build_error_prompt (646-673)

coderabbitai · 2026-03-09T08:41:38Z

src/ai_company/budget/enforcer.py

+        cfg = self._budget_config
+        task_limit = task.budget_limit
+        monthly_budget = cfg.total_monthly
+        daily_limit = cfg.per_agent_daily_limit
+
+        # All enforcement disabled
+        if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+            return None


⚠️ Potential issue | 🟠 Major

Honor total_monthly=0 as a full budget-disable switch.

BudgetConfig documents that total_monthly == 0.0 disables budget enforcement at runtime, and check_can_execute() / resolve_model() already follow that contract. This branch does not: it still builds a checker whenever task.budget_limit or per_agent_daily_limit is positive, so a run can pass pre-flight and then get stopped mid-execution even though budgets were supposed to be off.

Suggested fix

-        # All enforcement disabled
-        if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+        # A zero monthly budget disables budget enforcement entirely.
+        if monthly_budget <= 0:
             return None

Suggested change

         cfg = self._budget_config
         task_limit = task.budget_limit
         monthly_budget = cfg.total_monthly
         daily_limit = cfg.per_agent_daily_limit
-        # All enforcement disabled
-        if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+        # A zero monthly budget disables budget enforcement entirely.
+        if monthly_budget <= 0:
             return None

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ai_company/budget/enforcer.py` around lines 253 - 260, The code that decides whether to build a budget checker must treat BudgetConfig.total_monthly == 0 as a global "budgets disabled" switch; change the early-return logic in the method that uses self._budget_config (variables cfg, monthly_budget, task_limit, daily_limit) so that if monthly_budget == 0 it immediately returns None regardless of task.budget_limit or cfg.per_agent_daily_limit, ensuring consistency with check_can_execute() and resolve_model().

coderabbitai · 2026-03-09T08:41:38Z

src/ai_company/engine/agent_engine.py

+        logger.warning(
+            EXECUTION_ENGINE_BUDGET_STOPPED,
+            agent_id=agent_id,
+            task_id=task_id,
+            error=f"{type(exc).__name__}: {exc}",
+        )
+        try:
+            error_ctx = ctx or AgentContext.from_identity(identity, task=task)
+            budget_result = ExecutionResult(
+                context=error_ctx,
+                termination_reason=TerminationReason.BUDGET_EXHAUSTED,
+            )


⚠️ Potential issue | 🟠 Major

Return the budget-stop reason in the result.

This path logs the exception text, but the returned ExecutionResult drops it. Callers only get TerminationReason.BUDGET_EXHAUSTED with error_message=None, so they cannot distinguish a monthly hard stop from a daily-limit stop, which misses the linked clear-error-message requirement.

♻️ Proposed fix

             budget_result = ExecutionResult(
                 context=error_ctx,
                 termination_reason=TerminationReason.BUDGET_EXHAUSTED,
+                error_message=f"{type(exc).__name__}: {exc}",
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 688 - 699, The ExecutionResult returned in the budget-stop path drops the exception details; update the code around logger.warning/EXECUTION_ENGINE_BUDGET_STOPPED so the constructed ExecutionResult (created with error_ctx and TerminationReason.BUDGET_EXHAUSTED) also sets an informative error_message (e.g., f"{type(exc).__name__}: {exc}" or a normalized budget-stop reason) so callers can distinguish daily vs monthly hard stops; modify the ExecutionResult creation in this block (referencing AgentContext.from_identity, ctx, ExecutionResult, and TerminationReason.BUDGET_EXHAUSTED) to include that error text.

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.

Copilot · 2026-03-09T08:49:34Z

src/ai_company/budget/enforcer.py

+        try:
+            return await self._compute_baselines(
+                cfg,
+                monthly_budget,
+                daily_limit,
+                agent_id,
+            )
+        except MemoryError, RecursionError:
+            raise
+        except Exception:
+            logger.exception(
+                BUDGET_BASELINE_ERROR,
+                agent_id=agent_id,
+                reason="falling_back_to_zero_baselines",
+            )
+            return 0.0, 0.0


Same Python 3 syntax issue here: except MemoryError, RecursionError: is a SyntaxError. Switch to except (MemoryError, RecursionError): (optionally binding as exc).

Copilot · 2026-03-09T08:49:35Z

tests/unit/engine/test_agent_engine_budget.py

+    async def test_preflight_budget_exhausted_returns_error(
+        self,
+        sample_agent_with_personality: AgentIdentity,
+        sample_task_with_criteria: Task,
+    ) -> None:
+        """Pre-flight BudgetExhaustedError propagates as error result."""


Test name/docstring mismatch: this test asserts TerminationReason.BUDGET_EXHAUSTED, but the name/docstring say “returns error / propagates as error result”. Renaming to reflect “budget exhausted” (or updating the docstring) will keep intent clear.

Suggested change

-    async def test_preflight_budget_exhausted_returns_error(
+    async def test_preflight_budget_exhausted_sets_budget_exhausted_reason(
         self,
         sample_agent_with_personality: AgentIdentity,
         sample_task_with_criteria: Task,
     ) -> None:
-        """Pre-flight BudgetExhaustedError propagates as error result."""
+        """Pre-flight BudgetExhaustedError yields BUDGET_EXHAUSTED termination."""

Copilot · 2026-03-09T08:49:35Z

src/ai_company/engine/agent_engine.py

        except MemoryError, RecursionError:
-            logger.error(
+            logger.exception(
                EXECUTION_ENGINE_ERROR,
                agent_id=agent_id,
                task_id=task_id,
                error="non-recoverable error in run()",
-                exc_info=True,
            )
            raise


except MemoryError, RecursionError: uses Python 2 exception syntax and is a SyntaxError on Python 3. Use tuple syntax instead (e.g., except (MemoryError, RecursionError): or except (MemoryError, RecursionError) as exc:) to keep the non-recoverable propagation behavior.

Copilot · 2026-03-09T08:49:35Z

src/ai_company/engine/agent_engine.py

+                agent_id=agent_id,
+                task_id=task_id,
+            )
+        except MemoryError, RecursionError:


Same Python 2 exception syntax issue here: except MemoryError, RecursionError: will not parse on Python 3. Switch to except (MemoryError, RecursionError): (optionally binding as exc).

Suggested change

-        except MemoryError, RecursionError:
+        except (MemoryError, RecursionError):

Copilot · 2026-03-09T08:49:36Z

src/ai_company/engine/agent_engine.py

        except MemoryError, RecursionError:
-            logger.error(
+            logger.exception(
                EXECUTION_ENGINE_ERROR,
                agent_id=agent_id,
                task_id=task_id,
                error="non-recoverable error while building error result",
-                exc_info=True,
            )
            raise


except MemoryError, RecursionError: is invalid Python 3 syntax and will raise a SyntaxError at import time. Please change to tuple syntax (except (MemoryError, RecursionError):).

Copilot · 2026-03-09T08:49:36Z

src/ai_company/engine/agent_engine.py

+                task_id=task_id,
+                error=f"Failed to build budget-exhausted result: {build_exc}",
+            )
+            raise exc from build_exc


In _handle_budget_error, raise exc from build_exc reintroduces the secondary-exception chain inversion (the build failure becomes the cause of the original budget error). If the goal is to propagate the original error cleanly (as in _handle_fatal_error), suppress the secondary cause (raise exc from None) or re-raise the build exception separately, depending on the desired debugging behavior.

Suggested change

-            raise exc from build_exc
+            raise exc from None
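The difference between the two forms can be checked directly. `raise exc from build_exc` records the build failure as `__cause__` of the budget error, so the traceback presents the secondary failure as the reason for the primary one. `raise exc from None` clears `__cause__` and sets `__suppress_context__`, which hides the build failure from the default traceback while keeping it reachable through `__context__`. A minimal sketch with hypothetical helpers:

```python
class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


def build_result() -> None:
    # Hypothetical: building the BUDGET_EXHAUSTED result fails.
    raise RuntimeError("could not build result")


def handle_budget_error(exc: BudgetExhaustedError, *, suppress: bool) -> None:
    try:
        build_result()
    except RuntimeError as build_exc:
        # The real handler logs build_exc before re-raising.
        if suppress:
            raise exc from None
        raise exc from build_exc


def capture(*, suppress: bool) -> BudgetExhaustedError:
    try:
        handle_budget_error(
            BudgetExhaustedError("monthly hard stop"), suppress=suppress
        )
    except BudgetExhaustedError as err:
        return err
    raise AssertionError("expected BudgetExhaustedError")
```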

Copilot · 2026-03-09T08:49:36Z

src/ai_company/budget/enforcer.py

+    async def check_can_execute(self, agent_id: str) -> None:
+        """Pre-flight: verify monthly + daily limits allow execution.
+
+        Raises:
+            BudgetExhaustedError: Monthly hard stop exceeded.
+            DailyLimitExceededError: Agent daily limit exceeded.
+        """
+        cfg = self._budget_config
+
+        # Skip if enforcement disabled (total_monthly == 0)
+        if cfg.total_monthly <= 0:
+            logger.debug(
+                BUDGET_ENFORCEMENT_CHECK,
+                agent_id=agent_id,
+                result="pass",
+                reason="enforcement_disabled",
+            )
+            return
+
+        await self._check_monthly_hard_stop(cfg, agent_id)
+        await self._check_daily_limit(cfg, agent_id)
+


check_can_execute() returns early when total_monthly <= 0, which skips the daily-limit pre-flight check even if per_agent_daily_limit is configured. This also contradicts make_budget_checker() which can still enforce daily/task limits when total_monthly is 0. Consider treating total_monthly <= 0 as “monthly limit disabled” (skip only the monthly hard-stop check) while still running the daily limit check when per_agent_daily_limit > 0, or otherwise align the runtime behavior + docstrings/tests to a single meaning.
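A minimal sketch of the alternative Copilot proposes, where a non-positive `total_monthly` disables only the monthly hard stop and the daily limit is still enforced. `LimitsSketch`, the pre-fetched spend arguments, and the stand-in error classes are assumptions; the real method queries CostTracker asynchronously.

```python
from dataclasses import dataclass


class BudgetExhaustedError(Exception):
    """Monthly hard stop exceeded (stand-in)."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Per-agent daily limit exceeded (stand-in)."""


@dataclass(frozen=True)
class LimitsSketch:
    total_monthly: float
    per_agent_daily_limit: float


def check_can_execute_sketch(
    cfg: LimitsSketch, monthly_spent: float, daily_spent: float
) -> None:
    """Raise if any *enabled* limit is already reached."""
    # Each limit is gated independently: <= 0 means "this limit is off".
    if cfg.total_monthly > 0 and monthly_spent >= cfg.total_monthly:
        msg = f"monthly spend {monthly_spent:.2f} >= {cfg.total_monthly:.2f}"
        raise BudgetExhaustedError(msg)
    daily = cfg.per_agent_daily_limit
    if daily > 0 and daily_spent >= daily:
        msg = f"daily spend {daily_spent:.2f} >= {daily:.2f}"
        raise DailyLimitExceededError(msg)
```

Whichever meaning is chosen, make_budget_checker() has to apply the same per-limit gating so that pre-flight and in-flight checks agree.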

Copilot · 2026-03-09T08:49:36Z

src/ai_company/budget/enforcer.py

+        try:
+            period_start = billing_period_start(cfg.reset_day)
+            monthly_cost = await self._cost_tracker.get_total_cost(
+                start=period_start,
+            )
+        except MemoryError, RecursionError:
+            raise
+        except Exception:
+            logger.exception(
+                BUDGET_RESOLVE_MODEL_ERROR,
+                agent_id=str(identity.id),
+                reason="cost_tracker_query_failed",
+            )
+            return identity


except MemoryError, RecursionError: is Python 2 syntax and will not parse on Python 3. Use except (MemoryError, RecursionError): to preserve the intended “non-recoverable errors propagate” behavior.

…dget checking, and auto-downgrade (#44)

Implement budget enforcement engine as described in DESIGN_SPEC §10.4:

- Add BudgetEnforcer service composing CostTracker + BudgetConfig
- Pre-flight checks: monthly hard stop + per-agent daily limit verification
- In-flight budget checker: sync closure with pre-computed baselines for task, monthly, and daily limits with deduplicated alert emission
- Task-boundary auto-downgrade via ModelResolver alias lookup
- Add billing period utilities (billing_period_start, daily_period_start)
- Add DailyLimitExceededError, budget event constants
- Add reset_day and boundary config fields
- Wire BudgetEnforcer into AgentEngine execution pipeline
- Comprehensive tests: 276 passing (config, billing, enforcer, engine integration)

Closes #44

…ction, and review fixes

Pre-reviewed by 9 agents, 30 findings addressed:

- Add dedicated budget error handling in AgentEngine (BudgetExhaustedError/DailyLimitExceededError)
- Extract validation functions to engine/_validation.py (keep agent_engine.py under 800 lines)
- Add graceful degradation in resolve_model and make_budget_checker for CostTracker failures
- Fix exception chain inversion (raise exc from None)
- Add reset_day boundary validation in billing.py
- Add allow_inf_nan=False to BudgetConfig
- Consolidate _apply_downgrade/_resolve_and_apply into single function
- Add 5 new tests for edge cases and error paths
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for budget enforcement status

…iewers Critical: fix _compute_baselines_safe to return (0.0, 0.0) instead of None on CostTracker failure, preserving task-only budget checking. Add try-except self-protection to _handle_budget_error. Create dedicated BUDGET_STOPPED event. Simplify redundant exception catch (DailyLimitExceededError is subclass). Add _ALERT_LEVEL_ORDER sync assertion. Update DESIGN_SPEC engine pipeline and constructor docs. Add graceful degradation, timeout validation, and boundary-value tests.

- Fix exception chain inversion: raise exc from None in _handle_budget_error
- Fix comment/code mismatch: total_monthly == 0 → <= 0 in enforcer
- Update DESIGN_SPEC event count: 12 → 13 (BUDGET_STOPPED added)
- Validate cost_tracker consistency with budget_enforcer in AgentEngine
- Parametrize duplicated pre-flight budget stop tests
- Inline _get_tool_definitions, trim docstrings to stay under 800 lines

coderabbitai

Actionable comments posted: 4

♻️ Duplicate comments (4)

src/ai_company/budget/enforcer.py (1)

258-260: ⚠️ Potential issue | 🟠 Major

total_monthly == 0 still needs to disable the checker.

BudgetConfig in src/ai_company/budget/config.py documents zero monthly budget as disabling runtime enforcement, and this class already follows that in check_can_execute() and resolve_model(). This branch still builds a checker when task.budget_limit or per_agent_daily_limit is positive, so a run can pass pre-flight and then be stopped mid-execution even though budgets are supposed to be off. That also means tests/unit/budget/test_enforcer.py::test_returns_checker_when_only_task_limit_active is currently asserting the opposite contract.
Suggested fix
-        # All enforcement disabled
-        if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+        # A zero monthly budget disables enforcement entirely.
+        if monthly_budget <= 0:
             return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/budget/enforcer.py` around lines 258 - 260, The branch that
currently returns None when monthly_budget, task_limit, and daily_limit are all
<= 0 misses the configured total_monthly flag; update the early-return in the
enforcer (use the same symbols monthly_budget, task_limit, daily_limit and also
check total_monthly) so that if total_monthly <= 0 it also treats enforcement as
disabled and returns None; ensure logic in the enforcer's constructor or factory
mirrors BudgetConfig's semantics (and remains consistent with
check_can_execute() and resolve_model()) so no checker is created when
total_monthly is zero.

DESIGN_SPEC.md (1)

989-989: ⚠️ Potential issue | 🟡 Minor

Update the engine event count in the spec.

This was flagged previously: Line 989 states "12 constants in events/execution.py", but EXECUTION_ENGINE_BUDGET_STOPPED was added (imported at Line 41 in agent_engine.py), making the count 13.

📝 Proposed fix

-Logs structured events under the `execution.engine.*` namespace (12 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.
+Logs structured events under the `execution.engine.*` namespace (13 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, timeout, and budget stopped.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@DESIGN_SPEC.md` at line 989, The spec line claiming "12 constants in
events/execution.py" is out of date because EXECUTION_ENGINE_BUDGET_STOPPED was
added; update the count to 13 and any related list/description to include
EXECUTION_ENGINE_BUDGET_STOPPED (it is imported in agent_engine.py), and ensure
the events/execution.py constants list and the DESIGN_SPEC description both
reflect the new total and include the budget-stopped event name.

src/ai_company/engine/agent_engine.py (2)

117-120: ⚠️ Potential issue | 🟠 Major

budget_enforcer and cost_tracker may reference different instances.

This concern was raised previously: when both budget_enforcer and cost_tracker are provided, _post_execution_pipeline() records spend to self._cost_tracker, but pre-flight and in-flight enforcement read from budget_enforcer.cost_tracker. If these are different instances, newly recorded spend won't feed back into enforcement, silently under-enforcing monthly/daily caps.

🛡️ Proposed fix to validate tracker consistency

         self._tool_registry = tool_registry
         self._budget_enforcer = budget_enforcer
-        self._cost_tracker = cost_tracker or (
-            budget_enforcer.cost_tracker if budget_enforcer else None
-        )
+        if budget_enforcer is not None:
+            if (
+                cost_tracker is not None
+                and cost_tracker is not budget_enforcer.cost_tracker
+            ):
+                raise ValueError(
+                    "cost_tracker must match budget_enforcer.cost_tracker "
+                    "when budget_enforcer is provided"
+                )
+            self._cost_tracker = budget_enforcer.cost_tracker
+        else:
+            self._cost_tracker = cost_tracker

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 117 - 120, The
constructor currently allows a provided budget_enforcer and a separate
cost_tracker which can lead to inconsistent tracking (post-execution writes to
self._cost_tracker while enforcement reads budget_enforcer.cost_tracker); in the
__init__ (where self._budget_enforcer and self._cost_tracker are set) add a
validation: if both budget_enforcer and cost_tracker are provided assert they
are the same instance (e.g., cost_tracker is budget_enforcer.cost_tracker) and
raise a clear ValueError if not, or alternatively prefer
budget_enforcer.cost_tracker and ignore the passed cost_tracker (documented
choice), ensuring _post_execution_pipeline writes and enforcement read from the
same CostTracker.

694-699: ⚠️ Potential issue | 🟠 Major

Include the budget-stop reason in the result.

This was flagged previously: the returned ExecutionResult drops the exception details. Callers only receive TerminationReason.BUDGET_EXHAUSTED with error_message=None, so they cannot distinguish a monthly hard stop from a daily-limit stop — missing the linked "clear error message" requirement from issue #44.

🐛 Proposed fix to include error message

             budget_result = ExecutionResult(
                 context=error_ctx,
                 termination_reason=TerminationReason.BUDGET_EXHAUSTED,
+                error_message=f"{type(exc).__name__}: {exc}",
             )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/ai_company/engine/agent_engine.py` around lines 694 - 699, The current
ExecutionResult created in the AgentEngine path (using error_ctx = ctx or
AgentContext.from_identity(identity, task=task) and constructing ExecutionResult
with TerminationReason.BUDGET_EXHAUSTED) drops the exception details; update the
exception handling so the ExecutionResult includes the budget-stop message (set
error_message to the exception text or specific code from the caught
BudgetExhausted exception) and any relevant metadata before returning. Locate
the try/except around that block in agent_engine.py, catch the budget-related
exception (or capture the caught variable), and pass its message (e.g., str(e)
or getattr(e, "message", None)) into ExecutionResult(error_message=...) so
callers can distinguish monthly vs daily stops while still setting
termination_reason=TerminationReason.BUDGET_EXHAUSTED.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/ai_company/budget/billing.py`:
- Around line 32-45: The function billing_period_start should hard-validate its
inputs: ensure reset_day is an int and in 1..28 (raise TypeError for non-int,
ValueError for out-of-range) and ensure now is a datetime that is normalized to
UTC (if now is tz-aware, call now = now.astimezone(UTC); if now is naive,
explicitly set or document treating it as UTC by replacing tzinfo=UTC). Update
the billing_period_start implementation (and the analogous helper used at lines
57-59) to perform these checks/conversions before any date arithmetic so you
don't silently copy non-UTC datetimes into UTC or accept non-integer reset_day
values.
- Line 8: Add the module-level logger by importing get_logger from
ai_company.observability and initializing logger = get_logger(__name__) at the
top of the billing module; specifically, add "from ai_company.observability
import get_logger" and then "logger = get_logger(__name__)" near the existing
datetime import so the functions in this file (e.g., any raise paths for invalid
input) can use logger for observability.

In `@src/ai_company/budget/enforcer.py`:
- Around line 518-606: The _build_checker_closure function is too large and
mixes task, monthly, alert-deduplication, and daily logic; split its logic into
small helpers (e.g. _check_task_limit, _check_monthly_thresholds,
_check_daily_limit) and have _check simply call those helpers in order. Move the
task-limit branch (using BUDGET_TASK_LIMIT_HIT and task_limit) into
_check_task_limit, move the monthly calculations and calls to _emit_alert /
BudgetAlertLevel into _check_monthly_thresholds (ensure it receives last_alert,
monthly_baseline, thresholds and monthly_budget and returns a tuple of
(should_stop: bool, maybe_total_monthly) or just bool), and move the daily
branch (BUDGET_DAILY_LIMIT_HIT, daily_baseline, daily_limit) into
_check_daily_limit; keep _build_checker_closure to wiring/closure creation only
and return the small _check that delegates to these helpers.
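A minimal sketch of that split, using plain functions and a hypothetical `emit` callback in place of logger calls with the BUDGET_* events and BudgetAlertLevel; threshold fractions and labels are illustrative. One naming caveat: BudgetEnforcer already has an async pre-flight `_check_daily_limit(cfg, agent_id)` called from check_can_execute(), so reusing that name for the in-flight helper would be confusing (or would shadow it, if the helper is also a method). The sketch therefore uses `_task_limit_hit`, `_monthly_over_budget`, and `_daily_limit_hit`.

```python
from collections.abc import Callable

# Hypothetical threshold table: (fraction of monthly budget, alert label),
# in ascending order.
Thresholds = tuple[tuple[float, str], ...]


def _task_limit_hit(task_spent: float, task_limit: float) -> bool:
    return task_limit > 0 and task_spent >= task_limit


def _monthly_over_budget(
    monthly_total: float,
    monthly_budget: float,
    thresholds: Thresholds,
    last_alert: int,
    emit: Callable[[str], None],
) -> tuple[bool, int]:
    """Return (should_stop, updated last_alert index)."""
    usage = monthly_total / monthly_budget  # checker only exists if budget > 0
    crossed = -1
    for index, (fraction, _label) in enumerate(thresholds):
        if usage >= fraction:
            crossed = index
    if crossed > last_alert:
        # Emit only on escalation, so each level fires at most once.
        emit(thresholds[crossed][1])
        last_alert = crossed
    return monthly_total >= monthly_budget, last_alert


def _daily_limit_hit(daily_total: float, daily_limit: float) -> bool:
    return daily_limit > 0 and daily_total >= daily_limit


def build_checker_sketch(
    *,
    task_limit: float,
    monthly_budget: float,
    monthly_baseline: float,
    daily_limit: float,
    daily_baseline: float,
    thresholds: Thresholds,
    emit: Callable[[str], None],
) -> Callable[[float], bool]:
    """Wire the helpers into a closure that returns True when it must stop."""
    last_alert = -1

    def check(task_spent: float) -> bool:
        nonlocal last_alert
        if _task_limit_hit(task_spent, task_limit):
            return True
        stop, last_alert = _monthly_over_budget(
            monthly_baseline + task_spent,
            monthly_budget,
            thresholds,
            last_alert,
            emit,
        )
        if stop:
            return True
        return _daily_limit_hit(daily_baseline + task_spent, daily_limit)

    return check
```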

In `@src/ai_company/engine/agent_engine.py`:
- Around line 714-721: In _handle_budget_error, the exception chaining is
inconsistent: change the re-raise "raise exc from build_exc" to "raise exc from
None" so the original budget error is re-raised without exposing the build
exception in the traceback; locate the except block that catches build_exc (the
logger.exception call using EXECUTION_ENGINE_ERROR with agent_id/task_id and
error message "Failed to build budget-exhausted result") and update the raise
statement to use "from None" matching _handle_fatal_error's behavior.

---

Duplicate comments:
In `@DESIGN_SPEC.md`:
- Line 989: The spec line claiming "12 constants in events/execution.py" is out
of date because EXECUTION_ENGINE_BUDGET_STOPPED was added; update the count to
13 and any related list/description to include EXECUTION_ENGINE_BUDGET_STOPPED
(it is imported in agent_engine.py), and ensure the events/execution.py
constants list and the DESIGN_SPEC description both reflect the new total and
include the budget-stopped event name.

In `@src/ai_company/budget/enforcer.py`:
- Around line 258-260: The branch that currently returns None when
monthly_budget, task_limit, and daily_limit are all <= 0 misses the configured
total_monthly flag; update the early-return in the enforcer (use the same
symbols monthly_budget, task_limit, daily_limit and also check total_monthly) so
that if total_monthly <= 0 it also treats enforcement as disabled and returns
None; ensure logic in the enforcer's constructor or factory mirrors
BudgetConfig's semantics (and remains consistent with check_can_execute() and
resolve_model()) so no checker is created when total_monthly is zero.

In `@src/ai_company/engine/agent_engine.py`:
- Around line 117-120: The constructor currently allows a provided
budget_enforcer and a separate cost_tracker which can lead to inconsistent
tracking (post-execution writes to self._cost_tracker while enforcement reads
budget_enforcer.cost_tracker); in the __init__ (where self._budget_enforcer and
self._cost_tracker are set) add a validation: if both budget_enforcer and
cost_tracker are provided assert they are the same instance (e.g., cost_tracker
is budget_enforcer.cost_tracker) and raise a clear ValueError if not, or
alternatively prefer budget_enforcer.cost_tracker and ignore the passed
cost_tracker (documented choice), ensuring _post_execution_pipeline writes and
enforcement read from the same CostTracker.
- Around line 694-699: The current ExecutionResult created in the AgentEngine
path (using error_ctx = ctx or AgentContext.from_identity(identity, task=task)
and constructing ExecutionResult with TerminationReason.BUDGET_EXHAUSTED) drops
the exception details; update the exception handling so the ExecutionResult
includes the budget-stop message (set error_message to the exception text or
specific code from the caught BudgetExhausted exception) and any relevant
metadata before returning. Locate the try/except around that block in
agent_engine.py, catch the budget-related exception (or capture the caught
variable), and pass its message (e.g., str(e) or getattr(e, "message", None))
into ExecutionResult(error_message=...) so callers can distinguish monthly vs
daily stops while still setting
termination_reason=TerminationReason.BUDGET_EXHAUSTED.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1a99f543-3fb6-4c82-8088-cdb3136af9f7

📥 Commits

Reviewing files that changed from the base of the PR and between 5948c87 and fac7ffb.

📒 Files selected for processing (18)

CLAUDE.md
DESIGN_SPEC.md
README.md
src/ai_company/budget/__init__.py
src/ai_company/budget/billing.py
src/ai_company/budget/config.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/__init__.py
src/ai_company/engine/_validation.py
src/ai_company/engine/agent_engine.py
src/ai_company/engine/errors.py
src/ai_company/observability/events/budget.py
src/ai_company/observability/events/execution.py
tests/unit/budget/test_billing.py
tests/unit/budget/test_config.py
tests/unit/budget/test_enforcer.py
tests/unit/engine/test_agent_engine_budget.py
tests/unit/engine/test_agent_engine_errors.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Agent
GitHub Check: Greptile Review

🧰 Additional context used

📓 Path-based instructions (5)

**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations — Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) — ruff enforces this on Python 3.14

Files:

src/ai_company/observability/events/execution.py
src/ai_company/budget/billing.py
src/ai_company/engine/_validation.py
tests/unit/engine/test_agent_engine_budget.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/__init__.py
tests/unit/engine/test_agent_engine_errors.py
src/ai_company/budget/config.py
src/ai_company/engine/agent_engine.py
src/ai_company/budget/__init__.py
tests/unit/budget/test_billing.py
src/ai_company/observability/events/budget.py
tests/unit/budget/test_config.py
src/ai_company/engine/errors.py
tests/unit/budget/test_enforcer.py

src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
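A minimal sketch of the copy-then-wrap rule for non-Pydantic internal collections. `FallbackTable` and its field are hypothetical, and the alias names follow the generic large/medium/small convention:

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType


class FallbackTable:
    """Hypothetical read-only map of model alias -> fallback aliases."""

    def __init__(self, fallbacks: Mapping[str, list[str]]) -> None:
        # Deep copy so the snapshot shares no mutable objects with the caller.
        snapshot = copy.deepcopy(dict(fallbacks))
        # MappingProxyType blocks top-level writes only; tuples freeze the
        # nested level as well.
        self._fallbacks: Mapping[str, tuple[str, ...]] = MappingProxyType(
            {alias: tuple(chain) for alias, chain in snapshot.items()}
        )

    @property
    def fallbacks(self) -> Mapping[str, tuple[str, ...]]:
        return self._fallbacks

    def with_fallbacks(self, alias: str, chain: list[str]) -> "FallbackTable":
        """Return a new table; this one is never mutated."""
        updated = {a: list(c) for a, c in self._fallbacks.items()}
        updated[alias] = list(chain)
        return FallbackTable(updated)
```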

Files:

src/ai_company/observability/events/execution.py
src/ai_company/budget/billing.py
src/ai_company/engine/_validation.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/__init__.py
src/ai_company/budget/config.py
src/ai_company/engine/agent_engine.py
src/ai_company/budget/__init__.py
src/ai_company/observability/events/budget.py
src/ai_company/engine/errors.py

src/ai_company/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code — use the structured logger from ai_company.observability
Always use variable name logger (not _logger, not log) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases

Files:

src/ai_company/observability/events/execution.py
src/ai_company/budget/billing.py
src/ai_company/engine/_validation.py
src/ai_company/budget/enforcer.py
src/ai_company/engine/__init__.py
src/ai_company/budget/config.py
src/ai_company/engine/agent_engine.py
src/ai_company/budget/__init__.py
src/ai_company/observability/events/budget.py
src/ai_company/engine/errors.py

src/ai_company/{providers,engine}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
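A minimal sketch of that catch-and-fall-back pattern. `call_with_fallbacks`, the (name, callable) chain, and the stand-in exception class are hypothetical; the real engine is async and works with provider and model objects rather than bare callables:

```python
from collections.abc import Callable, Sequence


class RetryExhaustedError(Exception):
    """Stand-in: all retries for one provider/model failed."""


def call_with_fallbacks(
    chain: Sequence[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try each (name, call) in order; return (name, reply) of the first success."""
    failures: list[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except RetryExhaustedError as exc:
            # Retries for this entry are spent; move on to the next one.
            failures.append(f"{name}: {exc}")
        # Any other exception (e.g. a non-retryable error) propagates unchanged.
    msg = "all fallbacks exhausted: " + "; ".join(failures)
    raise RetryExhaustedError(msg)
```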

Files:

src/ai_company/engine/_validation.py
src/ai_company/engine/__init__.py
src/ai_company/engine/agent_engine.py
src/ai_company/engine/errors.py

tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Configure asyncio_mode = 'auto' for pytest — no manual @pytest.mark.asyncio needed
Set test timeout to 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests

Files:

tests/unit/engine/test_agent_engine_budget.py
tests/unit/engine/test_agent_engine_errors.py
tests/unit/budget/test_billing.py
tests/unit/budget/test_config.py
tests/unit/budget/test_enforcer.py

🧠 Learnings (7)

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

src/ai_company/observability/events/execution.py
src/ai_company/engine/agent_engine.py
src/ai_company/observability/events/budget.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases

Applied to files:

tests/unit/engine/test_agent_engine_budget.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

CLAUDE.md
src/ai_company/engine/__init__.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14

Applied to files:

src/ai_company/budget/enforcer.py
src/ai_company/engine/agent_engine.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

src/ai_company/budget/enforcer.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/{providers,engine}/**/*.py : RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains

Applied to files:

src/ai_company/engine/__init__.py
src/ai_company/engine/errors.py

📚 Learning: 2026-03-09T06:51:01.916Z

Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/providers/**/*.py : Mark retryable errors with is_retryable=True: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry

Applied to files:

src/ai_company/engine/__init__.py

🧬 Code graph analysis (5)

src/ai_company/engine/__init__.py (1)

src/ai_company/engine/errors.py (1)

DailyLimitExceededError (37-38)

tests/unit/engine/test_agent_engine_errors.py (1)

tests/unit/engine/conftest.py (1)

make_completion_response (272-290)

src/ai_company/budget/__init__.py (2)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

src/ai_company/budget/enforcer.py (1)

BudgetEnforcer (42-338)

tests/unit/budget/test_billing.py (1)

src/ai_company/budget/billing.py (2)

billing_period_start (11-45)

daily_period_start (48-59)

tests/unit/budget/test_config.py (1)

src/ai_company/budget/config.py (2)

AutoDowngradeConfig (65-148)

BudgetConfig (151-227)

🪛 LanguageTool

README.md

[typographical] ~25-~25: To join two clauses or introduce examples, consider using an em dash.
Context: ...nd factory - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...

(DASH_RULE)

🔇 Additional comments (21)

CLAUDE.md (1)

47-47: LGTM!

The updated description accurately reflects the expanded budget module capabilities with pre-flight/in-flight checks, auto-downgrade, and billing periods.

README.md (1)

25-25: LGTM!

The README accurately reflects the new Budget Enforcement (M5) implementation status, clearly noting what's implemented and what's pending.

src/ai_company/engine/errors.py (2)

27-34: LGTM!

The updated docstring clearly documents the two usage contexts for BudgetExhaustedError, improving developer understanding of when this exception is raised.

37-39: LGTM!

DailyLimitExceededError appropriately inherits from BudgetExhaustedError, enabling callers to catch either the specific daily limit error or the broader budget exhaustion category. The docstring is concise and clear.

src/ai_company/engine/__init__.py (2)

53-53: LGTM!

The import is correctly placed alphabetically among other error imports.

171-171: LGTM!

DailyLimitExceededError is properly added to __all__ in alphabetical order, making it part of the engine's public API surface.

src/ai_company/engine/_validation.py (5)

1-20: LGTM!

The module setup follows all coding guidelines: proper docstring, structured logger via get_logger(__name__), and event constants imported from the observability domain module.

22-30: LGTM!

Using frozenset for _EXECUTABLE_STATUSES ensures immutability, and the detailed docstring explains the rationale for which task statuses are executable.

33-58: LGTM!

validate_run_inputs correctly validates scalar arguments, logs warnings with structured context before raising, and uses the proper event constant. The keyword-only parameter design prevents argument order mistakes.

61-73: LGTM!

validate_agent follows the guideline to log at WARNING with context before raising, and the error message clearly explains the requirement.

76-105: LGTM!

validate_task properly checks both task status and assignment ownership, with appropriate logging before raising ExecutionStateError. The condition on line 94 correctly allows unassigned tasks (assigned_to is None) to proceed.

src/ai_company/observability/events/budget.py (1)

17-26: LGTM!

The new budget event constants are well-organized, following the established naming convention (BUDGET_<CATEGORY>_<ACTION>) and value pattern (budget.<category>.<action>). The coverage aligns with the BudgetEnforcer's observability needs.

src/ai_company/observability/events/execution.py (1)

37-37: LGTM!

The new EXECUTION_ENGINE_BUDGET_STOPPED constant complements the existing EXECUTION_LOOP_BUDGET_EXHAUSTED by providing engine-level visibility for budget stop events. Naming and placement are consistent with existing patterns.

tests/unit/budget/test_billing.py (3)

1-9: LGTM!

Good test file setup with proper imports and the 30-second timeout configured as per coding guidelines.

12-117: LGTM!

Comprehensive test coverage for billing_period_start using @pytest.mark.parametrize as recommended. The test cases thoroughly cover:

Same-month and previous-month rollover scenarios

Exact boundary conditions (day == reset_day)

Year boundary rollback (January → December)

February 28 edge case (max allowed reset_day)

Invalid input validation

The descriptive test IDs improve readability.

120-142: LGTM!

TestDailyPeriodStart covers the essential behaviors: midnight UTC calculation, idempotency at midnight, and default behavior without explicit now.

src/ai_company/engine/agent_engine.py (3)

173-176: LGTM — Pre-flight budget enforcement integration.

The pre-flight check (check_can_execute) and model resolution (resolve_model) are correctly sequenced before context preparation, aligning with the design spec's three-layer enforcement model (§10.4). The identity reassignment from resolve_model ensures any auto-downgrade is applied to the agent for this run.

246-253: LGTM — Budget checker with enforcer fallback.

The conditional properly delegates to budget_enforcer.make_budget_checker when available, falling back to the basic make_budget_checker(task) otherwise. This maintains backward compatibility while enabling the enhanced in-flight checking with pre-computed baselines.
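The "pre-computed baselines" idea can be sketched as a factory. It reads monthly and daily spend once, when the checker is built, and returns a closure that only adds the running task cost on each call. Three details are assumptions here: the limit names, the `>=` stop rule, and the convention that a limit of 0 disables that check. The three helpers echo the `_check_task_limit` / `_check_monthly_limit` / `_check_daily_limit` split named in this PR's follow-up commit. They are illustrative functions, not the project's.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # A value of 0 disables that limit (assumed convention).
    task_limit: float = 0.0
    monthly_limit: float = 0.0
    daily_limit: float = 0.0


def make_budget_checker(
    limits: Limits, *, monthly_baseline: float, daily_baseline: float
) -> Callable[[float], bool]:
    """Build a checker; baselines are captured once instead of queried per turn."""

    def _check_task_limit(task_cost: float) -> bool:
        return limits.task_limit > 0 and task_cost >= limits.task_limit

    def _check_monthly_limit(task_cost: float) -> bool:
        return (
            limits.monthly_limit > 0
            and monthly_baseline + task_cost >= limits.monthly_limit
        )

    def _check_daily_limit(task_cost: float) -> bool:
        return (
            limits.daily_limit > 0
            and daily_baseline + task_cost >= limits.daily_limit
        )

    def checker(task_cost: float) -> bool:
        # True means "stop": reaching any single limit is enough.
        return (
            _check_task_limit(task_cost)
            or _check_monthly_limit(task_cost)
            or _check_daily_limit(task_cost)
        )

    return checker
```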

208-218: LGTM — Budget error handling simplified.

Catching only BudgetExhaustedError is correct since DailyLimitExceededError is a subclass (per the PR objectives). The handler routes to _handle_budget_error which returns BUDGET_EXHAUSTED termination reason, properly distinguishing budget stops from crashes.
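The ordering argument is easiest to see in a reduced form. The budget clause must come before the broad `except Exception`. Because `DailyLimitExceededError` subclasses `BudgetExhaustedError`, one clause covers both. The `TerminationReason` members and the `classify_run` helper are hypothetical stand-ins for the engine's `_handle_budget_error` / `_handle_fatal_error` paths. They are written synchronously for brevity.

```python
from collections.abc import Callable
from enum import Enum


class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Daily per-agent limit hit; still a budget error."""


class TerminationReason(Enum):
    # Hypothetical subset of the engine's termination reasons.
    COMPLETED = "completed"
    BUDGET_EXHAUSTED = "budget_exhausted"
    ERROR = "error"


def classify_run(step: Callable[[], None]) -> TerminationReason:
    try:
        step()
    except BudgetExhaustedError:
        # Also matches DailyLimitExceededError via inheritance.
        return TerminationReason.BUDGET_EXHAUSTED
    except (MemoryError, RecursionError):
        raise  # non-recoverable: never convert into a result
    except Exception:
        return TerminationReason.ERROR
    return TerminationReason.COMPLETED
```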

DESIGN_SPEC.md (2)

974-987: LGTM — Pipeline documentation accurately reflects budget enforcement integration.

The updated pipeline steps (2-11) correctly describe:

Pre-flight checks via check_can_execute() and resolve_model()

Budget-aware checker creation from BudgetEnforcer

Error handling distinguishing BudgetExhaustedError (including DailyLimitExceededError) from other exceptions

The description aligns with the AgentEngine implementation.

1827-1834: LGTM — BudgetEnforcer implementation note is accurate and complete.

The note correctly describes:

Three enforcement layers (pre-flight, in-flight, task-boundary auto-downgrade)

Composition of CostTracker + BudgetConfig

Billing period scoping via billing_period_start(reset_day)

DailyLimitExceededError as subclass of BudgetExhaustedError

This provides clear guidance for understanding the budget enforcement architecture.

coderabbitai · 2026-03-09T08:56:57Z

src/ai_company/budget/billing.py

+to scope cost queries to the current billing cycle.
+"""
+
+from datetime import UTC, datetime


🛠️ Refactor suggestion | 🟠 Major

Initialize the standard logger for this module.

These helpers are now part of the budget-enforcement business logic path, but the module still doesn't define logger = get_logger(__name__). That also leaves the invalid-input raise paths without the standard observability hook used elsewhere under src/ai_company/**.

Suggested fix

 from datetime import UTC, datetime
+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)

As per coding guidelines, "Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)".


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ai_company/budget/billing.py` at line 8, Add the module-level logger by importing get_logger from ai_company.observability and initializing logger = get_logger(__name__) at the top of the billing module; specifically, add "from ai_company.observability import get_logger" and then "logger = get_logger(__name__)" near the existing datetime import so the functions in this file (e.g., any raise paths for invalid input) can use logger for observability.

coderabbitai · 2026-03-09T08:56:58Z

src/ai_company/budget/billing.py

+    if not 1 <= reset_day <= 28:  # noqa: PLR2004
+        msg = f"reset_day must be 1-28, got {reset_day}"
+        raise ValueError(msg)
+
+    if now is None:
+        now = datetime.now(UTC)
+
+    if now.day >= reset_day:
+        return datetime(now.year, now.month, reset_day, tzinfo=UTC)
+
+    # Roll back to previous month
+    if now.month == 1:
+        return datetime(now.year - 1, 12, reset_day, tzinfo=UTC)
+    return datetime(now.year, now.month - 1, reset_day, tzinfo=UTC)


⚠️ Potential issue | 🟠 Major

Harden the public input boundary for reset_day and now.

billing_period_start(True) is currently treated as day 1, billing_period_start(1.5) falls through to a TypeError, and non-UTC now values are copied into a UTC timestamp without conversion. Because these helpers define the monthly/daily windows used by budget enforcement, that can select the wrong billing period around UTC day/month boundaries.

Suggested hardening

+def _normalize_utc_now(*, now: datetime | None) -> datetime:
+    if now is None:
+        return datetime.now(UTC)
+    if now.tzinfo is None:
+        msg = "now must be timezone-aware"
+        raise ValueError(msg)
+    return now.astimezone(UTC)
+
+
 def billing_period_start(
     reset_day: int,
     *,
     now: datetime | None = None,
 ) -> datetime:
@@
-    if not 1 <= reset_day <= 28:  # noqa: PLR2004
+    if (
+        isinstance(reset_day, bool)
+        or not isinstance(reset_day, int)
+        or not 1 <= reset_day <= 28  # noqa: PLR2004
+    ):
         msg = f"reset_day must be 1-28, got {reset_day}"
         raise ValueError(msg)
 
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
@@
 def daily_period_start(*, now: datetime | None = None) -> datetime:
@@
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
     return datetime(now.year, now.month, now.day, tzinfo=UTC)

As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".

Also applies to: 57-59

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/ai_company/budget/billing.py` around lines 32 - 45, The function billing_period_start should hard-validate its inputs: ensure reset_day is an int and in 1..28 (raise TypeError for non-int, ValueError for out-of-range) and ensure now is a datetime that is normalized to UTC (if now is tz-aware, call now = now.astimezone(UTC); if now is naive, explicitly set or document treating it as UTC by replacing tzinfo=UTC). Update the billing_period_start implementation (and the analogous helper used at lines 57-59) to perform these checks/conversions before any date arithmetic so you don't silently copy non-UTC datetimes into UTC or accept non-integer reset_day values.

src/ai_company/budget/enforcer.py

src/ai_company/engine/agent_engine.py

github-actions · 2026-03-09T08:57:47Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.


Copilot · 2026-03-09T09:02:17Z

src/ai_company/engine/agent_engine.py

@@ -700,12 +779,11 @@ async def _handle_fatal_error(  # noqa: PLR0913
                task_id=task_id,
            )
        except MemoryError, RecursionError:


This handler uses except MemoryError, RecursionError: which is invalid in Python 3. Replace with tuple syntax (except (MemoryError, RecursionError): or ... as exc) so non-recoverable errors propagate without causing a SyntaxError at import time.

Suggested change

except MemoryError, RecursionError:

except (MemoryError, RecursionError):
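Whether the unparenthesized form is a syntax error depends on the interpreter. PEP 758, accepted for Python 3.14, allows `except A, B:` without parentheses as long as there is no `as` clause. Earlier Python 3 versions reject that form, while `except (A, B):` is valid on every Python 3 version. The CLAUDE.md rule quoted in the learnings above targets 3.14, so the finding above applies to interpreters older than 3.14. The snippet checks all three spellings on whatever interpreter runs it, using only the built-in `compile`.

```python
import sys

# Three spellings of a handler that re-raises non-recoverable errors.
BARE = "try:\n    pass\nexcept MemoryError, RecursionError:\n    raise\n"
PARENS = "try:\n    pass\nexcept (MemoryError, RecursionError):\n    raise\n"
BARE_WITH_AS = "try:\n    pass\nexcept MemoryError, RecursionError as exc:\n    raise\n"


def compiles(source: str) -> bool:
    """Return True if source is syntactically valid on this interpreter."""
    try:
        compile(source, "except_demo", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2], compiles(BARE), compiles(PARENS), compiles(BARE_WITH_AS))
```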

Copilot · 2026-03-09T09:02:18Z

src/ai_company/budget/enforcer.py

+                daily_limit,
+                agent_id,
+            )
+        except MemoryError, RecursionError:


This uses except MemoryError, RecursionError: which is invalid in Python 3 and will raise a SyntaxError. Update it to except (MemoryError, RecursionError): (or ... as exc) so the graceful-degradation path only catches the intended exceptions.

Suggested change

except MemoryError, RecursionError:

except (MemoryError, RecursionError):

…hema, closure split

- Add graceful degradation to check_can_execute: CostTracker failures fall back to allowing execution instead of aborting the task
- Split BUDGET_HARD_STOP_TRIGGERED into EXCEEDED (pre-flight) vs TRIGGERED (in-flight) for distinct event semantics
- Add BUDGET_PREFLIGHT_ERROR event for pre-flight fallback logging
- Split _build_checker_closure inner function into 3 focused helpers (_check_task_limit, _check_monthly_limit, _check_daily_limit) to stay under 50-line function limit
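The fail-open behaviour described in the first bullet can be sketched as follows. The spend lookup is injected as a plain callable, a hypothetical stand-in for the `CostTracker` query. Lookup failures are logged with a traceback and execution is allowed, but `MemoryError`/`RecursionError` still propagate. The budget raise sits outside the `try`, so the broad handler can never swallow it. Function and message names are illustrative, not the project's.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)


class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


def check_can_execute(
    agent_id: str,
    *,
    load_monthly_spend: Callable[[], float],
    monthly_limit: float,
) -> None:
    """Raise BudgetExhaustedError if over budget; fail open if spend is unknown."""
    try:
        spent = load_monthly_spend()
    except (MemoryError, RecursionError):
        raise  # never degrade on non-recoverable errors
    except Exception:
        # Logged at ERROR with traceback, not silently swallowed; execution proceeds.
        logger.exception("pre-flight spend lookup failed: agent_id=%s", agent_id)
        return
    if monthly_limit > 0 and spent >= monthly_limit:
        logger.warning(
            "monthly budget exceeded: agent_id=%s spent=%.2f limit=%.2f",
            agent_id,
            spent,
            monthly_limit,
        )
        msg = f"Monthly budget exhausted for {agent_id!r}"
        raise BudgetExhaustedError(msg)
```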

greptile-apps · 2026-03-09T09:12:27Z

src/ai_company/budget/enforcer.py

+        except BudgetExhaustedError, DailyLimitExceededError:
+            raise
+        except MemoryError, RecursionError:
+            raise


Python 2 except syntax — compile-time SyntaxError in Python 3.

except BudgetExhaustedError, DailyLimitExceededError: uses bare-comma syntax, which is only valid in Python 2. Python 3 requires parentheses: except (BudgetExhaustedError, DailyLimitExceededError):. Without this fix, the module cannot be imported.

Additionally, since DailyLimitExceededError is a subclass of BudgetExhaustedError (see errors.py line 37), catching both is redundant — catching just BudgetExhaustedError will catch both.

This same bare-comma pattern appears at multiple other locations in the codebase and must be fixed consistently:

enforcer.py lines 104, 213, 321

agent_engine.py lines 210, 333, 363, 724, 781

Suggested change

        except BudgetExhaustedError, DailyLimitExceededError:
            raise
        except MemoryError, RecursionError:
            raise

        except (BudgetExhaustedError, DailyLimitExceededError):
            raise
        except (MemoryError, RecursionError):
            raise

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/budget/enforcer.py Line: 102-105 Comment: Python 2 `except` syntax — compile-time `SyntaxError` in Python 3. `except BudgetExhaustedError, DailyLimitExceededError:` uses bare-comma syntax, which is only valid in Python 2. Python 3 requires parentheses: `except (BudgetExhaustedError, DailyLimitExceededError):`. Without this fix, the module cannot be imported. Additionally, since `DailyLimitExceededError` is a subclass of `BudgetExhaustedError` (see `errors.py` line 37), catching both is redundant — catching just `BudgetExhaustedError` will catch both. This same bare-comma pattern appears at multiple other locations in the codebase and must be fixed consistently: - `enforcer.py` lines 104, 213, 321 - `agent_engine.py` lines 210, 333, 363, 724, 781 ```suggestion except (BudgetExhaustedError, DailyLimitExceededError): raise except (MemoryError, RecursionError): raise ``` How can I resolve this? If you propose a fix, please make it concise.

greptile-apps · 2026-03-09T09:12:28Z

src/ai_company/budget/enforcer.py

+        # Skip if enforcement disabled (total_monthly <= 0)
+        if cfg.total_monthly <= 0:
+            logger.debug(
+                BUDGET_ENFORCEMENT_CHECK,
+                agent_id=agent_id,
+                result="pass",
+                reason="enforcement_disabled",
+            )
+            return


Pre-flight enforcement skips per_agent_daily_limit check when total_monthly <= 0.

When total_monthly <= 0, check_can_execute returns early without calling _check_daily_limit, even if per_agent_daily_limit is set to a positive value. This creates an enforcement gap: the daily limit is only enforced in-flight by make_budget_checker, not at pre-flight.

This is inconsistent with make_budget_checker, which treats the three limits independently (see lines 273–277). An operator who configures total_monthly=0 (to disable global budget caps) while keeping per_agent_daily_limit=10.0 (to rate-limit individual agents) will find the daily limit never enforced at pre-flight — the agent can exceed it freely until the in-flight checker eventually stops it mid-task.

The daily limit check should be decoupled from the monthly check:

Suggested change

        # Skip if enforcement disabled (total_monthly <= 0)
        if cfg.total_monthly <= 0:
            logger.debug(
                BUDGET_ENFORCEMENT_CHECK,
                agent_id=agent_id,
                result="pass",
                reason="enforcement_disabled",
            )
            return

        # Skip monthly hard stop if enforcement disabled (total_monthly <= 0)
        if cfg.total_monthly > 0:
            await self._check_monthly_hard_stop(cfg, agent_id)

        # Daily limit is independent and should always be checked
        await self._check_daily_limit(cfg, agent_id)

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/budget/enforcer.py Line: 89-97 Comment: Pre-flight enforcement skips `per_agent_daily_limit` check when `total_monthly <= 0`. When `total_monthly <= 0`, `check_can_execute` returns early without calling `_check_daily_limit`, even if `per_agent_daily_limit` is set to a positive value. This creates an enforcement gap: the daily limit is only enforced in-flight by `make_budget_checker`, not at pre-flight. This is inconsistent with `make_budget_checker`, which treats the three limits independently (see lines 273–277). An operator who configures `total_monthly=0` (to disable global budget caps) while keeping `per_agent_daily_limit=10.0` (to rate-limit individual agents) will find the daily limit never enforced at pre-flight — the agent can exceed it freely until the in-flight checker eventually stops it mid-task. The daily limit check should be decoupled from the monthly check: ```suggestion # Skip monthly hard stop if enforcement disabled (total_monthly <= 0) if cfg.total_monthly > 0: await self._check_monthly_hard_stop(cfg, agent_id) # Daily limit is independent and should always be checked await self._check_daily_limit(cfg, agent_id) ``` How can I resolve this? If you propose a fix, please make it concise.
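This is a reduced sketch of the decoupled pre-flight check suggested above. The monthly cap and the per-agent daily limit are evaluated independently, so `total_monthly=0` disables only the monthly check. The config dataclass, the spend arguments, and the `>=` comparison are assumptions. The real method awaits `CostTracker` queries inside `_check_monthly_hard_stop` and `_check_daily_limit`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLimits:
    # Hypothetical subset of BudgetConfig; 0 disables a limit.
    total_monthly: float = 0.0
    per_agent_daily_limit: float = 0.0


class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in for the daily-limit subclass."""


def check_preflight_limits(
    cfg: BudgetLimits, *, monthly_spend: float, agent_daily_spend: float
) -> None:
    # Monthly hard stop only applies when a monthly budget is configured.
    if cfg.total_monthly > 0 and monthly_spend >= cfg.total_monthly:
        msg = f"monthly spend {monthly_spend} reached {cfg.total_monthly}"
        raise BudgetExhaustedError(msg)
    # Daily limit is checked regardless of total_monthly.
    if cfg.per_agent_daily_limit > 0 and agent_daily_spend >= cfg.per_agent_daily_limit:
        msg = f"daily spend {agent_daily_spend} reached {cfg.per_agent_daily_limit}"
        raise DailyLimitExceededError(msg)
```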

greptile-apps · 2026-03-09T09:12:29Z

src/ai_company/budget/enforcer.py

+assert set(_ALERT_LEVEL_ORDER) == set(BudgetAlertLevel), (  # noqa: S101
+    f"_ALERT_LEVEL_ORDER keys {set(_ALERT_LEVEL_ORDER)} do not match "
+    f"BudgetAlertLevel members {set(BudgetAlertLevel)}"
+)


Module-level assert is eliminated under python -O, creating a silent safety gap.

The exhaustiveness check for _ALERT_LEVEL_ORDER uses an assert statement, which is compiled out when Python runs with the -O (optimize) flag. If a future developer adds a new BudgetAlertLevel enum member without a corresponding entry in _ALERT_LEVEL_ORDER, this guard will silently vanish in optimized builds — and _emit_alert will raise a KeyError at runtime instead of failing fast at startup.

Replace the assert with an explicit runtime check that fires in all environments:

Suggested change

assert set(_ALERT_LEVEL_ORDER) == set(BudgetAlertLevel), (  # noqa: S101
    f"_ALERT_LEVEL_ORDER keys {set(_ALERT_LEVEL_ORDER)} do not match "
    f"BudgetAlertLevel members {set(BudgetAlertLevel)}"
)

_missing = set(BudgetAlertLevel) - set(_ALERT_LEVEL_ORDER)
if _missing:
    msg = f"_ALERT_LEVEL_ORDER is missing members: {_missing}"
    raise RuntimeError(msg)

Prompt To Fix With AI

This is a comment left during a code review. Path: src/ai_company/budget/enforcer.py Line: 464-467 Comment: Module-level `assert` is eliminated under `python -O`, creating a silent safety gap. The exhaustiveness check for `_ALERT_LEVEL_ORDER` uses an `assert` statement, which is compiled out when Python runs with the `-O` (optimize) flag. If a future developer adds a new `BudgetAlertLevel` enum member without a corresponding entry in `_ALERT_LEVEL_ORDER`, this guard will silently vanish in optimized builds — and `_emit_alert` will raise a `KeyError` at runtime instead of failing fast at startup. Replace the `assert` with an explicit runtime check that fires in all environments: ```suggestion _missing = set(BudgetAlertLevel) - set(_ALERT_LEVEL_ORDER) if _missing: msg = f"_ALERT_LEVEL_ORDER is missing members: {_missing}" raise RuntimeError(msg) ``` How can I resolve this? If you propose a fix, please make it concise.

- Fix pre-flight check_can_execute skipping daily limit when total_monthly <= 0 (enforcer.py)
- Replace module-level assert with RuntimeError for _ALERT_LEVEL_ORDER exhaustiveness check (enforcer.py)
- Simplify redundant except clause (DailyLimitExceededError is subclass of BudgetExhaustedError) (enforcer.py)
- Fix turn_range semantic inconsistency: use turn.turn_number for turn-based findings instead of raw enumerate index (detectors.py)
- Update ErrorFinding.turn_range docstring to clarify position semantics across detector types (models.py)
- Change MemoryBackend.backend_name return type to NotBlankStr (protocol.py)
- Add MemoryNotFoundError docstring clarifying protocol vs impl usage (errors.py)
- Add vector_store/history_store validation against known values (config.py)
- Fix DESIGN_SPEC TOC ordering (7.4 before 7.5)
- Fix ADR-001: add mem0ai to version table, fix Memari->Memary typo, fix YAML agents example (mapping->list), add RootConfig note
- Add test for ClassificationResult._validate_findings_match_categories
- Add Windows path traversal test cases for MemoryStorageConfig
- Fix misleading docstring in integration test file

🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Copilot AI review requested due to automatic review settings March 9, 2026 08:12

Copilot started reviewing on behalf of Aureliolo March 9, 2026 08:12

coderabbitai bot reviewed Mar 9, 2026

src/ai_company/engine/agent_engine.py (outdated, resolved)

tests/unit/engine/test_agent_engine_budget.py (outdated, resolved)

greptile-apps bot reviewed Mar 9, 2026

src/ai_company/budget/enforcer.py (resolved)

src/ai_company/budget/enforcer.py (resolved)

src/ai_company/engine/agent_engine.py (resolved)

src/ai_company/budget/enforcer.py (outdated, resolved)

gemini-code-assist bot reviewed Mar 9, 2026

Copilot AI reviewed Mar 9, 2026

greptile-apps bot reviewed Mar 9, 2026

src/ai_company/engine/agent_engine.py (outdated, resolved)

src/ai_company/budget/enforcer.py (outdated, resolved)

coderabbitai bot reviewed Mar 9, 2026

Copilot AI review requested due to automatic review settings March 9, 2026 08:45

Aureliolo force-pushed the feat/budget-enforcement branch from 5948c87 to fac7ffb March 9, 2026 08:45

Copilot started reviewing on behalf of Aureliolo March 9, 2026 08:45

Copilot AI reviewed Mar 9, 2026

greptile-apps bot reviewed Mar 9, 2026

Aureliolo added 4 commits March 9, 2026 09:55

coderabbitai bot reviewed Mar 9, 2026

Copilot AI review requested due to automatic review settings March 9, 2026 08:57

Aureliolo force-pushed the feat/budget-enforcement branch from e89538f to de946df March 9, 2026 08:57

Copilot started reviewing on behalf of Aureliolo March 9, 2026 08:58

Copilot AI reviewed Mar 9, 2026

Aureliolo merged commit c107bf9 into main Mar 9, 2026
7 checks passed

Aureliolo deleted the feat/budget-enforcement branch March 9, 2026 09:03

greptile-apps bot reviewed Mar 9, 2026

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release ai-company 0.1.1 #282

Merged

This was referenced Mar 10, 2026

fix: resolve circular imports, bump litellm, fix release tag format #285

Closed

fix: resolve circular imports, bump litellm, fix release tag format #286

Merged

Aureliolo mentioned this pull request Mar 10, 2026

chore(main): release 0.1.0 #283

Merged

This was referenced Mar 15, 2026

chore(main): release 0.2.4 #431

Merged

chore(main): release 0.2.0 #442

Closed

chore(main): release 0.2.5 #447

Merged

chore(main): release 0.2.0 #460

Closed

chore(main): release 0.2.0 #471

Closed

coderabbitai bot mentioned this pull request Mar 19, 2026

feat(engine): implement execution loop auto-selection based on task complexity #567

Merged

```diff
-    except MemoryError, RecursionError:
+    except (MemoryError, RecursionError):
```
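The two `except` lines above behave the same at runtime: the unparenthesized form is only accepted from Python 3.14 onward (PEP 758), while the parenthesized tuple parses on every Python 3 version. A minimal sketch of the pattern such a clause typically guards, assuming a graceful-degradation fallback of the kind this PR adds around CostTracker calls; `call_with_fallback` is a hypothetical name, not part of the codebase:

```python
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def call_with_fallback(fn: Callable[[], T], fallback: T) -> T:
    """Hypothetical helper: degrade gracefully on ordinary errors.

    MemoryError and RecursionError are re-raised because they signal that
    the interpreter itself is in trouble; any other Exception is treated
    as recoverable and replaced by ``fallback``.
    """
    try:
        return fn()
    # Parenthesized tuple: valid on every Python 3 version. The bare form
    # ``except MemoryError, RecursionError:`` needs Python 3.14+ (PEP 758).
    # This clause must come first: both types subclass Exception.
    except (MemoryError, RecursionError):
        raise
    except Exception:
        return fallback
```

Ordering matters here: if the broad `except Exception` came first, it would also swallow `RecursionError` (a `RuntimeError` subclass) and `MemoryError`.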

```diff
-assert set(_ALERT_LEVEL_ORDER) == set(BudgetAlertLevel), (  # noqa: S101
-    f"_ALERT_LEVEL_ORDER keys {set(_ALERT_LEVEL_ORDER)} do not match "
-    f"BudgetAlertLevel members {set(BudgetAlertLevel)}"
-)
+_missing = set(BudgetAlertLevel) - set(_ALERT_LEVEL_ORDER)
+if _missing:
+    msg = f"_ALERT_LEVEL_ORDER is missing members: {_missing}"
+    raise RuntimeError(msg)
```

Conversation

Aureliolo commented Mar 9, 2026

Summary

Test plan

Review coverage

coderabbitai bot commented Mar 9, 2026 (edited)

Review failed

Summary by CodeRabbit

Release Notes

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

gemini-code-assist bot commented Mar 9, 2026

Summary of Changes

Highlights

Footnotes

coderabbitai bot left a comment

greptile-apps bot commented Mar 9, 2026 (edited)

Greptile Summary

Confidence Score: 0/5

gemini-code-assist bot left a comment

Code Review

gemini-code-assist bot Mar 9, 2026

gemini-code-assist bot Mar 9, 2026

Copilot AI left a comment

Pull request overview

Reviewed changes

Copilot AI Mar 9, 2026

Copilot AI Mar 9, 2026

Copilot AI Mar 9, 2026

Copilot AI Mar 9, 2026

coderabbitai bot left a comment

coderabbitai bot Mar 9, 2026

coderabbitai bot Mar 9, 2026

Copilot AI left a comment

coderabbitai bot commented Mar 9, 2026 (edited)

greptile-apps bot commented Mar 9, 2026 (edited)

github-actions bot commented Mar 9, 2026 (edited)