refactor: harden BudgetEnforcer with error handling and review fixes #182
Caution: Review failed. The pull request was closed or merged during review.
Summary by CodeRabbit
Walkthrough
Introduces a BudgetEnforcer service with pre-flight, in-flight, and task-boundary auto-downgrade enforcement; adds billing/daily period utilities and config fields (reset_day, auto-downgrade boundary); integrates BudgetEnforcer into AgentEngine (pre-flight checks, model resolution, budget-aware checker); adds new errors, observability events, and comprehensive unit tests.
Sequence Diagram(s)
sequenceDiagram
participant Agent as Agent/User
participant Engine as AgentEngine
participant Enforcer as BudgetEnforcer
participant Tracker as CostTracker
participant Resolver as ModelResolver
participant Provider as LLM Provider
Agent->>Engine: run(agent_id, task_id, ...)
Engine->>Enforcer: check_can_execute(agent_id)
Enforcer->>Tracker: query_spending(agent_id, period)
Tracker-->>Enforcer: monthly_spent, daily_spent
alt Pre‑flight exhausted
Enforcer-->>Engine: raise BudgetExhaustedError
Engine-->>Agent: return AgentRunResult(termination_reason=BUDGET_EXHAUSTED)
else Pre‑flight ok
Engine->>Enforcer: resolve_model(identity)
Enforcer->>Resolver: resolve(identity, thresholds)
Resolver-->>Enforcer: possibly_downgraded_identity
Enforcer-->>Engine: identity (maybe downgraded)
Engine->>Enforcer: make_budget_checker(task, agent_id)
Enforcer-->>Engine: checker_fn
loop Per LLM call
Engine->>checker_fn: check(cost_estimate, usage)
alt Checker signals stop
Engine-->>Agent: return AgentRunResult(termination_reason=BUDGET_EXHAUSTED)
else OK
Engine->>Provider: generate(model, prompt...)
Provider-->>Engine: completion + cost
end
end
end
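The flow in the diagram can be sketched in Python roughly as follows. This is a simplified, synchronous sketch under stated assumptions: `StubEnforcer`, `run_agent`, the 80% downgrade threshold, the model aliases, and the `"COMPLETED"` reason string are hypothetical stand-ins, not the PR's actual `BudgetEnforcer` or `AgentEngine` (which are async and take more parameters).

```python
from collections.abc import Callable


class BudgetExhaustedError(Exception):
    """Raised when a budget limit blocks execution (name taken from the PR)."""


class StubEnforcer:
    """Hypothetical stand-in for BudgetEnforcer with fixed numbers."""

    def __init__(self, spent: float, limit: float) -> None:
        self.spent = spent
        self.limit = limit

    def check_can_execute(self, agent_id: str) -> None:
        # Pre-flight: refuse to start when the budget is already used up.
        if self.spent >= self.limit:
            raise BudgetExhaustedError(f"{agent_id}: budget exhausted")

    def resolve_model(self, model: str) -> str:
        # Task-boundary auto-downgrade; the 80% threshold is an assumption.
        return "example-small-001" if self.spent >= 0.8 * self.limit else model

    def make_budget_checker(self) -> Callable[[float], bool]:
        # In-flight: the returned function reports True when the run must stop.
        def checker(run_cost: float) -> bool:
            return self.spent + run_cost >= self.limit

        return checker


def run_agent(enforcer: StubEnforcer, call_costs: list[float]) -> tuple[str, str]:
    """Return (termination_reason, model_used) for a simulated run."""
    model = "example-large-001"
    try:
        enforcer.check_can_execute("agent-1")
    except BudgetExhaustedError:
        return "BUDGET_EXHAUSTED", model
    model = enforcer.resolve_model(model)
    should_stop = enforcer.make_budget_checker()
    run_cost = 0.0
    for cost in call_costs:  # one iteration per LLM call
        if should_stop(run_cost):
            return "BUDGET_EXHAUSTED", model
        run_cost += cost  # the provider call would happen here
    return "COMPLETED", model
```

The checker runs before each provider call, mirroring the order in the diagram, so a run that crosses the limit stops at the next call boundary rather than mid-call.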
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 5 passed
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the robustness and functionality of the budget enforcement system. It introduces a dedicated …
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 208-218: Replace the multi-exception parentheses form with a
single exception since DailyLimitExceededError subclasses BudgetExhaustedError:
change the except block catching "(BudgetExhaustedError,
DailyLimitExceededError) as exc" to "except BudgetExhaustedError as exc" and
leave the call to self._handle_budget_error(...) unchanged so the handler
receives exc, identity, task, agent_id, task_id, duration_seconds, ctx, and
system_prompt.
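A minimal sketch of why the single-class handler is enough, assuming only the subclass relationship stated in the comment above. The class bodies and the `classify` helper are hypothetical stand-ins for illustration.

```python
class BudgetExhaustedError(Exception):
    """Base budget error (name from the PR; body is a stand-in)."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Daily-limit variant, a subclass of the base error."""


def classify(exc: Exception) -> str:
    """Show which handler a raised exception reaches."""
    try:
        raise exc
    except BudgetExhaustedError as caught:
        # Also reached for DailyLimitExceededError, since it is a subclass.
        return f"budget:{type(caught).__name__}"
    except Exception:
        return "other"
```

The concrete type is still available on the caught object, so a handler that needs to distinguish the daily case can inspect it.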
In `@tests/unit/engine/test_agent_engine_budget.py`:
- Around line 204-205: The test couples to an internal attribute by asserting
engine._cost_tracker; change the test to avoid accessing the private field:
either add a public accessor on AgentEngine (e.g., a property like cost_tracker
or get_cost_tracker) and assert engine.cost_tracker is tracker, or replace the
assertion with a behavior-based check that uses AgentEngine's public API (invoke
the operation that records cost and then verify that the provided tracker
instance has the expected cost record). Reference AgentEngine and the private
attribute _cost_tracker when updating code or tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: de49c3a1-d352-4cfe-9142-9ae3967476cf
📒 Files selected for processing (17)
- CLAUDE.md
- DESIGN_SPEC.md
- README.md
- src/ai_company/budget/__init__.py
- src/ai_company/budget/billing.py
- src/ai_company/budget/config.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/__init__.py
- src/ai_company/engine/_validation.py
- src/ai_company/engine/agent_engine.py
- src/ai_company/engine/errors.py
- src/ai_company/observability/events/budget.py
- tests/unit/budget/test_billing.py
- tests/unit/budget/test_config.py
- tests/unit/budget/test_enforcer.py
- tests/unit/engine/test_agent_engine_budget.py
- tests/unit/engine/test_agent_engine_errors.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use `from __future__ import annotations`; Python 3.14 has PEP 649
Use PEP 758 except syntax: use `except A, B:` (no parentheses); ruff enforces this on Python 3.14
Files:
- src/ai_company/engine/errors.py
- src/ai_company/engine/_validation.py
- src/ai_company/engine/__init__.py
- src/ai_company/budget/billing.py
- tests/unit/budget/test_enforcer.py
- src/ai_company/budget/config.py
- src/ai_company/observability/events/budget.py
- tests/unit/budget/test_billing.py
- src/ai_company/budget/__init__.py
- src/ai_company/budget/enforcer.py
- tests/unit/budget/test_config.py
- tests/unit/engine/test_agent_engine_errors.py
- tests/unit/engine/test_agent_engine_budget.py
- src/ai_company/engine/agent_engine.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/engine/errors.py
- src/ai_company/engine/_validation.py
- src/ai_company/engine/__init__.py
- src/ai_company/budget/billing.py
- src/ai_company/budget/config.py
- src/ai_company/observability/events/budget.py
- src/ai_company/budget/__init__.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
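The immutability guideline in this group (`copy.deepcopy()` at construction plus `MappingProxyType` wrapping) can be illustrated with a small standard-library sketch. `ThresholdTable` and its contents are hypothetical and only show the pattern, not any class in this PR.

```python
import copy
from types import MappingProxyType


class ThresholdTable:
    """Hypothetical holder for an internal, read-only mapping."""

    def __init__(self, thresholds: dict[str, list[float]]) -> None:
        # Deep-copy at construction so later caller mutations do not leak in,
        # then wrap the copy in a read-only view.
        self._thresholds = MappingProxyType(copy.deepcopy(thresholds))

    @property
    def thresholds(self) -> MappingProxyType:
        return self._thresholds


source = {"warn": [0.5, 0.8]}
table = ThresholdTable(source)
source["warn"].append(0.99)  # mutate the caller's data afterwards
source["hard_stop"] = [1.0]
```

Note that `MappingProxyType` only blocks item assignment on the top-level mapping; nested mutable values (the lists here) are protected from the caller's later changes by the deep copy, not by the proxy.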
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`, `logging.getLogger()`, or `print()` in application code; use the structured logger from ai_company.observability
Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Files:
- src/ai_company/engine/errors.py
- src/ai_company/engine/_validation.py
- src/ai_company/engine/__init__.py
- src/ai_company/budget/billing.py
- src/ai_company/budget/config.py
- src/ai_company/observability/events/budget.py
- src/ai_company/budget/__init__.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
- src/ai_company/engine/errors.py
- src/ai_company/engine/_validation.py
- src/ai_company/engine/__init__.py
- src/ai_company/engine/agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests
Configure asyncio_mode = 'auto' for pytest; no manual `@pytest.mark.asyncio` needed
Set test timeout to 30 seconds per test
Prefer `@pytest.mark.parametrize` for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests
Files:
- tests/unit/budget/test_enforcer.py
- tests/unit/budget/test_billing.py
- tests/unit/budget/test_config.py
- tests/unit/engine/test_agent_engine_errors.py
- tests/unit/engine/test_agent_engine_budget.py
🧠 Learnings (4)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
- CLAUDE.md
- src/ai_company/engine/__init__.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/{providers,engine}/**/*.py : RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Applied to files:
- src/ai_company/engine/errors.py
- src/ai_company/engine/__init__.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/providers/**/*.py : Mark retryable errors with is_retryable=True: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry
Applied to files:
src/ai_company/engine/__init__.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
src/ai_company/observability/events/budget.py
🧬 Code graph analysis (9)
src/ai_company/engine/_validation.py (5)
- src/ai_company/core/enums.py (2): AgentStatus (67-72), TaskStatus (165-191)
- src/ai_company/engine/errors.py (1): ExecutionStateError (12-13)
- src/ai_company/observability/_logger.py (1): get_logger (8-28)
- src/ai_company/core/agent.py (1): AgentIdentity (246-304)
- src/ai_company/core/task.py (1): Task (45-261)
src/ai_company/engine/__init__.py (1)
- src/ai_company/engine/errors.py (1): DailyLimitExceededError (37-38)
tests/unit/budget/test_enforcer.py (7)
- src/ai_company/budget/config.py (2): AutoDowngradeConfig (65-148), BudgetAlertConfig (15-62)
- src/ai_company/budget/enforcer.py (4): cost_tracker (74-76), check_can_execute (78-104), resolve_model (164-222), make_budget_checker (224-281)
- src/ai_company/core/agent.py (2): AgentIdentity (246-304), ModelConfig (145-174)
- src/ai_company/engine/context.py (2): AgentContext (87-307), from_identity (140-171)
- src/ai_company/providers/models.py (1): TokenUsage (12-35)
- src/ai_company/providers/routing/models.py (1): ResolvedModel (9-52)
- src/ai_company/providers/routing/resolver.py (1): ModelResolver (25-205)
tests/unit/budget/test_billing.py (1)
- src/ai_company/budget/billing.py (2): billing_period_start (11-45), daily_period_start (48-59)
src/ai_company/budget/__init__.py (2)
- src/ai_company/budget/billing.py (2): billing_period_start (11-45), daily_period_start (48-59)
- src/ai_company/budget/enforcer.py (1): BudgetEnforcer (42-334)
tests/unit/budget/test_config.py (1)
- src/ai_company/budget/config.py (2): AutoDowngradeConfig (65-148), BudgetConfig (151-227)
tests/unit/engine/test_agent_engine_errors.py (1)
- tests/unit/engine/conftest.py (1): make_completion_response (272-290)
tests/unit/engine/test_agent_engine_budget.py (6)
- src/ai_company/budget/config.py (2): BudgetAlertConfig (15-62), BudgetConfig (151-227)
- src/ai_company/budget/enforcer.py (2): BudgetEnforcer (42-334), cost_tracker (74-76)
- src/ai_company/budget/tracker.py (1): CostTracker (67-412)
- src/ai_company/engine/errors.py (2): BudgetExhaustedError (24-34), DailyLimitExceededError (37-38)
- src/ai_company/engine/loop_protocol.py (1): TerminationReason (28-35)
- src/ai_company/engine/run_result.py (1): termination_reason (64-66)
src/ai_company/engine/agent_engine.py (7)
- src/ai_company/core/enums.py (1): TaskStatus (165-191)
- src/ai_company/engine/_validation.py (3): validate_agent (61-73), validate_run_inputs (33-58), validate_task (76-105)
- src/ai_company/engine/cost_recording.py (1): record_execution_costs (27-85)
- src/ai_company/engine/errors.py (3): BudgetExhaustedError (24-34), DailyLimitExceededError (37-38), ExecutionStateError (12-13)
- src/ai_company/budget/enforcer.py (3): BudgetEnforcer (42-334), resolve_model (164-222), make_budget_checker (224-281)
- src/ai_company/engine/loop_protocol.py (3): ExecutionLoop (146-184), make_budget_checker (187-202), ExecutionResult (78-135)
- src/ai_company/engine/prompt.py (2): SystemPrompt (56-82), build_error_prompt (646-673)
🪛 LanguageTool
README.md
[typographical] ~24-~24: To join two clauses or introduce examples, consider using an em dash.
Context: ...migrations - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...
(DASH_RULE)
🔇 Additional comments (32)
CLAUDE.md (1)
47-47: LGTM! The updated description accurately reflects the new budget enforcement capabilities (pre-flight/in-flight checks, auto-downgrade, billing periods) introduced by the `BudgetEnforcer` service.
src/ai_company/engine/errors.py (2)
27-34: LGTM! The updated docstring clearly documents the two usage contexts for `BudgetExhaustedError`: direct raising from pre-flight checks and conversion from `TerminationReason.BUDGET_EXHAUSTED` results.
37-39: LGTM! The `DailyLimitExceededError` exception is appropriately subclassed from `BudgetExhaustedError`, enabling callers to catch either the specific daily limit error or the broader budget exhaustion category. The docstring is concise and clear.
src/ai_company/engine/__init__.py (2)
53-53: LGTM! The import correctly exposes `DailyLimitExceededError` from the errors module.
171-171: LGTM! The export is correctly added to `__all__` in alphabetical order, maintaining consistency with the existing list.
src/ai_company/budget/billing.py (2)
11-45: LGTM! The `billing_period_start` function correctly:
- Validates `reset_day` bounds to avoid invalid dates (28 max avoids Feb issues)
- Handles same-month vs previous-month logic
- Handles year rollover when `month == 1`
Type hints and Google-style docstring are present and well-structured.
48-59: LGTM! The `daily_period_start` function is a clean, simple utility for computing midnight UTC. Implementation is correct.
src/ai_company/observability/events/budget.py (1)
17-26: LGTM! The new event constants comprehensively cover the budget enforcement lifecycle:
- Threshold and limit events (`ALERT_THRESHOLD_CROSSED`, `HARD_STOP_TRIGGERED`, `DAILY_LIMIT_EXCEEDED/HIT`, `TASK_LIMIT_HIT`)
- Downgrade events (`DOWNGRADE_APPLIED`, `DOWNGRADE_SKIPPED`)
- Operational events (`ENFORCEMENT_CHECK`, `BASELINE_ERROR`, `RESOLVE_MODEL_ERROR`)
Naming follows the established `BUDGET_<CATEGORY>_<ACTION>` convention.
README.md (1)
24-24: LGTM! The new feature entry accurately documents the `BudgetEnforcer` service capabilities (pre-flight checks, in-flight budget checking, auto-downgrade) and notes pending items (CFO agent, advanced reporting). This aligns with the PR objectives and implementation.
tests/unit/engine/test_agent_engine_errors.py (4)
30-30: LGTM! The import correctly references `make_completion_response` directly from `conftest`, aligning with the helper function's actual name.
176-177: LGTM! The test correctly uses the `make_completion_response` helper with a custom `cost_usd` to simulate a completion with cost data for the cost recording test.
197-198: LGTM! Consistent usage of `make_completion_response(cost_usd=0.05)` for the RecursionError cost recording test.
264-266: LGTM! The updated assertion correctly verifies the `raise exc from None` exception chaining behavior introduced in this PR. By asserting `__cause__ is None`, the test confirms that secondary failures during error handling don't pollute the exception chain, allowing the original error to propagate cleanly.
src/ai_company/budget/__init__.py (4)
8-8: LGTM! The import correctly exposes the new billing period utilities from the `billing` module.
32-32: LGTM! The `BudgetEnforcer` import correctly exposes the new enforcement service from the `enforcer` module.
53-53: LGTM! `BudgetEnforcer` is correctly added to `__all__` in alphabetical order.
77-78: LGTM! The billing utility functions are correctly added to `__all__` in alphabetical order, completing the public API surface for the budget enforcement feature.
tests/unit/budget/test_billing.py (1)
1-132: LGTM! Comprehensive test coverage for billing period utilities with:
- Well-structured parametrized tests covering edge cases (year boundary, February 28, exact boundaries)
- Proper pytest markers (`@pytest.mark.unit`, `pytest.mark.timeout(30)`)
- UTC timezone verification in default behavior tests
- Clear and descriptive test IDs
src/ai_company/budget/config.py (1)
98-103: LGTM! Good additions to the budget configuration:
- `boundary: Literal["task_assignment"]` enforces the single valid boundary type per DESIGN_SPEC §10.4
- `reset_day` with `ge=1, le=28, strict=True` prevents month-length edge cases
- `allow_inf_nan=False` on `BudgetConfig` prevents invalid float values (NaN/Inf) from being accepted
Also applies to: 167-167, 192-200
tests/unit/budget/test_config.py (1)
106-106: LGTM! Good test coverage additions:
- Tests verify `boundary` default is `"task_assignment"` and rejects other values
- Tests verify `reset_day` default (1), valid range (1-28), and rejection of invalid inputs (0, 29, float)
- Type ignore comments appropriately used for intentional type mismatches in negative tests
Also applies to: 187-195, 224-224, 273-293
src/ai_company/engine/_validation.py (1)
1-105: LGTM! Well-structured validation module with:
- Consistent logging before raising exceptions on all error paths
- Proper use of structured logging with event constants
- `frozenset` for immutable `_EXECUTABLE_STATUSES`
- Clear separation of concerns (scalar inputs vs agent state vs task state)
The check at line 94 (`task.assigned_to is not None and task.assigned_to != agent_id`) is defensive against potential edge cases even though the Task model enforces `assigned_to` for `ASSIGNED` status.
tests/unit/engine/test_agent_engine_budget.py (1)
1-203: LGTM! Solid integration tests for AgentEngine budget enforcer:
- Proper use of `AsyncMock` and `patch.object` for mocking enforcer methods
- Good coverage of pre-flight errors, model downgrade flow, and fallback behavior
- Uses generic test provider names (`test-small-001`)
- Clear test structure with descriptive docstrings
tests/unit/budget/test_enforcer.py (1)
1-908: LGTM! Excellent test coverage for `BudgetEnforcer`:
- Pre-flight checks: Tests boundary conditions (under, at exact, over limits) for both monthly and daily limits
- Auto-downgrade: Comprehensive coverage including threshold boundaries, alias matching, chain downgrade, and graceful degradation paths
- Budget checker: Tests task limit, monthly hard stop, daily limit, and alert deduplication
- Well-organized helper functions (`_make_budget_config`, `_make_identity`, `_patch_periods`) reduce test boilerplate
- Good use of `@pytest.mark.parametrize` for alert threshold tests
src/ai_company/budget/enforcer.py (2)
198-199: LGTM! Correct PEP 758 except syntax. The bare tuple syntax `except MemoryError, RecursionError:` is correct per the coding guidelines for Python 3.14+ (PEP 758). Good pattern to re-raise critical errors while gracefully degrading on recoverable exceptions.
Also applies to: 300-301
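Whether the unparenthesized form is accepted depends on the interpreter version, which is why other review comments in this thread flag the same line as a syntax error. A minimal sketch that asks the running interpreter directly; the `parses` helper and the two snippet constants are hypothetical names used only for this check.

```python
import ast
import sys

# The same handler written in both styles, as source text.
BARE = "try:\n    pass\nexcept MemoryError, RecursionError:\n    raise\n"
PARENS = "try:\n    pass\nexcept (MemoryError, RecursionError):\n    raise\n"


def parses(source: str) -> bool:
    """Return True if the running interpreter accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(sys.version_info[:2], "bare:", parses(BARE), "parenthesized:", parses(PARENS))
```

The parenthesized form parses on any Python 3 release; the bare form is expected to parse only on 3.14 and later, where PEP 758 applies. The project's stated guideline targets 3.14+, so the two sets of reviewers are effectively assuming different interpreters.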
42-596: LGTM! Well-designed budget enforcement service:
- Pre-flight checks (`check_can_execute`): Clear separation of monthly and daily limit checks with proper logging
- Auto-downgrade (`resolve_model`): Graceful degradation when CostTracker fails or alias not found
- In-flight checker (`make_budget_checker`): Efficient pre-computed baselines with alert deduplication
- Mutable cell pattern (`last_alert: list[BudgetAlertLevel]`) for closure state is documented and appropriate
- TOCTOU limitation is clearly documented in the class docstring
DESIGN_SPEC.md (1)
1676-1728: LGTM: Documentation accurately reflects the BudgetEnforcer architecture. The updated implementation note clearly documents the three enforcement layers (pre-flight checks, in-flight budget checking, task-boundary auto-downgrade) and the composition of `CostTracker` + `BudgetConfig`. The billing period scoping via `billing_period_start(reset_day)` is also correctly referenced.
src/ai_company/engine/agent_engine.py (6)
12-24: LGTM: Clean import organization. The validation function extraction and budget error imports align well with the refactoring objectives.
103-129: LGTM: Well-designed constructor with proper fallback logic. The cost tracker fallback chain (`cost_tracker` → `budget_enforcer.cost_tracker` → `None`) is documented in the docstring and correctly implemented. The initialization log captures the new `has_budget_enforcer` flag for observability.
152-159: LGTM: Clean validation extraction. The validation logic is appropriately delegated to the dedicated `_validation.py` module, maintaining the same error semantics (`ExecutionStateError`, `ValueError`) while reducing the file's line count.
668-708: LGTM: Well-designed budget error handler. Good design decisions:
- Logs at `WARNING` level (budget exhaustion is an expected boundary, not a crash).
- Uses `TerminationReason.BUDGET_EXHAUSTED`, enabling the orchestration layer to distinguish budget stops from errors.
- No recovery applied (intentional: budget exhaustion is not a failure to recover from).
- Properly handles the case where context or system_prompt may not have been built yet.
756-772: LGTM: Appropriate exception handling refinements. `logger.exception` correctly captures the traceback for non-recoverable errors. `raise exc from None` intentionally suppresses exception chaining so that if building the error result fails, the original exception is raised cleanly without confusing the user with the secondary build failure.
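The effect of `raise exc from None` can be seen in a small standalone sketch. `build_error_result` and `handle_fatal` are hypothetical helpers standing in for the engine's error path.

```python
def build_error_result(exc: Exception) -> str:
    """Hypothetical result builder that itself fails."""
    raise RuntimeError("could not build error result")


def handle_fatal(exc: Exception) -> str:
    try:
        return build_error_result(exc)
    except Exception:
        # Re-raise the original error; `from None` hides the secondary failure.
        raise exc from None


try:
    handle_fatal(ValueError("original failure"))
except ValueError as err:
    caught = err
```

Here `caught.__cause__` is `None` and `caught.__suppress_context__` is `True`, so a traceback shows only the original `ValueError` and not the `RuntimeError` raised while building the result.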
88-100: LGTM: Comprehensive docstring update. The updated docstring clearly documents:
- The cost tracker fallback chain
- The budget enforcer's role in pre-flight checks, auto-downgrade, and enhanced in-flight budget checking
Greptile Summary
This PR introduces …
Critical issues identified:
The …
Confidence Score: 0/5
Last reviewed commit: 8b20a75
Code Review
This pull request significantly hardens budget enforcement by introducing a dedicated BudgetEnforcer service, improving error handling in AgentEngine, and enhancing input validation. The refactoring and comprehensive tests are excellent. However, a critical vulnerability exists due to the use of outdated Python 2 except syntax (e.g., except MemoryError, RecursionError:) in enforcer.py and agent_engine.py. This invalid syntax in Python 3 will cause a SyntaxError at runtime, potentially leading to a Denial of Service or improper handling of non-recoverable errors, allowing the system to fail in an insecure or inconsistent state. Other hardening measures, such as allow_inf_nan=False and improved input validation, are well-implemented.
| monthly_cost = await self._cost_tracker.get_total_cost( | ||
| start=period_start, | ||
| ) | ||
| except MemoryError, RecursionError: |
The except syntax except MemoryError, RecursionError: is invalid in Python 3, causing a SyntaxError at runtime. This can lead to a Denial of Service (DoS) as the module will fail to load. If interpreted with Python 2 semantics, it would improperly catch MemoryError and assign RecursionError as the exception object, allowing actual RecursionError exceptions to be caught by a broader except Exception: block, leading to insecure "fail open" behavior. The correct Python 3 syntax uses a tuple of exception types.
except (MemoryError, RecursionError):
| daily_limit, | ||
| agent_id, | ||
| ) | ||
| except MemoryError, RecursionError: |
The except syntax except MemoryError, RecursionError: is invalid in Python 3 and will cause a SyntaxError at runtime. This can lead to a complete failure to load the module or improper error handling where non-recoverable exceptions are swallowed by generic handlers, potentially resulting in an inconsistent system state. The correct Python 3 syntax uses a tuple of exception types.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
Pull request overview
This PR hardens and refactors budget enforcement in the engine by introducing a dedicated BudgetEnforcer service, improving termination classification for budget stops, extracting AgentEngine input validation, and adding/expanding tests and documentation to cover budget edge cases.
Changes:
- Integrate `BudgetEnforcer` into `AgentEngine` for pre-flight checks, task-boundary model auto-downgrade, and enhanced in-flight budget checking; classify budget stops as `TerminationReason.BUDGET_EXHAUSTED`.
- Extract AgentEngine validation helpers into `engine/_validation.py` and tighten budget config validation (`reset_day` bounds, `allow_inf_nan=False`, downgrade boundary).
- Add new budget/billing utilities, observability event constants, and comprehensive unit tests + docs updates.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/engine/test_agent_engine_errors.py | Updates tests around error-chain behavior and cost-recording propagation. |
| tests/unit/engine/test_agent_engine_budget.py | Adds engine-level integration tests for BudgetEnforcer wiring and termination reasons. |
| tests/unit/budget/test_enforcer.py | Adds extensive unit coverage for BudgetEnforcer pre-flight checks, downgrade logic, and checker behavior. |
| tests/unit/budget/test_config.py | Adds coverage for new config fields/validation (boundary, reset_day). |
| tests/unit/budget/test_billing.py | Adds tests for new billing-period boundary utilities. |
| src/ai_company/observability/events/budget.py | Introduces budget-enforcement event name constants. |
| src/ai_company/engine/errors.py | Clarifies budget error semantics and adds DailyLimitExceededError. |
| src/ai_company/engine/agent_engine.py | Wires in BudgetEnforcer, refactors validation usage, and adds budget-specific termination handling. |
| src/ai_company/engine/_validation.py | New module with extracted run/agent/task validation helpers. |
| src/ai_company/engine/__init__.py | Re-exports DailyLimitExceededError. |
| src/ai_company/budget/enforcer.py | New BudgetEnforcer service implementing pre-flight checks, in-flight checker factory, and task-boundary downgrade. |
| src/ai_company/budget/config.py | Adds downgrade boundary field and validates reset_day; disallows inf/nan floats. |
| src/ai_company/budget/billing.py | New pure functions for monthly billing period start and daily period start. |
| src/ai_company/budget/__init__.py | Exposes BudgetEnforcer and billing utilities from the budget package. |
| README.md | Updates feature-status messaging to reflect budget enforcement availability. |
| DESIGN_SPEC.md | Updates spec/docs to reflect BudgetEnforcer implementation details and config. |
| CLAUDE.md | Updates repository overview to include budget enforcement and billing periods. |
src/ai_company/budget/enforcer.py
Outdated
| agent_id=agent_id, | ||
| reason="falling_back_to_task_only_checker", | ||
| ) | ||
| return None |
This exception fallback logs reason="falling_back_to_task_only_checker", but then returns None, which disables all in-flight checks (including per-task limits) if monthly/daily baseline queries fail. Consider returning a task-only checker when task_limit > 0 and only dropping the baseline-dependent monthly/daily enforcement.
| return None | |
| # Fall back to neutral baselines so that task-level enforcement | |
| # can still proceed even if monthly/daily baseline queries fail. | |
| return 0.0, 0.0 |
| logger.warning( | ||
| EXECUTION_ENGINE_ERROR, | ||
| agent_id=agent_id, | ||
| task_id=task_id, |
Budget exhaustion is a controlled termination (TerminationReason.BUDGET_EXHAUSTED), but this logs under the generic EXECUTION_ENGINE_ERROR event name. That can inflate error metrics/alerts. Consider emitting a budget-specific event (or adding an EXECUTION_ENGINE_BUDGET_EXHAUSTED event) and logging at an appropriate level.
| logger.warning( | |
| EXECUTION_ENGINE_ERROR, | |
| agent_id=agent_id, | |
| task_id=task_id, | |
| logger.info( | |
| "EXECUTION_ENGINE_BUDGET_EXHAUSTED", | |
| agent_id=agent_id, | |
| task_id=task_id, | |
| termination_reason=TerminationReason.BUDGET_EXHAUSTED.value, |
| @@ -182,14 +198,24 @@ async def run( # noqa: PLR0913 | |||
| tool_invoker=tool_invoker, | |||
| ) | |||
| except MemoryError, RecursionError: | |||
except MemoryError, RecursionError: is invalid Python syntax (will raise SyntaxError). Use the tuple form except (MemoryError, RecursionError): (or separate except blocks) to catch both exceptions.
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| monthly_cost = await self._cost_tracker.get_total_cost( | ||
| start=period_start, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
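except MemoryError, RecursionError: is a SyntaxError on Python versions before 3.14; PEP 758 allows the unparenthesized form only from 3.14 onward. Use except (MemoryError, RecursionError): (or separate except blocks) if older interpreters must be supported.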
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/ai_company/engine/agent_engine.py (1)
152-218: 🛠️ Refactor suggestion | 🟠 Major
Extract the new budget flow out of AgentEngine.
The added pre-flight branches plus _handle_budget_error() push run() well past the 50-line cap and leave src/ai_company/engine/agent_engine.py over the 800-line limit again. Moving the budget-specific orchestration/result building into helpers would keep the entrypoint readable.
As per coding guidelines: "Keep functions to less than 50 lines and files to less than 800 lines".
Also applies to: 668-722
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 152 - 218, The run() method has grown past 50 lines due to pre-flight budget checks and budget-error orchestration; extract the budget-specific flow into one or two helpers to reduce length: move the pre-flight steps (the block that uses self._budget_enforcer, calls to check_can_execute and resolve_model, and creation of tool_invoker) and context preparation (calls to self._make_tool_invoker and self._prepare_context) into a new helper like _prepare_for_execution(identity, agent_id, task, task_id, max_turns, memory_messages) that returns (identity, tool_invoker, ctx, system_prompt), and move the BudgetExhaustedError handling and result-building currently invoking _handle_budget_error into a separate helper like _run_handle_budget_error(exc, identity, task, agent_id, task_id, start, ctx, system_prompt) and have run() call these helpers before/after _execute; ensure you update run() to call the new helpers and keep all original symbols (_handle_budget_error, _prepare_context, _execute, _make_tool_invoker, self._budget_enforcer) intact so behavior is unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 987-989: Update the spec sentence that says "12 constants in
`events/execution.py`" to reflect the current count of engine events (13)
because `src/ai_company/observability/events/execution.py` now includes the new
`EXECUTION_ENGINE_BUDGET_STOPPED` constant; change the text to mention 13 engine
events (or rephrase to avoid hardcoding a number) and optionally note the added
`EXECUTION_ENGINE_BUDGET_STOPPED` event for clarity.
In `@src/ai_company/budget/enforcer.py`:
- Around line 253-260: The code that decides whether to build a budget checker
must treat BudgetConfig.total_monthly == 0 as a global "budgets disabled"
switch; change the early-return logic in the method that uses
self._budget_config (variables cfg, monthly_budget, task_limit, daily_limit) so
that if monthly_budget == 0 it immediately returns None regardless of
task.budget_limit or cfg.per_agent_daily_limit, ensuring consistency with
check_can_execute() and resolve_model().
In `@src/ai_company/engine/agent_engine.py`:
- Around line 117-120: The constructor sets self._budget_enforcer and
self._cost_tracker separately which can leave budget_enforcer.cost_tracker and
self._cost_tracker as different instances, breaking enforcement in
_post_execution_pipeline and pre/in-flight checks; fix by ensuring they point to
the same CostTracker instance whenever both are provided — e.g., after assigning
self._budget_enforcer and self._cost_tracker, if self._budget_enforcer and
self._cost_tracker are both truthy and self._budget_enforcer.cost_tracker is not
self._cost_tracker, set self._budget_enforcer.cost_tracker = self._cost_tracker
(or vice versa) so _post_execution_pipeline, pre-flight, and in-flight
enforcement all share the same tracker.
- Around line 688-699: The ExecutionResult returned in the budget-stop path
drops the exception details; update the code around
logger.warning/EXECUTION_ENGINE_BUDGET_STOPPED so the constructed
ExecutionResult (created with error_ctx and TerminationReason.BUDGET_EXHAUSTED)
also sets an informative error_message (e.g., f"{type(exc).__name__}: {exc}" or
a normalized budget-stop reason) so callers can distinguish daily vs monthly
hard stops; modify the ExecutionResult creation in this block (referencing
AgentContext.from_identity, ctx, ExecutionResult, and
TerminationReason.BUDGET_EXHAUSTED) to include that error text.
In `@tests/unit/engine/test_agent_engine_budget.py`:
- Around line 49-113: Combine the two duplicated tests into a single
parametrized test using pytest.mark.parametrize that iterates over the injected
exception types (BudgetExhaustedError and DailyLimitExceededError) and their
messages, replace the two functions
test_preflight_budget_exhausted_returns_error and
test_preflight_daily_limit_returns_budget_exhausted with one test (e.g.,
test_preflight_prevents_execution_for_preflight_errors) that patches
enforcer.check_can_execute with AsyncMock(side_effect=exception) and calls
AgentEngine.run, then assert result.termination_reason ==
TerminationReason.BUDGET_EXHAUSTED and provider.call_count == 0; keep existing
use of MockCompletionProvider, BudgetEnforcer, CostTracker, and sample fixtures.
---
Outside diff comments:
In `@src/ai_company/engine/agent_engine.py`:
- Around line 152-218: The run() method has grown past 50 lines due to
pre-flight budget checks and budget-error orchestration; extract the
budget-specific flow into one or two helpers to reduce length: move the
pre-flight steps (the block that uses self._budget_enforcer, calls to
check_can_execute and resolve_model, and creation of tool_invoker) and context
preparation (calls to self._make_tool_invoker and self._prepare_context) into a
new helper like _prepare_for_execution(identity, agent_id, task, task_id,
max_turns, memory_messages) that returns (identity, tool_invoker, ctx,
system_prompt), and move the BudgetExhaustedError handling and result-building
currently invoking _handle_budget_error into a separate helper like
_run_handle_budget_error(exc, identity, task, agent_id, task_id, start, ctx,
system_prompt) and have run() call these helpers before/after _execute; ensure
you update run() to call the new helpers and keep all original symbols
(_handle_budget_error, _prepare_context, _execute, _make_tool_invoker,
self._budget_enforcer) intact so behavior is unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8350002a-f876-4975-a4e8-ddbe4499997e
📒 Files selected for processing (9)
- DESIGN_SPEC.md
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
- src/ai_company/observability/events/execution.py
- tests/unit/budget/test_billing.py
- tests/unit/budget/test_config.py
- tests/unit/budget/test_enforcer.py
- tests/unit/engine/test_agent_engine_budget.py
- tests/unit/engine/test_agent_engine_errors.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use from __future__ import annotations; Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses); ruff enforces this on Python 3.14
Files:
- tests/unit/engine/test_agent_engine_budget.py
- tests/unit/budget/test_billing.py
- tests/unit/engine/test_agent_engine_errors.py
- tests/unit/budget/test_config.py
- src/ai_company/observability/events/execution.py
- tests/unit/budget/test_enforcer.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Configure asyncio_mode = 'auto' for pytest; no manual @pytest.mark.asyncio needed
Set test timeout to 30 seconds per test
Prefer @pytest.mark.parametrize for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests
Files:
- tests/unit/engine/test_agent_engine_budget.py
- tests/unit/budget/test_billing.py
- tests/unit/engine/test_agent_engine_errors.py
- tests/unit/budget/test_config.py
- tests/unit/budget/test_enforcer.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use @computed_field for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
- src/ai_company/observability/events/execution.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)
Never use import logging, logging.getLogger(), or print() in application code; use the structured logger from ai_company.observability
Always use variable name logger (not _logger, not log) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Files:
- src/ai_company/observability/events/execution.py
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/agent_engine.py
🧠 Learnings (3)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
- src/ai_company/observability/events/execution.py
- src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Applied to files:
- src/ai_company/budget/enforcer.py
- src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow exceptions
Applied to files:
src/ai_company/budget/enforcer.py
🧬 Code graph analysis (5)
tests/unit/budget/test_billing.py (1)
- src/ai_company/budget/billing.py (2): billing_period_start(11-45), daily_period_start(48-59)
tests/unit/engine/test_agent_engine_errors.py (2)
- tests/unit/engine/conftest.py (2): make_completion_response(272-290), sample_agent_with_personality(60-87)
- src/ai_company/engine/agent_engine.py (1): run(131-229)
tests/unit/budget/test_config.py (1)
- src/ai_company/budget/config.py (2): AutoDowngradeConfig(65-148), BudgetConfig(151-227)
src/ai_company/budget/enforcer.py (7)
- src/ai_company/budget/billing.py (2): billing_period_start(11-45), daily_period_start(48-59)
- src/ai_company/budget/enums.py (1): BudgetAlertLevel(6-16)
- src/ai_company/engine/errors.py (2): BudgetExhaustedError(24-34), DailyLimitExceededError(37-38)
- src/ai_company/budget/config.py (1): BudgetConfig(151-227)
- src/ai_company/budget/tracker.py (1): CostTracker(67-412)
- src/ai_company/core/agent.py (2): AgentIdentity(246-304), ModelConfig(145-174)
- src/ai_company/providers/routing/resolver.py (2): ModelResolver(25-205), resolve_safe(154-172)
src/ai_company/engine/agent_engine.py (4)
- src/ai_company/engine/_validation.py (3): validate_agent(61-73), validate_run_inputs(33-58), validate_task(76-105)
- src/ai_company/budget/enforcer.py (4): BudgetEnforcer(42-338), cost_tracker(74-76), check_can_execute(78-104), make_budget_checker(224-279)
- src/ai_company/engine/loop_protocol.py (1): make_budget_checker(187-202)
- src/ai_company/engine/prompt.py (2): SystemPrompt(56-82), build_error_prompt(646-673)
| cfg = self._budget_config | ||
| task_limit = task.budget_limit | ||
| monthly_budget = cfg.total_monthly | ||
| daily_limit = cfg.per_agent_daily_limit | ||
|
|
||
| # All enforcement disabled | ||
| if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0: | ||
| return None |
There was a problem hiding this comment.
Honor total_monthly=0 as a full budget-disable switch.
BudgetConfig documents that total_monthly == 0.0 disables budget enforcement at runtime, and check_can_execute() / resolve_model() already follow that contract. This branch does not: it still builds a checker whenever task.budget_limit or per_agent_daily_limit is positive, so a run can pass pre-flight and then get stopped mid-execution even though budgets were supposed to be off.
Suggested fix
- # All enforcement disabled
- if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+ # A zero monthly budget disables budget enforcement entirely.
+ if monthly_budget <= 0:
  return None
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| cfg = self._budget_config | |
| task_limit = task.budget_limit | |
| monthly_budget = cfg.total_monthly | |
| daily_limit = cfg.per_agent_daily_limit | |
| # All enforcement disabled | |
| if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0: | |
| return None | |
| cfg = self._budget_config | |
| task_limit = task.budget_limit | |
| monthly_budget = cfg.total_monthly | |
| daily_limit = cfg.per_agent_daily_limit | |
| # A zero monthly budget disables budget enforcement entirely. | |
| if monthly_budget <= 0: | |
| return None |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/budget/enforcer.py` around lines 253 - 260, The code that
decides whether to build a budget checker must treat BudgetConfig.total_monthly
== 0 as a global "budgets disabled" switch; change the early-return logic in the
method that uses self._budget_config (variables cfg, monthly_budget, task_limit,
daily_limit) so that if monthly_budget == 0 it immediately returns None
regardless of task.budget_limit or cfg.per_agent_daily_limit, ensuring
consistency with check_can_execute() and resolve_model().
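The difference between the two early-return conditions can be made concrete with a small sketch. `Limits` is a hypothetical stand-in for the three values the review names (`total_monthly`, `per_agent_daily_limit`, `task.budget_limit`); the real classes are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """Hypothetical container for the three limits discussed in the review."""

    total_monthly: float
    per_agent_daily_limit: float
    task_budget_limit: float


def builds_checker_before(limits: Limits) -> bool:
    """Original condition: skip the checker only if every limit is off."""
    all_off = (
        limits.total_monthly <= 0
        and limits.task_budget_limit <= 0
        and limits.per_agent_daily_limit <= 0
    )
    return not all_off


def builds_checker_after(limits: Limits) -> bool:
    """Suggested condition: a zero monthly budget disables everything."""
    return limits.total_monthly > 0


# Monthly budget is off, but daily and task limits are still configured.
mixed = Limits(total_monthly=0.0, per_agent_daily_limit=5.0, task_budget_limit=2.0)
before = builds_checker_before(mixed)
after = builds_checker_after(mixed)
```

For the `mixed` configuration the original condition still builds a checker while the suggested one does not, which is exactly the pre-flight versus in-flight inconsistency the comment describes.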
| logger.warning( | ||
| EXECUTION_ENGINE_BUDGET_STOPPED, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| error=f"{type(exc).__name__}: {exc}", | ||
| ) | ||
| try: | ||
| error_ctx = ctx or AgentContext.from_identity(identity, task=task) | ||
| budget_result = ExecutionResult( | ||
| context=error_ctx, | ||
| termination_reason=TerminationReason.BUDGET_EXHAUSTED, | ||
| ) |
There was a problem hiding this comment.
Return the budget-stop reason in the result.
This path logs the exception text, but the returned ExecutionResult drops it. Callers only get TerminationReason.BUDGET_EXHAUSTED with error_message=None, so they cannot distinguish a monthly hard stop from a daily-limit stop, which misses the linked clear-error-message requirement.
♻️ Proposed fix
budget_result = ExecutionResult(
context=error_ctx,
termination_reason=TerminationReason.BUDGET_EXHAUSTED,
+ error_message=f"{type(exc).__name__}: {exc}",
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/engine/agent_engine.py` around lines 688 - 699, The
ExecutionResult returned in the budget-stop path drops the exception details;
update the code around logger.warning/EXECUTION_ENGINE_BUDGET_STOPPED so the
constructed ExecutionResult (created with error_ctx and
TerminationReason.BUDGET_EXHAUSTED) also sets an informative error_message
(e.g., f"{type(exc).__name__}: {exc}" or a normalized budget-stop reason) so
callers can distinguish daily vs monthly hard stops; modify the ExecutionResult
creation in this block (referencing AgentContext.from_identity, ctx,
ExecutionResult, and TerminationReason.BUDGET_EXHAUSTED) to include that error
text.
5948c87 to
fac7ffb
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 8 comments.
| try: | ||
| return await self._compute_baselines( | ||
| cfg, | ||
| monthly_budget, | ||
| daily_limit, | ||
| agent_id, | ||
| ) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| BUDGET_BASELINE_ERROR, | ||
| agent_id=agent_id, | ||
| reason="falling_back_to_zero_baselines", | ||
| ) | ||
| return 0.0, 0.0 |
There was a problem hiding this comment.
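Same exception-syntax concern here: except MemoryError, RecursionError: is a SyntaxError before Python 3.14 and is accepted only from 3.14 onward (PEP 758). Where older interpreters matter, switch to except (MemoryError, RecursionError): (the parentheses remain required when binding with as exc).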
| async def test_preflight_budget_exhausted_returns_error( | ||
| self, | ||
| sample_agent_with_personality: AgentIdentity, | ||
| sample_task_with_criteria: Task, | ||
| ) -> None: | ||
| """Pre-flight BudgetExhaustedError propagates as error result.""" |
There was a problem hiding this comment.
Test name/docstring mismatch: this test asserts TerminationReason.BUDGET_EXHAUSTED, but the name/docstring say “returns error / propagates as error result”. Renaming to reflect “budget exhausted” (or updating the docstring) will keep intent clear.
| async def test_preflight_budget_exhausted_returns_error( | |
| self, | |
| sample_agent_with_personality: AgentIdentity, | |
| sample_task_with_criteria: Task, | |
| ) -> None: | |
| """Pre-flight BudgetExhaustedError propagates as error result.""" | |
| async def test_preflight_budget_exhausted_sets_budget_exhausted_reason( | |
| self, | |
| sample_agent_with_personality: AgentIdentity, | |
| sample_task_with_criteria: Task, | |
| ) -> None: | |
| """Pre-flight BudgetExhaustedError yields BUDGET_EXHAUSTED termination.""" |
| except MemoryError, RecursionError: | ||
| logger.error( | ||
| logger.exception( | ||
| EXECUTION_ENGINE_ERROR, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| error="non-recoverable error in run()", | ||
| exc_info=True, | ||
| ) | ||
| raise |
There was a problem hiding this comment.
except MemoryError, RecursionError: resembles Python 2 exception syntax and is a SyntaxError on Python 3 releases before 3.14; from 3.14 onward PEP 758 accepts it as catching both types. Use tuple syntax instead (e.g., except (MemoryError, RecursionError): or except (MemoryError, RecursionError) as exc:) if older interpreters must be supported, to keep the non-recoverable propagation behavior.
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| ) | ||
| except MemoryError, RecursionError: |
There was a problem hiding this comment.
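Same exception-syntax concern here: except MemoryError, RecursionError: will not parse on Python 3 releases before 3.14 (PEP 758 permits it from 3.14). Switch to except (MemoryError, RecursionError): where older interpreters matter (the parentheses remain required when binding as exc).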
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| except MemoryError, RecursionError: | ||
| logger.error( | ||
| logger.exception( | ||
| EXECUTION_ENGINE_ERROR, | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| error="non-recoverable error while building error result", | ||
| exc_info=True, | ||
| ) | ||
| raise |
There was a problem hiding this comment.
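except MemoryError, RecursionError: is invalid syntax on Python 3 releases before 3.14 and would raise a SyntaxError at import time there; PEP 758 permits it from 3.14 onward. Please change to tuple syntax (except (MemoryError, RecursionError):) if older interpreters must be supported.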
| task_id=task_id, | ||
| error=f"Failed to build budget-exhausted result: {build_exc}", | ||
| ) | ||
| raise exc from build_exc |
There was a problem hiding this comment.
In _handle_budget_error, raise exc from build_exc reintroduces the secondary-exception chain inversion (the build failure becomes the cause of the original budget error). If the goal is to propagate the original error cleanly (as in _handle_fatal_error), suppress the secondary cause (raise exc from None) or re-raise the build exception separately, depending on the desired debugging behavior.
| raise exc from build_exc | |
| raise exc from None |
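The effect of the two spellings on the exception chain can be checked directly. In this sketch `handle_budget_error` is a hypothetical reduction of the handler: it simulates a failure while building the result and then re-raises the original error either chained or with the cause suppressed.

```python
def handle_budget_error(original: Exception, *, suppress: bool) -> BaseException:
    """Re-raise `original` after a simulated result-build failure."""
    try:
        try:
            # Simulated secondary failure while building the result.
            raise ValueError("could not build result")
        except ValueError as build_exc:
            if suppress:
                # Hide the build failure from the traceback chain.
                raise original from None
            # Record the build failure as the direct cause of `original`.
            raise original from build_exc
    except RuntimeError as caught:
        return caught


chained = handle_budget_error(RuntimeError("budget exhausted"), suppress=False)
clean = handle_budget_error(RuntimeError("budget exhausted"), suppress=True)
```

With `from build_exc`, the traceback reports the build failure as the direct cause of the budget error, which inverts the real order of events; with `from None`, `__cause__` stays empty and the implicit context is hidden from the printed traceback.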
| async def check_can_execute(self, agent_id: str) -> None: | ||
| """Pre-flight: verify monthly + daily limits allow execution. | ||
|
|
||
| Raises: | ||
| BudgetExhaustedError: Monthly hard stop exceeded. | ||
| DailyLimitExceededError: Agent daily limit exceeded. | ||
| """ | ||
| cfg = self._budget_config | ||
|
|
||
| # Skip if enforcement disabled (total_monthly == 0) | ||
| if cfg.total_monthly <= 0: | ||
| logger.debug( | ||
| BUDGET_ENFORCEMENT_CHECK, | ||
| agent_id=agent_id, | ||
| result="pass", | ||
| reason="enforcement_disabled", | ||
| ) | ||
| return | ||
|
|
||
| await self._check_monthly_hard_stop(cfg, agent_id) | ||
| await self._check_daily_limit(cfg, agent_id) | ||
|
|
There was a problem hiding this comment.
check_can_execute() returns early when total_monthly <= 0, which skips the daily-limit pre-flight check even if per_agent_daily_limit is configured. This also contradicts make_budget_checker() which can still enforce daily/task limits when total_monthly is 0. Consider treating total_monthly <= 0 as “monthly limit disabled” (skip only the monthly hard-stop check) while still running the daily limit check when per_agent_daily_limit > 0, or otherwise align the runtime behavior + docstrings/tests to a single meaning.
| try: | ||
| period_start = billing_period_start(cfg.reset_day) | ||
| monthly_cost = await self._cost_tracker.get_total_cost( | ||
| start=period_start, | ||
| ) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception: | ||
| logger.exception( | ||
| BUDGET_RESOLVE_MODEL_ERROR, | ||
| agent_id=str(identity.id), | ||
| reason="cost_tracker_query_failed", | ||
| ) | ||
| return identity |
There was a problem hiding this comment.
except MemoryError, RecursionError: resembles Python 2 syntax and will not parse on Python 3 releases before 3.14; PEP 758 accepts it from 3.14 onward. Use except (MemoryError, RecursionError): where older interpreters matter, to preserve the intended “non-recoverable errors propagate” behavior.
…dget checking, and auto-downgrade (#44)
Implement budget enforcement engine as described in DESIGN_SPEC §10.4:
- Add BudgetEnforcer service composing CostTracker + BudgetConfig
- Pre-flight checks: monthly hard stop + per-agent daily limit verification
- In-flight budget checker: sync closure with pre-computed baselines for task, monthly, and daily limits with deduplicated alert emission
- Task-boundary auto-downgrade via ModelResolver alias lookup
- Add billing period utilities (billing_period_start, daily_period_start)
- Add DailyLimitExceededError, budget event constants
- Add reset_day and boundary config fields
- Wire BudgetEnforcer into AgentEngine execution pipeline
- Comprehensive tests: 276 passing (config, billing, enforcer, engine integration)
Closes #44
…ction, and review fixes
Pre-reviewed by 9 agents, 30 findings addressed:
- Add dedicated budget error handling in AgentEngine (BudgetExhaustedError/DailyLimitExceededError)
- Extract validation functions to engine/_validation.py (keep agent_engine.py under 800 lines)
- Add graceful degradation in resolve_model and make_budget_checker for CostTracker failures
- Fix exception chain inversion (raise exc from None)
- Add reset_day boundary validation in billing.py
- Add allow_inf_nan=False to BudgetConfig
- Consolidate _apply_downgrade/_resolve_and_apply into single function
- Add 5 new tests for edge cases and error paths
- Update DESIGN_SPEC.md, CLAUDE.md, README.md for budget enforcement status
…iewers
Critical: fix _compute_baselines_safe to return (0.0, 0.0) instead of None on CostTracker failure, preserving task-only budget checking. Add try-except self-protection to _handle_budget_error. Create dedicated BUDGET_STOPPED event. Simplify redundant exception catch (DailyLimitExceededError is subclass). Add _ALERT_LEVEL_ORDER sync assertion. Update DESIGN_SPEC engine pipeline and constructor docs. Add graceful degradation, timeout validation, and boundary-value tests.
- Fix exception chain inversion: raise exc from None in _handle_budget_error
- Fix comment/code mismatch: total_monthly == 0 → <= 0 in enforcer
- Update DESIGN_SPEC event count: 12 → 13 (BUDGET_STOPPED added)
- Validate cost_tracker consistency with budget_enforcer in AgentEngine
- Parametrize duplicated pre-flight budget stop tests
- Inline _get_tool_definitions, trim docstrings to stay under 800 lines
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (4)
src/ai_company/budget/enforcer.py (1)
258-260: ⚠️ Potential issue | 🟠 Major
total_monthly == 0 still needs to disable the checker.
BudgetConfig in src/ai_company/budget/config.py documents zero monthly budget as disabling runtime enforcement, and this class already follows that in check_can_execute() and resolve_model(). This branch still builds a checker when task.budget_limit or per_agent_daily_limit is positive, so a run can pass pre-flight and then be stopped mid-execution even though budgets are supposed to be off. That also means tests/unit/budget/test_enforcer.py::test_returns_checker_when_only_task_limit_active is currently asserting the opposite contract.
Suggested fix
- # All enforcement disabled
- if monthly_budget <= 0 and task_limit <= 0 and daily_limit <= 0:
+ # A zero monthly budget disables enforcement entirely.
+ if monthly_budget <= 0:
  return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/budget/enforcer.py` around lines 258 - 260, The branch that currently returns None when monthly_budget, task_limit, and daily_limit are all <= 0 misses the configured total_monthly flag; update the early-return in the enforcer (use the same symbols monthly_budget, task_limit, daily_limit and also check total_monthly) so that if total_monthly <= 0 it also treats enforcement as disabled and returns None; ensure logic in the enforcer's constructor or factory mirrors BudgetConfig's semantics (and remains consistent with check_can_execute() and resolve_model()) so no checker is created when total_monthly is zero.
DESIGN_SPEC.md (1)
989-989: ⚠️ Potential issue | 🟡 Minor
Update the engine event count in the spec.
This was flagged previously: Line 989 states "12 constants in events/execution.py", but EXECUTION_ENGINE_BUDGET_STOPPED was added (imported at Line 41 in agent_engine.py), making the count 13. The past comment noted this discrepancy.
📝 Proposed fix
-Logs structured events under the `execution.engine.*` namespace (12 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, and timeout.
+Logs structured events under the `execution.engine.*` namespace (13 constants in `events/execution.py`): creation, start, prompt built, completion, errors, invalid input, task transitions, cost recording outcomes, task metrics, timeout, and budget stopped.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@DESIGN_SPEC.md` at line 989, The spec line claiming "12 constants in events/execution.py" is out of date because EXECUTION_ENGINE_BUDGET_STOPPED was added; update the count to 13 and any related list/description to include EXECUTION_ENGINE_BUDGET_STOPPED (it is imported in agent_engine.py), and ensure the events/execution.py constants list and the DESIGN_SPEC description both reflect the new total and include the budget-stopped event name.
src/ai_company/engine/agent_engine.py (2)
117-120: ⚠️ Potential issue | 🟠 Major
budget_enforcer and cost_tracker may reference different instances.
This concern was raised previously: when both budget_enforcer and cost_tracker are provided, _post_execution_pipeline() records spend to self._cost_tracker, but pre-flight and in-flight enforcement read from budget_enforcer.cost_tracker. If these are different instances, newly recorded spend won't feed back into enforcement, silently under-enforcing monthly/daily caps.
🛡️ Proposed fix to validate tracker consistency
  self._tool_registry = tool_registry
  self._budget_enforcer = budget_enforcer
- self._cost_tracker = cost_tracker or (
-     budget_enforcer.cost_tracker if budget_enforcer else None
- )
+ if budget_enforcer is not None:
+     if cost_tracker is not None and cost_tracker is not budget_enforcer.cost_tracker:
+         raise ValueError(
+             "cost_tracker must match budget_enforcer.cost_tracker "
+             "when budget_enforcer is provided"
+         )
+     self._cost_tracker = budget_enforcer.cost_tracker
+ else:
+     self._cost_tracker = cost_tracker
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 117 - 120, The constructor currently allows a provided budget_enforcer and a separate cost_tracker which can lead to inconsistent tracking (post-execution writes to self._cost_tracker while enforcement reads budget_enforcer.cost_tracker); in the __init__ (where self._budget_enforcer and self._cost_tracker are set) add a validation: if both budget_enforcer and cost_tracker are provided assert they are the same instance (e.g., cost_tracker is budget_enforcer.cost_tracker) and raise a clear ValueError if not, or alternatively prefer budget_enforcer.cost_tracker and ignore the passed cost_tracker (documented choice), ensuring _post_execution_pipeline writes and enforcement read from the same CostTracker.
694-699: ⚠️ Potential issue | 🟠 Major
Include the budget-stop reason in the result.
This was flagged previously: the returned ExecutionResult drops the exception details. Callers only receive TerminationReason.BUDGET_EXHAUSTED with error_message=None, so they cannot distinguish a monthly hard stop from a daily-limit stop, missing the linked "clear error message" requirement from issue #44.
🐛 Proposed fix to include error message
  budget_result = ExecutionResult(
      context=error_ctx,
      termination_reason=TerminationReason.BUDGET_EXHAUSTED,
+     error_message=f"{type(exc).__name__}: {exc}",
  )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/ai_company/engine/agent_engine.py` around lines 694 - 699, The current ExecutionResult created in the AgentEngine path (using error_ctx = ctx or AgentContext.from_identity(identity, task=task) and constructing ExecutionResult with TerminationReason.BUDGET_EXHAUSTED) drops the exception details; update the exception handling so the ExecutionResult includes the budget-stop message (set error_message to the exception text or specific code from the caught BudgetExhausted exception) and any relevant metadata before returning. Locate the try/except around that block in agent_engine.py, catch the budget-related exception (or capture the caught variable), and pass its message (e.g., str(e) or getattr(e, "message", None)) into ExecutionResult(error_message=...) so callers can distinguish monthly vs daily stops while still setting termination_reason=TerminationReason.BUDGET_EXHAUSTED.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/budget/billing.py`:
- Around line 32-45: The function billing_period_start should hard-validate its
inputs: ensure reset_day is an int and in 1..28 (raise TypeError for non-int,
ValueError for out-of-range) and ensure now is a datetime that is normalized to
UTC (if now is tz-aware, call now = now.astimezone(UTC); if now is naive,
explicitly set or document treating it as UTC by replacing tzinfo=UTC). Update
the billing_period_start implementation (and the analogous helper used at lines
57-59) to perform these checks/conversions before any date arithmetic so you
don't silently copy non-UTC datetimes into UTC or accept non-integer reset_day
values.
- Line 8: Add the module-level logger by importing get_logger from
ai_company.observability and initializing logger = get_logger(__name__) at the
top of the billing module; specifically, add "from ai_company.observability
import get_logger" and then "logger = get_logger(__name__)" near the existing
datetime import so the functions in this file (e.g., any raise paths for invalid
input) can use logger for observability.
In `@src/ai_company/budget/enforcer.py`:
- Around line 518-606: The _build_checker_closure function is too large and
mixes task, monthly, alert-deduplication, and daily logic; split its logic into
small helpers (e.g. _check_task_limit, _check_monthly_thresholds,
_check_daily_limit) and have _check simply call those helpers in order. Move the
task-limit branch (using BUDGET_TASK_LIMIT_HIT and task_limit) into
_check_task_limit, move the monthly calculations and calls to _emit_alert /
BudgetAlertLevel into _check_monthly_thresholds (ensure it receives last_alert,
monthly_baseline, thresholds and monthly_budget and returns a tuple of
(should_stop: bool, maybe_total_monthly) or just bool), and move the daily
branch (BUDGET_DAILY_LIMIT_HIT, daily_baseline, daily_limit) into
_check_daily_limit; keep _build_checker_closure to wiring/closure creation only
and return the small _check that delegates to these helpers.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 714-721: In _handle_budget_error, the exception chaining is
inconsistent: change the re-raise "raise exc from build_exc" to "raise exc from
None" so the original budget error is re-raised without exposing the build
exception in the traceback; locate the except block that catches build_exc (the
logger.exception call using EXECUTION_ENGINE_ERROR with agent_id/task_id and
error message "Failed to build budget-exhausted result") and update the raise
statement to use "from None" matching _handle_fatal_error's behavior.
---
Duplicate comments:
In `@DESIGN_SPEC.md`:
- Line 989: The spec line claiming "12 constants in events/execution.py" is out
of date because EXECUTION_ENGINE_BUDGET_STOPPED was added; update the count to
13 and any related list/description to include EXECUTION_ENGINE_BUDGET_STOPPED
(it is imported in agent_engine.py), and ensure the events/execution.py
constants list and the DESIGN_SPEC description both reflect the new total and
include the budget-stopped event name.
In `@src/ai_company/budget/enforcer.py`:
- Around line 258-260: The branch that currently returns None when
monthly_budget, task_limit, and daily_limit are all <= 0 misses the configured
total_monthly flag; update the early-return in the enforcer (use the same
symbols monthly_budget, task_limit, daily_limit and also check total_monthly) so
that if total_monthly <= 0 it also treats enforcement as disabled and returns
None; ensure logic in the enforcer's constructor or factory mirrors
BudgetConfig's semantics (and remains consistent with check_can_execute() and
resolve_model()) so no checker is created when total_monthly is zero.
In `@src/ai_company/engine/agent_engine.py`:
- Around line 117-120: The constructor currently allows a provided
budget_enforcer and a separate cost_tracker which can lead to inconsistent
tracking (post-execution writes to self._cost_tracker while enforcement reads
budget_enforcer.cost_tracker); in the __init__ (where self._budget_enforcer and
self._cost_tracker are set) add a validation: if both budget_enforcer and
cost_tracker are provided assert they are the same instance (e.g., cost_tracker
is budget_enforcer.cost_tracker) and raise a clear ValueError if not, or
alternatively prefer budget_enforcer.cost_tracker and ignore the passed
cost_tracker (documented choice), ensuring _post_execution_pipeline writes and
enforcement read from the same CostTracker.
- Around line 694-699: The current ExecutionResult created in the AgentEngine
path (using error_ctx = ctx or AgentContext.from_identity(identity, task=task)
and constructing ExecutionResult with TerminationReason.BUDGET_EXHAUSTED) drops
the exception details; update the exception handling so the ExecutionResult
includes the budget-stop message (set error_message to the exception text or
specific code from the caught BudgetExhausted exception) and any relevant
metadata before returning. Locate the try/except around that block in
agent_engine.py, catch the budget-related exception (or capture the caught
variable), and pass its message (e.g., str(e) or getattr(e, "message", None))
into ExecutionResult(error_message=...) so callers can distinguish monthly vs
daily stops while still setting
termination_reason=TerminationReason.BUDGET_EXHAUSTED.
📒 Files selected for processing (18)
CLAUDE.md, DESIGN_SPEC.md, README.md, src/ai_company/budget/__init__.py, src/ai_company/budget/billing.py, src/ai_company/budget/config.py, src/ai_company/budget/enforcer.py, src/ai_company/engine/__init__.py, src/ai_company/engine/_validation.py, src/ai_company/engine/agent_engine.py, src/ai_company/engine/errors.py, src/ai_company/observability/events/budget.py, src/ai_company/observability/events/execution.py, tests/unit/budget/test_billing.py, tests/unit/budget/test_config.py, tests/unit/budget/test_enforcer.py, tests/unit/engine/test_agent_engine_budget.py, tests/unit/engine/test_agent_engine_errors.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Python 3.14+ with PEP 649 native lazy annotations
Do NOT use `from __future__ import annotations`; Python 3.14 has PEP 649
Use PEP 758 except syntax: use `except A, B:` (no parentheses); ruff enforces this on Python 3.14
Files:
src/ai_company/observability/events/execution.py, src/ai_company/budget/billing.py, src/ai_company/engine/_validation.py, tests/unit/engine/test_agent_engine_budget.py, src/ai_company/budget/enforcer.py, src/ai_company/engine/__init__.py, tests/unit/engine/test_agent_engine_errors.py, src/ai_company/budget/config.py, src/ai_company/engine/agent_engine.py, src/ai_company/budget/__init__.py, tests/unit/budget/test_billing.py, src/ai_company/observability/events/budget.py, tests/unit/budget/test_config.py, src/ai_company/engine/errors.py, tests/unit/budget/test_enforcer.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Add type hints to all public functions and classes; mypy strict mode is enforced
Use Google-style docstrings required on public classes and functions (enforced by ruff D rules)
Create new objects instead of mutating existing ones (immutability); for non-Pydantic internal collections use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves — never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); use `@computed_field` for derived values; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Enforce 88 character line length (ruff)
Keep functions to less than 50 lines and files to less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/ai_company/observability/events/execution.py, src/ai_company/budget/billing.py, src/ai_company/engine/_validation.py, src/ai_company/budget/enforcer.py, src/ai_company/engine/__init__.py, src/ai_company/budget/config.py, src/ai_company/engine/agent_engine.py, src/ai_company/budget/__init__.py, src/ai_company/observability/events/budget.py, src/ai_company/engine/errors.py
src/ai_company/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/ai_company/**/*.py: Every module with business logic MUST have: `from ai_company.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`, `logging.getLogger()`, or `print()` in application code; use the structured logger from ai_company.observability
Always use variable name `logger` (not `_logger`, not `log`) for the logger instance
Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Always use structured kwargs in logging: logger.info(EVENT, key=value) — never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples — use generic names: example-provider, example-large-001, example-medium-001, example-small-001, or large/medium/small aliases
Files:
src/ai_company/observability/events/execution.py, src/ai_company/budget/billing.py, src/ai_company/engine/_validation.py, src/ai_company/budget/enforcer.py, src/ai_company/engine/__init__.py, src/ai_company/budget/config.py, src/ai_company/engine/agent_engine.py, src/ai_company/budget/__init__.py, src/ai_company/observability/events/budget.py, src/ai_company/engine/errors.py
src/ai_company/{providers,engine}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Files:
src/ai_company/engine/_validation.py, src/ai_company/engine/__init__.py, src/ai_company/engine/agent_engine.py, src/ai_company/engine/errors.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow` to categorize tests
Configure asyncio_mode = 'auto' for pytest; no manual `@pytest.mark.asyncio` needed
Set test timeout to 30 seconds per test
Prefer `@pytest.mark.parametrize` for testing similar cases
Use generic test provider names (test-provider, test-small-001, etc.) in tests
Files:
tests/unit/engine/test_agent_engine_budget.py, tests/unit/engine/test_agent_engine_errors.py, tests/unit/budget/test_billing.py, tests/unit/budget/test_config.py, tests/unit/budget/test_enforcer.py
🧠 Learnings (7)
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : Always use event name constants from ai_company.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider); import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
src/ai_company/observability/events/execution.py, src/ai_company/engine/agent_engine.py, src/ai_company/observability/events/budget.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to tests/**/*.py : Prefer pytest.mark.parametrize for testing similar cases
Applied to files:
tests/unit/engine/test_agent_engine_budget.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
CLAUDE.md, src/ai_company/engine/__init__.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14
Applied to files:
src/ai_company/budget/enforcer.py, src/ai_company/engine/agent_engine.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow exceptions
Applied to files:
src/ai_company/budget/enforcer.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/{providers,engine}/**/*.py : RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains
Applied to files:
src/ai_company/engine/__init__.py, src/ai_company/engine/errors.py
📚 Learning: 2026-03-09T06:51:01.916Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T06:51:01.916Z
Learning: Applies to src/ai_company/providers/**/*.py : Mark retryable errors with is_retryable=True: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError; non-retryable errors raise immediately without retry
Applied to files:
src/ai_company/engine/__init__.py
🧬 Code graph analysis (5)
src/ai_company/engine/__init__.py (1)
src/ai_company/engine/errors.py (1)
DailyLimitExceededError(37-38)
tests/unit/engine/test_agent_engine_errors.py (1)
tests/unit/engine/conftest.py (1)
make_completion_response(272-290)
src/ai_company/budget/__init__.py (2)
src/ai_company/budget/billing.py (2)
billing_period_start(11-45), daily_period_start(48-59)
src/ai_company/budget/enforcer.py (1)
BudgetEnforcer(42-338)
tests/unit/budget/test_billing.py (1)
src/ai_company/budget/billing.py (2)
billing_period_start(11-45), daily_period_start(48-59)
tests/unit/budget/test_config.py (1)
src/ai_company/budget/config.py (2)
AutoDowngradeConfig(65-148), BudgetConfig(151-227)
🪛 LanguageTool
README.md
[typographical] ~25-~25: To join two clauses or introduce examples, consider using an em dash.
Context: ...nd factory - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...
(DASH_RULE)
🔇 Additional comments (21)
CLAUDE.md (1)
47-47: LGTM! The updated description accurately reflects the expanded budget module capabilities with pre-flight/in-flight checks, auto-downgrade, and billing periods.
README.md (1)
25-25: LGTM! The README accurately reflects the new Budget Enforcement (M5) implementation status, clearly noting what's implemented and what's pending.
src/ai_company/engine/errors.py (2)
27-34: LGTM! The updated docstring clearly documents the two usage contexts for `BudgetExhaustedError`, improving developer understanding of when this exception is raised.
37-39: LGTM! `DailyLimitExceededError` appropriately inherits from `BudgetExhaustedError`, enabling callers to catch either the specific daily limit error or the broader budget exhaustion category. The docstring is concise and clear.
src/ai_company/engine/__init__.py (2)
53-53: LGTM! The import is correctly placed alphabetically among other error imports.
171-171: LGTM! `DailyLimitExceededError` is properly added to `__all__` in alphabetical order, making it part of the engine's public API surface.
src/ai_company/engine/_validation.py (5)
1-20: LGTM! The module setup follows all coding guidelines: proper docstring, structured logger via `get_logger(__name__)`, and event constants imported from the observability domain module.
22-30: LGTM! Using `frozenset` for `_EXECUTABLE_STATUSES` ensures immutability, and the detailed docstring explains the rationale for which task statuses are executable.
33-58: LGTM! `validate_run_inputs` correctly validates scalar arguments, logs warnings with structured context before raising, and uses the proper event constant. The keyword-only parameter design prevents argument order mistakes.
61-73: LGTM! `validate_agent` follows the guideline to log at WARNING with context before raising, and the error message clearly explains the requirement.
76-105: LGTM! `validate_task` properly checks both task status and assignment ownership, with appropriate logging before raising `ExecutionStateError`. The condition on line 94 correctly allows unassigned tasks (`assigned_to is None`) to proceed.
src/ai_company/observability/events/budget.py (1)
17-26: LGTM! The new budget event constants are well-organized, following the established naming convention (`BUDGET_<CATEGORY>_<ACTION>`) and value pattern (`budget.<category>.<action>`). The coverage aligns with the BudgetEnforcer's observability needs.
src/ai_company/observability/events/execution.py (1)
37-37: LGTM! The new `EXECUTION_ENGINE_BUDGET_STOPPED` constant complements the existing `EXECUTION_LOOP_BUDGET_EXHAUSTED` by providing engine-level visibility for budget stop events. Naming and placement are consistent with existing patterns.
tests/unit/budget/test_billing.py (3)
1-9: LGTM! Good test file setup with proper imports and the 30-second timeout configured as per coding guidelines.
12-117: LGTM! Comprehensive test coverage for `billing_period_start` using `@pytest.mark.parametrize` as recommended. The test cases thoroughly cover:
- Same-month and previous-month rollover scenarios
- Exact boundary conditions (day == reset_day)
- Year boundary rollback (January → December)
- February 28 edge case (max allowed reset_day)
- Invalid input validation
The descriptive test IDs improve readability.
120-142: LGTM! `TestDailyPeriodStart` covers the essential behaviors: midnight UTC calculation, idempotency at midnight, and default behavior without explicit `now`.
src/ai_company/engine/agent_engine.py (3)
173-176: LGTM: pre-flight budget enforcement integration. The pre-flight check (`check_can_execute`) and model resolution (`resolve_model`) are correctly sequenced before context preparation, aligning with the design spec's three-layer enforcement model (§10.4). The `identity` reassignment from `resolve_model` ensures any auto-downgrade is applied to the agent for this run.
246-253: LGTM: budget checker with enforcer fallback. The conditional properly delegates to `budget_enforcer.make_budget_checker` when available, falling back to the basic `make_budget_checker(task)` otherwise. This maintains backward compatibility while enabling the enhanced in-flight checking with pre-computed baselines.
208-218: LGTM: budget error handling simplified. Catching only `BudgetExhaustedError` is correct since `DailyLimitExceededError` is a subclass (per the PR objectives). The handler routes to `_handle_budget_error`, which returns the `BUDGET_EXHAUSTED` termination reason, properly distinguishing budget stops from crashes.
DESIGN_SPEC.md (2)
974-987: LGTM: pipeline documentation accurately reflects budget enforcement integration. The updated pipeline steps (2-11) correctly describe:
- Pre-flight checks via `check_can_execute()` and `resolve_model()`
- Budget-aware checker creation from `BudgetEnforcer`
- Error handling distinguishing `BudgetExhaustedError` (including `DailyLimitExceededError`) from other exceptions
The description aligns with the `AgentEngine` implementation.
1827-1834: LGTM: BudgetEnforcer implementation note is accurate and complete. The note correctly describes:
- Three enforcement layers (pre-flight, in-flight, task-boundary auto-downgrade)
- Composition of `CostTracker` + `BudgetConfig`
- Billing period scoping via `billing_period_start(reset_day)`
- `DailyLimitExceededError` as subclass of `BudgetExhaustedError`
This provides clear guidance for understanding the budget enforcement architecture.
| to scope cost queries to the current billing cycle. | ||
| """ | ||
|
|
||
| from datetime import UTC, datetime |
🛠️ Refactor suggestion | 🟠 Major
Initialize the standard logger for this module.
These helpers are now part of the budget-enforcement business logic path, but the module still doesn't define logger = get_logger(__name__). That also leaves the invalid-input raise paths without the standard observability hook used elsewhere under src/ai_company/**.
Suggested fix
from datetime import UTC, datetime
+from ai_company.observability import get_logger
+
+logger = get_logger(__name__)
As per coding guidelines, "Every module with business logic MUST have: from ai_company.observability import get_logger then logger = get_logger(__name__)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/budget/billing.py` at line 8, Add the module-level logger by
importing get_logger from ai_company.observability and initializing logger =
get_logger(__name__) at the top of the billing module; specifically, add "from
ai_company.observability import get_logger" and then "logger =
get_logger(__name__)" near the existing datetime import so the functions in this
file (e.g., any raise paths for invalid input) can use logger for observability.
| if not 1 <= reset_day <= 28: # noqa: PLR2004 | ||
| msg = f"reset_day must be 1-28, got {reset_day}" | ||
| raise ValueError(msg) | ||
|
|
||
| if now is None: | ||
| now = datetime.now(UTC) | ||
|
|
||
| if now.day >= reset_day: | ||
| return datetime(now.year, now.month, reset_day, tzinfo=UTC) | ||
|
|
||
| # Roll back to previous month | ||
| if now.month == 1: | ||
| return datetime(now.year - 1, 12, reset_day, tzinfo=UTC) | ||
| return datetime(now.year, now.month - 1, reset_day, tzinfo=UTC) |
Harden the public input boundary for reset_day and now.
billing_period_start(True) is currently treated as day 1, billing_period_start(1.5) falls through to a TypeError, and non-UTC now values are copied into a UTC timestamp without conversion. Because these helpers define the monthly/daily windows used by budget enforcement, that can select the wrong billing period around UTC day/month boundaries.
Suggested hardening
+def _normalize_utc_now(*, now: datetime | None) -> datetime:
+    if now is None:
+        return datetime.now(UTC)
+    if now.tzinfo is None:
+        msg = "now must be timezone-aware"
+        raise ValueError(msg)
+    return now.astimezone(UTC)
+
+
 def billing_period_start(
     reset_day: int,
     *,
     now: datetime | None = None,
 ) -> datetime:
@@
-    if not 1 <= reset_day <= 28:  # noqa: PLR2004
+    if (
+        isinstance(reset_day, bool)
+        or not isinstance(reset_day, int)
+        or not 1 <= reset_day <= 28  # noqa: PLR2004
+    ):
         msg = f"reset_day must be 1-28, got {reset_day}"
         raise ValueError(msg)
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
@@
 def daily_period_start(*, now: datetime | None = None) -> datetime:
@@
-    if now is None:
-        now = datetime.now(UTC)
+    now = _normalize_utc_now(now=now)
     return datetime(now.year, now.month, now.day, tzinfo=UTC)
As per coding guidelines, "Validate at system boundaries (user input, external APIs, config files)".
Also applies to: 57-59
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/budget/billing.py` around lines 32 - 45, The function
billing_period_start should hard-validate its inputs: ensure reset_day is an int
and in 1..28 (raise TypeError for non-int, ValueError for out-of-range) and
ensure now is a datetime that is normalized to UTC (if now is tz-aware, call now
= now.astimezone(UTC); if now is naive, explicitly set or document treating it
as UTC by replacing tzinfo=UTC). Update the billing_period_start implementation
(and the analogous helper used at lines 57-59) to perform these
checks/conversions before any date arithmetic so you don't silently copy non-UTC
datetimes into UTC or accept non-integer reset_day values.
e89538f to de946df
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
| @@ -700,12 +779,11 @@ async def _handle_fatal_error( # noqa: PLR0913 | |||
| task_id=task_id, | |||
| ) | |||
| except MemoryError, RecursionError: | |||
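This handler uses `except MemoryError, RecursionError:`, which is a SyntaxError on Python versions before 3.14; Python 3.14 accepts the unparenthesized form under PEP 758 when no `as` clause is used. Use tuple syntax (`except (MemoryError, RecursionError):` or `... as exc`) if the module must also import on older interpreters, so non-recoverable errors propagate without causing a SyntaxError at import time.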
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
| daily_limit, | ||
| agent_id, | ||
| ) | ||
| except MemoryError, RecursionError: |
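This uses `except MemoryError, RecursionError:`, which raises a SyntaxError on Python versions before 3.14; Python 3.14 accepts the unparenthesized form under PEP 758 when no `as` clause is used. Update it to `except (MemoryError, RecursionError):` (or `... as exc`) if older interpreters must be supported, so the graceful-degradation path only catches the intended exceptions.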
| except MemoryError, RecursionError: | |
| except (MemoryError, RecursionError): |
…hema, closure split
- Add graceful degradation to check_can_execute: CostTracker failures fall back to allowing execution instead of aborting the task
- Split BUDGET_HARD_STOP_TRIGGERED into EXCEEDED (pre-flight) vs TRIGGERED (in-flight) for distinct event semantics
- Add BUDGET_PREFLIGHT_ERROR event for pre-flight fallback logging
- Split _build_checker_closure inner function into 3 focused helpers (_check_task_limit, _check_monthly_limit, _check_daily_limit) to stay under 50-line function limit
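The closure split named in this commit can be pictured roughly as follows. This is a simplified sketch, not the code from `enforcer.py`: the helper signatures, the `>=` comparisons, the "limit <= 0 means disabled" rule, and the `build_checker` name are assumptions, and alert emission and logging are left out.

```python
from collections.abc import Callable


def _check_task_limit(task_cost: float, task_limit: float) -> bool:
    """True when the per-task limit is enabled and reached."""
    return task_limit > 0 and task_cost >= task_limit


def _check_monthly_limit(
    task_cost: float, monthly_baseline: float, monthly_budget: float
) -> bool:
    """True when pre-computed monthly spend plus this task reaches the budget."""
    return monthly_budget > 0 and monthly_baseline + task_cost >= monthly_budget


def _check_daily_limit(
    task_cost: float, daily_baseline: float, daily_limit: float
) -> bool:
    """True when pre-computed daily spend plus this task reaches the limit."""
    return daily_limit > 0 and daily_baseline + task_cost >= daily_limit


def build_checker(
    *,
    task_limit: float,
    monthly_baseline: float,
    monthly_budget: float,
    daily_baseline: float,
    daily_limit: float,
) -> Callable[[float], bool]:
    """Wire the baselines into a closure that only delegates to the helpers."""

    def _check(task_cost: float) -> bool:
        return (
            _check_task_limit(task_cost, task_limit)
            or _check_monthly_limit(task_cost, monthly_baseline, monthly_budget)
            or _check_daily_limit(task_cost, daily_baseline, daily_limit)
        )

    return _check
```

Keeping the closure to wiring only means each limit can be unit-tested without constructing an enforcer.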
| except BudgetExhaustedError, DailyLimitExceededError: | ||
| raise | ||
| except MemoryError, RecursionError: | ||
| raise |
Unparenthesized `except` syntax: a compile-time `SyntaxError` before Python 3.14.
`except BudgetExhaustedError, DailyLimitExceededError:` uses bare-comma syntax. Python 3 versions before 3.14 require parentheses (`except (BudgetExhaustedError, DailyLimitExceededError):`) and cannot import the module otherwise; Python 3.14 accepts the unparenthesized form under PEP 758 when no `as` clause is used.
Additionally, since `DailyLimitExceededError` is a subclass of `BudgetExhaustedError` (see `errors.py` line 37), catching both is redundant: catching just `BudgetExhaustedError` will catch both.
This same bare-comma pattern appears at multiple other locations in the codebase and should be handled consistently:
- `enforcer.py` lines 104, 213, 321
- `agent_engine.py` lines 210, 333, 363, 724, 781
| except BudgetExhaustedError, DailyLimitExceededError: | |
| raise | |
| except MemoryError, RecursionError: | |
| raise | |
| except (BudgetExhaustedError, DailyLimitExceededError): | |
| raise | |
| except (MemoryError, RecursionError): | |
| raise |
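The redundancy point is easy to confirm with a standalone sketch. The two exception classes are stand-ins that mirror only the inheritance relationship described above, and `classify` is a hypothetical helper. The parenthesized tuple form is used here because it is accepted by every Python 3 version.

```python
class BudgetExhaustedError(Exception):
    """Stand-in for the engine's budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in subclass, mirroring the hierarchy in errors.py."""


def classify(exc: BaseException) -> str:
    """Report which handler an exception reaches."""
    try:
        raise exc
    except BudgetExhaustedError:
        # Also reached for DailyLimitExceededError, because it is a subclass.
        return "budget"
    except (MemoryError, RecursionError):
        # Non-recoverable errors propagate unchanged.
        raise
    except Exception:
        return "other"


print(classify(DailyLimitExceededError("daily limit hit")))
```

Listing the subclass next to its base in the same `except` clause therefore adds nothing; a separate earlier clause is only needed when the subclass requires different handling.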
Path: src/ai_company/budget/enforcer.py, lines 102-105
| # Skip if enforcement disabled (total_monthly <= 0) | ||
| if cfg.total_monthly <= 0: | ||
| logger.debug( | ||
| BUDGET_ENFORCEMENT_CHECK, | ||
| agent_id=agent_id, | ||
| result="pass", | ||
| reason="enforcement_disabled", | ||
| ) | ||
| return |
Pre-flight enforcement skips the `per_agent_daily_limit` check when `total_monthly <= 0`.
When `total_monthly <= 0`, `check_can_execute` returns early without calling `_check_daily_limit`, even if `per_agent_daily_limit` is set to a positive value. This creates an enforcement gap: the daily limit is only enforced in-flight by `make_budget_checker`, not at pre-flight.
This is inconsistent with `make_budget_checker`, which treats the three limits independently (see lines 273–277). An operator who configures `total_monthly=0` (to disable global budget caps) while keeping `per_agent_daily_limit=10.0` (to rate-limit individual agents) will find the daily limit never enforced at pre-flight; the agent can exceed it freely until the in-flight checker eventually stops it mid-task.
The daily limit check should be decoupled from the monthly check:
```suggestion
# Skip monthly hard stop if enforcement disabled (total_monthly <= 0)
if cfg.total_monthly > 0:
    await self._check_monthly_hard_stop(cfg, agent_id)
# Daily limit is independent and should always be checked
await self._check_daily_limit(cfg, agent_id)
```
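To see the decoupling end to end, here is a self-contained sketch of a pre-flight check in which the monthly and daily limits are evaluated independently. `SketchConfig`, `SketchEnforcer`, and `is_blocked` are hypothetical names, the spend figures are held in memory instead of coming from a cost tracker, and both the `>=` thresholds and the rule that a non-positive daily limit means "no cap" are assumptions.

```python
import asyncio
from dataclasses import dataclass, field


class BudgetExhaustedError(Exception):
    """Stand-in for the project's base budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in for the daily-limit subclass."""


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical config holding only the two fields named in the review."""

    total_monthly: float
    per_agent_daily_limit: float


@dataclass
class SketchEnforcer:
    """Hypothetical enforcer with in-memory spend figures."""

    cfg: SketchConfig
    monthly_spent: float = 0.0
    daily_spent: dict[str, float] = field(default_factory=dict)

    async def check_can_execute(self, agent_id: str) -> None:
        # Monthly hard stop applies only when a monthly budget is configured.
        if self.cfg.total_monthly > 0 and self.monthly_spent >= self.cfg.total_monthly:
            raise BudgetExhaustedError("monthly budget exhausted")
        # Daily limit is checked regardless of the monthly setting.
        # Assumption: a non-positive daily limit means "no daily cap".
        limit = self.cfg.per_agent_daily_limit
        if limit > 0 and self.daily_spent.get(agent_id, 0.0) >= limit:
            raise DailyLimitExceededError(f"daily limit reached for {agent_id}")


def is_blocked(enforcer: SketchEnforcer, agent_id: str) -> bool:
    """Return True when the pre-flight check refuses the agent."""
    try:
        asyncio.run(enforcer.check_can_execute(agent_id))
    except BudgetExhaustedError:
        return True
    return False


cfg = SketchConfig(total_monthly=0.0, per_agent_daily_limit=10.0)
enforcer = SketchEnforcer(cfg, daily_spent={"agent-a": 12.0, "agent-b": 3.0})
print(is_blocked(enforcer, "agent-a"))
print(is_blocked(enforcer, "agent-b"))
```

With `total_monthly=0.0`, the monthly branch is skipped, yet the agent that has already spent more than its daily limit is still refused before any task work starts.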
    assert set(_ALERT_LEVEL_ORDER) == set(BudgetAlertLevel), (  # noqa: S101
        f"_ALERT_LEVEL_ORDER keys {set(_ALERT_LEVEL_ORDER)} do not match "
        f"BudgetAlertLevel members {set(BudgetAlertLevel)}"
    )
Module-level `assert` is eliminated under `python -O`, creating a silent safety gap.
The exhaustiveness check for `_ALERT_LEVEL_ORDER` uses an `assert` statement, which is compiled out when Python runs with the `-O` (optimize) flag. If a future developer adds a new `BudgetAlertLevel` enum member without a corresponding entry in `_ALERT_LEVEL_ORDER`, this guard will silently vanish in optimized builds, and `_emit_alert` will raise a `KeyError` at runtime instead of failing fast at startup.
Replace the `assert` with an explicit runtime check that fires in all environments:
```suggestion
_missing = set(BudgetAlertLevel) - set(_ALERT_LEVEL_ORDER)
if _missing:
    msg = f"_ALERT_LEVEL_ORDER is missing members: {_missing}"
    raise RuntimeError(msg)
```
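A runnable sketch of the same guard, written as a reusable function, is shown below. `AlertLevel` is a hypothetical stand-in for `BudgetAlertLevel`, and `require_exhaustive` is an illustrative helper that does not exist in the project. Unlike the suggestion above, it also reports extra keys, which matches the equality tested by the original `assert`.

```python
from enum import Enum


class AlertLevel(Enum):
    """Hypothetical stand-in for BudgetAlertLevel."""

    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


def require_exhaustive(order: dict, enum_cls: type[Enum]) -> None:
    """Raise RuntimeError unless ``order`` has exactly one key per enum member.

    A plain ``if``/``raise`` is used instead of ``assert`` so that the check
    still runs when Python is started with ``-O``.
    """
    missing = set(enum_cls) - set(order)
    extra = set(order) - set(enum_cls)
    if missing or extra:
        msg = (
            f"ordering table mismatch: "
            f"missing={sorted(m.name for m in missing)}, "
            f"extra={sorted(str(k) for k in extra)}"
        )
        raise RuntimeError(msg)


complete = {AlertLevel.NORMAL: 0, AlertLevel.WARNING: 1, AlertLevel.CRITICAL: 2}
require_exhaustive(complete, AlertLevel)  # passes silently

incomplete = {AlertLevel.NORMAL: 0, AlertLevel.WARNING: 1}
try:
    require_exhaustive(incomplete, AlertLevel)
except RuntimeError as exc:
    print(exc)
```

Calling such a function at module import time keeps the fail-fast behavior the original `assert` was meant to provide.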
- Fix pre-flight check_can_execute skipping daily limit when total_monthly <= 0 (enforcer.py)
- Replace module-level assert with RuntimeError for _ALERT_LEVEL_ORDER exhaustiveness check (enforcer.py)
- Simplify redundant except clause (DailyLimitExceededError is subclass of BudgetExhaustedError) (enforcer.py)
- Fix turn_range semantic inconsistency: use turn.turn_number for turn-based findings instead of raw enumerate index (detectors.py)
- Update ErrorFinding.turn_range docstring to clarify position semantics across detector types (models.py)
- Change MemoryBackend.backend_name return type to NotBlankStr (protocol.py)
- Add MemoryNotFoundError docstring clarifying protocol vs impl usage (errors.py)
- Add vector_store/history_store validation against known values (config.py)
- Fix DESIGN_SPEC TOC ordering (7.4 before 7.5)
- Fix ADR-001: add mem0ai to version table, fix Memari->Memary typo, fix YAML agents example (mapping->list), add RootConfig note
- Add test for ClassificationResult._validate_findings_match_categories
- Add Windows path traversal test cases for MemoryStorageConfig
- Fix misleading docstring in integration test file
🤖 I have created a release *beep* *boop*

---

## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)

### Features

* add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267)
* add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173)
* add design specification, license, and project setup ([8669a09](8669a09))
* add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139)
* add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779))
* add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b))
* design unified provider interface ([#86](#86)) ([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134))
* implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b))
* implement built-in file system tools ([#18](#18)) ([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496))
* implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2))
* implement core entity and role system models ([#69](#69)) ([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f))
* implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515))
* implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe))
* implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca))
* implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2))
* implement model routing engine ([#99](#99)) ([d3c250b](d3c250b))
* implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3))
* implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85))
* implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30)
* implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1))
* implement tool permission checking ([#16](#16)) ([833c190](833c190))
* implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout ([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749))
* wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617))

### Bug Fixes

* address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929))
* address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169)
* enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a))
* pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861))

### Performance

* harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188)

### Refactoring

* adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9))
* harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89))

### Documentation

* add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6))
* finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee))

### Tests

* add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4))
* add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4))

### CI/CD

* add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758))
* bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247))
* bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517))
* harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af))

### Maintenance

* add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337))
* add design spec context loading to research-link skill ([8ef9685](8ef9685))
* add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46))
* fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5))
* pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary
- `except (BudgetExhaustedError, DailyLimitExceededError)` clause before the broad `except Exception`, returning `TerminationReason.BUDGET_EXHAUSTED` instead of misclassifying as `TerminationReason.ERROR`
- `engine/_validation.py`: Moved `validate_run_inputs`, `validate_agent`, `validate_task` out of `agent_engine.py` to keep it under the 800-line limit (876 → 792 lines)
- `resolve_model` and `make_budget_checker` so CostTracker errors don't crash task execution; falls back to identity unchanged / no budget checker
- `raise exc from build_exc` to `raise exc from None` so the original error propagates cleanly
- `reset_day` boundary check in `billing.py`, `allow_inf_nan=False` on `BudgetConfig`
- `_apply_downgrade`/`_resolve_and_apply` into single function, reordered `_AlertThresholds` fields to natural ascending order, standardized imports

Closes #44
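The handler ordering described in the summary can be illustrated with a small sketch. It assumes only what the summary states: budget errors are caught before the broad `except Exception` and mapped to `TerminationReason.BUDGET_EXHAUSTED`. The enum values, the `COMPLETED` member, and the `run_step` function are hypothetical, and the real engine returns a richer result object.

```python
from enum import Enum
from typing import Callable


class TerminationReason(Enum):
    """Local stand-in; only BUDGET_EXHAUSTED and ERROR are named in the summary."""

    COMPLETED = "completed"  # hypothetical member for the success path
    BUDGET_EXHAUSTED = "budget_exhausted"
    ERROR = "error"


class BudgetExhaustedError(Exception):
    """Stand-in for the project's base budget error."""


class DailyLimitExceededError(BudgetExhaustedError):
    """Stand-in for the daily-limit subclass."""


def run_step(step: Callable[[], None]) -> TerminationReason:
    """Run ``step`` and classify how it ended."""
    try:
        step()
    except (MemoryError, RecursionError):
        # Non-recoverable conditions are re-raised, never classified.
        raise
    except BudgetExhaustedError:
        # Must come before the broad handler, or budget stops look like errors.
        return TerminationReason.BUDGET_EXHAUSTED
    except Exception:
        return TerminationReason.ERROR
    return TerminationReason.COMPLETED


def over_budget() -> None:
    raise DailyLimitExceededError("daily limit reached")


def broken() -> None:
    raise ValueError("unexpected failure")


print(run_step(over_budget))
print(run_step(broken))
print(run_step(lambda: None))
```

Python tries `except` clauses from top to bottom and runs the first match, so swapping the budget handler and the broad handler would reproduce the misclassification the PR fixes.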
Test plan
Review coverage
Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, type-design-analyzer, logging-audit, resilience-audit, security-reviewer, docs-consistency). 30 findings consolidated, all 30 implemented.