feat: implement per-call cost tracking service (#7) #102
Add CostTracker service (budget/tracker.py) that records CostRecord entries in an append-only in-memory store and provides aggregation queries for budget monitoring. Includes time-filtered queries, per-agent and per-department breakdowns, alert level computation, and structured logging via budget event constants.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove lock-free sync properties (record_count, total_cost_usd); all access now goes through async lock-safe methods
- Log exception details in _resolve_department (error, error_type)
- Move log statement inside lock in record() for ordering consistency
- Add early start >= end validation in get_total_cost, get_agent_cost, and build_summary
- Rewrite _filter_records as single-pass generator
- Use math.fsum for precise float accumulation in _aggregate
- Replace private _snapshot test with public API assertions
- Add tests: start > end raises, BUDGET_SUMMARY_BUILT log event, resolver failure log event with error details

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
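The `math.fsum` item is easiest to see with a tiny example. The costs below are made up for illustration and are not from the PR; the point is only that repeated float addition drifts while `math.fsum` does not:

```python
import math

# Ten hypothetical per-call costs of $0.10 each (illustrative data only).
costs = [0.1] * 10

# Built-in sum() rounds after every addition, so binary error accumulates.
naive_total = sum(costs)
# math.fsum() tracks exact partial sums and rounds only once at the end.
precise_total = math.fsum(costs)

print(naive_total)    # 0.9999999999999999
print(precise_total)  # 1.0
```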
Dependency Review: ✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found. Scanned files: none.

Caution: Review failed. The pull request is closed.

ℹ️ Recent review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
📝 Walkthrough

Adds a new in-memory CostTracker service (async, append-only) with recording, query, and summary APIs; exports CostTracker from the budget package; adds budget-related observability event constants; updates CLAUDE.md with two mandatory sections; and introduces fixtures plus extensive unit tests for CostTracker behavior and validations.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant CostTracker
    participant Lock as asyncio.Lock
    participant Records as In-Memory Store
    participant Logger as Event Logger
    Client->>CostTracker: record(cost_record)
    CostTracker->>Lock: acquire()
    CostTracker->>Records: append cost_record
    Records-->>CostTracker: ok
    CostTracker->>Logger: emit BUDGET_RECORD_ADDED
    CostTracker->>Lock: release()
    CostTracker-->>Client: return
```
```mermaid
sequenceDiagram
    participant Client
    participant CostTracker
    participant Lock as asyncio.Lock
    participant Records as In-Memory Store
    participant Resolver as Department Resolver
    participant BudgetConfig
    participant Logger as Event Logger
    Client->>CostTracker: build_summary(start, end)
    CostTracker->>Lock: acquire()
    CostTracker->>Records: snapshot records
    Records-->>CostTracker: records[]
    CostTracker->>Lock: release()
    CostTracker->>CostTracker: filter_records(start, end)
    CostTracker->>CostTracker: aggregate by agent
    loop per agent
        CostTracker->>Resolver: resolve_department(agent_id)
        alt resolves
            Resolver-->>CostTracker: department
        else fails
            Resolver-->>CostTracker: exception
            CostTracker->>Logger: emit BUDGET_DEPARTMENT_RESOLVE_FAILED
        end
    end
    CostTracker->>BudgetConfig: read total_monthly and alert thresholds
    BudgetConfig-->>CostTracker: budget limits
    CostTracker->>CostTracker: compute_alert_level(used_pct)
    CostTracker->>Logger: emit BUDGET_SUMMARY_BUILT
    CostTracker-->>Client: SpendingSummary
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Summary of Changes (Gemini Code Assist): This pull request introduces a core …
Code Review
This pull request introduces a CostTracker service for real-time, in-memory cost tracking. The implementation is robust, featuring async-safe operations for recording and querying cost data, and is accompanied by a comprehensive suite of unit tests achieving 100% coverage. The code is well-structured and follows good practices. My review includes one suggestion to optimize the performance of the spending summary generation by avoiding redundant data processing.
src/ai_company/budget/tracker.py (outdated)
```python
dept_map: dict[str, list[CostRecord]] = defaultdict(list)
for aid, records in by_agent_map.items():
    dept = self._resolve_department(aid)
    if dept is not None:
        dept_map[dept].extend(records)

dept_spendings: list[DepartmentSpending] = []
for dname in sorted(dept_map):
    d_cost, d_in, d_out, d_count = _aggregate(dept_map[dname])
    dept_spendings.append(
        DepartmentSpending(
            department_name=dname,
            total_cost_usd=d_cost,
            total_input_tokens=d_in,
            total_output_tokens=d_out,
            record_count=d_count,
        )
    )
```
The current implementation for department aggregation re-iterates over all CostRecord objects, which can be inefficient if there are many records.
You can improve performance by aggregating department spending from the agent_spendings list that has already been computed. This avoids re-processing every record and instead aggregates the pre-computed per-agent totals.
```diff
-dept_map: dict[str, list[CostRecord]] = defaultdict(list)
-for aid, records in by_agent_map.items():
-    dept = self._resolve_department(aid)
-    if dept is not None:
-        dept_map[dept].extend(records)
-dept_spendings: list[DepartmentSpending] = []
-for dname in sorted(dept_map):
-    d_cost, d_in, d_out, d_count = _aggregate(dept_map[dname])
-    dept_spendings.append(
-        DepartmentSpending(
-            department_name=dname,
-            total_cost_usd=d_cost,
-            total_input_tokens=d_in,
-            total_output_tokens=d_out,
-            record_count=d_count,
-        )
-    )
+dept_spendings_map: dict[str, list[AgentSpending]] = defaultdict(list)
+for agent_spend in agent_spendings:
+    dept = self._resolve_department(agent_spend.agent_id)
+    if dept is not None:
+        dept_spendings_map[dept].append(agent_spend)
+dept_spendings = [
+    DepartmentSpending(
+        department_name=dname,
+        total_cost_usd=round(
+            math.fsum(s.total_cost_usd for s in agent_spends),
+            BUDGET_ROUNDING_PRECISION,
+        ),
+        total_input_tokens=sum(s.total_input_tokens for s in agent_spends),
+        total_output_tokens=sum(s.total_output_tokens for s in agent_spends),
+        record_count=sum(s.record_count for s in agent_spends),
+    )
+    for dname, agent_spends in sorted(dept_spendings_map.items())
+]
```
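One property worth checking before adopting this suggestion: if each `AgentSpending.total_cost_usd` is already rounded (as the centralized rounding in `_aggregate` suggests, though the source does not show it), summing rounded agent totals is not always equal to rounding the raw department total. With an exaggerated, hypothetical precision of 2 decimal places the drift becomes visible:

```python
import math

PRECISION = 2  # exaggerated, hypothetical stand-in for BUDGET_ROUNDING_PRECISION

# Hypothetical department: three agents, one $0.004 call each.
per_agent_costs = {"a": [0.004], "b": [0.004], "c": [0.004]}

# Aggregate the raw records once, then round once.
all_costs = [c for costs in per_agent_costs.values() for c in costs]
records_first = round(math.fsum(all_costs), PRECISION)

# Round each agent total first, then sum the rounded totals and round again.
agent_totals = [
    round(math.fsum(costs), PRECISION) for costs in per_agent_costs.values()
]
agents_first = round(math.fsum(agent_totals), PRECISION)

print(records_first)  # 0.01
print(agents_first)   # 0.0
```

With a realistic precision the gap shrinks to the last rounded digit, but department totals can still differ slightly from a per-record aggregation; summing unrounded agent totals and rounding once would avoid that.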
Greptile Summary

This PR introduces the …
Confidence Score: 3/5
Sequence Diagram

```mermaid
sequenceDiagram
    participant Caller
    participant CostTracker
    participant asyncio.Lock
    participant _helpers as Module helpers
    Note over Caller, _helpers: record() flow
    Caller->>CostTracker: record(cost_record)
    CostTracker->>asyncio.Lock: acquire
    CostTracker->>CostTracker: _records.append(cost_record)
    CostTracker->>CostTracker: logger.info(BUDGET_RECORD_ADDED)
    CostTracker->>asyncio.Lock: release
    Note over Caller, _helpers: get_total_cost() / get_agent_cost() flow
    Caller->>CostTracker: get_total_cost(start, end)
    CostTracker->>_helpers: _validate_time_range(start, end)
    CostTracker->>asyncio.Lock: acquire (via _snapshot)
    CostTracker->>asyncio.Lock: release
    CostTracker->>_helpers: _filter_records(snapshot, start, end)
    CostTracker->>_helpers: _aggregate(filtered)
    CostTracker-->>Caller: float (rounded cost)
    Note over Caller, _helpers: build_summary() flow
    Caller->>CostTracker: build_summary(start, end)
    CostTracker->>_helpers: _validate_time_range(start, end)
    CostTracker->>asyncio.Lock: acquire (via _snapshot)
    CostTracker->>asyncio.Lock: release
    CostTracker->>_helpers: _filter_records(snapshot, start, end)
    CostTracker->>_helpers: _aggregate(filtered) → totals
    CostTracker->>_helpers: _build_agent_spendings(filtered)
    CostTracker->>CostTracker: _build_dept_spendings(agent_spendings)
    CostTracker->>CostTracker: _build_budget_context(totals.cost)
    CostTracker->>CostTracker: _compute_alert_level(used_pct)
    CostTracker-->>Caller: SpendingSummary
```
Last reviewed commit: 3ebc865
Pull request overview
This PR implements a per-call cost tracking service (CostTracker) for the budget module, as specified in DESIGN_SPEC Section 10.2 and Issue #7. It provides an append-only in-memory store for CostRecord entries with async lock-safe queries, time-filtered aggregation by agent/department/period, and alert level computation against budget thresholds.
Changes:
- Add `CostTracker` service in `src/ai_company/budget/tracker.py` with async lock-protected record storage, aggregation queries, spending summary builder, and budget alert computation
- Add three budget event constants (`BUDGET_RECORD_ADDED`, `BUDGET_SUMMARY_BUILT`, `BUDGET_DEPARTMENT_RESOLVE_FAILED`) and new "Design Spec" / "Planning" sections to `CLAUDE.md`
- Add 31 unit tests and supporting fixtures/helpers in the budget test module
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/ai_company/budget/tracker.py` | New `CostTracker` service: append-only in-memory store with async-safe record, query, and summary methods using `math.fsum` for precise float aggregation |
| `src/ai_company/observability/events.py` | Three new budget lifecycle event constants following the `domain.noun.verb` convention |
| `src/ai_company/budget/__init__.py` | Re-exports `CostTracker` in the public `__all__` list |
| `tests/unit/budget/test_tracker.py` | 31 unit tests covering record storage, queries, summaries, alert levels, concurrency, and error handling |
| `tests/unit/budget/conftest.py` | Test fixtures (`cost_tracker`, `budget_config_for_tracker`, `department_resolver`) and `make_cost_record` helper |
| `CLAUDE.md` | Added "Design Spec (MANDATORY)" and "Planning (MANDATORY)" process documentation sections |
```diff
@@ -0,0 +1,433 @@
+"""Unit tests for the CostTracker service."""
```
Missing pytestmark = pytest.mark.timeout(30) at module level. Every other test file in tests/unit/budget/ (e.g. test_config.py, test_cost_record.py, test_enums.py, test_hierarchy.py, test_spending_summary.py) includes this, and CLAUDE.md line 103 states "Timeout: 30 seconds per test". Add pytestmark = pytest.mark.timeout(30) after the imports.
```python
class TestCostTrackerRecord:
    """Tests for CostTracker.record()."""

    @pytest.mark.unit
    async def test_record_stores_record(self, cost_tracker: CostTracker) -> None:
```
@pytest.mark.unit is applied to each individual test method, but every other test file in tests/unit/budget/ applies it at the class level (e.g., test_config.py:23, test_cost_record.py:15, test_enums.py:10). Move @pytest.mark.unit to the class decorators on TestCostTrackerRecord, TestCostTrackerQuery, TestCostTrackerBuildSummary, and TestCostTrackerAlertLevel, and remove the per-method markers.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/budget/tracker.py`:
- Around line 288-295: The _validate_time_range function currently raises
ValueError without logging; import the observability event constant
BUDGET_TIME_RANGE_INVALID from ai_company.observability.events and the module
logger (or use the existing logger instance), then emit a warning-or-error log
that includes the event constant and context (start and end ISO strings)
immediately before raising ValueError in the branch where start >= end; ensure
the log message uses the event constant (BUDGET_TIME_RANGE_INVALID) and includes
the same formatted message that will be raised.
- Around line 142-245: The build_summary method is too large and should be split
into focused helper functions: extract the filtering/snapshot logic into a
_get_filtered_snapshot(start,end) that calls self._snapshot and _filter_records,
move the per-agent aggregation into _aggregate_by_agent(filtered) which uses
_aggregate and returns agent_spendings and the by_agent_map, move the
per-department aggregation into _aggregate_by_department(by_agent_map) which
uses self._resolve_department and _aggregate to return dept_spendings, and
extract budget math/alert computation into _compute_budget_context(total_cost)
which uses self._budget_config and self._compute_alert_level; then have
build_summary call these helpers and assemble the SpendingSummary
(SpendingSummary, PeriodSpending, AgentSpending, DepartmentSpending) and perform
the logger.info call. Ensure each new helper is small, pure where possible, and
referenced by name (_get_filtered_snapshot, _aggregate_by_agent,
_aggregate_by_department, _compute_budget_context) so tests and callers remain
clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: ed7a12e5-df8e-4d0f-bdb8-9a566ec3ec01
📒 Files selected for processing (6)
CLAUDE.md
src/ai_company/budget/__init__.py
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
tests/unit/budget/conftest.py
tests/unit/budget/test_tracker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax without parentheses for exception handling (PEP 758), not `except (A, B):`
All public functions and classes must have type hints; use mypy strict mode for enforcement
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Use line length of 88 characters (enforced by ruff)
Files:
tests/unit/budget/conftest.py
src/ai_company/budget/__init__.py
tests/unit/budget/test_tracker.py
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow` to categorize tests
All tests must use fake/vendor-agnostic model IDs and names (e.g., `test-haiku-001`, `test-provider`), never real vendor model IDs; keep tests decoupled from external providers
Set tests to timeout after 30 seconds per test case
Files:
tests/unit/budget/conftest.py
tests/unit/budget/test_tracker.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import the logger via `from ai_company.observability import get_logger` and instantiate it as `logger = get_logger(__name__)`
Never use `import logging`, `logging.getLogger()`, or `print()` in application code; use the structured logger from the observability module
Use event name constants from `ai_company.observability.events` for all logging statements, not arbitrary strings
Use structured logging format: `logger.info(EVENT, key=value)` with kwargs, never format strings like `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR level with context before raising exceptions
All state transitions must be logged at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Always use Pydantic v2 models with `BaseModel`, `model_validator`, and `ConfigDict`
Create new objects instead of mutating existing ones — follow immutability principles
Validate explicitly at system boundaries (user input, external APIs, config files) — never silently swallow validation errors
Files:
src/ai_company/budget/__init__.py
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
🧠 Learnings (8)
📚 Learning: 2026-01-24T09:54:56.100Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/test-files.instructions.md:0-0
Timestamp: 2026-01-24T09:54:56.100Z
Learning: Each test should be independent and not rely on other tests; use pytest fixtures for test setup (shared fixtures in `tests/conftest.py`); clean up resources in teardown/fixtures
Applied to files:
tests/unit/budget/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: Applies to tests/**/*.py : Tests must use fake model names (e.g., `test-model:8b`, `fake-writer:latest`)—never use real model IDs from `RECOMMENDED_MODELS`.
Applied to files:
tests/unit/budget/conftest.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/**/*.py : Use pytest fixtures for test setup. Shared fixtures should be in `tests/conftest.py`
Applied to files:
tests/unit/budget/conftest.py
📚 Learning: 2026-02-26T17:43:50.902Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T17:43:50.902Z
Learning: When making changes that affect architecture, services, key files, settings, or workflows, update the relevant sections of existing documentation (CLAUDE.md, README.md, etc.) to reflect those changes.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Design and architecture are specified in DESIGN_SPEC.md; all major changes must align with the high-level spec
Applied to files:
CLAUDE.md
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to README.md : Update README.md for significant feature changes
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Project layout uses src-layout convention with `src/ai_company/` for source code and `tests/` for all test categories (unit, integration, e2e)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Applies to src/**/*.py : Use event name constants from `ai_company.observability.events` for all logging statements, not arbitrary strings
Applied to files:
src/ai_company/observability/events.py
🪛 LanguageTool
CLAUDE.md
[style] ~17-~17: A comma is missing here.
Context: ...al - When a spec section is referenced (e.g. "Section 10.2"), read that section verb...
(EG_NO_COMMA)
[style] ~23-~23: The phrase ‘look for ways to’ is wordy and overused. Use a shorter, less frequent alternative to let your writing stand out and sound more polished.
Context: ... implementation, be critical — actively look for ways to improve the design in the spirit of what we're ...
(LOOK_FOR_STYLE)
🔇 Additional comments (7)
CLAUDE.md (1)
11-25: Good addition of mandatory design/planning guardrails. These directives are clear, enforceable, and align well with architecture-first workflow expectations.
src/ai_company/budget/__init__.py (1)
26-37: `CostTracker` export wiring looks correct. Import and `__all__` update are consistent and expose the new API cleanly.
src/ai_company/observability/events.py (1)
101-105: Budget event constants are well-structured and consistent. Names follow the existing convention and support structured logging in the new tracker flow.
tests/unit/budget/conftest.py (1)
200-259: CostTracker fixture scaffolding is clean and test-friendly. Good use of reusable fixtures plus a helper builder to keep tests focused and deterministic.
tests/unit/budget/test_tracker.py (2)
23-433: Excellent coverage for the new CostTracker behavior surface. The suite exercises concurrency safety, filtering semantics, alert thresholds, resolver failures, and observability events comprehensively.
26-433: No action needed; the 30-second timeout is already enforced globally. The pytest-timeout plugin (`pytest-timeout==2.4.0`) is configured in `pyproject.toml` with a global `timeout = 30` setting that applies to all tests, including this module. Individual timeout markers are not required.
src/ai_company/budget/tracker.py (1)
318-328: Aggregation helper is solid. Using `math.fsum` plus centralized rounding/token/count aggregation is a good reliability improvement.
…eviewers

Source changes (tracker.py):
- Fix budget_used_percent/alert_level threshold disagreement (#1)
- Add logger.warning before ValueError in _validate_time_range (#2)
- Split build_summary into _build_agent_spendings, _build_dept_spendings, _build_budget_context helpers to meet <50 line guideline (#3)
- Replace _aggregate tuple return with _AggregateResult NamedTuple (#4)
- Rewrite _aggregate as single-pass loop (#5)
- Aggregate departments from AgentSpending objects, not raw records (#6)
- Add DEBUG log in __init__ for object creation (#7)
- Fix module docstring "Section 10.2 service layer" accuracy (#15)
- Add DEBUG entry logs for get_total_cost/get_agent_cost (#16)

Event constants (events.py):
- Add BUDGET_TRACKER_CREATED, BUDGET_TOTAL_COST_QUERIED, BUDGET_AGENT_COST_QUERIED, BUDGET_TIME_RANGE_INVALID

Test improvements (test_tracker.py):
- Assert budget_used_percent in all alert-level tests (#8)
- Assert budget_total_monthly in configured summary test (#9)
- Add start==end raises ValueError test (#10)
- Add end-only time filter test for get_total_cost (#11)
- Add multi-record token aggregation assertions (#12)
- Add pytestmark = pytest.mark.timeout(30) (#13)
- Move @pytest.mark.unit to class level (#14)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
def _filter_records(
    records: Sequence[CostRecord],
    *,
    agent_id: str | None = None,
    start: datetime | None = None,
    end: datetime | None = None,
) -> tuple[CostRecord, ...]:
    """Filter records by agent and/or time range.

    Time semantics: ``start <= timestamp < end``.
    """
    return tuple(
        r
        for r in records
        if (agent_id is None or r.agent_id == agent_id)
        and (start is None or r.timestamp >= start)
        and (end is None or r.timestamp < end)
    )
```
**Timezone-aware/naive mismatch raises opaque `TypeError` at comparison**

`CostRecord.timestamp` is typed as plain `datetime` (not Pydantic's `AwareDatetime`), so it accepts both timezone-aware and timezone-naive values. If a record is stored with a naive timestamp and a caller passes an aware `start`/`end` (or vice versa), Python raises:

```
TypeError: can't compare offset-naive and offset-aware datetimes
```

...inside the `r.timestamp >= start` and `r.timestamp < end` comparisons, with no indication of which record or parameter caused the mismatch.

Two complementary options to consider:

1. **At the model boundary**: change `CostRecord.timestamp` to Pydantic's `AwareDatetime` (which enforces timezone-awareness on creation and is the right long-term fix):
   ```python
   from pydantic import AwareDatetime
   timestamp: AwareDatetime = Field(description="Timestamp of the API call")
   ```
2. **At the tracker boundary**: add an explicit check in `_validate_time_range` (or as a separate helper called from `_filter_records`) that raises a clear `ValueError` if the timezone-awareness of `start`/`end` disagrees with the stored records.

The tests currently always use UTC-aware datetimes, so this code path is untested. Option 1 is the simpler fix and closes the gap at the source.
Path: src/ai_company/budget/tracker.py
Line: 332-349
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/ai_company/budget/tracker.py`:
- Around line 75-76: self._records is currently unbounded; add a configurable
cap (e.g., max_records) on the number of CostRecord entries and enforce it when
adding records to avoid memory growth: introduce a constructor parameter
(default value like 10_000) or a module-level constant, and either switch
_records to a collections.deque(maxlen=max_records) or trim the list after
append (inside the existing asyncio.Lock) so oldest entries are dropped on
overflow; update any methods that append to or iterate over self._records to
respect the cap and document the new parameter.
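A minimal sketch of the `deque(maxlen=...)` option; the class name, `max_records` parameter, and `dropped` counter are hypothetical, and the 10_000 default comes from the suggestion above. One trade-off to keep in mind: once records are evicted, totals and summaries cover only the retained window, so an eviction counter (or a log event) makes that visible:

```python
import asyncio
from collections import deque


class BoundedRecordStore:
    """Hypothetical capped store: keeps only the newest `max_records` items."""

    def __init__(self, max_records: int = 10_000) -> None:
        if max_records <= 0:
            raise ValueError("max_records must be positive")
        self._records: deque[object] = deque(maxlen=max_records)
        self._lock = asyncio.Lock()
        self.dropped = 0  # how many old records were evicted

    async def append(self, item: object) -> None:
        async with self._lock:
            if len(self._records) == self._records.maxlen:
                self.dropped += 1  # the deque evicts the oldest item on append
            self._records.append(item)

    async def snapshot(self) -> tuple[object, ...]:
        async with self._lock:
            return tuple(self._records)


async def demo() -> tuple[tuple[object, ...], int]:
    store = BoundedRecordStore(max_records=3)
    for i in range(5):
        await store.append(i)
    return await store.snapshot(), store.dropped


snapshot, dropped = asyncio.run(demo())
print(snapshot, dropped)  # (2, 3, 4) 2
```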
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7261ba88-ec8f-4598-bd48-1d2d4b38c214
📒 Files selected for processing (3)
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
tests/unit/budget/test_tracker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations
Use `except A, B:` syntax without parentheses for exception handling (PEP 758), not `except (A, B):`
All public functions and classes must have type hints; use mypy strict mode for enforcement
Use Google-style docstrings for all public classes and functions (enforced by ruff D rules)
Keep functions under 50 lines and files under 800 lines
Use line length of 88 characters (enforced by ruff)
Files:
tests/unit/budget/test_tracker.py
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, and `@pytest.mark.slow` to categorize tests
All tests must use fake/vendor-agnostic model IDs and names (e.g., `test-haiku-001`, `test-provider`), never real vendor model IDs; keep tests decoupled from external providers
Set tests to timeout after 30 seconds per test case
Files:
tests/unit/budget/test_tracker.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must import the logger via `from ai_company.observability import get_logger` and instantiate it as `logger = get_logger(__name__)`
Never use `import logging`, `logging.getLogger()`, or `print()` in application code; use the structured logger from the observability module
Use event name constants from `ai_company.observability.events` for all logging statements, not arbitrary strings
Use structured logging format: `logger.info(EVENT, key=value)` with kwargs, never format strings like `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR level with context before raising exceptions
All state transitions must be logged at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Always use Pydantic v2 models with `BaseModel`, `model_validator`, and `ConfigDict`
Create new objects instead of mutating existing ones — follow immutability principles
Validate explicitly at system boundaries (user input, external APIs, config files) — never silently swallow validation errors
Files:
src/ai_company/budget/tracker.py
src/ai_company/observability/events.py
🧠 Learnings (5)
📚 Learning: 2026-01-24T09:54:45.426Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/instructions/agents.instructions.md:0-0
Timestamp: 2026-01-24T09:54:45.426Z
Learning: Applies to agents/test*.py : Agent tests should cover: successful generation with valid output, handling malformed LLM responses, error conditions (network errors, timeouts), output format validation, and integration with story state
Applied to files:
tests/unit/budget/test_tracker.py
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Applies to tests/**/*.py : Set tests to timeout after 30 seconds per test case
Applied to files:
tests/unit/budget/test_tracker.py
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Applies to tests/**/*.py : Use pytest markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow` to categorize tests
Applied to files:
tests/unit/budget/test_tracker.py
📚 Learning: 2026-01-26T08:59:32.818Z
Learnt from: CR
Repo: Aureliolo/story-factory PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-01-26T08:59:32.818Z
Learning: Applies to tests/unit/test_*.py : Write unit tests for new functionality using pytest. Place test files in `tests/unit/` with `test_*.py` naming convention.
Applied to files:
tests/unit/budget/test_tracker.py
📚 Learning: 2026-03-05T06:58:57.777Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-05T06:58:57.777Z
Learning: Applies to src/**/*.py : Use event name constants from `ai_company.observability.events` for all logging statements, not arbitrary strings
Applied to files:
src/ai_company/observability/events.py
🔇 Additional comments (6)
src/ai_company/observability/events.py (1)
101-109: Budget event constants are consistent and well-scoped. These additions fit the existing event taxonomy and provide clear coverage for tracker lifecycle/query/error paths.
tests/unit/budget/test_tracker.py (3)
20-26: Test categorization and timeout policy are correctly applied. Good use of `pytest.mark.unit` and 30-second timeout enforcement at module scope.
63-138: Observability and boundary-condition coverage is strong. Event-capture assertions and start/end validation tests are well-targeted and reduce regression risk.
Also applies to: 315-383
388-460: Alert-level test matrix is solid. NORMAL/WARNING/CRITICAL/HARD_STOP plus no-config and zero-monthly scenarios are all covered.
src/ai_company/budget/tracker.py (2)
69-99: Constructor and record path look robust. The dependency-injected setup, lock-guarded append, and structured event logging are implemented cleanly.
100-230: Query/summary flow and validation are well-executed. Snapshot-based reads, explicit time-range validation, and summary event emission are coherent and reliable.
Also applies to: 317-329
```python
self._records: list[CostRecord] = []
self._lock: asyncio.Lock = asyncio.Lock()
```
🧹 Nitpick | 🔵 Trivial
Add a short-term memory guardrail plan for _records.
self._records is unbounded append-only storage; consider a configurable cap/rollover metric until persistent storage lands to prevent long-run memory pressure.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/ai_company/budget/tracker.py` around lines 75 - 76, self._records is
currently unbounded; add a configurable cap (e.g., max_records) on the number of
CostRecord entries and enforce it when adding records to avoid memory growth:
introduce a constructor parameter (default value like 10_000) or a module-level
constant, and either switch _records to a collections.deque(maxlen=max_records)
or trim the list after append (inside the existing asyncio.Lock) so oldest
entries are dropped on overflow; update any methods that append to or iterate
over self._records to respect the cap and document the new parameter.
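A minimal sketch of the guardrail, following the prompt's `collections.deque(maxlen=...)` suggestion. The class `BoundedCostStore`, the `max_records` default of 10 000, the `dropped_count` rollover metric, and the stand-in `CostRecord` dataclass are hypothetical (the real `CostRecord` is a Pydantic model with more fields). Appends and snapshots share one `asyncio.Lock`, mirroring the lock-guarded append the review describes.

```python
import asyncio
from collections import deque
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CostRecord:
    # Hypothetical stand-in with only the fields this sketch needs.
    agent_id: str
    cost_usd: float
    timestamp: datetime


class BoundedCostStore:
    """Append-only in-memory store that drops the oldest records on overflow."""

    def __init__(self, max_records: int = 10_000) -> None:
        if max_records <= 0:
            raise ValueError("max_records must be positive")
        # deque(maxlen=...) evicts from the left automatically when full.
        self._records: deque[CostRecord] = deque(maxlen=max_records)
        self._lock = asyncio.Lock()
        self._dropped = 0  # rollover metric: how many records were evicted

    async def record(self, rec: CostRecord) -> None:
        async with self._lock:
            # A full deque will evict exactly one record on this append.
            if len(self._records) == self._records.maxlen:
                self._dropped += 1
            self._records.append(rec)

    async def snapshot(self) -> tuple[CostRecord, ...]:
        # Copy under the lock so callers can iterate without holding it.
        async with self._lock:
            return tuple(self._records)

    async def dropped_count(self) -> int:
        async with self._lock:
            return self._dropped
```

A `deque` keeps eviction O(1); trimming a plain list with `del self._records[0]` after each append would cost O(n) per overflowing write.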
- Use Pydantic `AwareDatetime` for `CostRecord.timestamp` to enforce timezone-aware datetimes at validation time (Greptile R2-1)
- Remove dead `is not None` walrus guard in `_build_agent_spendings`, since `_aggregate` always returns `_AggregateResult` (Greptile R2-2)
- Add `test_naive_datetime_rejected` to verify naive timestamps are rejected by `CostRecord` validation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
```python
async def get_total_cost(
    self,
    *,
    start: datetime | None = None,
    end: datetime | None = None,
) -> float:
```
**Naive `datetime` passed as `start`/`end` raises opaque `TypeError` against always-aware `CostRecord.timestamp`**
Now that `CostRecord.timestamp` is `AwareDatetime` (always timezone-aware), any naive `datetime` passed as `start` or `end` to `get_total_cost`, `get_agent_cost`, or `build_summary` will cause an unhandled `TypeError: can't compare offset-naive and offset-aware datetimes` deep inside `_filter_records`:
```python
# _filter_records line 347
and (start is None or r.timestamp >= start)  # TypeError if start is naive
```
`_validate_time_range` does not catch this because both `start` and `end` can be naive simultaneously (their mutual comparison succeeds fine), so the check passes silently and the error only surfaces when comparing against the first stored record.
The type annotations for all three public methods should be updated to `AwareDatetime` to enforce the same constraint at the API boundary as `CostRecord.timestamp`. The same applies to `build_summary`'s `start` and `end` parameters:
```python
from pydantic import AwareDatetime

async def get_total_cost(
    self,
    *,
    start: AwareDatetime | None = None,
    end: AwareDatetime | None = None,
) -> float:
```
And on `build_summary`:
```python
async def build_summary(
    self,
    *,
    start: AwareDatetime,
    end: AwareDatetime,
) -> SpendingSummary:
```
This also applies to `get_agent_cost` (`src/ai_company/budget/tracker.py` lines 125–131) and the internal `_validate_time_range` / `_filter_records` helpers.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/budget/tracker.py
Line: 100-105
Comment:
**Naive `datetime` passed as `start`/`end` raises opaque `TypeError` against always-aware `CostRecord.timestamp`**
Now that `CostRecord.timestamp` is `AwareDatetime` (always timezone-aware), any naive `datetime` passed as `start` or `end` to `get_total_cost`, `get_agent_cost`, or `build_summary` will cause an unhandled `TypeError: can't compare offset-naive and offset-aware datetimes` deep inside `_filter_records`:
```python
# _filter_records line 347
and (start is None or r.timestamp >= start) # TypeError if start is naive
```
`_validate_time_range` does not catch this because both `start` and `end` can be naive simultaneously (their mutual comparison succeeds fine), so the check passes silently and the error only surfaces when comparing against the first stored record.
The type annotations for all three public methods should be updated to `AwareDatetime` to enforce the same constraint at the API boundary as `CostRecord.timestamp`. The same applies to `build_summary`'s `start` and `end` parameters:
```python
from pydantic import AwareDatetime
async def get_total_cost(
self,
*,
start: AwareDatetime | None = None,
end: AwareDatetime | None = None,
) -> float:
```
And on `build_summary`:
```python
async def build_summary(
self,
*,
start: AwareDatetime,
end: AwareDatetime,
) -> SpendingSummary:
```
This also applies to `get_agent_cost` (`src/ai_company/budget/tracker.py` lines 125–131) and the internal `_validate_time_range` / `_filter_records` helpers.
How can I resolve this? If you propose a fix, please make it concise.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
🤖 I have created a release *beep* *boop* --- ## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider 
resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry ([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * 
implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs 
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, 
contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, 
loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [0.1.0](v0.0.0...v0.1.0) (2026-03-11) ### Features * add autonomy levels and approval timeout policies ([#42](#42), [#126](#126)) ([#197](#197)) ([eecc25a](eecc25a)) * add CFO cost optimization service with anomaly detection, reports, and approval decisions ([#186](#186)) ([a7fa00b](a7fa00b)) * add code quality toolchain (ruff, mypy, pre-commit, dependabot) ([#63](#63)) ([36681a8](36681a8)) * add configurable cost tiers and subscription/quota-aware tracking ([#67](#67)) ([#185](#185)) ([9baedfa](9baedfa)) * add container packaging, Docker Compose, and CI pipeline ([#269](#269)) ([435bdfe](435bdfe)), closes [#267](#267) * add coordination error taxonomy classification pipeline ([#146](#146)) ([#181](#181)) ([70c7480](70c7480)) * add cost-optimized, hierarchical, and auction assignment strategies ([#175](#175)) ([ce924fa](ce924fa)), closes [#173](#173) * add design specification, license, and project setup ([8669a09](8669a09)) * add env var substitution and config file auto-discovery ([#77](#77)) ([7f53832](7f53832)) * add FastestStrategy routing + vendor-agnostic cleanup ([#140](#140)) ([09619cb](09619cb)), closes [#139](#139) * add HR engine and performance tracking ([#45](#45), [#47](#47)) ([#193](#193)) ([2d091ea](2d091ea)) * add issue auto-search and resolution verification to PR review skill ([#119](#119)) ([deecc39](deecc39)) * add mandatory JWT + API key authentication ([#256](#256)) ([c279cfe](c279cfe)) * add memory retrieval, ranking, and context injection pipeline ([#41](#41)) ([873b0aa](873b0aa)) * add pluggable MemoryBackend protocol with models, config, and events ([#180](#180)) ([46cfdd4](46cfdd4)) * add pluggable MemoryBackend protocol with models, config, and events ([#32](#32)) ([46cfdd4](46cfdd4)) * add pluggable output scan response policies ([#263](#263)) ([b9907e8](b9907e8)) * add pluggable PersistenceBackend protocol with SQLite implementation ([#36](#36)) ([f753779](f753779)) * add progressive 
trust and promotion/demotion subsystems ([#43](#43), [#49](#49)) ([3a87c08](3a87c08)) * add retry handler, rate limiter, and provider resilience ([#100](#100)) ([b890545](b890545)) * add SecOps security agent with rule engine, audit log, and ToolInvoker integration ([#40](#40)) ([83b7b6c](83b7b6c)) * add shared org memory and memory consolidation/archival ([#125](#125), [#48](#48)) ([4a0832b](4a0832b)) * design unified provider interface ([#86](#86)) ([3e23d64](3e23d64)) * expand template presets, rosters, and add inheritance ([#80](#80), [#81](#81), [#84](#84)) ([15a9134](15a9134)) * implement agent runtime state vs immutable config split ([#115](#115)) ([4cb1ca5](4cb1ca5)) * implement AgentEngine core orchestrator ([#11](#11)) ([#143](#143)) ([f2eb73a](f2eb73a)) * implement AuditRepository for security audit log persistence ([#279](#279)) ([94bc29f](94bc29f)) * implement basic tool system (registry, invocation, results) ([#15](#15)) ([c51068b](c51068b)) * implement built-in file system tools ([#18](#18)) ([325ef98](325ef98)) * implement communication foundation — message bus, dispatcher, and messenger ([#157](#157)) ([8e71bfd](8e71bfd)) * implement company template system with 7 built-in presets ([#85](#85)) ([cbf1496](cbf1496)) * implement conflict resolution protocol ([#122](#122)) ([#166](#166)) ([e03f9f2](e03f9f2)) * implement core entity and role system models ([#69](#69)) ([acf9801](acf9801)) * implement crash recovery with fail-and-reassign strategy ([#149](#149)) ([e6e91ed](e6e91ed)) * implement engine extensions — Plan-and-Execute loop and call categorization ([#134](#134), [#135](#135)) ([#159](#159)) ([9b2699f](9b2699f)) * implement enterprise logging system with structlog ([#73](#73)) ([2f787e5](2f787e5)) * implement graceful shutdown with cooperative timeout strategy ([#130](#130)) ([6592515](6592515)) * implement hierarchical delegation and loop prevention ([#12](#12), [#17](#17)) ([6be60b6](6be60b6)) * implement LiteLLM driver and provider registry 
([#88](#88)) ([ae3f18b](ae3f18b)), closes [#4](#4) * implement LLM decomposition strategy and workspace isolation ([#174](#174)) ([aa0eefe](aa0eefe)) * implement meeting protocol system ([#123](#123)) ([ee7caca](ee7caca)) * implement message and communication domain models ([#74](#74)) ([560a5d2](560a5d2)) * implement model routing engine ([#99](#99)) ([d3c250b](d3c250b)) * implement parallel agent execution ([#22](#22)) ([#161](#161)) ([65940b3](65940b3)) * implement per-call cost tracking service ([#7](#7)) ([#102](#102)) ([c4f1f1c](c4f1f1c)) * implement personality injection and system prompt construction ([#105](#105)) ([934dd85](934dd85)) * implement single-task execution lifecycle ([#21](#21)) ([#144](#144)) ([c7e64e4](c7e64e4)) * implement subprocess sandbox for tool execution isolation ([#131](#131)) ([#153](#153)) ([3c8394e](3c8394e)) * implement task assignment subsystem with pluggable strategies ([#172](#172)) ([c7f1b26](c7f1b26)), closes [#26](#26) [#30](#30) * implement task decomposition and routing engine ([#14](#14)) ([9c7fb52](9c7fb52)) * implement Task, Project, Artifact, Budget, and Cost domain models ([#71](#71)) ([81eabf1](81eabf1)) * implement tool permission checking ([#16](#16)) ([833c190](833c190)) * implement YAML config loader with Pydantic validation ([#59](#59)) ([ff3a2ba](ff3a2ba)) * implement YAML config loader with Pydantic validation ([#75](#75)) ([ff3a2ba](ff3a2ba)) * initialize project with uv, hatchling, and src layout ([39005f9](39005f9)) * initialize project with uv, hatchling, and src layout ([#62](#62)) ([39005f9](39005f9)) * Litestar REST API, WebSocket feed, and approval queue (M6) ([#189](#189)) ([29fcd08](29fcd08)) * make TokenUsage.total_tokens a computed field ([#118](#118)) ([c0bab18](c0bab18)), closes [#109](#109) * parallel tool execution in ToolInvoker.invoke_all ([#137](#137)) ([58517ee](58517ee)) * testing framework, CI pipeline, and M0 gap fixes ([#64](#64)) ([f581749](f581749)) * wire all modules into 
observability system ([#97](#97)) ([f7a0617](f7a0617)) ### Bug Fixes * address Greptile post-merge review findings from PRs [#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175) ([#176](#176)) ([c5ca929](c5ca929)) * address post-merge review feedback from PRs [#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167) ([#170](#170)) ([3bf897a](3bf897a)), closes [#169](#169) * enforce strict mypy on test files ([#89](#89)) ([aeeff8c](aeeff8c)) * harden Docker sandbox, MCP bridge, and code runner ([#50](#50), [#53](#53)) ([d5e1b6e](d5e1b6e)) * harden git tools security + code quality improvements ([#150](#150)) ([000a325](000a325)) * harden subprocess cleanup, env filtering, and shutdown resilience ([#155](#155)) ([d1fe1fb](d1fe1fb)) * incorporate post-merge feedback + pre-PR review fixes ([#164](#164)) ([c02832a](c02832a)) * pre-PR review fixes for post-merge findings ([#183](#183)) ([26b3108](26b3108)) * resolve circular imports, bump litellm, fix release tag format ([#286](#286)) ([a6659b5](a6659b5)) * strengthen immutability for BaseTool schema and ToolInvoker boundaries ([#117](#117)) ([7e5e861](7e5e861)) ### Performance * harden non-inferable principle implementation ([#195](#195)) ([02b5f4e](02b5f4e)), closes [#188](#188) ### Refactoring * adopt NotBlankStr across all models ([#108](#108)) ([#120](#120)) ([ef89b90](ef89b90)) * extract _SpendingTotals base class from spending summary models ([#111](#111)) ([2f39c1b](2f39c1b)) * harden BudgetEnforcer with error handling, validation extraction, and review fixes ([#182](#182)) ([c107bf9](c107bf9)) * harden personality profiles, department validation, and template rendering ([#158](#158)) ([10b2299](10b2299)) * pre-PR review improvements for ExecutionLoop + ReAct loop ([#124](#124)) ([8dfb3c0](8dfb3c0)) * split events.py into per-domain event modules ([#136](#136)) ([e9cba89](e9cba89)) ### 
Documentation * add ADR-001 memory layer evaluation and selection ([#178](#178)) ([db3026f](db3026f)), closes [#39](#39) * add agent scaling research findings to DESIGN_SPEC ([#145](#145)) ([57e487b](57e487b)) * add CLAUDE.md, contributing guide, and dev documentation ([#65](#65)) ([55c1025](55c1025)), closes [#54](#54) * add crash recovery, sandboxing, analytics, and testing decisions ([#127](#127)) ([5c11595](5c11595)) * address external review feedback with MVP scope and new protocols ([#128](#128)) ([3b30b9a](3b30b9a)) * expand design spec with pluggable strategy protocols ([#121](#121)) ([6832db6](6832db6)) * finalize 23 design decisions (ADR-002) ([#190](#190)) ([8c39742](8c39742)) * update project docs for M2.5 conventions and add docs-consistency review agent ([#114](#114)) ([99766ee](99766ee)) ### Tests * add e2e single agent integration tests ([#24](#24)) ([#156](#156)) ([f566fb4](f566fb4)) * add provider adapter integration tests ([#90](#90)) ([40a61f4](40a61f4)) ### CI/CD * add Release Please for automated versioning and GitHub Releases ([#278](#278)) ([a488758](a488758)) * bump actions/checkout from 4 to 6 ([#95](#95)) ([1897247](1897247)) * bump actions/upload-artifact from 4 to 7 ([#94](#94)) ([27b1517](27b1517)) * bump anchore/scan-action from 6.5.1 to 7.3.2 ([#271](#271)) ([80a1c15](80a1c15)) * bump docker/build-push-action from 6.19.2 to 7.0.0 ([#273](#273)) ([dd0219e](dd0219e)) * bump docker/login-action from 3.7.0 to 4.0.0 ([#272](#272)) ([33d6238](33d6238)) * bump docker/metadata-action from 5.10.0 to 6.0.0 ([#270](#270)) ([baee04e](baee04e)) * bump docker/setup-buildx-action from 3.12.0 to 4.0.0 ([#274](#274)) ([5fc06f7](5fc06f7)) * bump sigstore/cosign-installer from 3.9.1 to 4.1.0 ([#275](#275)) ([29dd16c](29dd16c)) * harden CI/CD pipeline ([#92](#92)) ([ce4693c](ce4693c)) * split vulnerability scans into critical-fail and high-warn tiers ([#277](#277)) ([aba48af](aba48af)) ### Maintenance * add /worktree skill for parallel worktree 
management ([#171](#171)) ([951e337](951e337)) * add design spec context loading to research-link skill ([8ef9685](8ef9685)) * add post-merge-cleanup skill ([#70](#70)) ([f913705](f913705)) * add pre-pr-review skill and update CLAUDE.md ([#103](#103)) ([92e9023](92e9023)) * add research-link skill and rename skill files to SKILL.md ([#101](#101)) ([651c577](651c577)) * bump aiosqlite from 0.21.0 to 0.22.1 ([#191](#191)) ([3274a86](3274a86)) * bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group ([#96](#96)) ([0338d0c](0338d0c)) * bump ruff from 0.15.4 to 0.15.5 ([a49ee46](a49ee46)) * fix M0 audit items ([#66](#66)) ([c7724b5](c7724b5)) * **main:** release ai-company 0.1.1 ([#282](#282)) ([2f4703d](2f4703d)) * pin setup-uv action to full SHA ([#281](#281)) ([4448002](4448002)) * post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests, hookify rules ([#148](#148)) ([c57a6a9](c57a6a9)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>
Summary
- `CostTracker` service (`budget/tracker.py`) — append-only in-memory store for `CostRecord` entries with async lock-safe queries
- Budget event constants (`BUDGET_RECORD_ADDED`, `BUDGET_SUMMARY_BUILT`, `BUDGET_DEPARTMENT_RESOLVE_FAILED`)

Review fixes applied
- Log exception details in `_resolve_department` (`error`, `error_type`)
- Move log statement inside lock in `record()` for ordering consistency
- Early `start >= end` validation on all query methods
- `math.fsum` for precise float accumulation
- Single-pass `_filter_records` rewrite

Closes #7 — completes M2 (Provider Layer).
Test plan
- `uv run ruff check src/ tests/` — clean
- `uv run mypy src/ tests/` — clean
- `uv run pytest tests/unit/budget/test_tracker.py -v` — 31 passed
- `uv run pytest tests/ -n auto --cov=ai_company --cov-fail-under=80` — 1592 passed, 94.95% coverage

🤖 Generated with Claude Code