Skip to content

feat: Prometheus metrics -- daily budget %, per-agent cost, per-agent budget %#1154

Merged
Aureliolo merged 11 commits intomainfrom
feat/prometheus-metrics
Apr 8, 2026
Merged

feat: Prometheus metrics -- daily budget %, per-agent cost, per-agent budget %#1154
Aureliolo merged 11 commits intomainfrom
feat/prometheus-metrics

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Adds three new Prometheus metric families to PrometheusCollector, completing the deferred acceptance criteria from #1122:

  • synthorg_budget_daily_used_percent -- daily cost as percentage of prorated daily budget (total_monthly / days_in_month)
  • synthorg_agent_cost_total{agent_id} -- per-agent accumulated cost in USD
  • synthorg_agent_budget_used_percent{agent_id} -- per-agent daily cost as percentage of per_agent_daily_limit

Changes

Source (prometheus_collector.py)

  • Register 3 new Gauge metric families in __init__
  • _refresh_daily_budget_metric() -- computes daily budget utilization, capped at 100%
  • _refresh_agent_cost_metrics() -- queries per-agent costs using asyncio.TaskGroup for parallel fan-out
  • _refresh_agent_metrics() now returns the agents tuple for reuse downstream

Tests (test_prometheus_collector.py)

  • TestPrometheusCollectorDailyBudget -- 6 tests (cap at 100%, partial day, zero cost, zero monthly budget, no config, no tracker)
  • TestPrometheusCollectorAgentCost -- 8 tests (per-agent totals, budget %, stale label clearing, empty agents, no tracker, no limit, error recovery)
  • TestPrometheusCollectorSecurityVerdicts -- added invalid verdict rejection test

Docs (operations.md)

  • Updated metric families count from 9 to 12 with descriptions for all three new metrics

Test plan

  • uv run python -m pytest tests/ -n 8 -- 16011 passed, 9 skipped
  • uv run mypy src/ tests/ -- clean
  • uv run ruff check src/ tests/ -- clean

Review coverage

Pre-reviewed by 12 agents (docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, logging-audit, resilience-audit, conventions-enforcer, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier). 10 findings addressed.

Closes #1148

Copilot AI review requested due to automatic review settings April 8, 2026 17:44
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

Warning

Rate limit exceeded

@Aureliolo has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 9 minutes and 10 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 9 minutes and 10 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e15e584c-24ff-4c0c-9d71-bdd10630a39f

📥 Commits

Reviewing files that changed from the base of the PR and between c3b6c2a and 8ba86e7.

📒 Files selected for processing (1)
  • tests/unit/observability/test_prometheus_collector.py

Walkthrough

Documentation updated: /api/v1/metrics now lists 12 Prometheus metric families (renamed synthorg_appsynthorg_app_info; added synthorg_budget_daily_used_percent, synthorg_agent_cost_total{agent_id}, synthorg_agent_budget_used_percent{agent_id}). PrometheusCollector now snapshots UTC-midnight, billing-period start, and total costs; adds daily and per-agent cost/budget metrics; clears stale label series before repopulating; computes monthly utilization from billing-period cost; and resets gauges to 0.0 when required data or configs are missing. Unit tests expanded accordingly.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the three main metric additions (daily budget %, per-agent cost, per-agent budget %), matching the core changes in the PR.
Description check ✅ Passed The PR description provides a comprehensive summary of the changes including the three new metrics, implementation details, tests, and documentation updates, all directly related to the changeset.
Linked Issues check ✅ Passed The PR addresses all coding objectives from #1148: adds time-windowed cost queries (daily budget %), per-agent cost breakdown metrics (agent_cost_total, agent_budget_used_percent), updates CostTracker for time-windowed data, and extends PrometheusCollector with three new Gauge metrics.
Out of Scope Changes check ✅ Passed All changes align with PR objectives: prometheus_collector.py adds three metrics, tests provide comprehensive coverage, and operations.md documents the changes. No unrelated changes detected.
Docstring Coverage ✅ Passed Docstring coverage is 65.71% which is sufficient. The required threshold is 40.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 8, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 8ba86e7.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds additional Prometheus budget and per-agent cost utilization metrics to PrometheusCollector, aligning the /api/v1/metrics endpoint with deferred acceptance criteria from the Prometheus/OTLP observability work.

Changes:

  • Add synthorg_budget_daily_used_percent, synthorg_agent_cost_total{agent_id}, and synthorg_agent_budget_used_percent{agent_id} metric families and refresh logic.
  • Extend the collector refresh flow to compute daily cost since UTC midnight and fan-out per-agent cost queries via asyncio.TaskGroup.
  • Add/expand unit tests for daily budget and per-agent cost/budget metrics; update operations docs metric family count and descriptions.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/synthorg/observability/prometheus_collector.py Registers new gauges and implements refresh logic for daily budget % and per-agent cost/budget %.
tests/unit/observability/test_prometheus_collector.py Adds coverage for the new metric families and an invalid security verdict test.
docs/design/operations.md Updates /metrics endpoint documentation to reflect 12 metric families and lists the new ones.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +283 to +291
if not app_state.has_cost_tracker or daily_cost is None:
return
try:
tracker = app_state.cost_tracker
if tracker.budget_config is None:
return
monthly = tracker.budget_config.total_monthly
if monthly <= 0:
return
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_refresh_daily_budget_metric() returns early in several cases (no cost tracker, missing budget_config, monthly<=0, etc.) without setting the gauge. If this metric was previously set on an earlier scrape, these early returns will leave a stale non-zero value exposed. Consider explicitly setting synthorg_budget_daily_used_percent to 0.0 (or NaN) on all skip/early-return paths and/or on exceptions to avoid stale values.

Copilot uses AI. Check for mistakes.
Comment on lines +373 to +378
if not agents or not app_state.has_cost_tracker:
return
try:
tracker = app_state.cost_tracker
self._agent_cost_total.clear()
self._agent_budget_used_percent.clear()
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_refresh_agent_cost_metrics() returns early when agents is empty or the cost tracker is unavailable, but it only clears the per-agent gauges after passing that check. This means if agents existed on a previous scrape, and later the registry returns 0 agents (or cost tracker becomes unavailable), the old labeled series will remain and be reported as if still current. Clear _agent_cost_total/_agent_budget_used_percent before the early return (or clear whenever skipping) so stale agent label series are dropped reliably.

Copilot uses AI. Check for mistakes.
Comment on lines +333 to +336
await collector.refresh(state)
output = generate_latest(collector.registry).decode()
assert "synthorg_budget_daily_used_percent" in output
assert "100.0" in output
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test asserts that the exposition output contains "100.0" anywhere, which can match unrelated metrics and make the test brittle. It would be more reliable to extract the specific synthorg_budget_daily_used_percent sample line (as done in the partial_day test) and assert its numeric value is 100.0.

Copilot uses AI. Check for mistakes.
Comment on lines +554 to +565
async def test_agent_cost_skipped_when_agents_empty(self) -> None:
"""Empty agents tuple with both services available still skips."""
collector = PrometheusCollector()
state = _mock_app_state(
has_cost_tracker=True,
has_agent_registry=True,
agents=(),
budget_total_monthly=100.0,
)
await collector.refresh(state)
output = generate_latest(collector.registry).decode()
assert "synthorg_agent_cost_total{" not in output
Copy link

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current agent-cost tests cover stale label clearing when an agent disappears but do not cover the skip paths introduced by the early return in _refresh_agent_cost_metrics (e.g., second scrape where agents=() or cost tracker becomes unavailable). Adding a two-scrape regression test for those cases would prevent stale per-agent series from silently persisting.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request expands the observability metrics by adding daily budget utilization and per-agent cost tracking to the Prometheus collector. It includes updates to the documentation, the collector implementation, and comprehensive unit tests. I have reviewed the changes and suggest a minor simplification for the conditional assignment of the per-agent daily limit to improve code readability.

Comment on lines +380 to +382
per_agent_limit = (
budget_cfg.per_agent_daily_limit if budget_cfg is not None else 0.0
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This conditional assignment can be simplified for better readability. Since budget_cfg is either an object or None, checking for is not None is equivalent to checking its truthiness.

            per_agent_limit = budget_cfg.per_agent_daily_limit if budget_cfg else 0.0

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Codecov Report

❌ Patch coverage is 90.56604% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.88%. Comparing base (6a0db11) to head (8ba86e7).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/observability/prometheus_collector.py 90.56% 4 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1154   +/-   ##
=======================================
  Coverage   88.87%   88.88%           
=======================================
  Files         857      857           
  Lines       50408    50440   +32     
  Branches     5071     5072    +1     
=======================================
+ Hits        44801    44833   +32     
- Misses       4643     4644    +1     
+ Partials      964      963    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/operations.md`:
- Line 1304: The docs list the info metric as `synthorg_app` but the actual
exporter and tests use `synthorg_app_info`; update the `/api/v1/metrics` table
entry and any metric-family list in docs/design/operations.md to use
`synthorg_app_info` (replace the string `synthorg_app` with `synthorg_app_info`)
so it matches the exporter and tests (see reference test
tests/unit/observability/test_prometheus_collector.py asserting
`synthorg_app_info`).

In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 283-291: The function early-returns when budgets are not
applicable (checks on app_state.has_cost_tracker, daily_cost is None,
tracker.budget_config is None, and monthly <= 0) but leaves
synthorg_budget_daily_used_percent set from prior scrapes; before each return
clear or reset that metric (e.g., call the Prometheus gauge's remove()/set(0)
API) so it no longer reports stale values. Locate the code around app_state,
tracker, tracker.budget_config, daily_cost and the
synthorg_budget_daily_used_percent gauge in prometheus_collector.py and add a
single reset call immediately before each of the three return points.
- Around line 379-382: The code currently reads
tracker.budget_config.per_agent_daily_limit without checking
budget_cfg.total_monthly; update the per-agent limit computation to treat
budgeting as off when budget_cfg is None or budget_cfg.total_monthly <= 0 by
using per_agent_limit = budget_cfg.per_agent_daily_limit if (budget_cfg is not
None and getattr(budget_cfg, "total_monthly", 0) > 0) else 0.0 so
synthorg_agent_budget_used_percent is not emitted when total_monthly <= 0;
reference tracker.budget_config, budget_cfg.total_monthly,
budget_cfg.per_agent_daily_limit and the synthorg_agent_budget_used_percent
metric when making the change.
- Around line 373-378: The code returns early when `not agents` or `not
app_state.has_cost_tracker` but only clears metrics after that check, causing
stale `synthorg_agent_cost_total` and `synthorg_agent_budget_used_percent`
series to remain; move or duplicate the clears so that
`self._agent_cost_total.clear()` and `self._agent_budget_used_percent.clear()`
are executed before the early `return` (while still preserving the later logic
that sets `tracker = app_state.cost_tracker` when present).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5c3a6a89-0e39-44a1-be55-9be9ca7b8fb6

📥 Commits

Reviewing files that changed from the base of the PR and between 63a9390 and 4c4f774.

📒 Files selected for processing (3)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Deploy Preview
  • GitHub Check: Agent
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction + MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Include allow_inf_nan=False in all ConfigDict declarations.
Use @computed_field for derived values instead of storing + validating redundant fields
Use NotBlankStr (from core.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable name logger (not _logger, not log) for logger instances

Files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import from synthorg.observability import get_logger and set logger = get_logger(__name__)
Never use import logging or logging.getLogger() or print() in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules under synthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: always logger.info(EVENT, key=value) -- never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig

Files:

  • src/synthorg/observability/prometheus_collector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, test-provider, test-small-001, etc.

Files:

  • src/synthorg/observability/prometheus_collector.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Update the relevant docs/design/ page to reflect approved deviations from the spec

Files:

  • docs/design/operations.md
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests -- no manual @pytest.mark.asyncio needed
Global test timeout is 30 seconds per test in pyproject.toml. Non-default overrides like timeout(60) are allowed, but do not add per-file pytest.mark.timeout(30) markers.
Prefer @pytest.mark.parametrize for testing similar cases
Use Hypothesis for property-based testing with @given + @settings decorators
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocks, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
src/synthorg/observability/prometheus_collector.py (1)

328-354: ⚠️ Potential issue | 🟠 Major

Clear synthorg_active_agents_total before the empty-return paths.

Line 339 only runs after list_active() succeeds. If Line 328 returns early or the exception path at Line 348 is hit, the previous (status, trust_level) series stay exported indefinitely even though the registry is unavailable.

🧹 Proposed fix
     async def _refresh_agent_metrics(
         self,
         app_state: AppState,
     ) -> tuple[Any, ...]:
@@
-        if not app_state.has_agent_registry:
-            return ()
+        self._agents_total.clear()
+        if not app_state.has_agent_registry:
+            return ()
         try:
             agents = await app_state.agent_registry.list_active()
             counts: Counter[tuple[str, str]] = Counter()
             for agent in agents:
                 status = str(agent.status)
                 trust = str(agent.tools.access_level)
                 counts[(status, trust)] += 1
-            # Clear stale label series so disappeared combinations
-            # drop to zero instead of retaining previous values.
-            self._agents_total.clear()
             for (status, trust), count in counts.items():
                 self._agents_total.labels(
                     status=status,
                     trust_level=trust,
                 ).set(count)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/observability/prometheus_collector.py` around lines 328 - 354,
If app_state.has_agent_registry is false or an exception occurs we must clear
the synthorg_active_agents_total metric so stale label series are not exported;
move or duplicate the call to self._agents_total.clear() so it runs before the
early return in the has_agent_registry False branch and in the exception path
inside collect_active_agents (i.e., around the block that calls
app_state.agent_registry.list_active and handles exceptions), ensuring that
self._agents_total.clear() is invoked whenever list_active won't populate
current labels.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/operations.md`:
- Line 1852: The "Recommended Framing" footer in docs/design/operations.md is
inconsistent with the gap table: G1 is closed and the section notes G1–G5
closed, but the paragraph still says G3 and G4 remain open; update that
paragraph to match the table by removing the mention of G3/G4 (or explicitly
state that G1–G5 are closed/no remaining gaps). Edit the paragraph labeled
"Recommended Framing" to reference the gap IDs (G1, G3, G4, G5 or G1–G5) so the
section shows a single current status consistent with the table.

In `@src/synthorg/observability/prometheus_collector.py`:
- Line 224: The daily/monthly budget refresh recomputes the current UTC time
twice causing inconsistent day-count vs numerator timestamp; modify
_refresh_daily_budget_metric (and the analogous monthly logic) to compute a
single timestamp variable (e.g., utc_midnight or now_utc) once at the start of
the scrape handler and pass that same timestamp into both the numerator
calculation and the month-length/divisor logic instead of calling
datetime.now(UTC) again so the numerator and divisor use the identical UTC
moment.

---

Duplicate comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 328-354: If app_state.has_agent_registry is false or an exception
occurs we must clear the synthorg_active_agents_total metric so stale label
series are not exported; move or duplicate the call to
self._agents_total.clear() so it runs before the early return in the
has_agent_registry False branch and in the exception path inside
collect_active_agents (i.e., around the block that calls
app_state.agent_registry.list_active and handles exceptions), ensuring that
self._agents_total.clear() is invoked whenever list_active won't populate
current labels.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d597882e-4bca-45a9-8c16-1c5d71432258

📥 Commits

Reviewing files that changed from the base of the PR and between 4c4f774 and 9e50231.

📒 Files selected for processing (3)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Update the relevant docs/design/ page to reflect approved deviations from the spec

Files:

  • docs/design/operations.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction + MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Include allow_inf_nan=False in all ConfigDict declarations.
Use @computed_field for derived values instead of storing + validating redundant fields
Use NotBlankStr (from core.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable name logger (not _logger, not log) for logger instances

Files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import from synthorg.observability import get_logger and set logger = get_logger(__name__)
Never use import logging or logging.getLogger() or print() in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules under synthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: always logger.info(EVENT, key=value) -- never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig

Files:

  • src/synthorg/observability/prometheus_collector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, test-provider, test-small-001, etc.

Files:

  • src/synthorg/observability/prometheus_collector.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = 'auto' for async tests -- no manual @pytest.mark.asyncio needed
Global test timeout is 30 seconds per test in pyproject.toml. Non-default overrides like timeout(60) are allowed, but do not add per-file pytest.mark.timeout(30) markers.
Prefer @pytest.mark.parametrize for testing similar cases
Use Hypothesis for property-based testing with @given + @settings decorators
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep(). For indefinite blocks, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-21T12:54:22.557Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-21T12:54:22.557Z
Learning: Version bumping (pre-1.0): `fix:` = patch, `feat:` = patch, `feat!:` = minor, `BREAKING CHANGE` trailer = minor. Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/observability/prometheus_collector.py (2)

199-224: ⚠️ Potential issue | 🟠 Major

Don’t reuse lifetime total_cost for the monthly budget gauge.

_cost_total can legitimately use get_total_cost() across all retained records, but synthorg_budget_used_percent cannot. src/synthorg/budget/tracker.py:177-200 documents that an unbounded get_total_cost() sums across all records, so feeding that value into _refresh_budget_metrics() turns a “monthly budget” metric into “retained-history / monthly budget” and it can stay pinned at 100% after prior billing periods’ spend. Pass a billing-period total into _refresh_budget_metrics() instead of the lifetime total.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/observability/prometheus_collector.py` around lines 199 - 224,
The code currently passes the lifetime total_cost into _refresh_budget_metrics,
which makes the monthly budget metric use retained-history instead of the
current billing period; compute the billing-period start (e.g., month_start =
utc_midnight.replace(day=1)) and call
app_state.cost_tracker.get_total_cost(start=month_start) to obtain a
billing_period_total, then pass billing_period_total to
_refresh_budget_metrics(app_state, ...) while still using total_cost for
_refresh_cost_gauge and daily_cost for _refresh_daily_budget_metric; keep the
get_total_cost calls grouped in the same try/except and reference the variables
total_cost, daily_cost, utc_midnight, billing_period_total,
_refresh_budget_metrics, and app_state.cost_tracker.get_total_cost() when making
the change.

245-260: ⚠️ Potential issue | 🟠 Major

Reset synthorg_budget_used_percent when it is not computable.

This method now clears the budget gauges for missing tracker/config, but monthly <= 0 and total_cost is None still fall through without touching _budget_used_percent. After one successful scrape, that leaves the previous percentage exported even though the budget is disabled or the total query failed.

🧹 Proposed fix
             monthly = tracker.budget_config.total_monthly
             self._budget_monthly_usd.set(monthly)
-            if monthly > 0 and total_cost is not None:
-                self._budget_used_percent.set(
-                    min(100.0, (total_cost / monthly) * 100.0),
-                )
+            if monthly <= 0 or total_cost is None:
+                self._budget_used_percent.set(0.0)
+                return
+            self._budget_used_percent.set(
+                min(100.0, (total_cost / monthly) * 100.0),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/observability/prometheus_collector.py` around lines 245 - 260,
The budget percent gauge _budget_used_percent is not cleared when monthly <= 0
or total_cost is None, leaving stale values; inside the same method where you
read app_state.has_cost_tracker and tracker = app_state.cost_tracker (in
prometheus_collector.py), ensure you explicitly set
self._budget_used_percent.set(0.0) whenever tracker.budget_config is missing or
monthly <= 0 or total_cost is None (after setting _budget_monthly_usd),
otherwise compute the percent as before using min(100.0, (total_cost / monthly)
* 100.0).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 199-224: The code currently passes the lifetime total_cost into
_refresh_budget_metrics, which makes the monthly budget metric use
retained-history instead of the current billing period; compute the
billing-period start (e.g., month_start = utc_midnight.replace(day=1)) and call
app_state.cost_tracker.get_total_cost(start=month_start) to obtain a
billing_period_total, then pass billing_period_total to
_refresh_budget_metrics(app_state, ...) while still using total_cost for
_refresh_cost_gauge and daily_cost for _refresh_daily_budget_metric; keep the
get_total_cost calls grouped in the same try/except and reference the variables
total_cost, daily_cost, utc_midnight, billing_period_total,
_refresh_budget_metrics, and app_state.cost_tracker.get_total_cost() when making
the change.
- Around line 245-260: The budget percent gauge _budget_used_percent is not
cleared when monthly <= 0 or total_cost is None, leaving stale values; inside
the same method where you read app_state.has_cost_tracker and tracker =
app_state.cost_tracker (in prometheus_collector.py), ensure you explicitly set
self._budget_used_percent.set(0.0) whenever tracker.budget_config is missing or
monthly <= 0 or total_cost is None (after setting _budget_monthly_usd),
otherwise compute the percent as before using min(100.0, (total_cost / monthly)
* 100.0).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8649ec36-6f54-4435-87de-433fc53017af

📥 Commits

Reviewing files that changed from the base of the PR and between 9e50231 and 3bab1df.

📒 Files selected for processing (2)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
docs/design/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Update the relevant docs/design/ page to reflect approved deviations from the spec

Files:

  • docs/design/operations.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction + MappingProxyType wrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Include allow_inf_nan=False in all ConfigDict declarations.
Use @computed_field for derived values instead of storing + validating redundant fields
Use NotBlankStr (from core.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable name logger (not _logger, not log) for logger instances

Files:

  • src/synthorg/observability/prometheus_collector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST import from synthorg.observability import get_logger and set logger = get_logger(__name__)
Never use import logging or logging.getLogger() or print() in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules under synthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: always logger.info(EVENT, key=value) -- never logger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig

Files:

  • src/synthorg/observability/prometheus_collector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, test-provider, test-small-001, etc.

Files:

  • src/synthorg/observability/prometheus_collector.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
🔇 Additional comments (1)
src/synthorg/observability/prometheus_collector.py (1)

391-406: ⚠️ Potential issue | 🟠 Major

Avoid per-agent full-scan fan-out on every scrape.

asyncio.TaskGroup helps the latency profile, but it does not reduce the underlying work here. src/synthorg/budget/tracker.py:202-229 shows that each get_agent_cost() call snapshots and filters the record set, so launching one or two of those per active agent makes scrape cost scale with both agent count and retained history size. Prefer a single grouped aggregation / single snapshot from the tracker for all agent labels.

⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T19:05:17.823Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

coderabbitai[bot]
coderabbitai bot previously approved these changes Apr 8, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 209-218: The metrics collector currently computes billing_cost
from the calendar month start; change it to use the configured billing period
start: import billing_period_start from synthorg.budget.billing, compute
billing_period_start(app_state.cost_tracker.budget_config.reset_day) and pass
that start into app_state.cost_tracker.get_total_cost(...) when setting
billing_cost (the variables/functions to update are billing_cost and month_start
logic in prometheus_collector.py and use
app_state.cost_tracker.budget_config.reset_day to obtain reset_day).

In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 381-399: Update the inline comment in the
test_daily_budget_percent_computed test to remove the hardcoded "30 days"
assumption and instead state that the provided daily_cost (50.0) exceeds the
prorated daily budget for any month length, so the metric should cap at 100%;
edit the comment near the test function and the variables total_cost,
daily_cost, and budget_total_monthly to reflect this behavior rather than
specific 30-day math.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: cde75d34-93fb-453c-a115-2b9280701676

📥 Commits

Reviewing files that changed from the base of the PR and between 3bab1df and 739a9b6.

📒 Files selected for processing (3)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do NOT use from __future__ import annotations in Python code—Python 3.14 has PEP 649 native lazy annotations
Use except A, B: (no parentheses) for PEP 758 except syntax in Python 3.14; ruff enforces this
All public functions and classes in Python must have type hints; mypy strict mode is enforced
Use Google-style docstrings for all public classes and functions in Python; ruff D rules enforce this
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict); adopt convention: use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values in Pydantic models instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields in Pydantic models—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently ...

Files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow markers for Python tests
Python tests must achieve 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases in Python tests
NEVER skip, dismiss, or ignore flaky Python tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() instead of widening timing margins. Use asyncio.Event().wait() for indefinite blocking instead of asyncio.sleep(large_number)

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every Python module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in Python application code; exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, and observability/otlp_handler.py may use stdlib logging and print(..., file=sys.stderr) for handler construction, bootstrap, and error reporting code
Always use variable name logger (not _logger, not log) in Python code
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api); import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in Python logging: logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All error paths in Python code must log at WARNING or ERROR with context before raising
All state transitions in Python code must log at INFO level
Use DEBUG level for object creation, internal flow, entry/exit of key functions in Python logging
All provider calls in Python go through BaseCompletionProvider which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
Set RetryConfig and RateLimiterConfig per-provider in ProviderConfig in Python
Retryable errors in Python providers: RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError (with is_retryable=True); non-retryable errors raise immediately
RetryExhaustedError in Python signals all retries failed—engine layer catches this to trigger fallback chains
Rate limiter in Python providers respects RateLimitError.retry_after from providers—automatically pauses future requests
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings...

Files:

  • src/synthorg/observability/prometheus_collector.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
🔇 Additional comments (12)
src/synthorg/observability/prometheus_collector.py (6)

96-114: LGTM! New gauge registrations are well-structured.

The three new metric families are correctly registered with appropriate descriptions and labels. The agent_id label on per-agent gauges aligns with the PR objectives for per-agent cost breakdowns.


244-282: LGTM! Budget metrics properly handle all edge cases.

The method correctly:

  • Resets gauges to 0.0 before early returns (addressing prior review feedback)
  • Uses billing-period cost for utilization calculation
  • Caps percentage at 100%
  • Handles missing config/tracker gracefully

284-328: LGTM! Daily budget metric implementation is correct.

The method properly:

  • Uses the passed utc_midnight for consistent month/day calculations (addressing prior TOCTOU feedback)
  • Resets gauge to 0.0 for all early-return paths
  • Guards against division by zero (monthly <= 0 check precedes the calculation)
  • Caps at 100%

330-372: LGTM! Agent metrics refactored to return active agents for reuse.

The method correctly:

  • Clears label series at the start (addressing prior feedback about stale labels)
  • Returns empty tuple on early exit or error
  • Returns the agents tuple for downstream use by _refresh_agent_cost_metrics

374-441: LGTM! Per-agent cost metrics implementation is well-structured.

The method correctly:

  • Clears gauge label series before any early return (prior feedback addressed)
  • Honors total_monthly <= 0 when computing per_agent_limit (prior feedback addressed)
  • Uses asyncio.TaskGroup for parallel fan-out (compliant with coding guidelines)
  • Caps budget utilization at 100%
  • Isolates exceptions without crashing the scrape

443-445: LGTM! Task metrics clearing moved to method start.

Consistent with the pattern applied to agent metrics—clears label series before any early return to prevent stale data.

docs/design/operations.md (2)

1304-1304: LGTM! Metrics documentation updated accurately.

The /api/v1/metrics documentation correctly:

  • Updates count from 9 to 12 metric families
  • Uses synthorg_app_info (prior feedback addressed)
  • Documents all three new metrics with their labels and purposes

1852-1867: LGTM! Control-plane gap tracking updated consistently.

The gap table and framing paragraph are now consistent—G1-G5 marked as closed, with G6 (budget history granularity) noted as the sole remaining low-priority gap. Prior feedback about inconsistent framing addressed.

tests/unit/observability/test_prometheus_collector.py (4)

49-59: Mock routing logic for get_total_cost is correct but could be more explicit.

The mock differentiates between:

  • No start → returns _total (lifetime)
  • start.day == 1 → returns _billing (month-to-date)
  • Otherwise → returns _daily (daily)

This matches the collector's query patterns. The comment at line 56-57 clarifies the billing-period heuristic.


513-711: LGTM! Comprehensive test coverage for per-agent metrics.

The test class covers:

  • Per-agent cost and budget percentage calculations
  • Stale label clearing when agents disappear
  • Early return paths (no agents, no tracker, empty tuple)
  • per_agent_daily_limit=0 disabling budget percentage
  • Exception handling clearing gauges

212-260: LGTM! Billing period cost tests validate key behavior changes.

test_budget_percent_uses_billing_period_cost correctly verifies that month-to-date cost (not lifetime) is used for utilization. test_budget_percent_reset_when_cost_unavailable validates resilience when the tracker fails.


327-330: LGTM! Invalid verdict rejection test added.

Validates that record_security_verdict raises ValueError for unknown verdicts, preventing label cardinality explosion from arbitrary strings.

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Pre-reviewed by 12 agents, 10 findings addressed:
- Refactor _refresh_agent_cost_metrics to use asyncio.TaskGroup for
  parallel fan-out of per-agent cost queries
- Improve docstrings for _refresh_daily_budget_metric,
  _refresh_agent_metrics, _refresh_agent_cost_metrics
- Add 5 new tests: zero daily cost, zero monthly budget, empty agents
  tuple, agent cost exception recovery, invalid security verdict
- Update operations.md: 9 to 12 metric families with descriptions
@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/observability/prometheus_collector.py (1)

287-292: ⚠️ Potential issue | 🟠 Major

Reset the budget gauges when refresh fails unexpectedly.

Both except Exception: paths only log. After any prior successful scrape, an exception here leaves Prometheus serving the previous budget values, which is worse than exporting 0.0/unavailable. Initialize these gauges to 0.0 before the guarded work or reset them in the handlers.

🧹 Proposed fix
     def _refresh_budget_metrics(
         self,
         app_state: AppState,
         billing_cost: float | None,
     ) -> None:
+        self._budget_used_percent.set(0.0)
+        self._budget_monthly_usd.set(0.0)
         if not app_state.has_cost_tracker:
-            self._budget_used_percent.set(0.0)
-            self._budget_monthly_usd.set(0.0)
             return
         try:
             tracker = app_state.cost_tracker
@@
     def _refresh_daily_budget_metric(
         self,
         app_state: AppState,
         daily_cost: float | None,
         utc_midnight: datetime,
     ) -> None:
+        self._budget_daily_used_percent.set(0.0)
         if not app_state.has_cost_tracker or daily_cost is None:
-            self._budget_daily_used_percent.set(0.0)
             return

Also applies to: 333-338

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/observability/prometheus_collector.py` around lines 287 - 292,
The budget scrape except blocks only log errors and leave previous values alive;
modify the guarded budget refresh code so that immediately before the try/except
you initialize the Prometheus budget gauges to 0.0 (or, alternatively, in each
except Exception: handler explicitly set the budget-related Gauge objects to
0.0) and then proceed with the normal scrape; update the except block shown (the
logger.warning(METRICS_SCRAPE_FAILED, component="budget", exc_info=True)
handler) and the other budget except handler to reset all budget gauges to 0.0
before re-raising/returning, while keeping the existing logger.warning call with
exc_info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/operations.md`:
- Line 1852: In the G1 table row update the endpoint string to the concrete API
path used elsewhere by replacing `/metrics` with `/api/v1/metrics` so the entry
matches the API Surface documentation; specifically edit the G1 row text that
references PrometheusCollector and `/metrics` (the description mentioning
`PrometheusCollector` instantiated in `on_startup()` and the OTLP sink note) to
use `/api/v1/metrics` to remove the ambiguity.

In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 322-329: The calculation for daily budget uses
monthrange(utc_midnight.year, utc_midnight.month) which ignores
BudgetConfig.reset_day and can span two months; update the denominator to
compute the current billing period length by using billing_period_start(tracker,
utc_midnight) (or billing_period_start(...) already used elsewhere) and the next
reset boundary (billing period start + period length determined from reset_day)
to derive days_in_billing_period, then compute daily_budget =
tracker.budget_config.total_monthly / days_in_billing_period and call
self._budget_daily_used_percent.set(min(100.0, (daily_cost / daily_budget) *
100.0)); also add a regression test targeting the billing period logic with
reset_day != 1 to verify correct prorating across month boundaries.

---

Outside diff comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 287-292: The budget scrape except blocks only log errors and leave
previous values alive; modify the guarded budget refresh code so that
immediately before the try/except you initialize the Prometheus budget gauges to
0.0 (or, alternatively, in each except Exception: handler explicitly set the
budget-related Gauge objects to 0.0) and then proceed with the normal scrape;
update the except block shown (the logger.warning(METRICS_SCRAPE_FAILED,
component="budget", exc_info=True) handler) and the other budget except handler
to reset all budget gauges to 0.0 before re-raising/returning, while keeping the
existing logger.warning call with exc_info.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 96ce9f43-e5f8-4165-8f8d-70de9644e1ab

📥 Commits

Reviewing files that changed from the base of the PR and between b367c4b and 906dc4d.

📒 Files selected for processing (3)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always use from synthorg.observability import get_logger followed by logger = get_logger(__name__) in every module with business logic
Never use import logging, logging.getLogger(), or print() in application code (exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, and observability/otlp_handler.py may use stdlib logging for handler construction and bootstrap)
Use logger as the variable name for loggers, never _logger or log
Always use event constants from synthorg.observability.events domain modules (e.g., API_REQUEST_STARTED from events.api) instead of string literals for log event names
Use structured logging with kwargs in the format logger.info(EVENT, key=value) instead of string formatting like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT use from __future__ import annotations -- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax: except A, B: without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement
Use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state that evolves (e.g., agent execution state, task progress)...

Files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
When approved deviations occur, update the relevant docs/design/ page to reflect the new reality

Files:

  • src/synthorg/observability/prometheus_collector.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow markers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use asyncio_mode = "auto" configuration; do NOT add manual @pytest.mark.asyncio decorators
Global test timeout is 30 seconds per test (set in pyproject.toml); do NOT add per-file pytest.mark.timeout(30) markers (non-default overrides like timeout(60) are allowed)
ALWAYS include -n 8 when running pytest locally for parallelism; never run tests sequentially
Prefer @pytest.mark.parametrize for testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names like example-provider, example-large-001, test-provider, test-small-001, or large/medium/small aliases
Use Hypothesis for property-based testing with @given and @settings decorators; configure via HYPOTHESIS_PROFILE env var with profiles: ci (deterministic, 10 examples, derandomized), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit @example(...) decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For indefinite blocking tasks, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-04-08T20:20:05.318Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T20:20:05.318Z
Learning: Applies to src/synthorg/**/*.py : If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 435-448: The current scrape fans out a get_agent_cost() task per
agent (and another per-agent when per_agent_limit > 0) using asyncio.TaskGroup
by iterating agent_ids and populating total_tasks/daily_tasks, which can create
an O(active_agents) burst; fix by first preferring a bulk tracker call (e.g., a
tracker API that returns costs for multiple agent_ids in one request) and use
that instead of per-agent get_agent_cost() when available, otherwise limit
concurrency: replace the unbounded per-agent task creation with a bounded
approach (e.g., use an asyncio.Semaphore/BoundedSemaphore or a limited worker
TaskGroup) so only N concurrent get_agent_cost() calls run at once when calling
tracker.get_agent_cost(aid, ...) for each aid, keeping references like
total_tasks, daily_tasks, get_agent_cost, tracker, agent_ids, per_agent_limit,
and utc_midnight to locate the change.

In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 405-426: The test test_daily_budget_percent_partial_day is too
loose—freeze the collector date or directly call
PrometheusCollector._refresh_daily_budget_metric with a fixed utc_midnight so
proration is deterministic; update the test to set a concrete reset_day != 1 and
assert the exact expected percentage (non-capped) instead of only checking 0 <
value < 100, using the PrometheusCollector instance (collector) and
generate_latest(collector.registry) to verify the metric output; mirror this
change for the other failing block around lines 516-537 to ensure the proration
logic is pinned.
- Around line 611-617: The test currently uses cost_lines = [...] and assert
all('agent_id="alice"' in ln for ln in cost_lines), which is vacuously true for
an empty cost_lines; change the assertions to ensure cost_lines actually
contains the surviving series and that it includes alice: first assert
cost_lines is not empty (or assert any(...) is True), then assert that at least
one line in cost_lines contains 'agent_id="alice"' (and optionally assert no
line contains 'agent_id="bob"'); reference the cost_lines variable, output_v2,
and the metric name synthorg_agent_cost_total when locating the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 58eb2966-ed95-4390-a4f1-42e522bae0a2

📥 Commits

Reviewing files that changed from the base of the PR and between 906dc4d and 96ceaed.

📒 Files selected for processing (3)
  • docs/design/operations.md
  • src/synthorg/observability/prometheus_collector.py
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always use from synthorg.observability import get_logger followed by logger = get_logger(__name__) in every module with business logic
Never use import logging, logging.getLogger(), or print() in application code (exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, and observability/otlp_handler.py may use stdlib logging for handler construction and bootstrap)
Use logger as the variable name for loggers, never _logger or log
Always use event constants from synthorg.observability.events domain modules (e.g., API_REQUEST_STARTED from events.api) instead of string literals for log event names
Use structured logging with kwargs in the format logger.info(EVENT, key=value) instead of string formatting like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT use from __future__ import annotations -- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax: except A, B: without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement
Use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state that evolves (e.g., agent execution state, task progress)...

Files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow markers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use asyncio_mode = "auto" configuration; do NOT add manual @pytest.mark.asyncio decorators
Global test timeout is 30 seconds per test (set in pyproject.toml); do NOT add per-file pytest.mark.timeout(30) markers (non-default overrides like timeout(60) are allowed)
ALWAYS include -n 8 when running pytest locally for parallelism; never run tests sequentially
Prefer @pytest.mark.parametrize for testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names like example-provider, example-large-001, test-provider, test-small-001, or large/medium/small aliases
Use Hypothesis for property-based testing with @given and @settings decorators; configure via HYPOTHESIS_PROFILE env var with profiles: ci (deterministic, 10 examples, derandomized), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit @example(...) decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For indefinite blocking tasks, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Always read the relevant docs/design/ page before implementing any feature or planning any issue
If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
When approved deviations occur, update the relevant docs/design/ page to reflect the new reality

Files:

  • src/synthorg/observability/prometheus_collector.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
📚 Learning: 2026-04-08T20:20:05.318Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T20:20:05.318Z
Learning: Applies to src/synthorg/**/*.py : If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).

Applied to files:

  • docs/design/operations.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
  • src/synthorg/observability/prometheus_collector.py

Comment on lines +435 to +448
total_tasks: dict[str, asyncio.Task[float]] = {}
daily_tasks: dict[str, asyncio.Task[float]] = {}
async with asyncio.TaskGroup() as tg:
for aid in agent_ids:
total_tasks[aid] = tg.create_task(
tracker.get_agent_cost(aid),
)
if per_agent_limit > 0:
daily_tasks[aid] = tg.create_task(
tracker.get_agent_cost(
aid,
start=utc_midnight,
),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Avoid turning each scrape into an O(active_agents) cost-query burst.

This fans out one get_agent_cost() call per active agent, and a second per agent when per_agent_limit > 0. On the Prometheus scrape path that makes /metrics scale linearly with fleet size and can create a large concurrent burst against the tracker backend. If the tracker/repository can return per-agent totals in one pass, prefer that here; otherwise cap the fan-out.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/observability/prometheus_collector.py` around lines 435 - 448,
The current scrape fans out a get_agent_cost() task per agent (and another
per-agent when per_agent_limit > 0) using asyncio.TaskGroup by iterating
agent_ids and populating total_tasks/daily_tasks, which can create an
O(active_agents) burst; fix by first preferring a bulk tracker call (e.g., a
tracker API that returns costs for multiple agent_ids in one request) and use
that instead of per-agent get_agent_cost() when available, otherwise limit
concurrency: replace the unbounded per-agent task creation with a bounded
approach (e.g., use an asyncio.Semaphore/BoundedSemaphore or a limited worker
TaskGroup) so only N concurrent get_agent_cost() calls run at once when calling
tracker.get_agent_cost(aid, ...) for each aid, keeping references like
total_tasks, daily_tasks, get_agent_cost, tracker, agent_ids, per_agent_limit,
and utc_midnight to locate the change.

@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 8, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 405-440: The test test_daily_budget_percent_partial_day fails on
the 1st of a month because the helper _mock_app_state sometimes returns the
mocked _billing value instead of the intended daily cost; update _mock_app_state
so that when a daily_cost (or _daily) is provided it always returns that value
(or add a boolean flag like force_daily) rather than falling back to _billing on
month-start, then adjust the test to pass the explicit daily_cost (or
force_daily=True) so PrometheusCollector.refresh and the expected calculation
use the consistent daily cost.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3a96f71d-e5d8-4efd-b44a-e38fd942cb04

📥 Commits

Reviewing files that changed from the base of the PR and between 96ceaed and c3b6c2a.

📒 Files selected for processing (1)
  • tests/unit/observability/test_prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Always use from synthorg.observability import get_logger followed by logger = get_logger(__name__) in every module with business logic
Never use import logging, logging.getLogger(), or print() in application code (exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, and observability/otlp_handler.py may use stdlib logging for handler construction and bootstrap)
Use logger as the variable name for loggers, never _logger or log
Always use event constants from synthorg.observability.events domain modules (e.g., API_REQUEST_STARTED from events.api) instead of string literals for log event names
Use structured logging with kwargs in the format logger.info(EVENT, key=value) instead of string formatting like logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT use from __future__ import annotations -- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax: except A, B: without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, use copy.deepcopy() at construction plus MappingProxyType wrapping for read-only enforcement
Use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with model_copy(update=...) for runtime state that evolves (e.g., agent execution state, task progress)...

Files:

  • tests/unit/observability/test_prometheus_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow markers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Use asyncio_mode = "auto" configuration; do NOT add manual @pytest.mark.asyncio decorators
Global test timeout is 30 seconds per test (set in pyproject.toml); do NOT add per-file pytest.mark.timeout(30) markers (non-default overrides like timeout(60) are allowed)
ALWAYS include -n 8 when running pytest locally for parallelism; never run tests sequentially
Prefer @pytest.mark.parametrize for testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names like example-provider, example-large-001, test-provider, test-small-001, or large/medium/small aliases
Use Hypothesis for property-based testing with @given and @settings decorators; configure via HYPOTHESIS_PROFILE env var with profiles: ci (deterministic, 10 examples, derandomized), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit @example(...) decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For indefinite blocking tasks, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/observability/test_prometheus_collector.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • tests/unit/observability/test_prometheus_collector.py
🔇 Additional comments (7)
tests/unit/observability/test_prometheus_collector.py (7)

53-62: Date-dependent mock logic can cause flaky tests on the reset day.

The mock distinguishes daily vs. billing-period queries solely by start.day == _reset_day. When a test runs on the reset day (e.g., the 1st of the month for default reset_day=1), both queries match the condition:

  • Daily cost query: start=utc_midnightstart.day == 1
  • Billing period query: start=billing_period_startstart.day == 1

Tests expecting daily_cost (e.g., test_daily_budget_percent_partial_day with daily_cost=3.0) will instead receive _billing=50.0 on the 1st, causing intermittent failures.

Consider disambiguating by comparing the full timestamp or by ensuring tests use a reset_day that differs from today:

🛠️ Potential fix: compare hour to distinguish daily vs billing queries
         async def _get_total_cost(
             *,
             start: datetime | None = None,
             end: datetime | None = None,
         ) -> float:
             if start is None:
                 return _total
-            if start.day == _reset_day:
+            # Billing-period queries use billing_period_start (always 00:00:00).
+            # Daily queries use utc_midnight (also 00:00:00 but for today).
+            # Distinguish by checking if start is exactly at day boundary
+            # AND the month matches billing period logic. Simplest fix:
+            # billing period start is never "today", so check if start < today.
+            from datetime import UTC
+            today = datetime.now(UTC).replace(hour=0, minute=0, second=0, microsecond=0)
+            if start.day == _reset_day and start < today:
                 return _billing
             return _daily

Alternatively, set billing_cost=daily_cost in tests that don't care about the billing/daily distinction.
[raise_minor_issue, request_verification]

#!/bin/bash
# Check if any CI runs have failed on the 1st of the month for this test file
rg -n "test_daily_budget_percent_partial_day|test_daily_budget_zero_cost" tests/unit/observability/test_prometheus_collector.py

106-117: LGTM!

The name parameter addition is backward-compatible and enables explicit agent ID specification for per-agent metric tests.


216-234: LGTM!

Good coverage for billing-period vs. lifetime cost distinction. The assertion on exact value (25.0) confirms the correct cost source is used.


236-263: LGTM!

Good error-recovery test: verifies the gauge resets to 0.0 when the cost tracker becomes unavailable, preventing stale metrics from persisting.


331-334: LGTM!

Validates that invalid security verdicts are rejected with a clear error message.


625-632: LGTM!

The stale label assertion is now robust:

  • len(cost_lines) > 0 ensures the check is non-vacuous
  • all(...) verifies alice survives
  • not any(...) confirms bob is removed

554-754: Comprehensive per-agent metrics test coverage.

The test class covers all key scenarios:

  • Per-agent cost totals and budget percentages
  • Stale label clearing between scrapes
  • Skip conditions (no agents, no tracker, no limit, empty tuple)
  • Error recovery (gauges cleared on failure)

Well-structured and thorough.

Comment on lines +405 to +440
async def test_daily_budget_percent_partial_day(self) -> None:
"""Normal daily utilization produces correct percentage."""
from datetime import UTC

from synthorg.budget.billing import billing_period_start

collector = PrometheusCollector()
state = _mock_app_state(
has_cost_tracker=True,
total_cost=50.0,
daily_cost=3.0,
budget_total_monthly=300.0,
)
await collector.refresh(state)
output = generate_latest(collector.registry).decode()
lines = [
ln
for ln in output.splitlines()
if ln.startswith("synthorg_budget_daily_used_percent ")
]
assert len(lines) == 1
value = float(lines[0].split()[-1])
# Compute expected from current billing period length.
now = datetime.now(UTC)
ps = billing_period_start(1, now=now)
ns = (
ps.replace(
year=ps.year + 1,
month=1,
)
if ps.month == 12
else ps.replace(month=ps.month + 1)
)
days = (ns - ps).days
expected = (3.0 / (300.0 / days)) * 100.0
assert value == pytest.approx(expected, abs=0.01)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Dynamic expected value calculation is well-designed but inherits the mock date-sensitivity.

The test correctly mirrors the production proration logic by computing expected from the current billing period length. This makes the assertion deterministic within a single test run.

Note: On the 1st of the month, the mock at lines 53-62 will return _billing=50.0 instead of _daily=3.0, causing the assertion to fail (see earlier comment on mock design).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/observability/test_prometheus_collector.py` around lines 405 -
440, The test test_daily_budget_percent_partial_day fails on the 1st of a month
because the helper _mock_app_state sometimes returns the mocked _billing value
instead of the intended daily cost; update _mock_app_state so that when a
daily_cost (or _daily) is provided it always returns that value (or add a
boolean flag like force_daily) rather than falling back to _billing on
month-start, then adjust the test to pass the explicit daily_cost (or
force_daily=True) so PrometheusCollector.refresh and the expected calculation
use the consistent daily cost.

@Aureliolo Aureliolo merged commit 581c494 into main Apr 8, 2026
27 of 28 checks passed
@Aureliolo Aureliolo deleted the feat/prometheus-metrics branch April 8, 2026 22:03
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 8, 2026 22:04 — with GitHub Actions Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prometheus metrics: daily budget %, per-agent budget, per-agent task counts

2 participants