feat: Prometheus metrics -- daily budget %, per-agent cost, per-agent budget %#1154
feat: Prometheus metrics -- daily budget %, per-agent cost, per-agent budget %#1154
Conversation
|
Warning Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 9 minutes and 10 seconds. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughDocumentation updated: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
There was a problem hiding this comment.
Pull request overview
Adds additional Prometheus budget and per-agent cost utilization metrics to PrometheusCollector, aligning the /api/v1/metrics endpoint with deferred acceptance criteria from the Prometheus/OTLP observability work.
Changes:
- Add
synthorg_budget_daily_used_percent,synthorg_agent_cost_total{agent_id}, andsynthorg_agent_budget_used_percent{agent_id}metric families and refresh logic. - Extend the collector refresh flow to compute daily cost since UTC midnight and fan-out per-agent cost queries via
asyncio.TaskGroup. - Add/expand unit tests for daily budget and per-agent cost/budget metrics; update operations docs metric family count and descriptions.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/synthorg/observability/prometheus_collector.py |
Registers new gauges and implements refresh logic for daily budget % and per-agent cost/budget %. |
tests/unit/observability/test_prometheus_collector.py |
Adds coverage for the new metric families and an invalid security verdict test. |
docs/design/operations.md |
Updates /metrics endpoint documentation to reflect 12 metric families and lists the new ones. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if not app_state.has_cost_tracker or daily_cost is None: | ||
| return | ||
| try: | ||
| tracker = app_state.cost_tracker | ||
| if tracker.budget_config is None: | ||
| return | ||
| monthly = tracker.budget_config.total_monthly | ||
| if monthly <= 0: | ||
| return |
There was a problem hiding this comment.
_refresh_daily_budget_metric() returns early in several cases (no cost tracker, missing budget_config, monthly<=0, etc.) without setting the gauge. If this metric was previously set on an earlier scrape, these early returns will leave a stale non-zero value exposed. Consider explicitly setting synthorg_budget_daily_used_percent to 0.0 (or NaN) on all skip/early-return paths and/or on exceptions to avoid stale values.
| if not agents or not app_state.has_cost_tracker: | ||
| return | ||
| try: | ||
| tracker = app_state.cost_tracker | ||
| self._agent_cost_total.clear() | ||
| self._agent_budget_used_percent.clear() |
There was a problem hiding this comment.
_refresh_agent_cost_metrics() returns early when agents is empty or the cost tracker is unavailable, but it only clears the per-agent gauges after passing that check. This means if agents existed on a previous scrape, and later the registry returns 0 agents (or cost tracker becomes unavailable), the old labeled series will remain and be reported as if still current. Clear _agent_cost_total/_agent_budget_used_percent before the early return (or clear whenever skipping) so stale agent label series are dropped reliably.
| await collector.refresh(state) | ||
| output = generate_latest(collector.registry).decode() | ||
| assert "synthorg_budget_daily_used_percent" in output | ||
| assert "100.0" in output |
There was a problem hiding this comment.
This test asserts that the exposition output contains "100.0" anywhere, which can match unrelated metrics and make the test brittle. It would be more reliable to extract the specific synthorg_budget_daily_used_percent sample line (as done in the partial_day test) and assert its numeric value is 100.0.
| async def test_agent_cost_skipped_when_agents_empty(self) -> None: | ||
| """Empty agents tuple with both services available still skips.""" | ||
| collector = PrometheusCollector() | ||
| state = _mock_app_state( | ||
| has_cost_tracker=True, | ||
| has_agent_registry=True, | ||
| agents=(), | ||
| budget_total_monthly=100.0, | ||
| ) | ||
| await collector.refresh(state) | ||
| output = generate_latest(collector.registry).decode() | ||
| assert "synthorg_agent_cost_total{" not in output |
There was a problem hiding this comment.
Current agent-cost tests cover stale label clearing when an agent disappears but do not cover the skip paths introduced by the early return in _refresh_agent_cost_metrics (e.g., second scrape where agents=() or cost tracker becomes unavailable). Adding a two-scrape regression test for those cases would prevent stale per-agent series from silently persisting.
There was a problem hiding this comment.
Code Review
This pull request expands the observability metrics by adding daily budget utilization and per-agent cost tracking to the Prometheus collector. It includes updates to the documentation, the collector implementation, and comprehensive unit tests. I have reviewed the changes and suggest a minor simplification for the conditional assignment of the per-agent daily limit to improve code readability.
| per_agent_limit = ( | ||
| budget_cfg.per_agent_daily_limit if budget_cfg is not None else 0.0 | ||
| ) |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1154 +/- ##
=======================================
Coverage 88.87% 88.88%
=======================================
Files 857 857
Lines 50408 50440 +32
Branches 5071 5072 +1
=======================================
+ Hits 44801 44833 +32
- Misses 4643 4644 +1
+ Partials 964 963 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/operations.md`:
- Line 1304: The docs list the info metric as `synthorg_app` but the actual
exporter and tests use `synthorg_app_info`; update the `/api/v1/metrics` table
entry and any metric-family list in docs/design/operations.md to use
`synthorg_app_info` (replace the string `synthorg_app` with `synthorg_app_info`)
so it matches the exporter and tests (see reference test
tests/unit/observability/test_prometheus_collector.py asserting
`synthorg_app_info`).
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 283-291: The function early-returns when budgets are not
applicable (checks on app_state.has_cost_tracker, daily_cost is None,
tracker.budget_config is None, and monthly <= 0) but leaves
synthorg_budget_daily_used_percent set from prior scrapes; before each return
clear or reset that metric (e.g., call the Prometheus gauge's remove()/set(0)
API) so it no longer reports stale values. Locate the code around app_state,
tracker, tracker.budget_config, daily_cost and the
synthorg_budget_daily_used_percent gauge in prometheus_collector.py and add a
single reset call immediately before each of the three return points.
- Around line 379-382: The code currently reads
tracker.budget_config.per_agent_daily_limit without checking
budget_cfg.total_monthly; update the per-agent limit computation to treat
budgeting as off when budget_cfg is None or budget_cfg.total_monthly <= 0 by
using per_agent_limit = budget_cfg.per_agent_daily_limit if (budget_cfg is not
None and getattr(budget_cfg, "total_monthly", 0) > 0) else 0.0 so
synthorg_agent_budget_used_percent is not emitted when total_monthly <= 0;
reference tracker.budget_config, budget_cfg.total_monthly,
budget_cfg.per_agent_daily_limit and the synthorg_agent_budget_used_percent
metric when making the change.
- Around line 373-378: The code returns early when `not agents` or `not
app_state.has_cost_tracker` but only clears metrics after that check, causing
stale `synthorg_agent_cost_total` and `synthorg_agent_budget_used_percent`
series to remain; move or duplicate the clears so that
`self._agent_cost_total.clear()` and `self._agent_budget_used_percent.clear()`
are executed before the early `return` (while still preserving the later logic
that sets `tracker = app_state.cost_tracker` when present).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5c3a6a89-0e39-44a1-be55-9be9ca7b8fb6
📒 Files selected for processing (3)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Deploy Preview
- GitHub Check: Agent
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations-- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction +MappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Includeallow_inf_nan=Falsein allConfigDictdeclarations.
Use@computed_fieldfor derived values instead of storing + validating redundant fields
UseNotBlankStr(fromcore.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable namelogger(not_logger, notlog) for logger instances
Files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST importfrom synthorg.observability import get_loggerand setlogger = get_logger(__name__)
Never useimport loggingorlogging.getLogger()orprint()in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules undersynthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)-- neverlogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider inProviderConfig
Files:
src/synthorg/observability/prometheus_collector.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,test-provider,test-small-001, etc.
Files:
src/synthorg/observability/prometheus_collector.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Update the relevant
docs/design/page to reflect approved deviations from the spec
Files:
docs/design/operations.md
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Useasyncio_mode = 'auto'for async tests -- no manual@pytest.mark.asyncioneeded
Global test timeout is 30 seconds per test in pyproject.toml. Non-default overrides liketimeout(60)are allowed, but do not add per-filepytest.mark.timeout(30)markers.
Prefer@pytest.mark.parametrizefor testing similar cases
Use Hypothesis for property-based testing with@given+@settingsdecorators
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep(). For indefinite blocks, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/synthorg/observability/prometheus_collector.py (1)
328-354:⚠️ Potential issue | 🟠 MajorClear
synthorg_active_agents_totalbefore the empty-return paths.Line 339 only runs after
list_active()succeeds. If Line 328 returns early or the exception path at Line 348 is hit, the previous(status, trust_level)series stay exported indefinitely even though the registry is unavailable.🧹 Proposed fix
async def _refresh_agent_metrics( self, app_state: AppState, ) -> tuple[Any, ...]: @@ - if not app_state.has_agent_registry: - return () + self._agents_total.clear() + if not app_state.has_agent_registry: + return () try: agents = await app_state.agent_registry.list_active() counts: Counter[tuple[str, str]] = Counter() for agent in agents: status = str(agent.status) trust = str(agent.tools.access_level) counts[(status, trust)] += 1 - # Clear stale label series so disappeared combinations - # drop to zero instead of retaining previous values. - self._agents_total.clear() for (status, trust), count in counts.items(): self._agents_total.labels( status=status, trust_level=trust, ).set(count)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/observability/prometheus_collector.py` around lines 328 - 354, If app_state.has_agent_registry is false or an exception occurs we must clear the synthorg_active_agents_total metric so stale label series are not exported; move or duplicate the call to self._agents_total.clear() so it runs before the early return in the has_agent_registry False branch and in the exception path inside collect_active_agents (i.e., around the block that calls app_state.agent_registry.list_active and handles exceptions), ensuring that self._agents_total.clear() is invoked whenever list_active won't populate current labels.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/operations.md`:
- Line 1852: The "Recommended Framing" footer in docs/design/operations.md is
inconsistent with the gap table: G1 is closed and the section notes G1–G5
closed, but the paragraph still says G3 and G4 remain open; update that
paragraph to match the table by removing the mention of G3/G4 (or explicitly
state that G1–G5 are closed/no remaining gaps). Edit the paragraph labeled
"Recommended Framing" to reference the gap IDs (G1, G3, G4, G5 or G1–G5) so the
section shows a single current status consistent with the table.
In `@src/synthorg/observability/prometheus_collector.py`:
- Line 224: The daily/monthly budget refresh recomputes the current UTC time
twice causing inconsistent day-count vs numerator timestamp; modify
_refresh_daily_budget_metric (and the analogous monthly logic) to compute a
single timestamp variable (e.g., utc_midnight or now_utc) once at the start of
the scrape handler and pass that same timestamp into both the numerator
calculation and the month-length/divisor logic instead of calling
datetime.now(UTC) again so the numerator and divisor use the identical UTC
moment.
---
Duplicate comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 328-354: If app_state.has_agent_registry is false or an exception
occurs we must clear the synthorg_active_agents_total metric so stale label
series are not exported; move or duplicate the call to
self._agents_total.clear() so it runs before the early return in the
has_agent_registry False branch and in the exception path inside
collect_active_agents (i.e., around the block that calls
app_state.agent_registry.list_active and handles exceptions), ensuring that
self._agents_total.clear() is invoked whenever list_active won't populate
current labels.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d597882e-4bca-45a9-8c16-1c5d71432258
📒 Files selected for processing (3)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Update the relevant
docs/design/page to reflect approved deviations from the spec
Files:
docs/design/operations.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations-- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction +MappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Includeallow_inf_nan=Falsein allConfigDictdeclarations.
Use@computed_fieldfor derived values instead of storing + validating redundant fields
UseNotBlankStr(fromcore.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable namelogger(not_logger, notlog) for logger instances
Files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST importfrom synthorg.observability import get_loggerand setlogger = get_logger(__name__)
Never useimport loggingorlogging.getLogger()orprint()in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules undersynthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)-- neverlogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider inProviderConfig
Files:
src/synthorg/observability/prometheus_collector.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,test-provider,test-small-001, etc.
Files:
src/synthorg/observability/prometheus_collector.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Useasyncio_mode = 'auto'for async tests -- no manual@pytest.mark.asyncioneeded
Global test timeout is 30 seconds per test in pyproject.toml. Non-default overrides liketimeout(60)are allowed, but do not add per-filepytest.mark.timeout(30)markers.
Prefer@pytest.mark.parametrizefor testing similar cases
Use Hypothesis for property-based testing with@given+@settingsdecorators
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep(). For indefinite blocks, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-21T12:54:22.557Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-21T12:54:22.557Z
Learning: Version bumping (pre-1.0): `fix:` = patch, `feat:` = patch, `feat!:` = minor, `BREAKING CHANGE` trailer = minor. Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/observability/prometheus_collector.py (2)
199-224:⚠️ Potential issue | 🟠 MajorDon’t reuse lifetime
total_costfor the monthly budget gauge.
_cost_totalcan legitimately useget_total_cost()across all retained records, butsynthorg_budget_used_percentcannot.src/synthorg/budget/tracker.py:177-200documents that an unboundedget_total_cost()sums across all records, so feeding that value into_refresh_budget_metrics()turns a “monthly budget” metric into “retained-history / monthly budget” and it can stay pinned at 100% after prior billing periods’ spend. Pass a billing-period total into_refresh_budget_metrics()instead of the lifetime total.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/observability/prometheus_collector.py` around lines 199 - 224, The code currently passes the lifetime total_cost into _refresh_budget_metrics, which makes the monthly budget metric use retained-history instead of the current billing period; compute the billing-period start (e.g., month_start = utc_midnight.replace(day=1)) and call app_state.cost_tracker.get_total_cost(start=month_start) to obtain a billing_period_total, then pass billing_period_total to _refresh_budget_metrics(app_state, ...) while still using total_cost for _refresh_cost_gauge and daily_cost for _refresh_daily_budget_metric; keep the get_total_cost calls grouped in the same try/except and reference the variables total_cost, daily_cost, utc_midnight, billing_period_total, _refresh_budget_metrics, and app_state.cost_tracker.get_total_cost() when making the change.
245-260:⚠️ Potential issue | 🟠 MajorReset
synthorg_budget_used_percentwhen it is not computable.This method now clears the budget gauges for missing tracker/config, but
monthly <= 0andtotal_cost is Nonestill fall through without touching_budget_used_percent. After one successful scrape, that leaves the previous percentage exported even though the budget is disabled or the total query failed.🧹 Proposed fix
monthly = tracker.budget_config.total_monthly self._budget_monthly_usd.set(monthly) - if monthly > 0 and total_cost is not None: - self._budget_used_percent.set( - min(100.0, (total_cost / monthly) * 100.0), - ) + if monthly <= 0 or total_cost is None: + self._budget_used_percent.set(0.0) + return + self._budget_used_percent.set( + min(100.0, (total_cost / monthly) * 100.0), + )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/observability/prometheus_collector.py` around lines 245 - 260, The budget percent gauge _budget_used_percent is not cleared when monthly <= 0 or total_cost is None, leaving stale values; inside the same method where you read app_state.has_cost_tracker and tracker = app_state.cost_tracker (in prometheus_collector.py), ensure you explicitly set self._budget_used_percent.set(0.0) whenever tracker.budget_config is missing or monthly <= 0 or total_cost is None (after setting _budget_monthly_usd), otherwise compute the percent as before using min(100.0, (total_cost / monthly) * 100.0).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 199-224: The code currently passes the lifetime total_cost into
_refresh_budget_metrics, which makes the monthly budget metric use
retained-history instead of the current billing period; compute the
billing-period start (e.g., month_start = utc_midnight.replace(day=1)) and call
app_state.cost_tracker.get_total_cost(start=month_start) to obtain a
billing_period_total, then pass billing_period_total to
_refresh_budget_metrics(app_state, ...) while still using total_cost for
_refresh_cost_gauge and daily_cost for _refresh_daily_budget_metric; keep the
get_total_cost calls grouped in the same try/except and reference the variables
total_cost, daily_cost, utc_midnight, billing_period_total,
_refresh_budget_metrics, and app_state.cost_tracker.get_total_cost() when making
the change.
- Around line 245-260: The budget percent gauge _budget_used_percent is not
cleared when monthly <= 0 or total_cost is None, leaving stale values; inside
the same method where you read app_state.has_cost_tracker and tracker =
app_state.cost_tracker (in prometheus_collector.py), ensure you explicitly set
self._budget_used_percent.set(0.0) whenever tracker.budget_config is missing or
monthly <= 0 or total_cost is None (after setting _budget_monthly_usd),
otherwise compute the percent as before using min(100.0, (total_cost / monthly)
* 100.0).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8649ec36-6f54-4435-87de-433fc53017af
📒 Files selected for processing (2)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
docs/design/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Update the relevant
docs/design/page to reflect approved deviations from the spec
Files:
docs/design/operations.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations-- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) -- PEP 758 except syntax, enforced by ruff on Python 3.14
All public functions must have type hints. Use mypy strict mode.
Google style docstrings are required on public classes and functions. Enforced by ruff D rules.
Use immutability patterns: create new objects, never mutate existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction +MappingProxyTypewrapping.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models for runtime state. Never mix static config fields with mutable runtime fields.
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Includeallow_inf_nan=Falsein allConfigDictdeclarations.
Use@computed_fieldfor derived values instead of storing + validating redundant fields
UseNotBlankStr(fromcore.types) for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code
Line length must be 88 characters. Enforced by ruff.
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Always use variable namelogger(not_logger, notlog) for logger instances
Files:
src/synthorg/observability/prometheus_collector.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST importfrom synthorg.observability import get_loggerand setlogger = get_logger(__name__)
Never useimport loggingorlogging.getLogger()orprint()in application code. Exceptions: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, observability/http_handler.py, observability/otlp_handler.py may use stdlib logging and stderr print for handler construction.
Event names must always use constants from domain-specific modules undersynthorg.observability.events. Import directly from specific domain modules.
Use structured kwargs for logging: alwayslogger.info(EVENT, key=value)-- neverlogger.info('msg %s', val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging should be used for object creation, internal flow, and entry/exit of key functions
RetryConfig and RateLimiterConfig are set per-provider inProviderConfig
Files:
src/synthorg/observability/prometheus_collector.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,test-provider,test-small-001, etc.
Files:
src/synthorg/observability/prometheus_collector.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/observability/prometheus_collector.py
🔇 Additional comments (1)
src/synthorg/observability/prometheus_collector.py (1)
391-406:⚠️ Potential issue | 🟠 MajorAvoid per-agent full-scan fan-out on every scrape.
asyncio.TaskGrouphelps the latency profile, but it does not reduce the underlying work here.src/synthorg/budget/tracker.py:202-229shows that eachget_agent_cost()call snapshots and filters the record set, so launching one or two of those per active agent makes scrape cost scale with both agent count and retained history size. Prefer a single grouped aggregation / single snapshot from the tracker for all agent labels.⛔ Skipped due to learnings
Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T22:08:13.456Z Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-04-08T19:05:17.823Z Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-04-01T09:39:21.584Z Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T18:52:05.142Z Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T06:30:14.180Z Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-19T07:12:14.508Z Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T06:30:14.180Z Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
3bab1df to
739a9b6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 209-218: The metrics collector currently computes billing_cost
from the calendar month start; change it to use the configured billing period
start: import billing_period_start from synthorg.budget.billing, compute
billing_period_start(app_state.cost_tracker.budget_config.reset_day) and pass
that start into app_state.cost_tracker.get_total_cost(...) when setting
billing_cost (the variables/functions to update are billing_cost and month_start
logic in prometheus_collector.py and use
app_state.cost_tracker.budget_config.reset_day to obtain reset_day).
In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 381-399: Update the inline comment in the
test_daily_budget_percent_computed test to remove the hardcoded "30 days"
assumption and instead state that the provided daily_cost (50.0) exceeds the
prorated daily budget for any month length, so the metric should cap at 100%;
edit the comment near the test function and the variables total_cost,
daily_cost, and budget_total_monthly to reflect this behavior rather than
specific 30-day math.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: cde75d34-93fb-453c-a115-2b9280701676
📒 Files selected for processing (3)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do NOT usefrom __future__ import annotationsin Python code—Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:(no parentheses) for PEP 758 except syntax in Python 3.14; ruff enforces this
All public functions and classes in Python must have type hints; mypy strict mode is enforced
Use Google-style docstrings for all public classes and functions in Python; ruff D rules enforce this
Create new objects instead of mutating existing ones; for non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization)
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model
Use Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict); adopt convention: useallow_inf_nan=Falsein allConfigDictdeclarations to rejectNaN/Infin numeric fields at validation time
Use@computed_fieldfor derived values in Pydantic models instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields in Pydantic models—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls); prefer structured concurrency over barecreate_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly; never silently ...
Files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowmarkers for Python tests
Python tests must achieve 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases in Python tests
NEVER skip, dismiss, or ignore flaky Python tests—always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()instead of widening timing margins. Useasyncio.Event().wait()for indefinite blocking instead ofasyncio.sleep(large_number)
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every Python module with business logic MUST have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in Python application code; exceptions:observability/setup.py,observability/sinks.py,observability/syslog_handler.py,observability/http_handler.py, andobservability/otlp_handler.pymay use stdlibloggingandprint(..., file=sys.stderr)for handler construction, bootstrap, and error reporting code
Always use variable namelogger(not_logger, notlog) in Python code
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api); import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Always use structured kwargs in Python logging:logger.info(EVENT, key=value)—neverlogger.info("msg %s", val)
All error paths in Python code must log at WARNING or ERROR with context before raising
All state transitions in Python code must log at INFO level
Use DEBUG level for object creation, internal flow, entry/exit of key functions in Python logging
All provider calls in Python go throughBaseCompletionProviderwhich applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
SetRetryConfigandRateLimiterConfigper-provider inProviderConfigin Python
Retryable errors in Python providers:RateLimitError,ProviderTimeoutError,ProviderConnectionError,ProviderInternalError(withis_retryable=True); non-retryable errors raise immediately
RetryExhaustedErrorin Python signals all retries failed—engine layer catches this to trigger fallback chains
Rate limiter in Python providers respectsRateLimitError.retry_afterfrom providers—automatically pauses future requests
NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings...
Files:
src/synthorg/observability/prometheus_collector.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
tests/unit/observability/test_prometheus_collector.py
🔇 Additional comments (12)
src/synthorg/observability/prometheus_collector.py (6)
96-114: LGTM! New gauge registrations are well-structured.The three new metric families are correctly registered with appropriate descriptions and labels. The
agent_idlabel on per-agent gauges aligns with the PR objectives for per-agent cost breakdowns.
244-282: LGTM! Budget metrics properly handle all edge cases.The method correctly:
- Resets gauges to
0.0before early returns (addressing prior review feedback)- Uses billing-period cost for utilization calculation
- Caps percentage at 100%
- Handles missing config/tracker gracefully
284-328: LGTM! Daily budget metric implementation is correct.The method properly:
- Uses the passed
utc_midnightfor consistent month/day calculations (addressing prior TOCTOU feedback)- Resets gauge to
0.0for all early-return paths- Guards against division by zero (monthly <= 0 check precedes the calculation)
- Caps at 100%
330-372: LGTM! Agent metrics refactored to return active agents for reuse.The method correctly:
- Clears label series at the start (addressing prior feedback about stale labels)
- Returns empty tuple on early exit or error
- Returns the agents tuple for downstream use by
_refresh_agent_cost_metrics
374-441: LGTM! Per-agent cost metrics implementation is well-structured.The method correctly:
- Clears gauge label series before any early return (prior feedback addressed)
- Honors
total_monthly <= 0when computingper_agent_limit(prior feedback addressed)- Uses
asyncio.TaskGroupfor parallel fan-out (compliant with coding guidelines)- Caps budget utilization at 100%
- Isolates exceptions without crashing the scrape
443-445: LGTM! Task metrics clearing moved to method start.Consistent with the pattern applied to agent metrics—clears label series before any early return to prevent stale data.
docs/design/operations.md (2)
1304-1304: LGTM! Metrics documentation updated accurately.The
/api/v1/metricsdocumentation correctly:
- Updates count from 9 to 12 metric families
- Uses
synthorg_app_info(prior feedback addressed)- Documents all three new metrics with their labels and purposes
1852-1867: LGTM! Control-plane gap tracking updated consistently.The gap table and framing paragraph are now consistent—G1-G5 marked as closed, with G6 (budget history granularity) noted as the sole remaining low-priority gap. Prior feedback about inconsistent framing addressed.
tests/unit/observability/test_prometheus_collector.py (4)
49-59: Mock routing logic forget_total_costis correct but could be more explicit.The mock differentiates between:
- No
start→ returns_total(lifetime)start.day == 1→ returns_billing(month-to-date)- Otherwise → returns
_daily(daily)This matches the collector's query patterns. The comment at line 56-57 clarifies the billing-period heuristic.
513-711: LGTM! Comprehensive test coverage for per-agent metrics.The test class covers:
- Per-agent cost and budget percentage calculations
- Stale label clearing when agents disappear
- Early return paths (no agents, no tracker, empty tuple)
per_agent_daily_limit=0disabling budget percentage- Exception handling clearing gauges
212-260: LGTM! Billing period cost tests validate key behavior changes.
test_budget_percent_uses_billing_period_costcorrectly verifies that month-to-date cost (not lifetime) is used for utilization.test_budget_percent_reset_when_cost_unavailablevalidates resilience when the tracker fails.
327-330: LGTM! Invalid verdict rejection test added.Validates that
record_security_verdictraisesValueErrorfor unknown verdicts, preventing label cardinality explosion from arbitrary strings.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
Pre-reviewed by 12 agents, 10 findings addressed: - Refactor _refresh_agent_cost_metrics to use asyncio.TaskGroup for parallel fan-out of per-agent cost queries - Improve docstrings for _refresh_daily_budget_metric, _refresh_agent_metrics, _refresh_agent_cost_metrics - Add 5 new tests: zero daily cost, zero monthly budget, empty agents tuple, agent cost exception recovery, invalid security verdict - Update operations.md: 9 to 12 metric families with descriptions
b367c4b to
906dc4d
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/observability/prometheus_collector.py (1)
287-292:⚠️ Potential issue | 🟠 MajorReset the budget gauges when refresh fails unexpectedly.
Both
except Exception:paths only log. After any prior successful scrape, an exception here leaves Prometheus serving the previous budget values, which is worse than exporting0.0/unavailable. Initialize these gauges to0.0before the guarded work or reset them in the handlers.🧹 Proposed fix
def _refresh_budget_metrics( self, app_state: AppState, billing_cost: float | None, ) -> None: + self._budget_used_percent.set(0.0) + self._budget_monthly_usd.set(0.0) if not app_state.has_cost_tracker: - self._budget_used_percent.set(0.0) - self._budget_monthly_usd.set(0.0) return try: tracker = app_state.cost_tracker @@ def _refresh_daily_budget_metric( self, app_state: AppState, daily_cost: float | None, utc_midnight: datetime, ) -> None: + self._budget_daily_used_percent.set(0.0) if not app_state.has_cost_tracker or daily_cost is None: - self._budget_daily_used_percent.set(0.0) returnAlso applies to: 333-338
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/observability/prometheus_collector.py` around lines 287 - 292, The budget scrape except blocks only log errors and leave previous values alive; modify the guarded budget refresh code so that immediately before the try/except you initialize the Prometheus budget gauges to 0.0 (or, alternatively, in each except Exception: handler explicitly set the budget-related Gauge objects to 0.0) and then proceed with the normal scrape; update the except block shown (the logger.warning(METRICS_SCRAPE_FAILED, component="budget", exc_info=True) handler) and the other budget except handler to reset all budget gauges to 0.0 before re-raising/returning, while keeping the existing logger.warning call with exc_info.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/operations.md`:
- Line 1852: In the G1 table row update the endpoint string to the concrete API
path used elsewhere by replacing `/metrics` with `/api/v1/metrics` so the entry
matches the API Surface documentation; specifically edit the G1 row text that
references PrometheusCollector and `/metrics` (the description mentioning
`PrometheusCollector` instantiated in `on_startup()` and the OTLP sink note) to
use `/api/v1/metrics` to remove the ambiguity.
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 322-329: The calculation for daily budget uses
monthrange(utc_midnight.year, utc_midnight.month) which ignores
BudgetConfig.reset_day and can span two months; update the denominator to
compute the current billing period length by using billing_period_start(tracker,
utc_midnight) (or billing_period_start(...) already used elsewhere) and the next
reset boundary (billing period start + period length determined from reset_day)
to derive days_in_billing_period, then compute daily_budget =
tracker.budget_config.total_monthly / days_in_billing_period and call
self._budget_daily_used_percent.set(min(100.0, (daily_cost / daily_budget) *
100.0)); also add a regression test targeting the billing period logic with
reset_day != 1 to verify correct prorating across month boundaries.
---
Outside diff comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 287-292: The budget scrape except blocks only log errors and leave
previous values alive; modify the guarded budget refresh code so that
immediately before the try/except you initialize the Prometheus budget gauges to
0.0 (or, alternatively, in each except Exception: handler explicitly set the
budget-related Gauge objects to 0.0) and then proceed with the normal scrape;
update the except block shown (the logger.warning(METRICS_SCRAPE_FAILED,
component="budget", exc_info=True) handler) and the other budget except handler
to reset all budget gauges to 0.0 before re-raising/returning, while keeping the
existing logger.warning call with exc_info.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 96ce9f43-e5f8-4165-8f8d-70de9644e1ab
📒 Files selected for processing (3)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always usefrom synthorg.observability import get_loggerfollowed bylogger = get_logger(__name__)in every module with business logic
Never useimport logging,logging.getLogger(), orprint()in application code (exceptions:observability/setup.py,observability/sinks.py,observability/syslog_handler.py,observability/http_handler.py, andobservability/otlp_handler.pymay use stdlib logging for handler construction and bootstrap)
Useloggeras the variable name for loggers, never_loggerorlog
Always use event constants fromsynthorg.observability.eventsdomain modules (e.g.,API_REQUEST_STARTEDfromevents.api) instead of string literals for log event names
Use structured logging with kwargs in the formatlogger.info(EVENT, key=value)instead of string formatting likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT usefrom __future__ import annotations-- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax:except A, B:without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction plusMappingProxyTypewrapping for read-only enforcement
Usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves (e.g., agent execution state, task progress)...
Files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Always read the relevantdocs/design/page before implementing any feature or planning any issue
If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
When approved deviations occur, update the relevantdocs/design/page to reflect the new reality
Files:
src/synthorg/observability/prometheus_collector.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowmarkers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Useasyncio_mode = "auto"configuration; do NOT add manual@pytest.mark.asynciodecorators
Global test timeout is 30 seconds per test (set inpyproject.toml); do NOT add per-filepytest.mark.timeout(30)markers (non-default overrides liketimeout(60)are allowed)
ALWAYS include-n 8when running pytest locally for parallelism; never run tests sequentially
Prefer@pytest.mark.parametrizefor testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names likeexample-provider,example-large-001,test-provider,test-small-001, orlarge/medium/smallaliases
Use Hypothesis for property-based testing with@givenand@settingsdecorators; configure viaHYPOTHESIS_PROFILEenv var with profiles:ci(deterministic, 10 examples, derandomized),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit@example(...)decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For indefinite blocking tasks, useasyncio.Event().wait()instead ofasyncio.sleep(large_number)
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
src/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-04-08T20:20:05.318Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T20:20:05.318Z
Learning: Applies to src/synthorg/**/*.py : If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/observability/prometheus_collector.py`:
- Around line 435-448: The current scrape fans out a get_agent_cost() task per
agent (and another per-agent when per_agent_limit > 0) using asyncio.TaskGroup
by iterating agent_ids and populating total_tasks/daily_tasks, which can create
an O(active_agents) burst; fix by first preferring a bulk tracker call (e.g., a
tracker API that returns costs for multiple agent_ids in one request) and use
that instead of per-agent get_agent_cost() when available, otherwise limit
concurrency: replace the unbounded per-agent task creation with a bounded
approach (e.g., use an asyncio.Semaphore/BoundedSemaphore or a limited worker
TaskGroup) so only N concurrent get_agent_cost() calls run at once when calling
tracker.get_agent_cost(aid, ...) for each aid, keeping references like
total_tasks, daily_tasks, get_agent_cost, tracker, agent_ids, per_agent_limit,
and utc_midnight to locate the change.
In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 405-426: The test test_daily_budget_percent_partial_day is too
loose—freeze the collector date or directly call
PrometheusCollector._refresh_daily_budget_metric with a fixed utc_midnight so
proration is deterministic; update the test to set a concrete reset_day != 1 and
assert the exact expected percentage (non-capped) instead of only checking 0 <
value < 100, using the PrometheusCollector instance (collector) and
generate_latest(collector.registry) to verify the metric output; mirror this
change for the other failing block around lines 516-537 to ensure the proration
logic is pinned.
- Around line 611-617: The test currently uses cost_lines = [...] and assert
all('agent_id="alice"' in ln for ln in cost_lines), which is vacuously true for
an empty cost_lines; change the assertions to ensure cost_lines actually
contains the surviving series and that it includes alice: first assert
cost_lines is not empty (or assert any(...) is True), then assert that at least
one line in cost_lines contains 'agent_id="alice"' (and optionally assert no
line contains 'agent_id="bob"'); reference the cost_lines variable, output_v2,
and the metric name synthorg_agent_cost_total when locating the test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 58eb2966-ed95-4390-a4f1-42e522bae0a2
📒 Files selected for processing (3)
docs/design/operations.mdsrc/synthorg/observability/prometheus_collector.pytests/unit/observability/test_prometheus_collector.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always usefrom synthorg.observability import get_loggerfollowed bylogger = get_logger(__name__)in every module with business logic
Never useimport logging,logging.getLogger(), orprint()in application code (exceptions:observability/setup.py,observability/sinks.py,observability/syslog_handler.py,observability/http_handler.py, andobservability/otlp_handler.pymay use stdlib logging for handler construction and bootstrap)
Useloggeras the variable name for loggers, never_loggerorlog
Always use event constants fromsynthorg.observability.eventsdomain modules (e.g.,API_REQUEST_STARTEDfromevents.api) instead of string literals for log event names
Use structured logging with kwargs in the formatlogger.info(EVENT, key=value)instead of string formatting likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT usefrom __future__ import annotations-- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax:except A, B:without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction plusMappingProxyTypewrapping for read-only enforcement
Usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves (e.g., agent execution state, task progress)...
Files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowmarkers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Useasyncio_mode = "auto"configuration; do NOT add manual@pytest.mark.asynciodecorators
Global test timeout is 30 seconds per test (set inpyproject.toml); do NOT add per-filepytest.mark.timeout(30)markers (non-default overrides liketimeout(60)are allowed)
ALWAYS include-n 8when running pytest locally for parallelism; never run tests sequentially
Prefer@pytest.mark.parametrizefor testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names likeexample-provider,example-large-001,test-provider,test-small-001, orlarge/medium/smallaliases
Use Hypothesis for property-based testing with@givenand@settingsdecorators; configure viaHYPOTHESIS_PROFILEenv var with profiles:ci(deterministic, 10 examples, derandomized),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit@example(...)decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For indefinite blocking tasks, useasyncio.Event().wait()instead ofasyncio.sleep(large_number)
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Always read the relevantdocs/design/page before implementing any feature or planning any issue
If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
When approved deviations occur, update the relevantdocs/design/page to reflect the new reality
Files:
src/synthorg/observability/prometheus_collector.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/prometheus_collector.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
📚 Learning: 2026-04-08T20:20:05.318Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-08T20:20:05.318Z
Learning: Applies to src/synthorg/**/*.py : If implementation deviates from the design spec (better approach found, scope evolved, etc.), alert the user and explain why before proceeding; do NOT silently diverge
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
docs/design/operations.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
tests/unit/observability/test_prometheus_collector.pysrc/synthorg/observability/prometheus_collector.py
| total_tasks: dict[str, asyncio.Task[float]] = {} | ||
| daily_tasks: dict[str, asyncio.Task[float]] = {} | ||
| async with asyncio.TaskGroup() as tg: | ||
| for aid in agent_ids: | ||
| total_tasks[aid] = tg.create_task( | ||
| tracker.get_agent_cost(aid), | ||
| ) | ||
| if per_agent_limit > 0: | ||
| daily_tasks[aid] = tg.create_task( | ||
| tracker.get_agent_cost( | ||
| aid, | ||
| start=utc_midnight, | ||
| ), | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Avoid turning each scrape into an O(active_agents) cost-query burst.
This fans out one get_agent_cost() call per active agent, and a second per agent when per_agent_limit > 0. On the Prometheus scrape path that makes /metrics scale linearly with fleet size and can create a large concurrent burst against the tracker backend. If the tracker/repository can return per-agent totals in one pass, prefer that here; otherwise cap the fan-out.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/observability/prometheus_collector.py` around lines 435 - 448,
The current scrape fans out a get_agent_cost() task per agent (and another
per-agent when per_agent_limit > 0) using asyncio.TaskGroup by iterating
agent_ids and populating total_tasks/daily_tasks, which can create an
O(active_agents) burst; fix by first preferring a bulk tracker call (e.g., a
tracker API that returns costs for multiple agent_ids in one request) and use
that instead of per-agent get_agent_cost() when available, otherwise limit
concurrency: replace the unbounded per-agent task creation with a bounded
approach (e.g., use an asyncio.Semaphore/BoundedSemaphore or a limited worker
TaskGroup) so only N concurrent get_agent_cost() calls run at once when calling
tracker.get_agent_cost(aid, ...) for each aid, keeping references like
total_tasks, daily_tasks, get_agent_cost, tracker, agent_ids, per_agent_limit,
and utc_midnight to locate the change.
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/observability/test_prometheus_collector.py`:
- Around line 405-440: The test test_daily_budget_percent_partial_day fails on
the 1st of a month because the helper _mock_app_state sometimes returns the
mocked _billing value instead of the intended daily cost; update _mock_app_state
so that when a daily_cost (or _daily) is provided it always returns that value
(or add a boolean flag like force_daily) rather than falling back to _billing on
month-start, then adjust the test to pass the explicit daily_cost (or
force_daily=True) so PrometheusCollector.refresh and the expected calculation
use the consistent daily cost.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3a96f71d-e5d8-4efd-b44a-e38fd942cb04
📒 Files selected for processing (1)
tests/unit/observability/test_prometheus_collector.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Always usefrom synthorg.observability import get_loggerfollowed bylogger = get_logger(__name__)in every module with business logic
Never useimport logging,logging.getLogger(), orprint()in application code (exceptions:observability/setup.py,observability/sinks.py,observability/syslog_handler.py,observability/http_handler.py, andobservability/otlp_handler.pymay use stdlib logging for handler construction and bootstrap)
Useloggeras the variable name for loggers, never_loggerorlog
Always use event constants fromsynthorg.observability.eventsdomain modules (e.g.,API_REQUEST_STARTEDfromevents.api) instead of string literals for log event names
Use structured logging with kwargs in the formatlogger.info(EVENT, key=value)instead of string formatting likelogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Do NOT usefrom __future__ import annotations-- Python 3.14 has native PEP 649 support
Use PEP 758 except syntax:except A, B:without parentheses (ruff enforces this on Python 3.14)
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions (enforced by ruff D rules)
Prioritize immutability: create new objects instead of mutating existing ones. For non-Pydantic internal collections, usecopy.deepcopy()at construction plusMappingProxyTypewrapping for read-only enforcement
Usecopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization) for mutable fields in frozen Pydantic models
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models withmodel_copy(update=...)for runtime state that evolves (e.g., agent execution state, task progress)...
Files:
tests/unit/observability/test_prometheus_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowmarkers to classify tests
Maintain minimum 80% code coverage (enforced in CI)
Useasyncio_mode = "auto"configuration; do NOT add manual@pytest.mark.asynciodecorators
Global test timeout is 30 seconds per test (set inpyproject.toml); do NOT add per-filepytest.mark.timeout(30)markers (non-default overrides liketimeout(60)are allowed)
ALWAYS include-n 8when running pytest locally for parallelism; never run tests sequentially
Prefer@pytest.mark.parametrizefor testing similar cases
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples; use generic names likeexample-provider,example-large-001,test-provider,test-small-001, orlarge/medium/smallaliases
Use Hypothesis for property-based testing with@givenand@settingsdecorators; configure viaHYPOTHESIS_PROFILEenv var with profiles:ci(deterministic, 10 examples, derandomized),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline)
When Hypothesis finds a failure, it is a real bug; the shrunk example is saved for analysis but is NOT automatically replayed (fix the underlying bug and add an explicit@example(...)decorator to the test)
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For indefinite blocking tasks, useasyncio.Event().wait()instead ofasyncio.sleep(large_number)
Files:
tests/unit/observability/test_prometheus_collector.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/observability/test_prometheus_collector.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/observability/test_prometheus_collector.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
tests/unit/observability/test_prometheus_collector.py
🔇 Additional comments (7)
tests/unit/observability/test_prometheus_collector.py (7)
53-62: Date-dependent mock logic can cause flaky tests on the reset day.The mock distinguishes daily vs. billing-period queries solely by
start.day == _reset_day. When a test runs on the reset day (e.g., the 1st of the month for defaultreset_day=1), both queries match the condition:
- Daily cost query:
start=utc_midnight→start.day == 1- Billing period query:
start=billing_period_start→start.day == 1Tests expecting
daily_cost(e.g.,test_daily_budget_percent_partial_daywithdaily_cost=3.0) will instead receive_billing=50.0on the 1st, causing intermittent failures.Consider disambiguating by comparing the full timestamp or by ensuring tests use a
reset_daythat differs from today:🛠️ Potential fix: compare hour to distinguish daily vs billing queries
async def _get_total_cost( *, start: datetime | None = None, end: datetime | None = None, ) -> float: if start is None: return _total - if start.day == _reset_day: + # Billing-period queries use billing_period_start (always 00:00:00). + # Daily queries use utc_midnight (also 00:00:00 but for today). + # Distinguish by checking if start is exactly at day boundary + # AND the month matches billing period logic. Simplest fix: + # billing period start is never "today", so check if start < today. + from datetime import UTC + today = datetime.now(UTC).replace(hour=0, minute=0, second=0, microsecond=0) + if start.day == _reset_day and start < today: return _billing return _dailyAlternatively, set
billing_cost=daily_costin tests that don't care about the billing/daily distinction.
[raise_minor_issue, request_verification]#!/bin/bash # Check if any CI runs have failed on the 1st of the month for this test file rg -n "test_daily_budget_percent_partial_day|test_daily_budget_zero_cost" tests/unit/observability/test_prometheus_collector.py
106-117: LGTM!The
nameparameter addition is backward-compatible and enables explicit agent ID specification for per-agent metric tests.
216-234: LGTM!Good coverage for billing-period vs. lifetime cost distinction. The assertion on exact value (25.0) confirms the correct cost source is used.
236-263: LGTM!Good error-recovery test: verifies the gauge resets to 0.0 when the cost tracker becomes unavailable, preventing stale metrics from persisting.
331-334: LGTM!Validates that invalid security verdicts are rejected with a clear error message.
625-632: LGTM!The stale label assertion is now robust:
len(cost_lines) > 0ensures the check is non-vacuousall(...)verifies alice survivesnot any(...)confirms bob is removed
554-754: Comprehensive per-agent metrics test coverage.The test class covers all key scenarios:
- Per-agent cost totals and budget percentages
- Stale label clearing between scrapes
- Skip conditions (no agents, no tracker, no limit, empty tuple)
- Error recovery (gauges cleared on failure)
Well-structured and thorough.
| async def test_daily_budget_percent_partial_day(self) -> None: | ||
| """Normal daily utilization produces correct percentage.""" | ||
| from datetime import UTC | ||
|
|
||
| from synthorg.budget.billing import billing_period_start | ||
|
|
||
| collector = PrometheusCollector() | ||
| state = _mock_app_state( | ||
| has_cost_tracker=True, | ||
| total_cost=50.0, | ||
| daily_cost=3.0, | ||
| budget_total_monthly=300.0, | ||
| ) | ||
| await collector.refresh(state) | ||
| output = generate_latest(collector.registry).decode() | ||
| lines = [ | ||
| ln | ||
| for ln in output.splitlines() | ||
| if ln.startswith("synthorg_budget_daily_used_percent ") | ||
| ] | ||
| assert len(lines) == 1 | ||
| value = float(lines[0].split()[-1]) | ||
| # Compute expected from current billing period length. | ||
| now = datetime.now(UTC) | ||
| ps = billing_period_start(1, now=now) | ||
| ns = ( | ||
| ps.replace( | ||
| year=ps.year + 1, | ||
| month=1, | ||
| ) | ||
| if ps.month == 12 | ||
| else ps.replace(month=ps.month + 1) | ||
| ) | ||
| days = (ns - ps).days | ||
| expected = (3.0 / (300.0 / days)) * 100.0 | ||
| assert value == pytest.approx(expected, abs=0.01) |
There was a problem hiding this comment.
Dynamic expected value calculation is well-designed but inherits the mock date-sensitivity.
The test correctly mirrors the production proration logic by computing expected from the current billing period length. This makes the assertion deterministic within a single test run.
Note: On the 1st of the month, the mock at lines 53-62 will return _billing=50.0 instead of _daily=3.0, causing the assertion to fail (see earlier comment on mock design).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/observability/test_prometheus_collector.py` around lines 405 -
440, The test test_daily_budget_percent_partial_day fails on the 1st of a month
because the helper _mock_app_state sometimes returns the mocked _billing value
instead of the intended daily cost; update _mock_app_state so that when a
daily_cost (or _daily) is provided it always returns that value (or add a
boolean flag like force_daily) rather than falling back to _billing on
month-start, then adjust the test to pass the explicit daily_cost (or
force_daily=True) so PrometheusCollector.refresh and the expected calculation
use the consistent daily cost.
Summary
Adds three new Prometheus metric families to
PrometheusCollector, completing the deferred acceptance criteria from #1122:synthorg_budget_daily_used_percent-- daily cost as percentage of prorated daily budget (total_monthly / days_in_month)synthorg_agent_cost_total{agent_id}-- per-agent accumulated cost in USDsynthorg_agent_budget_used_percent{agent_id}-- per-agent daily cost as percentage ofper_agent_daily_limitChanges
Source (
prometheus_collector.py)__init___refresh_daily_budget_metric()-- computes daily budget utilization, capped at 100%_refresh_agent_cost_metrics()-- queries per-agent costs usingasyncio.TaskGroupfor parallel fan-out_refresh_agent_metrics()now returns the agents tuple for reuse downstreamTests (
test_prometheus_collector.py)TestPrometheusCollectorDailyBudget-- 6 tests (cap at 100%, partial day, zero cost, zero monthly budget, no config, no tracker)TestPrometheusCollectorAgentCost-- 8 tests (per-agent totals, budget %, stale label clearing, empty agents, no tracker, no limit, error recovery)TestPrometheusCollectorSecurityVerdicts-- added invalid verdict rejection testDocs (
operations.md)Test plan
uv run python -m pytest tests/ -n 8-- 16011 passed, 9 skippeduv run mypy src/ tests/-- cleanuv run ruff check src/ tests/-- cleanReview coverage
Pre-reviewed by 12 agents (docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, logging-audit, resilience-audit, conventions-enforcer, test-quality-reviewer, async-concurrency-reviewer, issue-resolution-verifier). 10 findings addressed.
Closes #1148