
feat: analytics and metrics runtime pipeline (#226, #225, #227, #224) #1127

Merged
Aureliolo merged 6 commits into main from feat/analytics-metrics-pipeline on Apr 7, 2026
Conversation

@Aureliolo
Owner

Summary

Implements the full analytics and metrics runtime pipeline across four issues, in dependency order: coordination metrics collection (#226) -> call categorization (#225) -> per-call analytics layer (#227) -> quota polling (#224).


Issue #226 -- Coordination Metrics Runtime Collection Pipeline

  • BaselineStore (baseline_store.py): sliding-window store for single-agent baseline data (turns, error rates, tokens, durations) used by Ec, O%, Ae metrics
  • CoordinationMetricsCollector (coordination_collector.py): post-execution collector that gathers raw data, calls pure compute functions from coordination_metrics.py, logs events, and fires alerts via NotificationDispatcher
    • Single-agent runs record baseline data and return empty metrics
    • Multi-agent runs compute all 9 enabled metrics using accumulated baselines
    • Individual metric failures are logged and skipped without blocking others
    • SimilarityComputer protocol decouples redundancy rate (R) from any specific embedding provider
  • AgentEngine integration: _try_collect_coordination_metrics() called from _post_execution_pipeline() (best-effort, same pattern as _try_capture_distillation)
  • New observability event constants in events/coordination_metrics.py

Issue #225 -- Runtime LLM Call Categorization

  • ClassificationContext (frozen Pydantic): captures boolean flags (is_delegation, is_review, is_meeting, is_planning_phase, is_system_prompt, is_embedding_operation, is_quality_judge) plus tool_calls_made and agent_role
  • RulesBasedClassifier: priority-ordered classification (EMBEDDING > COORDINATION > SYSTEM > PRODUCTIVE)
  • classify_call(): convenience function using the default classifier
  • Stateless and pure -- no I/O, no side effects. Loops adopt it gradually (opt-in per design spec)
  • New observability event constants in events/call_classification.py
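
The priority-ordered classification above can be sketched like this. The priority chain (EMBEDDING > COORDINATION > SYSTEM > PRODUCTIVE) and the flag names are from the PR; which flags map to which category, and the enum values, are assumptions. A frozen dataclass stands in for the frozen Pydantic model to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from enum import Enum

class LLMCallCategory(Enum):
    EMBEDDING = "embedding"
    COORDINATION = "coordination"
    SYSTEM = "system"
    PRODUCTIVE = "productive"

@dataclass(frozen=True)
class ClassificationContext:
    is_delegation: bool = False
    is_review: bool = False
    is_meeting: bool = False
    is_planning_phase: bool = False
    is_system_prompt: bool = False
    is_embedding_operation: bool = False
    is_quality_judge: bool = False

def classify_call(ctx: ClassificationContext) -> LLMCallCategory:
    # First matching rule wins: EMBEDDING > COORDINATION > SYSTEM > PRODUCTIVE.
    if ctx.is_embedding_operation:
        return LLMCallCategory.EMBEDDING
    if ctx.is_delegation or ctx.is_review or ctx.is_meeting or ctx.is_planning_phase:
        return LLMCallCategory.COORDINATION
    if ctx.is_system_prompt or ctx.is_quality_judge:
        return LLMCallCategory.SYSTEM
    return LLMCallCategory.PRODUCTIVE
```

Because the function is pure and stateless, loops can adopt it one call site at a time without any migration risk.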

Issue #227 -- Full Per-Call Analytics Layer

Model extensions (all backward-compatible, new fields default to None):

  • CostRecord: latency_ms, cache_hit, retry_count, retry_reason, finish_reason, success
  • TurnRecord: latency_ms, cache_hit, retry_count, retry_reason; success as a @computed_field derived from finish_reason
  • CompletionResponse: provider_metadata: dict[str, object]

Provider-layer metadata propagation:

  • RetryHandler: last_attempt_count and last_retry_reason instance state, reset at start of each execute() call
  • BaseCompletionProvider.complete(): wraps _resilient_execute() with time.monotonic(), injects _synthorg_latency_ms, _synthorg_retry_count, _synthorg_retry_reason into provider_metadata; latency captured and logged even on exception

Engine-layer propagation:

  • make_turn_record() extracts _synthorg_* keys from provider_metadata
  • All three loops (react_loop, plan_execute_loop, hybrid_loop) pass provider_metadata to make_turn_record()
  • cost_recording.py propagates all new fields from TurnRecord to CostRecord
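
The extraction step in `make_turn_record()` amounts to filtering the namespaced keys back out, roughly as sketched here (the helper name and function shape are hypothetical; only the `_synthorg_` prefix convention is from the PR):

```python
from typing import Any

def extract_call_metadata(provider_metadata: dict[str, Any]) -> dict[str, Any]:
    """Pull _synthorg_* keys out of provider_metadata, stripping the prefix."""
    prefix = "_synthorg_"
    return {
        key.removeprefix(prefix): value
        for key, value in provider_metadata.items()
        if key.startswith(prefix)
    }
```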

Analytics aggregation:

  • CallAnalyticsConfig + RetryAlertConfig: configuration with warn thresholds
  • AnalyticsAggregation: total_calls, success_count, failure_count, retry_count, retry_rate, cache_hit_count, cache_hit_rate, avg_latency_ms, p95_latency_ms, orchestration_ratio, by_finish_reason; cross-field validators enforce count invariants
  • CallAnalyticsService.get_aggregation() and check_alerts() for retry rate threshold alerting
  • New observability event constants in events/analytics.py
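
The aggregation math can be sketched as below. The output field names mirror `AnalyticsAggregation`; the dict-based record shape, the nearest-rank p95 method, and the zero-records defaults are assumptions for illustration.

```python
import math
from typing import Any

def aggregate(records: list[dict[str, Any]]) -> dict[str, Any]:
    total = len(records)
    if total == 0:
        # Empty input yields zeroed counts and undefined latencies.
        return {"total_calls": 0, "retry_rate": 0.0, "cache_hit_rate": 0.0,
                "avg_latency_ms": None, "p95_latency_ms": None}
    retries = sum(1 for r in records if r.get("retry_count", 0) > 0)
    cache_hits = sum(1 for r in records if r.get("cache_hit"))
    latencies = sorted(r["latency_ms"] for r in records
                       if r.get("latency_ms") is not None)
    avg = sum(latencies) / len(latencies) if latencies else None
    # Nearest-rank percentile: index = ceil(0.95 * n) - 1.
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else None
    return {
        "total_calls": total,
        "retry_rate": retries / total,
        "cache_hit_rate": cache_hits / total,
        "avg_latency_ms": avg,
        "p95_latency_ms": p95,
    }
```

Since both rates divide by `total_calls`, they are guaranteed to land in [0, 1], which is exactly the invariant the Hypothesis property tests below exercise.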

Issue #224 -- Proactive Quota Polling and Alerting

  • QuotaPollerConfig + QuotaAlertThresholds: warn_pct=80.0, critical_pct=95.0, poll_interval_seconds=60.0, cooldown_seconds=300.0
  • QuotaPoller: background asyncio.Task loop, start()/stop() lifecycle, poll_once() for testing
    • Uses max(requests_pct, tokens_pct) to determine severity; dimensions with limit=0 (unlimited) are skipped
    • Cooldown map keyed by (provider, window, level) prevents alert storms
    • Tracker errors are caught, logged, and do not propagate
  • New observability event constants in events/quota.py

Tests

  • tests/unit/budget/test_baseline_store.py -- window eviction, mean computation, empty store returns None
  • tests/unit/budget/test_coordination_collector.py -- all 9 metrics, config.enabled=False, selective collection, alert dispatch, metric error isolation
  • tests/unit/budget/test_coordination_collector_properties.py -- Hypothesis: collect always returns CoordinationMetrics
  • tests/unit/budget/test_call_classifier.py -- all context combinations, priority ordering
  • tests/unit/budget/test_call_classifier_properties.py -- Hypothesis: any valid context produces a valid LLMCallCategory
  • tests/unit/budget/test_call_analytics.py -- aggregation correctness, alert dispatch, empty records, threshold edges
  • tests/unit/budget/test_call_analytics_properties.py -- Hypothesis: retry_rate in [0,1], cache_hit_rate in [0,1], totals match input
  • tests/unit/budget/test_quota_poller.py -- no alert below threshold, warning/critical severity, cooldown, cooldown expiry, multiple providers, resilience
  • tests/unit/providers/test_base_provider.py -- latency injection, retry metadata, retry count reflects attempts
  • tests/unit/providers/test_retry.py -- last_attempt_count / last_retry_reason state

Closes #226
Closes #225
Closes #227
Closes #224

Copilot AI review requested due to automatic review settings April 7, 2026 19:40
@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between e35a792 and 24fbfae.

📒 Files selected for processing (5)
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/providers/models.py
  • tests/unit/providers/test_models.py

Applied to files:

  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-04-07T20:53:54.421Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T20:53:54.421Z
Learning: Applies to src/synthorg/**/*.py : Log all state transitions at INFO level

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG

Applied to files:

  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-04-07T20:53:54.421Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T20:53:54.421Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising

Applied to files:

  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Handle errors explicitly, never silently swallow them. All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-19T21:11:37.538Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T21:11:37.538Z
Learning: Applies to **/*.py : Errors: handle explicitly, never silently swallow.

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-04-02T21:38:30.127Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T21:38:30.127Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Handle errors explicitly; never silently swallow exceptions

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-04-07T20:53:54.421Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T20:53:54.421Z
Learning: Applies to tests/**/*.py : Mock `time.monotonic()` and `asyncio.sleep()` in timing-sensitive tests to make them deterministic; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • src/synthorg/budget/quota_poller.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : Flaky tests: NEVER skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • src/synthorg/budget/quota_poller.py
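The deterministic-timing guidance above can be sketched as follows. This is a minimal illustration, not the project's actual test code: `wait_with_backoff` is a hypothetical coroutine, and the pattern is simply patching `asyncio.sleep` so the test never waits on the wall clock.

```python
import asyncio
from unittest import mock


async def wait_with_backoff(delays: list[float]) -> int:
    """Toy coroutine that sleeps through a list of backoff delays."""
    for delay in delays:
        await asyncio.sleep(delay)
    return len(delays)


def test_backoff_is_deterministic() -> None:
    # Patch asyncio.sleep with an AsyncMock so no real time passes.
    with mock.patch("asyncio.sleep", new=mock.AsyncMock()) as fake_sleep:
        result = asyncio.run(wait_with_backoff([1.0, 2.0, 4.0]))
    assert result == 3
    assert fake_sleep.await_count == 3
```

The same idea extends to `time.monotonic()`: patch it to return scripted values so cooldown or interval logic becomes fully deterministic.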
🔇 Additional comments (21)
src/synthorg/budget/baseline_store.py (5)

1-21: LGTM!

Module setup is clean: correct logger initialization with get_logger(__name__), proper event constant imports from domain-specific module, and the # noqa: TC001 is appropriate since NotBlankStr is used at runtime in Pydantic field definitions.


24-48: LGTM!

Well-designed frozen model with appropriate Pydantic v2 patterns:

  • ConfigDict(frozen=True, allow_inf_nan=False) prevents mutation and rejects NaN/Inf
  • NotBlankStr for identifier fields
  • Sensible domain constraints (gt=0 for turns, [0,1] range for error_rate)
  • UTC-aware timestamp default

The use of float for turns and total_tokens (vs int) is appropriate since these values are averaged via statistics.mean() in the getter methods.


51-72: LGTM!

Clean sliding-window implementation:

  • Proper boundary validation for window_size with descriptive error
  • deque(maxlen=...) handles automatic eviction correctly
  • Thread-safety assumptions documented and consistent with other budget components (CostTracker)

74-91: LGTM!

State mutation correctly logged at INFO level with comprehensive structured context. The automatic eviction behavior from deque is properly documented in the docstring.


93-119: LGTM!

Getter methods follow a consistent pattern with appropriate DEBUG-level logging for insufficient baseline data. Using statistics.mean() with generator expressions is memory-efficient for the sliding window use case.

src/synthorg/budget/call_analytics.py (7)

1-24: LGTM!

Clean module structure with proper TYPE_CHECKING pattern for runtime-deferred imports, correct logger setup via get_logger(__name__), and domain-specific event constants.


27-51: LGTM!

Well-structured service class with clear immutability contract, proper dependency injection via constructor, and appropriate DEBUG-level logging for object creation.


53-94: LGTM!

Aggregation method correctly delegates to CostTracker with proper filter propagation and logs computed metrics at DEBUG level. Previous review feedback regarding NotBlankStr type for provider parameter has been addressed.


96-126: LGTM!

Alert checking with proper early-exit guards, correct retry rate calculation (protected from division by zero by the empty-records guard), and appropriate WARNING-level logging when thresholds are exceeded.


131-179: LGTM!

Pure aggregation function with correct handling of nullable CostRecord fields, explicit division-by-zero guards, and clean use of Counter for finish-reason distribution.


182-201: LGTM!

Correct 95th percentile implementation using linear interpolation with proper edge-case handling for single-element lists and boundary indices.


204-241: LGTM!

Previous review feedback addressed: system exceptions (MemoryError, RecursionError) are properly re-raised using PEP 758 syntax, with other dispatch failures logged at WARNING level with full traceback via exc_info=True. Lazy import pattern correctly avoids circular dependencies.

src/synthorg/budget/quota_poller.py (6)

1-36: LGTM: Module structure and imports.

Good use of TYPE_CHECKING for type-only imports, proper logger setup with get_logger(__name__), and event constants from the domain-specific events module.


48-89: LGTM: Class initialization and lifecycle methods.

The start() method now correctly handles idempotent calls by checking task state before creating a new one. The stop() method properly cancels and awaits the task with appropriate CancelledError suppression.


91-133: LGTM: Polling methods with resilient exception handling.

The poll_once() method properly guards against fatal system exceptions (MemoryError, RecursionError) while logging and recovering from tracker failures. The _poll_loop() correctly re-raises cancellation and system exceptions while keeping the loop alive for other errors.


135-187: LGTM: Snapshot checking with proper cooldown management.

The cooldown logic correctly uses time.monotonic() for timing, keys by (provider, window, level) to allow escalation alerts, and includes the MemoryError/RecursionError guard on dispatch. The defensive max(0.0, ...) on line 160 handles potential floating-point edge cases.
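The cooldown pattern described here can be sketched in isolation (illustrative names; the real poller keys on its own snapshot types):

```python
import time


class AlertCooldown:
    """Suppress repeat alerts per (provider, window, level) for a cooldown period."""

    def __init__(self, cooldown_seconds: float) -> None:
        self._cooldown = cooldown_seconds
        self._last_fired: dict[tuple[str, str, str], float] = {}

    def should_fire(self, provider: str, window: str, level: str) -> bool:
        key = (provider, window, level)
        now = time.monotonic()  # monotonic: immune to wall-clock adjustments
        last = self._last_fired.get(key)
        if last is not None and (now - last) < self._cooldown:
            return False  # still cooling down for this exact key
        self._last_fired[key] = now
        return True
```

Because the level is part of the key, an escalation from "warning" to "critical" fires immediately even while the warning alert is still cooling down.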


190-207: LGTM: Pure utility function for usage calculation.

Correctly computes maximum usage percentage across request and token dimensions, properly skipping unlimited dimensions (limit = 0) and returning None when all are unlimited.


236-242: The NotificationSeverity enum already defines its values as lowercase "warning" and "critical", matching the strings passed at line 238. No issue exists.

> Likely an incorrect or invalid review comment.
src/synthorg/providers/models.py (1)

267-268: Looks good: provider_metadata is now documented and modeled consistently.

The new field docs and default metadata container align cleanly with downstream provider metadata plumbing.

Also applies to: 285-288

tests/unit/providers/test_models.py (2)

383-383: Good normalization of test model IDs to canonical fixtures.

This keeps CompletionResponse test setup consistent across the suite.

As per coding guidelines, "Tests must use test-provider, test-small-001, etc. instead of real vendor names."

Also applies to: 406-407, 418-419, 430-431, 441-442, 453-454


465-514: Nice coverage addition for provider_metadata behavior.

These tests validate default initialization, arbitrary metadata retention, and default-factory isolation between instances.


Walkthrough

Adds budget-domain telemetry and tooling: sliding-window baseline models (BaselineRecord, BaselineStore); per-call analytics service (CallAnalyticsService, AnalyticsAggregation, config and retry-alerting); runtime call classification (ClassificationContext, RulesBasedClassifier, classify_call); coordination collection pipeline (CoordinationMetricsCollector, SimilarityComputer); async quota polling (QuotaPoller, QuotaPollerConfig, QuotaAlertThresholds); extended per-call telemetry on CostRecord/TurnRecord; provider retry metadata flow and RetryResult; new observability event constants; engine integrations and extensive unit tests; exports updated.

@github-actions

github-actions bot commented Apr 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 24fbfae.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive per-call analytics and coordination metrics collection system, including services for call classification, baseline tracking, and proactive quota polling. The infrastructure is updated to capture and propagate latency, cache hits, and retry metadata from the provider layer through to cost records. Review feedback identifies critical Python 3 syntax errors in exception handling and suggests improving metric accuracy by replacing hardcoded values with actual execution data, such as using turn latency for baseline durations and calculating real success rates for efficiency metrics.

Comment on lines +294 to +295
except MemoryError, RecursionError:
raise

critical

The syntax except MemoryError, RecursionError: is invalid in Python 3 and will result in a SyntaxError at runtime. To catch multiple exceptions, they must be grouped in a tuple: except (MemoryError, RecursionError):. This error is repeated multiple times throughout this file (lines 328, 366, 406, 440, 460, 478, 512, 535, and 602).

        except (MemoryError, RecursionError):
            raise

Comment on lines +882 to +883
except MemoryError, RecursionError:
raise

critical

The syntax except MemoryError, RecursionError: is invalid in Python 3. Use a tuple to catch multiple exceptions: except (MemoryError, RecursionError):.

        except (MemoryError, RecursionError):
            raise

turns=max(float(turns), 1.0),
error_rate=error_rate,
total_tokens=float(total_tokens),
duration_seconds=0.0, # Not available from ExecutionResult

medium

Hardcoding duration_seconds=0.0 for baseline records will prevent the TokenSpeedupRatio metric from being computed correctly, as the baseline will have a zero duration. Since TurnRecord now includes latency_ms, you can use it to provide a meaningful duration estimate for the baseline recording.

                duration_seconds=sum(t.latency_ms or 0.0 for t in execution_result.turns) / 1000.0,

error_turns = sum(
1
for t in execution_result.turns
if t.finish_reason.value in ("error", "content_filter")

medium

Comparing finish_reason.value against string literals is fragile. It is recommended to compare against the FinishReason Enum members directly (e.g., FinishReason.ERROR) to ensure type safety and consistency with the rest of the engine (such as the TurnRecord.success property).

return None
try:
result = compute_efficiency(
success_rate=1.0, # Approx: no per-task success rate in ExecutionResult

medium

Instead of hardcoding success_rate=1.0, consider using the actual success rate derived from the execution turns (e.g., 1.0 - error_rate). This would provide a more accurate signal for the coordination efficiency metric, especially in cases where some turns fail but the overall task continues.


Copilot AI left a comment


Pull request overview

Implements the runtime analytics/metrics pipeline: coordination metrics collection (with baselines + alerting), LLM call categorization, per-call metadata propagation/aggregation, and proactive quota polling/alerts.

Changes:

  • Added coordination metrics runtime collector + baseline store, wired into AgentEngine post-execution pipeline.
  • Added per-call metadata capture (latency/retry/cache/finish_reason/success) from provider → turn records → cost records, plus aggregation + retry-rate alerting.
  • Added background quota polling service with cooldown-based alert suppression and expanded observability events.

Reviewed changes

Copilot reviewed 44 out of 44 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/unit/providers/test_retry.py Tests for RetryHandler state exposure/reset behavior.
tests/unit/providers/test_models.py Tests CompletionResponse.provider_metadata defaults/behavior.
tests/unit/providers/test_base_provider.py Tests _synthorg_* metadata injection from base provider.
tests/unit/observability/test_events.py Ensures new events module is discovered.
tests/unit/engine/test_loop_protocol.py Tests new TurnRecord analytics fields + computed success.
tests/unit/engine/test_loop_helpers.py Tests extracting analytics fields from provider_metadata in make_turn_record().
tests/unit/engine/test_cost_recording.py Tests propagation of analytics fields from TurnRecord → CostRecord.
tests/unit/budget/test_quota_poller.py Tests quota poller lifecycle, thresholds, cooldown, resilience.
tests/unit/budget/test_quota_poller_config.py Tests validation/defaults for quota poller config models.
tests/unit/budget/test_cost_record.py Tests new analytics fields and JSON roundtrip on CostRecord.
tests/unit/budget/test_coordination_collector.py Tests coordination metrics collection paths, selective collection, alerts, isolation.
tests/unit/budget/test_coordination_collector_properties.py Property-based invariants for coordination collector.
tests/unit/budget/test_call_classifier.py Tests rules-based call classification and context validation.
tests/unit/budget/test_call_classifier_properties.py Property-based invariants for call classification.
tests/unit/budget/test_call_analytics.py Tests aggregation correctness + retry alert dispatch behavior.
tests/unit/budget/test_call_analytics_properties.py Property-based invariants for analytics aggregation.
tests/unit/budget/test_call_analytics_config.py Tests validation/defaults for call analytics config models.
tests/unit/budget/test_baseline_store.py Tests sliding-window baseline store behavior/validation.
tests/unit/budget/conftest.py Adjusts factory defaults for retry field consistency.
src/synthorg/providers/resilience/retry.py Adds RetryHandler last attempt count + retry reason state/reset.
src/synthorg/providers/models.py Adds CompletionResponse.provider_metadata field.
src/synthorg/providers/base.py Measures latency and injects _synthorg_* metadata into responses.
src/synthorg/observability/events/quota.py Adds quota poller event constants.
src/synthorg/observability/events/coordination_metrics.py Adds runtime coordination metric pipeline event constants.
src/synthorg/observability/events/call_classification.py Adds call classification event constants.
src/synthorg/observability/events/analytics.py Adds per-call analytics event constants.
src/synthorg/engine/react_loop.py Passes provider_metadata into make_turn_record().
src/synthorg/engine/plan_execute_loop.py Passes provider_metadata into make_turn_record() (planner + step turns).
src/synthorg/engine/loop_protocol.py Extends TurnRecord with analytics fields + computed success.
src/synthorg/engine/loop_helpers.py Extracts _synthorg_* values from provider_metadata when building TurnRecord.
src/synthorg/engine/hybrid_loop.py Passes provider_metadata into make_turn_record().
src/synthorg/engine/hybrid_helpers.py Passes provider_metadata into make_turn_record() for hybrid helpers.
src/synthorg/engine/cost_recording.py Propagates new per-call fields into CostRecord.
src/synthorg/engine/agent_engine.py Adds best-effort coordination metrics collection to post-execution pipeline.
src/synthorg/budget/quota_poller.py New background quota poller with cooldown + alert dispatch.
src/synthorg/budget/quota_poller_config.py New quota poller config + alert thresholds models.
src/synthorg/budget/cost_record.py Adds per-call analytics fields + retry consistency validator.
src/synthorg/budget/coordination_collector.py New runtime coordination metrics collector + alerting.
src/synthorg/budget/call_classifier.py New stateless rules-based call categorization service.
src/synthorg/budget/call_analytics.py New call analytics aggregation + retry-rate alerting service.
src/synthorg/budget/call_analytics_models.py New AnalyticsAggregation model with invariants.
src/synthorg/budget/call_analytics_config.py New config models for call analytics + retry alerts.
src/synthorg/budget/baseline_store.py New sliding-window baseline store for SAS baselines.
src/synthorg/budget/__init__.py Exports newly added budget/analytics/quota APIs.


self._retry_handler.last_retry_reason
)

result = result.model_copy(update={"provider_metadata": metadata})

Copilot AI Apr 7, 2026


complete() overwrites any existing CompletionResponse.provider_metadata by replacing it with a new dict containing only _synthorg_* keys. If a provider implementation populates provider_metadata with its own fields (e.g., request IDs, cache/debug info), those will be lost. Consider merging into the existing dict (copy + update) rather than replacing it, and avoid clobbering keys unless intended.

Suggested change
result = result.model_copy(update={"provider_metadata": metadata})
merged_metadata = dict(result.provider_metadata or {})
merged_metadata.update(metadata)
result = result.model_copy(update={"provider_metadata": merged_metadata})

Comment on lines +482 to +503
@@ -476,6 +493,14 @@ def make_turn_record(
tool_call_fingerprints=compute_fingerprints(response.tool_calls),
finish_reason=response.finish_reason,
call_category=call_category,
latency_ms=float(cast("float", latency_ms_raw))
if latency_ms_raw is not None
else None,
cache_hit=bool(cache_hit_raw) if cache_hit_raw is not None else None,
retry_count=int(cast("int", retry_count_raw))
if retry_count_raw is not None
else None,
retry_reason=str(retry_reason_raw) if retry_reason_raw is not None else None,

Copilot AI Apr 7, 2026


make_turn_record() coerces values from provider_metadata using bool(...), int(...), and float(...). This can silently misinterpret values (e.g., the string 'False' becomes True) or raise at runtime if a provider supplies an unexpected type, which would break the execution loop when building TurnRecord. Consider validating types explicitly (e.g., isinstance(x, (int, float)), isinstance(x, bool), etc.) and ignoring or logging invalid metadata instead of coercing blindly.

Comment on lines +194 to +221
# For single-agent runs: record baseline, return empty metrics.
if not is_multi_agent and self._baseline_store is not None:
from synthorg.budget.baseline_store import BaselineRecord # noqa: PLC0415

baseline = BaselineRecord(
agent_id=agent_id,
task_id=task_id,
turns=max(float(turns), 1.0),
error_rate=error_rate,
total_tokens=float(total_tokens),
duration_seconds=0.0, # Not available from ExecutionResult
)
self._baseline_store.record(baseline)
logger.debug(
COORD_METRICS_COLLECTION_COMPLETED,
agent_id=agent_id,
task_id=task_id,
is_multi_agent=False,
metrics_computed=0,
)
return CoordinationMetrics()

# Multi-agent run: compute all enabled metrics.
efficiency = await self._try_collect_efficiency(turns)
overhead = await self._try_collect_overhead(turns)
error_amplification = await self._try_collect_error_amplification(error_rate)
message_density = await self._try_collect_message_density(turns)
redundancy_rate = await self._try_collect_redundancy(agent_outputs)

Copilot AI Apr 7, 2026


collect() only returns early for single-agent runs when baseline_store is provided. If is_multi_agent=False and baseline_store is None, the method falls through into the “Multi-agent run” block and may compute metrics (e.g., message density / amdahl ceiling) for a single-agent execution, which contradicts the docstring/spec (“single-agent runs … return empty metrics”). Suggest changing the control flow to return CoordinationMetrics() whenever is_multi_agent is false (recording a baseline only if a store is configured).

Comment on lines +61 to +74
async def start(self) -> None:
"""Start the background polling loop.

Creates an ``asyncio.Task`` that calls :meth:`poll_once`
repeatedly at the configured interval.
"""
self._task = asyncio.get_running_loop().create_task(
self._poll_loop(),
name="quota-poller",
)
logger.info(
QUOTA_POLLER_STARTED,
interval=self._config.poll_interval_seconds,
)

QuotaPoller.start() always creates a new background task and overwrites self._task without checking whether the poller is already running. Calling start() twice will leak the original task and create multiple polling loops. Consider guarding with if self._task is not None and not self._task.done(): return/raise, and (optionally) no-op when config.enabled is false to enforce the opt-in behavior.

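An idempotent `start()` with the suggested guards might look like this (class and flag names are a simplified sketch, not the real QuotaPoller):

```python
# Hypothetical sketch: start() is a no-op when already running or when
# the poller is disabled, so a second call cannot leak a task.
import asyncio


class QuotaPollerSketch:
    def __init__(self, enabled: bool = True) -> None:
        self._enabled = enabled
        self._task: asyncio.Task | None = None

    async def start(self) -> None:
        if not self._enabled:
            return  # opt-in behavior: never poll when disabled
        if self._task is not None and not self._task.done():
            return  # already running: do not spawn a second loop
        self._task = asyncio.get_running_loop().create_task(
            self._poll_loop(), name="quota-poller"
        )

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None

    async def _poll_loop(self) -> None:
        while True:
            await asyncio.sleep(3600)


async def main() -> bool:
    poller = QuotaPollerSketch()
    await poller.start()
    first = poller._task
    await poller.start()  # idempotent: same task object, no leak
    same = poller._task is first
    await poller.stop()
    return same


print(asyncio.run(main()))  # True
```

Raising instead of returning on the second `start()` would also work (fail-fast); the important part is checking `self._task` before `create_task`.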
Comment on lines +157 to +162
await _dispatch_quota_alert(
self._dispatcher,
snap=snap,
level=level,
usage_pct=usage_pct,
)

poll_once() is documented as “Individual poll failures are logged and do not propagate”, but _check_snapshot() awaits _dispatch_quota_alert() without any exception handling. If NotificationDispatcher.dispatch() raises, the exception will bubble up and can terminate the background _poll_loop() task. Wrap the dispatch call (or the body of _check_snapshot) in a broad try/except that logs and returns, similar to the tracker error handling.

Suggested change
await _dispatch_quota_alert(
self._dispatcher,
snap=snap,
level=level,
usage_pct=usage_pct,
)
try:
await _dispatch_quota_alert(
self._dispatcher,
snap=snap,
level=level,
usage_pct=usage_pct,
)
except Exception:
logger.exception(
"Quota alert dispatch failed",
provider=snap.provider_name,
window=snap.window.value,
level=level,
)
return

agent_id=agent_id,
task_id=task_id,
)
except MemoryError, RecursionError:

except MemoryError, RecursionError: is invalid syntax in Python 3 (multiple exception types must be parenthesized). This will prevent the module from importing. Use except (MemoryError, RecursionError): instead (and apply consistently for other multi-exception handlers).

Suggested change
except MemoryError, RecursionError:
except (MemoryError, RecursionError):

Comment on lines +282 to +299
try:
result = compute_efficiency(
success_rate=1.0, # Approx: no per-task success rate in ExecutionResult
turns_mas=max(float(turns_mas), 1.0),
turns_sas=turns_sas,
)
logger.info(
COORD_METRICS_EFFICIENCY_COMPUTED,
value=result.value,
turns_mas=turns_mas,
turns_sas=turns_sas,
)
except MemoryError, RecursionError:
raise
except Exception as exc:
logger.warning(
COORD_METRICS_COLLECTION_FAILED,
metric="efficiency",

This file uses except MemoryError, RecursionError: in multiple places, which is invalid syntax in Python 3 (multiple exception types must be parenthesized). Replace with except (MemoryError, RecursionError): to avoid import-time syntax errors.

Comment on lines +551 to +557
"""Fire notifications for coordination overhead threshold crossings."""
if self._notification_dispatcher is None:
return

overhead = metrics.overhead
if overhead is None:
return

Collector docstrings say “When notification_dispatcher is None, alerts are logged but not dispatched”, but _fire_alerts() returns immediately when the dispatcher is None, so no alert event is logged at all. Either adjust the docs or log COORD_METRICS_ALERT_FIRED (or a dedicated event) even when no dispatcher is configured.

@codecov

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 84.77876% with 86 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.02%. Comparing base (a9acda3) to head (24fbfae).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/budget/coordination_collector.py 72.72% 39 Missing and 6 partials ⚠️
src/synthorg/budget/quota_poller.py 79.56% 17 Missing and 2 partials ⚠️
src/synthorg/engine/agent_engine.py 36.36% 6 Missing and 1 partial ⚠️
src/synthorg/budget/call_analytics.py 84.61% 5 Missing and 1 partial ⚠️
src/synthorg/budget/call_analytics_models.py 75.00% 4 Missing and 2 partials ⚠️
src/synthorg/engine/loop_protocol.py 80.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1127      +/-   ##
==========================================
- Coverage   89.07%   89.02%   -0.06%     
==========================================
  Files         825      834       +9     
  Lines       48164    48719     +555     
  Branches     4845     4908      +63     
==========================================
+ Hits        42904    43371     +467     
- Misses       4358     4432      +74     
- Partials      902      916      +14     

☔ View full report in Codecov by Sentry.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/providers/resilience/retry.py (1)

39-72: ⚠️ Potential issue | 🟠 Major

Race condition on shared mutable state in concurrent scenarios.

last_attempt_count and last_retry_reason are instance attributes on a shared RetryHandler. Per src/synthorg/providers/base.py:62-68, a single RetryHandler is reused across all concurrent complete() calls on the same provider instance.

If two coroutines invoke complete() concurrently:

  1. Coroutine A finishes execute(), returns
  2. Coroutine B starts execute(), resets last_attempt_count = 0 (line 60)
  3. Coroutine A reads self._retry_handler.last_attempt_count in base.py (line 130) — gets 0 instead of A's actual attempt count

This produces incorrect telemetry under concurrent load.

🔧 Suggested fix: return retry info from execute() instead of storing on self
+from dataclasses import dataclass
+
+@dataclass(frozen=True, slots=True)
+class RetryResult(Generic[T]):
+    """Result of a retry-wrapped execution."""
+    value: T
+    attempt_count: int
+    retry_reason: str | None
+
 class RetryHandler:
-    def __init__(self, config: RetryConfig) -> None:
-        self._config = config
-        self.last_attempt_count: int = 0
-        self.last_retry_reason: str | None = None
+    def __init__(self, config: RetryConfig) -> None:
+        self._config = config

     async def execute(
         self,
         func: Callable[[], Coroutine[object, object, T]],
-    ) -> T:
+    ) -> RetryResult[T]:
         ...
-        self.last_attempt_count = 0
-        self.last_retry_reason = None
+        attempt_count = 0
+        retry_reason: str | None = None
         ...
         for attempt in range(1 + self._config.max_retries):
-            self.last_attempt_count = attempt + 1
+            attempt_count = attempt + 1
             try:
-                return await func()
+                return RetryResult(await func(), attempt_count, retry_reason)
             except ProviderError as exc:
                 ...
-                self.last_retry_reason = type(exc).__name__
+                retry_reason = type(exc).__name__

Then update base.py to destructure the result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/providers/resilience/retry.py` around lines 39 - 72, The
instance attributes last_attempt_count and last_retry_reason on the RetryHandler
are mutated inside execute(), causing race conditions when the same handler is
reused concurrently; change execute (in
src/synthorg/providers/resilience/retry.py) to return both the function result
and a small immutable retry info object/tuple (e.g., (result, attempt_count,
retry_reason)) instead of writing to self.last_attempt_count /
self.last_retry_reason, and update the caller in base.py (complete()) to
destructure that returned tuple and use the returned attempt_count and
retry_reason for telemetry; remove or stop relying on the mutable instance
fields to eliminate the race.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/call_analytics.py`:
- Around line 52-93: The provider parameter on get_aggregation is currently
typed as str | None but must match CostTracker.get_records which expects
NotBlankStr | None; update the signature of get_aggregation to use NotBlankStr |
None for provider, add the corresponding NotBlankStr import at the top of the
module, and ensure any internal calls (the call to self._tracker.get_records)
pass the provider unchanged so the types align with CostTracker.get_records.

In `@src/synthorg/budget/call_classifier.py`:
- Around line 56-63: The fields tool_calls_made and agent_role allow
whitespace-only identifiers; change their types to use NotBlankStr from
core.types so identifier-like metadata cannot be blank. Replace tool_calls_made:
tuple[str, ...] with tuple[NotBlankStr, ...] (keeping default=() and
description) and change agent_role: str | None to agent_role: NotBlankStr | None
(keeping default=None and description); keep using Field for metadata and import
NotBlankStr from core.types if not already imported.

In `@src/synthorg/budget/coordination_collector.py`:
- Around line 451-543: The derived-metrics helpers (_try_collect_amdahl,
_try_collect_straggler_gap, _try_collect_token_speedup,
_try_collect_message_overhead) are being invoked unconditionally from collect()
even when config.collect disables those metrics; update the call sites in
collect() (or add checks inside each helper) to consult the same enablement flag
used elsewhere (_is_enabled or config.collect) before computing/logging each
metric (use the metric keys "amdahl_ceiling", "straggler_gap",
"token_speedup_ratio", "message_overhead"); ensure you short-circuit and return
None when the metric is disabled so expensive compute_* functions are not
executed.
- Around line 198-205: The BaselineRecord being created with
duration_seconds=0.0 makes it unusable for token_speedup calculations; modify
the code that constructs BaselineRecord (the BaselineRecord(...) block) to only
create/save a baseline when the actual duration is > 0 (e.g., check
execution_result.duration_seconds or equivalent and skip creating the
BaselineRecord if <= 0), so downstream _try_collect_token_speedup can compute
token_speedup_ratio correctly; do not record zero-duration baselines.
- Around line 139-267: The collect() method is too large; split it into clear
phases and move blocks into small helpers: extract run stats (turns,
error_turns/error_rate, total_tokens) into a new
_extract_run_stats(execution_result) helper, move the single-agent baseline
capture into _handle_single_agent_baseline(baseline_store, agent_id, task_id,
turns, error_rate, total_tokens) (which records BaselineRecord and returns
early), orchestrate metric collection calls into _compute_metrics_phase(...)
that calls the existing _try_collect_* helpers and returns the raw metric
values, and move aggregation/logging/alerting into
_aggregate_and_finalize(metrics, agent_id, task_id, is_multi_agent) which builds
CoordinationMetrics, logs COORD_METRICS_COLLECTION_COMPLETED and calls
_fire_alerts; update collect() to call these helpers in sequence and keep it
under 50 lines. Ensure you reference and use existing symbols: collect,
_try_collect_efficiency, _try_collect_overhead,
_try_collect_error_amplification, _try_collect_message_density,
_try_collect_redundancy, _try_collect_amdahl, _try_collect_straggler_gap,
_try_collect_token_speedup, _try_collect_message_overhead, BaselineRecord,
_baseline_store.record, CoordinationMetrics, and _fire_alerts.
- Around line 145-148: The collect() flow incorrectly infers multi/single-agent
mode leading to misrouting; make execution mode explicit and short-circuit
single-agent runs: update collect() in coordination_collector.py to accept and
honor an explicit is_multi_agent boolean (and perform an early return when False
regardless of _baseline_store), ensure any multi-agent branching only runs when
is_multi_agent is True and _baseline_store exists, and update AgentEngine
(engine/agent_engine.py) call sites that invoke collect() to pass the correct
is_multi_agent flag (true for multi-agent runs, false for single-agent) — apply
the same explicit-flag change to the other collect overload/usage region around
the second occurrence (the section referenced as 195-215) so both paths behave
consistently.

In `@src/synthorg/budget/quota_poller.py`:
- Around line 61-74: The start() method currently overwrites self._task allowing
multiple poll loops; make start() idempotent or fail-fast by checking the
existing self._task before creating a new task: if self._task exists and not
self._task.done() either return immediately (idempotent behavior) or raise a
clear exception (fail-fast) so a second start() cannot spawn a second
_poll_loop; update start() to reference self._task and _poll_loop, and ensure
stop() still cancels the active task as implemented.
- Around line 112-116: The poll loop must not die on unexpected exceptions from
poll_once/_check_snapshot/_dispatch_quota_alert; wrap the await self.poll_once()
call in a try/except that catches Exception, re-raises asyncio.CancelledError,
logs the full exception (e.g., self._logger.exception or similar) and continues
the loop so future polling still runs; update _poll_loop to perform this guarded
call while preserving cancellation semantics.

In `@src/synthorg/engine/agent_engine.py`:
- Around line 877-881: CoordinationResult returned by
MultiAgentCoordinator.coordinate() isn't passed into the metrics collector, so
multi-agent runs never call CoordinationMetricsCollector.collect(); update
AgentEngine.coordinate() (or the controller that invokes it) to, after awaiting
MultiAgentCoordinator.coordinate(), extract per-agent info from
CoordinationResult.waves (e.g., agent durations and outputs), compute team_size
and total duration/cost as needed, and call
self._coordination_metrics_collector.collect(...) with is_multi_agent=True,
team_size=..., agent_durations=..., agent_outputs=... (and any other fields used
by the single-agent call) to mirror the single-agent collection performed in
AgentEngine.run().

In `@tests/unit/budget/test_call_classifier_properties.py`:
- Around line 34-88: Add a property test to assert COORDINATION precedence over
SYSTEM by creating a Hypothesis build of ClassificationContext (use the existing
patterns in TestCallClassifierProperties) with is_embedding_operation set to
False, tool_calls_made as (), agent_role None, and booleans for
is_delegation/is_review/is_meeting/is_planning_phase/is_system_prompt/is_quality_judge,
then .filter(...) to require at least one of the coordination flags
(is_delegation or is_review or is_meeting) be True and assert classify_call(ctx)
== LLMCallCategory.COORDINATION; place this new test method alongside the other
tests in TestCallClassifierProperties using the same strategies for
turn_number/agent_id/task_id to match style.

In `@tests/unit/budget/test_coordination_collector_properties.py`:
- Around line 121-149: The test test_overhead_sign_consistent_with_turns is
flaky because n_turns = max(1, round(turns_mas)) is compared to the original
float turns_sas; change the check to compare the actual integer turns used
(n_turns) against the integer baseline used to build the store (e.g., use
round(turns_sas) or compute baseline_n_turns = max(1, round(turns_sas)) and
compare n_turns >= baseline_n_turns) so the assertion on
result.overhead.value_percent reflects the turns passed to collector.collect
rather than the original float turns_sas.

In `@tests/unit/budget/test_cost_record.py`:
- Around line 393-409: The tests are missing coverage that
`_validate_retry_consistency` on CostRecord rejects invalid combinations; add
unit tests that instantiate CostRecord with retry_reason set but retry_count
omitted (None) and with retry_count set to 0, and assert that both raise
pydantic.ValidationError (use pytest.raises and match a message like
"retry_reason set implies retry_count"); reference CostRecord and the validator
name `_validate_retry_consistency` in the test names (e.g.,
test_retry_reason_without_retry_count_rejected and
test_retry_reason_with_zero_retry_count_rejected) so failures map clearly to the
validation logic.

In `@tests/unit/budget/test_quota_poller.py`:
- Around line 218-243: The test uses a slightly non-obvious datetime mock
pattern in test_cooldown_expiry_allows_re_alert: mock_dt.now.side_effect is used
to control datetime.now() returns while mock_dt.side_effect = datetime lets real
datetime(...) constructor calls pass through; update the test to add a short
clarifying comment immediately above the two lines referencing
mock_dt.now.side_effect and mock_dt.side_effect so future readers understand
that mock_dt.now controls now() and mock_dt.side_effect preserves the real
datetime constructor behavior.

In `@tests/unit/providers/test_retry.py`:
- Around line 114-126: Update the test_non_retryable_error_count_one test to
also assert that handler.last_retry_reason is None after a non-retryable error;
locate the test function (test_non_retryable_error_count_one) and the
RetryHandler instance created there, and add an assertion like assert
handler.last_retry_reason is None to confirm no retry reason was recorded when
InvalidRequestError is raised.

---

Outside diff comments:
In `@src/synthorg/providers/resilience/retry.py`:
- Around line 39-72: The instance attributes last_attempt_count and
last_retry_reason on the RetryHandler are mutated inside execute(), causing race
conditions when the same handler is reused concurrently; change execute (in
src/synthorg/providers/resilience/retry.py) to return both the function result
and a small immutable retry info object/tuple (e.g., (result, attempt_count,
retry_reason)) instead of writing to self.last_attempt_count /
self.last_retry_reason, and update the caller in base.py (complete()) to
destructure that returned tuple and use the returned attempt_count and
retry_reason for telemetry; remove or stop relying on the mutable instance
fields to eliminate the race.
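The guarded poll loop described in the quota_poller prompt above (lines 112-116) can be sketched like this, with hypothetical names (`poll_loop`, `flaky_poll`) standing in for `_poll_loop` and `poll_once`:

```python
# Hypothetical sketch: unexpected errors are recorded and the loop keeps
# polling; cancellation is re-raised so stop() still works.
import asyncio


async def poll_loop(poll_once, interval: float, errors: list[str]) -> None:
    while True:
        try:
            await poll_once()
        except asyncio.CancelledError:
            raise  # preserve cancellation semantics
        except Exception as exc:
            errors.append(type(exc).__name__)  # log and keep polling
        await asyncio.sleep(interval)


async def main() -> list[str]:
    calls = 0
    errors: list[str] = []

    async def flaky_poll() -> None:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("provider unavailable")

    task = asyncio.create_task(poll_loop(flaky_poll, 0.01, errors))
    await asyncio.sleep(0.05)  # let several iterations run
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return errors


print(asyncio.run(main()))  # ['RuntimeError'] -- loop survived the failure
```

The first poll fails but later iterations still run; in the real module the `except Exception` arm would use `logger.exception` rather than an in-memory list.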

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0b7bd395-6806-4ab7-b6d6-10d3b90f7609

📥 Commits

Reviewing files that changed from the base of the PR and between 59b31f9 and 98e9573.

📒 Files selected for processing (44)
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/coordination_collector.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/cost_recording.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/providers/base.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/resilience/retry.py
  • tests/unit/budget/conftest.py
  • tests/unit/budget/test_baseline_store.py
  • tests/unit/budget/test_call_analytics.py
  • tests/unit/budget/test_call_analytics_config.py
  • tests/unit/budget/test_call_analytics_properties.py
  • tests/unit/budget/test_call_classifier.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/budget/test_quota_poller.py
  • tests/unit/budget/test_quota_poller_config.py
  • tests/unit/engine/test_cost_recording.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_retry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Agent
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: (no parentheses) instead of except (A, B): for multiple exceptions—PEP 758 except syntax enforced by ruff on Python 3.14.
All public functions must have type hints and Google-style docstrings (enforced by ruff D rules). mypy strict mode is required.
Line length must be 88 characters (enforced by ruff).

Files:

  • src/synthorg/engine/hybrid_loop.py
  • tests/unit/budget/conftest.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/providers/models.py
  • src/synthorg/engine/cost_recording.py
  • src/synthorg/engine/plan_execute_loop.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/test_models.py
  • src/synthorg/providers/resilience/retry.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/providers/base.py
  • tests/unit/engine/test_loop_protocol.py
  • src/synthorg/engine/loop_protocol.py
  • tests/unit/budget/test_call_analytics_properties.py
  • src/synthorg/engine/loop_helpers.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/engine/test_cost_recording.py
  • tests/unit/providers/test_retry.py
  • tests/unit/budget/test_call_analytics_config.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_call_analytics.py
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/cost_record.py
  • tests/unit/budget/test_call_classifier.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/budget/test_baseline_store.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_quota_poller_config.py
  • tests/unit/providers/test_base_provider.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/budget/call_analytics_models.py
  • tests/unit/budget/test_quota_poller.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/coordination_collector.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Prefer creating new objects over mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Functions must be < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr) for handler constructio...

Files:

  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/providers/models.py
  • src/synthorg/engine/cost_recording.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/providers/resilience/retry.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/providers/base.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/coordination_collector.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/providers/models.py
  • src/synthorg/engine/cost_recording.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/providers/resilience/retry.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/providers/base.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/coordination_collector.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Use asyncio_mode = "auto" for async tests—no manual @pytest.mark.asyncio needed.
Global test timeout is 30 seconds per test (configured in pyproject.toml). Do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed.
Prefer @pytest.mark.parametrize for testing similar cases.
In tests, use test-provider, test-small-001, etc. instead of real vendor names.
Use Hypothesis for property-based testing: @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, 10 examples), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.

Files:

  • tests/unit/budget/conftest.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/test_models.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/budget/test_call_analytics_properties.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/engine/test_cost_recording.py
  • tests/unit/providers/test_retry.py
  • tests/unit/budget/test_call_analytics_config.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_call_analytics.py
  • tests/unit/budget/test_call_classifier.py
  • tests/unit/budget/test_baseline_store.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_quota_poller_config.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/budget/test_quota_poller.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/budget/conftest.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/test_models.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/budget/test_call_analytics_properties.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/engine/test_cost_recording.py
  • tests/unit/providers/test_retry.py
  • tests/unit/budget/test_call_analytics_config.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_call_analytics.py
  • tests/unit/budget/test_call_classifier.py
  • tests/unit/budget/test_baseline_store.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_quota_poller_config.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/budget/test_quota_poller.py
src/synthorg/providers/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/providers/**/*.py: All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately. RetryExhaustedError signals all retries failed.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/resilience/retry.py
  • src/synthorg/providers/base.py
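The retry contract stated above (retryable errors are retried, non-retryable errors surface immediately, exhaustion is signalled explicitly) can be sketched in isolation. The error names below mirror the rule text, but this is a minimal illustration, not the project's BaseCompletionProvider:

```python
import asyncio
from collections.abc import Awaitable, Callable


class ProviderError(Exception):
    """Base provider error; is_retryable drives the retry decision."""

    is_retryable = False


class RateLimitError(ProviderError):
    is_retryable = True


class RetryExhaustedError(Exception):
    """All retry attempts failed."""


async def call_with_retry(
    fn: Callable[[], Awaitable[str]],
    max_retries: int = 3,
    base_delay: float = 0.0,
) -> str:
    for attempt in range(max_retries + 1):
        try:
            return await fn()
        except ProviderError as exc:
            if not exc.is_retryable:
                raise  # non-retryable: surface immediately, no retry
            if attempt == max_retries:
                raise RetryExhaustedError from exc
            await asyncio.sleep(base_delay * 2**attempt)  # exponential backoff
    raise AssertionError("unreachable")
```

Centralizing this in one base class is what lets the rule forbid retry logic in driver subclasses and calling code.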
🧠 Learnings (52)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/providers/models.py
  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.

Applied to files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.

Applied to files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.

Applied to files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-31T14:17:24.182Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:17:24.182Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code

Applied to files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.

Applied to files:

  • src/synthorg/providers/models.py
  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
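The event-constant convention above amounts to each domain module exporting plain string constants that call sites import by name. A hypothetical quota module following it (the constant names and values are illustrative, not the repo's actual ones):

```python
# synthorg/observability/events/quota.py (hypothetical sketch)
# Plain string constants, so logging call sites import names instead of
# scattering string literals.
QUOTA_POLL_START = "quota.poll.start"
QUOTA_POLL_COMPLETE = "quota.poll.complete"
QUOTA_EXHAUSTED = "quota.exhausted"
```

A call site then imports directly, e.g. `from synthorg.observability.events.quota import QUOTA_POLL_START`, and passes the constant to the structured logger.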
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_retry.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately. `RetryExhaustedError` signals all retries failed.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_retry.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_retry.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_retry.py
  • src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_retry.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects `RateLimitError.retry_after` from providers—automatically pauses future requests.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-31T14:17:24.182Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:17:24.182Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests

Applied to files:

  • src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests.

Applied to files:

  • src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-31T21:07:37.469Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.469Z
Learning: Applies to src/synthorg/providers/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`

Applied to files:

  • src/synthorg/providers/resilience/retry.py
  • src/synthorg/engine/loop_helpers.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Never implement retry logic in provider subclasses or calling code — it is handled automatically by `BaseCompletionProvider` with `RetryConfig` and `RateLimiterConfig` per-provider

Applied to files:

  • src/synthorg/providers/base.py
  • tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/budget/test_call_analytics_properties.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/budget/test_coordination_collector.py
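The guidance above (mock time.monotonic() and asyncio.sleep() to make timing-sensitive tests deterministic, rather than widening margins) can be sketched like this; wait_until is a hypothetical helper, not repo code:

```python
import asyncio
import time
from unittest import mock


async def wait_until(deadline: float) -> int:
    """Hypothetical timing-sensitive helper: count 1-second ticks until deadline."""
    ticks = 0
    while time.monotonic() < deadline:
        await asyncio.sleep(1.0)
        ticks += 1
    return ticks


def test_wait_until_is_deterministic() -> None:
    clock = {"now": 0.0}

    async def fake_sleep(seconds: float) -> None:
        clock["now"] += seconds  # advance a virtual clock instead of waiting

    with mock.patch("time.monotonic", lambda: clock["now"]), \
         mock.patch("asyncio.sleep", fake_sleep):
        # Exactly three virtual ticks, zero real wall-clock time.
        assert asyncio.run(wait_until(3.0)) == 3
```

With the clock and sleep patched, the test asserts an exact tick count instead of a fuzzy elapsed-time window, so it cannot flake under load.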
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
  • tests/unit/providers/test_base_provider.py
  • src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.

Applied to files:

  • src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • tests/unit/budget/test_call_analytics.py
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/quota.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

Applied to files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement

Applied to files:

  • src/synthorg/engine/agent_engine.py
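The read-only-collection convention above (deepcopy at construction, MappingProxyType for enforcement) can be sketched as follows; ToolRegistry here is a hypothetical stand-in, not the repo's actual registry:

```python
import copy
from types import MappingProxyType


class ToolRegistry:
    """Hypothetical registry: deep-copies entries at construction and exposes
    a read-only mapping view so callers cannot mutate internal state."""

    def __init__(self, tools: dict[str, dict]) -> None:
        self._tools = copy.deepcopy(tools)          # defensive copy at construction
        self.tools = MappingProxyType(self._tools)  # read-only view for callers
```

The deepcopy isolates the registry from later mutation of the caller's dict, while the MappingProxyType makes any write attempt on the exposed view raise a TypeError.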
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in `ProviderConfig`.

Applied to files:

  • tests/unit/providers/test_base_provider.py
  • src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-16T19:13:36.562Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.

Applied to files:

  • tests/unit/providers/test_base_provider.py
  • src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/budget/quota_poller_config.py

Comment on lines +114 to +126
async def test_non_retryable_error_count_one(self) -> None:
"""Non-retryable errors raise immediately with count=1."""
from synthorg.providers.errors import InvalidRequestError

handler = RetryHandler(_config(max_retries=3))

async def _bad_request() -> str:
raise InvalidRequestError("bad") # noqa: EM101

with pytest.raises(InvalidRequestError):
await handler.execute(_bad_request)

assert handler.last_attempt_count == 1

🧹 Nitpick | 🔵 Trivial

Consider asserting last_retry_reason is None for non-retryable errors.

The test verifies last_attempt_count == 1 but doesn't assert the expected value of last_retry_reason. Based on the test's docstring ("Non-retryable errors raise immediately with count=1"), last_retry_reason should likely be None since no retry occurred.

💡 Suggested addition
         with pytest.raises(InvalidRequestError):
             await handler.execute(_bad_request)
 
         assert handler.last_attempt_count == 1
+        assert handler.last_retry_reason is None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/test_retry.py` around lines 114 - 126, Update the
test_non_retryable_error_count_one test to also assert that
handler.last_retry_reason is None after a non-retryable error; locate the test
function (test_non_retryable_error_count_one) and the RetryHandler instance
created there, and add an assertion like assert handler.last_retry_reason is
None to confirm no retry reason was recorded when InvalidRequestError is raised.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/budget/test_coordination_config.py (1)

19-27: 🧹 Nitpick | 🔵 Trivial

Consider extending test_values to cover new enum members.

The test_member_count assertion is correctly updated to 9. However, test_values only asserts the original 5 members. For completeness, consider adding assertions for the 4 new members (AMDAHL_CEILING, STRAGGLER_GAP, TOKEN_SPEEDUP_RATIO, MESSAGE_OVERHEAD).

🧪 Suggested additions to test_values
     def test_values(self) -> None:
         assert CoordinationMetricName.EFFICIENCY.value == "efficiency"
         assert CoordinationMetricName.OVERHEAD.value == "overhead"
         assert CoordinationMetricName.ERROR_AMPLIFICATION.value == "error_amplification"
         assert CoordinationMetricName.MESSAGE_DENSITY.value == "message_density"
         assert CoordinationMetricName.REDUNDANCY.value == "redundancy"
+        assert CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling"
+        assert CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap"
+        assert CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio"
+        assert CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/budget/test_coordination_config.py` around lines 19 - 27, Update
test_values to assert the string values for the four new enum members on
CoordinationMetricName: add assertions that
CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling",
CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap",
CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio", and
CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"; keep the
existing five assertions and leave test_member_count as-is (already expecting
9).
♻️ Duplicate comments (2)
src/synthorg/engine/agent_engine.py (1)

877-882: ⚠️ Potential issue | 🔴 Critical

Multi-agent runs still never reach the coordination collector.

This hook always records a single-agent baseline (is_multi_agent=False), and Line 475 still returns self._coordinator.coordinate(context) directly. Because CoordinationMetricsCollector.collect() returns early for single-agent runs, multi-agent executions never compute Ec/O%/R/etc. Add a post-coordinate() collection step that extracts team_size, agent_durations, and agent_outputs from CoordinationResult and calls collect(..., is_multi_agent=True).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 877 - 882, The metrics
collector call currently hardcodes is_multi_agent=False and returns the
coordinator result directly, preventing multi-agent metrics from being recorded;
after calling self._coordinator.coordinate(context) (which returns a
CoordinationResult), extract CoordinationResult.team_size,
CoordinationResult.agent_durations, and CoordinationResult.agent_outputs and
then call self._coordination_metrics_collector.collect(...) again with
execution_result set to that CoordinationResult (or relevant summary),
agent_id/task_id as before, and is_multi_agent=True so multi-agent Ec/O%/R/etc.
are computed and stored.
src/synthorg/budget/coordination_collector.py (1)

255-265: ⚠️ Potential issue | 🟠 Major

Don't persist synthetic baselines from empty or latency-derived runs.

Line 261 turns a 0-turn execution into turns=1.0, and Lines 255-257 derive duration_seconds from summed model latency only. Timeout/provider-error paths can return an ExecutionResult with no turns, so this stores a fake 1 turn / 0s SAS baseline and skews overhead, error amplification, and token-speedup. Skip baseline capture when no turns were recorded, and pass the actual run duration into the collector instead of inferring it from per-turn latency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/coordination_collector.py` around lines 255 - 265, The
code currently fakes a 1-turn baseline and infers duration from per-turn
latency; change the capture so it skips persisting a BaselineRecord when
execution_result.turns is empty, and use the actual run duration passed into the
collector instead of summing per-turn latency. Concretely: in the function that
builds BaselineRecord (references: execution_result.turns, duration_seconds,
BaselineRecord, and the local turns variable), add a guard that returns/does not
create a baseline if not execution_result.turns, remove the max(float(turns),
1.0) coercion (use float(turns) as-is), and replace the derived duration_seconds
calculation with a provided run_duration_seconds parameter (or an existing real
run-duration variable) when constructing BaselineRecord.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/call_classifier.py`:
- Around line 14-20: The module is missing the required module-level logger;
import get_logger from synthorg.observability and add a module logger variable
by assigning logger = get_logger(__name__) at top-level (near the other imports)
so the module-level logging convention is satisfied for
src/synthorg/budget/call_classifier.py and the logger symbol is available to any
functions/classes in that module.

In `@src/synthorg/budget/coordination_collector.py`:
- Around line 670-689: Move the logger.warning call that emits
COORD_METRICS_ALERT_FIRED so it runs only after
self._notification_dispatcher.dispatch(...) completes successfully; currently
logger.warning(...) is executed before awaiting dispatch, which records a fired
alert even if dispatch fails. Locate the block using logger.warning(...,
COORD_METRICS_ALERT_FIRED, agent_id=agent_id, ...) and the await
self._notification_dispatcher.dispatch(Notification(...)) call and change
ordering so the dispatch is awaited first, and then call logger.warning after
the await (keeping the existing exception handling semantics and re-raising
MemoryError/RecursionError as currently done).

In `@src/synthorg/budget/quota_poller.py`:
- Around line 100-107: Add explicit re-raise guards for fatal system exceptions
before each broad "except Exception" handler in this module: where you call
self._tracker.get_all_snapshots() (the block that logs QUOTA_POLL_FAILED), the
subsequent snapshot processing block (the other broad except around snapshot
handling), and the budget update/error handling block later in the poller;
insert "except (MemoryError, RecursionError): raise" immediately before the
existing "except Exception as exc:" lines so MemoryError and RecursionError
propagate, leaving the existing logger.exception(...) and return behavior for
non-fatal exceptions intact.
- Around line 145-167: The cooldown logic uses wall-clock datetime.now(UTC) and
should use a monotonic clock to avoid clock-skew issues: change the code in the
quota poller that reads/writes self._cooldown keyed by _CooldownKey (currently
using datetime.now(UTC)) to store and compare time.monotonic() values instead,
compute elapsed = time.monotonic() - last and compare against
self._config.cooldown_seconds, and when setting the cooldown (after logging
QUOTA_THRESHOLD_ALERT) assign time.monotonic() to self._cooldown[key]; leave
logger calls and keys (snap.provider_name, snap.window, level,
QUOTA_ALERT_COOLDOWN_ACTIVE, QUOTA_THRESHOLD_ALERT) intact.
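The monotonic-clock cooldown the prompt describes can be reduced to a small gate. This is a minimal sketch under the review's assumptions: the key tuple `(provider, window, level)` and the `CooldownGate` class are illustrative, not the poller's real internals.

```python
# Minimal monotonic-clock cooldown sketch; key shape and class name are
# assumptions based on the review comment, not the real QuotaPoller code.
import time


class CooldownGate:
    def __init__(self, cooldown_seconds: float) -> None:
        self._cooldown_seconds = cooldown_seconds
        self._last_fired: dict[tuple[str, str, str], float] = {}

    def should_fire(self, key: tuple[str, str, str]) -> bool:
        """Return True (and arm the cooldown) unless the key fired recently."""
        now = time.monotonic()  # immune to wall-clock skew / NTP adjustments
        last = self._last_fired.get(key)
        if last is not None and now - last < self._cooldown_seconds:
            return False  # still cooling down; suppress the duplicate alert
        self._last_fired[key] = now
        return True


gate = CooldownGate(cooldown_seconds=300.0)
key = ("openrouter", "daily", "warn")
first = gate.should_fire(key)   # fires and arms the cooldown
second = gate.should_fire(key)  # suppressed within the window
```

`time.monotonic()` only ever moves forward, so a system clock jump can neither prematurely expire a cooldown nor freeze it open.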

In `@src/synthorg/engine/loop_protocol.py`:
- Around line 97-100: The retry_reason field currently typed as str | None
should use NotBlankStr | None per guidelines; update the type annotation for
retry_reason in loop_protocol.py to NotBlankStr | None, import NotBlankStr from
core.types, and keep the Field(...) metadata unchanged so identifier/name
validation is enforced by the NotBlankStr type instead of relying on manual
validators.

In `@tests/unit/budget/test_quota_poller.py`:
- Around line 290-300: The tests are missing a regression asserting that
QuotaPoller.start() is idempotent; add a new assertion in the lifecycle tests
(or modify test_start_creates_background_task) to call poller.start() a second
time and assert the poller._task remains the same object (e.g., store original =
poller._task, await poller.start() again, then assert poller._task is original
and not a new Task), then cleanup with await poller.stop(); this ensures the
duplicate-start no-op contract on QuotaPoller.start() is enforced.
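The idempotent-start contract under test can be shown with a stub. `StubPoller` below mimics only the lifecycle shape described in the review (a single `_task` reused on duplicate `start()`), not the real `QuotaPoller`.

```python
# Hedged sketch of the idempotent start() contract; StubPoller is a stand-in
# with the lifecycle shape the review describes, not the real QuotaPoller.
import asyncio


class StubPoller:
    def __init__(self) -> None:
        self._task: asyncio.Task[None] | None = None

    async def start(self) -> None:
        if self._task is not None:
            return  # duplicate start is a no-op: keep the existing task
        self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        await asyncio.sleep(3600)

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None


async def check() -> bool:
    poller = StubPoller()
    await poller.start()
    original = poller._task
    await poller.start()  # second start must not replace the task
    same = poller._task is original
    await poller.stop()
    return same


same = asyncio.run(check())
```

The regression test then reduces to exactly the identity assertion the prompt describes: the `_task` object after a second `start()` is the same object as before it.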

---

Outside diff comments:
In `@tests/unit/budget/test_coordination_config.py`:
- Around line 19-27: Update test_values to assert the string values for the four
new enum members on CoordinationMetricName: add assertions that
CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling",
CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap",
CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio", and
CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"; keep the
existing five assertions and leave test_member_count as-is (already expecting
9).

---

Duplicate comments:
In `@src/synthorg/budget/coordination_collector.py`:
- Around line 255-265: The code currently fakes a 1-turn baseline and infers
duration from per-turn latency; change the capture so it skips persisting a
BaselineRecord when execution_result.turns is empty, and use the actual run
duration passed into the collector instead of summing per-turn latency.
Concretely: in the function that builds BaselineRecord (references:
execution_result.turns, duration_seconds, BaselineRecord, and the local turns
variable), add a guard that returns/does not create a baseline if not
execution_result.turns, remove the max(float(turns), 1.0) coercion (use
float(turns) as-is), and replace the derived duration_seconds calculation with a
provided run_duration_seconds parameter (or an existing real run-duration
variable) when constructing BaselineRecord.

In `@src/synthorg/engine/agent_engine.py`:
- Around line 877-882: The metrics collector call currently hardcodes
is_multi_agent=False and returns the coordinator result directly, preventing
multi-agent metrics from being recorded; after calling
self._coordinator.coordinate(context) (which returns a CoordinationResult),
extract CoordinationResult.team_size, CoordinationResult.agent_durations, and
CoordinationResult.agent_outputs and then call
self._coordination_metrics_collector.collect(...) again with execution_result
set to that CoordinationResult (or relevant summary), agent_id/task_id as
before, and is_multi_agent=True so multi-agent Ec/O%/R/etc. are computed and
stored.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 72a015ae-4f27-44f7-8feb-7e004c141465

📥 Commits

Reviewing files that changed from the base of the PR and between 98e9573 and 1d87fa8.

📒 Files selected for processing (26)
  • CLAUDE.md
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/coordination_collector.py
  • src/synthorg/budget/coordination_config.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/providers/base.py
  • src/synthorg/providers/resilience/retry.py
  • tests/unit/budget/test_call_analytics.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_coordination_config.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/budget/test_quota_poller.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/test_retry.py

Implements the full runtime analytics and metrics pipeline across four
issues.

Issue #226: Coordination metrics runtime collection
- BaselineStore: sliding-window store for single-agent baseline data
- CoordinationMetricsCollector: post-execution collector for all 9
  coordination metrics (efficiency, overhead, error amplification,
  message density, redundancy, amdahl, message overhead, straggler
  gap, token speedup)
- AgentEngine._try_collect_coordination_metrics() integration
- SimilarityComputer protocol for pluggable redundancy computation
- 11 new observability event constants in coordination_metrics.py

Issue #225: Runtime LLM call categorization
- ClassificationContext: frozen Pydantic context for a single call
- RulesBasedClassifier: priority-ordered classifier
  (EMBEDDING, COORDINATION, SYSTEM, PRODUCTIVE)
- CallClassificationStrategy protocol for custom classifiers
- classify_call() convenience function
- call_classification.py event module

Issue #227: Full per-call analytics layer
- CostRecord: +6 optional fields (latency_ms, cache_hit, retry_count,
  retry_reason, finish_reason, success)
- TurnRecord: +4 optional fields + computed success field
- CompletionResponse.provider_metadata dict for provider enrichment
- RetryHandler tracks last_attempt_count and last_retry_reason
- BaseCompletionProvider.complete() injects _synthorg_latency_ms,
  _synthorg_retry_count, _synthorg_retry_reason via provider_metadata
- make_turn_record() extracts metadata into TurnRecord fields
- All 6 loop call sites updated to pass provider_metadata
- cost_recording.py propagates all new fields from TurnRecord to CostRecord
- CallAnalyticsConfig + RetryAlertConfig frozen Pydantic configs
- AnalyticsAggregation model for aggregated call statistics
- CallAnalyticsService: get_aggregation() + check_alerts()
- 5 new event constants in analytics.py

Issue #224: Proactive quota polling and alerting
- QuotaAlertThresholds: warn_pct/critical_pct with ordering invariant
- QuotaPollerConfig: enabled/interval/cooldown/thresholds config
- QuotaPoller: background asyncio.Task loop with poll_once(), start(),
  stop(); per-provider/window/level cooldown tracking; tracker error
  resilience
- 7 new event constants in quota.py

Exports: all new types exported from synthorg.budget.__init__

Tests (TDD throughout):
- test_baseline_store.py, test_coordination_collector.py,
  test_coordination_collector_properties.py (Phase 1)
- test_call_classifier.py, test_call_classifier_properties.py (Phase 2)
- test_cost_record.py, test_loop_protocol.py, test_retry.py,
  test_base_provider.py, test_loop_helpers.py, test_cost_recording.py,
  test_call_analytics_config.py, test_call_analytics.py,
  test_call_analytics_properties.py (Phase 3)
- test_quota_poller_config.py, test_quota_poller.py (Phase 4)
- test_events.py updated with call_classification module

Closes #226
Closes #225
Closes #227
Closes #224
…nd CodeRabbit

Critical fixes:
- RetryHandler race condition: return immutable RetryResult from execute()
  instead of storing mutable state on shared instance (CodeRabbit)
- Single-agent fallthrough: short-circuit regardless of baseline_store
  to prevent single-agent runs entering multi-agent metric path (CodeRabbit, Copilot)
- Explicit is_multi_agent=False in AgentEngine metrics collection (CodeRabbit)

Major fixes:
- Derived metrics (amdahl, straggler, token_speedup, message_overhead)
  now respect _is_enabled() via new CoordinationMetricName entries (CodeRabbit)
- Baseline duration_seconds uses actual turn latencies instead of 0.0 (Gemini)
- Provider metadata merged instead of replaced in complete() (Copilot)
- make_turn_record() uses isinstance checks instead of unsafe coercion (Copilot)
- QuotaPoller.start() is now idempotent (CodeRabbit, Copilot)
- _poll_loop() guarded with try/except to survive dispatch failures (CodeRabbit)
- _dispatch_quota_alert guarded in _check_snapshot (Copilot)
- provider param typed as NotBlankStr in CallAnalyticsService (CodeRabbit)
- ClassificationContext uses NotBlankStr for tool_calls_made/agent_role (CodeRabbit)
- collect() split into _record_baseline + _collect_multi_agent helpers (CodeRabbit)
- _fire_alerts docstring corrected for dispatcher=None behavior (Copilot)
- FinishReason enum members used instead of string comparison (Gemini)
- success_rate computed from error_rate instead of hardcoded 1.0 (Gemini)

Medium fixes:
- TurnRecord gains _validate_retry_consistency validator (type-design)
- Flaky property test uses integers instead of floats (CodeRabbit)
- Retry validation rejection tests added to test_cost_record (CodeRabbit)
- COORDINATION > SYSTEM priority property test added (CodeRabbit)
- _p95() dedicated unit tests added (pr-test-analyzer)
- CLAUDE.md logging section updated with analytics/classification/quota events

Classifier integration (issue #225):
- classify_turn() helper wired into all 4 execution loops
- Planning turns classified as SYSTEM, executor turns classified dynamically
- Replaces hardcoded LLMCallCategory.PRODUCTIVE at all call sites
- Add module-level logger to call_classifier.py (logging convention)
- Move alert log after successful dispatch in _fire_alerts (log accuracy)
- Add MemoryError/RecursionError guard in quota tracker error handler
- Switch cooldown tracking from datetime.now(UTC) to time.monotonic()
- Use NotBlankStr for TurnRecord.retry_reason (type convention)
- Skip baseline recording for empty/zero-duration executions
- Add enum value assertions for 4 new CoordinationMetricName members
- Add idempotent start() regression test for QuotaPoller
@Aureliolo Aureliolo force-pushed the feat/analytics-metrics-pipeline branch from 1d87fa8 to e35a792 Compare April 7, 2026 21:16
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 7, 2026 21:18 — with GitHub Actions Inactive

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (2)
src/synthorg/budget/quota_poller.py (2)

171-183: ⚠️ Potential issue | 🟠 Major

Add MemoryError, RecursionError guard in _check_snapshot dispatch block.

The except Exception at line 179 swallows fatal system errors during alert dispatch.

🛠️ Proposed fix
         if self._dispatcher is not None:
             try:
                 await _dispatch_quota_alert(
                     self._dispatcher,
                     snap=snap,
                     level=level,
                     usage_pct=usage_pct,
                 )
+            except (MemoryError, RecursionError):
+                raise
             except Exception:
                 logger.exception(
                     QUOTA_POLL_FAILED,
                     error="quota_alert_dispatch_failed",
                 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/quota_poller.py` around lines 171 - 183, The current
broad except in the _check_snapshot dispatch block (around the call to
_dispatch_quota_alert) swallows fatal exceptions like MemoryError and
RecursionError; change the handler to re-raise those fatal exceptions while
still logging other exceptions — e.g., catch the exception object, if it is an
instance of (MemoryError, RecursionError) re-raise it, otherwise call
logger.exception(QUOTA_POLL_FAILED, error="quota_alert_dispatch_failed") as
before; reference symbols: _check_snapshot, _dispatch_quota_alert,
QUOTA_POLL_FAILED, logger.exception.

119-131: ⚠️ Potential issue | 🟠 Major

Add MemoryError, RecursionError guard in _poll_loop.

The except Exception at line 126 swallows fatal system errors. Per past review and project convention, these must propagate.

🛠️ Proposed fix
     async def _poll_loop(self) -> None:
         """Background task: poll repeatedly until cancelled."""
         while True:
             try:
                 await self.poll_once()
             except asyncio.CancelledError:
                 raise
+            except (MemoryError, RecursionError):
+                raise
             except Exception as exc:
                 logger.exception(
                     QUOTA_POLL_FAILED,
                     error=type(exc).__name__,
                 )
             await asyncio.sleep(self._config.poll_interval_seconds)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/quota_poller.py` around lines 119 - 131, The current
_poll_loop swallows fatal errors by using a broad "except Exception" handler;
add an explicit guard to re-raise MemoryError and RecursionError before the
generic exception handler so these fatal system errors propagate (i.e., in async
def _poll_loop(): add an "except (MemoryError, RecursionError): raise" branch
just above the existing "except Exception as exc" that logs QUOTA_POLL_FAILED).
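The guard ordering both fixes describe can be demonstrated end to end. The sketch below uses a stand-in `poll_once` and a generic logger; only the except-clause ordering is the point.

```python
# Runnable sketch of the re-raise guard the review asks for; poll_once and
# the logger are stand-ins, not the real QuotaPoller internals.
import asyncio
import logging

logger = logging.getLogger("quota_poller_sketch")


async def poll_once() -> None:
    raise ValueError("transient tracker failure")


async def poll_safely() -> str:
    try:
        await poll_once()
    except asyncio.CancelledError:
        raise  # cooperative cancellation must propagate
    except (MemoryError, RecursionError):
        raise  # fatal system errors must never be swallowed
    except Exception as exc:
        # Only genuinely recoverable errors reach the broad handler.
        logger.exception("QUOTA_POLL_FAILED error=%s", type(exc).__name__)
        return "logged"
    return "ok"


result = asyncio.run(poll_safely())
```

Because `except` clauses are tried in order, placing the `(MemoryError, RecursionError)` branch above `except Exception` is what makes the broad handler safe to keep.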
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/baseline_store.py`:
- Around line 84-91: The call that currently logs baseline mutations using
logger.debug should be changed to INFO: inside the record() method of
BaselineStore, replace the logger.debug(...) invocation that emits
COORD_METRICS_BASELINE_RECORDED (including agent_id=baseline.agent_id,
task_id=baseline.task_id, turns=baseline.turns, store_size=len(self._records),
window_size=self._window_size) with logger.info(...) so state transitions to
self._records are logged at INFO level per the guidelines.

In `@src/synthorg/budget/call_analytics.py`:
- Around line 204-232: The dispatch call in _dispatch_retry_rate_alert must
guard against swallowing system exceptions: wrap the await
dispatcher.dispatch(...) in a try/except Exception as exc block, and if
isinstance(exc, (MemoryError, RecursionError)) re-raise it immediately; for
other exceptions, log the error with the module logger (create a logger via
logging.getLogger(__name__) at the top of the module if one doesn't exist) and
allow the function to complete without re-raising. Ensure the guard specifically
targets MemoryError and RecursionError and leaves other exception handling
behavior unchanged.

In `@src/synthorg/engine/cost_recording.py`:
- Around line 82-83: The current assignment success=turn.success mislabels
outcomes as successful when turn.finish_reason is None; change the logic in the
record construction (the block that sets finish_reason=turn.finish_reason,
success=turn.success) to preserve an unknown/None success state when
finish_reason is None — e.g., set success to turn.success only if
turn.finish_reason is not None, otherwise set success to None (or an explicit
"unknown" sentinel) so unknown finish_reason does not imply success.

In `@src/synthorg/providers/models.py`:
- Around line 283-286: The CompletionResponse class docstring is missing the new
field provider_metadata; update the class docstring Attributes section to
document provider_metadata (type dict[str, object]) with a short description
like "Provider metadata injected by the base class (_synthorg_* keys)." so it
matches the existing Attributes pattern and appears alongside the other fields
for CompletionResponse.

In `@tests/unit/engine/test_cost_recording.py`:
- Around line 234-318: Refactor the repeated propagation tests
(test_latency_ms_propagated, test_cache_hit_propagated,
test_retry_count_propagated, test_retry_reason_propagated,
test_finish_reason_and_success_propagated,
test_none_analytics_fields_propagated_as_none) into a single parametrized pytest
test: use `@pytest.mark.parametrize` to supply (turn_factory_or_turn,
expected_field, expected_value) cases; for each case call record_execution_costs
with _result, _identity, "agent-1", "task-1" and _FakeTracker, then assert
tracker.records[0].<expected_field> == expected_value (for finish_reason/success
include appropriate TurnRecord with FinishReason). Reuse helpers
_turn_with_analytics, _turn, _result, record_execution_costs and _FakeTracker to
build inputs; keep individual expectation tuples for latency_ms, cache_hit,
retry_count, retry_reason, finish_reason, success, and None fields.

In `@tests/unit/engine/test_loop_protocol.py`:
- Around line 201-327: The tests repeat the same assertions for TurnRecord
analytics fields and FinishReason→success mapping; refactor by replacing the
repeated individual tests with pytest.mark.parametrize-based tests: for
analytics fields create a single parametrized test (referencing TurnRecord and
self._base()) that iterates tuples like ("latency_ms", None/250.0/0.0,
negative->expect ValidationError), ("cache_hit", None/True/False),
("retry_count", None/3/0, negative->expect ValidationError), ("retry_reason",
None/"RateLimitError") and assert field values or raise, and for success mapping
create a parametrized test over FinishReason values (FinishReason.STOP,
TOOL_USE, MAX_TOKENS -> True; FinishReason.ERROR, CONTENT_FILTER -> False) using
self._base(finish_reason=...) to assert the .success property; use clear param
ids and combine negative-validation cases into separate parametrized cases to
preserve semantics.
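The parametrized shape the prompt suggests for the success mapping can be sketched like this. The `FinishReason` enum and `success_for` mapping below restate only what the review comment says (STOP/TOOL_USE/MAX_TOKENS → True; ERROR/CONTENT_FILTER → False); the real `TurnRecord.success` property may differ.

```python
# Illustrative parametrized-test shape for the FinishReason -> success mapping;
# the enum and mapping follow the review comment, not the real TurnRecord.
from enum import Enum

import pytest


class FinishReason(Enum):
    STOP = "stop"
    TOOL_USE = "tool_use"
    MAX_TOKENS = "max_tokens"
    ERROR = "error"
    CONTENT_FILTER = "content_filter"


def success_for(reason: FinishReason) -> bool:
    return reason in {FinishReason.STOP, FinishReason.TOOL_USE, FinishReason.MAX_TOKENS}


@pytest.mark.parametrize(
    ("reason", "expected"),
    [
        (FinishReason.STOP, True),
        (FinishReason.TOOL_USE, True),
        (FinishReason.MAX_TOKENS, True),
        (FinishReason.ERROR, False),
        (FinishReason.CONTENT_FILTER, False),
    ],
    ids=lambda v: getattr(v, "name", str(v)),  # readable per-case ids
)
def test_success_mapping(reason: FinishReason, expected: bool) -> None:
    assert success_for(reason) is expected
```

One parametrized test with clear ids replaces five near-identical test bodies while keeping each case individually reportable.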

In `@tests/unit/providers/test_models.py`:
- Around line 470-475: Replace vendor-ambiguous model values used in
CompletionResponse instances with the repository’s standard test model IDs
(e.g., change model="test" to model="test-small-001"); update the occurrences
around the CompletionResponse creation (the variable resp) and the other similar
spots noted (around lines 483-488 and 501-512) so all fixtures use the canonical
test model IDs like test-small-001 or test-provider as appropriate to keep tests
vendor-agnostic.

---

Duplicate comments:
In `@src/synthorg/budget/quota_poller.py`:
- Around line 171-183: The current broad except in the _check_snapshot dispatch
block (around the call to _dispatch_quota_alert) swallows fatal exceptions like
MemoryError and RecursionError; change the handler to re-raise those fatal
exceptions while still logging other exceptions — e.g., catch the exception
object, if it is an instance of (MemoryError, RecursionError) re-raise it,
otherwise call logger.exception(QUOTA_POLL_FAILED,
error="quota_alert_dispatch_failed") as before; reference symbols:
_check_snapshot, _dispatch_quota_alert, QUOTA_POLL_FAILED, logger.exception.
- Around line 119-131: The current _poll_loop swallows fatal errors by using a
broad "except Exception" handler; add an explicit guard to re-raise MemoryError
and RecursionError before the generic exception handler so these fatal system
errors propagate (i.e., in async def _poll_loop(): add an "except (MemoryError,
RecursionError): raise" branch just above the existing "except Exception as exc"
that logs QUOTA_POLL_FAILED).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d491a02e-152a-4391-8995-3bbec6b827e6

📥 Commits

Reviewing files that changed from the base of the PR and between 1d87fa8 and e35a792.

📒 Files selected for processing (48)
  • CLAUDE.md
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/baseline_store.py
  • src/synthorg/budget/call_analytics.py
  • src/synthorg/budget/call_analytics_config.py
  • src/synthorg/budget/call_analytics_models.py
  • src/synthorg/budget/call_classifier.py
  • src/synthorg/budget/coordination_collector.py
  • src/synthorg/budget/coordination_config.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/budget/quota_poller.py
  • src/synthorg/budget/quota_poller_config.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/engine/cost_recording.py
  • src/synthorg/engine/hybrid_helpers.py
  • src/synthorg/engine/hybrid_loop.py
  • src/synthorg/engine/loop_helpers.py
  • src/synthorg/engine/loop_protocol.py
  • src/synthorg/engine/plan_execute_loop.py
  • src/synthorg/engine/react_loop.py
  • src/synthorg/observability/events/analytics.py
  • src/synthorg/observability/events/call_classification.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/quota.py
  • src/synthorg/providers/base.py
  • src/synthorg/providers/models.py
  • src/synthorg/providers/resilience/retry.py
  • tests/unit/budget/conftest.py
  • tests/unit/budget/test_baseline_store.py
  • tests/unit/budget/test_call_analytics.py
  • tests/unit/budget/test_call_analytics_config.py
  • tests/unit/budget/test_call_analytics_properties.py
  • tests/unit/budget/test_call_classifier.py
  • tests/unit/budget/test_call_classifier_properties.py
  • tests/unit/budget/test_coordination_collector.py
  • tests/unit/budget/test_coordination_collector_properties.py
  • tests/unit/budget/test_coordination_config.py
  • tests/unit/budget/test_cost_record.py
  • tests/unit/budget/test_quota_poller.py
  • tests/unit/budget/test_quota_poller_config.py
  • tests/unit/engine/test_cost_recording.py
  • tests/unit/engine/test_loop_helpers.py
  • tests/unit/engine/test_loop_protocol.py
  • tests/unit/observability/test_events.py
  • tests/unit/providers/resilience/test_retry.py
  • tests/unit/providers/test_base_provider.py
  • tests/unit/providers/test_models.py
  • tests/unit/providers/test_retry.py

Comment on lines +204 to +232
async def _dispatch_retry_rate_alert(
dispatcher: NotificationDispatcher,
*,
retry_rate: float,
warn_rate: float,
) -> None:
"""Dispatch a retry-rate warning notification.

Args:
dispatcher: Notification dispatcher.
retry_rate: Observed retry rate.
warn_rate: Configured warn threshold.
"""
from synthorg.notifications.models import ( # noqa: PLC0415
Notification,
NotificationCategory,
NotificationSeverity,
)

body = f"Retry rate {retry_rate:.1%} exceeds warning threshold {warn_rate:.1%}."
await dispatcher.dispatch(
Notification(
category=NotificationCategory.BUDGET,
severity=NotificationSeverity.WARNING,
title="High retry rate alert",
body=body,
source="budget.call_analytics",
),
)

⚠️ Potential issue | 🟡 Minor

Add system exception re-raise guards to _dispatch_retry_rate_alert.

The dispatcher.dispatch() call can raise exceptions. Per project convention, MemoryError and RecursionError must escape; the current implementation lacks these guards.

🛠️ Proposed fix
 async def _dispatch_retry_rate_alert(
     dispatcher: NotificationDispatcher,
     *,
     retry_rate: float,
     warn_rate: float,
 ) -> None:
     """Dispatch a retry-rate warning notification.

     Args:
         dispatcher: Notification dispatcher.
         retry_rate: Observed retry rate.
         warn_rate: Configured warn threshold.
     """
     from synthorg.notifications.models import (  # noqa: PLC0415
         Notification,
         NotificationCategory,
         NotificationSeverity,
     )

     body = f"Retry rate {retry_rate:.1%} exceeds warning threshold {warn_rate:.1%}."
-    await dispatcher.dispatch(
-        Notification(
-            category=NotificationCategory.BUDGET,
-            severity=NotificationSeverity.WARNING,
-            title="High retry rate alert",
-            body=body,
-            source="budget.call_analytics",
-        ),
-    )
+    try:
+        await dispatcher.dispatch(
+            Notification(
+                category=NotificationCategory.BUDGET,
+                severity=NotificationSeverity.WARNING,
+                title="High retry rate alert",
+                body=body,
+                source="budget.call_analytics",
+            ),
+        )
+    except (MemoryError, RecursionError):
+        raise
+    except Exception:
+        logger.exception(
+            ANALYTICS_RETRY_RATE_ALERT,
+            error="alert_dispatch_failed",
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/call_analytics.py` around lines 204 - 232, The dispatch
call in _dispatch_retry_rate_alert must guard against swallowing system
exceptions: wrap the await dispatcher.dispatch(...) in a try/except Exception as
exc block, and if isinstance(exc, (MemoryError, RecursionError)) re-raise it
immediately; for other exceptions, log the error with the module logger (create
a logger via logging.getLogger(__name__) at the top of the module if one doesn't
exist) and allow the function to complete without re-raising. Ensure the guard
specifically targets MemoryError and RecursionError and leaves other exception
handling behavior unchanged.

Comment on lines +82 to +83
finish_reason=turn.finish_reason,
success=turn.success,

⚠️ Potential issue | 🟠 Major

Preserve unknown success state when finish_reason is missing.

On line 83, success=turn.success resolves to True even when finish_reason is None, which misclassifies unknown outcomes as successful.

Proposed fix
         record = CostRecord(
             agent_id=agent_id,
             task_id=task_id,
             provider=identity.model.provider,
             model=identity.model.model_id,
             input_tokens=turn.input_tokens,
             output_tokens=turn.output_tokens,
             cost_usd=turn.cost_usd,
             timestamp=datetime.now(UTC),
             call_category=turn.call_category,
             latency_ms=turn.latency_ms,
             cache_hit=turn.cache_hit,
             retry_count=turn.retry_count,
             retry_reason=turn.retry_reason,
             finish_reason=turn.finish_reason,
-            success=turn.success,
+            success=(
+                turn.success if turn.finish_reason is not None else None
+            ),
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/cost_recording.py` around lines 82 - 83, The current
assignment success=turn.success mislabels outcomes as successful when
turn.finish_reason is None; change the logic in the record construction (the
block that sets finish_reason=turn.finish_reason, success=turn.success) to
preserve an unknown/None success state when finish_reason is None — e.g., set
success to turn.success only if turn.finish_reason is not None, otherwise set
success to None (or an explicit "unknown" sentinel) so unknown finish_reason
does not imply success.

Comment on lines +234 to +318
async def test_latency_ms_propagated(self) -> None:
"""latency_ms from TurnRecord lands in CostRecord."""
turn = self._turn_with_analytics(latency_ms=250.5)
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].latency_ms == 250.5

async def test_cache_hit_propagated(self) -> None:
turn = self._turn_with_analytics(cache_hit=True)
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].cache_hit is True

async def test_retry_count_propagated(self) -> None:
turn = self._turn_with_analytics(retry_count=2)
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].retry_count == 2

async def test_retry_reason_propagated(self) -> None:
turn = self._turn_with_analytics(retry_count=2, retry_reason="RateLimitError")
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].retry_reason == "RateLimitError"

async def test_finish_reason_and_success_propagated(self) -> None:
from synthorg.providers.enums import FinishReason

turn = TurnRecord(
turn_number=1,
input_tokens=100,
output_tokens=50,
cost_usd=0.01,
finish_reason=FinishReason.ERROR,
)
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].finish_reason == FinishReason.ERROR
assert tracker.records[0].success is False

async def test_none_analytics_fields_propagated_as_none(self) -> None:
"""When analytics fields are None on TurnRecord, CostRecord gets None."""
turn = _turn()
tracker = _FakeTracker()
await record_execution_costs(
_result((turn,)),
_identity(),
"agent-1",
"task-1",
tracker=tracker, # type: ignore[arg-type]
)
assert tracker.records[0].latency_ms is None
assert tracker.records[0].cache_hit is None
assert tracker.records[0].retry_count is None
assert tracker.records[0].retry_reason is None

🧹 Nitpick | 🔵 Trivial

Parametrize the field-propagation assertions to remove duplication.

These tests are valuable, but the repeated arrange/act/assert shape is a good fit for parametrization.

As per coding guidelines: tests/**/*.py: Prefer @pytest.mark.parametrize for testing similar cases in Python tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_cost_recording.py` around lines 234 - 318, Refactor
the repeated propagation tests (test_latency_ms_propagated,
test_cache_hit_propagated, test_retry_count_propagated,
test_retry_reason_propagated, test_finish_reason_and_success_propagated,
test_none_analytics_fields_propagated_as_none) into a single parametrized pytest
test: use `@pytest.mark.parametrize` to supply (turn_factory_or_turn,
expected_field, expected_value) cases; for each case call record_execution_costs
with _result, _identity, "agent-1", "task-1" and _FakeTracker, then assert
tracker.records[0].<expected_field> == expected_value (for finish_reason/success
include appropriate TurnRecord with FinishReason). Reuse helpers
_turn_with_analytics, _turn, _result, record_execution_costs and _FakeTracker to
build inputs; keep individual expectation tuples for latency_ms, cache_hit,
retry_count, retry_reason, finish_reason, success, and None fields.

Comment on lines +201 to +327
def test_latency_ms_default_none(self) -> None:
assert self._base().latency_ms is None

def test_latency_ms_positive_accepted(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
latency_ms=250.0,
)
assert record.latency_ms == 250.0

def test_latency_ms_zero_accepted(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
latency_ms=0.0,
)
assert record.latency_ms == 0.0

def test_latency_ms_negative_rejected(self) -> None:
with pytest.raises(ValidationError):
TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
latency_ms=-1.0,
)

def test_cache_hit_default_none(self) -> None:
assert self._base().cache_hit is None

def test_cache_hit_true(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
cache_hit=True,
)
assert record.cache_hit is True

def test_cache_hit_false(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
cache_hit=False,
)
assert record.cache_hit is False

def test_retry_count_default_none(self) -> None:
assert self._base().retry_count is None

def test_retry_count_positive_accepted(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
retry_count=3,
)
assert record.retry_count == 3

def test_retry_count_zero_accepted(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
retry_count=0,
)
assert record.retry_count == 0

def test_retry_count_negative_rejected(self) -> None:
with pytest.raises(ValidationError):
TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
retry_count=-1,
)

def test_retry_reason_default_none(self) -> None:
assert self._base().retry_reason is None

def test_retry_reason_set(self) -> None:
record = TurnRecord(
turn_number=1,
input_tokens=10,
output_tokens=5,
cost_usd=0.001,
finish_reason=FinishReason.STOP,
retry_count=1,
retry_reason="RateLimitError",
)
assert record.retry_reason == "RateLimitError"

def test_success_true_for_stop(self) -> None:
assert self._base(finish_reason=FinishReason.STOP).success is True

def test_success_true_for_tool_use(self) -> None:
assert self._base(finish_reason=FinishReason.TOOL_USE).success is True

def test_success_true_for_max_tokens(self) -> None:
assert self._base(finish_reason=FinishReason.MAX_TOKENS).success is True

def test_success_false_for_error(self) -> None:
assert self._base(finish_reason=FinishReason.ERROR).success is False

def test_success_false_for_content_filter(self) -> None:
assert self._base(finish_reason=FinishReason.CONTENT_FILTER).success is False


🧹 Nitpick | 🔵 Trivial

Use parametrization for repeated analytics-field and success-mapping cases.

This block is correct but repetitive; converting the near-identical cases to @pytest.mark.parametrize will reduce test maintenance overhead.

Refactor sketch
-    def test_success_true_for_stop(self) -> None:
-        assert self._base(finish_reason=FinishReason.STOP).success is True
-
-    def test_success_true_for_tool_use(self) -> None:
-        assert self._base(finish_reason=FinishReason.TOOL_USE).success is True
-
-    def test_success_true_for_max_tokens(self) -> None:
-        assert self._base(finish_reason=FinishReason.MAX_TOKENS).success is True
-
-    def test_success_false_for_error(self) -> None:
-        assert self._base(finish_reason=FinishReason.ERROR).success is False
-
-    def test_success_false_for_content_filter(self) -> None:
-        assert self._base(finish_reason=FinishReason.CONTENT_FILTER).success is False
+    @pytest.mark.parametrize(
+        ("finish_reason", "expected"),
+        [
+            (FinishReason.STOP, True),
+            (FinishReason.TOOL_USE, True),
+            (FinishReason.MAX_TOKENS, True),
+            (FinishReason.ERROR, False),
+            (FinishReason.CONTENT_FILTER, False),
+        ],
+    )
+    def test_success_by_finish_reason(
+        self,
+        finish_reason: FinishReason,
+        expected: bool,
+    ) -> None:
+        assert self._base(finish_reason=finish_reason).success is expected

As per coding guidelines: tests/**/*.py: Prefer @pytest.mark.parametrize for testing similar cases in Python tests.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_loop_protocol.py` around lines 201 - 327, The tests
repeat the same assertions for TurnRecord analytics fields and
FinishReason→success mapping; refactor by replacing the repeated individual
tests with pytest.mark.parametrize-based tests: for analytics fields create a
single parametrized test (referencing TurnRecord and self._base()) that iterates
tuples like ("latency_ms", None/250.0/0.0, negative->expect ValidationError),
("cache_hit", None/True/False), ("retry_count", None/3/0, negative->expect
ValidationError), ("retry_reason", None/"RateLimitError") and assert field
values or raise, and for success mapping create a parametrized test over
FinishReason values (FinishReason.STOP, TOOL_USE, MAX_TOKENS -> True;
FinishReason.ERROR, CONTENT_FILTER -> False) using self._base(finish_reason=...)
to assert the .success property; use clear param ids and combine
negative-validation cases into separate parametrized cases to preserve
semantics.

- Baseline record logging promoted to INFO (state transition convention)
- MemoryError/RecursionError guards added to _dispatch_retry_rate_alert
- MemoryError/RecursionError guards added to _poll_loop and _check_snapshot
- CompletionResponse docstring documents provider_metadata field
- Test model values use test-small-001 (vendor-agnostic convention)
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 7, 2026 21:36 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit ec57641 into main Apr 7, 2026
32 of 34 checks passed
@Aureliolo Aureliolo deleted the feat/analytics-metrics-pipeline branch April 7, 2026 21:42
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 7, 2026 21:42 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Apr 7, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.6.4](v0.6.3...v0.6.4)
(2026-04-07)


### Features

* analytics and metrics runtime pipeline
([#226](#226),
[#225](#225),
[#227](#227),
[#224](#224))
([#1127](#1127))
([ec57641](ec57641))
* engine intelligence -- quality signals, health monitoring, trajectory
scoring, coordination metrics
([#1099](#1099))
([aac2029](aac2029)),
closes [#697](#697)
[#707](#707)
[#705](#705)
[#703](#703)
* enterprise-grade auth -- HttpOnly cookie sessions, CSRF, lockout,
session limits
([#1102](#1102))
([d3022c7](d3022c7)),
closes [#1068](#1068)
* implement core tool categories and granular sub-constraints
([#1101](#1101))
([0611b53](0611b53)),
closes [#1034](#1034)
[#220](#220)
* memory evolution -- GraphRAG/consistency research +
SelfEditingMemoryStrategy
([#1036](#1036),
[#208](#208))
([#1129](#1129))
([a9acda3](a9acda3))
* security hardening -- sandbox, risk override, SSRF self-heal, DAST fix
([#1100](#1100))
([31e7273](31e7273)),
closes [#1098](#1098)
[#696](#696)
[#222](#222)
[#671](#671)


### Bug Fixes

* harden agent identity versioning post-review
([#1128](#1128))
([8eb2859](8eb2859)),
closes [#1076](#1076)


### Documentation

* engine architecture research
([#688](#688)
[#690](#690)
[#848](#848)
[#687](#687))
([#1114](#1114))
([59b31f9](59b31f9))


### Maintenance

* add .claudeignore and split CLAUDE.md for token optimization
([#1112](#1112))
([b0fbd18](b0fbd18))
* bump github.com/sigstore/protobuf-specs from 0.5.0 to 0.5.1 in /cli in
the all group
([#1106](#1106))
([73089c9](73089c9))
* bump jsdom from 29.0.1 to 29.0.2 in /site in the all group
([#1107](#1107))
([8e99dce](8e99dce))
* bump jsdom from 29.0.1 to 29.0.2 in /web in the all group
([#1108](#1108))
([ce8c749](ce8c749))
* bump python from `fb83750` to `6869258` in /docker/backend in the all
group ([#1104](#1104))
([4911726](4911726))
* bump python from `fb83750` to `6869258` in /docker/web in the all
group ([#1103](#1103))
([87bdf09](87bdf09))
* bump the all group across 1 directory with 4 updates
([#1111](#1111))
([f702464](f702464))
* bump the all group in /docker/sandbox with 2 updates
([#1105](#1105))
([05a91ca](05a91ca))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>