feat: analytics and metrics runtime pipeline (#226, #225, #227, #224)#1127
Conversation
No actionable comments were generated in the most recent review. 🎉
Walkthrough
Adds budget-domain telemetry and tooling: sliding-window baseline models (
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Snapshot warnings: ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.
Scanned files: None
Code Review
This pull request introduces a comprehensive per-call analytics and coordination metrics collection system, including services for call classification, baseline tracking, and proactive quota polling. The infrastructure is updated to capture and propagate latency, cache hits, and retry metadata from the provider layer through to cost records. Review feedback identifies critical Python 3 syntax errors in exception handling and suggests improving metric accuracy by replacing hardcoded values with actual execution data, such as using turn latency for baseline durations and calculating real success rates for efficiency metrics.
```python
except MemoryError, RecursionError:
    raise
```

The syntax `except MemoryError, RecursionError:` is invalid in Python 3 and will result in a `SyntaxError` at import time. To catch multiple exceptions, they must be grouped in a tuple: `except (MemoryError, RecursionError):`. This error is repeated multiple times throughout this file (lines 328, 366, 406, 440, 460, 478, 512, 535, and 602).

Suggested fix:

```python
except (MemoryError, RecursionError):
    raise
```
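A minimal, self-contained illustration of the Python 3 rule: multiple exception types in one handler must be a parenthesized tuple. The `risky`/`run` names here are illustrative, not from the PR.

```python
def risky() -> None:
    # Stand-in for code that may raise one of several exception types.
    raise RecursionError("too deep")

def run() -> str:
    try:
        risky()
    except (MemoryError, RecursionError):  # Python 3: the types must be a tuple
        return "caught"
    return "no error"

print(run())
```

In Python 2, `except MemoryError, RecursionError:` bound the exception to the name `RecursionError`; Python 3 removed that form entirely, which is why the module fails to parse.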
```python
turns=max(float(turns), 1.0),
error_rate=error_rate,
total_tokens=float(total_tokens),
duration_seconds=0.0,  # Not available from ExecutionResult
```

Hardcoding `duration_seconds=0.0` for baseline records will prevent the TokenSpeedupRatio metric from being computed correctly, as the baseline will have a zero duration. Since TurnRecord now includes `latency_ms`, you can use it to provide a meaningful duration estimate for the baseline recording:

```python
duration_seconds=sum(t.latency_ms or 0.0 for t in execution_result.turns) / 1000.0,
```

```python
error_turns = sum(
    1
    for t in execution_result.turns
    if t.finish_reason.value in ("error", "content_filter")
```
```python
return None
try:
    result = compute_efficiency(
        success_rate=1.0,  # Approx: no per-task success rate in ExecutionResult
```
Pull request overview
Implements the runtime analytics/metrics pipeline: coordination metrics collection (with baselines + alerting), LLM call categorization, per-call metadata propagation/aggregation, and proactive quota polling/alerts.
Changes:
- Added coordination metrics runtime collector + baseline store, wired into the `AgentEngine` post-execution pipeline.
- Added per-call metadata capture (latency/retry/cache/finish_reason/success) from provider → turn records → cost records, plus aggregation + retry-rate alerting.
- Added background quota polling service with cooldown-based alert suppression and expanded observability events.
Reviewed changes
Copilot reviewed 44 out of 44 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/providers/test_retry.py | Tests for RetryHandler state exposure/reset behavior. |
| tests/unit/providers/test_models.py | Tests CompletionResponse.provider_metadata defaults/behavior. |
| tests/unit/providers/test_base_provider.py | Tests _synthorg_* metadata injection from base provider. |
| tests/unit/observability/test_events.py | Ensures new events module is discovered. |
| tests/unit/engine/test_loop_protocol.py | Tests new TurnRecord analytics fields + computed success. |
| tests/unit/engine/test_loop_helpers.py | Tests extracting analytics fields from provider_metadata in make_turn_record(). |
| tests/unit/engine/test_cost_recording.py | Tests propagation of analytics fields from TurnRecord → CostRecord. |
| tests/unit/budget/test_quota_poller.py | Tests quota poller lifecycle, thresholds, cooldown, resilience. |
| tests/unit/budget/test_quota_poller_config.py | Tests validation/defaults for quota poller config models. |
| tests/unit/budget/test_cost_record.py | Tests new analytics fields and JSON roundtrip on CostRecord. |
| tests/unit/budget/test_coordination_collector.py | Tests coordination metrics collection paths, selective collection, alerts, isolation. |
| tests/unit/budget/test_coordination_collector_properties.py | Property-based invariants for coordination collector. |
| tests/unit/budget/test_call_classifier.py | Tests rules-based call classification and context validation. |
| tests/unit/budget/test_call_classifier_properties.py | Property-based invariants for call classification. |
| tests/unit/budget/test_call_analytics.py | Tests aggregation correctness + retry alert dispatch behavior. |
| tests/unit/budget/test_call_analytics_properties.py | Property-based invariants for analytics aggregation. |
| tests/unit/budget/test_call_analytics_config.py | Tests validation/defaults for call analytics config models. |
| tests/unit/budget/test_baseline_store.py | Tests sliding-window baseline store behavior/validation. |
| tests/unit/budget/conftest.py | Adjusts factory defaults for retry field consistency. |
| src/synthorg/providers/resilience/retry.py | Adds RetryHandler last attempt count + retry reason state/reset. |
| src/synthorg/providers/models.py | Adds CompletionResponse.provider_metadata field. |
| src/synthorg/providers/base.py | Measures latency and injects _synthorg_* metadata into responses. |
| src/synthorg/observability/events/quota.py | Adds quota poller event constants. |
| src/synthorg/observability/events/coordination_metrics.py | Adds runtime coordination metric pipeline event constants. |
| src/synthorg/observability/events/call_classification.py | Adds call classification event constants. |
| src/synthorg/observability/events/analytics.py | Adds per-call analytics event constants. |
| src/synthorg/engine/react_loop.py | Passes provider_metadata into make_turn_record(). |
| src/synthorg/engine/plan_execute_loop.py | Passes provider_metadata into make_turn_record() (planner + step turns). |
| src/synthorg/engine/loop_protocol.py | Extends TurnRecord with analytics fields + computed success. |
| src/synthorg/engine/loop_helpers.py | Extracts _synthorg_* values from provider_metadata when building TurnRecord. |
| src/synthorg/engine/hybrid_loop.py | Passes provider_metadata into make_turn_record(). |
| src/synthorg/engine/hybrid_helpers.py | Passes provider_metadata into make_turn_record() for hybrid helpers. |
| src/synthorg/engine/cost_recording.py | Propagates new per-call fields into CostRecord. |
| src/synthorg/engine/agent_engine.py | Adds best-effort coordination metrics collection to post-execution pipeline. |
| src/synthorg/budget/quota_poller.py | New background quota poller with cooldown + alert dispatch. |
| src/synthorg/budget/quota_poller_config.py | New quota poller config + alert thresholds models. |
| src/synthorg/budget/cost_record.py | Adds per-call analytics fields + retry consistency validator. |
| src/synthorg/budget/coordination_collector.py | New runtime coordination metrics collector + alerting. |
| src/synthorg/budget/call_classifier.py | New stateless rules-based call categorization service. |
| src/synthorg/budget/call_analytics.py | New call analytics aggregation + retry-rate alerting service. |
| src/synthorg/budget/call_analytics_models.py | New AnalyticsAggregation model with invariants. |
| src/synthorg/budget/call_analytics_config.py | New config models for call analytics + retry alerts. |
| src/synthorg/budget/baseline_store.py | New sliding-window baseline store for SAS baselines. |
| src/synthorg/budget/__init__.py | Exports newly added budget/analytics/quota APIs. |
src/synthorg/providers/base.py
```python
    self._retry_handler.last_retry_reason
)

result = result.model_copy(update={"provider_metadata": metadata})
```

`complete()` overwrites any existing `CompletionResponse.provider_metadata` by replacing it with a new dict containing only `_synthorg_*` keys. If a provider implementation populates `provider_metadata` with its own fields (e.g., request IDs, cache/debug info), those will be lost. Consider merging into the existing dict (copy + update) rather than replacing it, and avoid clobbering keys unless intended.

Suggested fix:

```python
merged_metadata = dict(result.provider_metadata or {})
merged_metadata.update(metadata)
result = result.model_copy(update={"provider_metadata": merged_metadata})
```
src/synthorg/engine/loop_helpers.py
```diff
@@ -476,6 +493,14 @@ def make_turn_record(
     tool_call_fingerprints=compute_fingerprints(response.tool_calls),
     finish_reason=response.finish_reason,
     call_category=call_category,
+    latency_ms=float(cast("float", latency_ms_raw))
+    if latency_ms_raw is not None
+    else None,
+    cache_hit=bool(cache_hit_raw) if cache_hit_raw is not None else None,
+    retry_count=int(cast("int", retry_count_raw))
+    if retry_count_raw is not None
+    else None,
+    retry_reason=str(retry_reason_raw) if retry_reason_raw is not None else None,
```
`make_turn_record()` coerces values from `provider_metadata` using `bool(...)`, `int(...)`, and `float(...)`. This can silently misinterpret values (e.g., the string `'False'` becomes `True`) or raise at runtime if a provider supplies an unexpected type, which would break the execution loop when building TurnRecord. Consider validating types explicitly (e.g., `isinstance(x, (int, float))`, `isinstance(x, bool)`) and ignoring/logging invalid metadata instead of coercing blindly.
```python
# For single-agent runs: record baseline, return empty metrics.
if not is_multi_agent and self._baseline_store is not None:
    from synthorg.budget.baseline_store import BaselineRecord  # noqa: PLC0415

    baseline = BaselineRecord(
        agent_id=agent_id,
        task_id=task_id,
        turns=max(float(turns), 1.0),
        error_rate=error_rate,
        total_tokens=float(total_tokens),
        duration_seconds=0.0,  # Not available from ExecutionResult
    )
    self._baseline_store.record(baseline)
    logger.debug(
        COORD_METRICS_COLLECTION_COMPLETED,
        agent_id=agent_id,
        task_id=task_id,
        is_multi_agent=False,
        metrics_computed=0,
    )
    return CoordinationMetrics()

# Multi-agent run: compute all enabled metrics.
efficiency = await self._try_collect_efficiency(turns)
overhead = await self._try_collect_overhead(turns)
error_amplification = await self._try_collect_error_amplification(error_rate)
message_density = await self._try_collect_message_density(turns)
redundancy_rate = await self._try_collect_redundancy(agent_outputs)
```
collect() only returns early for single-agent runs when baseline_store is provided. If is_multi_agent=False and baseline_store is None, the method falls through into the “Multi-agent run” block and may compute metrics (e.g., message density / amdahl ceiling) for a single-agent execution, which contradicts the docstring/spec (“single-agent runs … return empty metrics”). Suggest changing the control flow to return CoordinationMetrics() whenever is_multi_agent is false (recording a baseline only if a store is configured).
```python
async def start(self) -> None:
    """Start the background polling loop.

    Creates an ``asyncio.Task`` that calls :meth:`poll_once`
    repeatedly at the configured interval.
    """
    self._task = asyncio.get_running_loop().create_task(
        self._poll_loop(),
        name="quota-poller",
    )
    logger.info(
        QUOTA_POLLER_STARTED,
        interval=self._config.poll_interval_seconds,
    )
```
QuotaPoller.start() always creates a new background task and overwrites self._task without checking whether the poller is already running. Calling start() twice will leak the original task and create multiple polling loops. Consider guarding with if self._task is not None and not self._task.done(): return/raise, and (optionally) no-op when config.enabled is false to enforce the opt-in behavior.
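A sketch of the guarded, idempotent `start()` the comment suggests. `QuotaPollerSketch` is a hypothetical stand-in, not the PR's `QuotaPoller`; the real class carries config, dispatcher, and logging that are omitted here.

```python
import asyncio

class QuotaPollerSketch:
    def __init__(self, enabled: bool = True) -> None:
        self._enabled = enabled
        self._task: asyncio.Task | None = None

    async def _poll_loop(self) -> None:
        while True:  # placeholder body; the real loop calls poll_once()
            await asyncio.sleep(3600)

    async def start(self) -> None:
        if not self._enabled:
            return  # opt-in: disabled pollers never spawn a task
        if self._task is not None and not self._task.done():
            return  # idempotent: a second start() is a no-op, no leaked loop
        self._task = asyncio.get_running_loop().create_task(
            self._poll_loop(), name="quota-poller"
        )

    async def stop(self) -> None:
        if self._task is not None:
            self._task.cancel()

async def demo() -> bool:
    poller = QuotaPollerSketch()
    await poller.start()
    first = poller._task
    await poller.start()  # guarded: does not replace the running task
    same = poller._task is first
    await poller.stop()
    return same

print(asyncio.run(demo()))
```

Raising instead of returning on the second `start()` is equally valid; the essential property is that the original task is never silently orphaned.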
src/synthorg/budget/quota_poller.py
```python
await _dispatch_quota_alert(
    self._dispatcher,
    snap=snap,
    level=level,
    usage_pct=usage_pct,
)
```
poll_once() is documented as “Individual poll failures are logged and do not propagate”, but _check_snapshot() awaits _dispatch_quota_alert() without any exception handling. If NotificationDispatcher.dispatch() raises, the exception will bubble up and can terminate the background _poll_loop() task. Wrap the dispatch call (or the body of _check_snapshot) in a broad try/except that logs and returns, similar to the tracker error handling.
Suggested fix:

```python
try:
    await _dispatch_quota_alert(
        self._dispatcher,
        snap=snap,
        level=level,
        usage_pct=usage_pct,
    )
except Exception:
    logger.exception(
        "Quota alert dispatch failed",
        provider=snap.provider_name,
        window=snap.window.value,
        level=level,
    )
    return
```
```python
        agent_id=agent_id,
        task_id=task_id,
    )
except MemoryError, RecursionError:
```
except MemoryError, RecursionError: is invalid syntax in Python 3 (multiple exception types must be parenthesized). This will prevent the module from importing. Use except (MemoryError, RecursionError): instead (and apply consistently for other multi-exception handlers).
Suggested fix:

```python
except (MemoryError, RecursionError):
```
```python
try:
    result = compute_efficiency(
        success_rate=1.0,  # Approx: no per-task success rate in ExecutionResult
        turns_mas=max(float(turns_mas), 1.0),
        turns_sas=turns_sas,
    )
    logger.info(
        COORD_METRICS_EFFICIENCY_COMPUTED,
        value=result.value,
        turns_mas=turns_mas,
        turns_sas=turns_sas,
    )
except MemoryError, RecursionError:
    raise
except Exception as exc:
    logger.warning(
        COORD_METRICS_COLLECTION_FAILED,
        metric="efficiency",
```
This file uses except MemoryError, RecursionError: in multiple places, which is invalid syntax in Python 3 (multiple exception types must be parenthesized). Replace with except (MemoryError, RecursionError): to avoid import-time syntax errors.
| """Fire notifications for coordination overhead threshold crossings.""" | ||
| if self._notification_dispatcher is None: | ||
| return | ||
|
|
||
| overhead = metrics.overhead | ||
| if overhead is None: | ||
| return |
There was a problem hiding this comment.
Collector docstrings say “When notification_dispatcher is None, alerts are logged but not dispatched”, but _fire_alerts() returns immediately when the dispatcher is None, so no alert event is logged at all. Either adjust the docs or log COORD_METRICS_ALERT_FIRED (or a dedicated event) even when no dispatcher is configured.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main    #1127   +/-  ##
=======================================
- Coverage   89.07%   89.02%   -0.06%
=======================================
  Files         825      834       +9
  Lines       48164    48719     +555
  Branches     4845     4908      +63
=======================================
+ Hits        42904    43371     +467
- Misses       4358     4432      +74
- Partials      902      916      +14
```

☔ View full report in Codecov by Sentry.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/providers/resilience/retry.py (1)
39-72: ⚠️ Potential issue | 🟠 Major: Race condition on shared mutable state in concurrent scenarios.

`last_attempt_count` and `last_retry_reason` are instance attributes on a shared `RetryHandler`. Per `src/synthorg/providers/base.py:62-68`, a single `RetryHandler` is reused across all concurrent `complete()` calls on the same provider instance. If two coroutines invoke `complete()` concurrently:

- Coroutine A finishes `execute()`, returns
- Coroutine B starts `execute()`, resets `last_attempt_count = 0` (line 60)
- Coroutine A reads `self._retry_handler.last_attempt_count` in `base.py` (line 130) and gets 0 instead of A's actual attempt count

This produces incorrect telemetry under concurrent load.
🔧 Suggested fix: return retry info from execute() instead of storing on self
```diff
+from dataclasses import dataclass
+
+@dataclass(frozen=True, slots=True)
+class RetryResult(Generic[T]):
+    """Result of a retry-wrapped execution."""
+    value: T
+    attempt_count: int
+    retry_reason: str | None
+
 class RetryHandler:
-    def __init__(self, config: RetryConfig) -> None:
-        self._config = config
-        self.last_attempt_count: int = 0
-        self.last_retry_reason: str | None = None
+    def __init__(self, config: RetryConfig) -> None:
+        self._config = config

     async def execute(
         self,
         func: Callable[[], Coroutine[object, object, T]],
-    ) -> T:
+    ) -> RetryResult[T]:
         ...
-        self.last_attempt_count = 0
-        self.last_retry_reason = None
+        attempt_count = 0
+        retry_reason: str | None = None
         ...
         for attempt in range(1 + self._config.max_retries):
-            self.last_attempt_count = attempt + 1
+            attempt_count = attempt + 1
             try:
-                return await func()
+                return RetryResult(await func(), attempt_count, retry_reason)
             except ProviderError as exc:
                 ...
-                self.last_retry_reason = type(exc).__name__
+                retry_reason = type(exc).__name__
```

Then update `base.py` to destructure the result.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/providers/resilience/retry.py` around lines 39 - 72, The instance attributes last_attempt_count and last_retry_reason on the RetryHandler are mutated inside execute(), causing race conditions when the same handler is reused concurrently; change execute (in src/synthorg/providers/resilience/retry.py) to return both the function result and a small immutable retry info object/tuple (e.g., (result, attempt_count, retry_reason)) instead of writing to self.last_attempt_count / self.last_retry_reason, and update the caller in base.py (complete()) to destructure that returned tuple and use the returned attempt_count and retry_reason for telemetry; remove or stop relying on the mutable instance fields to eliminate the race.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/call_analytics.py`:
- Around line 52-93: The provider parameter on get_aggregation is currently
typed as str | None but must match CostTracker.get_records which expects
NotBlankStr | None; update the signature of get_aggregation to use NotBlankStr |
None for provider, add the corresponding NotBlankStr import at the top of the
module, and ensure any internal calls (the call to self._tracker.get_records)
pass the provider unchanged so the types align with CostTracker.get_records.
In `@src/synthorg/budget/call_classifier.py`:
- Around line 56-63: The fields tool_calls_made and agent_role allow
whitespace-only identifiers; change their types to use NotBlankStr from
core.types so identifier-like metadata cannot be blank. Replace tool_calls_made:
tuple[str, ...] with tuple[NotBlankStr, ...] (keeping default=() and
description) and change agent_role: str | None to agent_role: NotBlankStr | None
(keeping default=None and description); keep using Field for metadata and import
NotBlankStr from core.types if not already imported.
In `@src/synthorg/budget/coordination_collector.py`:
- Around line 451-543: The derived-metrics helpers (_try_collect_amdahl,
_try_collect_straggler_gap, _try_collect_token_speedup,
_try_collect_message_overhead) are being invoked unconditionally from collect()
even when config.collect disables those metrics; update the call sites in
collect() (or add checks inside each helper) to consult the same enablement flag
used elsewhere (_is_enabled or config.collect) before computing/logging each
metric (use the metric keys "amdahl_ceiling", "straggler_gap",
"token_speedup_ratio", "message_overhead"); ensure you short-circuit and return
None when the metric is disabled so expensive compute_* functions are not
executed.
- Around line 198-205: The BaselineRecord being created with
duration_seconds=0.0 makes it unusable for token_speedup calculations; modify
the code that constructs BaselineRecord (the BaselineRecord(...) block) to only
create/save a baseline when the actual duration is > 0 (e.g., check
execution_result.duration_seconds or equivalent and skip creating the
BaselineRecord if <= 0), so downstream _try_collect_token_speedup can compute
token_speedup_ratio correctly; do not record zero-duration baselines.
- Around line 139-267: The collect() method is too large; split it into clear
phases and move blocks into small helpers: extract run stats (turns,
error_turns/error_rate, total_tokens) into a new
_extract_run_stats(execution_result) helper, move the single-agent baseline
capture into _handle_single_agent_baseline(baseline_store, agent_id, task_id,
turns, error_rate, total_tokens) (which records BaselineRecord and returns
early), orchestrate metric collection calls into _compute_metrics_phase(...)
that calls the existing _try_collect_* helpers and returns the raw metric
values, and move aggregation/logging/alerting into
_aggregate_and_finalize(metrics, agent_id, task_id, is_multi_agent) which builds
CoordinationMetrics, logs COORD_METRICS_COLLECTION_COMPLETED and calls
_fire_alerts; update collect() to call these helpers in sequence and keep it
under 50 lines. Ensure you reference and use existing symbols: collect,
_try_collect_efficiency, _try_collect_overhead,
_try_collect_error_amplification, _try_collect_message_density,
_try_collect_redundancy, _try_collect_amdahl, _try_collect_straggler_gap,
_try_collect_token_speedup, _try_collect_message_overhead, BaselineRecord,
_baseline_store.record, CoordinationMetrics, and _fire_alerts.
- Around line 145-148: The collect() flow incorrectly infers multi/single-agent
mode leading to misrouting; make execution mode explicit and short-circuit
single-agent runs: update collect() in coordination_collector.py to accept and
honor an explicit is_multi_agent boolean (and perform an early return when False
regardless of _baseline_store), ensure any multi-agent branching only runs when
is_multi_agent is True and _baseline_store exists, and update AgentEngine
(engine/agent_engine.py) call sites that invoke collect() to pass the correct
is_multi_agent flag (true for multi-agent runs, false for single-agent) — apply
the same explicit-flag change to the other collect overload/usage region around
the second occurrence (the section referenced as 195-215) so both paths behave
consistently.
In `@src/synthorg/budget/quota_poller.py`:
- Around line 61-74: The start() method currently overwrites self._task allowing
multiple poll loops; make start() idempotent or fail-fast by checking the
existing self._task before creating a new task: if self._task exists and not
self._task.done() either return immediately (idempotent behavior) or raise a
clear exception (fail-fast) so a second start() cannot spawn a second
_poll_loop; update start() to reference self._task and _poll_loop, and ensure
stop() still cancels the active task as implemented.
- Around line 112-116: The poll loop must not die on unexpected exceptions from
poll_once/_check_snapshot/_dispatch_quota_alert; wrap the await self.poll_once()
call in a try/except that catches Exception, re-raises asyncio.CancelledError,
logs the full exception (e.g., self._logger.exception or similar) and continues
the loop so future polling still runs; update _poll_loop to perform this guarded
call while preserving cancellation semantics.
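The guarded loop described in this prompt can be sketched as follows. Assumptions: `poll_loop` is a hypothetical stand-in for the real `_poll_loop` (bounded by `max_iters` only so the demo terminates), and `RuntimeError` simulates a transient tracker failure.

```python
import asyncio
import logging

logger = logging.getLogger("quota_poller_sketch")

async def poll_loop(poll_once, interval: float, max_iters: int) -> None:
    # Individual poll failures are logged and swallowed; cancellation is
    # re-raised so stop() can still tear the loop down.
    for _ in range(max_iters):
        try:
            await poll_once()
        except asyncio.CancelledError:
            raise
        except Exception:
            logger.exception("poll_once failed; loop continues")
        await asyncio.sleep(interval)

attempts = {"n": 0}

async def failing_then_ok() -> None:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("transient tracker error")

asyncio.run(poll_loop(failing_then_ok, interval=0.0, max_iters=3))
print(attempts["n"])  # 3: the first failure did not kill the loop
```

Re-raising `CancelledError` before the broad `except Exception` is the important ordering; catching it by accident would make the poller uncancellable.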
In `@src/synthorg/engine/agent_engine.py`:
- Around line 877-881: CoordinationResult returned by
MultiAgentCoordinator.coordinate() isn't passed into the metrics collector, so
multi-agent runs never call CoordinationMetricsCollector.collect(); update
AgentEngine.coordinate() (or the controller that invokes it) to, after awaiting
MultiAgentCoordinator.coordinate(), extract per-agent info from
CoordinationResult.waves (e.g., agent durations and outputs), compute team_size
and total duration/cost as needed, and call
self._coordination_metrics_collector.collect(...) with is_multi_agent=True,
team_size=..., agent_durations=..., agent_outputs=... (and any other fields used
by the single-agent call) to mirror the single-agent collection performed in
AgentEngine.run().
In `@tests/unit/budget/test_call_classifier_properties.py`:
- Around line 34-88: Add a property test to assert COORDINATION precedence over
SYSTEM by creating a Hypothesis build of ClassificationContext (use the existing
patterns in TestCallClassifierProperties) with is_embedding_operation set to
False, tool_calls_made as (), agent_role None, and booleans for
is_delegation/is_review/is_meeting/is_planning_phase/is_system_prompt/is_quality_judge,
then .filter(...) to require at least one of the coordination flags
(is_delegation or is_review or is_meeting) be True and assert classify_call(ctx)
== LLMCallCategory.COORDINATION; place this new test method alongside the other
tests in TestCallClassifierProperties using the same strategies for
turn_number/agent_id/task_id to match style.
In `@tests/unit/budget/test_coordination_collector_properties.py`:
- Around line 121-149: The test test_overhead_sign_consistent_with_turns is
flaky because n_turns = max(1, round(turns_mas)) is compared to the original
float turns_sas; change the check to compare the actual integer turns used
(n_turns) against the integer baseline used to build the store (e.g., use
round(turns_sas) or compute baseline_n_turns = max(1, round(turns_sas)) and
compare n_turns >= baseline_n_turns) so the assertion on
result.overhead.value_percent reflects the turns passed to collector.collect
rather than the original float turns_sas.
In `@tests/unit/budget/test_cost_record.py`:
- Around line 393-409: The tests are missing coverage that
`_validate_retry_consistency` on CostRecord rejects invalid combinations; add
unit tests that instantiate CostRecord with retry_reason set but retry_count
omitted (None) and with retry_count set to 0, and assert that both raise
pydantic.ValidationError (use pytest.raises and match a message like
"retry_reason set implies retry_count"); reference CostRecord and the validator
name `_validate_retry_consistency` in the test names (e.g.,
test_retry_reason_without_retry_count_rejected and
test_retry_reason_with_zero_retry_count_rejected) so failures map clearly to the
validation logic.
In `@tests/unit/budget/test_quota_poller.py`:
- Around line 218-243: The test uses a slightly non-obvious datetime mock
pattern in test_cooldown_expiry_allows_re_alert: mock_dt.now.side_effect is used
to control datetime.now() returns while mock_dt.side_effect = datetime lets real
datetime(...) constructor calls pass through; update the test to add a short
clarifying comment immediately above the two lines referencing
mock_dt.now.side_effect and mock_dt.side_effect so future readers understand
that mock_dt.now controls now() and mock_dt.side_effect preserves the real
datetime constructor behavior.
In `@tests/unit/providers/test_retry.py`:
- Around line 114-126: Update the test_non_retryable_error_count_one test to
also assert that handler.last_retry_reason is None after a non-retryable error;
locate the test function (test_non_retryable_error_count_one) and the
RetryHandler instance created there, and add an assertion like assert
handler.last_retry_reason is None to confirm no retry reason was recorded when
InvalidRequestError is raised.
---
Outside diff comments:
In `@src/synthorg/providers/resilience/retry.py`:
- Around line 39-72: The instance attributes `last_attempt_count` and
`last_retry_reason` on `RetryHandler` are mutated inside `execute()`, causing
race conditions when the same handler is reused concurrently. Change `execute`
(in src/synthorg/providers/resilience/retry.py) to return both the function
result and a small immutable retry-info object/tuple (e.g., `(result,
attempt_count, retry_reason)`) instead of writing to `self.last_attempt_count` /
`self.last_retry_reason`, and update the caller in base.py (`complete()`) to
destructure that returned tuple and use the returned `attempt_count` and
`retry_reason` for telemetry. Remove, or stop relying on, the mutable instance
fields to eliminate the race.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0b7bd395-6806-4ab7-b6d6-10d3b90f7609
📒 Files selected for processing (44)
src/synthorg/budget/__init__.py, src/synthorg/budget/baseline_store.py, src/synthorg/budget/call_analytics.py, src/synthorg/budget/call_analytics_config.py, src/synthorg/budget/call_analytics_models.py, src/synthorg/budget/call_classifier.py, src/synthorg/budget/coordination_collector.py, src/synthorg/budget/cost_record.py, src/synthorg/budget/quota_poller.py, src/synthorg/budget/quota_poller_config.py, src/synthorg/engine/agent_engine.py, src/synthorg/engine/cost_recording.py, src/synthorg/engine/hybrid_helpers.py, src/synthorg/engine/hybrid_loop.py, src/synthorg/engine/loop_helpers.py, src/synthorg/engine/loop_protocol.py, src/synthorg/engine/plan_execute_loop.py, src/synthorg/engine/react_loop.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/coordination_metrics.py, src/synthorg/observability/events/quota.py, src/synthorg/providers/base.py, src/synthorg/providers/models.py, src/synthorg/providers/resilience/retry.py, tests/unit/budget/conftest.py, tests/unit/budget/test_baseline_store.py, tests/unit/budget/test_call_analytics.py, tests/unit/budget/test_call_analytics_config.py, tests/unit/budget/test_call_analytics_properties.py, tests/unit/budget/test_call_classifier.py, tests/unit/budget/test_call_classifier_properties.py, tests/unit/budget/test_coordination_collector.py, tests/unit/budget/test_coordination_collector_properties.py, tests/unit/budget/test_cost_record.py, tests/unit/budget/test_quota_poller.py, tests/unit/budget/test_quota_poller_config.py, tests/unit/engine/test_cost_recording.py, tests/unit/engine/test_loop_helpers.py, tests/unit/engine/test_loop_protocol.py, tests/unit/observability/test_events.py, tests/unit/providers/test_base_provider.py, tests/unit/providers/test_models.py, tests/unit/providers/test_retry.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Agent
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations` — Python 3.14 has PEP 649 native lazy annotations.
Use `except A, B:` (no parentheses) instead of `except (A, B):` for multiple exceptions — PEP 758 except syntax enforced by ruff on Python 3.14.
All public functions must have type hints and Google-style docstrings (enforced by ruff D rules). mypy strict mode is required.
Line length must be 88 characters (enforced by ruff).
Files:
src/synthorg/engine/hybrid_loop.py, tests/unit/budget/conftest.py, src/synthorg/engine/hybrid_helpers.py, src/synthorg/engine/react_loop.py, src/synthorg/providers/models.py, src/synthorg/engine/cost_recording.py, src/synthorg/engine/plan_execute_loop.py, tests/unit/observability/test_events.py, tests/unit/providers/test_models.py, src/synthorg/providers/resilience/retry.py, src/synthorg/observability/events/call_classification.py, src/synthorg/providers/base.py, tests/unit/engine/test_loop_protocol.py, src/synthorg/engine/loop_protocol.py, tests/unit/budget/test_call_analytics_properties.py, src/synthorg/engine/loop_helpers.py, tests/unit/budget/test_cost_record.py, tests/unit/engine/test_cost_recording.py, tests/unit/providers/test_retry.py, tests/unit/budget/test_call_analytics_config.py, tests/unit/engine/test_loop_helpers.py, tests/unit/budget/test_call_classifier_properties.py, tests/unit/budget/test_coordination_collector_properties.py, tests/unit/budget/test_call_analytics.py, src/synthorg/budget/__init__.py, src/synthorg/budget/cost_record.py, tests/unit/budget/test_call_classifier.py, src/synthorg/observability/events/analytics.py, src/synthorg/engine/agent_engine.py, tests/unit/budget/test_baseline_store.py, tests/unit/budget/test_coordination_collector.py, tests/unit/budget/test_quota_poller_config.py, tests/unit/providers/test_base_provider.py, src/synthorg/budget/call_analytics_config.py, src/synthorg/observability/events/quota.py, src/synthorg/budget/call_analytics_models.py, tests/unit/budget/test_quota_poller.py, src/synthorg/budget/call_analytics.py, src/synthorg/budget/call_classifier.py, src/synthorg/budget/quota_poller_config.py, src/synthorg/budget/quota_poller.py, src/synthorg/observability/events/coordination_metrics.py, src/synthorg/budget/baseline_store.py, src/synthorg/budget/coordination_collector.py
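To make the guideline concrete, a small function in the style these rules require (type hints plus a Google-style docstring; the function itself is an invented example, not project code):

```python
def clamp_percent(value: float, lower: float = 0.0, upper: float = 100.0) -> float:
    """Clamp a percentage into an inclusive range.

    Args:
        value: Raw percentage to clamp.
        lower: Minimum allowed value.
        upper: Maximum allowed value.

    Returns:
        The clamped percentage.
    """
    return min(upper, max(lower, value))
```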
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Prefer creating new objects over mutating existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement.
For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` at validation time.
Use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`).
Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Functions must be < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Never use `import logging`/`logging.getLogger()`/`print()` in application code. Exception: `observability/setup.py`, `observability/sinks.py`, `observability/syslog_handler.py`, and `observability/http_handler.py` may use stdlib `logging` and `print(..., file=sys.stderr)` for handler constructio...
Files:
src/synthorg/engine/hybrid_loop.py, src/synthorg/engine/hybrid_helpers.py, src/synthorg/engine/react_loop.py, src/synthorg/providers/models.py, src/synthorg/engine/cost_recording.py, src/synthorg/engine/plan_execute_loop.py, src/synthorg/providers/resilience/retry.py, src/synthorg/observability/events/call_classification.py, src/synthorg/providers/base.py, src/synthorg/engine/loop_protocol.py, src/synthorg/engine/loop_helpers.py, src/synthorg/budget/__init__.py, src/synthorg/budget/cost_record.py, src/synthorg/observability/events/analytics.py, src/synthorg/engine/agent_engine.py, src/synthorg/budget/call_analytics_config.py, src/synthorg/observability/events/quota.py, src/synthorg/budget/call_analytics_models.py, src/synthorg/budget/call_analytics.py, src/synthorg/budget/call_classifier.py, src/synthorg/budget/quota_poller_config.py, src/synthorg/budget/quota_poller.py, src/synthorg/observability/events/coordination_metrics.py, src/synthorg/budget/baseline_store.py, src/synthorg/budget/coordination_collector.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/hybrid_loop.py, src/synthorg/engine/hybrid_helpers.py, src/synthorg/engine/react_loop.py, src/synthorg/providers/models.py, src/synthorg/engine/cost_recording.py, src/synthorg/engine/plan_execute_loop.py, src/synthorg/providers/resilience/retry.py, src/synthorg/observability/events/call_classification.py, src/synthorg/providers/base.py, src/synthorg/engine/loop_protocol.py, src/synthorg/engine/loop_helpers.py, src/synthorg/budget/__init__.py, src/synthorg/budget/cost_record.py, src/synthorg/observability/events/analytics.py, src/synthorg/engine/agent_engine.py, src/synthorg/budget/call_analytics_config.py, src/synthorg/observability/events/quota.py, src/synthorg/budget/call_analytics_models.py, src/synthorg/budget/call_analytics.py, src/synthorg/budget/call_classifier.py, src/synthorg/budget/quota_poller_config.py, src/synthorg/budget/quota_poller.py, src/synthorg/observability/events/coordination_metrics.py, src/synthorg/budget/baseline_store.py, src/synthorg/budget/coordination_collector.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Use `asyncio_mode = "auto"` for async tests — no manual `@pytest.mark.asyncio` needed.
Global test timeout is 30 seconds per test (configured in `pyproject.toml`). Do not add per-file `pytest.mark.timeout(30)` markers; non-default overrides like `timeout(60)` are allowed.
Prefer `@pytest.mark.parametrize` for testing similar cases.
In tests, use `test-provider`, `test-small-001`, etc. instead of real vendor names.
Use Hypothesis for property-based testing: `@given` + `@settings`. Hypothesis profiles configured in `tests/conftest.py`: `ci` (deterministic, 10 examples), `dev` (1000 examples), `fuzz` (10,000 examples, no deadline), `extreme` (500,000 examples, no deadline). Controlled via `HYPOTHESIS_PROFILE` env var.
Files:
tests/unit/budget/conftest.py, tests/unit/observability/test_events.py, tests/unit/providers/test_models.py, tests/unit/engine/test_loop_protocol.py, tests/unit/budget/test_call_analytics_properties.py, tests/unit/budget/test_cost_record.py, tests/unit/engine/test_cost_recording.py, tests/unit/providers/test_retry.py, tests/unit/budget/test_call_analytics_config.py, tests/unit/engine/test_loop_helpers.py, tests/unit/budget/test_call_classifier_properties.py, tests/unit/budget/test_coordination_collector_properties.py, tests/unit/budget/test_call_analytics.py, tests/unit/budget/test_call_classifier.py, tests/unit/budget/test_baseline_store.py, tests/unit/budget/test_coordination_collector.py, tests/unit/budget/test_quota_poller_config.py, tests/unit/providers/test_base_provider.py, tests/unit/budget/test_quota_poller.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare `@settings()` decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the `HYPOTHESIS_PROFILE` env var controls example counts via registered profiles, which `@given()` honors automatically.
Files:
tests/unit/budget/conftest.py, tests/unit/observability/test_events.py, tests/unit/providers/test_models.py, tests/unit/engine/test_loop_protocol.py, tests/unit/budget/test_call_analytics_properties.py, tests/unit/budget/test_cost_record.py, tests/unit/engine/test_cost_recording.py, tests/unit/providers/test_retry.py, tests/unit/budget/test_call_analytics_config.py, tests/unit/engine/test_loop_helpers.py, tests/unit/budget/test_call_classifier_properties.py, tests/unit/budget/test_coordination_collector_properties.py, tests/unit/budget/test_call_analytics.py, tests/unit/budget/test_call_classifier.py, tests/unit/budget/test_baseline_store.py, tests/unit/budget/test_coordination_collector.py, tests/unit/budget/test_quota_poller_config.py, tests/unit/providers/test_base_provider.py, tests/unit/budget/test_quota_poller.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/providers/**/*.py: All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
`RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`.
Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately. `RetryExhaustedError` signals all retries failed.
Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.
Files:
src/synthorg/providers/models.py, src/synthorg/providers/resilience/retry.py, src/synthorg/providers/base.py
🧠 Learnings (52)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.
Applied to files:
src/synthorg/engine/hybrid_loop.py, src/synthorg/providers/models.py, src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
src/synthorg/providers/models.py, src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
src/synthorg/providers/models.py, src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
Applied to files:
src/synthorg/providers/models.py, src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-31T14:17:24.182Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:17:24.182Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically; never implement retry logic in driver subclasses or calling code
Applied to files:
src/synthorg/providers/models.py, src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : All provider calls go through `BaseCompletionProvider` which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
Applied to files:
src/synthorg/providers/models.py, src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.py, src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py, src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_retry.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately. `RetryExhaustedError` signals all retries failed.
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_retry.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_retry.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_retry.py, src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_retry.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects `RateLimitError.retry_after` from providers—automatically pauses future requests.
Applied to files:
src/synthorg/providers/resilience/retry.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.
Applied to files:
src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-31T14:17:24.182Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:17:24.182Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests
Applied to files:
src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/providers/**/*.py : Rate limiter respects RateLimitError.retry_after from providers — automatically pauses future requests.
Applied to files:
src/synthorg/providers/resilience/retry.py
📚 Learning: 2026-03-31T21:07:37.469Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.469Z
Learning: Applies to src/synthorg/providers/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`
Applied to files:
src/synthorg/providers/resilience/retry.py, src/synthorg/engine/loop_helpers.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/observability/events/call_classification.py, src/synthorg/observability/events/analytics.py, src/synthorg/observability/events/quota.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Never implement retry logic in provider subclasses or calling code — it is handled automatically by `BaseCompletionProvider` with `RetryConfig` and `RateLimiterConfig` per-provider
Applied to files:
src/synthorg/providers/base.py, tests/unit/providers/test_base_provider.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/agent_engine.py
src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/budget/test_call_analytics_properties.py
tests/unit/engine/test_loop_helpers.py
tests/unit/budget/test_coordination_collector.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Set `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`.
Applied to files:
src/synthorg/engine/loop_helpers.py
tests/unit/providers/test_base_provider.py
src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/api/**/*.py : Authentication uses JWT + API key. Approval gate integration for high-risk operations.
Applied to files:
src/synthorg/engine/loop_helpers.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/budget/test_call_classifier_properties.py
tests/unit/budget/test_coordination_collector_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
tests/unit/budget/test_call_analytics.py
src/synthorg/budget/__init__.py
src/synthorg/budget/cost_record.py
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/call_analytics_models.py
src/synthorg/budget/call_analytics.py
src/synthorg/budget/call_classifier.py
src/synthorg/budget/quota_poller_config.py
src/synthorg/budget/quota_poller.py
src/synthorg/budget/baseline_store.py
src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/budget/__init__.py
src/synthorg/budget/cost_record.py
src/synthorg/engine/agent_engine.py
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/call_analytics_models.py
src/synthorg/budget/call_analytics.py
src/synthorg/budget/call_classifier.py
src/synthorg/budget/quota_poller_config.py
src/synthorg/budget/quota_poller.py
src/synthorg/budget/baseline_store.py
src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
src/synthorg/budget/__init__.py
src/synthorg/engine/agent_engine.py
src/synthorg/budget/call_classifier.py
src/synthorg/budget/quota_poller.py
src/synthorg/budget/baseline_store.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
src/synthorg/budget/__init__.py
src/synthorg/budget/call_analytics.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
src/synthorg/observability/events/analytics.py
src/synthorg/observability/events/quota.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
Applied to files:
src/synthorg/engine/agent_engine.py
src/synthorg/budget/coordination_collector.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement
Applied to files:
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in `ProviderConfig`.
Applied to files:
tests/unit/providers/test_base_provider.py
src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-16T19:13:36.562Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Applied to files:
tests/unit/providers/test_base_provider.py
src/synthorg/budget/call_analytics_config.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves
Applied to files:
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state
Applied to files:
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.
Applied to files:
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-04-01T09:37:49.451Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:37:49.451Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves
Applied to files:
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`
Applied to files:
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-04-07T16:48:04.262Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-07T16:48:04.262Z
Learning: Applies to src/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/budget/quota_poller_config.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/budget/quota_poller_config.py
tests/unit/providers/test_retry.py
Outdated
async def test_non_retryable_error_count_one(self) -> None:
    """Non-retryable errors raise immediately with count=1."""
    from synthorg.providers.errors import InvalidRequestError

    handler = RetryHandler(_config(max_retries=3))

    async def _bad_request() -> str:
        raise InvalidRequestError("bad")  # noqa: EM101

    with pytest.raises(InvalidRequestError):
        await handler.execute(_bad_request)

    assert handler.last_attempt_count == 1
🧹 Nitpick | 🔵 Trivial
Consider asserting last_retry_reason is None for non-retryable errors.
The test verifies last_attempt_count == 1 but doesn't assert the expected value of last_retry_reason. Based on the test's docstring ("Non-retryable errors raise immediately with count=1"), last_retry_reason should likely be None since no retry occurred.
💡 Suggested addition
  with pytest.raises(InvalidRequestError):
      await handler.execute(_bad_request)

  assert handler.last_attempt_count == 1
+ assert handler.last_retry_reason is None

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/providers/test_retry.py` around lines 114 - 126, Update the
test_non_retryable_error_count_one test to also assert that
handler.last_retry_reason is None after a non-retryable error; locate the test
function (test_non_retryable_error_count_one) and the RetryHandler instance
created there, and add an assertion like assert handler.last_retry_reason is
None to confirm no retry reason was recorded when InvalidRequestError is raised.
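The behavior this comment asks the test to pin down — attempt count stays at 1 and no retry reason is recorded when a non-retryable error is raised — can be sketched with a toy handler. `MiniRetryHandler` and `NonRetryableError` below are illustrative stand-ins, not the project's actual `RetryHandler` API:

```python
import asyncio
from collections.abc import Awaitable, Callable


class NonRetryableError(Exception):
    """Stand-in for a provider error class that must never be retried."""


class MiniRetryHandler:
    """Toy handler: records attempt count and the last retry reason."""

    def __init__(self, max_retries: int) -> None:
        self.max_retries = max_retries
        self.last_attempt_count = 0
        self.last_retry_reason: str | None = None

    async def execute(self, fn: Callable[[], Awaitable[str]]) -> str:
        self.last_attempt_count = 0
        self.last_retry_reason = None
        while True:
            self.last_attempt_count += 1
            try:
                return await fn()
            except NonRetryableError:
                # Raised immediately: no retry happened, reason stays None.
                raise
            except Exception as exc:
                if self.last_attempt_count > self.max_retries:
                    raise
                self.last_retry_reason = type(exc).__name__


async def demo() -> None:
    handler = MiniRetryHandler(max_retries=3)

    async def bad_request() -> str:
        raise NonRetryableError("bad")

    try:
        await handler.execute(bad_request)
    except NonRetryableError:
        pass

    # Both assertions from the suggested test addition hold here.
    assert handler.last_attempt_count == 1
    assert handler.last_retry_reason is None


asyncio.run(demo())
```

Asserting both fields together documents the full contract of the non-retryable path, not just the attempt count.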
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/budget/test_coordination_config.py (1)
19-27: 🧹 Nitpick | 🔵 Trivial
Consider extending `test_values` to cover new enum members.
The `test_member_count` assertion is correctly updated to 9. However, `test_values` only asserts the original 5 members. For completeness, consider adding assertions for the 4 new members (`AMDAHL_CEILING`, `STRAGGLER_GAP`, `TOKEN_SPEEDUP_RATIO`, `MESSAGE_OVERHEAD`).
🧪 Suggested additions to test_values
  def test_values(self) -> None:
      assert CoordinationMetricName.EFFICIENCY.value == "efficiency"
      assert CoordinationMetricName.OVERHEAD.value == "overhead"
      assert CoordinationMetricName.ERROR_AMPLIFICATION.value == "error_amplification"
      assert CoordinationMetricName.MESSAGE_DENSITY.value == "message_density"
      assert CoordinationMetricName.REDUNDANCY.value == "redundancy"
+     assert CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling"
+     assert CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap"
+     assert CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio"
+     assert CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/budget/test_coordination_config.py` around lines 19 - 27, Update test_values to assert the string values for the four new enum members on CoordinationMetricName: add assertions that CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling", CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap", CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio", and CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"; keep the existing five assertions and leave test_member_count as-is (already expecting 9).
♻️ Duplicate comments (2)
src/synthorg/engine/agent_engine.py (1)
877-882: ⚠️ Potential issue | 🔴 Critical
Multi-agent runs still never reach the coordination collector.
This hook always records a single-agent baseline (`is_multi_agent=False`), and Line 475 still returns `self._coordinator.coordinate(context)` directly. Because `CoordinationMetricsCollector.collect()` returns early for single-agent runs, multi-agent executions never compute Ec/O%/R/etc. Add a post-`coordinate()` collection step that extracts `team_size`, `agent_durations`, and `agent_outputs` from `CoordinationResult` and calls `collect(..., is_multi_agent=True)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/agent_engine.py` around lines 877 - 882, The metrics collector call currently hardcodes is_multi_agent=False and returns the coordinator result directly, preventing multi-agent metrics from being recorded; after calling self._coordinator.coordinate(context) (which returns a CoordinationResult), extract CoordinationResult.team_size, CoordinationResult.agent_durations, and CoordinationResult.agent_outputs and then call self._coordination_metrics_collector.collect(...) again with execution_result set to that CoordinationResult (or relevant summary), agent_id/task_id as before, and is_multi_agent=True so multi-agent Ec/O%/R/etc. are computed and stored.
src/synthorg/budget/coordination_collector.py (1)
255-265: ⚠️ Potential issue | 🟠 Major
Don't persist synthetic baselines from empty or latency-derived runs.
Line 261 turns a 0-turn execution into `turns=1.0`, and Lines 255-257 derive `duration_seconds` from summed model latency only. Timeout/provider-error paths can return an `ExecutionResult` with no turns, so this stores a fake `1 turn / 0s` SAS baseline and skews overhead, error amplification, and token-speedup. Skip baseline capture when no turns were recorded, and pass the actual run duration into the collector instead of inferring it from per-turn latency.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/budget/coordination_collector.py` around lines 255 - 265, The code currently fakes a 1-turn baseline and infers duration from per-turn latency; change the capture so it skips persisting a BaselineRecord when execution_result.turns is empty, and use the actual run duration passed into the collector instead of summing per-turn latency. Concretely: in the function that builds BaselineRecord (references: execution_result.turns, duration_seconds, BaselineRecord, and the local turns variable), add a guard that returns/does not create a baseline if not execution_result.turns, remove the max(float(turns), 1.0) coercion (use float(turns) as-is), and replace the derived duration_seconds calculation with a provided run_duration_seconds parameter (or an existing real run-duration variable) when constructing BaselineRecord.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/call_classifier.py`:
- Around line 14-20: The module is missing the required module-level logger;
import get_logger from synthorg.observability and add a module logger variable
by assigning logger = get_logger(__name__) at top-level (near the other imports)
so the module-level logging convention is satisfied for
src/synthorg/budget/call_classifier.py and the logger symbol is available to any
functions/classes in that module.
In `@src/synthorg/budget/coordination_collector.py`:
- Around line 670-689: Move the logger.warning call that emits
COORD_METRICS_ALERT_FIRED so it runs only after
self._notification_dispatcher.dispatch(...) completes successfully; currently
logger.warning(...) is executed before awaiting dispatch, which records a fired
alert even if dispatch fails. Locate the block using logger.warning(...,
COORD_METRICS_ALERT_FIRED, agent_id=agent_id, ...) and the await
self._notification_dispatcher.dispatch(Notification(...)) call and change
ordering so the dispatch is awaited first, and then call logger.warning after
the await (keeping the existing exception handling semantics and re-raising
MemoryError/RecursionError as currently done).
In `@src/synthorg/budget/quota_poller.py`:
- Around line 100-107: Add explicit re-raise guards for fatal system exceptions
before each broad "except Exception" handler in this module: where you call
self._tracker.get_all_snapshots() (the block that logs QUOTA_POLL_FAILED), the
subsequent snapshot processing block (the other broad except around snapshot
handling), and the budget update/error handling block later in the poller;
insert "except (MemoryError, RecursionError): raise" immediately before the
existing "except Exception as exc:" lines so MemoryError and RecursionError
propagate, leaving the existing logger.exception(...) and return behavior for
non-fatal exceptions intact.
- Around line 145-167: The cooldown logic uses wall-clock datetime.now(UTC) and
should use a monotonic clock to avoid clock-skew issues: change the code in the
quota poller that reads/writes self._cooldown keyed by _CooldownKey (currently
using datetime.now(UTC)) to store and compare time.monotonic() values instead,
compute elapsed = time.monotonic() - last and compare against
self._config.cooldown_seconds, and when setting the cooldown (after logging
QUOTA_THRESHOLD_ALERT) assign time.monotonic() to self._cooldown[key]; leave
logger calls and keys (snap.provider_name, snap.window, level,
QUOTA_ALERT_COOLDOWN_ACTIVE, QUOTA_THRESHOLD_ALERT) intact.
In `@src/synthorg/engine/loop_protocol.py`:
- Around line 97-100: The retry_reason field currently typed as str | None
should use NotBlankStr | None per guidelines; update the type annotation for
retry_reason in loop_protocol.py to NotBlankStr | None, import NotBlankStr from
core.types, and keep the Field(...) metadata unchanged so identifier/name
validation is enforced by the NotBlankStr type instead of relying on manual
validators.
In `@tests/unit/budget/test_quota_poller.py`:
- Around line 290-300: The tests are missing a regression asserting that
QuotaPoller.start() is idempotent; add a new assertion in the lifecycle tests
(or modify test_start_creates_background_task) to call poller.start() a second
time and assert the poller._task remains the same object (e.g., store original =
poller._task, await poller.start() again, then assert poller._task is original
and not a new Task), then cleanup with await poller.stop(); this ensures the
duplicate-start no-op contract on QuotaPoller.start() is enforced.
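The monotonic-clock cooldown change described in the quota-poller prompt above can be sketched independently of the poller. `CooldownTracker` is a hypothetical name; the injectable clock follows the project's stated convention of mocking time sources so tests stay deterministic:

```python
import time
from collections.abc import Callable


class CooldownTracker:
    """Per-key alert cooldown on the monotonic clock (immune to wall-clock skew)."""

    def __init__(
        self,
        cooldown_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests can stub time deterministically
        self._last_fired: dict[tuple[str, str, str], float] = {}

    def should_fire(self, key: tuple[str, str, str]) -> bool:
        now = self._clock()
        last = self._last_fired.get(key)
        if last is not None and now - last < self._cooldown_seconds:
            return False  # still cooling down for this provider/window/level
        self._last_fired[key] = now
        return True
```

With a stubbed clock returning 0, 10, then 700 seconds and a 600-second cooldown, `should_fire` for the same key yields True, False, True — and a backwards wall-clock jump can never re-arm the alert early, which is the point of `time.monotonic()`.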
---
Outside diff comments:
In `@tests/unit/budget/test_coordination_config.py`:
- Around line 19-27: Update test_values to assert the string values for the four
new enum members on CoordinationMetricName: add assertions that
CoordinationMetricName.AMDAHL_CEILING.value == "amdahl_ceiling",
CoordinationMetricName.STRAGGLER_GAP.value == "straggler_gap",
CoordinationMetricName.TOKEN_SPEEDUP_RATIO.value == "token_speedup_ratio", and
CoordinationMetricName.MESSAGE_OVERHEAD.value == "message_overhead"; keep the
existing five assertions and leave test_member_count as-is (already expecting
9).
---
Duplicate comments:
In `@src/synthorg/budget/coordination_collector.py`:
- Around line 255-265: The code currently fakes a 1-turn baseline and infers
duration from per-turn latency; change the capture so it skips persisting a
BaselineRecord when execution_result.turns is empty, and use the actual run
duration passed into the collector instead of summing per-turn latency.
Concretely: in the function that builds BaselineRecord (references:
execution_result.turns, duration_seconds, BaselineRecord, and the local turns
variable), add a guard that returns/does not create a baseline if not
execution_result.turns, remove the max(float(turns), 1.0) coercion (use
float(turns) as-is), and replace the derived duration_seconds calculation with a
provided run_duration_seconds parameter (or an existing real run-duration
variable) when constructing BaselineRecord.
In `@src/synthorg/engine/agent_engine.py`:
- Around line 877-882: The metrics collector call currently hardcodes
is_multi_agent=False and returns the coordinator result directly, preventing
multi-agent metrics from being recorded; after calling
self._coordinator.coordinate(context) (which returns a CoordinationResult),
extract CoordinationResult.team_size, CoordinationResult.agent_durations, and
CoordinationResult.agent_outputs and then call
self._coordination_metrics_collector.collect(...) again with execution_result
set to that CoordinationResult (or relevant summary), agent_id/task_id as
before, and is_multi_agent=True so multi-agent Ec/O%/R/etc. are computed and
stored.
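The post-`coordinate()` collection step described in the agent_engine comment above can be sketched as follows. Every name here (`CoordinationResult`, `MetricsCollector`, `coordinate_and_collect`) is a simplified stand-in for the real engine classes, showing only the wiring:

```python
from dataclasses import dataclass
from collections.abc import Callable


@dataclass(frozen=True)
class CoordinationResult:
    """Simplified stand-in for the engine's coordination result."""

    team_size: int
    agent_durations: dict[str, float]
    agent_outputs: dict[str, str]


class MetricsCollector:
    """Records every collect() call so the wiring can be asserted."""

    def __init__(self) -> None:
        self.calls: list[dict[str, object]] = []

    def collect(self, *, team_size: int, is_multi_agent: bool) -> None:
        self.calls.append({"team_size": team_size, "is_multi_agent": is_multi_agent})


def coordinate_and_collect(
    coordinator: Callable[[object], CoordinationResult],
    collector: MetricsCollector,
    context: object,
) -> CoordinationResult:
    # Stands in for self._coordinator.coordinate(context), followed by the
    # post-coordination collection step the review asks for.
    result = coordinator(context)
    collector.collect(team_size=result.team_size, is_multi_agent=result.team_size > 1)
    return result
```

A three-agent run then reaches the collector with `is_multi_agent=True` instead of being short-circuited by the single-agent early return.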
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 72a015ae-4f27-44f7-8feb-7e004c141465
📒 Files selected for processing (26)
CLAUDE.md
src/synthorg/budget/call_analytics.py
src/synthorg/budget/call_classifier.py
src/synthorg/budget/coordination_collector.py
src/synthorg/budget/coordination_config.py
src/synthorg/budget/quota_poller.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/hybrid_helpers.py
src/synthorg/engine/hybrid_loop.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/react_loop.py
src/synthorg/providers/base.py
src/synthorg/providers/resilience/retry.py
tests/unit/budget/test_call_analytics.py
tests/unit/budget/test_call_classifier_properties.py
tests/unit/budget/test_coordination_collector.py
tests/unit/budget/test_coordination_collector_properties.py
tests/unit/budget/test_coordination_config.py
tests/unit/budget/test_cost_record.py
tests/unit/budget/test_quota_poller.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
tests/unit/providers/resilience/test_retry.py
tests/unit/providers/test_retry.py
Implements the full runtime analytics and metrics pipeline across four issues.

Issue #226: Coordination metrics runtime collection
- BaselineStore: sliding-window store for single-agent baseline data
- CoordinationMetricsCollector: post-execution collector for all 9 coordination metrics (efficiency, overhead, error amplification, message density, redundancy, amdahl, message overhead, straggler gap, token speedup)
- AgentEngine._try_collect_coordination_metrics() integration
- SimilarityComputer protocol for pluggable redundancy computation
- 11 new observability event constants in coordination_metrics.py

Issue #225: Runtime LLM call categorization
- ClassificationContext: frozen Pydantic context for a single call
- RulesBasedClassifier: priority-ordered classifier (EMBEDDING, COORDINATION, SYSTEM, PRODUCTIVE)
- CallClassificationStrategy protocol for custom classifiers
- classify_call() convenience function
- call_classification.py event module

Issue #227: Full per-call analytics layer
- CostRecord: +6 optional fields (latency_ms, cache_hit, retry_count, retry_reason, finish_reason, success)
- TurnRecord: +4 optional fields + computed success field
- CompletionResponse.provider_metadata dict for provider enrichment
- RetryHandler tracks last_attempt_count and last_retry_reason
- BaseCompletionProvider.complete() injects _synthorg_latency_ms, _synthorg_retry_count, _synthorg_retry_reason via provider_metadata
- make_turn_record() extracts metadata into TurnRecord fields
- All 6 loop call sites updated to pass provider_metadata
- cost_recording.py propagates all new fields TurnRecord to CostRecord
- CallAnalyticsConfig + RetryAlertConfig frozen Pydantic configs
- AnalyticsAggregation model for aggregated call statistics
- CallAnalyticsService: get_aggregation() + check_alerts()
- 5 new event constants in analytics.py

Issue #224: Proactive quota polling and alerting
- QuotaAlertThresholds: warn_pct/critical_pct with ordering invariant
- QuotaPollerConfig: enabled/interval/cooldown/thresholds config
- QuotaPoller: background asyncio.Task loop with poll_once(), start(), stop(); per-provider/window/level cooldown tracking; tracker error resilience
- 7 new event constants in quota.py

Exports: all new types exported from synthorg.budget.__init__

Tests (TDD throughout):
- test_baseline_store.py, test_coordination_collector.py, test_coordination_collector_properties.py (Phase 1)
- test_call_classifier.py, test_call_classifier_properties.py (Phase 2)
- test_cost_record.py, test_loop_protocol.py, test_retry.py, test_base_provider.py, test_loop_helpers.py, test_cost_recording.py, test_call_analytics_config.py, test_call_analytics.py, test_call_analytics_properties.py (Phase 3)
- test_quota_poller_config.py, test_quota_poller.py (Phase 4)
- test_events.py updated with call_classification module

Closes #226
Closes #225
Closes #227
Closes #224
…nd CodeRabbit

Critical fixes:
- RetryHandler race condition: return immutable RetryResult from execute() instead of storing mutable state on shared instance (CodeRabbit)
- Single-agent fallthrough: short-circuit regardless of baseline_store to prevent single-agent runs entering multi-agent metric path (CodeRabbit, Copilot)
- Explicit is_multi_agent=False in AgentEngine metrics collection (CodeRabbit)

Major fixes:
- Derived metrics (amdahl, straggler, token_speedup, message_overhead) now respect _is_enabled() via new CoordinationMetricName entries (CodeRabbit)
- Baseline duration_seconds uses actual turn latencies instead of 0.0 (Gemini)
- Provider metadata merged instead of replaced in complete() (Copilot)
- make_turn_record() uses isinstance checks instead of unsafe coercion (Copilot)
- QuotaPoller.start() is now idempotent (CodeRabbit, Copilot)
- _poll_loop() guarded with try/except to survive dispatch failures (CodeRabbit)
- _dispatch_quota_alert guarded in _check_snapshot (Copilot)
- provider param typed as NotBlankStr in CallAnalyticsService (CodeRabbit)
- ClassificationContext uses NotBlankStr for tool_calls_made/agent_role (CodeRabbit)
- collect() split into _record_baseline + _collect_multi_agent helpers (CodeRabbit)
- _fire_alerts docstring corrected for dispatcher=None behavior (Copilot)
- FinishReason enum members used instead of string comparison (Gemini)
- success_rate computed from error_rate instead of hardcoded 1.0 (Gemini)

Medium fixes:
- TurnRecord gains _validate_retry_consistency validator (type-design)
- Flaky property test uses integers instead of floats (CodeRabbit)
- Retry validation rejection tests added to test_cost_record (CodeRabbit)
- COORDINATION > SYSTEM priority property test added (CodeRabbit)
- _p95() dedicated unit tests added (pr-test-analyzer)
- CLAUDE.md logging section updated with analytics/classification/quota events

Classifier integration (issue #225):
- classify_turn() helper wired into all 4 execution loops
- Planning turns classified as SYSTEM, executor turns classified dynamically
- Replaces hardcoded LLMCallCategory.PRODUCTIVE at all call sites
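The commit above mentions dedicated `_p95()` unit tests. A nearest-rank 95th-percentile helper of the kind presumably under test could look like this — the project's actual `_p95` may use a different interpolation, so this is only an illustrative sketch:

```python
def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty sequence."""
    if not values:
        raise ValueError("p95 of an empty sequence")
    ordered = sorted(values)
    rank = max(1, -(-95 * len(ordered) // 100))  # ceil(0.95 * n) without floats
    return ordered[rank - 1]
```

Nearest-rank on 100 samples 1..100 returns 95, and a single-element input returns that element, which makes the edge cases easy to pin down in unit tests.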
- Add module-level logger to call_classifier.py (logging convention)
- Move alert log after successful dispatch in _fire_alerts (log accuracy)
- Add MemoryError/RecursionError guard in quota tracker error handler
- Switch cooldown tracking from datetime.now(UTC) to time.monotonic()
- Use NotBlankStr for TurnRecord.retry_reason (type convention)
- Skip baseline recording for empty/zero-duration executions
- Add enum value assertions for 4 new CoordinationMetricName members
- Add idempotent start() regression test for QuotaPoller
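The idempotent `start()` contract listed above can be demonstrated with a minimal poller. `MiniPoller` is a stand-in for `QuotaPoller`, not its real implementation:

```python
import asyncio


class MiniPoller:
    """Stand-in poller whose start() is a no-op while the task is alive."""

    def __init__(self) -> None:
        self._task: asyncio.Task | None = None

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(3600)

    def start(self) -> None:
        if self._task is not None and not self._task.done():
            return  # duplicate start: keep the existing background task
        self._task = asyncio.get_running_loop().create_task(self._loop())

    async def stop(self) -> None:
        if self._task is None:
            return
        self._task.cancel()
        try:
            await self._task
        except asyncio.CancelledError:
            pass
        self._task = None


async def demo() -> None:
    poller = MiniPoller()
    poller.start()
    original = poller._task
    poller.start()  # second start must not replace the task
    assert poller._task is original
    await poller.stop()
    assert poller._task is None


asyncio.run(demo())
```

Asserting `poller._task is original` after the second `start()` is exactly the regression check the review asked for: a duplicate start must neither spawn a second loop nor orphan the first task.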
Force-pushed from 1d87fa8 to e35a792.
Actionable comments posted: 7
♻️ Duplicate comments (2)
src/synthorg/budget/quota_poller.py (2)
171-183: ⚠️ Potential issue | 🟠 Major
Add MemoryError/RecursionError guard in _check_snapshot dispatch block.
The except Exception at line 179 swallows fatal system errors during alert dispatch.
🛠️ Proposed fix
 if self._dispatcher is not None:
     try:
         await _dispatch_quota_alert(
             self._dispatcher,
             snap=snap,
             level=level,
             usage_pct=usage_pct,
         )
+    except (MemoryError, RecursionError):
+        raise
     except Exception:
         logger.exception(
             QUOTA_POLL_FAILED,
             error="quota_alert_dispatch_failed",
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/budget/quota_poller.py` around lines 171 - 183, The current broad except in the _check_snapshot dispatch block (around the call to _dispatch_quota_alert) swallows fatal exceptions like MemoryError and RecursionError; change the handler to re-raise those fatal exceptions while still logging other exceptions — e.g., catch the exception object, if it is an instance of (MemoryError, RecursionError) re-raise it, otherwise call logger.exception(QUOTA_POLL_FAILED, error="quota_alert_dispatch_failed") as before; reference symbols: _check_snapshot, _dispatch_quota_alert, QUOTA_POLL_FAILED, logger.exception.
119-131: ⚠️ Potential issue | 🟠 Major
Add MemoryError/RecursionError guard in _poll_loop.
The except Exception at line 126 swallows fatal system errors. Per past review and project convention, these must propagate.
🛠️ Proposed fix
 async def _poll_loop(self) -> None:
     """Background task: poll repeatedly until cancelled."""
     while True:
         try:
             await self.poll_once()
         except asyncio.CancelledError:
             raise
+        except (MemoryError, RecursionError):
+            raise
         except Exception as exc:
             logger.exception(
                 QUOTA_POLL_FAILED,
                 error=type(exc).__name__,
             )
         await asyncio.sleep(self._config.poll_interval_seconds)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/budget/quota_poller.py` around lines 119 - 131, The current _poll_loop swallows fatal errors by using a broad "except Exception" handler; add an explicit guard to re-raise MemoryError and RecursionError before the generic exception handler so these fatal system errors propagate (i.e., in async def _poll_loop(): add an "except (MemoryError, RecursionError): raise" branch just above the existing "except Exception as exc" that logs QUOTA_POLL_FAILED).
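Both duplicate comments ask for the same guard convention. A minimal, self-contained sketch of that pattern (the `poll_once` callable and the log message are stand-ins, not the real quota_poller module):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def guarded_poll(poll_once) -> None:
    """One poll iteration: fatal errors escape, everything else is logged."""
    try:
        await poll_once()
    except asyncio.CancelledError:
        raise  # cooperate with task cancellation
    except (MemoryError, RecursionError):
        raise  # fatal system errors must propagate, never be swallowed
    except Exception as exc:
        # ordinary failures are logged so the surrounding loop survives them
        logger.exception("quota_poll_failed error=%s", type(exc).__name__)


async def demo() -> str:
    async def flaky() -> None:
        raise ValueError("transient dispatch failure")

    await guarded_poll(flaky)  # swallowed and logged

    async def fatal() -> None:
        raise RecursionError("fatal")

    try:
        await guarded_poll(fatal)
    except RecursionError:
        return "fatal escaped, transient swallowed"
    return "unexpected"


print(asyncio.run(demo()))
```

Note the ordering: the `(MemoryError, RecursionError)` branch must sit above the broad `except Exception`, otherwise the broad handler matches first.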
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/baseline_store.py`:
- Around line 84-91: The call that currently logs baseline mutations using
logger.debug should be changed to INFO: inside the record() method of
BaselineStore, replace the logger.debug(...) invocation that emits
COORD_METRICS_BASELINE_RECORDED (including agent_id=baseline.agent_id,
task_id=baseline.task_id, turns=baseline.turns, store_size=len(self._records),
window_size=self._window_size) with logger.info(...) so state transitions to
self._records are logged at INFO level per the guidelines.
In `@src/synthorg/budget/call_analytics.py`:
- Around line 204-232: The dispatch call in _dispatch_retry_rate_alert must
guard against swallowing system exceptions: wrap the await
dispatcher.dispatch(...) in a try/except Exception as exc block, and if
isinstance(exc, (MemoryError, RecursionError)) re-raise it immediately; for
other exceptions, log the error with the module logger (create a logger via
logging.getLogger(__name__) at the top of the module if one doesn't exist) and
allow the function to complete without re-raising. Ensure the guard specifically
targets MemoryError and RecursionError and leaves other exception handling
behavior unchanged.
In `@src/synthorg/engine/cost_recording.py`:
- Around line 82-83: The current assignment success=turn.success mislabels
outcomes as successful when turn.finish_reason is None; change the logic in the
record construction (the block that sets finish_reason=turn.finish_reason,
success=turn.success) to preserve an unknown/None success state when
finish_reason is None — e.g., set success to turn.success only if
turn.finish_reason is not None, otherwise set success to None (or an explicit
"unknown" sentinel) so unknown finish_reason does not imply success.
In `@src/synthorg/providers/models.py`:
- Around line 283-286: The CompletionResponse class docstring is missing the new
field provider_metadata; update the class docstring Attributes section to
document provider_metadata (type dict[str, object]) with a short description
like "Provider metadata injected by the base class (_synthorg_* keys)." so it
matches the existing Attributes pattern and appears alongside the other fields
for CompletionResponse.
In `@tests/unit/engine/test_cost_recording.py`:
- Around line 234-318: Refactor the repeated propagation tests
(test_latency_ms_propagated, test_cache_hit_propagated,
test_retry_count_propagated, test_retry_reason_propagated,
test_finish_reason_and_success_propagated,
test_none_analytics_fields_propagated_as_none) into a single parametrized pytest
test: use `@pytest.mark.parametrize` to supply (turn_factory_or_turn,
expected_field, expected_value) cases; for each case call record_execution_costs
with _result, _identity, "agent-1", "task-1" and _FakeTracker, then assert
tracker.records[0].<expected_field> == expected_value (for finish_reason/success
include appropriate TurnRecord with FinishReason). Reuse helpers
_turn_with_analytics, _turn, _result, record_execution_costs and _FakeTracker to
build inputs; keep individual expectation tuples for latency_ms, cache_hit,
retry_count, retry_reason, finish_reason, success, and None fields.
In `@tests/unit/engine/test_loop_protocol.py`:
- Around line 201-327: The tests repeat the same assertions for TurnRecord
analytics fields and FinishReason→success mapping; refactor by replacing the
repeated individual tests with pytest.mark.parametrize-based tests: for
analytics fields create a single parametrized test (referencing TurnRecord and
self._base()) that iterates tuples like ("latency_ms", None/250.0/0.0,
negative->expect ValidationError), ("cache_hit", None/True/False),
("retry_count", None/3/0, negative->expect ValidationError), ("retry_reason",
None/"RateLimitError") and assert field values or raise, and for success mapping
create a parametrized test over FinishReason values (FinishReason.STOP,
TOOL_USE, MAX_TOKENS -> True; FinishReason.ERROR, CONTENT_FILTER -> False) using
self._base(finish_reason=...) to assert the .success property; use clear param
ids and combine negative-validation cases into separate parametrized cases to
preserve semantics.
In `@tests/unit/providers/test_models.py`:
- Around line 470-475: Replace vendor-ambiguous model values used in
CompletionResponse instances with the repository’s standard test model IDs
(e.g., change model="test" to model="test-small-001"); update the occurrences
around the CompletionResponse creation (the variable resp) and the other similar
spots noted (around lines 483-488 and 501-512) so all fixtures use the canonical
test model IDs like test-small-001 or test-provider as appropriate to keep tests
vendor-agnostic.
---
Duplicate comments:
In `@src/synthorg/budget/quota_poller.py`:
- Around line 171-183: The current broad except in the _check_snapshot dispatch
block (around the call to _dispatch_quota_alert) swallows fatal exceptions like
MemoryError and RecursionError; change the handler to re-raise those fatal
exceptions while still logging other exceptions — e.g., catch the exception
object, if it is an instance of (MemoryError, RecursionError) re-raise it,
otherwise call logger.exception(QUOTA_POLL_FAILED,
error="quota_alert_dispatch_failed") as before; reference symbols:
_check_snapshot, _dispatch_quota_alert, QUOTA_POLL_FAILED, logger.exception.
- Around line 119-131: The current _poll_loop swallows fatal errors by using a
broad "except Exception" handler; add an explicit guard to re-raise MemoryError
and RecursionError before the generic exception handler so these fatal system
errors propagate (i.e., in async def _poll_loop(): add an "except (MemoryError,
RecursionError): raise" branch just above the existing "except Exception as exc"
that logs QUOTA_POLL_FAILED).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d491a02e-152a-4391-8995-3bbec6b827e6
📒 Files selected for processing (48)
CLAUDE.md
src/synthorg/budget/__init__.py
src/synthorg/budget/baseline_store.py
src/synthorg/budget/call_analytics.py
src/synthorg/budget/call_analytics_config.py
src/synthorg/budget/call_analytics_models.py
src/synthorg/budget/call_classifier.py
src/synthorg/budget/coordination_collector.py
src/synthorg/budget/coordination_config.py
src/synthorg/budget/cost_record.py
src/synthorg/budget/quota_poller.py
src/synthorg/budget/quota_poller_config.py
src/synthorg/engine/agent_engine.py
src/synthorg/engine/cost_recording.py
src/synthorg/engine/hybrid_helpers.py
src/synthorg/engine/hybrid_loop.py
src/synthorg/engine/loop_helpers.py
src/synthorg/engine/loop_protocol.py
src/synthorg/engine/plan_execute_loop.py
src/synthorg/engine/react_loop.py
src/synthorg/observability/events/analytics.py
src/synthorg/observability/events/call_classification.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/quota.py
src/synthorg/providers/base.py
src/synthorg/providers/models.py
src/synthorg/providers/resilience/retry.py
tests/unit/budget/conftest.py
tests/unit/budget/test_baseline_store.py
tests/unit/budget/test_call_analytics.py
tests/unit/budget/test_call_analytics_config.py
tests/unit/budget/test_call_analytics_properties.py
tests/unit/budget/test_call_classifier.py
tests/unit/budget/test_call_classifier_properties.py
tests/unit/budget/test_coordination_collector.py
tests/unit/budget/test_coordination_collector_properties.py
tests/unit/budget/test_coordination_config.py
tests/unit/budget/test_cost_record.py
tests/unit/budget/test_quota_poller.py
tests/unit/budget/test_quota_poller_config.py
tests/unit/engine/test_cost_recording.py
tests/unit/engine/test_loop_helpers.py
tests/unit/engine/test_loop_protocol.py
tests/unit/observability/test_events.py
tests/unit/providers/resilience/test_retry.py
tests/unit/providers/test_base_provider.py
tests/unit/providers/test_models.py
tests/unit/providers/test_retry.py
async def _dispatch_retry_rate_alert(
    dispatcher: NotificationDispatcher,
    *,
    retry_rate: float,
    warn_rate: float,
) -> None:
    """Dispatch a retry-rate warning notification.

    Args:
        dispatcher: Notification dispatcher.
        retry_rate: Observed retry rate.
        warn_rate: Configured warn threshold.
    """
    from synthorg.notifications.models import (  # noqa: PLC0415
        Notification,
        NotificationCategory,
        NotificationSeverity,
    )

    body = f"Retry rate {retry_rate:.1%} exceeds warning threshold {warn_rate:.1%}."
    await dispatcher.dispatch(
        Notification(
            category=NotificationCategory.BUDGET,
            severity=NotificationSeverity.WARNING,
            title="High retry rate alert",
            body=body,
            source="budget.call_analytics",
        ),
    )
Add system exception re-raise guards to _dispatch_retry_rate_alert.
The dispatcher.dispatch() call can raise exceptions. Per project convention, MemoryError and RecursionError must escape; the current implementation lacks these guards.
🛠️ Proposed fix
async def _dispatch_retry_rate_alert(
dispatcher: NotificationDispatcher,
*,
retry_rate: float,
warn_rate: float,
) -> None:
"""Dispatch a retry-rate warning notification.
Args:
dispatcher: Notification dispatcher.
retry_rate: Observed retry rate.
warn_rate: Configured warn threshold.
"""
from synthorg.notifications.models import ( # noqa: PLC0415
Notification,
NotificationCategory,
NotificationSeverity,
)
body = f"Retry rate {retry_rate:.1%} exceeds warning threshold {warn_rate:.1%}."
- await dispatcher.dispatch(
- Notification(
- category=NotificationCategory.BUDGET,
- severity=NotificationSeverity.WARNING,
- title="High retry rate alert",
- body=body,
- source="budget.call_analytics",
- ),
- )
+ try:
+ await dispatcher.dispatch(
+ Notification(
+ category=NotificationCategory.BUDGET,
+ severity=NotificationSeverity.WARNING,
+ title="High retry rate alert",
+ body=body,
+ source="budget.call_analytics",
+ ),
+ )
+ except (MemoryError, RecursionError):
+ raise
+ except Exception:
+ logger.exception(
+ ANALYTICS_RETRY_RATE_ALERT,
+ error="alert_dispatch_failed",
+ )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/budget/call_analytics.py` around lines 204 - 232, The dispatch
call in _dispatch_retry_rate_alert must guard against swallowing system
exceptions: wrap the await dispatcher.dispatch(...) in a try/except Exception as
exc block, and if isinstance(exc, (MemoryError, RecursionError)) re-raise it
immediately; for other exceptions, log the error with the module logger (create
a logger via logging.getLogger(__name__) at the top of the module if one doesn't
exist) and allow the function to complete without re-raising. Ensure the guard
specifically targets MemoryError and RecursionError and leaves other exception
handling behavior unchanged.
    finish_reason=turn.finish_reason,
    success=turn.success,
Preserve unknown success state when finish_reason is missing.
On line 83, success=turn.success turns finish_reason=None into True, which misclassifies unknown outcomes as successful.
Proposed fix
record = CostRecord(
agent_id=agent_id,
task_id=task_id,
provider=identity.model.provider,
model=identity.model.model_id,
input_tokens=turn.input_tokens,
output_tokens=turn.output_tokens,
cost_usd=turn.cost_usd,
timestamp=datetime.now(UTC),
call_category=turn.call_category,
latency_ms=turn.latency_ms,
cache_hit=turn.cache_hit,
retry_count=turn.retry_count,
retry_reason=turn.retry_reason,
finish_reason=turn.finish_reason,
- success=turn.success,
+ success=(
+ turn.success if turn.finish_reason is not None else None
+ ),
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/cost_recording.py` around lines 82 - 83, The current
assignment success=turn.success mislabels outcomes as successful when
turn.finish_reason is None; change the logic in the record construction (the
block that sets finish_reason=turn.finish_reason, success=turn.success) to
preserve an unknown/None success state when finish_reason is None — e.g., set
success to turn.success only if turn.finish_reason is not None, otherwise set
success to None (or an explicit "unknown" sentinel) so unknown finish_reason
does not imply success.
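The proposed tri-state semantics can be illustrated standalone. This is a sketch of the mapping only, using a minimal stand-in enum rather than the real CostRecord/FinishReason types:

```python
from enum import Enum
from typing import Optional


class FinishReason(str, Enum):
    # minimal stand-in for the real provider enum
    STOP = "stop"
    ERROR = "error"


def derive_success(finish_reason: Optional[FinishReason]) -> Optional[bool]:
    """A missing finish_reason yields unknown success, never True."""
    if finish_reason is None:
        return None  # outcome unknown: do not claim success
    return finish_reason is not FinishReason.ERROR


print(derive_success(None))
print(derive_success(FinishReason.STOP))
print(derive_success(FinishReason.ERROR))
```

The key design point is that `None` propagates instead of collapsing to a boolean, so downstream aggregation can count unknown outcomes separately from failures.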
async def test_latency_ms_propagated(self) -> None:
    """latency_ms from TurnRecord lands in CostRecord."""
    turn = self._turn_with_analytics(latency_ms=250.5)
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].latency_ms == 250.5

async def test_cache_hit_propagated(self) -> None:
    turn = self._turn_with_analytics(cache_hit=True)
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].cache_hit is True

async def test_retry_count_propagated(self) -> None:
    turn = self._turn_with_analytics(retry_count=2)
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].retry_count == 2

async def test_retry_reason_propagated(self) -> None:
    turn = self._turn_with_analytics(retry_count=2, retry_reason="RateLimitError")
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].retry_reason == "RateLimitError"

async def test_finish_reason_and_success_propagated(self) -> None:
    from synthorg.providers.enums import FinishReason

    turn = TurnRecord(
        turn_number=1,
        input_tokens=100,
        output_tokens=50,
        cost_usd=0.01,
        finish_reason=FinishReason.ERROR,
    )
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].finish_reason == FinishReason.ERROR
    assert tracker.records[0].success is False

async def test_none_analytics_fields_propagated_as_none(self) -> None:
    """When analytics fields are None on TurnRecord, CostRecord gets None."""
    turn = _turn()
    tracker = _FakeTracker()
    await record_execution_costs(
        _result((turn,)),
        _identity(),
        "agent-1",
        "task-1",
        tracker=tracker,  # type: ignore[arg-type]
    )
    assert tracker.records[0].latency_ms is None
    assert tracker.records[0].cache_hit is None
    assert tracker.records[0].retry_count is None
    assert tracker.records[0].retry_reason is None
🧹 Nitpick | 🔵 Trivial
Parametrize the field-propagation assertions to remove duplication.
These tests are valuable, but the repeated arrange/act/assert shape is a good fit for parametrization.
As per coding guidelines: tests/**/*.py: Prefer @pytest.mark.parametrize for testing similar cases in Python tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_cost_recording.py` around lines 234 - 318, Refactor
the repeated propagation tests (test_latency_ms_propagated,
test_cache_hit_propagated, test_retry_count_propagated,
test_retry_reason_propagated, test_finish_reason_and_success_propagated,
test_none_analytics_fields_propagated_as_none) into a single parametrized pytest
test: use `@pytest.mark.parametrize` to supply (turn_factory_or_turn,
expected_field, expected_value) cases; for each case call record_execution_costs
with _result, _identity, "agent-1", "task-1" and _FakeTracker, then assert
tracker.records[0].<expected_field> == expected_value (for finish_reason/success
include appropriate TurnRecord with FinishReason). Reuse helpers
_turn_with_analytics, _turn, _result, record_execution_costs and _FakeTracker to
build inputs; keep individual expectation tuples for latency_ms, cache_hit,
retry_count, retry_reason, finish_reason, success, and None fields.
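One possible shape for the suggested parametrization. This sketch runs on its own by using a stand-in record type; the real refactor would keep the existing `_turn_with_analytics`, `_result`, and `_FakeTracker` helpers instead:

```python
import pytest


class FakeRecord:
    """Stand-in for CostRecord: just holds whatever fields a case sets."""

    def __init__(self, **fields: object) -> None:
        self.__dict__.update(fields)


CASES = [
    ("latency_ms", 250.5),
    ("cache_hit", True),
    ("retry_count", 2),
    ("retry_reason", "RateLimitError"),
]


@pytest.mark.parametrize(("field", "expected"), CASES, ids=[f for f, _ in CASES])
def test_analytics_field_propagated(field: str, expected: object) -> None:
    record = FakeRecord(**{field: expected})
    assert getattr(record, field) == expected


# pytest marks only attach metadata, so the cases can also run directly:
for field, expected in CASES:
    test_analytics_field_propagated(field, expected)
print("all cases pass")
```

The `ids` argument gives each case a readable name in test output, which preserves the per-field failure reporting the individual tests had.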
def test_latency_ms_default_none(self) -> None:
    assert self._base().latency_ms is None

def test_latency_ms_positive_accepted(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        latency_ms=250.0,
    )
    assert record.latency_ms == 250.0

def test_latency_ms_zero_accepted(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        latency_ms=0.0,
    )
    assert record.latency_ms == 0.0

def test_latency_ms_negative_rejected(self) -> None:
    with pytest.raises(ValidationError):
        TurnRecord(
            turn_number=1,
            input_tokens=10,
            output_tokens=5,
            cost_usd=0.001,
            finish_reason=FinishReason.STOP,
            latency_ms=-1.0,
        )

def test_cache_hit_default_none(self) -> None:
    assert self._base().cache_hit is None

def test_cache_hit_true(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        cache_hit=True,
    )
    assert record.cache_hit is True

def test_cache_hit_false(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        cache_hit=False,
    )
    assert record.cache_hit is False

def test_retry_count_default_none(self) -> None:
    assert self._base().retry_count is None

def test_retry_count_positive_accepted(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        retry_count=3,
    )
    assert record.retry_count == 3

def test_retry_count_zero_accepted(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        retry_count=0,
    )
    assert record.retry_count == 0

def test_retry_count_negative_rejected(self) -> None:
    with pytest.raises(ValidationError):
        TurnRecord(
            turn_number=1,
            input_tokens=10,
            output_tokens=5,
            cost_usd=0.001,
            finish_reason=FinishReason.STOP,
            retry_count=-1,
        )

def test_retry_reason_default_none(self) -> None:
    assert self._base().retry_reason is None

def test_retry_reason_set(self) -> None:
    record = TurnRecord(
        turn_number=1,
        input_tokens=10,
        output_tokens=5,
        cost_usd=0.001,
        finish_reason=FinishReason.STOP,
        retry_count=1,
        retry_reason="RateLimitError",
    )
    assert record.retry_reason == "RateLimitError"

def test_success_true_for_stop(self) -> None:
    assert self._base(finish_reason=FinishReason.STOP).success is True

def test_success_true_for_tool_use(self) -> None:
    assert self._base(finish_reason=FinishReason.TOOL_USE).success is True

def test_success_true_for_max_tokens(self) -> None:
    assert self._base(finish_reason=FinishReason.MAX_TOKENS).success is True

def test_success_false_for_error(self) -> None:
    assert self._base(finish_reason=FinishReason.ERROR).success is False

def test_success_false_for_content_filter(self) -> None:
    assert self._base(finish_reason=FinishReason.CONTENT_FILTER).success is False
🧹 Nitpick | 🔵 Trivial
Use parametrization for repeated analytics-field and success-mapping cases.
This block is correct but repetitive; converting the near-identical cases to @pytest.mark.parametrize will reduce test maintenance overhead.
Refactor sketch
- def test_success_true_for_stop(self) -> None:
- assert self._base(finish_reason=FinishReason.STOP).success is True
-
- def test_success_true_for_tool_use(self) -> None:
- assert self._base(finish_reason=FinishReason.TOOL_USE).success is True
-
- def test_success_true_for_max_tokens(self) -> None:
- assert self._base(finish_reason=FinishReason.MAX_TOKENS).success is True
-
- def test_success_false_for_error(self) -> None:
- assert self._base(finish_reason=FinishReason.ERROR).success is False
-
- def test_success_false_for_content_filter(self) -> None:
- assert self._base(finish_reason=FinishReason.CONTENT_FILTER).success is False
+ @pytest.mark.parametrize(
+ ("finish_reason", "expected"),
+ [
+ (FinishReason.STOP, True),
+ (FinishReason.TOOL_USE, True),
+ (FinishReason.MAX_TOKENS, True),
+ (FinishReason.ERROR, False),
+ (FinishReason.CONTENT_FILTER, False),
+ ],
+ )
+ def test_success_by_finish_reason(
+ self,
+ finish_reason: FinishReason,
+ expected: bool,
+ ) -> None:
+ assert self._base(finish_reason=finish_reason).success is expected
As per coding guidelines: tests/**/*.py: Prefer @pytest.mark.parametrize for testing similar cases in Python tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_loop_protocol.py` around lines 201 - 327, The tests
repeat the same assertions for TurnRecord analytics fields and
FinishReason→success mapping; refactor by replacing the repeated individual
tests with pytest.mark.parametrize-based tests: for analytics fields create a
single parametrized test (referencing TurnRecord and self._base()) that iterates
tuples like ("latency_ms", None/250.0/0.0, negative->expect ValidationError),
("cache_hit", None/True/False), ("retry_count", None/3/0, negative->expect
ValidationError), ("retry_reason", None/"RateLimitError") and assert field
values or raise, and for success mapping create a parametrized test over
FinishReason values (FinishReason.STOP, TOOL_USE, MAX_TOKENS -> True;
FinishReason.ERROR, CONTENT_FILTER -> False) using self._base(finish_reason=...)
to assert the .success property; use clear param ids and combine
negative-validation cases into separate parametrized cases to preserve
semantics.
- Baseline record logging promoted to INFO (state transition convention)
- MemoryError/RecursionError guards added to _dispatch_retry_rate_alert
- MemoryError/RecursionError guards added to _poll_loop and _check_snapshot
- CompletionResponse docstring documents provider_metadata field
- Test model values use test-small-001 (vendor-agnostic convention)
🤖 I have created a release *beep* *boop*
---
## [0.6.4](v0.6.3...v0.6.4) (2026-04-07)
### Features
* analytics and metrics runtime pipeline ([#226](#226), [#225](#225), [#227](#227), [#224](#224)) ([#1127](#1127)) ([ec57641](ec57641))
* engine intelligence -- quality signals, health monitoring, trajectory scoring, coordination metrics ([#1099](#1099)) ([aac2029](aac2029)), closes [#697](#697) [#707](#707) [#705](#705) [#703](#703)
* enterprise-grade auth -- HttpOnly cookie sessions, CSRF, lockout, session limits ([#1102](#1102)) ([d3022c7](d3022c7)), closes [#1068](#1068)
* implement core tool categories and granular sub-constraints ([#1101](#1101)) ([0611b53](0611b53)), closes [#1034](#1034) [#220](#220)
* memory evolution -- GraphRAG/consistency research + SelfEditingMemoryStrategy ([#1036](#1036), [#208](#208)) ([#1129](#1129)) ([a9acda3](a9acda3))
* security hardening -- sandbox, risk override, SSRF self-heal, DAST fix ([#1100](#1100)) ([31e7273](31e7273)), closes [#1098](#1098) [#696](#696) [#222](#222) [#671](#671)
### Bug Fixes
* harden agent identity versioning post-review ([#1128](#1128)) ([8eb2859](8eb2859)), closes [#1076](#1076)
### Documentation
* engine architecture research ([#688](#688) [#690](#690) [#848](#848) [#687](#687)) ([#1114](#1114)) ([59b31f9](59b31f9))
### Maintenance
* add .claudeignore and split CLAUDE.md for token optimization ([#1112](#1112)) ([b0fbd18](b0fbd18))
* bump github.com/sigstore/protobuf-specs from 0.5.0 to 0.5.1 in /cli in the all group ([#1106](#1106)) ([73089c9](73089c9))
* bump jsdom from 29.0.1 to 29.0.2 in /site in the all group ([#1107](#1107)) ([8e99dce](8e99dce))
* bump jsdom from 29.0.1 to 29.0.2 in /web in the all group ([#1108](#1108)) ([ce8c749](ce8c749))
* bump python from `fb83750` to `6869258` in /docker/backend in the all group ([#1104](#1104)) ([4911726](4911726))
* bump python from `fb83750` to `6869258` in /docker/web in the all group ([#1103](#1103)) ([87bdf09](87bdf09))
* bump the all group across 1 directory with 4 updates ([#1111](#1111)) ([f702464](f702464))
* bump the all group in /docker/sandbox with 2 updates ([#1105](#1105)) ([05a91ca](05a91ca))
---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
---
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Implements the full analytics and metrics runtime pipeline across four issues, in dependency order: coordination metrics collection (#226) -> call categorization (#225) -> per-call analytics layer (#227) -> quota polling (#224).
Issue #226 -- Coordination Metrics Runtime Collection Pipeline
- BaselineStore (baseline_store.py): sliding-window store for single-agent baseline data (turns, error rates, tokens, durations) used by Ec, O%, Ae metrics
- CoordinationMetricsCollector (coordination_collector.py): post-execution collector that gathers raw data, calls pure compute functions from coordination_metrics.py, logs events, and fires alerts via NotificationDispatcher
- SimilarityComputer protocol decouples redundancy rate (R) from any specific embedding provider
- AgentEngine integration: _try_collect_coordination_metrics() called from _post_execution_pipeline() (best-effort, same pattern as _try_capture_distillation)
- events/coordination_metrics.py

Issue #225 -- Runtime LLM Call Categorization
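The sliding-window eviction in BaselineStore (issue #226 above) can be sketched with a bounded deque. Field and method names here are illustrative stand-ins, not the real API:

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Baseline:
    agent_id: str
    turns: int
    duration_seconds: float


class SlidingBaselineStore:
    """Keeps only the most recent window_size baselines."""

    def __init__(self, window_size: int = 3) -> None:
        # deque with maxlen evicts the oldest record automatically
        self._records: deque = deque(maxlen=window_size)

    def record(self, baseline: Baseline) -> None:
        self._records.append(baseline)

    def mean_turns(self) -> Optional[float]:
        if not self._records:
            return None  # empty store yields no baseline
        return mean(b.turns for b in self._records)


store = SlidingBaselineStore(window_size=2)
print(store.mean_turns())
store.record(Baseline("a", turns=4, duration_seconds=1.0))
store.record(Baseline("a", turns=6, duration_seconds=2.0))
store.record(Baseline("a", turns=8, duration_seconds=3.0))  # evicts turns=4
print(store.mean_turns())
```

Returning `None` from an empty store matches the "empty store returns None" behaviour the test suite below exercises.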
- ClassificationContext (frozen Pydantic): captures boolean flags (is_delegation, is_review, is_meeting, is_planning_phase, is_system_prompt, is_embedding_operation, is_quality_judge) plus tool_calls_made and agent_role
- RulesBasedClassifier: priority-ordered classification (EMBEDDING > COORDINATION > SYSTEM > PRODUCTIVE)
- classify_call(): convenience function using the default classifier
- events/call_classification.py

Issue #227 -- Full Per-Call Analytics Layer
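A runnable sketch of the EMBEDDING > COORDINATION > SYSTEM > PRODUCTIVE priority from issue #225 above. Which boolean flags feed each category is my assumption for illustration; the real RulesBasedClassifier defines the exact rules:

```python
from dataclasses import dataclass
from enum import Enum


class LLMCallCategory(str, Enum):
    EMBEDDING = "embedding"
    COORDINATION = "coordination"
    SYSTEM = "system"
    PRODUCTIVE = "productive"


@dataclass(frozen=True)
class Context:
    is_embedding_operation: bool = False
    is_delegation: bool = False
    is_review: bool = False
    is_meeting: bool = False
    is_planning_phase: bool = False
    is_system_prompt: bool = False


def classify(ctx: Context) -> LLMCallCategory:
    """First matching rule wins; PRODUCTIVE is the fallthrough."""
    if ctx.is_embedding_operation:
        return LLMCallCategory.EMBEDDING
    if ctx.is_delegation or ctx.is_review or ctx.is_meeting:
        return LLMCallCategory.COORDINATION
    if ctx.is_planning_phase or ctx.is_system_prompt:
        return LLMCallCategory.SYSTEM
    return LLMCallCategory.PRODUCTIVE


# COORDINATION outranks SYSTEM when both kinds of flag are set
print(classify(Context(is_review=True, is_planning_phase=True)).value)
print(classify(Context()).value)
```

The rule ordering is what the "COORDINATION > SYSTEM priority property test" mentioned earlier pins down: with conflicting flags, the higher-priority category must win.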
Model extensions (all backward-compatible, new fields default to None):
- CostRecord: latency_ms, cache_hit, retry_count, retry_reason, finish_reason, success
- TurnRecord: latency_ms, cache_hit, retry_count, retry_reason; success as a @computed_field derived from finish_reason
- CompletionResponse: provider_metadata: dict[str, object]

Provider-layer metadata propagation:
- RetryHandler: last_attempt_count and last_retry_reason instance state, reset at start of each execute() call
- BaseCompletionProvider.complete(): wraps _resilient_execute() with time.monotonic(), injects _synthorg_latency_ms, _synthorg_retry_count, _synthorg_retry_reason into provider_metadata; latency captured and logged even on exception

Engine-layer propagation:
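A hedged sketch of that provider-layer flow: monotonic timing plus merged (not replaced) `_synthorg_*` metadata. The dict-based response and the `vendor_key` name are stand-ins; only the `_synthorg_*` key names come from the description above:

```python
import asyncio
import time


async def complete_with_metadata(execute, extra: dict) -> dict:
    """Wrap a provider call, timing it with time.monotonic()."""
    start = time.monotonic()
    try:
        result = await execute()
    finally:
        # computed even when execute() raises; real code would log it then
        latency_ms = (time.monotonic() - start) * 1000.0
    # merge into any provider-supplied metadata instead of replacing it
    result.setdefault("provider_metadata", {}).update(extra)
    result["provider_metadata"]["_synthorg_latency_ms"] = latency_ms
    return result


async def main() -> dict:
    async def fake_provider() -> dict:
        await asyncio.sleep(0.01)
        return {"text": "hi", "provider_metadata": {"vendor_key": 1}}

    return await complete_with_metadata(
        fake_provider, {"_synthorg_retry_count": 0}
    )


resp = asyncio.run(main())
print(sorted(resp["provider_metadata"]))
```

Merging (`update`) rather than assigning a fresh dict is what preserves vendor-supplied keys, which is the "provider metadata merged instead of replaced" fix noted in the review history.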
- make_turn_record() extracts _synthorg_* keys from provider_metadata
- The execution loops (react_loop, plan_execute_loop, hybrid_loop) pass provider_metadata to make_turn_record()
- cost_recording.py propagates all new fields from TurnRecord to CostRecord

Analytics aggregation:
- CallAnalyticsConfig + RetryAlertConfig: configuration with warn thresholds
- AnalyticsAggregation: total_calls, success_count, failure_count, retry_count, retry_rate, cache_hit_count, cache_hit_rate, avg_latency_ms, p95_latency_ms, orchestration_ratio, by_finish_reason; cross-field validators enforce count invariants
- CallAnalyticsService: get_aggregation() and check_alerts() for retry rate threshold alerting
- events/analytics.py

Issue #224 -- Proactive Quota Polling and Alerting
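The p95_latency_ms and retry_rate fields from the aggregation layer above might be computed roughly like this. Nearest-rank p95 is an assumption about the real `_p95` helper, and treating `None` retry counts as "no retry" is likewise an assumption:

```python
import math
from typing import List, Optional


def p95(values: List[float]) -> Optional[float]:
    """Nearest-rank 95th percentile; None for an empty sample."""
    if not values:
        return None
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]


def retry_rate(retry_counts: List[Optional[int]]) -> float:
    """Fraction of calls that needed at least one retry."""
    if not retry_counts:
        return 0.0
    retried = sum(1 for c in retry_counts if c)  # None and 0 both mean no retry
    return retried / len(retry_counts)


latencies = [float(v) for v in range(1, 101)]  # 1..100 ms
print(p95(latencies))
print(retry_rate([0, None, 2, 1]))
```

Both helpers return well-bounded values (p95 inside the sample, retry_rate in [0, 1]), which is the kind of invariant the Hypothesis property tests listed below would check.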
- QuotaPollerConfig + QuotaAlertThresholds: warn_pct=80.0, critical_pct=95.0, poll_interval_seconds=60.0, cooldown_seconds=300.0
- QuotaPoller: background asyncio.Task loop, start()/stop() lifecycle, poll_once() for testing
- Uses max(requests_pct, tokens_pct) to determine severity; dimensions with limit=0 (unlimited) are skipped
- Cooldown keyed by (provider, window, level) prevents alert storms
- events/quota.py

Tests
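The per-(provider, window, level) cooldown described for issue #224 above can be sketched with time.monotonic(). Class and method names are illustrative, and the injectable clock is just a testing convenience:

```python
import time


class AlertCooldown:
    """Suppress repeat alerts per (provider, window, level) key."""

    def __init__(self, cooldown_seconds: float, clock=time.monotonic) -> None:
        self._cooldown = cooldown_seconds
        # monotonic clock: immune to wall-clock jumps, unlike datetime.now()
        self._clock = clock
        self._last_fired: dict = {}

    def should_fire(self, provider: str, window: str, level: str) -> bool:
        key = (provider, window, level)
        now = self._clock()
        last = self._last_fired.get(key)
        if last is not None and now - last < self._cooldown:
            return False  # still cooling down for this exact key
        self._last_fired[key] = now
        return True


fake_now = [0.0]
cd = AlertCooldown(cooldown_seconds=300.0, clock=lambda: fake_now[0])
print(cd.should_fire("openai", "minute", "warning"))   # first alert fires
print(cd.should_fire("openai", "minute", "warning"))   # suppressed
fake_now[0] = 301.0
print(cd.should_fire("openai", "minute", "warning"))   # cooldown expired
```

Keying on the full tuple means a "critical" alert is never suppressed by an earlier "warning" for the same provider and window, which is how the cooldown avoids masking escalations.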
- tests/unit/budget/test_baseline_store.py -- window eviction, mean computation, empty store returns None
- tests/unit/budget/test_coordination_collector.py -- all 9 metrics, config.enabled=False, selective collection, alert dispatch, metric error isolation
- tests/unit/budget/test_coordination_collector_properties.py -- Hypothesis: collect always returns CoordinationMetrics
- tests/unit/budget/test_call_classifier.py -- all context combinations, priority ordering
- tests/unit/budget/test_call_classifier_properties.py -- Hypothesis: any valid context produces a valid LLMCallCategory
- tests/unit/budget/test_call_analytics.py -- aggregation correctness, alert dispatch, empty records, threshold edges
- tests/unit/budget/test_call_analytics_properties.py -- Hypothesis: retry_rate in [0,1], cache_hit_rate in [0,1], totals match input
- tests/unit/budget/test_quota_poller.py -- no alert below threshold, warning/critical severity, cooldown, cooldown expiry, multiple providers, resilience
- tests/unit/providers/test_base_provider.py -- latency injection, retry metadata, retry count reflects attempts
- tests/unit/providers/test_retry.py -- last_attempt_count / last_retry_reason state

Closes #226
Closes #225
Closes #227
Closes #224