Add ternary step classification (correct/neutral/incorrect) with rule-based classifier, accuracy-effort ratio computation, and weak-model-trap warning.
- StepQuality enum, StepQualitySignal, AccuracyEffortRatio models
- StepQualityClassifier protocol + RuleBasedStepClassifier
- compute_accuracy_effort pure function with expected_steps normalization
- Wire accuracy_effort_ratio into TaskCompletionMetrics and CostRecord
- Event constants: quality.step_classified, accuracy_effort_computed, weak_model_warning
- 52 unit tests + 5 Hypothesis property tests

Closes #697
Add HealthJudge (sensitive) + TriageFilter (conservative) two-layer health monitoring with NotificationSink delivery.
- EscalationTicket, EscalationSeverity, EscalationCause models
- HealthJudge: emits tickets on stagnation, error+recovery, quality degradation
- TriageFilter: rule-based dismiss LOW, escalate HIGH/CRITICAL, threshold MEDIUM
- HealthMonitoringPipeline: compose judge + triage + NotificationSink
- Add HEALTH category to NotificationCategory
- Event constants: health.ticket_emitted/dismissed/escalated/pipeline_error
- 37 unit tests

Closes #707
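The triage rule named in this commit (dismiss LOW, escalate HIGH/CRITICAL, threshold MEDIUM) might look roughly like the sketch below. The `Ticket` fields and the default threshold values are hypothetical; only the parameter names `stall_duration_threshold` and `steps_threshold` appear in the review comments, and the exact MEDIUM rule is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass(frozen=True)
class Ticket:
    """Hypothetical stand-in for EscalationTicket; field names are assumed."""

    severity: Severity
    stall_duration_seconds: float
    steps_since_progress: int


def should_escalate(
    ticket: Ticket,
    *,
    stall_duration_threshold: float = 60.0,  # assumed default
    steps_threshold: int = 5,  # assumed default
) -> bool:
    """Conservative triage: LOW is dismissed, HIGH/CRITICAL always escalate."""
    if ticket.severity is Severity.LOW:
        return False
    if ticket.severity in (Severity.HIGH, Severity.CRITICAL):
        return True
    # MEDIUM: escalate only when a threshold is reached (assumed rule).
    return (
        ticket.stall_duration_seconds >= stall_duration_threshold
        or ticket.steps_since_progress >= steps_threshold
    )
```

Keeping the decision in one pure function makes each severity branch easy to cover with a unit test.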
Add trajectory scoring with self-consistency filter, verbalized confidence (VC), trace length (Len), and budget guard.
- TrajectoryConfig, CandidateResult, TrajectoryScore models
- TrajectoryScorer: score_candidates + select_best with majority-vote consistency
- check_trajectory_budget: budget guard for K-candidate sampling
- Wire TrajectoryConfig into HybridLoopConfig
- Event constants: trajectory.scoring_start/candidate_scored/best_selected/budget_guard_blocked/consistency_filtered
- 38 unit tests + 3 Hypothesis property tests

Closes #705
Add Amdahl ceiling, straggler gap, token/speedup ratio alert, and O(n^2) message overhead detection.
- AmdahlCeiling: S_max = 1/(1-p), recommended_team_size at 90% speedup
- StragglerGap: slowest - mean duration, gap ratio
- TokenSpeedupRatio: token_multiplier / latency_speedup, alert at 2.0
- MessageOverhead: quadratic growth detection (messages vs n^2 * threshold)
- Pure computation functions for all 4 metrics
- Extend CoordinationMetrics container (5 to 9 metrics)
- Event constants: coordination_metrics domain
- 25 new unit tests + 3 Hypothesis property tests

Closes #703
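The four formulas listed in this commit can be illustrated with a few pure functions. This is a minimal sketch, not the PR's implementation: the function names are hypothetical, `gap_ratio = gap / mean` is an assumed definition, and the strict `>` comparisons for the alerts are assumptions.

```python
from statistics import mean

TOKEN_SPEEDUP_ALERT = 2.0  # alert threshold quoted in the commit message


def amdahl_ceiling(parallel_fraction: float) -> float:
    """Maximum speedup with unlimited agents: S_max = 1 / (1 - p)."""
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    return 1.0 / (1.0 - parallel_fraction)


def straggler_gap(agent_durations: list[float]) -> tuple[float, float]:
    """Return (gap, gap_ratio) with gap = slowest - mean duration.

    gap_ratio = gap / mean is an assumed definition.
    """
    if not agent_durations:
        raise ValueError("agent_durations must not be empty")
    avg = mean(agent_durations)
    gap = max(agent_durations) - avg
    return gap, (gap / avg if avg > 0 else 0.0)


def token_speedup_ratio(token_multiplier: float, latency_speedup: float) -> float:
    """Tokens spent per unit of latency speedup gained."""
    if token_multiplier <= 0 or latency_speedup <= 0:
        raise ValueError("both inputs must be positive")
    return token_multiplier / latency_speedup


def has_quadratic_overhead(messages: int, team_size: int, threshold: float) -> bool:
    """Flag message counts that exceed n^2 * threshold (assumed comparison)."""
    return messages > team_size**2 * threshold
```

For example, a workload that is 75% parallelizable has a ceiling of 4x no matter how many agents are added, and a ratio above `TOKEN_SPEEDUP_ALERT` would indicate that token spend is growing faster than the latency benefit.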
- Fix compute_token_speedup_ratio: validate all 4 inputs
- Use HEALTH notification category in health pipeline mapping
- Add StragglerGap cross-field validator (slowest must exceed mean)
- Fix VC docstring: single-value, not per-turn aggregation
- Use dict lookup in select_best instead of positional index
- Replace hasattr duck-typing with isinstance + warning log
- Make AccuracyEffortRatio.accuracy a computed field
- Use NotBlankStr for evidence, reason, complexity_gate fields
- Add VC degradation logging in trajectory scorer
- Update CLAUDE.md: package structure, events, notification categories
- Add tests: StragglerGap validator, token validation, HEALTH category

Pre-reviewed by 6 agents, 20 findings addressed
No actionable comments were generated in the recent review. 🎉
Walkthrough
Adds step-level quality models and utilities (StepQuality enum, StepQualitySignal, AccuracyEffortRatio, RuleBasedStepClassifier, compute_accuracy_effort), a two-layer health monitoring stack (EscalationTicket, EscalationCause/Severity, HealthJudge, TriageFilter, HealthMonitoringPipeline), trajectory scoring and budget guard (TrajectoryConfig, CandidateResult, TrajectoryScore, TrajectoryScorer, check_trajectory_budget), expanded coordination metrics and compute functions (AmdahlCeiling, StragglerGap, TokenSpeedupRatio, MessageOverhead, compute_* helpers), new observability event constants across multiple domains, small schema additions to existing budget/metrics models, and many accompanying unit and property tests.
Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Snapshot Warnings: Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue.
Scanned Files: None
Code Review
This pull request introduces a comprehensive health monitoring and trajectory scoring system for the engine, alongside expanded coordination metrics and step-level quality signals. Key additions include a two-layer health pipeline (Judge and Triage), best-of-K trajectory selection based on verbalized confidence and trace length, and new empirical metrics like Amdahl's Law ceiling and straggler gap. Feedback highlights a critical syntax error in exception handling within the health pipeline and a logic issue in the trajectory consistency filter where plurality is used instead of a true majority. Additionally, the joint scoring mechanism for trajectories may be imbalanced due to the significantly different magnitudes of confidence scores versus token counts.
    except MemoryError, RecursionError:
        raise
The `except MemoryError, RecursionError:` syntax raises a SyntaxError at import time on Python 3.13 and earlier, where multiple exceptions must be enclosed in a tuple: `except (MemoryError, RecursionError):`. The unparenthesized form is accepted only from Python 3.14 (PEP 758), where it catches both exception types exactly like the tuple form. The Python 2 reading, in which the comma binds the MemoryError instance to the name RecursionError and leaves RecursionError to be swallowed by the subsequent `except Exception:` block, does not apply to any Python 3 release. The parenthesized form below is the portable choice if the module must import on interpreters older than 3.14.
Suggested change:

    - except MemoryError, RecursionError:
    -     raise
    + except (MemoryError, RecursionError):
    +     raise
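A small sketch of the re-raise pattern under discussion, written with the parenthesized tuple form so that it runs on every Python 3 release (the guidelines quoted later in this review mention PEP 758, which also permits the unparenthesized form on Python 3.14). The helper names `run_guarded` and `raise_recursion` are hypothetical and not part of the PR.

```python
from collections.abc import Callable


def run_guarded(action: Callable[[], object]) -> object:
    """Run action; let fatal errors propagate, report other failures."""
    try:
        return action()
    except (MemoryError, RecursionError):
        # Fatal conditions: never swallow these in a best-effort pipeline.
        raise
    except Exception as exc:
        # Non-fatal: a real pipeline would log here and continue.
        return f"handled: {type(exc).__name__}"


def raise_recursion() -> None:
    """Simulate a fatal error without actually exhausting the stack."""
    raise RecursionError("simulated")
```

Ordering matters: the fatal handler has to come before `except Exception`, because both `MemoryError` and `RecursionError` are subclasses of `Exception`.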
    majority_fp, _ = counter.most_common(1)[0]

    result: dict[int, bool] = {}
    filtered_count = 0
    for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
        is_consistent = fps == majority_fp
        result[candidate.candidate_index] = is_consistent
        if not is_consistent:
            filtered_count += 1
The current implementation of the self-consistency filter uses a plurality vote with arbitrary tie-breaking, which does not align with the stated "majority-vote" design. In cases where no candidate has more than 50% agreement (e.g., a 1-vs-1 tie or all unique fingerprints), the code arbitrarily marks the first candidate as consistent and others as inconsistent. This introduces bias and potentially filters out valid trajectories. The filter should only mark candidates as inconsistent if a clear majority exists.
Suggested change:

    - majority_fp, _ = counter.most_common(1)[0]
    - result: dict[int, bool] = {}
    - filtered_count = 0
    - for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
    -     is_consistent = fps == majority_fp
    -     result[candidate.candidate_index] = is_consistent
    -     if not is_consistent:
    -         filtered_count += 1
    + majority_fp, count = counter.most_common(1)[0]
    + has_majority = count > len(candidates) / 2
    + result: dict[int, bool] = {}
    + filtered_count = 0
    + for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
    +     is_consistent = (fps == majority_fp) if has_majority else True
    +     result[candidate.candidate_index] = is_consistent
    +     if not is_consistent:
    +         filtered_count += 1
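The difference between plurality and strict majority can be checked in isolation. The following is a standalone sketch of the suggested behavior, with a hypothetical function name and plain lists instead of the PR's candidate models; it marks every candidate consistent when no fingerprint set holds more than half of the votes.

```python
from collections import Counter


def consistency_flags(fingerprint_sets: list[tuple[str, ...]]) -> list[bool]:
    """Flag candidates that agree with a strict (>50%) majority.

    When no strict majority exists (ties, all-unique sets), filtering is
    disabled and every candidate is treated as consistent.
    """
    if not fingerprint_sets:
        return []
    top_fp, top_count = Counter(fingerprint_sets).most_common(1)[0]
    if top_count * 2 <= len(fingerprint_sets):
        return [True] * len(fingerprint_sets)
    return [fps == top_fp for fps in fingerprint_sets]
```

With K=2 and two different fingerprint sets, the plurality version would keep only the first candidate, whereas this version keeps both and leaves the choice to the scores.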
    @property
    def joint_score(self) -> float:
        """Combined score: VC + Len (least-negative wins)."""
        return self.vc_score + self.len_score
The joint_score calculation combines vc_score (log-space confidence, typically range -4.6 to 0) and len_score (negative token count, typically range -100 to -2000+). Because of the vast difference in magnitude, the len_score will almost always dominate the selection, making the verbalized confidence signal effectively irrelevant unless the trace lengths are nearly identical. Consider scaling the components (e.g., multiplying vc_score by a factor or normalizing len_score) to ensure both signals meaningfully contribute to the joint score.
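One way to balance the two signals, sketched here as an illustration rather than as the PR's fix, is to min-max normalize each component across the K candidates before combining them. The function name and the `vc_weight` parameter are hypothetical.

```python
def normalized_joint_scores(
    vc_scores: list[float],
    len_scores: list[float],
    vc_weight: float = 0.5,
) -> list[float]:
    """Weighted sum of per-batch min-max normalized VC and Len scores."""
    if len(vc_scores) != len(len_scores):
        raise ValueError("score lists must have the same length")

    def min_max(values: list[float]) -> list[float]:
        lo, hi = min(values), max(values)
        if hi == lo:
            # No spread: the component cannot discriminate, so stay neutral.
            return [0.5] * len(values)
        return [(v - lo) / (hi - lo) for v in values]

    vc_norm = min_max(vc_scores)
    len_norm = min_max(len_scores)
    return [
        vc_weight * vc + (1.0 - vc_weight) * ln
        for vc, ln in zip(vc_norm, len_norm)
    ]
```

With `vc_scores = [-0.1, -2.0]` and `len_scores = [-1500.0, -200.0]`, the raw sum picks the second candidate purely on length, while a `vc_weight` of 0.7 lets the more confident first candidate win. Normalizing per batch makes scores incomparable across batches, which is acceptable only if selection happens within a single set of K candidates.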
Pull request overview
Adds an “engine intelligence” computation layer: step-level quality signals + accuracy/effort metric, a two-layer health monitoring pipeline (judge + triage + notification), best-of-K trajectory scoring utilities, and expanded coordination metrics from distributed-systems theory.
Changes:
- Introduces new `engine/quality`, `engine/health`, and `engine/trajectory` subpackages (models + pure computations + scoring/triage logic) plus new observability event domains.
- Extends budget coordination metrics from 5 → 9 metrics (Amdahl ceiling, straggler gap, token/speedup ratio, message overhead) with accompanying tests.
- Wires new fields/config surfaces into existing models (`TaskCompletionMetrics`, `HybridLoopConfig`, `NotificationCategory`, `CostRecord`) and expands unit/property tests accordingly.
Reviewed changes
Copilot reviewed 39 out of 42 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/unit/observability/test_events.py | Updates expected observability event domains to include new modules. |
| tests/unit/engine/trajectory/test_scorer.py | Unit tests for trajectory scoring and selection behavior. |
| tests/unit/engine/trajectory/test_scorer_properties.py | Property-based tests for deterministic and bounded trajectory scoring. |
| tests/unit/engine/trajectory/test_models.py | Validation tests for trajectory config/result/score models. |
| tests/unit/engine/trajectory/test_budget_guard.py | Unit tests for K-candidate budget guard logic. |
| tests/unit/engine/trajectory/conftest.py | Shared fixtures for trajectory tests (minimal_context). |
| tests/unit/engine/trajectory/__init__.py | Marks the trajectory test package. |
| tests/unit/engine/quality/test_models.py | Validation tests for step-quality and accuracy/effort models. |
| tests/unit/engine/quality/test_effort.py | Unit tests for compute_accuracy_effort(). |
| tests/unit/engine/quality/test_effort_properties.py | Property-based tests for accuracy/effort invariants. |
| tests/unit/engine/quality/test_classifier.py | Unit tests for the rule-based step quality classifier. |
| tests/unit/engine/quality/__init__.py | Marks the quality test package. |
| tests/unit/engine/health/test_triage.py | Unit tests for conservative triage rules. |
| tests/unit/engine/health/test_pipeline.py | End-to-end tests for judge→triage→notification pipeline behavior. |
| tests/unit/engine/health/test_models.py | Validation tests for escalation ticket models/enums. |
| tests/unit/engine/health/test_judge.py | Unit tests for sensitive-layer ticket emission rules. |
| tests/unit/engine/health/__init__.py | Marks the health test package. |
| tests/unit/budget/test_coordination_metrics.py | Extends coordination metrics tests to cover new metrics. |
| tests/unit/budget/test_coordination_metrics_properties.py | Property-based tests for new coordination metrics. |
| src/synthorg/observability/events/trajectory.py | Adds trajectory-scoring event constants. |
| src/synthorg/observability/events/quality.py | Adds quality-signal event constants. |
| src/synthorg/observability/events/health.py | Adds health-monitoring event constants. |
| src/synthorg/observability/events/coordination_metrics.py | Adds coordination-metrics event constants. |
| src/synthorg/notifications/models.py | Adds HEALTH notification category. |
| src/synthorg/engine/trajectory/scorer.py | Implements self-consistency + VC + length scoring and best-candidate selection. |
| src/synthorg/engine/trajectory/models.py | Adds frozen Pydantic models for trajectory config/results/scores. |
| src/synthorg/engine/trajectory/budget_guard.py | Adds budget guard for best-of-K candidate sampling. |
| src/synthorg/engine/trajectory/__init__.py | Public exports for trajectory scoring components. |
| src/synthorg/engine/quality/models.py | Adds step-quality and accuracy/effort metric models. |
| src/synthorg/engine/quality/effort.py | Computes accuracy/effort ratio and emits observability events. |
| src/synthorg/engine/quality/classifier.py | Adds classifier protocol + deterministic rule-based implementation. |
| src/synthorg/engine/quality/__init__.py | Public exports for quality components. |
| src/synthorg/engine/metrics.py | Adds accuracy_effort_ratio to task completion metrics and extracts from metadata. |
| src/synthorg/engine/hybrid_models.py | Adds optional trajectory scoring config to HybridLoopConfig. |
| src/synthorg/engine/health/triage.py | Implements conservative triage filter rules + event logging. |
| src/synthorg/engine/health/pipeline.py | Composes judge + triage + notification sink with best-effort error handling. |
| src/synthorg/engine/health/models.py | Adds escalation ticket models/enums (frozen, metadata snapshotting). |
| src/synthorg/engine/health/judge.py | Implements sensitive health judge ticket emission logic. |
| src/synthorg/engine/health/__init__.py | Public exports for health monitoring components. |
| src/synthorg/budget/cost_record.py | Adds accuracy_effort_ratio field to per-call cost records. |
| src/synthorg/budget/coordination_metrics.py | Extends coordination metrics models and pure compute functions (5 → 9). |
| CLAUDE.md | Updates component inventory and logging/event constant guidance to include new domains. |
    # Find majority fingerprint set.
    counter: Counter[tuple[str, ...]] = Counter(fingerprint_sets)
    majority_fp, _ = counter.most_common(1)[0]

    result: dict[int, bool] = {}
    filtered_count = 0
    for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
        is_consistent = fps == majority_fp
        result[candidate.candidate_index] = is_consistent
The self-consistency filter treats the most-common fingerprint set as a "majority" even when there is no strict majority (e.g., K=2 with two different fingerprint sets, or a tie across multiple sets). In those tie cases, this will arbitrarily mark only the first fingerprint set as consistent and can force selection of a worse candidate because select_best() prefers consistent candidates. Consider detecting ties (no count > K/2, or multiple sets share the top count) and disabling filtering / marking all candidates consistent in that case.
    logger.warning(
        "execution.metrics.accuracy_effort_unexpected_type",
        type=type(ae_data).__name__,
        task_id=result.task_id,
    )
This log call uses a raw string event name ("execution.metrics.accuracy_effort_unexpected_type"). Repo logging conventions require importing event constants from synthorg.observability.events.* (see CLAUDE.md logging section). Please add an appropriate constant (e.g., under observability/events/execution.py or a dedicated events module) and use it here.
    accuracy_effort_ratio: float | None = Field(
        default=None,
        ge=0.0,
        description=(
            "Accuracy-effort ratio for the task this call belongs to "
            "(populated at task completion when quality signals are available)"
        ),
CostRecord is an immutable per-API-call record, but the new accuracy_effort_ratio field is documented as "populated at task completion". As-is, there is no code path constructing CostRecord with this field set, and the append-only/frozen design makes post-hoc backfilling difficult. Either implement a concrete mechanism to populate it (e.g., include the ratio at record creation time via a task-level cache, or store this metric in a task-level summary model instead of per-call records), or update the field docs to reflect the actual behavior.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1099 +/- ##
==========================================
+ Coverage 89.29% 89.31% +0.02%
==========================================
Files 770 787 +17
Lines 45719 46091 +372
Branches 4597 4624 +27
==========================================
+ Hits 40826 41168 +342
- Misses 4048 4070 +22
- Partials 845 853 +8
Actionable comments posted: 14
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 583-585: The guard clauses that raise ValueError for invalid
inputs (e.g., the check on agent_durations that raises
ValueError("agent_durations must not be empty") and the similar guards around
lines 620-631) must log the failure first; update those checks to call the
module/class logger (e.g., logger or self.logger as used elsewhere in this
module) with WARNING or ERROR including contextual details (the variable name,
its value/len, and any relevant ids) immediately before raising the ValueError
so the runtime telemetry captures the cause and context.
In `@src/synthorg/budget/cost_record.py`:
- Around line 50-57: Add the new accuracy_effort_ratio field to the class
docstring Attributes section so the public API is fully documented: update the
class docstring (the Attributes block in the class containing
accuracy_effort_ratio) to include a line describing accuracy_effort_ratio: its
type (float | None), meaning ("Accuracy-effort ratio for the task this call
belongs to"), when it is populated ("populated at task completion when quality
signals are available"), and any constraints (>= 0.0). Ensure the new entry is
placed after call_category to match the fields ordering.
In `@src/synthorg/engine/health/judge.py`:
- Around line 37-43: Validate the quality_degradation_threshold in
Judge.__init__: when the quality_degradation_threshold argument (and thus
self._quality_threshold) is non-positive (<= 0) either raise a ValueError with a
clear message or coerce it to a sane minimum (e.g., 1) so the downstream
escalation check that uses self._quality_threshold (the escalation check around
line ~101) does not produce persistent false positives; update the __init__ to
perform this guard against non-positive values and document the behavior in the
constructor's docstring or error message.
In `@src/synthorg/engine/health/pipeline.py`:
- Around line 91-109: The current process() catch-all swallows exceptions and
returns None, losing "escalated ticket" state; change flow to call and assign
the result = await self._process_inner(...), then only catch and log non-fatal
exceptions without overwriting or dropping an already-emitted escalation: if an
exception occurs during notification delivery, log HEALTH_PIPELINE_ERROR (as
done) but either re-raise the exception or return the previously obtained result
when result indicates an escalated ticket (use whatever flag/property on result
denotes escalation), ensuring MemoryError/RecursionError still propagate and
that process() does not return None for cases where a ticket was already
emitted.
In `@src/synthorg/engine/health/triage.py`:
- Around line 76-83: Change the dismissal log calls from DEBUG to INFO: replace
logger.debug(...) that emits HEALTH_TICKET_DISMISSED with logger.info(...) and
keep the same structured fields (ticket_id=ticket.id,
severity=ticket.severity.value, cause=ticket.cause.value, reason=...) so
dismissal state transitions are logged at INFO; apply the same change to the
other dismissal logging block around the HEALTH_TICKET_DISMISSED usage (the
similar logger.debug call near the 101-109 region) so all dismissal transitions
follow the “state transitions must log at INFO” guideline.
- Around line 41-48: The constructor for the triage class accepts
stall_duration_threshold and steps_threshold without validation; add boundary
checks in __init__ to validate that stall_duration_threshold is a non-negative
number (float/int) and steps_threshold is a non-negative integer (or at least >=
0), and raise a ValueError with a clear message if either is invalid so the
class fails fast; update assignment to self._stall_threshold and
self._steps_threshold only after these checks and include the parameter names
(stall_duration_threshold, steps_threshold) and attribute names
(_stall_threshold, _steps_threshold) in the error text for easier debugging.
In `@src/synthorg/engine/metrics.py`:
- Around line 70-76: The class docstring's Attributes section is missing
documentation for the new field accuracy_effort_ratio; update the class
docstring (the Google-style "Attributes" block for the class that defines
accuracy_effort_ratio) to include a line documenting accuracy_effort_ratio: type
float | None and a brief description like "Accuracy-effort ratio from step-level
quality signals (None when quality signals are unavailable)", matching the Field
description and style of the other attributes.
- Around line 120-124: Replace the hardcoded event string in the logger.warning
call in metrics.py with a constant from synthorg.observability.events: add a new
constant (e.g., EXECUTION_METRICS_ACCURACY_EFFORT_UNEXPECTED_TYPE or
EXECUTION_METRICS_UNEXPECTED_TYPE) in an appropriate events module (like
events/execution.py or events/metrics.py) inside synthorg.observability.events,
import that constant into src/synthorg/engine/metrics.py, and use it in the
logger.warning(...) call (replacing the literal
"execution.metrics.accuracy_effort_unexpected_type") while keeping the existing
kwargs (type=..., task_id=...).
In `@src/synthorg/engine/quality/effort.py`:
- Around line 43-45: The empty-input guard that checks "if not signals" and
raises ValueError currently lacks logging; before raising the ValueError, emit a
warning or error log (e.g., logger.warning or logger.error) including the same
message ("Cannot compute accuracy-effort ratio from empty signals") and
contextual data (the variable name signals, caller context or any request id
available) so the error path is recorded; update the function containing the "if
not signals" branch to log at WARNING/ERROR with context and then raise the
ValueError.
In `@src/synthorg/engine/trajectory/budget_guard.py`:
- Around line 16-42: check_trajectory_budget lacks validation for k and margin;
ensure k is an integer >= 1 and margin is within [0.0, 1.0) (reject margin < 0.0
or margin >= 1.0) before computing required/available, and if invalid call the
same TRAJECTORY_BUDGET_GUARD_BLOCKED logger path (include remaining_budget,
estimated_step_cost, k, margin and a reason like "invalid k" or "invalid
margin") and return False; update the validation at the start of
check_trajectory_budget to enforce these constraints and short-circuit on
failure.
In `@src/synthorg/engine/trajectory/scorer.py`:
- Around line 61-63: The empty-candidate branch in scorer.py (the "if not
candidates" check) raises ValueError without logging; before raising, log the
error at WARNING or ERROR with context (e.g., function/method name, candidates
length/type, any relevant IDs) using the module or instance logger (e.g., logger
or self.logger) and then raise the ValueError; if no logger exists, obtain one
via logging.getLogger(__name__) to satisfy the guideline that all error paths
must log before raising.
- Around line 165-173: The current majority-vote uses Counter.most_common(1)
which can pick an arbitrary tie and bias select_best(); change the logic in
scorer.py around the fingerprint majority calculation to call
counter.most_common(2), compute whether the top count > len(candidates) // 2
(strict >50% of candidates), and only mark candidates whose fps == majority_fp
as consistent when that strict majority exists—otherwise set all entries in
result (keys are candidate.candidate_index) to False to indicate no clear
majority; additionally, add a logger.warning(...) immediately before the
existing raise ValueError in the empty-candidate check (the check referenced at
line 61) so the error path is logged.
In `@tests/unit/budget/test_coordination_metrics_properties.py`:
- Around line 25-26: Remove the explicit Hypothesis decorators that hardcode
max_examples and derandomize so the test-suite can honor the HYPOTHESIS_PROFILE
from tests/conftest.py; specifically, delete the `@settings`(max_examples=10,
derandomize=True) decorators from the property tests (e.g.,
test_max_speedup_monotonically_increases and the three other property-test
functions that currently have those decorators) so they use the configured
profile/default settings instead of forcing 10 examples and deterministic
seeding.
In `@tests/unit/engine/trajectory/test_scorer_properties.py`:
- Around line 84-131: These property tests hardcode Hypothesis settings
(max_examples=10, derandomize=True) which override the global profile; remove
the explicit settings arguments from the `@settings` decorator in
test_selection_is_deterministic, test_joint_score_ordering_consistent (and the
other property tests like test_len_score_always_non_positive) so they rely on
the project-wide Hypothesis profile configured in tests/conftest.py; keep the
existing `@given` and strategy definitions and the use of
TrajectoryScorer.select_best and TrajectoryScorer.score_candidates unchanged.
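The input validation requested for `check_trajectory_budget` in the list above could be sketched as follows. This is a hedged illustration: the function name is hypothetical, the `required` and `available` formulas are assumptions (the review only names those variables), and the comments mark where the real code would log `TRAJECTORY_BUDGET_GUARD_BLOCKED` through the project logger.

```python
def budget_allows_sampling(
    remaining_budget: float,
    estimated_step_cost: float,
    k: int,
    margin: float = 0.1,
) -> bool:
    """Return True when K candidates fit in the budget after a safety margin."""
    if k < 1:
        # Real code: log the blocked event with reason="invalid k".
        return False
    if not 0.0 <= margin < 1.0:
        # Real code: log the blocked event with reason="invalid margin".
        return False
    required = estimated_step_cost * k  # assumed formula
    available = remaining_budget * (1.0 - margin)  # assumed formula
    if required > available:
        # Real code: log the blocked event with both amounts as context.
        return False
    return True
```

Rejecting `margin >= 1.0` up front avoids a zero or negative `available` value silently blocking every request, and rejecting `k < 1` avoids a zero `required` value silently approving one.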
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1c4a8e75-cbc4-489a-b796-8f47717da610
📒 Files selected for processing (42)
- CLAUDE.md
- src/synthorg/budget/coordination_metrics.py
- src/synthorg/budget/cost_record.py
- src/synthorg/engine/health/__init__.py
- src/synthorg/engine/health/judge.py
- src/synthorg/engine/health/models.py
- src/synthorg/engine/health/pipeline.py
- src/synthorg/engine/health/triage.py
- src/synthorg/engine/hybrid_models.py
- src/synthorg/engine/metrics.py
- src/synthorg/engine/quality/__init__.py
- src/synthorg/engine/quality/classifier.py
- src/synthorg/engine/quality/effort.py
- src/synthorg/engine/quality/models.py
- src/synthorg/engine/trajectory/__init__.py
- src/synthorg/engine/trajectory/budget_guard.py
- src/synthorg/engine/trajectory/models.py
- src/synthorg/engine/trajectory/scorer.py
- src/synthorg/notifications/models.py
- src/synthorg/observability/events/coordination_metrics.py
- src/synthorg/observability/events/health.py
- src/synthorg/observability/events/quality.py
- src/synthorg/observability/events/trajectory.py
- tests/unit/budget/test_coordination_metrics.py
- tests/unit/budget/test_coordination_metrics_properties.py
- tests/unit/engine/health/__init__.py
- tests/unit/engine/health/test_judge.py
- tests/unit/engine/health/test_models.py
- tests/unit/engine/health/test_pipeline.py
- tests/unit/engine/health/test_triage.py
- tests/unit/engine/quality/__init__.py
- tests/unit/engine/quality/test_classifier.py
- tests/unit/engine/quality/test_effort.py
- tests/unit/engine/quality/test_effort_properties.py
- tests/unit/engine/quality/test_models.py
- tests/unit/engine/trajectory/__init__.py
- tests/unit/engine/trajectory/conftest.py
- tests/unit/engine/trajectory/test_budget_guard.py
- tests/unit/engine/trajectory/test_models.py
- tests/unit/engine/trajectory/test_scorer.py
- tests/unit/engine/trajectory/test_scorer_properties.py
- tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Agent
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Files:
- src/synthorg/notifications/models.py
- src/synthorg/engine/trajectory/__init__.py
- tests/unit/observability/test_events.py
- src/synthorg/engine/metrics.py
- tests/unit/engine/trajectory/test_budget_guard.py
- src/synthorg/budget/cost_record.py
- tests/unit/engine/trajectory/conftest.py
- src/synthorg/observability/events/quality.py
- tests/unit/engine/health/test_triage.py
- src/synthorg/engine/health/__init__.py
- src/synthorg/engine/quality/__init__.py
- tests/unit/engine/health/test_models.py
- src/synthorg/observability/events/health.py
- tests/unit/engine/health/test_pipeline.py
- tests/unit/budget/test_coordination_metrics_properties.py
- src/synthorg/observability/events/coordination_metrics.py
- tests/unit/engine/trajectory/test_models.py
- src/synthorg/engine/hybrid_models.py
- tests/unit/engine/quality/test_effort.py
- tests/unit/engine/quality/test_effort_properties.py
- tests/unit/budget/test_coordination_metrics.py
- src/synthorg/engine/trajectory/budget_guard.py
- src/synthorg/observability/events/trajectory.py
- tests/unit/engine/trajectory/test_scorer_properties.py
- tests/unit/engine/quality/test_classifier.py
- src/synthorg/engine/quality/effort.py
- tests/unit/engine/quality/test_models.py
- src/synthorg/engine/health/triage.py
- tests/unit/engine/health/test_judge.py
- src/synthorg/engine/trajectory/scorer.py
- tests/unit/engine/trajectory/test_scorer.py
- src/synthorg/engine/trajectory/models.py
- src/synthorg/engine/health/judge.py
- src/synthorg/engine/health/pipeline.py
- src/synthorg/engine/quality/models.py
- src/synthorg/engine/quality/classifier.py
- src/synthorg/engine/health/models.py
- src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.
Files:
src/synthorg/notifications/models.py
src/synthorg/engine/trajectory/__init__.py
src/synthorg/engine/metrics.py
src/synthorg/budget/cost_record.py
src/synthorg/observability/events/quality.py
src/synthorg/engine/health/__init__.py
src/synthorg/engine/quality/__init__.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/engine/hybrid_models.py
src/synthorg/engine/trajectory/budget_guard.py
src/synthorg/observability/events/trajectory.py
src/synthorg/engine/quality/effort.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/quality/models.py
src/synthorg/engine/quality/classifier.py
src/synthorg/engine/health/models.py
src/synthorg/budget/coordination_metrics.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/notifications/models.py
src/synthorg/engine/trajectory/__init__.py
src/synthorg/engine/metrics.py
src/synthorg/budget/cost_record.py
src/synthorg/observability/events/quality.py
src/synthorg/engine/health/__init__.py
src/synthorg/engine/quality/__init__.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/engine/hybrid_models.py
src/synthorg/engine/trajectory/budget_guard.py
src/synthorg/observability/events/trajectory.py
src/synthorg/engine/quality/effort.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/quality/models.py
src/synthorg/engine/quality/classifier.py
src/synthorg/engine/health/models.py
src/synthorg/budget/coordination_metrics.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async: asyncio_mode = "auto", so no manual @pytest.mark.asyncio is needed.
Timeout: 30 seconds per test (global in pyproject.toml—do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. (not real vendor names).
Property-based testing: Python uses Hypothesis (@given + @settings).
Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
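The two determinism techniques named in the last rule can be sketched with the standard library alone. The helper names below (`block_until_cancelled`, `run_and_cancel`, `elapsed_with_fake_clock`) are hypothetical and only illustrate the pattern; they are not taken from the project's test suite.

```python
import asyncio
import time
from unittest.mock import patch


async def block_until_cancelled() -> None:
    # Waits on an event nobody sets: blocks until the task is cancelled,
    # with no arbitrary asyncio.sleep(large_number).
    await asyncio.Event().wait()


async def run_and_cancel() -> bool:
    task = asyncio.create_task(block_until_cancelled())
    await asyncio.sleep(0)  # yield once so the task starts and parks on the event
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False


def elapsed_with_fake_clock() -> float:
    # Patch time.monotonic so the measured duration is fixed by the test,
    # not by how fast the machine happens to run.
    with patch("time.monotonic", side_effect=[100.0, 102.5]):
        start = time.monotonic()
        end = time.monotonic()
    return end - start
```

In a real test the patched clock would wrap the code under test rather than two direct calls, but the mechanism is the same: the test supplies the timestamps, so the assertion never depends on wall-clock timing.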
Files:
tests/unit/observability/test_events.py
tests/unit/engine/trajectory/test_budget_guard.py
tests/unit/engine/trajectory/conftest.py
tests/unit/engine/health/test_triage.py
tests/unit/engine/health/test_models.py
tests/unit/engine/health/test_pipeline.py
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/quality/test_effort.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/budget/test_coordination_metrics.py
tests/unit/engine/trajectory/test_scorer_properties.py
tests/unit/engine/quality/test_classifier.py
tests/unit/engine/quality/test_models.py
tests/unit/engine/health/test_judge.py
tests/unit/engine/trajectory/test_scorer.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.
Files:
tests/unit/observability/test_events.py
tests/unit/engine/trajectory/test_budget_guard.py
tests/unit/engine/trajectory/conftest.py
tests/unit/engine/health/test_triage.py
tests/unit/engine/health/test_models.py
tests/unit/engine/health/test_pipeline.py
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/quality/test_effort.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/budget/test_coordination_metrics.py
tests/unit/engine/trajectory/test_scorer_properties.py
tests/unit/engine/quality/test_classifier.py
tests/unit/engine/quality/test_models.py
tests/unit/engine/health/test_judge.py
tests/unit/engine/trajectory/test_scorer.py
🧠 Learnings (52)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
src/synthorg/engine/trajectory/__init__.py
src/synthorg/engine/health/__init__.py
src/synthorg/engine/quality/__init__.py
src/synthorg/engine/trajectory/models.py
CLAUDE.md
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
tests/unit/observability/test_events.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
tests/unit/observability/test_events.py
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
tests/unit/engine/trajectory/test_budget_guard.py
src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
tests/unit/engine/trajectory/test_budget_guard.py
src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/observability/events/quality.py
src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
src/synthorg/engine/health/__init__.py
src/synthorg/engine/health/pipeline.py
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
Applied to files:
src/synthorg/engine/health/__init__.py
src/synthorg/engine/health/triage.py
CLAUDE.md
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/health.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Property-based testing: Python uses Hypothesis (given + settings).
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/quality/test_effort_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state
Applied to files:
tests/unit/engine/trajectory/test_models.py
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
src/synthorg/engine/health/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves
Applied to files:
tests/unit/engine/trajectory/test_models.py
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
src/synthorg/engine/health/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/budget/test_coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: CI runs 10 deterministic examples per property test (derandomize=True—same inputs every run, no flakes).
Applied to files:
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T14:28:28.895Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:28:28.895Z
Learning: Applies to src/synthorg/**/*.py : Maintain 80% minimum test coverage (enforced in CI)
Applied to files:
tests/unit/engine/quality/test_models.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
src/synthorg/engine/health/models.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
📚 Learning: 2026-04-01T09:37:49.451Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:37:49.451Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Applied to files:
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/health/models.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
src/synthorg/engine/health/models.py
CLAUDE.md
📚 Learning: 2026-04-06T16:35:12.934Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: All project conventions, commands, and standards are defined in CLAUDE.md - refer to it for project structure, package layout, code conventions, quick commands, git workflow, testing standards, design specifications, logging, resilience, and security patterns
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Applied to files:
CLAUDE.md
accuracy_effort_ratio: float | None = Field(
    default=None,
    ge=0.0,
    description=(
        "Accuracy-effort ratio for the task this call belongs to "
        "(populated at task completion when quality signals are available)"
    ),
)
🛠️ Refactor suggestion | 🟠 Major
Update the class docstring to document the new field.
The accuracy_effort_ratio field is not documented in the class docstring's Attributes section (lines 23-34). Google-style docstrings require documenting all public attributes.
📝 Proposed docstring update
Add to the Attributes section (after call_category):
call_category: Optional LLM call category (productive,
coordination, system, embedding).
+ accuracy_effort_ratio: Accuracy-effort ratio for the task
+ (populated at task completion when quality signals are
+ available, ``None`` otherwise).
"""
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/budget/cost_record.py` around lines 50 - 57, Add the new
accuracy_effort_ratio field to the class docstring Attributes section so the
public API is fully documented: update the class docstring (the Attributes block
in the class containing accuracy_effort_ratio) to include a line describing
accuracy_effort_ratio: its type (float | None), meaning ("Accuracy-effort ratio
for the task this call belongs to"), when it is populated ("populated at task
completion when quality signals are available"), and any constraints (>= 0.0).
Ensure the new entry is placed after call_category to match the fields ordering.
# Find majority fingerprint set.
counter: Counter[tuple[str, ...]] = Counter(fingerprint_sets)
majority_fp, _ = counter.most_common(1)[0]

result: dict[int, bool] = {}
filtered_count = 0
for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
    is_consistent = fps == majority_fp
    result[candidate.candidate_index] = is_consistent
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify implementation pattern in scorer and inspect tie-sensitive logic.
rg -n -C3 'most_common\(1\)\[0\]|most_common\(2\)|is_consistent = fps == majority_fp|consistent_scores = \[s for s in scores if s.consistent\]' src/synthorg/engine/trajectory/scorer.py
# Demonstrate Counter tie behavior used by this pattern.
python - <<'PY'
from collections import Counter
votes = [("fp_a",), ("fp_b",)] # tie
c = Counter(votes)
winner, count = c.most_common(1)[0]
print("most_common(1):", winner, count)
print("flags:", [v == winner for v in votes])
PY
Repository: Aureliolo/synthorg
Length of output: 896
🏁 Script executed:
cat -n src/synthorg/engine/trajectory/scorer.py | head -80
Repository: Aureliolo/synthorg
Length of output: 3038
🏁 Script executed:
# Get lines around _check_consistency to see full function
sed -n '155,190p' src/synthorg/engine/trajectory/scorer.py | cat -n
Repository: Aureliolo/synthorg
Length of output: 1515
🏁 Script executed:
# Check score_candidates for empty-input error path
rg -n -A10 'def score_candidates' src/synthorg/engine/trajectory/scorer.py | head -30
Repository: Aureliolo/synthorg
Length of output: 410
🏁 Script executed:
# Verify if there's any logging in the file and check the import structure
rg -n 'from synthorg.observability|import logging|logger' src/synthorg/engine/trajectory/scorer.py
Repository: Aureliolo/synthorg
Length of output: 359
Tie votes are treated as arbitrary majority, biasing candidate selection.
most_common(1) returns only the first-encountered of the tied fingerprint sets, so the candidates sharing it are marked "consistent" and the rest are not. This conflicts with the docstring's promise of a "Majority-vote" filter and makes select_best() at line 123 depend on input order rather than on a clear vote.
Additionally, the empty-candidate check at line 61 raises ValueError without logging, violating the error-path logging requirement.
Fixes needed:
- Detect ties (most_common(2)) and apply a strict majority threshold (> 50% of candidates) before marking consistency. If there is no clear majority, mark all candidates inconsistent.
- Add logger.warning() before the raise ValueError at line 61.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/trajectory/scorer.py` around lines 165 - 173, The current
majority-vote uses Counter.most_common(1) which can pick an arbitrary tie and
bias select_best(); change the logic in scorer.py around the fingerprint
majority calculation to call counter.most_common(2), compute whether the top
count > len(candidates) // 2 (strict >50% of candidates), and only mark
candidates whose fps == majority_fp as consistent when that strict majority
exists—otherwise set all entries in result (keys are candidate.candidate_index)
to False to indicate no clear majority; additionally, add a logger.warning(...)
immediately before the existing raise ValueError in the empty-candidate check
(the check referenced at line 61) so the error path is logged.
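The tie handling requested above can be sketched as follows. This is a minimal illustration, not the code in scorer.py: the function name `strict_majority_flags` and its list-of-bools return shape are assumptions made for the example. Because a strict majority needs more than half of all votes, a tie can never pass the threshold, so the sketch checks only the top count rather than comparing it with a runner-up from most_common(2).

```python
from collections import Counter
from collections.abc import Sequence


def strict_majority_flags(
    fingerprint_sets: Sequence[tuple[str, ...]],
) -> list[bool]:
    """Flag each candidate as consistent only under a strict majority.

    Hypothetical helper for illustration; not the project's API.
    """
    if not fingerprint_sets:
        return []
    counter = Counter(fingerprint_sets)
    majority_fp, top_count = counter.most_common(1)[0]
    # Strict majority: the winner needs more than half of all votes.
    # Ties and mere pluralities fail this test, so nobody is flagged.
    if top_count <= len(fingerprint_sets) // 2:
        return [False] * len(fingerprint_sets)
    return [fps == majority_fp for fps in fingerprint_sets]
```

With the two-way tie from the script above, `[("fp_a",), ("fp_b",)]`, every flag comes back False instead of favouring whichever candidate appeared first.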
@coderabbitai review
✅ Actions performed
Review triggered.
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 2
♻️ Duplicate comments (1)
src/synthorg/engine/health/triage.py (1)
47-54: ⚠️ Potential issue | 🟡 Minor
Threshold validation added; missing log before raising.
The validation for non-negative thresholds has been added per the prior review. However, the ValueError is raised without logging context first.
As per coding guidelines: "All error paths must log at WARNING or ERROR with context before raising."
🛡️ Proposed fix to add logging before raising
 def __init__(
     self,
     *,
     stall_duration_threshold: float = (_MEDIUM_STALL_DURATION_THRESHOLD),
     steps_threshold: int = _MEDIUM_STEPS_THRESHOLD,
 ) -> None:
     if stall_duration_threshold < 0:
         msg = (
             f"stall_duration_threshold must be >= 0, got {stall_duration_threshold}"
         )
+        logger.warning(
+            HEALTH_TICKET_DISMISSED,
+            error=msg,
+            stall_duration_threshold=stall_duration_threshold,
+        )
         raise ValueError(msg)
     if steps_threshold < 0:
         msg = f"steps_threshold must be >= 0, got {steps_threshold}"
+        logger.warning(
+            HEALTH_TICKET_DISMISSED,
+            error=msg,
+            steps_threshold=steps_threshold,
+        )
         raise ValueError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/health/triage.py` around lines 47 - 54, Add a log call before each ValueError is raised for the threshold checks: when validating stall_duration_threshold and steps_threshold (the variables referenced in the if blocks), emit a WARNING or ERROR with context (including the invalid value and which threshold failed) using the module logger (the existing logger obtained via get_logger(__name__) from synthorg.observability, not logging.getLogger()), then raise the ValueError as before; update the checks around stall_duration_threshold and steps_threshold in triage.py to log the problem before raising.
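The log-before-raise pattern this comment asks for might look like the sketch below. It is self-contained for illustration only: `_RecordingLogger` stands in for the project's structured logger, and both the event constant `HEALTH_TRIAGE_VALIDATION_ERROR` and the free function `validate_thresholds` are hypothetical names, not identifiers taken from triage.py.

```python
class _RecordingLogger:
    """Stand-in for the project's structured logger (hypothetical)."""

    def __init__(self) -> None:
        self.records: list[tuple[str, dict[str, object]]] = []

    def warning(self, event: str, **kwargs: object) -> None:
        # Structured call shape: event constant plus key=value context.
        self.records.append((event, kwargs))


logger = _RecordingLogger()

# Hypothetical event constant; a real one would be imported from a
# module under synthorg.observability.events.
HEALTH_TRIAGE_VALIDATION_ERROR = "health.triage.validation_error"


def validate_thresholds(
    stall_duration_threshold: float,
    steps_threshold: int,
) -> None:
    """Reject negative thresholds, logging context before raising."""
    checks = (
        ("stall_duration_threshold", stall_duration_threshold),
        ("steps_threshold", steps_threshold),
    )
    for name, value in checks:
        if value < 0:
            msg = f"{name} must be >= 0, got {value}"
            logger.warning(
                HEALTH_TRIAGE_VALIDATION_ERROR,
                parameter=name,
                value=value,
                error=msg,
            )
            raise ValueError(msg)
```

Logging the parameter name and the rejected value as separate keys keeps the record filterable, which an interpolated message alone would not be.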
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 570-599: compute_straggler_gap currently only rejects an empty
sequence but must validate each sample before aggregation: iterate over
agent_durations and ensure each agent_id is a non-blank identifier (use
NotBlankStr from core.types or enforce the repo's NotBlankStr contract) and each
duration is a non-negative float; on the first invalid entry log the same
COORD_METRICS_VALIDATION_ERROR with parameter="agent_durations" and a
descriptive error message then raise ValueError, only after this per-item
validation proceed to compute slowest_id/slowest_dur (from max(...)) and
mean_dur; update signatures or local checks to treat agent_id as
NotBlankStr-equivalent to keep public ID types aligned.
- Around line 628-663: The current guards only check <= 0 so NaN/Inf slip
through; update the validation for tokens_mas, tokens_sas, duration_mas, and
duration_sas to first ensure each value is finite using math.isfinite(...) and
log the same COORD_METRICS_VALIDATION_ERROR with parameter/name and value before
raising ValueError (e.g., "tokens_mas must be finite and positive"); keep the
existing <= 0 checks but combine or sequence them so non-finite values trigger
the logger+ValueError before constructing TokenSpeedupRatio or any Pydantic
model.
---
Duplicate comments:
In `@src/synthorg/engine/health/triage.py`:
- Around line 47-54: Add a log call before each ValueError is raised for the
threshold checks: when validating stall_duration_threshold and steps_threshold
(the variables referenced in the if blocks), emit a WARNING or ERROR with
context (including the invalid value and which threshold failed) using the
module logger (the existing logger from get_logger(__name__), not logging.getLogger()), then raise the
ValueError as before; update the checks around stall_duration_threshold and
steps_threshold in triage.py to log the problem before raising.
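The two coordination_metrics.py findings above come down to validating inputs before aggregating them. The sketch below illustrates the idea under stated assumptions: `require_finite_positive` and `straggler_gap` are hypothetical names, the return shape is simplified to a tuple, and the logging step (COORD_METRICS_VALIDATION_ERROR) is left out so the example stays self-contained.

```python
import math
from collections.abc import Sequence


def require_finite_positive(name: str, value: float) -> float:
    """Reject NaN, +/-Inf, zero and negatives in one guard."""
    # NaN fails every comparison and +Inf is greater than zero, so a
    # bare `value <= 0` check lets both through; test finiteness first.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be finite and positive, got {value}")
    return value


def straggler_gap(
    agent_durations: Sequence[tuple[str, float]],
) -> tuple[str, float]:
    """Return (slowest agent id, slowest duration minus mean duration)."""
    if not agent_durations:
        raise ValueError("agent_durations must not be empty")
    # Validate every sample before any aggregation happens.
    for agent_id, duration in agent_durations:
        if not agent_id.strip():
            raise ValueError("agent_id must not be blank")
        if not math.isfinite(duration) or duration < 0:
            raise ValueError(
                f"duration for {agent_id!r} must be finite and >= 0"
            )
    slowest_id, slowest_dur = max(agent_durations, key=lambda item: item[1])
    mean_dur = sum(d for _, d in agent_durations) / len(agent_durations)
    return slowest_id, slowest_dur - mean_dur
```

For example, `straggler_gap([("a", 1.0), ("b", 3.0)])` picks agent "b" with a gap of 1.0 over the mean of 2.0, while a NaN duration is rejected before `max` or `sum` ever sees it.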
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2acef51b-5b17-4082-9f12-408fe6ee8284
📒 Files selected for processing (17)
src/synthorg/budget/coordination_metrics.py
src/synthorg/budget/cost_record.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/metrics.py
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/budget_guard.py
src/synthorg/engine/trajectory/models.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/observability/events/execution.py
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/health/test_pipeline.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/trajectory/test_scorer.py
tests/unit/engine/trajectory/test_scorer_properties.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No from __future__ import annotations; Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses); ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Files:
src/synthorg/observability/events/execution.py
src/synthorg/budget/cost_record.py
src/synthorg/engine/metrics.py
tests/unit/budget/test_coordination_metrics_properties.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/engine/trajectory/budget_guard.py
tests/unit/engine/health/test_pipeline.py
src/synthorg/engine/quality/effort.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/trajectory/test_scorer.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/trajectory/models.py
src/synthorg/budget/coordination_metrics.py
tests/unit/engine/trajectory/test_scorer_properties.py
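The guideline above about non-Pydantic internal collections (copy.deepcopy() at construction plus MappingProxyType wrapping) can be illustrated with a short sketch. `ToolRegistry` here is a hypothetical class written for the example and is not the project's registry.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType


class ToolRegistry:
    """Hypothetical registry illustrating the read-only pattern."""

    def __init__(self, tools: Mapping[str, dict[str, str]]) -> None:
        # Deep-copy so later changes to the caller's data cannot leak in,
        # then wrap the copy in a read-only view of the top-level mapping.
        self._tools = MappingProxyType(copy.deepcopy(dict(tools)))

    @property
    def tools(self) -> Mapping[str, dict[str, str]]:
        return self._tools
```

Note that MappingProxyType only blocks assignment and deletion on the top-level mapping; nested dicts remain mutable, which is why the deep copy at construction is still needed.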
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.
Files:
src/synthorg/observability/events/execution.py
src/synthorg/budget/cost_record.py
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/engine/trajectory/budget_guard.py
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/trajectory/models.py
src/synthorg/budget/coordination_metrics.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/execution.py
src/synthorg/budget/cost_record.py
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
src/synthorg/engine/trajectory/budget_guard.py
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/engine/health/pipeline.py
src/synthorg/engine/health/judge.py
src/synthorg/engine/trajectory/models.py
src/synthorg/budget/coordination_metrics.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async: asyncio_mode = "auto"; no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml; do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. (not real vendor names).
Property-based testing: Python uses Hypothesis (@given + @settings).
Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
Files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/health/test_pipeline.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/trajectory/test_scorer.py
tests/unit/engine/trajectory/test_scorer_properties.py
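The test guideline above prefers asyncio.Event().wait() over asyncio.sleep(large_number) for tasks that must block until cancelled. A minimal sketch of that pattern follows; the function names are invented for the example, and the bare create_task is used only to keep the demonstration short.

```python
import asyncio


async def block_until_cancelled() -> None:
    """Block forever; only cancellation ends this coroutine."""
    # An Event that is never set waits indefinitely, with no arbitrary
    # timeout that could expire in the middle of a slow test run.
    await asyncio.Event().wait()


async def run_and_cancel() -> bool:
    """Start the blocking task, cancel it, and report the outcome."""
    task = asyncio.create_task(block_until_cancelled())
    await asyncio.sleep(0)  # yield once so the task reaches its wait
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        return True
    return False
```

Running `asyncio.run(run_and_cancel())` returns as soon as the cancellation is delivered, so the test does not depend on wall-clock time.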
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.
Files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/health/test_pipeline.py
tests/unit/engine/trajectory/test_models.py
tests/unit/engine/trajectory/test_scorer.py
tests/unit/engine/trajectory/test_scorer_properties.py
🧠 Learnings (69)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/engine/metrics.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/engine/metrics.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings
Applied to files:
src/synthorg/engine/metrics.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
src/synthorg/engine/metrics.py
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Property-based testing: Python uses Hypothesis (given + settings).
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/budget/test_coordination_metrics_properties.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/engine/quality/effort.py
src/synthorg/engine/trajectory/scorer.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state
Applied to files:
tests/unit/engine/trajectory/test_models.py
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves
Applied to files:
tests/unit/engine/trajectory/test_models.py
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses)—ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.
Applied to files:
src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-04-01T09:37:49.451Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:37:49.451Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to **/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)
Applied to files:
src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: CI runs 10 deterministic examples per property test (derandomize=True—same inputs every run, no flakes).
Applied to files:
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/engine/trajectory/test_scorer_properties.py
🔇 Additional comments (27)
src/synthorg/engine/trajectory/budget_guard.py (2)
34-47: Validation for `k` and `margin` has been added as requested.
The input validation now properly guards against `k < 1` and `margin` outside `[0.0, 1.0]`, logging warnings with context before returning `False`. This addresses the prior review feedback.
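The guard behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code under review: the function name, the `remaining_budget` and `estimated_cost_per_candidate` parameters, and the reserve rule are assumptions; only the `k < 1` and `margin` range checks, the warning log, and the `False` return come from the comment.

```python
import logging

logger = logging.getLogger(__name__)


def check_trajectory_budget_sketch(
    *,
    remaining_budget: float,              # hypothetical parameter
    estimated_cost_per_candidate: float,  # hypothetical parameter
    k: int,
    margin: float,
) -> bool:
    """Return True if sampling k candidates appears to fit the budget."""
    # Invalid inputs are logged with context and block sampling.
    if k < 1:
        logger.warning("budget guard blocked: k must be >= 1 (k=%s)", k)
        return False
    if not 0.0 <= margin <= 1.0:
        logger.warning(
            "budget guard blocked: margin outside [0.0, 1.0] (margin=%s)", margin
        )
        return False
    # Assumed rule: keep `margin` of the remaining budget in reserve.
    usable = remaining_budget * (1.0 - margin)
    return k * estimated_cost_per_candidate <= usable
```

Returning `False` on invalid input treats a misconfigured guard as "do not sample", which is the conservative choice for a budget check.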
16-73: Function exported but not yet wired: confirm follow-up PR scope.
`check_trajectory_budget` is correctly exported from the package but has no callers in the current codebase. Per PR objectives, integration into the K-candidate sampling pipeline is deferred to a follow-up PR.
src/synthorg/engine/quality/effort.py (2)
43-50: Empty-input error path now logs before raising.
The `ValueError` for empty signals is now preceded by a warning log with context (`signal_count=0` and the error message). This addresses the prior review feedback.
25-92: LGTM!
The implementation correctly:
- Validates non-empty input with proper logging before raising
- Counts steps by quality category
- Handles `expected_steps <= 0` by falling back to actual total
- Logs the computed metrics at INFO level
- Detects the weak-model trap (high accuracy with very few steps)
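The steps listed above could look roughly like the sketch below. The model name `EffortSummary`, its fields, the ratio definition (accuracy divided by normalized effort), and the trap thresholds are all assumptions made for illustration; the review only confirms the validation, counting, `expected_steps` fallback, logging, and trap detection steps.

```python
import logging
from dataclasses import dataclass
from enum import Enum

logger = logging.getLogger(__name__)


class StepQuality(Enum):
    CORRECT = "correct"
    NEUTRAL = "neutral"
    INCORRECT = "incorrect"


@dataclass(frozen=True)
class EffortSummary:  # hypothetical stand-in for AccuracyEffortRatio
    correct: int
    neutral: int
    incorrect: int
    ratio: float
    weak_model_trap: bool


def summarize_effort(signals: list[StepQuality], expected_steps: int) -> EffortSummary:
    if not signals:
        msg = "signals must not be empty"
        logger.warning("accuracy-effort rejected: %s (signal_count=0)", msg)
        raise ValueError(msg)
    total = len(signals)
    correct = sum(1 for s in signals if s is StepQuality.CORRECT)
    neutral = sum(1 for s in signals if s is StepQuality.NEUTRAL)
    incorrect = total - correct - neutral
    # Fall back to the actual step count when no usable expectation is given.
    baseline = expected_steps if expected_steps > 0 else total
    accuracy = correct / total
    effort = total / baseline
    ratio = accuracy / effort  # assumed definition of the ratio
    # Assumed thresholds: near-perfect accuracy on far fewer steps than expected.
    trap = accuracy >= 0.9 and effort <= 0.5
    logger.info("accuracy-effort computed: ratio=%.3f total=%d", ratio, total)
    if trap:
        logger.warning("weak-model trap suspected: accuracy=%.2f effort=%.2f", accuracy, effort)
    return EffortSummary(correct, neutral, incorrect, ratio, trap)
```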
The function aligns with the `AccuracyEffortRatio` model requirements (all 5 required fields provided).
src/synthorg/engine/metrics.py (3)
41-43: Docstring updated to document the new `accuracy_effort_ratio` field.
The class docstring now includes the new field with appropriate description. This addresses the prior review feedback.
124-128: Event constant now used instead of string literal.
The warning log now uses `EXECUTION_METRICS_UNEXPECTED_TYPE` imported from the domain events module instead of a hardcoded string. This addresses the prior review feedback.
118-137: Extraction logic is defensive and correct.
The code gracefully handles missing keys, type mismatches, and correct types:
- `ae_data is None` → `ae_ratio` remains `None`
- Wrong type → logs warning with context, leaves `ae_ratio` as `None`
- Correct type → extracts `.ratio`
Upstream population of `metadata["accuracy_effort"]` is currently deferred, as expected. A follow-up PR will add wiring in the hybrid loop or plan executor to populate this key.
src/synthorg/budget/cost_record.py (2)
34-36: Docstring updated to document the new `accuracy_effort_ratio` field.
The class docstring now includes the new field with its type, meaning, and population timing. This addresses the prior review feedback.
53-60: Field definition is correct, but population mechanism does not exist in the codebase.
The field is properly defined with `ge=0.0` validation and correct description. However, none of the three `CostRecord` instantiation sites (in `cost_recording.py`, `llm_judge_quality_strategy.py`, and `memory/backends/mem0/adapter.py`) pass `accuracy_effort_ratio`. Since `CostRecord` is a frozen model (immutable), this field cannot be populated post-creation; it can only be set during instantiation.
The accuracy-effort ratio is computed separately for `TaskCompletionMetrics` (extracted from execution metadata in `engine/metrics.py`), but `CostRecord` instances are created per-API-call, not per-task. The field's description suggests task-level population at completion, but the per-call, immutable design of `CostRecord` makes this infeasible with the current frozen model pattern.
Either the field should be populated at instantiation (requiring design changes to pass task metadata earlier), or this field should be removed in favor of tracking the ratio only in task-level metrics.
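To illustrate why a frozen record cannot receive the ratio after creation, here is a small sketch using a stdlib frozen dataclass as a stand-in for the frozen Pydantic model. `CostRecordSketch` and its `cost` field are hypothetical; the real `CostRecord` has other fields and uses Pydantic validation (`ge=0.0`), for which the copy-with-update analogue would be `model_copy(update=...)`.

```python
from __future__ import annotations

from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class CostRecordSketch:  # hypothetical stand-in for the frozen CostRecord
    cost: float
    accuracy_effort_ratio: float | None = None

    def __post_init__(self) -> None:
        # Mirrors the ge=0.0 constraint described in the review.
        if self.accuracy_effort_ratio is not None and self.accuracy_effort_ratio < 0.0:
            raise ValueError("accuracy_effort_ratio must be >= 0.0")


# A per-call record is created before any task-level ratio exists.
record = CostRecordSketch(cost=0.02)

try:
    record.accuracy_effort_ratio = 0.8  # assignment on a frozen instance fails
except FrozenInstanceError:
    mutated = False
else:
    mutated = True

# The only way to attach the ratio is to build a new instance.
with_ratio = replace(record, accuracy_effort_ratio=0.8)
```

The copy keeps the original record untouched, so any store that already persisted the per-call record would still hold the version without the ratio.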
src/synthorg/engine/health/triage.py (2)
84-91: Dismissal transitions now log at INFO level.
Both dismissal paths (LOW severity and MEDIUM below thresholds) now use `logger.info()` instead of `logger.debug()`. This addresses the prior review feedback about state transitions logging at INFO.
58-118: LGTM!
The `should_escalate` method correctly implements the documented triage rules:
- HIGH/CRITICAL: always escalate
- LOW: always dismiss
- MEDIUM: escalate if stall duration OR stuck steps meet thresholds
All decision paths log at INFO level with appropriate structured fields.
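The triage rules above can be sketched as a single function. The `Severity` enum, the parameter names, and the default threshold values are assumptions for illustration; the decision order (HIGH/CRITICAL escalate, LOW dismiss, MEDIUM by threshold) and the INFO-level logging follow the comment.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class Severity(Enum):  # hypothetical stand-in for EscalationSeverity
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def should_escalate_sketch(
    severity: Severity,
    *,
    stall_seconds: float,
    stuck_steps: int,
    stall_threshold: float = 300.0,  # assumed default
    steps_threshold: int = 5,        # assumed default
) -> bool:
    if severity in (Severity.HIGH, Severity.CRITICAL):
        logger.info("ticket escalated: severity=%s", severity.value)
        return True
    if severity is Severity.LOW:
        logger.info("ticket dismissed: severity=%s", severity.value)
        return False
    # MEDIUM: escalate when either signal meets its threshold.
    escalate = stall_seconds >= stall_threshold or stuck_steps >= steps_threshold
    logger.info(
        "ticket %s: severity=medium stall_seconds=%s stuck_steps=%s",
        "escalated" if escalate else "dismissed",
        stall_seconds,
        stuck_steps,
    )
    return escalate
```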
src/synthorg/engine/health/judge.py (3)
42-51: Validation for `quality_degradation_threshold >= 1` has been added.
The constructor now validates that the threshold is at least 1, logging a warning before raising `ValueError`. This addresses the prior review feedback about guarding against non-positive thresholds.
54-133: LGTM!
The `emit_ticket` method correctly implements the escalation priority:
- Stagnation termination → HIGH severity
- Error with recovery → MEDIUM severity
- Quality degradation (consecutive INCORRECT) → HIGH or CRITICAL based on count
The severity escalation to CRITICAL when `consecutive >= 2*threshold` is a sensible design for severe quality degradation.
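A minimal sketch of the quality-degradation branch, under stated assumptions: the function names are hypothetical, severities are returned as plain strings, and the rule that no ticket is emitted below the threshold is assumed rather than confirmed by the review. The `2 * threshold` cutoff for CRITICAL comes from the comment.

```python
from __future__ import annotations

from enum import Enum


class StepQuality(Enum):
    CORRECT = "correct"
    NEUTRAL = "neutral"
    INCORRECT = "incorrect"


def count_trailing_incorrect(signals: list[StepQuality]) -> int:
    """Count consecutive INCORRECT signals at the end of the sequence."""
    count = 0
    for signal in reversed(signals):
        if signal is not StepQuality.INCORRECT:
            break  # first non-INCORRECT signal ends the trailing run
        count += 1
    return count


def degradation_severity(signals: list[StepQuality], threshold: int) -> str | None:
    """Return "HIGH", "CRITICAL", or None when no ticket is warranted."""
    if threshold < 1:
        raise ValueError(f"threshold must be >= 1, got {threshold}")
    consecutive = count_trailing_incorrect(signals)
    if consecutive < threshold:  # assumed: below threshold means no ticket
        return None
    return "CRITICAL" if consecutive >= 2 * threshold else "HIGH"
```

With `threshold=2`, a run of two or three trailing INCORRECT steps maps to HIGH and four or more maps to CRITICAL.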
136-146: LGTM!
The `_count_trailing_incorrect` helper correctly counts consecutive INCORRECT signals from the end using reverse iteration with early break on non-INCORRECT.
src/synthorg/engine/health/pipeline.py (4)
137-151: Notification delivery failure now preserves ticket semantics.
The pipeline now:
- Attempts notification delivery with a try/except
- Re-raises `MemoryError`/`RecursionError` (fatal errors)
- Logs other failures with context including `ticket_id` and `detail`
- Still returns the ticket so callers know escalation occurred
This addresses the prior review feedback about losing escalated ticket state when delivery fails.
100-101: Correct use of PEP 758 except syntax.
The code correctly uses `except MemoryError, RecursionError:` (comma-separated, no parentheses) per Python 3.14 and project coding guidelines.
Also applies to: 141-142
67-108: LGTM!
The `process` method provides a clean public interface that:
- Delegates to the inner implementation
- Catches and logs unexpected exceptions without crashing
- Properly propagates fatal errors (`MemoryError`, `RecursionError`)
- Returns `None` on failures (consistent with "no ticket" semantic)
154-175: LGTM!
The `_ticket_to_notification` helper correctly maps:
- Cause → Category (with `SYSTEM` fallback)
- Severity → NotificationSeverity (with `WARNING` fallback)
- Includes all relevant metadata fields for observability
src/synthorg/observability/events/execution.py (1)
101-103: LGTM!
The new event constant follows the established naming convention (`execution.metrics.unexpected_type`) and is correctly typed with `Final[str]`. This addresses the prior review feedback about using event constants instead of string literals in `metrics.py`.
tests/unit/engine/health/test_pipeline.py (1)
1-183: LGTM!
The test suite provides comprehensive end-to-end coverage for `HealthMonitoringPipeline`: stagnation escalation, completed (no-op), error+recovery dismissal/escalation thresholds, quality degradation, `NotificationCategory.HEALTH` usage, metadata propagation, and graceful sink-failure handling. The test doubles (`_FakeSink`, `_FailingSink`) are appropriately minimal and the assertions are focused.
tests/unit/budget/test_coordination_metrics_properties.py (1)
1-81: LGTM!
The property-based tests correctly rely on profile-based Hypothesis configuration via the `HYPOTHESIS_PROFILE` environment variable (no hardcoded `@settings` decorators). The strategies are well-defined with appropriate bounds (`allow_nan=False`, `allow_infinity=False`), and the property invariants being tested (monotonic speedup, positive team size, non-negative gap) are meaningful.
src/synthorg/observability/events/coordination_metrics.py (1)
1-15: LGTM!
Event constants follow the established naming pattern (`COORD_METRICS_*` → `coordination.metrics.*`) and are properly typed with `Final[str]`. The `# noqa: S105` on line 10 correctly suppresses the false positive for "token" in the event name string.
tests/unit/engine/trajectory/test_models.py (1)
1-156: LGTM!
Comprehensive validation coverage for the trajectory models: defaults, bounds (`k_candidates` 2–5, `budget_guard_margin` 0.0–1.0, `verbalized_confidence` 0–100), immutability (`frozen=True`), and `extra="forbid"` rejection. The `joint_score` computed field test correctly validates the formula `vc_score * abs(len_score) + len_score`.
tests/unit/engine/trajectory/test_scorer_properties.py (1)
1-140: LGTM!
The property tests correctly rely on profile-based Hypothesis configuration without hardcoded `@settings` decorators. The three invariants tested (selection determinism, joint score formula consistency, len_score ≤ 0) provide good coverage of scorer behavior across randomized inputs.
tests/unit/engine/trajectory/test_scorer.py (1)
1-223: LGTM!
Excellent coverage of `TrajectoryScorer` behavior: single/multiple candidate selection, trace-length preference, VC scoring (log-space, `None` → 0.0, zero → -100.0 floor), majority-vote consistency filtering, and error handling for empty inputs. The tests align with the implementation logic in `scorer.py` and the `_check_consistency` algorithm.
src/synthorg/engine/trajectory/scorer.py (1)
1-220: LGTM!
The implementation correctly addresses previous review findings:
- Empty-candidate error path now logs at WARNING before raising (lines 63–68).
- Tie-vote handling uses strict >50% threshold (`majority_count > len(candidates) / 2` at line 173), marking all candidates consistent when no clear majority exists, a sensible fallback that avoids over-aggressive filtering.
Logging follows guidelines: `get_logger(__name__)`, structured kwargs, INFO for state transitions (`TRAJECTORY_BEST_SELECTED`, `TRAJECTORY_CONSISTENCY_FILTERED`), DEBUG for internal flow.
src/synthorg/engine/trajectory/models.py (1)
1-128: LGTM!
The trajectory models follow Pydantic v2 best practices: `frozen=True` for immutability, `allow_inf_nan=False` for numeric safety, `extra="forbid"` on config, `@computed_field` for derived `joint_score`, and `NotBlankStr` for the `complexity_gate` tuple. Field constraints (`ge`/`le`) appropriately bound `k_candidates`, `budget_guard_margin`, `verbalized_confidence`, and `len_score`.
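The two trajectory rules referenced in these comments, the derived `joint_score` and the strict-majority consistency filter, can be sketched with the standard library alone. `ScoreSketch` and `consistency_flags` are hypothetical names, a plain property stands in for Pydantic's `@computed_field`, and comparing candidates by a string answer is an assumption; the formula and the `> len / 2` threshold are taken from the review text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreSketch:  # hypothetical stand-in for TrajectoryScore
    vc_score: float
    len_score: float  # the review notes len_score <= 0

    @property
    def joint_score(self) -> float:
        # Formula as quoted in the review of test_models.py.
        return self.vc_score * abs(self.len_score) + self.len_score


def consistency_flags(answers: list[str]) -> list[bool]:
    """Mark each candidate answer as consistent with the majority or not."""
    if not answers:
        raise ValueError("answers must not be empty")
    top_answer, majority_count = Counter(answers).most_common(1)[0]
    # Strict majority: more than half of the candidates must agree.
    if majority_count > len(answers) / 2:
        return [answer == top_answer for answer in answers]
    # No clear majority (for example a tie): keep every candidate.
    return [True] * len(answers)
```

Keeping all candidates on a tie means the scorer still has something to select from when votes split evenly.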
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/synthorg/engine/health/triage.py (1)
44-57: ⚠️ Potential issue | 🟠 Major
Validate threshold types before numeric comparisons.
`stall_duration_threshold`/`steps_threshold` are config-boundary inputs; non-numeric values currently raise `TypeError` before your warning + `ValueError` path, so failure mode is inconsistent and less diagnosable.
Proposed fix
     def __init__(
         self,
         *,
         stall_duration_threshold: float = (_MEDIUM_STALL_DURATION_THRESHOLD),
         steps_threshold: int = _MEDIUM_STEPS_THRESHOLD,
     ) -> None:
+        if not isinstance(stall_duration_threshold, int | float):
+            msg = (
+                "stall_duration_threshold must be int|float >= 0, "
+                f"got {type(stall_duration_threshold).__name__}"
+            )
+            logger.warning(HEALTH_TICKET_DISMISSED, error=msg)
+            raise ValueError(msg)
+        if not isinstance(steps_threshold, int):
+            msg = (
+                "steps_threshold must be int >= 0, "
+                f"got {type(steps_threshold).__name__}"
+            )
+            logger.warning(HEALTH_TICKET_DISMISSED, error=msg)
+            raise ValueError(msg)
         if stall_duration_threshold < 0:
             msg = (
                 f"stall_duration_threshold must be >= 0, got {stall_duration_threshold}"
             )
As per coding guidelines, “Validate at system boundaries (user input, external APIs, config files).”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/health/triage.py` around lines 44 - 57, The checks for stall_duration_threshold and steps_threshold must validate types before doing numeric comparisons; in the function where those params are accepted (the constructor or function that declares stall_duration_threshold and steps_threshold) add explicit type checks (e.g., isinstance(..., (int, float)) or numbers.Number) for both stall_duration_threshold and steps_threshold before the < 0 comparisons, and when a non-numeric value is detected log using logger.warning with HEALTH_TICKET_DISMISSED and a clear message referencing the parameter name, then raise a TypeError with that message so the failure mode is consistent and diagnosable (keep the existing negative-value checks/ValueError after the type checks).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 385-400: The new metric model classes AmdahlCeiling, StragglerGap,
TokenSpeedupRatio, and MessageOverhead are not exported from the package API;
update the package-level exports by importing these names from their module
(where CoordinationEfficiency, ErrorAmplification, MessageDensity etc. are
imported) and add them to the __all__ (or existing export list) so external
consumers can import them for types and construction; ensure the import
statements mirror the pattern used for existing metrics and include each of the
four class names in the exported list.
- Around line 238-247: The validator method _validate_slowest_ge_mean currently
raises ValueError without logging; modify it to log a warning with context
(include slowest_duration_seconds and mean_duration_seconds and a short message)
immediately before raising so the failure is recorded in telemetry—use the
module logger (or the existing logger instance) to call logger.warning(...) and
keep the raised ValueError unchanged.
In `@src/synthorg/engine/health/triage.py`:
- Around line 51-54: Replace the use of HEALTH_TICKET_DISMISSED inside triage
class constructor logging with a dedicated triage-config event constant (e.g.,
HEALTH_TRIAGE_CONFIG_ERROR) to avoid polluting dismissal telemetry; locate the
logger.warning calls in __init__ that pass error=msg (the blocks around
HEALTH_TICKET_DISMISSED at the two noted spots) and change them to emit the new
event name (and consider using logger.error or logger.warning consistently for
config failures), creating the HEALTH_TRIAGE_CONFIG_ERROR constant if it doesn't
exist and reusing the same message payload (error=msg).
---
Duplicate comments:
In `@src/synthorg/engine/health/triage.py`:
- Around line 44-57: The checks for stall_duration_threshold and steps_threshold
must validate types before doing numeric comparisons; in the function where
those params are accepted (the constructor or function that declares
stall_duration_threshold and steps_threshold) add explicit type checks (e.g.,
isinstance(..., (int, float)) or numbers.Number) for both
stall_duration_threshold and steps_threshold before the < 0 comparisons, and
when a non-numeric value is detected log using logger.warning with
HEALTH_TICKET_DISMISSED and a clear message referencing the parameter name, then
raise a TypeError with that message so the failure mode is consistent and
diagnosable (keep the existing negative-value checks/ValueError after the type
checks).
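The `_validate_slowest_ge_mean` finding above asks for a warning to be logged before the `ValueError` is raised. A minimal sketch of that pattern follows, under stated assumptions: a frozen dataclass stands in for the Pydantic model, stdlib `logging` stands in for the project's logger, `StragglerGapSketch` is a hypothetical name, and the string value given to `COORD_METRICS_VALIDATION_ERROR` is invented for illustration.

```python
import logging
from dataclasses import dataclass

# Stand-in logger; the real module would use the project's get_logger helper.
logger = logging.getLogger("coordination_metrics_sketch")

# The constant name appears in the review; this string value is an assumption.
COORD_METRICS_VALIDATION_ERROR = "coordination_metrics.validation_error"


@dataclass(frozen=True)
class StragglerGapSketch:
    """Dataclass stand-in for the reviewed Pydantic model (hypothetical)."""

    slowest_duration_seconds: float
    mean_duration_seconds: float

    def __post_init__(self) -> None:
        # The slowest agent can never be faster than the mean of all agents.
        if self.slowest_duration_seconds < self.mean_duration_seconds:
            msg = "slowest_duration_seconds must be >= mean_duration_seconds"
            # Record both values before raising so telemetry sees the failure.
            logger.warning(
                "%s slowest=%s mean=%s error=%s",
                COORD_METRICS_VALIDATION_ERROR,
                self.slowest_duration_seconds,
                self.mean_duration_seconds,
                msg,
            )
            raise ValueError(msg)
```

The raised exception is unchanged; only the log call is added ahead of it.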
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 99bdf959-4dbd-49c6-8fe0-9dc0bb8eef47
📒 Files selected for processing (2)
src/synthorg/budget/coordination_metrics.py
src/synthorg/engine/health/triage.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.
Files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Parallel tier (Haiku-class) include 14 reviewers: async-concurrency-reviewer, comment-analyzer, design-token-audit, docs-consistency, go-conventions-enforcer, go-security-reviewer, infra-reviewer, issue-resolution-verifier, persistence-reviewer, pr-test-analyzer, security-reviewer, silent-failure-hunter, test-quality-reviewer, and type-design-analyzer
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-02T11:49:21.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T11:49:21.202Z
Learning: Applies to **/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`) with `allow_inf_nan=False` in all `ConfigDict` declarations
Applied to files:
src/synthorg/budget/coordination_metrics.py
🔇 Additional comments (9)
src/synthorg/engine/health/triage.py (1)
76-126: Severity routing and MEDIUM threshold boundaries look solid. Coverage of all `EscalationSeverity` values and `>=` boundary behavior is clear and deterministic.
src/synthorg/budget/coordination_metrics.py (8)
1-24: Imports and module setup look correct. The updated imports properly bring in `math`, `Self`, `model_validator`, `NotBlankStr`, and the observability event constant. Logger initialization follows project conventions.
165-209: AmdahlCeiling model is well-implemented. The model correctly implements Amdahl's Law with:
- Proper field constraint (`lt=1.0`) preventing division by zero in `max_speedup`
- Sound derivation for `recommended_team_size` (solving for 90% of ceiling)
- Defensive guard at line 205-206 for the `p <= 0` edge case
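The "90% of ceiling" derivation mentioned above can be worked through explicitly. Amdahl's Law gives S(n) = 1 / ((1 - p) + p/n) with ceiling S_max = 1 / (1 - p). Setting S(n) = 0.9 * S_max yields (1 - p) + p/n = (1 - p)/0.9, hence p/n = (1 - p)/9 and n = 9p / (1 - p). The sketch below encodes that result; the function names, rounding up with `math.ceil`, and returning 1 when `p <= 0` are assumptions for illustration, not necessarily what the reviewed model does.

```python
import math


def max_speedup(p: float) -> float:
    """Amdahl ceiling S_max = 1 / (1 - p) for parallel fraction p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError(f"p must satisfy 0 <= p < 1, got {p}")
    return 1.0 / (1.0 - p)


def speedup(p: float, n: int) -> float:
    """Amdahl speedup with n workers: S(n) = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n)


def recommended_team_size(p: float) -> int:
    """Smallest n reaching 90% of the ceiling, from n = 9p / (1 - p)."""
    if p <= 0.0:
        # Nothing is parallelizable, so one worker suffices (assumed guard).
        return 1
    if p >= 1.0:
        raise ValueError(f"p must be < 1, got {p}")
    return max(1, math.ceil(9.0 * p / (1.0 - p)))
```

For example, with p = 0.5 the ceiling is 2.0 and nine workers reach 1.8, which is exactly 90% of it.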
266-305: TokenSpeedupRatio model correctly implements alert threshold. The model properly captures the token-cost-vs-latency tradeoff with computed `ratio` and `alert` fields. The 2.0 threshold aligns with the PR objective for alerting when tokens scale disproportionately to speedup.
307-344: MessageOverhead model correctly detects O(n²) growth. The `is_quadratic` computed field properly compares message count against `team_size² × threshold`. Field constraints are appropriate (`gt=0` for team_size, configurable threshold with sensible default).
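The comparison described for `is_quadratic` amounts to one inequality. In the sketch below the strict `>` comparison and the default threshold of 0.5 are assumptions chosen for illustration; the review only states that message count is compared against team_size² × threshold.

```python
def is_quadratic(
    message_count: int,
    team_size: int,
    threshold: float = 0.5,  # assumed default, not taken from the source
) -> bool:
    """Flag message volume that exceeds team_size**2 * threshold."""
    if team_size <= 0:
        raise ValueError(f"team_size must be > 0, got {team_size}")
    if message_count < 0:
        raise ValueError(f"message_count must be >= 0, got {message_count}")
    return message_count > team_size**2 * threshold
```

With four agents and a threshold of 0.5 the cutoff is 8 messages, so 10 messages are flagged and 8 are not.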
547-568: compute_amdahl_ceiling delegates validation to Pydantic. This is acceptable since the `AmdahlCeiling` model's `Field(ge=0.0, lt=1.0)` constraint handles boundary validation. A `ValidationError` with clear messaging will be raised for invalid inputs.
570-626: compute_straggler_gap properly validates inputs with logging. The function correctly:
- Rejects empty sequences with logged warning
- Validates each entry for blank agent_id and non-finite/negative duration
- Logs `COORD_METRICS_VALIDATION_ERROR` before raising `ValueError`
This addresses the previous review findings.
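The validation steps listed above, followed by the gap computation from the PR description (slowest minus mean duration), might look like the following. This is a hedged sketch: `straggler_gap_sketch` and its tuple return shape are hypothetical, logging is omitted for brevity, and defining the gap ratio as gap divided by mean is an assumption.

```python
import math
from collections.abc import Sequence


def straggler_gap_sketch(
    durations: Sequence[tuple[str, float]],
) -> tuple[str, float, float]:
    """Return (slowest_agent_id, gap_seconds, gap_ratio) for (id, seconds) pairs."""
    if not durations:
        raise ValueError("durations must not be empty")
    for agent_id, seconds in durations:
        if not agent_id.strip():
            raise ValueError("agent_id must not be blank")
        # NaN and Inf fail isfinite; negative durations fail the second test.
        if not math.isfinite(seconds) or seconds < 0:
            raise ValueError(f"invalid duration for {agent_id!r}: {seconds}")
    slowest_id, slowest = max(durations, key=lambda entry: entry[1])
    mean = sum(seconds for _, seconds in durations) / len(durations)
    gap = slowest - mean
    # Assumed definition: gap relative to the mean, 0.0 when the mean is zero.
    ratio = gap / mean if mean > 0 else 0.0
    return slowest_id, gap, ratio
```

For durations of 2, 4, and 6 seconds the mean is 4, so the slowest agent trails by 2 seconds, a ratio of 0.5 under the assumed definition.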
629-668: compute_token_speedup_ratio validates all inputs comprehensively. The consolidated loop checking `not math.isfinite(value) or value <= 0` correctly catches NaN, Inf, zero, and negative values before model construction. Logging with parameter context ensures observability for invalid input scenarios.
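A small sketch of the consolidated check and the ratio it guards is given below. The finiteness test has to be negated to reject bad input: `value <= 0` alone is `False` for NaN, so NaN would slip through without it. The 2.0 alert threshold comes from the PR description; the strict `>` comparison, the function name, and the tuple return are assumptions.

```python
import math

ALERT_THRESHOLD = 2.0  # from the PR description: alert at 2.0


def token_speedup_ratio(
    token_multiplier: float,
    latency_speedup: float,
) -> tuple[float, bool]:
    """Return (ratio, alert) where ratio = token_multiplier / latency_speedup."""
    inputs = (
        ("token_multiplier", token_multiplier),
        ("latency_speedup", latency_speedup),
    )
    for name, value in inputs:
        # One condition rejects NaN, +/-Inf, zero, and negative values.
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be finite and > 0, got {value}")
    ratio = token_multiplier / latency_speedup
    # Strict comparison is an assumption; the source only names the threshold.
    return ratio, ratio > ALERT_THRESHOLD
```

Spending 6x the tokens for a 2x latency gain gives a ratio of 3.0 and raises the alert, while 4x tokens for the same gain sits exactly at the threshold.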
671-691: compute_message_overhead is a thin wrapper with appropriate delegation. The function relies on Pydantic's field-level validation (`gt=0`, `ge=0`, bounded threshold). This is consistent with `compute_amdahl_ceiling`'s approach and acceptable for non-division operations where Pydantic provides clear error messages.
@coderabbitai review
✅ Actions performed
Review triggered.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/budget/__init__.py`:
- Around line 24-35: Add the four new coordination metric classes to the module
exports by updating the __all__ list to include AmdahlCeiling,
CoordinationEfficiency, CoordinationMetrics, and CoordinationOverhead (inserted
in alphabetical order among existing entries) so consumers can import them via
from synthorg.budget import AmdahlCeiling, CoordinationEfficiency,
CoordinationMetrics, CoordinationOverhead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 670a9f77-9c4b-4527-bb27-cd4174ae2166
📒 Files selected for processing (4)
src/synthorg/budget/__init__.py
src/synthorg/budget/coordination_metrics.py
src/synthorg/engine/health/triage.py
src/synthorg/observability/events/health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Deploy Preview
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations`; Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
Files:
src/synthorg/budget/__init__.py
src/synthorg/observability/events/health.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.
Files:
src/synthorg/budget/__init__.py
src/synthorg/observability/events/health.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/budget/__init__.py
src/synthorg/observability/events/health.py
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
🧠 Learnings (38)
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)
Applied to files:
src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.
Applied to files:
src/synthorg/budget/__init__.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)
Applied to files:
src/synthorg/budget/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).
Applied to files:
src/synthorg/budget/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
src/synthorg/observability/events/health.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.
Applied to files:
src/synthorg/engine/health/triage.py
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG
Applied to files:
src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-02T11:49:21.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T11:49:21.202Z
Learning: Applies to **/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`) with `allow_inf_nan=False` in all `ConfigDict` declarations
Applied to files:
src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/budget/coordination_metrics.py
🔇 Additional comments (14)
src/synthorg/observability/events/health.py (1)
1-9: LGTM! Event constants follow the established pattern with proper `Final[str]` typing and a clear `execution.health.*` namespace. The module correctly omits logging as per guidelines for pure data/constant modules.
src/synthorg/engine/health/triage.py (3)
1-16: LGTM! Module setup follows guidelines: proper docstring, correct imports from domain-specific event modules, and logger instantiation via `get_logger(__name__)`.
42-65: LGTM! Constructor properly validates thresholds at construction time, logs configuration errors with `HEALTH_TRIAGE_CONFIG_ERROR` (not `HEALTH_TICKET_DISMISSED`), and raises `ValueError` after logging, correctly addressing the boundary validation and event separation concerns from prior reviews.
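The triage behavior reviewed in this file (threshold validation with log-before-raise, then severity-based escalation) can be sketched roughly as follows. This is a minimal illustration, not the code under review: `TriageSketch`, `Severity`, the parameter names, and the "thresholds must be positive" rule are all hypothetical. The 60 s and 5-step defaults are taken only from the docstring quoted in the review, and the real module logs through its own event constants rather than plain strings.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class TriageSketch:
    """Hypothetical stand-in for TriageFilter; names and defaults are assumed."""

    def __init__(self, stall_seconds: float = 60.0, steps: int = 5) -> None:
        if stall_seconds <= 0 or steps <= 0:
            # Log with context before raising, as the project guideline asks.
            logger.warning(
                "triage config error: stall_seconds=%r steps=%r",
                stall_seconds,
                steps,
            )
            raise ValueError("triage thresholds must be positive")
        self._stall_seconds = stall_seconds
        self._steps = steps

    def should_escalate(
        self, severity: Severity, stall_seconds: float, steps_without_progress: int
    ) -> bool:
        if severity in (Severity.HIGH, Severity.CRITICAL):
            return True  # always escalated
        if severity is Severity.LOW:
            return False  # always dismissed
        # MEDIUM: escalate when either threshold is met (>=), so a ticket is
        # dismissed only if stall < threshold AND steps < threshold.
        return (
            stall_seconds >= self._stall_seconds
            or steps_without_progress >= self._steps
        )
```

Using `>=` on both comparisons keeps the dismissal condition the exact logical negation of the escalation condition, which is the property the review checks against the docstring.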
67-127: LGTM! The `should_escalate` method correctly implements the severity-based triage rules:
- HIGH/CRITICAL always escalated
- LOW always dismissed
- MEDIUM escalated when stall OR steps thresholds are met
All state transitions log at INFO with structured kwargs and proper event constants. The `>=` threshold comparisons correctly match the docstring semantics ("stall < 60s AND steps < 5" dismissed).
src/synthorg/budget/coordination_metrics.py (10)
1-24: LGTM! Module setup follows project conventions: proper logger initialization, event constants imported from domain-specific module, and necessary Pydantic/stdlib imports added.
165-209: LGTM! `AmdahlCeiling` model correctly implements Amdahl's Law with proper field constraints (preventing division by zero), computed properties for `max_speedup` and `recommended_team_size`, and handles the edge case where `p <= 0`.
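For readers unfamiliar with the metric, here is a small sketch of the arithmetic behind an Amdahl ceiling. The formula S_max = 1/(1-p) comes from the PR description. Reading "recommended team size at 90% speedup" as the smallest n whose speedup reaches 90% of S_max is an assumption, and the function names below are illustrative rather than the model's actual properties.

```python
import math


def max_speedup(p: float) -> float:
    """Amdahl ceiling S_max = 1 / (1 - p) for parallelizable fraction p."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return 1.0 / (1.0 - p)


def speedup(p: float, n: int) -> float:
    """Amdahl speedup with n workers: 1 / ((1 - p) + p / n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


def recommended_team_size(p: float) -> int:
    """Smallest n with speedup(p, n) >= 0.9 * max_speedup(p) (assumed meaning).

    Solving 1 / ((1 - p) + p / n) >= 0.9 / (1 - p) for n gives
    n >= 9 * p / (1 - p). Floating-point rounding can push borderline
    values up by one.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return max(1, math.ceil(9.0 * p / (1.0 - p)))
```

With p = 0.5 the ceiling is 2x and nine agents already reach 1.8x, which shows how quickly extra agents stop paying off when half the work is serial.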
211-270: LGTM! `StragglerGap` model correctly enforces the invariant that `slowest_duration_seconds >= mean_duration_seconds` via `model_validator`, uses `NotBlankStr` for the agent identifier, and properly logs validation failures before raising.
272-311: LGTM! `TokenSpeedupRatio` model correctly computes the token-to-speedup ratio with proper positive constraints on inputs, and the alert threshold logic is clear.
313-350: LGTM! `MessageOverhead` model correctly detects quadratic message growth patterns with a configurable threshold and appropriate field constraints.
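The quadratic-growth check can be illustrated with a few lines. The commit message describes it as comparing messages against n^2 * threshold. The function name, the strict `>` comparison, and the validation rules here are assumptions made for the sketch.

```python
def is_quadratic_overhead(
    message_count: int, team_size: int, threshold: float
) -> bool:
    """Flag message volume that exceeds threshold * team_size**2.

    Hypothetical helper: the reviewed model exposes this as a field on
    MessageOverhead; the strict '>' comparison is an assumption.
    """
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    if message_count < 0:
        raise ValueError("message_count must be non-negative")
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return message_count > threshold * team_size**2
```

For a 10-agent team and a threshold of 0.5, the cutoff is 50 messages: 60 messages would be flagged, 40 would not.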
352-406: LGTM! `CoordinationMetrics` container properly extended with four new optional metric fields, maintaining consistency with existing patterns.
553-573: LGTM! `compute_amdahl_ceiling` is a clean wrapper that delegates validation to the Pydantic model constraints.
576-632: LGTM! `compute_straggler_gap` correctly validates inputs with proper logging before raising, handles the empty sequence case, validates each entry for blank IDs and invalid durations, and correctly computes the straggler metrics.
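A rough sketch of the straggler computation described above, using only the standard library. The PR defines the gap as slowest minus mean duration. The input shape (a sequence of `(agent_id, seconds)` pairs), the choice to raise on an empty sequence, and gap ratio = gap / mean are assumptions, and the real function also logs before each raise.

```python
import math
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class StragglerGapSketch:
    """Hypothetical plain-dataclass stand-in for the Pydantic model."""

    slowest_agent_id: str
    slowest_duration_seconds: float
    mean_duration_seconds: float

    @property
    def gap_seconds(self) -> float:
        return self.slowest_duration_seconds - self.mean_duration_seconds

    @property
    def gap_ratio(self) -> float:
        # Assumed definition: gap relative to the mean; 0.0 when mean is 0.
        if self.mean_duration_seconds == 0:
            return 0.0
        return self.gap_seconds / self.mean_duration_seconds


def compute_straggler_gap_sketch(
    durations: Sequence[tuple[str, float]],
) -> StragglerGapSketch:
    if not durations:
        raise ValueError("durations must not be empty")
    for agent_id, seconds in durations:
        if not agent_id.strip():
            raise ValueError("agent_id must not be blank")
        if not math.isfinite(seconds) or seconds < 0:
            raise ValueError(f"invalid duration for {agent_id!r}: {seconds!r}")
    slowest_id, slowest = max(durations, key=lambda entry: entry[1])
    mean = sum(seconds for _, seconds in durations) / len(durations)
    return StragglerGapSketch(slowest_id, slowest, mean)
```

Because the maximum of a set is never below its mean, the invariant `slowest_duration_seconds >= mean_duration_seconds` holds by construction for any result this function returns.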
635-674: LGTM! `compute_token_speedup_ratio` correctly validates all inputs for finiteness and positivity with proper logging before raising, preventing NaN/Inf from bypassing the error path.
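The point about NaN bypassing the error path is easy to miss: `float("nan") <= 0` evaluates to `False`, so a positivity check alone lets NaN through. The sketch below shows the pattern with an explicit finiteness check. The function name, return shape, and strict `>` alert comparison are assumptions, while the ratio and the 2.0 alert level come from the PR description.

```python
import math


def token_speedup_ratio_sketch(
    token_multiplier: float,
    latency_speedup: float,
    alert_threshold: float = 2.0,
) -> tuple[float, bool]:
    """Return (ratio, alert) where ratio = token_multiplier / latency_speedup."""
    for name, value in (
        ("token_multiplier", token_multiplier),
        ("latency_speedup", latency_speedup),
    ):
        # isfinite must come first: NaN compares False against everything,
        # so 'value <= 0' on its own would not reject it.
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be finite and positive, got {value!r}")
    ratio = token_multiplier / latency_speedup
    return ratio, ratio > alert_threshold
```

A run that spends 4x the tokens for a 1.6x latency gain yields a ratio of 2.5 and trips the alert; equal multipliers yield 1.0 and do not.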
677-697: LGTM! `compute_message_overhead` is a clean wrapper that delegates validation to the Pydantic model constraints.
from synthorg.budget.coordination_metrics import (
    AmdahlCeiling,
    CoordinationEfficiency,
    CoordinationMetrics,
    CoordinationOverhead,
    ErrorAmplification,
    MessageDensity,
    MessageOverhead,
    RedundancyRate,
    StragglerGap,
    TokenSpeedupRatio,
)
Missing __all__ exports for the new coordination metric classes.
The four new classes are imported but not added to `__all__`, breaking the public API contract. An explicit `from synthorg.budget import AmdahlCeiling` still works because the name is bound in the package, but `from synthorg.budget import *` will skip all four classes.
🐛 Proposed fix to add missing exports
Add the following entries to `__all__` in alphabetical order:
     "AgentSpending",
+    "AmdahlCeiling",
     "AnomalyDetectionResult",

     "MessageDensity",
+    "MessageOverhead",
     "ModelDistribution",

     "SpendingSummary",
+    "StragglerGap",
     "SubscriptionConfig",

     "TeamBudget",
+    "TokenSpeedupRatio",
     "billing_period_start",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/budget/__init__.py` around lines 24 - 35, Add the four new
coordination metric classes to the module exports by updating the __all__ list
to include AmdahlCeiling, MessageOverhead, StragglerGap, and TokenSpeedupRatio
(inserted in alphabetical order among existing entries) so that __all__ matches
the names the package imports and from synthorg.budget import * exposes
AmdahlCeiling, MessageOverhead, StragglerGap, and TokenSpeedupRatio.
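As a quick check of what `__all__` actually controls, the following self-contained sketch builds a throwaway module in memory and compares a star-import with an explicit import. The module name `demo_budget` and its contents are hypothetical and unrelated to the real package layout.

```python
import sys
import types

# Build a throwaway module whose __all__ omits one of its classes.
demo = types.ModuleType("demo_budget")  # hypothetical module name
exec(
    "class AgentSpending: ...\n"
    "class AmdahlCeiling: ...\n"
    "__all__ = ['AgentSpending']\n",
    demo.__dict__,
)
sys.modules["demo_budget"] = demo

# A star-import honours __all__, so AmdahlCeiling is skipped.
star_namespace: dict[str, object] = {}
exec("from demo_budget import *", star_namespace)
star_names = {name for name in star_namespace if not name.startswith("__")}

# An explicit import only needs the attribute to exist on the module.
from demo_budget import AmdahlCeiling
```

Here `star_names` ends up containing only `AgentSpending`, while the explicit import of `AmdahlCeiling` succeeds. Keeping `__all__` in sync with the imports therefore matters mainly for star-imports and for tools that treat `__all__` as the declared public surface.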
🤖 I have created a release *beep* *boop*
---
## [0.6.4](v0.6.3...v0.6.4) (2026-04-07)
### Features
* analytics and metrics runtime pipeline ([#226](#226), [#225](#225), [#227](#227), [#224](#224)) ([#1127](#1127)) ([ec57641](ec57641))
* engine intelligence -- quality signals, health monitoring, trajectory scoring, coordination metrics ([#1099](#1099)) ([aac2029](aac2029)), closes [#697](#697) [#707](#707) [#705](#705) [#703](#703)
* enterprise-grade auth -- HttpOnly cookie sessions, CSRF, lockout, session limits ([#1102](#1102)) ([d3022c7](d3022c7)), closes [#1068](#1068)
* implement core tool categories and granular sub-constraints ([#1101](#1101)) ([0611b53](0611b53)), closes [#1034](#1034) [#220](#220)
* memory evolution -- GraphRAG/consistency research + SelfEditingMemoryStrategy ([#1036](#1036), [#208](#208)) ([#1129](#1129)) ([a9acda3](a9acda3))
* security hardening -- sandbox, risk override, SSRF self-heal, DAST fix ([#1100](#1100)) ([31e7273](31e7273)), closes [#1098](#1098) [#696](#696) [#222](#222) [#671](#671)
### Bug Fixes
* harden agent identity versioning post-review ([#1128](#1128)) ([8eb2859](8eb2859)), closes [#1076](#1076)
### Documentation
* engine architecture research ([#688](#688) [#690](#690) [#848](#848) [#687](#687)) ([#1114](#1114)) ([59b31f9](59b31f9))
### Maintenance
* add .claudeignore and split CLAUDE.md for token optimization ([#1112](#1112)) ([b0fbd18](b0fbd18))
* bump github.com/sigstore/protobuf-specs from 0.5.0 to 0.5.1 in /cli in the all group ([#1106](#1106)) ([73089c9](73089c9))
* bump jsdom from 29.0.1 to 29.0.2 in /site in the all group ([#1107](#1107)) ([8e99dce](8e99dce))
* bump jsdom from 29.0.1 to 29.0.2 in /web in the all group ([#1108](#1108)) ([ce8c749](ce8c749))
* bump python from `fb83750` to `6869258` in /docker/backend in the all group ([#1104](#1104)) ([4911726](4911726))
* bump python from `fb83750` to `6869258` in /docker/web in the all group ([#1103](#1103)) ([87bdf09](87bdf09))
* bump the all group across 1 directory with 4 updates ([#1111](#1111)) ([f702464](f702464))
* bump the all group in /docker/sandbox with 2 updates ([#1105](#1105)) ([05a91ca](05a91ca))
---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
Adds four engine intelligence features: step-level quality signals, two-layer health monitoring, best-of-K trajectory scoring, and distributed systems coordination metrics.
#697: Step-Level Quality Signals
- `StepQuality` ternary enum (correct/neutral/incorrect) based on AgentProcessBench
- `StepQualityClassifier` protocol + `RuleBasedStepClassifier` (deterministic, no LLM cost)
- `AccuracyEffortRatio` model with computed `accuracy` and `ratio` fields
- `compute_accuracy_effort()` with weak-model-trap warning (early termination detection)
- Wire `accuracy_effort_ratio` into `TaskCompletionMetrics` and `CostRecord`
#707: Two-Layer Health Monitoring Pipeline
- `EscalationTicket` model with severity, cause, evidence, quality signals
- `HealthJudge` (sensitive layer): emits tickets on stagnation, error+recovery, quality degradation
- `TriageFilter` (conservative layer): rule-based dismiss LOW, escalate HIGH/CRITICAL, threshold MEDIUM
- `HealthMonitoringPipeline`: composes judge + triage + NotificationSink delivery
- Add `HEALTH` notification category
#705: TrajectoryScorer for HybridLoop
- `TrajectoryConfig` (off by default, K=2-5, complexity-gated, budget margin)
- `TrajectoryScorer` with self-consistency filter (majority-vote on fingerprints), verbalized confidence (log-space), trace length scoring
- `check_trajectory_budget()` budget guard for K-candidate sampling
- `CandidateResult` and `TrajectoryScore` models with computed `joint_score`
- Wire `TrajectoryConfig` into `HybridLoopConfig`
#703: Coordination Metrics from Distributed Systems Theory
- `AmdahlCeiling`: S_max = 1/(1-p), recommended team size at 90% speedup
- `StragglerGap`: slowest - mean duration with cross-field validation
- `TokenSpeedupRatio`: token_multiplier / latency_speedup, alert at 2.0
- `MessageOverhead`: O(n^2) message growth detection
- Extend `CoordinationMetrics` container from 5 to 9 metrics
Scope Note
This PR implements the models, scoring logic, and metrics computation layer for all four issues. The hybrid loop integration (wiring trajectory scoring into `hybrid_loop.py` with `asyncio.TaskGroup` for K-parallel candidates) and approval gate wiring (consuming quality signals at review boundaries) are follow-up work: the infrastructure is complete and tested, but the loop integration requires careful coordination with the execution path.
Design Spec Updates Needed
After merge, update:
- `docs/design/engine.md` -- add Quality Signals, Health Monitoring, Trajectory Scoring sections
- `docs/design/operations.md` -- update Coordination Metrics table (5 to 9 metrics)
Test Plan
Files
Closes #697
Closes #707
Closes #705
Closes #703