feat: engine intelligence -- quality signals, health monitoring, trajectory scoring, coordination metrics #1099

Merged
Aureliolo merged 9 commits into main from feat/engine-intelligence
Apr 7, 2026

@Aureliolo
Owner

Summary

Adds four engine intelligence features: step-level quality signals, two-layer health monitoring, best-of-K trajectory scoring, and distributed systems coordination metrics.

#697: Step-Level Quality Signals

  • StepQuality ternary enum (correct/neutral/incorrect) based on AgentProcessBench
  • StepQualityClassifier protocol + RuleBasedStepClassifier (deterministic, no LLM cost)
  • AccuracyEffortRatio model with computed accuracy and ratio fields
  • compute_accuracy_effort() with weak-model-trap warning (early termination detection)
  • Wired accuracy_effort_ratio into TaskCompletionMetrics and CostRecord
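
As a rough illustration of the step-level signals listed above, the sketch below shows one way an accuracy-effort ratio with a weak-model-trap check could be computed. Only the StepQuality labels come from this PR; the formula (accuracy divided by steps relative to expected_steps), the 0.5 early-termination cutoff, and the names EffortSummary and sketch_accuracy_effort are assumptions for illustration, not the merged implementation.

```python
from dataclasses import dataclass
from enum import Enum


class StepQuality(Enum):
    """Ternary step label, mirroring the enum named in this PR."""

    CORRECT = "correct"
    NEUTRAL = "neutral"
    INCORRECT = "incorrect"


@dataclass(frozen=True)
class EffortSummary:
    """Hypothetical result container (not the PR's AccuracyEffortRatio)."""

    accuracy: float
    effort: float
    ratio: float
    weak_model_warning: bool


def sketch_accuracy_effort(
    steps: list[StepQuality],
    expected_steps: int,
    early_stop_fraction: float = 0.5,  # assumed cutoff, not from the PR
) -> EffortSummary:
    """Compute accuracy, normalized effort, and their ratio for one task."""
    if expected_steps <= 0:
        raise ValueError("expected_steps must be positive")
    total = len(steps)
    if total == 0:
        # No steps at all is treated as the most extreme early termination.
        return EffortSummary(0.0, 0.0, 0.0, weak_model_warning=True)
    correct = sum(1 for step in steps if step is StepQuality.CORRECT)
    accuracy = correct / total
    effort = total / expected_steps  # expected_steps normalization
    ratio = accuracy / effort
    # Assumed weak-model-trap rule: a run that stops far short of the
    # expected step count can show an inflated ratio, so flag it.
    warning = effort < early_stop_fraction
    return EffortSummary(accuracy, effort, ratio, warning)
```

Under these assumptions, a single correct step against four expected steps yields a ratio of 4.0 but also sets the warning flag, which is the situation the early-termination check is meant to surface.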

#707: Two-Layer Health Monitoring Pipeline

  • EscalationTicket model with severity, cause, evidence, quality signals
  • HealthJudge (sensitive layer): emits tickets on stagnation, error+recovery, quality degradation
  • TriageFilter (conservative layer): rule-based dismiss LOW, escalate HIGH/CRITICAL, threshold MEDIUM
  • HealthMonitoringPipeline: composes judge + triage + NotificationSink delivery
  • Added HEALTH notification category
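
The conservative triage rule described above can be sketched roughly as follows. The PR does not say what the MEDIUM threshold is applied to, so the evidence_count input, the default of 2, and the names Severity and should_escalate are hypothetical.

```python
from enum import Enum


class Severity(Enum):
    """Stand-in for the PR's EscalationSeverity levels."""

    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def should_escalate(
    severity: Severity,
    evidence_count: int,
    medium_threshold: int = 2,  # assumed threshold, not from the PR
) -> bool:
    """Return True to escalate a ticket, False to dismiss it."""
    if severity is Severity.LOW:
        return False  # rule: dismiss LOW
    if severity in (Severity.HIGH, Severity.CRITICAL):
        return True  # rule: always escalate HIGH/CRITICAL
    # MEDIUM: escalate only when enough evidence has accumulated (assumption).
    return evidence_count >= medium_threshold
```

Keeping the second layer purely rule-based means the sensitive judge can over-report without every ticket reaching the notification sink.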

#705: TrajectoryScorer for HybridLoop

  • TrajectoryConfig (off by default, K=2-5, complexity-gated, budget margin)
  • TrajectoryScorer with self-consistency filter (majority-vote on fingerprints), verbalized confidence (log-space), trace length scoring
  • check_trajectory_budget() budget guard for K-candidate sampling
  • CandidateResult and TrajectoryScore models with computed joint_score
  • Wired TrajectoryConfig into HybridLoopConfig
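
A budget guard for K-candidate sampling might look like the sketch below. The K range of 2 to 5 is taken from the TrajectoryConfig bullet above; the cost model (K times a per-candidate estimate, inflated by a margin), the 10% default margin, and the function name are assumptions, not the signature of check_trajectory_budget().

```python
def budget_allows_sampling(
    remaining_budget: float,
    estimated_cost_per_candidate: float,
    k: int,
    margin: float = 0.1,  # assumed safety margin, not from the PR
) -> bool:
    """Return True when K candidates plus a margin fit in the remaining budget."""
    if not 2 <= k <= 5:
        raise ValueError("k must be between 2 and 5")
    if estimated_cost_per_candidate < 0.0 or margin < 0.0:
        raise ValueError("cost and margin must be non-negative")
    required = k * estimated_cost_per_candidate * (1.0 + margin)
    return required <= remaining_budget
```

When the guard returns False, a caller would presumably fall back to single-candidate execution rather than sampling K trajectories.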

#703: Coordination Metrics from Distributed Systems Theory

  • AmdahlCeiling: S_max = 1/(1-p), recommended team size at 90% speedup
  • StragglerGap: slowest - mean duration with cross-field validation
  • TokenSpeedupRatio: token_multiplier / latency_speedup, alert at 2.0
  • MessageOverhead: O(n^2) message growth detection
  • Extended CoordinationMetrics container from 5 to 9 metrics
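
The four formulas above can be sketched as pure functions. This is a minimal illustration, not the code in coordination_metrics.py: the function names, the reading of "team size at 90% speedup" as the smallest team reaching 90% of S_max, the gap ratio as gap divided by mean, the "greater than or equal" alert comparison, and the 0.5 message threshold are all assumptions.

```python
import math


def amdahl_ceiling(p: float) -> float:
    """Return S_max = 1 / (1 - p) for a parallelizable fraction p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def recommended_team_size(p: float, fraction: float = 0.9) -> int:
    """Smallest team size n whose Amdahl speedup reaches fraction * S_max.

    Solving 1 / ((1 - p) + p / n) >= fraction / (1 - p) for n gives
    n >= fraction * p / ((1 - fraction) * (1 - p)).
    """
    if not 0.0 <= p < 1.0 or not 0.0 < fraction < 1.0:
        raise ValueError("p must be in [0, 1) and fraction in (0, 1)")
    bound = fraction * p / ((1.0 - fraction) * (1.0 - p))
    # Round before ceil so float noise (e.g. 9.000000000000002) does not add 1.
    return max(1, math.ceil(round(bound, 9)))


def straggler_gap(durations: list[float]) -> tuple[float, float]:
    """Return (gap, gap_ratio) where gap = slowest - mean duration."""
    if not durations or min(durations) <= 0.0:
        raise ValueError("durations must be non-empty and positive")
    mean = sum(durations) / len(durations)
    gap = max(durations) - mean  # never negative: max >= mean
    return gap, gap / mean


def token_speedup_ratio(
    token_multiplier: float, latency_speedup: float, alert_at: float = 2.0
) -> tuple[float, bool]:
    """Return (ratio, alert) for token_multiplier / latency_speedup."""
    if token_multiplier <= 0.0 or latency_speedup <= 0.0:
        raise ValueError("inputs must be positive")
    ratio = token_multiplier / latency_speedup
    return ratio, ratio >= alert_at


def is_quadratic_overhead(
    messages: int, team_size: int, threshold: float = 0.5
) -> bool:
    """Flag message counts that exceed team_size^2 * threshold."""
    return messages > team_size**2 * threshold
```

For example, with p = 0.75 the ceiling is 4x and, under the assumed reading, 27 agents are needed to reach 90% of it, which shows how quickly returns diminish as the team grows.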

Scope Note

This PR implements the models, scoring logic, and metrics computation layer for all four issues. The hybrid loop integration (wiring trajectory scoring into hybrid_loop.py with asyncio.TaskGroup for K-parallel candidates) and approval gate wiring (consuming quality signals at review boundaries) are follow-up work -- the infrastructure is complete and tested, the loop integration requires careful coordination with the execution path.

Design Spec Updates Needed

After merge, update:

  • docs/design/engine.md -- add Quality Signals, Health Monitoring, Trajectory Scoring sections
  • docs/design/operations.md -- update Coordination Metrics table (5 to 9 metrics)

Test Plan

  • 163 new unit tests + 14 Hypothesis property tests (177 total new tests)
  • Full suite: 14766 passed, 0 failed
  • Pre-reviewed by 6 agents (code-reviewer, type-design-analyzer, silent-failure-hunter, docs-consistency, test-quality-reviewer, issue-resolution-verifier), 20 findings addressed

Files

  • 17 new source files across 3 new engine subpackages + 4 event modules
  • 14 new test files with conftest
  • 9 modified files (CLAUDE.md, coordination_metrics, cost_record, hybrid_models, metrics, notifications/models, test_coordination_metrics, test_events)

Closes #697
Closes #707
Closes #705
Closes #703

Add ternary step classification (correct/neutral/incorrect) with
rule-based classifier, accuracy-effort ratio computation, and
weak-model-trap warning.

- StepQuality enum, StepQualitySignal, AccuracyEffortRatio models
- StepQualityClassifier protocol + RuleBasedStepClassifier
- compute_accuracy_effort pure function with expected_steps normalization
- Wire accuracy_effort_ratio into TaskCompletionMetrics and CostRecord
- Event constants: quality.step_classified, accuracy_effort_computed, weak_model_warning
- 52 unit tests + 5 Hypothesis property tests

Closes #697

Add HealthJudge (sensitive) + TriageFilter (conservative) two-layer
health monitoring with NotificationSink delivery.

- EscalationTicket, EscalationSeverity, EscalationCause models
- HealthJudge: emits tickets on stagnation, error+recovery, quality degradation
- TriageFilter: rule-based dismiss LOW, escalate HIGH/CRITICAL, threshold MEDIUM
- HealthMonitoringPipeline: compose judge + triage + NotificationSink
- Add HEALTH category to NotificationCategory
- Event constants: health.ticket_emitted/dismissed/escalated/pipeline_error
- 37 unit tests

Closes #707

Add trajectory scoring with self-consistency filter, verbalized
confidence (VC), trace length (Len), and budget guard.

- TrajectoryConfig, CandidateResult, TrajectoryScore models
- TrajectoryScorer: score_candidates + select_best with majority-vote consistency
- check_trajectory_budget: budget guard for K-candidate sampling
- Wire TrajectoryConfig into HybridLoopConfig
- Event constants: trajectory.scoring_start/candidate_scored/best_selected/budget_guard_blocked/consistency_filtered
- 38 unit tests + 3 Hypothesis property tests

Closes #705

Add Amdahl ceiling, straggler gap, token/speedup ratio alert, and
O(n^2) message overhead detection.

- AmdahlCeiling: S_max = 1/(1-p), recommended_team_size at 90% speedup
- StragglerGap: slowest - mean duration, gap ratio
- TokenSpeedupRatio: token_multiplier / latency_speedup, alert at 2.0
- MessageOverhead: quadratic growth detection (messages vs n^2 * threshold)
- Pure computation functions for all 4 metrics
- Extend CoordinationMetrics container (5 to 9 metrics)
- Event constants: coordination_metrics domain
- 25 new unit tests + 3 Hypothesis property tests

Closes #703

- Fix compute_token_speedup_ratio: validate all 4 inputs
- Use HEALTH notification category in health pipeline mapping
- Add StragglerGap cross-field validator (slowest must exceed mean)
- Fix VC docstring: single-value, not per-turn aggregation
- Use dict lookup in select_best instead of positional index
- Replace hasattr duck-typing with isinstance + warning log
- Make AccuracyEffortRatio.accuracy a computed field
- Use NotBlankStr for evidence, reason, complexity_gate fields
- Add VC degradation logging in trajectory scorer
- Update CLAUDE.md: package structure, events, notification categories
- Add tests: StragglerGap validator, token validation, HEALTH category

Pre-reviewed by 6 agents, 20 findings addressed

Copilot AI review requested due to automatic review settings April 6, 2026 22:26
@coderabbitai
Contributor

coderabbitai bot commented Apr 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: de8262ad-3613-4b8a-a916-069f7212d998

📥 Commits

Reviewing files that changed from the base of the PR and between 32cb66c and 8f971d2.

📒 Files selected for processing (1)
  • src/synthorg/budget/__init__.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (go)
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses)—ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.

Files:

  • src/synthorg/budget/__init__.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/budget/__init__.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/budget/__init__.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Parallel tier (Haiku-class) include 14 reviewers: async-concurrency-reviewer, comment-analyzer, design-token-audit, docs-consistency, go-conventions-enforcer, go-security-reviewer, infra-reviewer, issue-resolution-verifier, persistence-reviewer, pr-test-analyzer, security-reviewer, silent-failure-hunter, test-quality-reviewer, and type-design-analyzer
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/budget/__init__.py
🔇 Additional comments (1)
src/synthorg/budget/__init__.py (1)

24-35: LGTM!

The four new coordination metric classes (AmdahlCeiling, MessageOverhead, StragglerGap, TokenSpeedupRatio) are properly imported and exported in __all__ in correct alphabetical order. The previous review concern about missing exports has been addressed.

Also applies to: 138-138, 182-182, 218-218, 223-223


Walkthrough

Adds step-level quality models and utilities (StepQuality enum, StepQualitySignal, AccuracyEffortRatio, RuleBasedStepClassifier, compute_accuracy_effort), a two-layer health monitoring stack (EscalationTicket, EscalationCause/Severity, HealthJudge, TriageFilter, HealthMonitoringPipeline), trajectory scoring and budget guard (TrajectoryConfig, CandidateResult, TrajectoryScore, TrajectoryScorer, check_trajectory_budget), expanded coordination metrics and compute functions (AmdahlCeiling, StragglerGap, TokenSpeedupRatio, MessageOverhead, compute_* helpers), new observability event constants across multiple domains, small schema additions to existing budget/metrics models, and many accompanying unit and property tests.

Suggested labels

autorelease: tagged

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 27.49% which is insufficient. The required threshold is 40.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title directly and comprehensively captures the main changes: four engine intelligence features (quality signals, health monitoring, trajectory scoring, coordination metrics) matching the PR's core additions. |
| Description check | ✅ Passed | The description clearly relates to the changeset, detailing all four major features (#697, #707, #705, #703) with implementation notes and scope clarifications matching the file changes. |
| Linked Issues check | ✅ Passed | The PR implements all primary coding objectives from linked issues: step-quality signals (#697), health monitoring pipeline (#707), trajectory scoring (#705), and coordination metrics (#703). Models, logic, and metrics layers are complete; deferred execution wiring is noted. |
| Out of Scope Changes check | ✅ Passed | All changes align with linked issue objectives. Minor scope-boundary updates (CLAUDE.md documentation, observability constants) and related field wiring (TaskCompletionMetrics, CostRecord, HybridLoopConfig) are justified and in-scope. |

@github-actions
Contributor

github-actions bot commented Apr 6, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 8f971d2.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a comprehensive health monitoring and trajectory scoring system for the engine, alongside expanded coordination metrics and step-level quality signals. Key additions include a two-layer health pipeline (Judge and Triage), best-of-K trajectory selection based on verbalized confidence and trace length, and new empirical metrics like Amdahl's Law ceiling and straggler gap. Feedback highlights a critical syntax error in exception handling within the health pipeline and a logic issue in the trajectory consistency filter where plurality is used instead of a true majority. Additionally, the joint scoring mechanism for trajectories may be imbalanced due to the significantly different magnitudes of confidence scores versus token counts.

Comment on lines +100 to +101
except MemoryError, RecursionError:
    raise

critical

The except MemoryError, RecursionError: syntax is invalid in Python 3 and will raise a SyntaxError at import time. In Python 3, multiple exceptions must be enclosed in a tuple: except (MemoryError, RecursionError):. Additionally, the current implementation fails to catch RecursionError specifically (it would be caught and swallowed by the subsequent except Exception: block) and would incorrectly assign the MemoryError instance to the name RecursionError if the syntax were valid.

Suggested change
- except MemoryError, RecursionError:
-     raise
+ except (MemoryError, RecursionError):
+     raise
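
For context, the review configuration quoted earlier in this thread states that the project targets Python 3.14+, where PEP 758 permits the unparenthesized `except A, B:` form when no `as` clause is used, so the SyntaxError described in this comment applies only to earlier Python 3 versions. The sketch below uses the parenthesized form, which behaves the same way and runs on any Python 3 interpreter; run_guarded, blow_up, and the returned string are hypothetical names for illustration, not code from the health pipeline.

```python
def run_guarded(fn):
    """Run fn, re-raising system-level errors and handling everything else."""
    try:
        return fn()
    except (MemoryError, RecursionError):
        # System errors propagate untouched. On Python 3.14+ (PEP 758) this
        # clause may also be written: except MemoryError, RecursionError:
        raise
    except Exception as exc:
        # RecursionError subclasses Exception, so without the clause above
        # it would be handled here instead of propagating.
        return f"handled: {type(exc).__name__}"


def blow_up():
    """Simulate a system-level failure without actually recursing."""
    raise RecursionError("simulated")
```

Either spelling catches both exception types; the ordering of the clauses, not the parentheses, is what keeps system errors from being swallowed by the broad handler.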

Comment on lines +167 to +175
majority_fp, _ = counter.most_common(1)[0]

result: dict[int, bool] = {}
filtered_count = 0
for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
    is_consistent = fps == majority_fp
    result[candidate.candidate_index] = is_consistent
    if not is_consistent:
        filtered_count += 1

high

The current implementation of the self-consistency filter uses a plurality vote with arbitrary tie-breaking, which does not align with the stated "majority-vote" design. In cases where no candidate has more than 50% agreement (e.g., a 1-vs-1 tie or all unique fingerprints), the code arbitrarily marks the first candidate as consistent and others as inconsistent. This introduces bias and potentially filters out valid trajectories. The filter should only mark candidates as inconsistent if a clear majority exists.

Suggested change
- majority_fp, _ = counter.most_common(1)[0]
- result: dict[int, bool] = {}
- filtered_count = 0
- for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
-     is_consistent = fps == majority_fp
-     result[candidate.candidate_index] = is_consistent
-     if not is_consistent:
-         filtered_count += 1
+ majority_fp, count = counter.most_common(1)[0]
+ has_majority = count > len(candidates) / 2
+ result: dict[int, bool] = {}
+ filtered_count = 0
+ for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
+     is_consistent = (fps == majority_fp) if has_majority else True
+     result[candidate.candidate_index] = is_consistent
+     if not is_consistent:
+         filtered_count += 1
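
The strict-majority behaviour proposed in this suggestion can be exercised in isolation with a sketch like the following. It is a standalone simplification: consistency_flags is a hypothetical helper that works on plain fingerprint tuples and returns a list of flags, rather than the scorer's candidate objects and index-keyed dict.

```python
from collections import Counter


def consistency_flags(fingerprint_sets: list[tuple[str, ...]]) -> list[bool]:
    """Flag each candidate as consistent with the strict-majority fingerprint.

    When no fingerprint set is shared by more than half of the candidates,
    filtering is disabled and every candidate is treated as consistent.
    """
    if not fingerprint_sets:
        return []
    counter = Counter(fingerprint_sets)
    majority_fp, count = counter.most_common(1)[0]
    has_majority = count > len(fingerprint_sets) / 2
    if not has_majority:
        # Tie or all-unique case: no candidate is filtered out.
        return [True] * len(fingerprint_sets)
    return [fps == majority_fp for fps in fingerprint_sets]
```

With K=2 and two different fingerprint sets, both candidates stay eligible, so selection falls through to the scores instead of to the arbitrary ordering of most_common().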

@property
def joint_score(self) -> float:
    """Combined score: VC + Len (least-negative wins)."""
    return self.vc_score + self.len_score

medium

The joint_score calculation combines vc_score (log-space confidence, typically range -4.6 to 0) and len_score (negative token count, typically range -100 to -2000+). Because of the vast difference in magnitude, the len_score will almost always dominate the selection, making the verbalized confidence signal effectively irrelevant unless the trace lengths are nearly identical. Consider scaling the components (e.g., multiplying vc_score by a factor or normalizing len_score) to ensure both signals meaningfully contribute to the joint score.
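
The magnitude mismatch described in this comment can be reproduced with a small sketch. The unscaled function follows the description here (log-space confidence plus negative token count); the scaled variant, which divides length by a max_tokens bound, is one hypothetical normalization and not something the PR implements.

```python
import math


def joint_score(confidence: float, tokens: int) -> float:
    """Unscaled joint score: log-confidence plus negative trace length."""
    return math.log(confidence) + (-tokens)


def scaled_joint_score(confidence: float, tokens: int, max_tokens: int) -> float:
    """Hypothetical variant: length is normalized to [-1, 0] before summing."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return math.log(confidence) - tokens / max_tokens


# Candidate A: very confident, slightly longer. Candidate B: unsure, shorter.
a_unscaled = joint_score(0.99, 500)
b_unscaled = joint_score(0.10, 490)
a_scaled = scaled_joint_score(0.99, 500, max_tokens=500)
b_scaled = scaled_joint_score(0.10, 490, max_tokens=500)
```

In the unscaled form the 10-token difference outweighs the gap between 99% and 10% confidence, so B wins; once length is normalized, A wins. Whether that is the desired trade-off depends on how much weight trace length should carry.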

Contributor

Copilot AI left a comment

Pull request overview

Adds an “engine intelligence” computation layer: step-level quality signals + accuracy/effort metric, a two-layer health monitoring pipeline (judge + triage + notification), best-of-K trajectory scoring utilities, and expanded coordination metrics from distributed-systems theory.

Changes:

  • Introduces new engine/quality, engine/health, and engine/trajectory subpackages (models + pure computations + scoring/triage logic) plus new observability event domains.
  • Extends budget coordination metrics from 5 → 9 metrics (Amdahl ceiling, straggler gap, token/speedup ratio, message overhead) with accompanying tests.
  • Wires new fields/config surfaces into existing models (TaskCompletionMetrics, HybridLoopConfig, NotificationCategory, CostRecord) and expands unit/property tests accordingly.

Reviewed changes

Copilot reviewed 39 out of 42 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| tests/unit/observability/test_events.py | Updates expected observability event domains to include new modules. |
| tests/unit/engine/trajectory/test_scorer.py | Unit tests for trajectory scoring and selection behavior. |
| tests/unit/engine/trajectory/test_scorer_properties.py | Property-based tests for deterministic and bounded trajectory scoring. |
| tests/unit/engine/trajectory/test_models.py | Validation tests for trajectory config/result/score models. |
| tests/unit/engine/trajectory/test_budget_guard.py | Unit tests for K-candidate budget guard logic. |
| tests/unit/engine/trajectory/conftest.py | Shared fixtures for trajectory tests (minimal_context). |
| tests/unit/engine/trajectory/__init__.py | Marks the trajectory test package. |
| tests/unit/engine/quality/test_models.py | Validation tests for step-quality and accuracy/effort models. |
| tests/unit/engine/quality/test_effort.py | Unit tests for compute_accuracy_effort(). |
| tests/unit/engine/quality/test_effort_properties.py | Property-based tests for accuracy/effort invariants. |
| tests/unit/engine/quality/test_classifier.py | Unit tests for the rule-based step quality classifier. |
| tests/unit/engine/quality/__init__.py | Marks the quality test package. |
| tests/unit/engine/health/test_triage.py | Unit tests for conservative triage rules. |
| tests/unit/engine/health/test_pipeline.py | End-to-end tests for judge→triage→notification pipeline behavior. |
| tests/unit/engine/health/test_models.py | Validation tests for escalation ticket models/enums. |
| tests/unit/engine/health/test_judge.py | Unit tests for sensitive-layer ticket emission rules. |
| tests/unit/engine/health/__init__.py | Marks the health test package. |
| tests/unit/budget/test_coordination_metrics.py | Extends coordination metrics tests to cover new metrics. |
| tests/unit/budget/test_coordination_metrics_properties.py | Property-based tests for new coordination metrics. |
| src/synthorg/observability/events/trajectory.py | Adds trajectory-scoring event constants. |
| src/synthorg/observability/events/quality.py | Adds quality-signal event constants. |
| src/synthorg/observability/events/health.py | Adds health-monitoring event constants. |
| src/synthorg/observability/events/coordination_metrics.py | Adds coordination-metrics event constants. |
| src/synthorg/notifications/models.py | Adds HEALTH notification category. |
| src/synthorg/engine/trajectory/scorer.py | Implements self-consistency + VC + length scoring and best-candidate selection. |
| src/synthorg/engine/trajectory/models.py | Adds frozen Pydantic models for trajectory config/results/scores. |
| src/synthorg/engine/trajectory/budget_guard.py | Adds budget guard for best-of-K candidate sampling. |
| src/synthorg/engine/trajectory/__init__.py | Public exports for trajectory scoring components. |
| src/synthorg/engine/quality/models.py | Adds step-quality and accuracy/effort metric models. |
| src/synthorg/engine/quality/effort.py | Computes accuracy/effort ratio and emits observability events. |
| src/synthorg/engine/quality/classifier.py | Adds classifier protocol + deterministic rule-based implementation. |
| src/synthorg/engine/quality/__init__.py | Public exports for quality components. |
| src/synthorg/engine/metrics.py | Adds accuracy_effort_ratio to task completion metrics and extracts from metadata. |
| src/synthorg/engine/hybrid_models.py | Adds optional trajectory scoring config to HybridLoopConfig. |
| src/synthorg/engine/health/triage.py | Implements conservative triage filter rules + event logging. |
| src/synthorg/engine/health/pipeline.py | Composes judge + triage + notification sink with best-effort error handling. |
| src/synthorg/engine/health/models.py | Adds escalation ticket models/enums (frozen, metadata snapshotting). |
| src/synthorg/engine/health/judge.py | Implements sensitive health judge ticket emission logic. |
| src/synthorg/engine/health/__init__.py | Public exports for health monitoring components. |
| src/synthorg/budget/cost_record.py | Adds accuracy_effort_ratio field to per-call cost records. |
| src/synthorg/budget/coordination_metrics.py | Extends coordination metrics models and pure compute functions (5 → 9). |
| CLAUDE.md | Updates component inventory and logging/event constant guidance to include new domains. |

Comment on lines +165 to +173
# Find majority fingerprint set.
counter: Counter[tuple[str, ...]] = Counter(fingerprint_sets)
majority_fp, _ = counter.most_common(1)[0]

result: dict[int, bool] = {}
filtered_count = 0
for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
    is_consistent = fps == majority_fp
    result[candidate.candidate_index] = is_consistent

Copilot AI Apr 6, 2026

The self-consistency filter treats the most-common fingerprint set as a "majority" even when there is no strict majority (e.g., K=2 with two different fingerprint sets, or a tie across multiple sets). In those tie cases, this will arbitrarily mark only the first fingerprint set as consistent and can force selection of a worse candidate because select_best() prefers consistent candidates. Consider detecting ties (no count > K/2, or multiple sets share the top count) and disabling filtering / marking all candidates consistent in that case.

Comment on lines +120 to +124
logger.warning(
    "execution.metrics.accuracy_effort_unexpected_type",
    type=type(ae_data).__name__,
    task_id=result.task_id,
)

Copilot AI Apr 6, 2026

This log call uses a raw string event name ("execution.metrics.accuracy_effort_unexpected_type"). Repo logging conventions require importing event constants from synthorg.observability.events.* (see CLAUDE.md logging section). Please add an appropriate constant (e.g., under observability/events/execution.py or a dedicated events module) and use it here.

Comment on lines +50 to +56
accuracy_effort_ratio: float | None = Field(
    default=None,
    ge=0.0,
    description=(
        "Accuracy-effort ratio for the task this call belongs to "
        "(populated at task completion when quality signals are available)"
    ),

Copilot AI Apr 6, 2026

CostRecord is an immutable per-API-call record, but the new accuracy_effort_ratio field is documented as "populated at task completion". As-is, there is no code path constructing CostRecord with this field set, and the append-only/frozen design makes post-hoc backfilling difficult. Either implement a concrete mechanism to populate it (e.g., include the ratio at record creation time via a task-level cache, or store this metric in a task-level summary model instead of per-call records), or update the field docs to reflect the actual behavior.

@codecov

codecov bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 91.97861% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.31%. Comparing base (c845d22) to head (8f971d2).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/synthorg/budget/coordination_metrics.py | 91.48% | 6 Missing and 2 partials ⚠️ |
| src/synthorg/engine/health/triage.py | 73.33% | 6 Missing and 2 partials ⚠️ |
| src/synthorg/engine/trajectory/budget_guard.py | 68.42% | 4 Missing and 2 partials ⚠️ |
| src/synthorg/engine/health/judge.py | 75.00% | 3 Missing and 1 partial ⚠️ |
| src/synthorg/engine/metrics.py | 63.63% | 3 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1099      +/-   ##
==========================================
+ Coverage   89.29%   89.31%   +0.02%     
==========================================
  Files         770      787      +17     
  Lines       45719    46091     +372     
  Branches     4597     4624      +27     
==========================================
+ Hits        40826    41168     +342     
- Misses       4048     4070      +22     
- Partials      845      853       +8     

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 14

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 583-585: The guard clauses that raise ValueError for invalid
inputs (e.g., the check on agent_durations that raises
ValueError("agent_durations must not be empty") and the similar guards around
lines 620-631) must log the failure first; update those checks to call the
module/class logger (e.g., logger or self.logger as used elsewhere in this
module) with WARNING or ERROR including contextual details (the variable name,
its value/len, and any relevant ids) immediately before raising the ValueError
so the runtime telemetry captures the cause and context.

In `@src/synthorg/budget/cost_record.py`:
- Around line 50-57: Add the new accuracy_effort_ratio field to the class
docstring Attributes section so the public API is fully documented: update the
class docstring (the Attributes block in the class containing
accuracy_effort_ratio) to include a line describing accuracy_effort_ratio: its
type (float | None), meaning ("Accuracy-effort ratio for the task this call
belongs to"), when it is populated ("populated at task completion when quality
signals are available"), and any constraints (>= 0.0). Ensure the new entry is
placed after call_category to match the fields ordering.

In `@src/synthorg/engine/health/judge.py`:
- Around line 37-43: Validate the quality_degradation_threshold in
Judge.__init__: when the quality_degradation_threshold argument (and thus
self._quality_threshold) is non-positive (<= 0) either raise a ValueError with a
clear message or coerce it to a sane minimum (e.g., 1) so the downstream
escalation check that uses self._quality_threshold (the escalation check around
line ~101) does not produce persistent false positives; update the __init__ to
perform this guard against non-positive values and document the behavior in the
constructor's docstring or error message.
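
A minimal sketch of the fail-fast variant of this guard. The class name `HealthJudgeSketch`, the integer type, and the default of 3 are assumptions for illustration, and the project's log-before-raise call is omitted for brevity.

```python
class HealthJudgeSketch:
    """Illustrative stand-in for the judge's constructor guard."""

    def __init__(self, quality_degradation_threshold: int = 3) -> None:
        # Reject non-positive thresholds so the escalation check
        # cannot fire on every step.
        if quality_degradation_threshold <= 0:
            msg = (
                "quality_degradation_threshold must be > 0, "
                f"got {quality_degradation_threshold}"
            )
            raise ValueError(msg)
        self._quality_threshold = quality_degradation_threshold
```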

In `@src/synthorg/engine/health/pipeline.py`:
- Around line 91-109: The current process() catch-all swallows exceptions and
returns None, losing "escalated ticket" state; change flow to call and assign
the result = await self._process_inner(...), then only catch and log non-fatal
exceptions without overwriting or dropping an already-emitted escalation: if an
exception occurs during notification delivery, log HEALTH_PIPELINE_ERROR (as
done) but either re-raise the exception or return the previously obtained result
when result indicates an escalated ticket (use whatever flag/property on result
denotes escalation), ensuring MemoryError/RecursionError still propagate and
that process() does not return None for cases where a ticket was already
emitted.

In `@src/synthorg/engine/health/triage.py`:
- Around line 76-83: Change the dismissal log calls from DEBUG to INFO: replace
logger.debug(...) that emits HEALTH_TICKET_DISMISSED with logger.info(...) and
keep the same structured fields (ticket_id=ticket.id,
severity=ticket.severity.value, cause=ticket.cause.value, reason=...) so
dismissal state transitions are logged at INFO; apply the same change to the
other dismissal logging block around the HEALTH_TICKET_DISMISSED usage (the
similar logger.debug call near the 101-109 region) so all dismissal transitions
follow the “state transitions must log at INFO” guideline.
- Around line 41-48: The constructor for the triage class accepts
stall_duration_threshold and steps_threshold without validation; add boundary
checks in __init__ to validate that stall_duration_threshold is a non-negative
number (float/int) and steps_threshold is a non-negative integer (or at least >=
0), and raise a ValueError with a clear message if either is invalid so the
class fails fast; update assignment to self._stall_threshold and
self._steps_threshold only after these checks and include the parameter names
(stall_duration_threshold, steps_threshold) and attribute names
(_stall_threshold, _steps_threshold) in the error text for easier debugging.

In `@src/synthorg/engine/metrics.py`:
- Around line 70-76: The class docstring's Attributes section is missing
documentation for the new field accuracy_effort_ratio; update the class
docstring (the Google-style "Attributes" block for the class that defines
accuracy_effort_ratio) to include a line documenting accuracy_effort_ratio: type
float | None and a brief description like "Accuracy-effort ratio from step-level
quality signals (None when quality signals are unavailable)", matching the Field
description and style of the other attributes.
- Around line 120-124: Replace the hardcoded event string in the logger.warning
call in metrics.py with a constant from synthorg.observability.events: add a new
constant (e.g., EXECUTION_METRICS_ACCURACY_EFFORT_UNEXPECTED_TYPE or
EXECUTION_METRICS_UNEXPECTED_TYPE) in an appropriate events module (like
events/execution.py or events/metrics.py) inside synthorg.observability.events,
import that constant into src/synthorg/engine/metrics.py, and use it in the
logger.warning(...) call (replacing the literal
"execution.metrics.accuracy_effort_unexpected_type") while keeping the existing
kwargs (type=..., task_id=...).

In `@src/synthorg/engine/quality/effort.py`:
- Around line 43-45: The empty-input guard that checks "if not signals" and
raises ValueError currently lacks logging; before raising the ValueError, emit a
warning or error log (e.g., logger.warning or logger.error) including the same
message ("Cannot compute accuracy-effort ratio from empty signals") and
contextual data (the variable name signals, caller context or any request id
available) so the error path is recorded; update the function containing the "if
not signals" branch to log at WARNING/ERROR with context and then raise the
ValueError.

In `@src/synthorg/engine/trajectory/budget_guard.py`:
- Around line 16-42: check_trajectory_budget lacks validation for k and margin;
ensure k is an integer >= 1 and margin is within [0.0, 1.0) (reject margin < 0.0
or margin >= 1.0) before computing required/available, and if invalid call the
same TRAJECTORY_BUDGET_GUARD_BLOCKED logger path (include remaining_budget,
estimated_step_cost, k, margin and a reason like "invalid k" or "invalid
margin") and return False; update the validation at the start of
check_trajectory_budget to enforce these constraints and short-circuit on
failure.
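
The validation described above could be sketched as below. The budget formula (`required = k * estimated_step_cost`, `available = remaining_budget * (1 - margin)`) is an assumption, since the real computation is not shown here, and the `TRAJECTORY_BUDGET_GUARD_BLOCKED` logging is left out so the example stays self-contained.

```python
def check_trajectory_budget_sketch(
    remaining_budget: float,
    estimated_step_cost: float,
    k: int,
    margin: float,
) -> bool:
    """Return True when K candidates fit in the budget after the margin."""
    # Invalid k: sampling fewer than one candidate makes no sense.
    if k < 1:
        return False
    # Invalid margin: must lie in [0.0, 1.0); NaN also fails this test.
    if not 0.0 <= margin < 1.0:
        return False
    required = k * estimated_step_cost  # assumed cost model
    available = remaining_budget * (1.0 - margin)  # assumed margin model
    return required <= available
```

Short-circuiting on invalid input means a misconfigured `k` or `margin` blocks sampling rather than silently passing the guard.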

In `@src/synthorg/engine/trajectory/scorer.py`:
- Around line 61-63: The empty-candidate branch in scorer.py (the "if not
candidates" check) raises ValueError without logging; before raising, log the
error at WARNING or ERROR with context (e.g., function/method name, candidates
length/type, any relevant IDs) using the module or instance logger (e.g., logger
or self.logger) and then raise the ValueError; if no logger exists, obtain one
via logging.getLogger(__name__) to satisfy the guideline that all error paths
must log before raising.
- Around line 165-173: The current majority-vote uses Counter.most_common(1)
which can pick an arbitrary tie and bias select_best(); change the logic in
scorer.py around the fingerprint majority calculation to call
counter.most_common(2), compute whether the top count > len(candidates) // 2
(strict >50% of candidates), and only mark candidates whose fps == majority_fp
as consistent when that strict majority exists—otherwise set all entries in
result (keys are candidate.candidate_index) to False to indicate no clear
majority; additionally, add a logger.warning(...) immediately before the
existing raise ValueError in the empty-candidate check (the check referenced at
line 61) so the error path is logged.

In `@tests/unit/budget/test_coordination_metrics_properties.py`:
- Around line 25-26: Remove the explicit Hypothesis decorators that hardcode
max_examples and derandomize so the test-suite can honor the HYPOTHESIS_PROFILE
from tests/conftest.py; specifically, delete the `@settings`(max_examples=10,
derandomize=True) decorators from the property tests (e.g.,
test_max_speedup_monotonically_increases and the three other property-test
functions that currently have those decorators) so they use the configured
profile/default settings instead of forcing 10 examples and deterministic
seeding.

In `@tests/unit/engine/trajectory/test_scorer_properties.py`:
- Around line 84-131: These property tests hardcode Hypothesis settings
(max_examples=10, derandomize=True) which override the global profile; remove
the explicit settings arguments from the `@settings` decorator in
test_selection_is_deterministic, test_joint_score_ordering_consistent (and the
other property tests like test_len_score_always_non_positive) so they rely on
the project-wide Hypothesis profile configured in tests/conftest.py; keep the
existing `@given` and strategy definitions and the use of
TrajectoryScorer.select_best and TrajectoryScorer.score_candidates unchanged.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1c4a8e75-cbc4-489a-b796-8f47717da610

📥 Commits

Reviewing files that changed from the base of the PR and between c845d22 and df80fc1.

📒 Files selected for processing (42)
  • CLAUDE.md
  • src/synthorg/budget/coordination_metrics.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/health/models.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/engine/quality/__init__.py
  • src/synthorg/engine/quality/classifier.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/trajectory/__init__.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/notifications/models.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/trajectory.py
  • tests/unit/budget/test_coordination_metrics.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/health/__init__.py
  • tests/unit/engine/health/test_judge.py
  • tests/unit/engine/health/test_models.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/engine/health/test_triage.py
  • tests/unit/engine/quality/__init__.py
  • tests/unit/engine/quality/test_classifier.py
  • tests/unit/engine/quality/test_effort.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/quality/test_models.py
  • tests/unit/engine/trajectory/__init__.py
  • tests/unit/engine/trajectory/conftest.py
  • tests/unit/engine/trajectory/test_budget_guard.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/trajectory/test_scorer.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
  • tests/unit/observability/test_events.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Agent
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses)—ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.

Files:

  • src/synthorg/notifications/models.py
  • src/synthorg/engine/trajectory/__init__.py
  • tests/unit/observability/test_events.py
  • src/synthorg/engine/metrics.py
  • tests/unit/engine/trajectory/test_budget_guard.py
  • src/synthorg/budget/cost_record.py
  • tests/unit/engine/trajectory/conftest.py
  • src/synthorg/observability/events/quality.py
  • tests/unit/engine/health/test_triage.py
  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/quality/__init__.py
  • tests/unit/engine/health/test_models.py
  • src/synthorg/observability/events/health.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • src/synthorg/observability/events/coordination_metrics.py
  • tests/unit/engine/trajectory/test_models.py
  • src/synthorg/engine/hybrid_models.py
  • tests/unit/engine/quality/test_effort.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/budget/test_coordination_metrics.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/observability/events/trajectory.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
  • tests/unit/engine/quality/test_classifier.py
  • src/synthorg/engine/quality/effort.py
  • tests/unit/engine/quality/test_models.py
  • src/synthorg/engine/health/triage.py
  • tests/unit/engine/health/test_judge.py
  • src/synthorg/engine/trajectory/scorer.py
  • tests/unit/engine/trajectory/test_scorer.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/quality/classifier.py
  • src/synthorg/engine/health/models.py
  • src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/notifications/models.py
  • src/synthorg/engine/trajectory/__init__.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/quality/__init__.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/observability/events/trajectory.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/quality/classifier.py
  • src/synthorg/engine/health/models.py
  • src/synthorg/budget/coordination_metrics.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/notifications/models.py
  • src/synthorg/engine/trajectory/__init__.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/quality/__init__.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/engine/hybrid_models.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/observability/events/trajectory.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/quality/classifier.py
  • src/synthorg/engine/health/models.py
  • src/synthorg/budget/coordination_metrics.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async: asyncio_mode = "auto"—no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml—do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. (not real vendor names).
Property-based testing: Python uses Hypothesis (@given + @settings).
Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/observability/test_events.py
  • tests/unit/engine/trajectory/test_budget_guard.py
  • tests/unit/engine/trajectory/conftest.py
  • tests/unit/engine/health/test_triage.py
  • tests/unit/engine/health/test_models.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/quality/test_effort.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/budget/test_coordination_metrics.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
  • tests/unit/engine/quality/test_classifier.py
  • tests/unit/engine/quality/test_models.py
  • tests/unit/engine/health/test_judge.py
  • tests/unit/engine/trajectory/test_scorer.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/observability/test_events.py
  • tests/unit/engine/trajectory/test_budget_guard.py
  • tests/unit/engine/trajectory/conftest.py
  • tests/unit/engine/health/test_triage.py
  • tests/unit/engine/health/test_models.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/quality/test_effort.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/budget/test_coordination_metrics.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
  • tests/unit/engine/quality/test_classifier.py
  • tests/unit/engine/quality/test_models.py
  • tests/unit/engine/health/test_judge.py
  • tests/unit/engine/trajectory/test_scorer.py
🧠 Learnings (52)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • src/synthorg/engine/trajectory/__init__.py
  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/quality/__init__.py
  • src/synthorg/engine/trajectory/models.py
  • CLAUDE.md
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • tests/unit/observability/test_events.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • tests/unit/observability/test_events.py
  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • tests/unit/engine/trajectory/test_budget_guard.py
  • src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • tests/unit/engine/trajectory/test_budget_guard.py
  • src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/observability/events/quality.py
  • src/synthorg/observability/events/trajectory.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/health/pipeline.py
  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)

Applied to files:

  • src/synthorg/engine/health/__init__.py
  • src/synthorg/engine/health/triage.py
  • CLAUDE.md
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/health.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Property-based testing: Python uses Hypothesis (given + settings).

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/quality/test_effort_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • tests/unit/engine/trajectory/test_models.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/health/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • tests/unit/engine/trajectory/test_models.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/health/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/budget/test_coordination_metrics.py
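
A minimal sketch of the clock-mocking approach named in this learning, using only the standard library. It covers `time.monotonic()` only, not `asyncio.sleep()`. `wait_exceeded` and `FakeClock` are hypothetical names for illustration and are not taken from the repository.

```python
import time
from unittest.mock import patch


def wait_exceeded(start: float, timeout: float) -> bool:
    """Hypothetical helper: True once `timeout` seconds have passed since `start`."""
    return time.monotonic() - start > timeout


class FakeClock:
    """Callable stand-in for time.monotonic() that only moves when told to."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def check_timeout_is_deterministic() -> bool:
    clock = FakeClock(now=100.0)
    # Replace the real clock for the duration of the block only.
    with patch("time.monotonic", clock):
        start = time.monotonic()
        before = wait_exceeded(start, timeout=5.0)
        clock.advance(5.5)  # simulate time passing without sleeping
        after = wait_exceeded(start, timeout=5.0)
    return before is False and after is True
```

Because the test controls the clock directly, the outcome does not depend on machine load, so there is no timing margin to widen.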
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: CI runs 10 deterministic examples per property test (derandomize=True—same inputs every run, no flakes).

Applied to files:

  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T14:28:28.895Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T14:28:28.895Z
Learning: Applies to src/synthorg/**/*.py : Maintain 80% minimum test coverage (enforced in CI)

Applied to files:

  • tests/unit/engine/quality/test_models.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
  • src/synthorg/engine/health/models.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
📚 Learning: 2026-04-01T09:37:49.451Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:37:49.451Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/quality/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/health/models.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • src/synthorg/engine/health/models.py
  • CLAUDE.md
📚 Learning: 2026-04-06T16:35:12.934Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: All project conventions, commands, and standards are defined in CLAUDE.md - refer to it for project structure, package layout, code conventions, quick commands, git workflow, testing standards, design specifications, logging, resilience, and security patterns

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)

Applied to files:

  • CLAUDE.md

Comment on lines +50 to +57
    accuracy_effort_ratio: float | None = Field(
        default=None,
        ge=0.0,
        description=(
            "Accuracy-effort ratio for the task this call belongs to "
            "(populated at task completion when quality signals are available)"
        ),
    )

🛠️ Refactor suggestion | 🟠 Major

Update the class docstring to document the new field.

The accuracy_effort_ratio field is not documented in the class docstring's Attributes section (lines 23-34). Google-style docstrings require documenting all public attributes.

📝 Proposed docstring update

Add to the Attributes section (after call_category):

         call_category: Optional LLM call category (productive,
             coordination, system, embedding).
+        accuracy_effort_ratio: Accuracy-effort ratio for the task
+            (populated at task completion when quality signals are
+            available, ``None`` otherwise).
     """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/cost_record.py` around lines 50 - 57, Add the new
accuracy_effort_ratio field to the class docstring Attributes section so the
public API is fully documented: update the class docstring (the Attributes block
in the class containing accuracy_effort_ratio) to include a line describing
accuracy_effort_ratio: its type (float | None), meaning ("Accuracy-effort ratio
for the task this call belongs to"), when it is populated ("populated at task
completion when quality signals are available"), and any constraints (>= 0.0).
Ensure the new entry is placed after call_category to match the fields ordering.

Comment on lines +165 to +173
        # Find majority fingerprint set.
        counter: Counter[tuple[str, ...]] = Counter(fingerprint_sets)
        majority_fp, _ = counter.most_common(1)[0]

        result: dict[int, bool] = {}
        filtered_count = 0
        for candidate, fps in zip(candidates, fingerprint_sets, strict=True):
            is_consistent = fps == majority_fp
            result[candidate.candidate_index] = is_consistent

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify implementation pattern in scorer and inspect tie-sensitive logic.
rg -n -C3 'most_common\(1\)\[0\]|most_common\(2\)|is_consistent = fps == majority_fp|consistent_scores = \[s for s in scores if s.consistent\]' src/synthorg/engine/trajectory/scorer.py

# Demonstrate Counter tie behavior used by this pattern.
python - <<'PY'
from collections import Counter
votes = [("fp_a",), ("fp_b",)]  # tie
c = Counter(votes)
winner, count = c.most_common(1)[0]
print("most_common(1):", winner, count)
print("flags:", [v == winner for v in votes])
PY

Repository: Aureliolo/synthorg

Length of output: 896


🏁 Script executed:

cat -n src/synthorg/engine/trajectory/scorer.py | head -80

Repository: Aureliolo/synthorg

Length of output: 3038


🏁 Script executed:

# Get lines around _check_consistency to see full function
sed -n '155,190p' src/synthorg/engine/trajectory/scorer.py | cat -n

Repository: Aureliolo/synthorg

Length of output: 1515


🏁 Script executed:

# Check score_candidates for empty-input error path
rg -n -A10 'def score_candidates' src/synthorg/engine/trajectory/scorer.py | head -30

Repository: Aureliolo/synthorg

Length of output: 410


🏁 Script executed:

# Verify if there's any logging in the file and check the import structure
rg -n 'from synthorg.observability|import logging|logger' src/synthorg/engine/trajectory/scorer.py

Repository: Aureliolo/synthorg

Length of output: 359


Tie votes are treated as arbitrary majority, biasing candidate selection.

When fingerprint sets tie, most_common(1) returns whichever tied set was encountered first, so candidates matching it are marked "consistent" and the rest are not. This contradicts the docstring's promise of a "Majority-vote" and makes select_best() at line 123 depend on input order rather than on a real majority.

Additionally, the empty-candidate check at line 61 raises ValueError without logging, violating the error-path logging requirement.

Fixes needed:

  1. Detect ties (most_common(2)) and apply strict majority threshold (> 50% of candidates) before marking consistency. If no clear majority, mark all candidates inconsistent.
  2. Add logger.warning() before the raise ValueError at line 61.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/trajectory/scorer.py` around lines 165 - 173, The current
majority-vote uses Counter.most_common(1) which can pick an arbitrary tie and
bias select_best(); change the logic in scorer.py around the fingerprint
majority calculation to call counter.most_common(2), compute whether the top
count > len(candidates) // 2 (strict >50% of candidates), and only mark
candidates whose fps == majority_fp as consistent when that strict majority
exists—otherwise set all entries in result (keys are candidate.candidate_index)
to False to indicate no clear majority; additionally, add a logger.warning(...)
immediately before the existing raise ValueError in the empty-candidate check
(the check referenced at line 61) so the error path is logged.
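
The strict-majority rule described above can be sketched as follows. This is an illustrative standalone function, not the repository's `_check_consistency`: the name `check_consistency` and the list-of-flags return type are assumptions. Under a strict majority (more than half of all candidates) two fingerprint sets cannot both win, so checking the top count already covers the tie case that `most_common(2)` would detect.

```python
from collections import Counter
from collections.abc import Sequence


def check_consistency(
    fingerprint_sets: Sequence[tuple[str, ...]],
) -> list[bool]:
    """Flag candidates as consistent only when a strict majority exists."""
    if not fingerprint_sets:
        return []
    counter = Counter(fingerprint_sets)
    majority_fp, top_count = counter.most_common(1)[0]
    # Strict majority: more than half of all candidates must agree.
    # For integer counts, top_count > n // 2 is equivalent to top_count > n / 2.
    if top_count <= len(fingerprint_sets) // 2:
        # No clear majority (including ties): nobody is marked consistent.
        return [False] * len(fingerprint_sets)
    return [fps == majority_fp for fps in fingerprint_sets]
```

With two candidates that disagree, both flags come back `False`, so downstream selection no longer depends on which candidate happened to be listed first.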

@Aureliolo
Owner Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Aureliolo
Owner Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
src/synthorg/engine/health/triage.py (1)

47-54: ⚠️ Potential issue | 🟡 Minor

Threshold validation added; missing log before raising.

The validation for non-negative thresholds has been added per the prior review. However, the ValueError is raised without logging context first.

As per coding guidelines: "All error paths must log at WARNING or ERROR with context before raising."

🛡️ Proposed fix to add logging before raising
     def __init__(
         self,
         *,
         stall_duration_threshold: float = (_MEDIUM_STALL_DURATION_THRESHOLD),
         steps_threshold: int = _MEDIUM_STEPS_THRESHOLD,
     ) -> None:
         if stall_duration_threshold < 0:
             msg = (
                 f"stall_duration_threshold must be >= 0, got {stall_duration_threshold}"
             )
+            logger.warning(
+                HEALTH_TICKET_DISMISSED,
+                error=msg,
+                stall_duration_threshold=stall_duration_threshold,
+            )
             raise ValueError(msg)
         if steps_threshold < 0:
             msg = f"steps_threshold must be >= 0, got {steps_threshold}"
+            logger.warning(
+                HEALTH_TICKET_DISMISSED,
+                error=msg,
+                steps_threshold=steps_threshold,
+            )
             raise ValueError(msg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/health/triage.py` around lines 47 - 54, Add a log call
before each ValueError is raised for the threshold checks: when validating
stall_duration_threshold and steps_threshold (the variables referenced in the if
blocks), emit a WARNING or ERROR with context (including the invalid value and
which threshold failed) using the module logger (the existing logger, obtained
via get_logger(__name__) from synthorg.observability, not
logging.getLogger()), then raise the ValueError as before; update the checks
around stall_duration_threshold and steps_threshold in triage.py to log the
problem before raising.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 570-599: compute_straggler_gap currently only rejects an empty
sequence but must validate each sample before aggregation: iterate over
agent_durations and ensure each agent_id is a non-blank identifier (use
NotBlankStr from core.types or enforce the repo's NotBlankStr contract) and each
duration is a non-negative float; on the first invalid entry log the same
COORD_METRICS_VALIDATION_ERROR with parameter="agent_durations" and a
descriptive error message then raise ValueError, only after this per-item
validation proceed to compute slowest_id/slowest_dur (from max(...)) and
mean_dur; update signatures or local checks to treat agent_id as
NotBlankStr-equivalent to keep public ID types aligned.
- Around line 628-663: The current guards only check <= 0 so NaN/Inf slip
through; update the validation for tokens_mas, tokens_sas, duration_mas, and
duration_sas to first ensure each value is finite using math.isfinite(...) and
log the same COORD_METRICS_VALIDATION_ERROR with parameter/name and value before
raising ValueError (e.g., "tokens_mas must be finite and positive"); keep the
existing <= 0 checks but combine or sequence them so non-finite values trigger
the logger+ValueError before constructing TokenSpeedupRatio or any Pydantic
model.
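
Both inline findings come down to validating inputs before any aggregation or model construction. A minimal sketch of the two checks follows, under stated assumptions: the helper names are hypothetical, `agent_durations` is assumed to be a sequence of `(agent_id, duration)` pairs, the finiteness check on durations goes slightly beyond the "non-negative" wording above, and the `COORD_METRICS_VALIDATION_ERROR` log call that the project would emit before each raise is omitted for brevity.

```python
import math
from collections.abc import Sequence


def require_finite_positive(name: str, value: float) -> float:
    """Reject NaN, +/-Inf, zero, and negative values."""
    # isfinite runs first so NaN cannot slip past the `<= 0` comparison.
    if not math.isfinite(value) or value <= 0:
        msg = f"{name} must be finite and positive, got {value}"
        raise ValueError(msg)
    return value


def validate_agent_durations(
    agent_durations: Sequence[tuple[str, float]],
) -> None:
    """Check every (agent_id, duration) sample before aggregating."""
    if not agent_durations:
        msg = "agent_durations must not be empty"
        raise ValueError(msg)
    for agent_id, duration in agent_durations:
        # Mirrors a not-blank identifier contract: empty or whitespace fails.
        if not agent_id.strip():
            msg = "agent_durations contains a blank agent_id"
            raise ValueError(msg)
        if not math.isfinite(duration) or duration < 0:
            msg = f"duration for {agent_id!r} must be finite and >= 0, got {duration}"
            raise ValueError(msg)
```

The ordering matters because every comparison with NaN is false, so `value <= 0` alone never rejects it.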

---

Duplicate comments:
In `@src/synthorg/engine/health/triage.py`:
- Around line 47-54: Add a log call before each ValueError is raised for the
threshold checks: when validating stall_duration_threshold and steps_threshold
(the variables referenced in the if blocks), emit a WARNING or ERROR with
context (including the invalid value and which threshold failed) using the
module logger (existing logger or logging.getLogger(__name__)), then raise the
ValueError as before; update the checks around stall_duration_threshold and
steps_threshold in triage.py to log the problem before raising.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2acef51b-5b17-4082-9f12-408fe6ee8284

📥 Commits

Reviewing files that changed from the base of the PR and between df80fc1 and 76f10d5.

📒 Files selected for processing (17)
  • src/synthorg/budget/coordination_metrics.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/observability/events/execution.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/trajectory/test_scorer.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses)—ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
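
The immutability guideline for non-Pydantic collections (deep copy at construction plus a `MappingProxyType` wrapper) can be sketched as below. `ToolRegistry` and its `with_tool` method are hypothetical names used only for illustration; they are not taken from the repository.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType
from typing import Any


class ToolRegistry:
    """Hypothetical registry illustrating the deepcopy + proxy pattern."""

    def __init__(self, tools: Mapping[str, dict[str, Any]]) -> None:
        # Deep-copy at construction so later caller-side mutation of the
        # source mapping cannot leak into the registry.
        self._tools: Mapping[str, dict[str, Any]] = MappingProxyType(
            copy.deepcopy(dict(tools))
        )

    @property
    def tools(self) -> Mapping[str, dict[str, Any]]:
        """Read-only view; item assignment on it raises TypeError."""
        return self._tools

    def with_tool(self, name: str, spec: dict[str, Any]) -> "ToolRegistry":
        """Return a new registry rather than mutating this one."""
        return ToolRegistry({**self._tools, name: spec})
```

The proxy is shallow: it blocks adding or replacing top-level keys, while nested dicts remain mutable, which is why the guideline pairs it with deep copies at system boundaries.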

Files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/metrics.py
  • tests/unit/budget/test_coordination_metrics_properties.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • tests/unit/engine/health/test_pipeline.py
  • src/synthorg/engine/quality/effort.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/trajectory/test_scorer.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/budget/coordination_metrics.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/budget/coordination_metrics.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/execution.py
  • src/synthorg/budget/cost_record.py
  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
  • src/synthorg/engine/trajectory/budget_guard.py
  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/engine/health/pipeline.py
  • src/synthorg/engine/health/judge.py
  • src/synthorg/engine/trajectory/models.py
  • src/synthorg/budget/coordination_metrics.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow.
Async: asyncio_mode = "auto"—no manual @pytest.mark.asyncio needed.
Timeout: 30 seconds per test (global in pyproject.toml—do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases.
Tests must use test-provider, test-small-001, etc. (not real vendor names).
Property-based testing: Python uses Hypothesis (@given + @settings).
Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.
Never skip, dismiss, or ignore flaky tests—always fix them fully and fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
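
The advice on timing-sensitive tests can be illustrated with a small sketch that patches `time.monotonic()` so the elapsed time is fixed instead of measured. `StallTimer` is a hypothetical helper invented for this example, and the project's required `@pytest.mark.unit` marker is left out so the snippet depends only on the standard library.

```python
import time
from unittest.mock import patch


class StallTimer:
    """Hypothetical helper: seconds elapsed since construction."""

    def __init__(self) -> None:
        self._last = time.monotonic()

    def stalled_for(self) -> float:
        return time.monotonic() - self._last


def test_stall_timer_is_deterministic() -> None:
    # side_effect supplies one fake clock reading per time.monotonic() call:
    # the first for __init__, the second for stalled_for().
    with patch("time.monotonic", side_effect=[100.0, 130.0]):
        timer = StallTimer()
        assert timer.stalled_for() == 30.0
```

Because the clock readings are scripted, the assertion can use exact equality rather than a tolerance window, which is what removes the flakiness.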

Files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/trajectory/test_scorer.py
  • tests/unit/engine/trajectory/test_scorer_properties.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/health/test_pipeline.py
  • tests/unit/engine/trajectory/test_models.py
  • tests/unit/engine/trajectory/test_scorer.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
🧠 Learnings (69)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/engine/metrics.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/engine/metrics.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings

Applied to files:

  • src/synthorg/engine/metrics.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

  • src/synthorg/engine/metrics.py
  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing with `given` + `settings` decorators; control profiles via `HYPOTHESIS_PROFILE` env var (`ci` for 200 examples, `dev` for 1000 examples)

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing: `given` + `settings`; dev profile has 1000 examples (via `HYPOTHESIS_PROFILE=dev env var`), ci profile (default) has 50

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to tests/**/*.py : Use Hypothesis for property-based testing in Python with `given` + `settings`. Control via `HYPOTHESIS_PROFILE` env var (dev: 1000 examples, ci: 200 examples).

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Property-based testing: Python uses Hypothesis (given + settings). Hypothesis profiles: ci (200 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var. Run dev profile: HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties.

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Controlled via HYPOTHESIS_PROFILE env var.

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to tests/**/*.py : Property-based testing: Python uses Hypothesis (given + settings).

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T16:58:29.857Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:58:29.857Z
Learning: Run property tests with `HYPOTHESIS_PROFILE=dev uv run python -m pytest tests/ -m unit -n auto -k properties`

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/budget/test_coordination_metrics_properties.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/coordination_metrics.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/engine/trajectory/budget_guard.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/engine/quality/effort.py
  • src/synthorg/engine/trajectory/scorer.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • tests/unit/engine/trajectory/test_models.py
  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • tests/unit/engine/trajectory/test_models.py
  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to **/*.py : Use `except A, B:` (no parentheses) per PEP 758 exception syntax on Python 3.14

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (without parentheses) per PEP 758 for exception handling in Python 3.14

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses)—ruff enforces this on Python 3.14.

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Handle errors explicitly; never silently swallow exceptions

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Handle errors explicitly, never silently swallow exceptions

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Handle errors explicitly—never silently swallow exceptions.

Applied to files:

  • src/synthorg/engine/health/pipeline.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-04-01T09:37:49.451Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:37:49.451Z
Learning: Applies to **/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models with `model_copy(update=...)` for runtime state that evolves

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to **/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)

Applied to files:

  • src/synthorg/engine/trajectory/models.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: CI runs 10 deterministic examples per property test (derandomize=True—same inputs every run, no flakes).

Applied to files:

  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/engine/trajectory/test_scorer_properties.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/engine/trajectory/test_scorer_properties.py
🔇 Additional comments (27)
src/synthorg/engine/trajectory/budget_guard.py (2)

34-47: Validation for k and margin has been added as requested.

The input validation now properly guards against k < 1 and margin outside [0.0, 1.0], logging warnings with context before returning False. This addresses the prior review feedback.
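
A minimal sketch of the guard behaviour described above, assuming a hypothetical signature and a hypothetical budget rule (keep `margin` of the remaining budget in reserve). The real `check_trajectory_budget` may take different parameters, and stdlib `logging` stands in for the project's own logger here.

```python
import logging

# Stand-in logger; the project itself uses get_logger from synthorg.observability.
logger = logging.getLogger(__name__)


def check_budget_sketch(
    remaining_budget: float,
    estimated_cost_per_candidate: float,
    k: int,
    margin: float,
) -> bool:
    """Return True if sampling k candidates fits the budget (hypothetical rule)."""
    # Invalid inputs are logged with context and rejected instead of raising.
    if k < 1:
        logger.warning("invalid k for trajectory budget check: k=%d", k)
        return False
    if not 0.0 <= margin <= 1.0:
        logger.warning("invalid margin for trajectory budget check: margin=%r", margin)
        return False
    # Assumed rule: only (1 - margin) of the remaining budget may be spent.
    usable = remaining_budget * (1.0 - margin)
    return k * estimated_cost_per_candidate <= usable
```

Returning `False` on invalid input means a misconfigured caller skips K-candidate sampling rather than crashing the run.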


16-73: Function exported but not yet wired—confirm follow-up PR scope.

check_trajectory_budget is correctly exported from the package but has no callers in the current codebase. Per PR objectives, integration into the K-candidate sampling pipeline is deferred to a follow-up PR.

src/synthorg/engine/quality/effort.py (2)

43-50: Empty-input error path now logs before raising.

The ValueError for empty signals is now preceded by a warning log with context (signal_count=0 and the error message). This addresses the prior review feedback.


25-92: LGTM!

The implementation correctly:

  • Validates non-empty input with proper logging before raising
  • Counts steps by quality category
  • Handles expected_steps <= 0 by falling back to actual total
  • Logs the computed metrics at INFO level
  • Detects the weak-model trap (high accuracy with very few steps)

The function aligns with the AccuracyEffortRatio model requirements (all 5 required fields provided).
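
The points above can be sketched as follows. The formulas for accuracy, effort, and ratio are assumptions for illustration only; the review does not state how `AccuracyEffortRatio` defines them, and `accuracy_effort_sketch` is a hypothetical name.

```python
from collections import Counter
from enum import Enum


class StepQuality(Enum):
    CORRECT = "correct"
    NEUTRAL = "neutral"
    INCORRECT = "incorrect"


def accuracy_effort_sketch(
    signals: list[StepQuality], expected_steps: int
) -> dict[str, float]:
    """Count steps by quality and derive an accuracy/effort ratio (assumed formulas)."""
    if not signals:
        # The real function logs a warning with context before raising.
        raise ValueError("signals must not be empty")
    counts = Counter(signals)
    total = len(signals)
    # Fall back to the actual step count when expected_steps is not positive.
    expected = expected_steps if expected_steps > 0 else total
    accuracy = counts[StepQuality.CORRECT] / total  # assumed definition
    effort = total / expected  # assumed definition
    return {"accuracy": accuracy, "effort": effort, "ratio": accuracy / effort}
```

Under these assumed definitions, a run that takes twice the expected number of steps halves its ratio at the same accuracy.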

src/synthorg/engine/metrics.py (3)

41-43: Docstring updated to document the new accuracy_effort_ratio field.

The class docstring now includes the new field with appropriate description. This addresses the prior review feedback.


124-128: Event constant now used instead of string literal.

The warning log now uses EXECUTION_METRICS_UNEXPECTED_TYPE imported from the domain events module instead of a hardcoded string. This addresses the prior review feedback.


118-137: Extraction logic is defensive and correct.

The code gracefully handles missing keys, type mismatches, and correct types:

  • ae_data is None → ae_ratio remains None
  • Wrong type → logs warning with context, leaves ae_ratio as None
  • Correct type → extracts .ratio

Upstream population of metadata["accuracy_effort"] is currently deferred, as expected. A follow-up PR will add wiring in the hybrid loop or plan executor to populate this key.

src/synthorg/budget/cost_record.py (2)

34-36: Docstring updated to document the new accuracy_effort_ratio field.

The class docstring now includes the new field with its type, meaning, and population timing. This addresses the prior review feedback.


53-60: Field definition is correct, but population mechanism does not exist in the codebase.

The field is properly defined with ge=0.0 validation and correct description. However, none of the three CostRecord instantiation sites (in cost_recording.py, llm_judge_quality_strategy.py, and memory/backends/mem0/adapter.py) pass accuracy_effort_ratio. Since CostRecord is a frozen model (immutable), this field cannot be populated post-creation—it can only be set during instantiation.

The accuracy-effort ratio is computed separately for TaskCompletionMetrics (extracted from execution metadata in engine/metrics.py), but CostRecord instances are created per-API-call, not per-task. The field's description suggests task-level population at completion, but the per-call, immutable design of CostRecord makes this infeasible with the current frozen model pattern.

Either the field should be populated at instantiation (requiring design changes to pass task metadata earlier), or this field should be removed in favor of tracking the ratio only in task-level metrics.

src/synthorg/engine/health/triage.py (2)

84-91: Dismissal transitions now log at INFO level.

Both dismissal paths (LOW severity and MEDIUM below thresholds) now use logger.info() instead of logger.debug(). This addresses the prior review feedback about state transitions logging at INFO.


58-118: LGTM!

The should_escalate method correctly implements the documented triage rules:

  • HIGH/CRITICAL: always escalate
  • LOW: always dismiss
  • MEDIUM: escalate if stall duration OR stuck steps meet thresholds

All decision paths log at INFO level with appropriate structured fields.
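
The triage rules above can be sketched as a pure function. The default threshold values and the parameter names are assumptions; the real defaults live in `_MEDIUM_STALL_DURATION_THRESHOLD` and `_MEDIUM_STEPS_THRESHOLD`, whose values are not shown in this review. Logging is omitted for brevity.

```python
from enum import Enum


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def should_escalate_sketch(
    severity: Severity,
    stall_duration: float,
    stuck_steps: int,
    *,
    stall_duration_threshold: float = 60.0,  # hypothetical default
    steps_threshold: int = 3,  # hypothetical default
) -> bool:
    """Apply the conservative triage rules to one ticket."""
    if severity in (Severity.HIGH, Severity.CRITICAL):
        return True  # always escalate
    if severity is Severity.LOW:
        return False  # always dismiss
    # MEDIUM: escalate if either threshold is met.
    return (
        stall_duration >= stall_duration_threshold
        or stuck_steps >= steps_threshold
    )
```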

src/synthorg/engine/health/judge.py (3)

42-51: Validation for quality_degradation_threshold >= 1 has been added.

The constructor now validates that the threshold is at least 1, logging a warning before raising ValueError. This addresses the prior review feedback about guarding against non-positive thresholds.


54-133: LGTM!

The emit_ticket method correctly implements the escalation priority:

  1. Stagnation termination → HIGH severity
  2. Error with recovery → MEDIUM severity
  3. Quality degradation (consecutive INCORRECT) → HIGH or CRITICAL based on count

The severity escalation to CRITICAL when consecutive >= 2*threshold is a sensible design for severe quality degradation.


136-146: LGTM!

The _count_trailing_incorrect helper correctly counts consecutive INCORRECT signals from the end using reverse iteration with early break on non-INCORRECT.

src/synthorg/engine/health/pipeline.py (4)

137-151: Notification delivery failure now preserves ticket semantics.

The pipeline now:

  1. Attempts notification delivery with a try/except
  2. Re-raises MemoryError/RecursionError (fatal errors)
  3. Logs other failures with context including ticket_id and detail
  4. Still returns the ticket so callers know escalation occurred

This addresses the prior review feedback about losing escalated ticket state when delivery fails.
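
The delivery behaviour listed above can be sketched synchronously as below. The real pipeline is likely async and uses the project's structured logger; `deliver_and_return`, the dict-shaped ticket, and the `send` callable are hypothetical. The sketch uses the parenthesized `except (A, B):` form so it runs on older interpreters; the project itself mandates the PEP 758 form `except MemoryError, RecursionError:` on Python 3.14.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)  # stand-in for the project's get_logger


def deliver_and_return(
    ticket: dict[str, str], send: Callable[[dict[str, str]], None]
) -> dict[str, str]:
    """Try to deliver a notification, but always hand the ticket back."""
    try:
        send(ticket)
    except (MemoryError, RecursionError):
        # Fatal errors propagate unchanged.
        raise
    except Exception as exc:
        # Delivery failure is logged with context, not swallowed silently.
        logger.warning(
            "health notification delivery failed: ticket_id=%s detail=%s",
            ticket["ticket_id"],
            exc,
        )
    # The caller still learns that an escalation occurred.
    return ticket
```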


100-101: Correct use of PEP 758 except syntax.

The code correctly uses except MemoryError, RecursionError: (comma-separated, no parentheses) per Python 3.14 and project coding guidelines.

Also applies to: 141-142


67-108: LGTM!

The process method provides a clean public interface that:

  • Delegates to the inner implementation
  • Catches and logs unexpected exceptions without crashing
  • Properly propagates fatal errors (MemoryError, RecursionError)
  • Returns None on failures (consistent with "no ticket" semantic)

154-175: LGTM!

The _ticket_to_notification helper correctly maps:

  • Cause → Category (with SYSTEM fallback)
  • Severity → NotificationSeverity (with WARNING fallback)
  • Includes all relevant metadata fields for observability
src/synthorg/observability/events/execution.py (1)

101-103: LGTM!

The new event constant follows the established naming convention (execution.metrics.unexpected_type) and is correctly typed with Final[str]. This addresses the prior review feedback about using event constants instead of string literals in metrics.py.

tests/unit/engine/health/test_pipeline.py (1)

1-183: LGTM!

The test suite provides comprehensive end-to-end coverage for HealthMonitoringPipeline: stagnation escalation, completed (no-op), error+recovery dismissal/escalation thresholds, quality degradation, NotificationCategory.HEALTH usage, metadata propagation, and graceful sink-failure handling. The test doubles (_FakeSink, _FailingSink) are appropriately minimal and the assertions are focused.

tests/unit/budget/test_coordination_metrics_properties.py (1)

1-81: LGTM!

The property-based tests correctly rely on profile-based Hypothesis configuration via HYPOTHESIS_PROFILE environment variable (no hardcoded @settings decorators). The strategies are well-defined with appropriate bounds (allow_nan=False, allow_infinity=False), and the property invariants being tested (monotonic speedup, positive team size, non-negative gap) are meaningful.

src/synthorg/observability/events/coordination_metrics.py (1)

1-15: LGTM!

Event constants follow the established naming pattern (COORD_METRICS_* → coordination.metrics.*) and are properly typed with Final[str]. The # noqa: S105 on line 10 correctly suppresses the false positive for "token" in the event name string.

tests/unit/engine/trajectory/test_models.py (1)

1-156: LGTM!

Comprehensive validation coverage for the trajectory models: defaults, bounds (k_candidates 2–5, budget_guard_margin 0.0–1.0, verbalized_confidence 0–100), immutability (frozen=True), and extra="forbid" rejection. The joint_score computed field test correctly validates the formula vc_score * abs(len_score) + len_score.
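
As a worked instance of the formula quoted above, the sketch below computes `joint_score` on a frozen dataclass. The real model is a Pydantic class using `@computed_field`; a plain property stands in for it here, and `TrajectoryScoreSketch` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrajectoryScoreSketch:
    """Stand-in for the TrajectoryScore model."""

    vc_score: float  # verbalized-confidence score (log-space)
    len_score: float  # trace-length score, expected to be <= 0

    @property
    def joint_score(self) -> float:
        # Formula as stated in the review: vc_score * abs(len_score) + len_score
        return self.vc_score * abs(self.len_score) + self.len_score


score = TrajectoryScoreSketch(vc_score=-0.5, len_score=-4.0)
# -0.5 * 4.0 + (-4.0) = -6.0
```

When `vc_score` is 0.0 the joint score reduces to `len_score` alone.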

tests/unit/engine/trajectory/test_scorer_properties.py (1)

1-140: LGTM!

The property tests correctly rely on profile-based Hypothesis configuration without hardcoded @settings decorators. The three invariants tested (selection determinism, joint score formula consistency, len_score ≤ 0) provide good coverage of scorer behavior across randomized inputs.

tests/unit/engine/trajectory/test_scorer.py (1)

1-223: LGTM!

Excellent coverage of TrajectoryScorer behavior: single/multiple candidate selection, trace-length preference, VC scoring (log-space, None → 0.0, zero → -100.0 floor), majority-vote consistency filtering, and error handling for empty inputs. The tests align with the implementation logic in scorer.py and the _check_consistency algorithm.

src/synthorg/engine/trajectory/scorer.py (1)

1-220: LGTM!

The implementation correctly addresses previous review findings:

  • Empty-candidate error path now logs at WARNING before raising (lines 63–68).
  • Tie-vote handling uses strict >50% threshold (majority_count > len(candidates) / 2 at line 173), marking all candidates consistent when no clear majority exists—a sensible fallback that avoids over-aggressive filtering.

Logging follows guidelines: get_logger(__name__), structured kwargs, INFO for state transitions (TRAJECTORY_BEST_SELECTED, TRAJECTORY_CONSISTENCY_FILTERED), DEBUG for internal flow.
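
The tie-vote handling described above can be sketched as follows, assuming each candidate is represented by a string fingerprint. `consistency_flags` is a hypothetical helper, not the actual `_check_consistency` implementation.

```python
from collections import Counter


def consistency_flags(fingerprints: list[str]) -> list[bool]:
    """Mark candidates consistent with the majority fingerprint."""
    if not fingerprints:
        return []
    winner, majority_count = Counter(fingerprints).most_common(1)[0]
    # Strict majority: more than half of the candidates must agree.
    if majority_count > len(fingerprints) / 2:
        return [fp == winner for fp in fingerprints]
    # No clear majority (e.g. a tie): keep every candidate.
    return [True] * len(fingerprints)
```

With two candidates that disagree, neither exceeds 50%, so both stay in the pool and the scorer decides between them.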

src/synthorg/engine/trajectory/models.py (1)

1-128: LGTM!

The trajectory models follow Pydantic v2 best practices: frozen=True for immutability, allow_inf_nan=False for numeric safety, extra="forbid" on config, @computed_field for derived joint_score, and NotBlankStr for the complexity_gate tuple. Field constraints (ge/le) appropriately bound k_candidates, budget_guard_margin, verbalized_confidence, and len_score.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
src/synthorg/engine/health/triage.py (1)

44-57: ⚠️ Potential issue | 🟠 Major

Validate threshold types before numeric comparisons.

stall_duration_threshold/steps_threshold are config-boundary inputs; non-numeric values currently raise TypeError before your warning + ValueError path, so failure mode is inconsistent and less diagnosable.

Proposed fix
     def __init__(
         self,
         *,
         stall_duration_threshold: float = (_MEDIUM_STALL_DURATION_THRESHOLD),
         steps_threshold: int = _MEDIUM_STEPS_THRESHOLD,
     ) -> None:
+        if not isinstance(stall_duration_threshold, int | float):
+            msg = (
+                "stall_duration_threshold must be int|float >= 0, "
+                f"got {type(stall_duration_threshold).__name__}"
+            )
+            logger.warning(HEALTH_TICKET_DISMISSED, error=msg)
+            raise ValueError(msg)
+        if not isinstance(steps_threshold, int):
+            msg = (
+                "steps_threshold must be int >= 0, "
+                f"got {type(steps_threshold).__name__}"
+            )
+            logger.warning(HEALTH_TICKET_DISMISSED, error=msg)
+            raise ValueError(msg)
         if stall_duration_threshold < 0:
             msg = (
                 f"stall_duration_threshold must be >= 0, got {stall_duration_threshold}"
             )

As per coding guidelines, “Validate at system boundaries (user input, external APIs, config files).”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/health/triage.py` around lines 44 - 57, The checks for
stall_duration_threshold and steps_threshold must validate types before doing
numeric comparisons; in the function where those params are accepted (the
constructor or function that declares stall_duration_threshold and
steps_threshold) add explicit type checks (e.g., isinstance(..., (int, float))
or numbers.Number) for both stall_duration_threshold and steps_threshold before
the < 0 comparisons, and when a non-numeric value is detected log using
logger.warning with HEALTH_TICKET_DISMISSED and a clear message referencing the
parameter name, then raise a TypeError with that message so the failure mode is
consistent and diagnosable (keep the existing negative-value checks/ValueError
after the type checks).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/coordination_metrics.py`:
- Around line 385-400: The new metric model classes AmdahlCeiling, StragglerGap,
TokenSpeedupRatio, and MessageOverhead are not exported from the package API;
update the package-level exports by importing these names from their module
(where CoordinationEfficiency, ErrorAmplification, MessageDensity etc. are
imported) and add them to the __all__ (or existing export list) so external
consumers can import them for types and construction; ensure the import
statements mirror the pattern used for existing metrics and include each of the
four class names in the exported list.
- Around line 238-247: The validator method _validate_slowest_ge_mean currently
raises ValueError without logging; modify it to log a warning with context
(include slowest_duration_seconds and mean_duration_seconds and a short message)
immediately before raising so the failure is recorded in telemetry—use the
module logger (or the existing logger instance) to call logger.warning(...) and
keep the raised ValueError unchanged.

In `@src/synthorg/engine/health/triage.py`:
- Around line 51-54: Replace the use of HEALTH_TICKET_DISMISSED inside triage
class constructor logging with a dedicated triage-config event constant (e.g.,
HEALTH_TRIAGE_CONFIG_ERROR) to avoid polluting dismissal telemetry; locate the
logger.warning calls in __init__ that pass error=msg (the blocks around
HEALTH_TICKET_DISMISSED at the two noted spots) and change them to emit the new
event name (and consider using logger.error or logger.warning consistently for
config failures), creating the HEALTH_TRIAGE_CONFIG_ERROR constant if it doesn't
exist and reusing the same message payload (error=msg).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99bdf959-4dbd-49c6-8fe0-9dc0bb8eef47

📥 Commits

Reviewing files that changed from the base of the PR and between 76f10d5 and 1990f81.

📒 Files selected for processing (2)
  • src/synthorg/budget/coordination_metrics.py
  • src/synthorg/engine/health/triage.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses)—ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.
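
The `copy.deepcopy()` plus `MappingProxyType` guideline above can be illustrated with a small sketch. `ToolRegistrySketch` and its dict-of-dicts shape are hypothetical and do not reflect the project's actual registry classes.

```python
import copy
from collections.abc import Mapping
from types import MappingProxyType


class ToolRegistrySketch:
    """Read-only registry built from a caller-supplied dict."""

    def __init__(self, tools: dict[str, dict[str, str]]) -> None:
        # Deep-copy at construction so later caller mutations cannot leak in,
        # then wrap in a proxy so the top-level mapping cannot be modified.
        self._tools: Mapping[str, dict[str, str]] = MappingProxyType(
            copy.deepcopy(tools)
        )

    @property
    def tools(self) -> Mapping[str, dict[str, str]]:
        return self._tools


source = {"search": {"kind": "web"}}
registry = ToolRegistrySketch(source)
source["search"]["kind"] = "changed"  # does not affect the registry
```

Note that `MappingProxyType` only protects the top-level mapping; nested dicts remain mutable, which is why the deep copy at construction matters.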

Files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Parallel tier (Haiku-class) include 14 reviewers: async-concurrency-reviewer, comment-analyzer, design-token-audit, docs-consistency, go-conventions-enforcer, go-security-reviewer, infra-reviewer, issue-resolution-verifier, persistence-reviewer, pr-test-analyzer, security-reviewer, silent-failure-hunter, test-quality-reviewer, and type-design-analyzer
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-02T11:49:21.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T11:49:21.202Z
Learning: Applies to **/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`) with `allow_inf_nan=False` in all `ConfigDict` declarations

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
🔇 Additional comments (9)
src/synthorg/engine/health/triage.py (1)

76-126: Severity routing and MEDIUM threshold boundaries look solid.

Coverage of all EscalationSeverity values and >= boundary behavior is clear and deterministic.
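The routing rule reviewed here can be sketched as follows. This is a minimal illustration, not the code in triage.py: the `Severity` enum stands in for EscalationSeverity, and `signal_count` and `medium_threshold` are hypothetical names for whatever quantity the MEDIUM rule compares.

```python
from enum import Enum


class Severity(Enum):
    # Stand-in for EscalationSeverity; member names are assumed.
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def should_escalate(
    severity: Severity,
    signal_count: int,
    medium_threshold: int = 3,  # hypothetical default
) -> bool:
    """Return True when a ticket should be escalated rather than dismissed."""
    if severity is Severity.LOW:
        return False
    if severity in (Severity.HIGH, Severity.CRITICAL):
        return True
    # MEDIUM: inclusive (>=) boundary, so reaching the threshold escalates.
    return signal_count >= medium_threshold
```

With the inclusive comparison, a MEDIUM ticket that lands exactly on the threshold is escalated, which is the boundary behavior the comment refers to.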

src/synthorg/budget/coordination_metrics.py (8)

1-24: Imports and module setup look correct.

The updated imports properly bring in math, Self, model_validator, NotBlankStr, and the observability event constant. Logger initialization follows project conventions.


165-209: AmdahlCeiling model is well-implemented.

The model correctly implements Amdahl's Law with:

  • Proper field constraint (lt=1.0) preventing division by zero in max_speedup
  • Sound derivation for recommended_team_size (solving for 90% of ceiling)
  • Defensive guard at lines 205-206 for the p <= 0 edge case
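For reference, the derivation behind these points can be sketched as below. It assumes p is the parallelizable fraction and that "90% of ceiling" means S(n) = 0.9 / (1 - p); solving 1 / ((1 - p) + p / n) = 0.9 / (1 - p) for n gives n = 9p / (1 - p). The function names are illustrative and are not taken from coordination_metrics.py.

```python
import math


def max_speedup(p: float) -> float:
    """Amdahl ceiling 1 / (1 - p); p < 1 keeps the denominator non-zero."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def speedup(p: float, n: int) -> float:
    """Amdahl speedup with n workers."""
    return 1.0 / ((1.0 - p) + p / n)


def team_size_for_90_percent(p: float) -> int:
    """Smallest n whose speedup reaches 90% of the ceiling (assumed target)."""
    if p <= 0.0:
        # Nothing to parallelize, so one worker is enough.
        return 1
    return max(1, math.ceil(9.0 * p / (1.0 - p)))
```

For p = 0.5 the ceiling is 2.0 and nine workers reach a speedup of about 1.8, which is 90% of it.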

266-305: TokenSpeedupRatio model correctly implements alert threshold.

The model properly captures the token-cost-vs-latency tradeoff with computed ratio and alert fields. The 2.0 threshold aligns with the PR objective for alerting when tokens scale disproportionately to speedup.
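A rough sketch of the ratio and alert check follows. It assumes the ratio is the token multiplier divided by the latency speedup and that the alert uses a strict comparison; the parameter names and the choice of `>` over `>=` are assumptions, only the 2.0 threshold comes from the comment above.

```python
def token_speedup_ratio(token_multiplier: float, latency_speedup: float) -> float:
    """Tokens spent per unit of speedup gained (assumed definition)."""
    if latency_speedup <= 0:
        raise ValueError("latency_speedup must be positive")
    return token_multiplier / latency_speedup


def exceeds_alert_threshold(ratio: float, threshold: float = 2.0) -> bool:
    """Flag runs where token cost grows much faster than the speedup."""
    return ratio > threshold
```

Under these assumptions, a run that uses 6x the tokens for a 2x speedup has a ratio of 3.0 and triggers the alert, while 3x the tokens for the same speedup does not.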


307-344: MessageOverhead model correctly detects O(n²) growth.

The is_quadratic computed field properly compares message count against team_size² × threshold. Field constraints are appropriate (gt=0 for team_size, configurable threshold with sensible default).
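The comparison can be sketched as a plain function. The default threshold of 0.5 and the strict `>` comparison are placeholders chosen for illustration; the model's actual default and comparator are not shown in this review.

```python
def is_quadratic(
    team_size: int,
    message_count: int,
    threshold: float = 0.5,  # hypothetical default
) -> bool:
    """Return True when messages exceed threshold * team_size squared."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    if message_count < 0:
        raise ValueError("message_count must be non-negative")
    return message_count > team_size**2 * threshold
```

With a team of 4 and a threshold of 0.5, the cutoff is 8 messages, so 10 messages would be flagged and 8 would not.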


547-568: compute_amdahl_ceiling delegates validation to Pydantic.

This is acceptable since the AmdahlCeiling model's Field(ge=0.0, lt=1.0) constraint handles boundary validation. A ValidationError with clear messaging will be raised for invalid inputs.


570-626: compute_straggler_gap properly validates inputs with logging.

The function correctly:

  • Rejects empty sequences with logged warning
  • Validates each entry for blank agent_id and non-finite/negative duration
  • Logs COORD_METRICS_VALIDATION_ERROR before raising ValueError

This addresses the previous review findings.
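The validation order described above might look like the sketch below. Logging is omitted because the project logger and event constant are not reproduced here, the input shape (pairs of agent id and duration) is assumed, and defining the gap as slowest duration minus mean duration is also an assumption rather than the module's definition.

```python
import math
from collections.abc import Sequence


def straggler_gap(durations: Sequence[tuple[str, float]]) -> tuple[str, float]:
    """Return (slowest agent id, slowest minus mean duration)."""
    if not durations:
        raise ValueError("durations must not be empty")
    for agent_id, seconds in durations:
        if not agent_id.strip():
            raise ValueError("agent_id must not be blank")
        if not math.isfinite(seconds) or seconds < 0:
            raise ValueError(f"invalid duration for {agent_id!r}: {seconds!r}")
    # Inputs are clean from here on, so the arithmetic cannot produce NaN.
    slowest_id, slowest = max(durations, key=lambda item: item[1])
    mean = sum(seconds for _, seconds in durations) / len(durations)
    return slowest_id, slowest - mean
```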


629-668: compute_token_speedup_ratio validates all inputs comprehensively.

The consolidated loop checking not math.isfinite(value) or value <= 0 correctly catches NaN, Inf, zero, and negative values before model construction. Logging with parameter context ensures observability for invalid input scenarios.
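A small sketch of that kind of consolidated check is shown below. The helper name and keyword-argument interface are illustrative only, and the logging call that the project requires before raising is left out.

```python
import math


def validate_positive_finite(**values: float) -> None:
    """Raise ValueError for any NaN, infinite, zero, or negative input."""
    for name, value in values.items():
        # NaN fails isfinite, so it is rejected even though NaN <= 0 is False.
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be finite and positive, got {value!r}")
```

Checking finiteness first matters because every ordering comparison with NaN is False, so a bare `value <= 0` test would let NaN through.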


671-691: compute_message_overhead is a thin wrapper with appropriate delegation.

The function relies on Pydantic's field-level validation (gt=0, ge=0, bounded threshold). This is consistent with compute_amdahl_ceiling's approach and acceptable for non-division operations where Pydantic provides clear error messages.

@Aureliolo
Owner Author

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/__init__.py`:
- Around line 24-35: Add the four new coordination metric classes to the module
exports by updating the __all__ list to include AmdahlCeiling,
CoordinationEfficiency, CoordinationMetrics, and CoordinationOverhead (inserted
in alphabetical order among existing entries) so consumers can import them via
from synthorg.budget import AmdahlCeiling, CoordinationEfficiency,
CoordinationMetrics, CoordinationOverhead.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 670a9f77-9c4b-4527-bb27-cd4174ae2166

📥 Commits

Reviewing files that changed from the base of the PR and between 1990f81 and 32cb66c.

📒 Files selected for processing (4)
  • src/synthorg/budget/__init__.py
  • src/synthorg/budget/coordination_metrics.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/observability/events/health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Deploy Preview
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
  • GitHub Check: Analyze (go)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
Use PEP 758 except syntax: except A, B: (no parentheses)—ruff enforces this on Python 3.14.
Type hints: all public functions required, mypy strict mode enforced.
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields—including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants—instead of manual whitespace validators.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff enforced).
Functions: < 50 lines, files < 800 lines.
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).
Variable name: always logger (not _logger, not log).
All error paths must log at WARNING or ERROR with context before raising.
All state transitions must log at INFO.
DEBUG for object creation, internal flow, entry/exit of key functions.
Pure data models, enums, and re-exports do NOT need logging.

Files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code. Exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr).
Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Structured kwargs: always logger.info(EVENT, key=value)—never logger.info("msg %s", val)
All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError.
Non-retryable errors raise immediately without retry.
RetryExhaustedError signals that all retries failed—the engine layer catches this to trigger fallback chains.
Rate limiter respects RateLimitError.retry_after from providers—automatically pauses future requests.

Files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/budget/__init__.py
  • src/synthorg/observability/events/health.py
  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
🧠 Learnings (38)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Quality tier (Sonnet-class) include: code-reviewer, python-reviewer, frontend-reviewer, go-reviewer, conventions-enforcer, logging-audit, resilience-audit, and api-contract-drift
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-04-06T16:35:12.934Z
Learning: Agents in the Parallel tier (Haiku-class) include 14 reviewers: async-concurrency-reviewer, comment-analyzer, design-token-audit, docs-consistency, go-conventions-enforcer, go-security-reviewer, infra-reviewer, issue-resolution-verifier, persistence-reviewer, pr-test-analyzer, security-reviewer, silent-failure-hunter, test-quality-reviewer, and type-design-analyzer
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to {pyproject.toml,src/synthorg/__init__.py} : Update version in `pyproject.toml` (`[tool.commitizen].version`) and `src/synthorg/__init__.py` (`__version__`)

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/budget/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/health.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Handle errors explicitly, never silently swallow. Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T20:14:00.937Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T20:14:00.937Z
Learning: Applies to **/*.py : Validate: at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-01T14:22:06.315Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T14:22:06.315Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Validate at system boundaries (user input, external APIs, config files)

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Validate at system boundaries (user input, external APIs, config files).

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : All state transitions must log at INFO level

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All state transitions must log at INFO.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/engine/health/triage.py
  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use DEBUG logging for object creation, internal flow, and entry/exit of key functions.

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR with context before raising; log all state transitions at INFO; log object creation/internal flow/entry-exit at DEBUG

Applied to files:

  • src/synthorg/engine/health/triage.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-02T11:49:21.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T11:49:21.202Z
Learning: Applies to **/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`) with `allow_inf_nan=False` in all `ConfigDict` declarations

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
📚 Learning: 2026-04-06T20:42:41.219Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T20:42:41.219Z
Learning: Applies to **/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/budget/coordination_metrics.py
🔇 Additional comments (14)
src/synthorg/observability/events/health.py (1)

1-9: LGTM!

Event constants follow the established pattern with proper Final[str] typing and a clear execution.health.* namespace. The module correctly omits logging as per guidelines for pure data/constant modules.

src/synthorg/engine/health/triage.py (3)

1-16: LGTM!

Module setup follows guidelines: proper docstring, correct imports from domain-specific event modules, and logger instantiation via get_logger(__name__).


42-65: LGTM!

Constructor properly validates thresholds at construction time, logs configuration errors with HEALTH_TRIAGE_CONFIG_ERROR (not HEALTH_TICKET_DISMISSED), and raises ValueError after logging—correctly addressing the boundary validation and event separation concerns from prior reviews.


67-127: LGTM!

The should_escalate method correctly implements the severity-based triage rules:

  • HIGH/CRITICAL always escalated
  • LOW always dismissed
  • MEDIUM escalated when stall OR steps thresholds are met

All state transitions log at INFO with structured kwargs and proper event constants. The >= threshold comparisons correctly match the docstring semantics ("stall < 60s AND steps < 5" dismissed).
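The triage rules above can be sketched as a small pure function. This is a minimal illustration, not the code in triage.py: the `Severity` enum, the function signature, and the default thresholds (60 seconds, 5 steps, taken from the docstring quoted above) are assumptions made for the example, and the logging calls are omitted.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity levels; the real enum lives elsewhere in the engine."""

    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


def should_escalate(
    severity: Severity,
    stall_seconds: float,
    steps: int,
    *,
    stall_threshold_seconds: float = 60.0,  # assumed default
    steps_threshold: int = 5,  # assumed default
) -> bool:
    """Return True when a ticket should be escalated."""
    # HIGH and CRITICAL tickets are always escalated.
    if severity in (Severity.HIGH, Severity.CRITICAL):
        return True
    # LOW tickets are always dismissed.
    if severity is Severity.LOW:
        return False
    # MEDIUM: escalate when either threshold is met. Using >= means a ticket
    # is dismissed only when stall < threshold AND steps < threshold.
    return stall_seconds >= stall_threshold_seconds or steps >= steps_threshold
```

With these defaults, a MEDIUM ticket at 59.9s and 4 steps is dismissed, while one at exactly 60s (or exactly 5 steps) is escalated.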

src/synthorg/budget/coordination_metrics.py (10)

1-24: LGTM!

Module setup follows project conventions: proper logger initialization, event constants imported from domain-specific module, and necessary Pydantic/stdlib imports added.


165-209: LGTM!

AmdahlCeiling model correctly implements Amdahl's Law with proper field constraints (preventing division by zero), computed properties for max_speedup and recommended_team_size, and handles the edge case where p <= 0.
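For reference, Amdahl's Law bounds the speedup from n workers at 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) as n grows. The sketch below shows only that arithmetic; the function names and the accepted range [0, 1) for p are assumptions, and it does not reproduce the model's `recommended_team_size` logic, which is not described here.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's Law for a given worker count."""
    # Assumed constraint: p in [0, 1). p == 1 would make the ceiling infinite.
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Limit of amdahl_speedup as the worker count goes to infinity."""
    if not 0.0 <= parallel_fraction < 1.0:
        raise ValueError("parallel_fraction must be in [0, 1)")
    # With p == 0 nothing is parallelizable, so the ceiling is 1.0 (no speedup).
    return 1.0 / (1.0 - parallel_fraction)
```

For example, with p = 0.5 the ceiling is 2.0 no matter how many agents are added, and two workers already reach 4/3 of the single-worker throughput.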


211-270: LGTM!

StragglerGap model correctly enforces the invariant that slowest_duration_seconds >= mean_duration_seconds via model_validator, uses NotBlankStr for the agent identifier, and properly logs validation failures before raising.


272-311: LGTM!

TokenSpeedupRatio model correctly computes the token-to-speedup ratio with proper positive constraints on inputs, and the alert threshold logic is clear.


313-350: LGTM!

MessageOverhead model correctly detects quadratic message growth patterns with a configurable threshold and appropriate field constraints.


352-406: LGTM!

CoordinationMetrics container properly extended with four new optional metric fields, maintaining consistency with existing patterns.


553-573: LGTM!

compute_amdahl_ceiling is a clean wrapper that delegates validation to the Pydantic model constraints.


576-632: LGTM!

compute_straggler_gap correctly validates inputs with proper logging before raising, handles the empty sequence case, validates each entry for blank IDs and invalid durations, and correctly computes the straggler metrics.
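The validation order described above can be sketched with the standard library alone. Everything here is illustrative: the `StragglerGapSketch` dataclass stands in for the Pydantic model, the `gap_seconds` property and the input shape (a sequence of `(agent_id, seconds)` pairs) are assumptions, and raising on an empty sequence is one possible way of handling that case.

```python
import logging
import math
from collections.abc import Sequence
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class StragglerGapSketch:
    """Stand-in for the StragglerGap model (field names from the review)."""

    slowest_agent_id: str
    slowest_duration_seconds: float
    mean_duration_seconds: float

    @property
    def gap_seconds(self) -> float:
        # Hypothetical derived value: how far the slowest agent trails the mean.
        return self.slowest_duration_seconds - self.mean_duration_seconds


def compute_straggler_gap_sketch(
    durations: Sequence[tuple[str, float]],
) -> StragglerGapSketch:
    """Validate inputs (logging before raising), then compute the gap."""
    if not durations:
        logger.warning("straggler gap requested for an empty sequence")
        raise ValueError("durations must not be empty")
    for agent_id, seconds in durations:
        if not agent_id.strip():
            logger.warning("blank agent id in straggler input")
            raise ValueError("agent id must not be blank")
        if not math.isfinite(seconds) or seconds < 0:
            logger.warning("invalid duration %r for agent %s", seconds, agent_id)
            raise ValueError("duration must be finite and non-negative")
    slowest_id, slowest = max(durations, key=lambda entry: entry[1])
    mean = sum(seconds for _, seconds in durations) / len(durations)
    # Float rounding can push the mean a hair above the maximum when all
    # durations are equal; clamp so slowest >= mean always holds.
    mean = min(mean, slowest)
    return StragglerGapSketch(slowest_id, slowest, mean)
```

The clamp is a choice made for this sketch so that the `slowest_duration_seconds >= mean_duration_seconds` invariant cannot be tripped by rounding; whether the real function needs it depends on how it computes the mean.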


635-674: LGTM!

compute_token_speedup_ratio correctly validates all inputs for finiteness and positivity with proper logging before raising, preventing NaN/Inf from bypassing the error path.
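The reason an explicit finiteness check matters: every ordering comparison with NaN is False, so a plain `value <= 0` guard lets NaN through. A minimal sketch of such a guard follows; the helper name and log message are hypothetical, not taken from coordination_metrics.py.

```python
import logging
import math

logger = logging.getLogger(__name__)


def require_finite_positive(name: str, value: float) -> float:
    """Return value unchanged, or log and raise if it is NaN, Inf, or <= 0."""
    # math.isfinite rejects NaN and +/-Inf; the second test rejects zero and
    # negatives. Checking only `value <= 0` would miss NaN and +Inf.
    if not math.isfinite(value) or value <= 0:
        logger.warning("invalid input %s=%r", name, value)
        raise ValueError(f"{name} must be finite and > 0, got {value!r}")
    return value


# NaN slips past a naive guard because the comparison is False:
naive_guard_catches_nan = float("nan") <= 0
```

Here `naive_guard_catches_nan` is `False`, which is exactly the bypass the review refers to.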


677-697: LGTM!

compute_message_overhead is a clean wrapper that delegates validation to the Pydantic model constraints.

Comment on lines 24 to 35
from synthorg.budget.coordination_metrics import (
    AmdahlCeiling,
    CoordinationEfficiency,
    CoordinationMetrics,
    CoordinationOverhead,
    ErrorAmplification,
    MessageDensity,
    MessageOverhead,
    RedundancyRate,
    StragglerGap,
    TokenSpeedupRatio,
)

⚠️ Potential issue | 🟠 Major

Missing __all__ exports for the new coordination metric classes.

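The four new classes are imported but not added to __all__, breaking the public API contract. An explicit from synthorg.budget import AmdahlCeiling still works because the name is bound in the package namespace, but from synthorg.budget import * skips all four names, and tools that treat __all__ as the declared public surface will not see them.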

🐛 Proposed fix to add missing exports

Add the following entries to __all__ in alphabetical order:

     "AgentSpending",
+    "AmdahlCeiling",
     "AnomalyDetectionResult",
     "MessageDensity",
+    "MessageOverhead",
     "ModelDistribution",
     "SpendingSummary",
+    "StragglerGap",
     "SubscriptionConfig",
     "TeamBudget",
+    "TokenSpeedupRatio",
     "billing_period_start",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/__init__.py` around lines 24 - 35, Add the four new
coordination metric classes to the module exports by updating the __all__ list
to include AmdahlCeiling, MessageOverhead, StragglerGap, and TokenSpeedupRatio
(inserted in alphabetical order among existing entries) so they are part of the
declared public API of synthorg.budget, alongside the existing coordination
metric exports.
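A check like the following could catch this class of omission in a test. It is a sketch under stated assumptions: `missing_public_classes` is a hypothetical helper, and `demo_budget` is an in-memory stand-in module, not the real `synthorg.budget` package.

```python
import inspect
import sys
import types


def missing_public_classes(module: types.ModuleType) -> list[str]:
    """List public classes bound in a module but absent from its __all__."""
    exported = set(getattr(module, "__all__", ()))
    return sorted(
        name
        for name, obj in vars(module).items()
        if inspect.isclass(obj) and not name.startswith("_") and name not in exported
    )


# Build a stand-in module: two classes defined, only one listed in __all__.
demo = types.ModuleType("demo_budget")
exec(
    "class AmdahlCeiling: ...\n"
    "class MessageDensity: ...\n"
    "__all__ = ['MessageDensity']\n",
    demo.__dict__,
)
sys.modules["demo_budget"] = demo

missing = missing_public_classes(demo)

# A star-import honours __all__, so the unlisted class is not copied over.
star_namespace: dict[str, object] = {}
exec("from demo_budget import *", star_namespace)

# An explicit import still resolves, because the name exists on the module.
explicit_namespace: dict[str, object] = {}
exec("from demo_budget import AmdahlCeiling", explicit_namespace)
```

In this stand-in, `missing` contains only `AmdahlCeiling`, the star-import namespace lacks it, and the explicit import succeeds.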

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 7, 2026 07:01 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit aac2029 into main Apr 7, 2026
34 checks passed
@Aureliolo Aureliolo deleted the feat/engine-intelligence branch April 7, 2026 07:25
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 7, 2026 07:25 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Apr 7, 2026
🤖 I have created a release *beep* *boop*
---


## [0.6.4](v0.6.3...v0.6.4) (2026-04-07)


### Features

* analytics and metrics runtime pipeline ([#226](#226), [#225](#225), [#227](#227), [#224](#224)) ([#1127](#1127)) ([ec57641](ec57641))
* engine intelligence -- quality signals, health monitoring, trajectory scoring, coordination metrics ([#1099](#1099)) ([aac2029](aac2029)), closes [#697](#697) [#707](#707) [#705](#705) [#703](#703)
* enterprise-grade auth -- HttpOnly cookie sessions, CSRF, lockout, session limits ([#1102](#1102)) ([d3022c7](d3022c7)), closes [#1068](#1068)
* implement core tool categories and granular sub-constraints ([#1101](#1101)) ([0611b53](0611b53)), closes [#1034](#1034) [#220](#220)
* memory evolution -- GraphRAG/consistency research + SelfEditingMemoryStrategy ([#1036](#1036), [#208](#208)) ([#1129](#1129)) ([a9acda3](a9acda3))
* security hardening -- sandbox, risk override, SSRF self-heal, DAST fix ([#1100](#1100)) ([31e7273](31e7273)), closes [#1098](#1098) [#696](#696) [#222](#222) [#671](#671)


### Bug Fixes

* harden agent identity versioning post-review ([#1128](#1128)) ([8eb2859](8eb2859)), closes [#1076](#1076)


### Documentation

* engine architecture research ([#688](#688) [#690](#690) [#848](#848) [#687](#687)) ([#1114](#1114)) ([59b31f9](59b31f9))


### Maintenance

* add .claudeignore and split CLAUDE.md for token optimization ([#1112](#1112)) ([b0fbd18](b0fbd18))
* bump github.com/sigstore/protobuf-specs from 0.5.0 to 0.5.1 in /cli in the all group ([#1106](#1106)) ([73089c9](73089c9))
* bump jsdom from 29.0.1 to 29.0.2 in /site in the all group ([#1107](#1107)) ([8e99dce](8e99dce))
* bump jsdom from 29.0.1 to 29.0.2 in /web in the all group ([#1108](#1108)) ([ce8c749](ce8c749))
* bump python from `fb83750` to `6869258` in /docker/backend in the all group ([#1104](#1104)) ([4911726](4911726))
* bump python from `fb83750` to `6869258` in /docker/web in the all group ([#1103](#1103)) ([87bdf09](87bdf09))
* bump the all group across 1 directory with 4 updates ([#1111](#1111)) ([f702464](f702464))
* bump the all group in /docker/sandbox with 2 updates ([#1105](#1105)) ([05a91ca](05a91ca))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Labels

None yet

Projects

None yet

2 participants