feat: department health, provider status, and activity feed endpoints#818
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (2)
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
Walkthrough
Added in-memory provider health models and a concurrency-safe ProviderHealthTracker with recording, pruning, and 24h summary APIs. Exposed provider health via GET /api/v1/providers/{name}/health. Added department health aggregation and GET /api/v1/departments/{name}/health. Introduced an activity feed controller (GET /api/v1/activities) with filtering, time-window selection, merging of lifecycle events and task metrics, and pagination. Wired the tracker into AppState via auto-wiring, updated controller exports, added observability events, and included comprehensive unit tests and documentation updates.
🚥 Pre-merge checks: ✅ 5 of 5 passed
Summary of Changes (Gemini Code Assist)
This pull request enhances the API's monitoring capabilities by introducing new endpoints for department health, provider status, and an organization-wide activity feed. It provides real-time insight into system performance and operational events, improving visibility and management. The implementation emphasizes modularity, reuses existing data structures where appropriate, and uses asynchronous fan-out to aggregate data efficiently.
Dependency Review
✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.
Snapshot warnings: Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue.
Scanned files: none
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/providers/health.py`:
- Around lines 110-220: The ProviderHealthTracker currently appends to _records
indefinitely; add a pruning mechanism (e.g. an async method named prune_expired)
that computes cutoff = (now or datetime.now(UTC)) -
timedelta(hours=_HEALTH_WINDOW_HOURS), acquires self._lock, removes records with
r.timestamp < cutoff from self._records, and returns the count removed; then
ensure long-running services call prune_expired periodically (via existing
scheduler or caller responsibility) to prevent unbounded memory growth while
keeping get_summary and get_all_summaries behavior unchanged.
In `@tests/unit/api/controllers/test_activities.py`:
- Around lines 224-230: Remove the redundant patching of the lifecycle_events
property on fake_persistence (the with patch.object(..., "lifecycle_events",
new_callable=PropertyMock, return_value=fake_persistence.lifecycle_events)
block) since it does nothing; delete that context manager and its wrapper
indentation, leaving the rest of the test body (including the get_task_metrics
monkeypatch and subsequent assertions) at the top level within the test so the
failure injection via get_task_metrics still runs as intended.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 357e77c7-f5eb-43e4-864d-67e6fbe8f40f
📒 Files selected for processing (13)
- src/synthorg/api/app.py
- src/synthorg/api/controllers/__init__.py
- src/synthorg/api/controllers/activities.py
- src/synthorg/api/controllers/departments.py
- src/synthorg/api/controllers/providers.py
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/providers/health.py
- tests/unit/api/conftest.py
- tests/unit/api/controllers/test_activities.py
- tests/unit/api/controllers/test_departments_health.py
- tests/unit/api/controllers/test_provider_health.py
- tests/unit/providers/test_health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: No `from __future__ import annotations` in Python code; Python 3.14 has PEP 649 native lazy annotations
Use PEP 758 except syntax: `except A, B:` (no parentheses) on Python 3.14; ruff enforces this
All public functions must have type hints; mypy strict mode is enforced
Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
Create new objects instead of mutating existing ones; use immutability patterns. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction and `MappingProxyType` wrapping for read-only enforcement.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `@computed_field` for derived values instead of storing redundant fields (e.g. `TokenUsage.total_tokens`). Use `NotBlankStr` for all identifier/name fields, including optional and tuple variants, instead of manual whitespace validators.
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length limit is 88 characters (ruff standard)
Functions must be less than 50 lines; files must be less than 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Linter: ruff for Python (check + format). Run `uv run ruff check src/ tests/` and `uv run ruff format src/ tests/`
Type-checker: mypy for Python with strict mode. Run `uv run mypy src/ tests/`
Files:
- src/synthorg/api/controllers/__init__.py
- tests/unit/api/conftest.py
- src/synthorg/api/state.py
- src/synthorg/api/app.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
- tests/unit/api/controllers/test_activities.py
- tests/unit/api/controllers/test_departments_health.py
- src/synthorg/api/controllers/departments.py
- src/synthorg/api/controllers/activities.py
- tests/unit/providers/test_health.py
- tests/unit/api/controllers/test_provider_health.py
- src/synthorg/providers/health.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`
Use `logger` as the logging variable name (not `_logger` or `log`)
Always use event constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly and use `logger.info(EVENT, key=value)` in structured kwargs format; never `logger.info("msg %s", val)`
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
DEBUG logging for object creation, internal flow, entry/exit of key functions
Files:
- src/synthorg/api/controllers/__init__.py
- src/synthorg/api/state.py
- src/synthorg/api/app.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
- src/synthorg/api/controllers/departments.py
- src/synthorg/api/controllers/activities.py
- src/synthorg/providers/health.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Test markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Global timeout is 30 seconds per test (configured in `pyproject.toml`); do not add per-file `pytest.mark.timeout(30)` markers. Non-default overrides like `timeout(60)` are allowed
Prefer `@pytest.mark.parametrize` for testing similar cases
Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names
Use Hypothesis for property-based testing in Python (`@given` + `@settings`). Profiles: `ci` (50 examples, default) and `dev` (1000 examples), controlled via the `HYPOTHESIS_PROFILE` env var.
NEVER skip, dismiss, or ignore flaky tests; always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic. For tasks that must block indefinitely until cancelled, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Files:
- tests/unit/api/conftest.py
- tests/unit/api/controllers/test_activities.py
- tests/unit/api/controllers/test_departments_health.py
- tests/unit/providers/test_health.py
- tests/unit/api/controllers/test_provider_health.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/providers/**/*.py: All provider calls go through `BaseCompletionProvider`, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
`RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Rate limiter respects `RateLimitError.retry_after` from providers and automatically pauses future requests
Files:
src/synthorg/providers/health.py
src/synthorg/@(engine|providers)/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`RetryExhaustedError` signals that all retries failed; the engine layer catches this to trigger fallback chains
Files:
src/synthorg/providers/health.py
🧠 Learnings (30)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
- src/synthorg/api/controllers/__init__.py
- src/synthorg/api/state.py
- src/synthorg/api/app.py
- src/synthorg/api/controllers/providers.py
- src/synthorg/api/controllers/departments.py
- src/synthorg/api/controllers/activities.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/api/**/*.py : Use Litestar for REST + WebSocket API. Controllers, guards, channels, JWT + API key + WS ticket auth, RFC 9457 structured errors.
Applied to files:
src/synthorg/api/controllers/__init__.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).
Applied to files:
src/synthorg/api/controllers/__init__.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
- src/synthorg/api/controllers/__init__.py
- src/synthorg/api/app.py
- src/synthorg/api/controllers/departments.py
📚 Learning: 2026-03-24T19:31:05.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T19:31:05.964Z
Learning: Applies to tests/**/*.py : Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names
Applied to files:
- tests/unit/api/conftest.py
- tests/unit/api/controllers/test_provider_health.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.
Applied to files:
- tests/unit/api/conftest.py
- tests/unit/api/controllers/test_provider_health.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.
Applied to files:
- tests/unit/api/conftest.py
- src/synthorg/api/state.py
- src/synthorg/api/app.py
- src/synthorg/api/controllers/providers.py
- src/synthorg/providers/health.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/observability/events/api.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
- src/synthorg/api/state.py
- src/synthorg/api/app.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-16T19:13:36.562Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Applied to files:
- src/synthorg/api/app.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/settings/**/*.py : Settings package (settings/): runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces), Fernet encryption for sensitive values, config bridge (JSON serialization for Pydantic/collections), ConfigResolver (typed accessors), validation, registry, change notifications via message bus, SettingsSubscriber protocol, SettingsChangeDispatcher (polls `#settings` channel, routes to subscribers, restart_required filtering)
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-24T19:31:05.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T19:31:05.964Z
Learning: Applies to src/synthorg/providers/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
- src/synthorg/api/app.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/settings/**/*.py : Settings use runtime-editable persistence with precedence: DB > env > YAML > code defaults. 8 namespaces with Fernet encryption for sensitive values.
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings
Applied to files:
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-24T19:31:05.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T19:31:05.964Z
Learning: Applies to src/synthorg/**/*.py : Always use event constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly and use `logger.info(EVENT, key=value)` in structured kwargs format—never `logger.info("msg %s", val)`
Applied to files:
- src/synthorg/observability/events/api.py
- src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/observability/events/api.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
src/synthorg/api/controllers/providers.py
🧬 Code graph analysis (9)
src/synthorg/api/controllers/__init__.py (1)
- src/synthorg/api/controllers/activities.py (1): ActivityController (39-136)
tests/unit/api/conftest.py (2)
- src/synthorg/providers/health.py (1): ProviderHealthTracker (110-220)
- src/synthorg/api/state.py (1): provider_health_tracker (349-354)
src/synthorg/api/state.py (2)
- src/synthorg/providers/health.py (1): ProviderHealthTracker (110-220)
- tests/unit/api/conftest.py (1): provider_health_tracker (208-210)
src/synthorg/api/app.py (3)
- src/synthorg/providers/health.py (1): ProviderHealthTracker (110-220)
- src/synthorg/api/state.py (1): provider_health_tracker (349-354)
- tests/unit/api/conftest.py (1): provider_health_tracker (208-210)
src/synthorg/api/controllers/providers.py (3)
- src/synthorg/providers/health.py (3): ProviderHealthSummary (60-98), get_summary (132-177), health_status (96-98)
- src/synthorg/api/state.py (3): AppState (43-528), config_resolver (331-333), provider_health_tracker (349-354)
- src/synthorg/settings/resolver.py (1): get_provider_configs (453-472)
tests/unit/api/controllers/test_activities.py (4)
- tests/unit/api/fakes.py (1): FakePersistenceBackend (420-523)
- tests/unit/api/conftest.py (8): test_client (224-261), fake_persistence (168-171), performance_tracker (197-198), _make_test_auth_service (81-83), _seed_test_users (264-295), make_auth_headers (125-151), auth_service (163-164), cost_tracker (182-183)
- src/synthorg/api/state.py (6): performance_tracker (298-303), auth_service (189-191), settings_service (316-318), persistence (169-171), message_bus (179-181), cost_tracker (184-186)
- src/synthorg/api/app.py (1): create_app (408-643)
tests/unit/api/controllers/test_departments_health.py (5)
- src/synthorg/budget/tracker.py (1): CostTracker (68-464)
- src/synthorg/hr/performance/models.py (1): TaskMetricRecord (25-67)
- src/synthorg/hr/performance/tracker.py (1): record_task_metric (136-159)
- tests/unit/api/conftest.py (4): make_auth_headers (125-151), _make_test_auth_service (81-83), _seed_test_users (264-295), auth_service (163-164)
- src/synthorg/api/controllers/departments.py (2): _mean_optional (200-205), DepartmentHealth (40-93)
src/synthorg/api/controllers/departments.py (1)
- src/synthorg/budget/tracker.py (1): get_records (194-234)
tests/unit/api/controllers/test_provider_health.py (3)
- tests/unit/api/fakes.py (1): FakePersistenceBackend (420-523)
- tests/unit/api/conftest.py (5): make_auth_headers (125-151), provider_health_tracker (208-210), _make_test_auth_service (81-83), _seed_test_users (264-295), auth_service (163-164)
- src/synthorg/api/state.py (3): provider_health_tracker (349-354), auth_service (189-191), persistence (169-171)
🔇 Additional comments (26)
src/synthorg/observability/events/api.py (1)
69-71: LGTM! The new event constants follow the established naming convention and are correctly typed with `Final[str]`. They integrate well with the existing API event constants.
src/synthorg/api/controllers/departments.py (4)
40-94: LGTM! The `DepartmentHealth` model is well-designed:
- Frozen Pydantic model with `allow_inf_nan=False` for numeric safety.
- Uses `NotBlankStr` for the identifier field as per guidelines.
- Model validator ensures `active_agent_count <= agent_count`.
- Clear Google-style docstring documenting all attributes.
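The invariants listed for `DepartmentHealth` can be illustrated with a standard-library stand-in. This is not the PR's model: it swaps Pydantic for a frozen `dataclass` with `__post_init__` checks, and the field names `department_name` and `cost_usd_24h` are hypothetical.

```python
import math
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class DepartmentHealthSketch:
    """Stdlib stand-in for the PR's frozen Pydantic DepartmentHealth model."""

    department_name: str
    agent_count: int
    active_agent_count: int
    cost_usd_24h: float = 0.0  # hypothetical numeric field

    def __post_init__(self) -> None:
        # Like NotBlankStr: reject empty or whitespace-only identifiers.
        if not self.department_name.strip():
            raise ValueError("department_name must not be blank")
        # Like allow_inf_nan=False: reject NaN and +/-inf.
        if not math.isfinite(self.cost_usd_24h):
            raise ValueError("cost_usd_24h must be finite")
        # Like the review's model validator.
        if self.active_agent_count > self.agent_count:
            raise ValueError("active_agent_count cannot exceed agent_count")


health = DepartmentHealthSketch("engineering", agent_count=5, active_agent_count=3)
try:
    health.agent_count = 9  # rejected: the instance is frozen
except FrozenInstanceError:
    pass
```

In Pydantic v2 the first two checks come from `ConfigDict(frozen=True, allow_inf_nan=False)` and the `NotBlankStr` type, and the count check from a `model_validator`; the dataclass version only shows which invariants are enforced.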
107-162: LGTM! The helper functions follow best practices:
- Use `asyncio.TaskGroup` for parallel fan-out as per coding guidelines.
- Properly re-raise `MemoryError` and `RecursionError` before catching generic exceptions.
- Log warnings with context on failures before falling back to safe defaults.
- Use `math.fsum` for numerically stable summation.
263-333: LGTM! The `_assemble_department_health` function demonstrates excellent structured concurrency:
- Phase 1 uses `TaskGroup` for parallel queries (active count, cost records, agent IDs).
- Properly handles `ExceptionGroup` by extracting and re-raising fatal exceptions (`MemoryError`, `RecursionError`).
- Gracefully degrades to minimal health response on non-fatal failures.
- Phase 2 sequential call for snapshots correctly depends on resolved agent IDs.
395-448: LGTM! The endpoint implementation is clean and follows all guidelines:
- Properly validates department existence with a clear 404 response.
- Uses the event constant `API_DEPARTMENT_HEALTH_QUERIED` for structured logging.
- Logs at DEBUG level as appropriate for routine queries.
- Returns typed `ApiResponse[DepartmentHealth]`.
src/synthorg/api/controllers/activities.py (1)
46-136: LGTM! The activity feed endpoint is well-implemented:
- Uses `IntEnum` for constrained time windows with clear semantic values.
- Safety cap (`_MAX_LIFECYCLE_EVENTS = 10_000`) prevents unbounded queries.
- Graceful degradation when `performance_tracker` fails (falls back to an empty tuple).
- Properly re-raises fatal exceptions (`MemoryError`, `RecursionError`) before catching generic exceptions.
- Uses structured logging with event constants.
src/synthorg/api/controllers/__init__.py (1)
6-6: LGTM! The `ActivityController` is correctly imported, registered in `ALL_CONTROLLERS`, and exported in `__all__`. The alphabetical ordering in `__all__` is maintained.
Also applies to: 32-32, 54-54
src/synthorg/providers/health.py (1)
33-108: LGTM! The health models are well-designed:
- Uses `@computed_field` for `health_status` as per coding guidelines (derived from `error_rate_percent_24h`).
- Both models are frozen with `allow_inf_nan=False` for numeric safety.
- Uses `NotBlankStr` for the provider name identifier.
- Clear threshold constants (`_DEGRADED_THRESHOLD`, `_DOWN_THRESHOLD`) with a helper function for testability.
tests/unit/providers/test_health.py (1)
1-374: LGTM! Comprehensive test coverage including:
- Model validation (frozen behavior, defaults, field constraints).
- Tracker aggregation logic (single/multiple records, multiple providers).
- 24-hour window filtering for stale records.
- Boundary value testing for health status thresholds using `@pytest.mark.parametrize`.
- Concurrency safety test using `asyncio.TaskGroup`.
All tests use `@pytest.mark.unit` as required by guidelines.
src/synthorg/api/state.py (1)
348-359: LGTM! The `provider_health_tracker` wiring follows the established pattern used for other optional services:
- Property accessor uses `_require_service` for consistent 503 behavior when not configured.
- `has_provider_health_tracker` boolean accessor for availability checks.
- Properly added to `__slots__` and initialized in `__init__`.
src/synthorg/api/app.py (1)
424-447: LGTM! The `provider_health_tracker` parameter is correctly wired into the `create_app` factory:
- Optional parameter with `None` default, consistent with other service parameters.
- Docstring updated to document the new parameter.
- Correctly passed through to the `AppState` constructor.
Also applies to: 553-553
tests/unit/api/conftest.py (2)
207-210: LGTM! The new `provider_health_tracker` fixture follows the established pattern of other service fixtures in this file (e.g., `cost_tracker`, `performance_tracker`). Clean and consistent.
234-256: LGTM! The `test_client` fixture correctly accepts and wires the new `provider_health_tracker` dependency into `create_app`, matching the production app factory signature. This enables provider health endpoint tests to function properly.
src/synthorg/api/controllers/providers.py (1)
208-245: LGTM! The new `get_provider_health` endpoint is well-implemented:
- Validates provider existence before querying the health tracker (preventing phantom health data for non-existent providers)
- Uses the appropriate `require_read_access` guard for a read-only operation
- Follows observability patterns with structured logging using event constants
- Properly documents Args/Returns/Raises in Google-style docstring
tests/unit/api/controllers/test_activities.py (3)
59-74: LGTM! Good basic coverage for empty feed and authentication requirements. The `@pytest.mark.unit` marker is correctly applied at the class level.
76-164: LGTM! Good coverage of core functionality: lifecycle events, task metrics, filtering by type and agent ID. The async test methods correctly use `await` for persistence operations.
166-209: LGTM! Solid validation coverage for the `last_n_hours` parameter (default, valid values, invalid values) and pagination parameters.
tests/unit/api/controllers/test_departments_health.py (6)
32-90: LGTM! Well-structured helper functions that follow existing patterns in the test suite. The use of `test-provider` and `test-small-001` complies with vendor-agnostic test data requirements. Based on learnings: "Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names."
92-123: LGTM! The `_build_dept_client` helper correctly wires up all necessary dependencies for isolated department health tests. Allowing optional overrides for `cost_tracker`, `performance_tracker`, and `agent_registry` enables flexible test scenarios.
129-176: LGTM! Good coverage of edge cases: department not found (404), auth required (401), and empty department defaults. The empty department test correctly verifies all expected default values.
177-256: LGTM! Comprehensive integration test with agents (active/inactive), cost records, and performance data. Good assertions for aggregated values (agent counts, utilization percent, 7-day cost).
258-321: LGTM! Good coverage for cross-department isolation and `cost_trend` structure validation. The 7-bucket assertion (line 317) aligns with the endpoint's 7-day sparkline requirement.
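One way to picture the 7-bucket `cost_trend` is one bucket per UTC day ending today, each summed with `math.fsum`. Daily UTC bucketing, the `(timestamp, cost)` input shape, timezone-aware timestamps, and oldest-first ordering are all assumptions; the PR's actual bucketing rules are not shown in this thread.

```python
import math
from datetime import date, datetime, timedelta, timezone


def cost_trend_7d(
    records: list[tuple[datetime, float]], today: date
) -> list[tuple[date, float]]:
    """Return 7 (day, total_cost) buckets, oldest first, ending at today."""
    days = [today - timedelta(days=offset) for offset in range(6, -1, -1)]
    per_day: dict[date, list[float]] = {day: [] for day in days}
    for timestamp, cost in records:
        # Assumes timezone-aware timestamps; bucket by UTC calendar day.
        day = timestamp.astimezone(timezone.utc).date()
        if day in per_day:  # records outside the 7-day window are ignored
            per_day[day].append(cost)
    # Empty days still produce a bucket, so the sparkline always has 7 points.
    return [(day, math.fsum(per_day[day])) for day in days]
```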
326-365: LGTM! Thorough unit tests for the `_mean_optional` helper covering all edge cases (empty, all-None, mixed, all-present). The model validation test correctly verifies that `active_agent_count > agent_count` is rejected.
tests/unit/api/controllers/test_provider_health.py (3)
26-83: LGTM! Well-structured test helpers. `_make_health_record` provides sensible defaults, and `_build_provider_client` correctly sets up a provider configuration so the endpoint's existence check passes. Using `test-provider` and `test-small-001` complies with vendor-agnostic naming. Based on learnings: "Tests must use `test-provider`, `test-small-001`, etc. instead of real vendor names."
86-123: LGTM! Good coverage of basic scenarios: provider not found (404), auth required (401), and empty health data returning default "up" status with null/zero metrics.
125-209: LGTM! Excellent coverage of health status derivation scenarios:
- Healthy provider (0% error rate → "up")
- Degraded provider (20% error rate → "degraded")
- Down provider (100% error rate → "down")
The async test methods correctly use `await` for recording health records before exercising the endpoint.
# Make performance_tracker property raise
with patch.object(
    type(fake_persistence),
    "lifecycle_events",
    new_callable=PropertyMock,
    return_value=fake_persistence.lifecycle_events,
):
🧹 Nitpick | 🔵 Trivial
Redundant patch block.
This `patch.object` context manager patches the `lifecycle_events` property to return its own current value, which is effectively a no-op. The actual failure injection happens at line 241 with the `get_task_metrics` monkeypatch. Consider removing this redundant patch block to improve test clarity.
♻️ Suggested simplification
- # Make performance_tracker property raise
- with patch.object(
- type(fake_persistence),
- "lifecycle_events",
- new_callable=PropertyMock,
- return_value=fake_persistence.lifecycle_events,
- ):
- # The test_client's performance_tracker is real but we can
- # test fallback by having get_task_metrics raise
- from synthorg.hr.performance.tracker import PerformanceTracker
+ # The test_client's performance_tracker is real but we can
+ # test fallback by having get_task_metrics raise
+ from synthorg.hr.performance.tracker import PerformanceTracker
Then remove the corresponding indentation from the rest of the test body.
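Why the removed block is a no-op can be checked in isolation: a `PropertyMock` whose `return_value` is the property's current value leaves every read unchanged, so the fallback path has to be driven by making a dependency fail. `FakeBackend` and its methods are simplified hypothetical stand-ins for the test fixtures, and `unittest.mock.patch.object` stands in for pytest's `monkeypatch`.

```python
from unittest.mock import PropertyMock, patch


class FakeBackend:
    """Hypothetical stand-in for the test's fake persistence backend."""

    def __init__(self) -> None:
        self._events = ["event-1"]

    @property
    def lifecycle_events(self) -> list[str]:
        return self._events

    def get_task_metrics(self) -> list[str]:
        return ["metric-1"]


backend = FakeBackend()
original = backend.lifecycle_events

# Redundant: the patched property returns exactly what it already returned.
with patch.object(
    type(backend),
    "lifecycle_events",
    new_callable=PropertyMock,
    return_value=original,
):
    assert backend.lifecycle_events is original  # nothing observable changed

# Effective: make the dependency itself fail to drive the fallback path.
with patch.object(
    backend, "get_task_metrics", side_effect=RuntimeError("injected failure")
):
    try:
        backend.get_task_metrics()
    except RuntimeError as exc:
        print(f"fallback path triggered: {exc}")
```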
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/api/controllers/test_activities.py` around lines 224 - 230, Remove
the redundant patching of the lifecycle_events property on fake_persistence (the
with patch.object(..., "lifecycle_events", new_callable=PropertyMock,
return_value=fake_persistence.lifecycle_events) block) since it does nothing;
delete that context manager and its wrapper indentation, leaving the rest of the
test body (including the get_task_metrics monkeypatch and subsequent assertions)
at the top level within the test so the failure injection via get_task_metrics
still runs as intended.
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage diff:

| | main | #818 | +/- |
|---|---:|---:|---:|
| Coverage | 92.26% | 92.30% | +0.04% |
| Files | 578 | 580 | +2 |
| Lines | 30120 | 30343 | +223 |
| Branches | 2910 | 2924 | +14 |
| Hits | 27789 | 28009 | +220 |
| Misses | 1842 | 1846 | +4 |
| Partials | 489 | 488 | -1 |
Code Review
This pull request introduces significant new API functionality by adding an org-wide activity feed, department health aggregation, and provider health tracking. A new ActivityController provides a paginated activity feed merging lifecycle events and task metrics, with filtering options. The DepartmentController is extended with a /health endpoint that aggregates agent count, utilization, cost, performance, and collaboration data for a given department. A new ProviderHealthTracker and associated models are implemented to track provider call outcomes and generate health summaries, exposed via a new /providers/{name}/health endpoint in the ProviderController. Corresponding imports, AppState modifications, and API event constants have been added. Comprehensive unit tests have been included for all new controllers and the ProviderHealthTracker.
Review comments highlight several instances of deprecated Python 3.x except syntax (except MemoryError, RecursionError:), which should be updated to except (MemoryError, RecursionError):. Additionally, an efficiency improvement is suggested for department lookup in get_department_health by using a dictionary for O(1) access instead of a linear scan. Finally, the use of type: ignore[prop-decorator] in ProviderHealthSummary should be addressed either by resolving the underlying type incompatibility or providing a more detailed explanation.
    since=since,
    until=now,
)
except MemoryError, RecursionError:
The syntax except MemoryError, RecursionError: is deprecated in Python 3.x. It should be updated to except (MemoryError, RecursionError): for modern Python compatibility and to avoid potential syntax errors in future versions.
- except MemoryError, RecursionError:
+ except (MemoryError, RecursionError):
    active = await app_state.agent_registry.list_active()
    return sum(1 for a in active if str(a.name).lower() in dept_agent_names)
except MemoryError, RecursionError:
    raise
    results[idx] = await app_state.performance_tracker.get_snapshot(
        aid,
    )
except MemoryError, RecursionError:
    identity = await app_state.agent_registry.get_by_name(name)
    if identity is not None:
        results[idx] = str(identity.id)
except MemoryError, RecursionError:
    _resolve_active_count(app_state, name_set),
)
t_cost = tg.create_task(
    app_state.cost_tracker.get_records(
The syntax except ExceptionGroup as eg: is used here, but the nested except MemoryError, RecursionError: is deprecated. For consistency and modern Python practices, this should be updated.
except ExceptionGroup as eg:
    fatal = eg.subgroup((MemoryError, RecursionError))
    if fatal is not None:
        raise fatal from eg
    logger.warning(
        API_REQUEST_ERROR,
        endpoint="departments.health",
        department=dept_name,
        error_count=len(eg.exceptions),
        exc_info=True,
    )
    return _build_degraded_health(dept_name, agent_count, now)

found = False
for dept in departments:
    if dept.name == name:
        found = True
        break
The current implementation iterates through all departments to find a match. Since department names are validated to be unique (as per RootConfig._validate_unique_department_names), converting the list of departments to a dictionary or a set for O(1) lookup would be more efficient, especially for a large number of departments. This avoids a linear scan for each request.
departments_by_name = {dept.name: dept for dept in departments}
if name not in departments_by_name:

    description="Total calls in the last 24h",
)

@computed_field  # type: ignore[prop-decorator]
The type: ignore[prop-decorator] comment suppresses a type checking error related to the @computed_field decorator. While it resolves the immediate type error, it's generally better to address the root cause of the type incompatibility if possible, or provide a more detailed explanation for the suppression if it's intentional and unavoidable.
Add three new API endpoints for the v0.5.0 dashboard:
- GET /departments/{name}/health -- aggregates agent count,
utilization, cost 7d sparkline, performance and collaboration scores
- GET /providers/{name}/health -- returns health status (up/degraded/down),
error rate, avg response time, and call count from ProviderHealthTracker
- GET /activities -- paginated org-wide activity feed (REST fallback for
WebSocket) with type, agent_id, and last_n_hours filters
Includes ProviderHealthTracker service (in-memory, append-only),
39 unit tests, and OpenAPI schema update.
Closes #773
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Convert ProviderHealthSummary.health_status to @computed_field
- Add __slots__ to ProviderHealthTracker
- Refactor _assemble_department_health: extract helpers, use TaskGroup for _resolve_snapshots/_resolve_agent_ids, parallelize agent ID resolution with cost/active queries
- Add MemoryError/RecursionError re-raise guard in activities.py
- Re-raise fatal exceptions from ExceptionGroup via subgroup()
- Fix tuple[object, ...] type annotation to tuple[CostRecord, ...]
- Add active_agent_count <= agent_count model validator
- Fix misleading "parallel" comment in activities controller
- Add tests: get_all_summaries, computed_field, boundary values, _mean_optional, DepartmentHealth validator, graceful degradation

Pre-reviewed by 6 agents, 15 findings addressed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…abbit + Gemini)

Source fixes:
- Add prune_expired() to ProviderHealthTracker to bound memory growth
- Extract _aggregate_records() to deduplicate get_summary/get_all_summaries
- Add cross-field validator: error_message must be None when success=True
- Reorder ProviderHealthSummary docstring (computed_field after stored fields)
- Convert DepartmentHealth.utilization_percent to @computed_field
- Refactor _resolve_active_count to use list_by_department (O(dept) vs O(org))
- Add ServiceUnavailableError re-raise in activities + departments helpers
- Extract _build_health_from_data to reduce _assemble_department_health < 50 lines
- Expand _assemble_department_health docstring with 2-phase description
- Fix "silently skipped" docstrings (actually logs WARNING)
- Add degradation note to list_activities docstring
- Remove unnecessary int() cast on ActivityWindowHours
- Remove redundant "what" comments in activities controller
- Use dict for O(1) department existence check

Test improvements:
- Replace pytest.raises(Exception) with ValidationError for frozen model tests
- Remove redundant no-op patch.object in graceful degradation test
- Convert test_last_n_hours_valid_values to @pytest.mark.parametrize
- Add assertions for avg_performance_score/collaboration_score
- Add ExceptionGroup fallback path test (degraded health on cost failure)
- Add utilization_percent computed_field tests
- Add cross-field validator tests (success/error_message consistency)
- Add prune_expired tests (removes old, empty, nothing expired)

Doc updates:
- CLAUDE.md: add "health tracking" to providers/ package description
- page-structure.md: add ActivityController to Controller-to-Page Map
- page-structure.md: add new endpoints to Dashboard, Org Chart, Providers
- operations.md: add health endpoint to Provider Management section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
03608ff to 4cd03de
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/api/app.py`:
- Line 424: The create_app(...) factory makes its provider_health_tracker
parameter optional but forwards it into AppState(provider_health_tracker=...)
while AppState.provider_health_tracker raises 503 if unset, so ensure a default
tracker is auto-wired: in create_app (the function that accepts
provider_health_tracker: ProviderHealthTracker | None) detect when
provider_health_tracker is None and instantiate a sensible default
ProviderHealthTracker implementation (or a no-op/healthy stub) before
constructing AppState, so all callsites (including server.py's
create_app(config=config)) get a non-None tracker and
/api/v1/providers/{name}/health always works.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5adf0beb-044e-42f6-8c2f-456ed2170d8f
📒 Files selected for processing (16)
CLAUDE.md
docs/design/operations.md
docs/design/page-structure.md
src/synthorg/api/app.py
src/synthorg/api/controllers/__init__.py
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
src/synthorg/api/controllers/providers.py
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/providers/health.py
tests/unit/api/conftest.py
tests/unit/api/controllers/test_activities.py
tests/unit/api/controllers/test_departments_health.py
tests/unit/api/controllers/test_provider_health.py
tests/unit/providers/test_health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (8)
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation files (`docs/`) use Markdown and are built with Zensical. Design spec pages are in `docs/design/` (10 pages).
Files:
docs/design/operations.md
docs/design/page-structure.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Python version must be 3.14+ (PEP 649 native lazy annotations)
Do NOT use `from __future__ import annotations`; Python 3.14 has PEP 649
Use PEP 758 except syntax: `except A, B:` (no parentheses); ruff enforces this on Python 3.14
All public functions must have type hints; mypy strict mode is required
Use Google-style docstrings on all public classes and functions; enforced by ruff D rules
Create new objects instead of mutating existing ones; practice immutability. For non-Pydantic internal collections, use `copy.deepcopy()` at construction and `MappingProxyType` wrapping for read-only enforcement.
Use frozen Pydantic models for config/identity; separate mutable-via-copy models using `model_copy(update=...)` for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `@computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields instead of manual whitespace validators.
Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Line length must be 88 characters (enforced by ruff)
Functions must be < 50 lines and files < 800 lines
Handle errors explicitly; never silently swallow exceptions
Validate at system boundaries: user input, external APIs, and config files
Always use the variable name `logger` (not `_logger`, not `log`)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO level
Use DEBUG level for object creation, internal flow, entry/exit of key functions
Maintain 80% minimum code coverage (enforced in CI)
Files:
src/synthorg/api/controllers/__init__.py
src/synthorg/api/app.py
tests/unit/api/conftest.py
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
src/synthorg/api/controllers/activities.py
tests/unit/api/controllers/test_departments_health.py
tests/unit/api/controllers/test_activities.py
tests/unit/api/controllers/test_provider_health.py
tests/unit/providers/test_health.py
src/synthorg/api/controllers/departments.py
src/synthorg/providers/health.py
src/synthorg/**/!(setup|sinks).py
📄 CodeRabbit inference engine (CLAUDE.md)
Every module with business logic MUST import and use `get_logger(__name__)` from `synthorg.observability`; never use `import logging` or `print()`
Files:
src/synthorg/api/controllers/__init__.py
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
src/synthorg/providers/health.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Use constants from domain-specific modules under `synthorg.observability.events` for event names (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Use structured logging: always `logger.info(EVENT, key=value)`. Never use format strings like `logger.info("msg %s", val)`
Files:
src/synthorg/api/controllers/__init__.py
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
src/synthorg/providers/health.py
**/{src,tests}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: `example-provider`, `example-large-001`, `test-provider`, etc.
Files:
src/synthorg/api/controllers/__init__.py
src/synthorg/api/app.py
tests/unit/api/conftest.py
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
src/synthorg/api/controllers/activities.py
tests/unit/api/controllers/test_departments_health.py
tests/unit/api/controllers/test_activities.py
tests/unit/api/controllers/test_provider_health.py
tests/unit/providers/test_health.py
src/synthorg/api/controllers/departments.py
src/synthorg/providers/health.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers: `@pytest.mark.unit`, `@pytest.mark.integration`, `@pytest.mark.e2e`, `@pytest.mark.slow`
Async tests use `asyncio_mode = "auto"` in pytest configuration; no manual `@pytest.mark.asyncio` needed
Global test timeout is 30 seconds per test (configured in `pyproject.toml`). Do not add per-file `pytest.mark.timeout(30)` markers; non-default overrides like `timeout(60)` are allowed.
Prefer `@pytest.mark.parametrize` for testing similar cases
For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic. For tasks that must block indefinitely until cancelled, use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)`.
Use Hypothesis for property-based testing in Python (`@given` + `@settings`). Profiles: `ci` (50 examples, default) and `dev` (1000 examples), controlled via the `HYPOTHESIS_PROFILE` env var.
Files:
tests/unit/api/conftest.py
tests/unit/api/controllers/test_departments_health.py
tests/unit/api/controllers/test_activities.py
tests/unit/api/controllers/test_provider_health.py
tests/unit/providers/test_health.py
src/synthorg/providers/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/providers/**/*.py: All provider calls go through `BaseCompletionProvider`, which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code.
Use `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Rate limiter respects `RateLimitError.retry_after` from providers and automatically pauses future requests
Files:
src/synthorg/providers/health.py
src/synthorg/{engine,providers}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
`RetryExhaustedError` signals that all retries failed; the engine layer catches this to trigger fallback chains
Files:
src/synthorg/providers/health.py
🧠 Learnings (43)
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.
Applied to files:
CLAUDE.md
docs/design/operations.md
src/synthorg/api/app.py
tests/unit/api/conftest.py
docs/design/page-structure.md
src/synthorg/api/state.py
src/synthorg/api/controllers/providers.py
src/synthorg/providers/health.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
CLAUDE.md
src/synthorg/api/controllers/__init__.py
src/synthorg/api/app.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Settings: Runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces), Fernet encryption for sensitive values, config bridge, ConfigResolver (typed composed reads for controllers), validation, registry, change notifications via message bus. Per-namespace setting definitions in definitions/ submodule (api, company, providers, memory, budget, security, coordination, observability, backup).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/security/**/*.py : Security package (security/): SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/security/**/*.py : Security module includes SecOps agent, rule engine (soft-allow/hard-deny), audit log, output scanner, risk classifier, autonomy levels (4 strategies), timeout policies.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Security: SecOps agent, rule engine (soft-allow/hard-deny, fail-closed), audit log, output scanner, output scan response policies (redact/withhold/log-only/autonomy-tiered), risk classifier, risk tier classifier, action type registry, ToolInvoker security integration, progressive trust (4 strategies: disabled/weighted/per-category/milestone), autonomy levels (presets, resolver, change strategy), timeout policies (park/resume).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/settings/**/*.py : Settings package (settings/): runtime-editable settings persistence (DB > env > YAML > code defaults), typed definitions (9 namespaces), Fernet encryption for sensitive values, config bridge (JSON serialization for Pydantic/collections), ConfigResolver (typed accessors), validation, registry, change notifications via message bus, SettingsSubscriber protocol, SettingsChangeDispatcher (polls `#settings` channel, routes to subscribers, restart_required filtering)
Applied to files:
CLAUDE.md
src/synthorg/api/app.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers
Applied to files:
src/synthorg/api/controllers/__init__.py
docs/design/page-structure.md
src/synthorg/api/state.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/api/**/*.py : Use Litestar for REST + WebSocket API. Controllers, guards, channels, JWT + API key + WS ticket auth, RFC 9457 structured errors.
Applied to files:
src/synthorg/api/controllers/__init__.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/api/**/*.py : REST API: Litestar framework, controllers with guards, channels for WebSocket, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint. RFC 9457 structured errors (ErrorCategory, ErrorCode, ErrorDetail, ProblemDetail, CATEGORY_TITLES, category_title, category_type_uri, content negotiation).
Applied to files:
src/synthorg/api/controllers/__init__.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/persistence/**/*.py : Persistence uses pluggable PersistenceBackend protocol. SQLite is the initial backend. Settings use SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Persistence backend: pluggable PersistenceBackend protocol in `src/synthorg/persistence/`, SQLite initial, SettingsRepository (namespaced settings CRUD).
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-24T20:58:10.766Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T20:58:10.766Z
Learning: Applies to src/synthorg/providers/**/*.py : Use `RetryConfig` and `RateLimiterConfig` per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
src/synthorg/api/app.py
src/synthorg/api/state.py
src/synthorg/api/controllers/providers.py
src/synthorg/providers/health.py
📚 Learning: 2026-03-16T19:13:36.562Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:36.562Z
Learning: Applies to src/synthorg/providers/**/*.py : RetryConfig and RateLimiterConfig are set per-provider in ProviderConfig.
Applied to files:
src/synthorg/api/app.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
src/synthorg/api/app.py
src/synthorg/api/controllers/providers.py
src/synthorg/providers/health.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/settings/**/*.py : Settings use runtime-editable persistence with precedence: DB > env > YAML > code defaults. 8 namespaces with Fernet encryption for sensitive values.
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.
Applied to files:
src/synthorg/api/app.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data.
Applied to files:
tests/unit/api/conftest.py
tests/unit/api/controllers/test_provider_health.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
tests/unit/api/conftest.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations
Applied to files:
docs/design/page-structure.md
📚 Learning: 2026-03-15T21:20:09.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:20:09.993Z
Learning: Applies to web/src/components/** : Vue components organized by feature (agents/, approvals/, budget/, common/, dashboard/, layout/, messages/, org-chart/, tasks/).
Applied to files:
docs/design/page-structure.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-24T20:58:10.766Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T20:58:10.766Z
Learning: Applies to src/synthorg/**/*.py : Use constants from domain-specific modules under `synthorg.observability.events` for event names (e.g., `API_REQUEST_STARTED` from `events.api`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly
Applied to files:
src/synthorg/api/state.py
src/synthorg/observability/events/api.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings
Applied to files:
src/synthorg/observability/events/api.py
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/api.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/observability/events/api.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to src/synthorg/providers/**/*.py : All provider calls go through BaseCompletionProvider which applies retry + rate limiting automatically. Never implement retry logic in driver subclasses or calling code — it's handled by the base class.
Applied to files:
src/synthorg/api/controllers/providers.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Use `except A, B:` syntax (no parentheses) for exception handling — PEP 758 exception syntax enforced by ruff on Python 3.14
Applied to files:
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to **/*.py : Use PEP 758 except syntax with `except A, B:` (no parentheses) for multiple exceptions—ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: use `except A, B:` (no parentheses) — ruff enforces this on Python 3.14.
Applied to files:
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses) — enforced by ruff on Python 3.14
Applied to files:
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
📚 Learning: 2026-03-24T20:58:10.766Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T20:58:10.766Z
Learning: Applies to **/*.py : Use PEP 758 except syntax: `except A, B:` (no parentheses)—ruff enforces this on Python 3.14
Applied to files:
src/synthorg/api/controllers/activities.py
src/synthorg/api/controllers/departments.py
🧬 Code graph analysis (9)
src/synthorg/api/controllers/__init__.py (1)
src/synthorg/api/controllers/activities.py (1)
ActivityController(40-139)
src/synthorg/api/app.py (3)
src/synthorg/providers/health.py (1)
ProviderHealthTracker(147-251)
src/synthorg/api/state.py (1)
provider_health_tracker(349-354)
tests/unit/api/conftest.py (1)
provider_health_tracker(208-210)
tests/unit/api/conftest.py (2)
src/synthorg/providers/health.py (1)
ProviderHealthTracker(147-251)
src/synthorg/api/state.py (1)
provider_health_tracker(349-354)
src/synthorg/api/state.py (2)
src/synthorg/providers/health.py (1)
ProviderHealthTracker(147-251)
tests/unit/api/conftest.py (1)
provider_health_tracker(208-210)
src/synthorg/api/controllers/providers.py (6)
src/synthorg/providers/health.py (3)
ProviderHealthSummary(76-115), get_summary(188-219), health_status(113-115)
src/synthorg/api/dto.py (1)
ApiResponse(130-158)
src/synthorg/api/state.py (2)
AppState(43-528), provider_health_tracker(349-354)
src/synthorg/settings/resolver.py (1)
get_provider_configs(453-472)
src/synthorg/api/errors.py (1)
NotFoundError(167-175)
tests/unit/api/conftest.py (1)
provider_health_tracker(208-210)
src/synthorg/api/controllers/activities.py (5)
src/synthorg/api/errors.py (1)
ServiceUnavailableError(222-231)
src/synthorg/api/guards.py (1)
require_read_access(121-145)
src/synthorg/api/state.py (1)
performance_tracker(298-303)
src/synthorg/hr/activity.py (2)
ActivityEvent(19-41), merge_activity_timeline(110-130)
src/synthorg/hr/performance/tracker.py (1)
get_task_metrics(374-400)
tests/unit/api/controllers/test_departments_health.py (6)
src/synthorg/budget/tracker.py (2)
CostTracker(68-464), get_records(194-234)
src/synthorg/hr/performance/tracker.py (2)
PerformanceTracker(51-506), record_task_metric(136-159)
tests/unit/api/conftest.py (6)
make_auth_headers(125-151), cost_tracker(182-183), performance_tracker(197-198), agent_registry(202-204), _seed_test_users(264-295), auth_service(163-164)
src/synthorg/api/app.py (1)
create_app(408-643)
src/synthorg/api/auth/service.py (1)
AuthService(36-251)
src/synthorg/api/controllers/departments.py (3)
_mean_optional(209-214), DepartmentHealth(41-98), utilization_percent(83-87)
tests/unit/api/controllers/test_provider_health.py (7)
src/synthorg/providers/health.py (3)
ProviderHealthRecord(41-73), ProviderHealthTracker(147-251), record(160-167)
src/synthorg/settings/registry.py (1)
get_registry(111-113)
src/synthorg/settings/service.py (1)
SettingsService(152-637)
tests/unit/api/fakes.py (1)
FakePersistenceBackend(420-523)
tests/unit/api/conftest.py (4)
make_auth_headers(125-151), fake_persistence(168-171), provider_health_tracker(208-210), _seed_test_users(264-295)
src/synthorg/api/state.py (3)
provider_health_tracker(349-354), persistence(169-171), message_bus(179-181)
src/synthorg/api/app.py (1)
create_app(408-643)
tests/unit/providers/test_health.py (1)
src/synthorg/providers/health.py (10)
ProviderHealthRecord(41-73), ProviderHealthStatus(33-38), ProviderHealthSummary(76-115), ProviderHealthTracker(147-251), record(160-167), health_status(113-115), get_summary(188-219), _derive_health_status(118-124), get_all_summaries(221-246), prune_expired(169-186)
🔇 Additional comments (43)
CLAUDE.md (1)
116-116: Provider package summary update looks good. The added “health tracking” note aligns the package overview with the new provider health feature.
src/synthorg/api/controllers/__init__.py (1)
6-6: Controller registration is correctly wired. `ActivityController` is imported, registered, and exported consistently.
Also applies to: 32-32, 54-54
docs/design/operations.md (1)
122-122: Provider health endpoint design note is clear and useful. The new line documents endpoint purpose and core health metrics succinctly.
tests/unit/api/conftest.py (1)
207-210: Test fixture wiring for provider health is solid. Injecting a fresh tracker per `test_client` keeps endpoint health tests isolated and repeatable.
Also applies to: 234-257
src/synthorg/observability/events/api.py (1)
69-71: New API event constants are consistent with the existing taxonomy. Names and values follow the established `api.<domain>.<action>` convention.
src/synthorg/api/state.py (1)
30-30: `AppState` service integration is consistent and well-structured. The new tracker slot, injection point, guarded accessor, and `has_*` property align with existing state-container patterns.
Also applies to: 75-76, 106-107, 126-126, 348-359
docs/design/page-structure.md (1)
26-26: Page-structure mapping updates are aligned with the new API surface. The added endpoints and `ActivityController` mapping keep the IA doc in sync with backend capabilities.
Also applies to: 35-35, 101-101, 310-310
src/synthorg/api/controllers/providers.py (1)
208-245: LGTM! The new `get_provider_health` endpoint is well-structured:
- Validates provider existence before querying health tracker
- Uses appropriate event constants (`API_RESOURCE_NOT_FOUND`, `API_PROVIDER_HEALTH_QUERIED`)
- Follows structured logging conventions with keyword arguments
- Consistent error handling pattern with other endpoints in the controller
tests/unit/api/controllers/test_activities.py (5)
1-57: LGTM! Test helpers and imports are well-structured. The `_make_lifecycle_event` and `_make_task_metric` helpers provide sensible defaults for test data creation.
59-100: LGTM! Good test coverage for basic scenarios: empty feed response, authentication enforcement, and lifecycle event ordering (most recent first).
101-165: LGTM! Filter tests properly validate both `type` and `agent_id` query parameters with appropriate assertions.
166-211: LGTM! Time window validation tests cover:
- Default 24h behavior
- Valid enum values (24, 48, 168)
- Invalid value rejection (400 response)
- Pagination metadata echoing
212-274: LGTM! The graceful degradation test properly:
- Seeds lifecycle events that should still be returned
- Mocks `get_task_metrics` to raise `RuntimeError`
- Verifies endpoint returns 200 with lifecycle events despite tracker failure
This aligns with the controller's exception handling that catches non-fatal errors and continues with an empty task metrics tuple.
src/synthorg/api/controllers/departments.py (5)
41-99: LGTM! The `DepartmentHealth` model is well-designed:
- Frozen for immutability
- `utilization_percent` as `@computed_field` avoids redundant storage
- `@model_validator` ensures `active_agent_count <= agent_count`
- Proper field constraints with `ge`/`le` validators
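The `utilization_percent` / `active_agent_count <= agent_count` contract reviewed above can be shown without Pydantic. This stdlib-only sketch uses a frozen `dataclass`:

- `__post_init__` stands in for `@model_validator`.
- A `property` stands in for `@computed_field`.

Rounding to two decimals is an assumption. The real `DepartmentHealth` carries more fields (costs, scores, trend).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepartmentHealthSketch:
    """Hypothetical, trimmed-down stand-in for DepartmentHealth."""

    department_name: str
    agent_count: int
    active_agent_count: int

    def __post_init__(self) -> None:
        # Mirrors the ge=0 field constraints and the cross-field validator.
        if self.agent_count < 0 or self.active_agent_count < 0:
            raise ValueError("agent counts must be non-negative")
        if self.active_agent_count > self.agent_count:
            raise ValueError("active_agent_count cannot exceed agent_count")

    @property
    def utilization_percent(self) -> float:
        # Derived on access rather than stored, like a computed field.
        if self.agent_count == 0:
            return 0.0  # empty department: report 0% instead of dividing by zero
        return round(100.0 * self.active_agent_count / self.agent_count, 2)
```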
112-137: LGTM! The `_resolve_active_count` helper correctly:
- Checks registry availability before querying
- Uses PEP 758 `except MemoryError, RecursionError:` syntax
- Re-raises fatal errors while gracefully handling others
- Logs warnings with structured context on failure
139-172: LGTM! The `_resolve_snapshots` helper uses `asyncio.TaskGroup` for parallel fan-out as required by coding guidelines. It properly handles individual failures without aborting other fetches, and re-raises `ServiceUnavailableError` to propagate 503s.
302-368: LGTM! The `_assemble_department_health` function demonstrates proper structured concurrency:
- Phase 1 parallelizes independent queries via `TaskGroup`
- `ExceptionGroup` handling correctly filters fatal errors and returns degraded health for recoverable failures
- Phase 2 depends on Phase 1 results (agent IDs), maintaining correct ordering
429-479: LGTM! The `get_department_health` endpoint:
- Validates department existence before aggregation
- Filters agents by department correctly
- Logs debug event with structured metadata
- Returns proper `ApiResponse[DepartmentHealth]` envelope
src/synthorg/api/controllers/activities.py (3)
32-45: LGTM! `ActivityWindowHours` IntEnum provides clear, validated time window options. Controller-level guards ensure read access for all endpoints.
47-72: LGTM! The endpoint signature is well-designed:
- Clear parameter annotations with `Annotated` and `Parameter`
- `event_type` aliased to `type` query param for cleaner API
- `last_n_hours` uses IntEnum for validation with descriptive error on invalid values
91-139: LGTM! The implementation correctly:
- Fetches lifecycle events asynchronously with a safety cap (`_MAX_LIFECYCLE_EVENTS`)
- Handles performance tracker failures gracefully (re-raises fatal/503 errors, catches others)
- Uses PEP 758 `except MemoryError, RecursionError:` syntax
- Merges and optionally filters the timeline before pagination
- Logs structured debug event with all relevant filter context
tests/unit/api/controllers/test_departments_health.py (6)
1-90: LGTM! Well-structured test helpers with generic provider names (`test-provider`, `test-small-001`) as required by coding guidelines. The `_build_dept_client` factory correctly wires all dependencies for isolated testing.
129-175: LGTM! Good coverage of edge cases:
- 404 for non-existent department
- 401 for invalid auth
- Empty department returns zeroed metrics with `None` for optional scores
177-261: LGTM! The full scenario test (`test_with_agents_and_data`) thoroughly validates:
- Agent count and active count from registry
- Utilization percent calculation
- Cost aggregation from cost tracker
- Performance/collaboration score presence
- Proper coordination between multiple services
262-325: LGTM! The department isolation test (`test_other_department_agents_excluded`) and sparkline structure test (`test_cost_trend_is_daily_sparkline`) verify important behaviors:
- Agents from other departments are excluded from health metrics
- Cost trend contains exactly 7 daily buckets with expected fields
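A 7-bucket daily sparkline like the one `test_cost_trend_is_daily_sparkline` checks can be built in two steps. First, pre-seed one zero bucket per day; then sum records into them, so days without spend still appear. The sketch below is minimal. The `(timestamp, cost)` record shape and the `date`/`cost` keys are assumptions, not the endpoint's actual schema.

```python
from collections.abc import Iterable
from datetime import date, datetime, timedelta


def daily_cost_trend(
    records: Iterable[tuple[datetime, float]],
    *,
    today: date,
    days: int = 7,
) -> list[dict[str, object]]:
    """Return `days` buckets, oldest first, each summing that day's cost."""
    # Seed every day in the window so empty days show up as 0.0.
    buckets = {today - timedelta(days=back): 0.0 for back in range(days - 1, -1, -1)}
    for timestamp, cost in records:
        day = timestamp.date()
        if day in buckets:  # ignore records outside the window
            buckets[day] += cost
    return [{"date": day.isoformat(), "cost": total} for day, total in buckets.items()]
```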
327-393: LGTM! Unit tests for internal helpers and model constraints:
- `_mean_optional` edge cases (empty, all None, mixed, all present)
- `DepartmentHealth` validation rejects `active_agent_count > agent_count`
- `utilization_percent` computed correctly including zero-agent case
395-435: LGTM! The degradation test properly validates that when `CostTracker.get_records` raises an exception, the endpoint still returns 200 with degraded (zeroed) metrics rather than failing entirely.
tests/unit/api/controllers/test_provider_health.py (3)
1-84: LGTM! Test setup correctly uses generic provider names (`test-provider`, `test-small-001`) as required by coding guidelines. The `_build_provider_client` factory properly configures the app with a provider and optionally injected health tracker.
Based on learnings: "Tests must use test-provider, test-small-001, etc. for vendor-agnostic test data."
86-153: LGTM! Tests cover important scenarios:
- Provider not found returns 404
- Invalid auth returns 401
- Empty tracker returns default "up" status with `None`/zero metrics
- Healthy provider (all success) returns "up" with correct call count
155-209: LGTM! Threshold boundary tests validate:
- 20% error rate (2/10 failures) → "degraded" status
- 100% error rate (all failures) → "down" status
These align with the implementation thresholds (`_DEGRADED_THRESHOLD = 10.0`, `_DOWN_THRESHOLD = 50.0`).
tests/unit/providers/test_health.py (7)
1-62: LGTM! Model tests validate:
- Frozen immutability raises `ValidationError` on mutation
- Successful record has `None` `error_message`
- Error record can have `error_message`
- `response_time_ms` rejects negative values
64-77: LGTM! `ProviderHealthSummary` tests verify default values and frozen immutability.
83-156: LGTM! Tracker tests cover key scenarios:
- Empty summary returns defaults
- Single success/error record affects metrics correctly
- Degraded status at 20% error rate
- Down status at 50% error rate
158-251: LGTM! Advanced tracker tests validate:
- UP status with 0% error rate
- 24h window filtering excludes old records
- Provider isolation (each provider tracked independently)
- Average response time calculation
- `last_check_timestamp` is the most recent record
253-272: LGTM! Concurrency test uses `asyncio.TaskGroup` to issue 4 batches of 50 records concurrently, verifying all 200 records are correctly stored without corruption.
278-300: LGTM! Excellent boundary value testing for `_derive_health_status` using `@pytest.mark.parametrize`:
- 0.0, 9.99 → UP
- 10.0, 49.99 → DEGRADED
- 50.0, 100.0 → DOWN
This validates the exact threshold boundaries.
306-431: LGTM! Comprehensive coverage of:
- `get_all_summaries`: empty, single provider, multiple providers, excludes old records
- `health_status` computed field derivation
- Cross-field validator: success+error_message rejected, failure±error_message allowed
- `prune_expired`: removes old records, handles empty tracker, no-op when nothing expired
src/synthorg/providers/health.py (6)
1-31: LGTM! Module setup is correct:
- Clear docstring explaining purpose
- Appropriate imports for async, datetime, and Pydantic
- Well-defined constants for window size and status thresholds
- Logger imported from observability (though logging occurs at API layer)
33-73: LGTM! `ProviderHealthRecord` is well-designed:
- Frozen for immutability
- `NotBlankStr` for provider_name ensures no empty/whitespace names
- `AwareDatetime` ensures timezone-aware timestamps
- `response_time_ms` has `ge=0.0` constraint
- Cross-field validator enforces `error_message is None` when `success is True`
76-124: LGTM! `ProviderHealthSummary` correctly uses:
- `@computed_field` for `health_status` derived from error rate (avoids redundant storage)
- Sensible defaults (status defaults to "up" when no errors)
- Proper constraints on numeric fields
The `_derive_health_status` function implements clear threshold logic.
127-144: LGTM! `_aggregate_records` correctly:
- Calculates error rate as percentage rounded to 2 decimals
- Uses `math.fsum` for numerically stable average response time
- Extracts most recent timestamp via `max()`
- Returns immutable `ProviderHealthSummary`
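147-186: LGTM! `ProviderHealthTracker` initialization and mutation methods:
- Uses `__slots__` for memory efficiency
- `asyncio.Lock` serializes concurrent access from coroutines on the event loop (it is not a thread lock, so it does not by itself make the tracker thread-safe)
- `prune_expired` correctly removes records older than the 24h window (addresses previous review comment)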
188-251: LGTM! Query methods are well-implemented:
- `get_summary` filters by provider and 24h window, returns defaults for empty results
- `get_all_summaries` groups by provider, excludes old records, returns sorted dict
- `_snapshot` provides immutable copy under lock for safe iteration
The implementation follows the append-only pattern referenced in the docstring, consistent with `CostTracker`.
```python
    performance_tracker: PerformanceTracker | None = None,
    settings_service: SettingsService | None = None,
    provider_registry: ProviderRegistry | None = None,
    provider_health_tracker: ProviderHealthTracker | None = None,
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -u
echo "== create_app callsites =="
rg -nP --type=py -C2 '\bcreate_app\s*\('
echo
echo "== explicit provider_health_tracker argument at callsites =="
rg -nP --type=py -C2 'provider_health_tracker\s*='
echo
echo "== default tracker construction inside app factory =="
rg -nP --type=py -C3 'ProviderHealthTracker\s*\(' src/synthorg/api/app.py || true
```
Repository: Aureliolo/synthorg
Length of output: 20754
🏁 Script executed:
```bash
# Inspect the create_app function signature and implementation
sed -n '408,560p' src/synthorg/api/app.py | head -100
```
Repository: Aureliolo/synthorg
Length of output: 4201
🏁 Script executed:
```bash
# Check if there's any auto-wiring or default construction for provider_health_tracker
rg -A10 'def create_app' src/synthorg/api/app.py | head -50
```
Repository: Aureliolo/synthorg
Length of output: 525
🏁 Script executed:
```bash
# Verify the production entry point in server.py
cat -n src/synthorg/api/server.py | sed -n '20,55p'
```
Repository: Aureliolo/synthorg
Length of output: 1299
🏁 Script executed:
```bash
# Get the full auto-wiring logic and AppState construction
sed -n '460,560p' src/synthorg/api/app.py
```
Repository: Aureliolo/synthorg
Length of output: 3912
🏁 Script executed:
```bash
# Confirm that AppState.provider_health_tracker raises 503 when unset
grep -n 'provider_health_tracker' src/synthorg/api/state.py | head -20
```
Repository: Aureliolo/synthorg
Length of output: 516
🏁 Script executed:
```bash
# Get the property implementation to confirm 503 behavior
sed -n '1,200p' src/synthorg/api/state.py | grep -A5 'provider_health_tracker'
```
Repository: Aureliolo/synthorg
Length of output: 762
🏁 Script executed:
```bash
# Get the full provider_health_tracker property and _ensure_service implementation
sed -n '349,360p' src/synthorg/api/state.py
```
Repository: Aureliolo/synthorg
Length of output: 509
🏁 Script executed:
```bash
# Find and show _ensure_service method
rg -n '_ensure_service' src/synthorg/api/state.py -A3 | head -30
```
Repository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
```bash
# Find _require_service method to confirm 503 behavior
rg -n '_require_service' src/synthorg/api/state.py -B2 -A8
```
Repository: Aureliolo/synthorg
Length of output: 7372
🏁 Script executed:
```bash
# Get the full _require_service implementation to confirm it raises 503
sed -n '147,168p' src/synthorg/api/state.py
```
Repository: Aureliolo/synthorg
Length of output: 844
Auto-wire a default ProviderHealthTracker or guarantee injection at all app entrypoints.
At line 424 the tracker is optional with no auto-wiring. At line 553 it is forwarded unchanged into AppState. Since AppState.provider_health_tracker raises 503 when unset, /api/v1/providers/{name}/health will be unavailable unless explicitly injected at every create_app(...) callsite.
The production entry point in src/synthorg/api/server.py:49 calls create_app(config=config) without passing provider_health_tracker, leaving it None. Test fixtures work around this by explicitly passing the tracker (e.g., conftest.py:256, test_provider_health.py:81), but the factory itself provides no fallback.
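For context on why a missing tracker surfaces as a 503 rather than a crash: the verified pattern is a property that funnels through `_require_service`. A minimal sketch of that pattern follows. `ServiceUnavailableError` is a hypothetical stand-in for the framework's HTTP 503 exception, and the simplified `AppState` and `_require_service` signatures are assumptions, since the real bodies in `src/synthorg/api/state.py` are not reproduced in this thread.

```python
from __future__ import annotations

from typing import TypeVar

T = TypeVar("T")


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for the web framework's HTTP 503 exception."""

    status_code = 503


def _require_service(service: T | None, name: str) -> T:
    # An unset optional service becomes a 503 for the caller, not a None bug.
    if service is None:
        raise ServiceUnavailableError(f"{name} is not configured")
    return service


class ProviderHealthTracker:
    """Placeholder for the real tracker (assumption for this sketch)."""


class AppState:
    def __init__(
        self, provider_health_tracker: ProviderHealthTracker | None = None
    ) -> None:
        self._provider_health_tracker = provider_health_tracker

    @property
    def provider_health_tracker(self) -> ProviderHealthTracker:
        return _require_service(
            self._provider_health_tracker, "provider_health_tracker"
        )
```

In this sketch, a factory that never passes a tracker leaves the property on the `None` branch for every request, which matches the 503 described above.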
Suggested fix (default tracker fallback)
def create_app( # noqa: PLR0913
@@
- app_state = AppState(
+ effective_provider_health_tracker = (
+ provider_health_tracker or ProviderHealthTracker()
+ )
+
+ app_state = AppState(
@@
- provider_health_tracker=provider_health_tracker,
+ provider_health_tracker=effective_provider_health_tracker,
startup_time=time.monotonic(),
)
Also applies to: 553-553
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/api/app.py` at line 424, the create_app(...) factory makes its
provider_health_tracker parameter optional but forwards it into
AppState(provider_health_tracker=...) while AppState.provider_health_tracker
raises 503 if unset, so ensure a default tracker is auto-wired: in create_app
(the function that accepts provider_health_tracker: ProviderHealthTracker |
None) detect when provider_health_tracker is None and instantiate a sensible
default ProviderHealthTracker implementation (or a no-op/healthy stub) before
constructing AppState, so all callsites (including server.py's
create_app(config=config)) get a non-None tracker and
/api/v1/providers/{name}/health always works.
ProviderHealthTracker was optional with no auto-wiring, causing
server.py's create_app(config=config) call to leave the tracker
as None -- making GET /providers/{name}/health return 503.
Follow the existing CostTracker/ProviderRegistry pattern: auto-wire
a default ProviderHealthTracker() in auto_wire_phase1 when None.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
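A minimal sketch of the "default when None" auto-wiring described in the commit above. `Phase1Services` and the keyword-only `auto_wire_phase1` signature are hypothetical simplifications, since the real function's parameters and return type are not shown here, and the placeholder `ProviderHealthTracker` class stands in for the real one.

```python
from __future__ import annotations

from dataclasses import dataclass


class ProviderHealthTracker:
    """Placeholder for synthorg's in-memory tracker (assumption)."""


@dataclass(frozen=True)
class Phase1Services:
    """Hypothetical container for services wired in phase 1."""

    provider_health_tracker: ProviderHealthTracker


def auto_wire_phase1(
    *, provider_health_tracker: ProviderHealthTracker | None = None
) -> Phase1Services:
    # Keep an explicitly injected tracker (e.g. from test fixtures) ...
    if provider_health_tracker is None:
        # ... otherwise fall back to a fresh in-memory default, so a call
        # like create_app(config=config) never ends up with None.
        provider_health_tracker = ProviderHealthTracker()
    return Phase1Services(provider_health_tracker=provider_health_tracker)
```

An injected tracker passes through untouched and only a missing one is replaced, so fixtures that pass their own instance keep working in this scheme.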
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/ci.yml:
- Line 161: Add a periodic-review note and automation reference for the CVE
ignore so it doesn't become stale: next to the pip-audit invocation string ("uv
run pip-audit --ignore-vuln CVE-2026-4539") add a comment with an explicit
review date (e.g., review by YYYY-MM-DD) and a link/reference to a tracking
issue, or create a scheduled workflow/issue that checks for a patched Pygments
release and removes the ignore when fixed; ensure the comment references the
exact CVE ID and the automation/issue number so maintainers can find and act on
it.
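One way to act on this suggestion is a small staleness check that CI or a scheduled job could run over the workflow files. The sketch assumes a hypothetical convention that is not in the repository: each `--ignore-vuln` flag sits directly below a YAML comment of the form `# review-by: YYYY-MM-DD` (which could also carry the tracking-issue number). Ignores with no date, or with a date in the past, are reported.

```python
from __future__ import annotations

import re
from datetime import date

# Hypothetical convention: "# review-by: 2026-04-30" on the line above the flag.
REVIEW_RE = re.compile(r"#\s*review-by:\s*(\d{4}-\d{2}-\d{2})")
IGNORE_RE = re.compile(r"--ignore-vuln\s+(CVE-\d{4}-\d+)")


def stale_ignores(workflow_text: str, today: date) -> list[str]:
    """Return CVE IDs whose ignore has no review date or an expired one."""
    stale: list[str] = []
    lines = workflow_text.splitlines()
    for i, line in enumerate(lines):
        for cve in IGNORE_RE.findall(line):
            previous = lines[i - 1] if i > 0 else ""
            match = REVIEW_RE.search(previous)
            if match is None or date.fromisoformat(match.group(1)) < today:
                stale.append(cve)
    return stale
```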
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: fe4614f5-1407-47e8-921b-2545706b8873
📒 Files selected for processing (2)
.github/workflows/ci.yml
.github/workflows/python-audit.yml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Sandbox
- GitHub Check: Build Backend
- GitHub Check: Analyze (python)
- GitHub Check: Dependency Review
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2026-03-15T21:32:02.880Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:32:02.880Z
Learning: Applies to .github/workflows/*.yml : Dependabot: daily updates for uv + github-actions + npm + pre-commit + docker + gomod, grouped minor/patch, no auto-merge. Use `/review-dep-pr` to review Dependabot PRs before merging.
Applied to files:
.github/workflows/python-audit.yml
.github/workflows/ci.yml
📚 Learning: 2026-03-15T18:17:43.675Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:17:43.675Z
Learning: Applies to .github/workflows/**/*.yml : Path filtering: dorny/paths-filter detects Python/dashboard/docker changes; jobs only run when their domain is affected. CLI has its own workflow (cli.yml).
Applied to files:
.github/workflows/python-audit.yml
.github/workflows/ci.yml
📚 Learning: 2026-03-24T20:58:10.766Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T20:58:10.766Z
Learning: Applies to pyproject.toml : Dependencies are organized in groups: `test` (pytest + plugins, hypothesis), `dev` (includes test + ruff, mypy, pre-commit, commitizen, pip-audit). Install with `uv sync`.
Applied to files:
.github/workflows/python-audit.yml
.github/workflows/ci.yml
📚 Learning: 2026-03-15T21:32:02.880Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T21:32:02.880Z
Learning: Applies to .github/workflows/docker.yml : Docker workflow: builds backend + web + sandbox images, pushes to GHCR, signs with cosign. SLSA L3 provenance attestations via actions/attest-build-provenance. Scans: Trivy (CRITICAL = hard fail, HIGH = warn) + Grype (critical cutoff) + CIS Docker Benchmark v1.6.0 compliance (informational). CVE triage via .github/.trivyignore.yaml and .github/.grype.yaml. Images only pushed after scans pass. Triggers on push to main and version tags (v*).
Applied to files:
.github/workflows/ci.yml
📚 Learning: 2026-03-15T12:00:18.113Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T12:00:18.113Z
Learning: Applies to .github/workflows/docker.yml : CI Docker: build → scan → push to GHCR + cosign sign + SLSA L3 provenance via attest-build-provenance (images only pushed after Trivy/Grype scans pass).
Applied to files:
.github/workflows/ci.yml
📚 Learning: 2026-03-24T20:58:10.766Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-24T20:58:10.766Z
Learning: Pre-commit hooks enforce: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, ruff check+format, gitleaks, hadolint (Dockerfile), golangci-lint + go vet (CLI), no-em-dashes, no-redundant-timeout
Applied to files:
.github/workflows/ci.yml
📚 Learning: 2026-03-15T18:17:43.675Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:17:43.675Z
Learning: Pre-commit hooks: trailing-whitespace, end-of-file-fixer, check-yaml, check-toml, check-json, check-merge-conflict, check-added-large-files, no-commit-to-branch (main), ruff check+format, gitleaks, hadolint (Dockerfile linting).
Applied to files:
.github/workflows/ci.yml
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docker/Dockerfile : Docker: 3-stage build (builder → setup → distroless runtime) for backend, Chainguard Python, non-root (UID 65532), CIS-hardened
Applied to files:
.github/workflows/ci.yml
🔇 Additional comments (1)
.github/workflows/python-audit.yml (1)
23-23: Same suppression risk exists in the scheduled audit workflow. Line 23 mirrors the same unconditional CVE-2026-4539 ignore, so it inherits the same stale-ignore risk noted in CI.
pygments 2.19.2 has CVE-2026-4539 (local ReDoS in AdlLexer). No fix version exists yet -- 2.19.2 is the latest release. Ignore in both CI and scheduled audit workflows until upstream publishes a patch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
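The exit condition here ("until upstream publishes a patch") can be checked mechanically. A minimal sketch, assuming plain numeric release strings such as `2.19.2`. The helper names are hypothetical, and real tooling should parse versions with a PEP 440-aware library such as `packaging`. A newer release is only a cue to re-run `pip-audit` without the ignore, not proof that the CVE is fixed.

```python
from __future__ import annotations


def parse_release(version: str) -> tuple[int, ...]:
    # Assumes purely numeric dotted releases such as "2.19.2";
    # pre-release, post-release, and local tags are not handled.
    return tuple(int(part) for part in version.split("."))


def newer_release_available(last_vulnerable: str, latest: str) -> bool:
    """True once `latest` sorts after the last known-vulnerable release."""
    return parse_release(latest) > parse_release(last_vulnerable)
```

With the state described in the commit (`2.19.2` vulnerable, `2.19.2` latest), it returns False, so the ignore stays.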
1e728c3 to da4c5f6
🤖 I have created a release *beep* *boop*
---
#MAJOR CHANGES; We got a somewhat working webui :)
## [0.5.0](v0.4.9...v0.5.0) (2026-03-30)
### Features
* add analytics trends and budget forecast API endpoints ([#798](#798)) ([16b61f5](16b61f5))
* add department policies to default templates ([#852](#852)) ([7a41548](7a41548))
* add remaining activity event types (task_started, tool_used, delegation, cost_incurred) ([#832](#832)) ([4252fac](4252fac))
* agent performance, activity, and history API endpoints ([#811](#811)) ([9b75c1d](9b75c1d))
* Agent Profiles and Detail pages (biography, career, performance) ([#874](#874)) ([62d7880](62d7880))
* app shell, Storybook, and CI/CD pipeline ([#819](#819)) ([d4dde90](d4dde90))
* Approvals page with risk grouping, urgency indicators, batch actions ([#889](#889)) ([4e9673d](4e9673d))
* Budget Panel page (P&L dashboard, breakdown charts, forecast) ([#890](#890)) ([b63b0f1](b63b0f1))
* build infrastructure layer (API client, auth, WebSocket) ([#815](#815)) ([9f01d3e](9f01d3e))
* CLI global options infrastructure, UI modes, exit codes, env vars ([#891](#891)) ([fef4fc5](fef4fc5))
* CodeMirror editor and theme preferences toggle ([#905](#905), [#807](#807)) ([#909](#909)) ([41fbedc](41fbedc))
* Company page (department/agent management) ([#888](#888)) ([cfb88b0](cfb88b0))
* comprehensive hint coverage across all CLI commands ([#900](#900)) ([937974e](937974e))
* config system extensions, per-command flags for init/start/stop/status/logs ([#895](#895)) ([32f83fe](32f83fe))
* configurable currency system replacing hardcoded USD ([#854](#854)) ([b372551](b372551))
* Dashboard page (metric cards, activity feed, budget burn) ([#861](#861)) ([7d519d5](7d519d5))
* department health, provider status, and activity feed endpoints ([#818](#818)) ([6d5f196](6d5f196))
* design tokens and core UI components ([#833](#833)) ([ed887f2](ed887f2))
* extend approval, meeting, and budget API responses ([#834](#834)) ([31472bf](31472bf))
* frontend polish -- real-time UX, accessibility, responsive, performance ([#790](#790), [#792](#792), [#791](#791), [#793](#793)) ([#917](#917)) ([f04a537](f04a537))
* implement human roles and access control levels ([#856](#856)) ([d6d8a06](d6d8a06))
* implement semantic conflict detection in workspace merge ([#860](#860)) ([d97283b](d97283b))
* interaction components and animation patterns ([#853](#853)) ([82d4b01](82d4b01))
* Login page + first-run bootstrap + Company page ([#789](#789), [#888](#888)) ([#896](#896)) ([8758e8d](8758e8d))
* Meetings page with timeline viz, token bars, contribution formatting ([#788](#788)) ([#904](#904)) ([b207f46](b207f46))
* Messages page with threading, channel badges, sender indicators ([#787](#787)) ([#903](#903)) ([28293ad](28293ad))
* Org Chart force-directed view and drag-drop reassignment ([#872](#872), [#873](#873)) ([#912](#912)) ([a68a938](a68a938))
* Org Chart page (living nodes, status, CRUD, department health) ([#870](#870)) ([0acbdae](0acbdae))
* per-command flags for remaining commands, auto-behavior wiring, help/discoverability ([#897](#897)) ([3f7afa2](3f7afa2))
* Providers page with backend rework -- health, CRUD, subscription auth ([#893](#893)) ([9f8dd98](9f8dd98))
* scaffold React + Vite + TypeScript + Tailwind project ([#799](#799)) ([bd151aa](bd151aa))
* Settings page with search, dependency indicators, grouped rendering ([#784](#784)) ([#902](#902)) ([a7b9870](a7b9870))
* Setup Wizard rebuild with template comparison, cost estimator, theme customization ([#879](#879)) ([ae8b50b](ae8b50b))
* setup wizard UX -- template filters, card metadata, provider form reuse ([#910](#910)) ([7f04676](7f04676))
* setup wizard UX overhaul -- mode choice, step reorder, provider fixes ([#907](#907)) ([ee964c4](ee964c4))
* structured ModelRequirement in template agent configs ([#795](#795)) ([7433548](7433548))
* Task Board page (rich Kanban, filtering, dependency viz) ([#871](#871)) ([04a19b0](04a19b0))
### Bug Fixes
* align frontend types with backend and debounce WS refetches ([#916](#916)) ([134c11b](134c11b))
* auto-cleanup targets newly pulled images instead of old ones ([#884](#884)) ([50e6591](50e6591))
* correct wipe backup-skip flow and harden error handling ([#808](#808)) ([c05860f](c05860f))
* improve provider setup in wizard, subscription auth, dashboard bugs ([#914](#914)) ([87bf8e6](87bf8e6))
* improve update channel detection and add config get command ([#814](#814)) ([6b137f0](6b137f0))
* resolve all ESLint warnings, add zero-warnings enforcement ([#899](#899)) ([079b46a](079b46a))
* subscription auth uses api_key, base URL optional for cloud providers ([#915](#915)) ([f0098dd](f0098dd))
### Refactoring
* semantic analyzer cleanup -- shared filtering, concurrency, extraction ([#908](#908)) ([81372bf](81372bf))
### Documentation
* brand identity and UX design system from [#765](#765) exploration ([#804](#804)) ([389a9f4](389a9f4))
* page structure and information architecture for v0.5.0 dashboard ([#809](#809)) ([f8d6d4a](f8d6d4a))
* write UX design guidelines with WCAG-verified color system ([#816](#816)) ([4a4594e](4a4594e))
### Tests
* add unit tests for agent hooks and page components ([#875](#875)) ([#901](#901)) ([1d81546](1d81546))
### CI/CD
* bump actions/deploy-pages from 4.0.5 to 5.0.0 in the major group ([#831](#831)) ([01c19de](01c19de))
* bump astral-sh/setup-uv from 7.6.0 to 8.0.0 in /.github/actions/setup-python-uv in the all group ([#920](#920)) ([5f6ba54](5f6ba54))
* bump codecov/codecov-action from 5.5.3 to 6.0.0 in the major group ([#868](#868)) ([f22a181](f22a181))
* bump github/codeql-action from 4.34.1 to 4.35.0 in the all group ([#883](#883)) ([87a4890](87a4890))
* bump sigstore/cosign-installer from 4.1.0 to 4.1.1 in the minor-and-patch group ([#830](#830)) ([7a69050](7a69050))
* bump the all group with 3 updates ([#923](#923)) ([ff27c8e](ff27c8e))
* bump wrangler from 4.76.0 to 4.77.0 in /.github in the minor-and-patch group ([#822](#822)) ([07d43eb](07d43eb))
* bump wrangler from 4.77.0 to 4.78.0 in /.github in the all group ([#882](#882)) ([f84118d](f84118d))
### Maintenance
* add design system enforcement hook and component inventory ([#846](#846)) ([15abc43](15abc43))
* add dev-only auth bypass for frontend testing ([#885](#885)) ([6cdcd8a](6cdcd8a))
* add pre-push rebase check hook ([#855](#855)) ([b637a04](b637a04))
* backend hardening -- eviction/size-caps and model validation ([#911](#911)) ([81253d9](81253d9))
* bump axios from 1.13.6 to 1.14.0 in /web in the all group across 1 directory ([#922](#922)) ([b1b0232](b1b0232))
* bump brace-expansion from 5.0.4 to 5.0.5 in /web ([#862](#862)) ([ba4a565](ba4a565))
* bump eslint-plugin-react-refresh from 0.4.26 to 0.5.2 in /web ([#801](#801)) ([7574bb5](7574bb5))
* bump faker from 40.11.0 to 40.11.1 in the minor-and-patch group ([#803](#803)) ([14d322e](14d322e))
* bump https://github.com/astral-sh/ruff-pre-commit from v0.15.7 to 0.15.8 ([#864](#864)) ([f52901e](f52901e))
* bump nginxinc/nginx-unprivileged from `6582a34` to `f99cc61` in /docker/web in the all group ([#919](#919)) ([df85e4f](df85e4f))
* bump nginxinc/nginx-unprivileged from `ccbac1a` to `6582a34` in /docker/web ([#800](#800)) ([f4e9450](f4e9450))
* bump node from `44bcbf4` to `71be405` in /docker/sandbox ([#827](#827)) ([91bec67](91bec67))
* bump node from `5209bca` to `cf38e1f` in /docker/web ([#863](#863)) ([66d6043](66d6043))
* bump picomatch in /site ([#842](#842)) ([5f20bcc](5f20bcc))
* bump recharts 2->3 and @types/node 22->25 in /web ([#802](#802)) ([a908800](a908800))
* Bump requests from 2.32.5 to 2.33.0 ([#843](#843)) ([41daf69](41daf69))
* bump smol-toml from 1.6.0 to 1.6.1 in /site ([#826](#826)) ([3e5dbe4](3e5dbe4))
* bump the all group with 3 updates ([#921](#921)) ([7bace0b](7bace0b))
* bump the minor-and-patch group across 1 directory with 2 updates ([#829](#829)) ([93e611f](93e611f))
* bump the minor-and-patch group across 1 directory with 3 updates ([#841](#841)) ([7010c8e](7010c8e))
* bump the minor-and-patch group across 1 directory with 3 updates ([#869](#869)) ([548cee5](548cee5))
* bump the minor-and-patch group in /site with 2 updates ([#865](#865)) ([9558101](9558101))
* bump the minor-and-patch group with 2 updates ([#867](#867)) ([4830706](4830706))
* consolidate Dependabot groups to 1 PR per ecosystem ([06d2556](06d2556))
* consolidate Dependabot groups to 1 PR per ecosystem ([#881](#881)) ([06d2556](06d2556))
* improve worktree skill with full dep sync and status enhancements ([#906](#906)) ([772c625](772c625))
* remove Vue remnants and document framework decision ([#851](#851)) ([bf2adf6](bf2adf6))
* update web dependencies and fix brace-expansion CVE ([#880](#880)) ([a7a0ed6](a7a0ed6))
* upgrade to Storybook 10 and TypeScript 6 ([#845](#845)) ([52d95f2](52d95f2))
---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Summary
- `GET /api/v1/departments/{name}/health` -- aggregates agent count, utilization, 7-day cost sparkline, and average performance and collaboration scores per department
- `GET /api/v1/providers/{name}/health` -- returns health status (up/degraded/down), error rate, average response time, and call count from the new `ProviderHealthTracker`
- `GET /api/v1/activities` -- paginated org-wide activity feed (REST fallback for WebSocket) with `type`, `agent_id`, and `last_n_hours` (24/48/168) filters
- `ProviderHealthTracker` service (in-memory, append-only, `asyncio.Lock`-protected) with a `ProviderHealthRecord` model and `ProviderHealthSummary` (`health_status` is a `@computed_field`)
Design decisions
- Separate endpoint (`/providers/{name}/health`) rather than extending the existing `GET /providers/{name}` -- follows the `/agents/{name}/performance` pattern, avoids coupling config data with real-time health, and avoids expensive health computation on provider list calls
- `ActivityEvent` from `hr/activity.py` + `merge_activity_timeline()` for the org-wide feed -- avoids model fragmentation with the per-agent activity endpoint from feat: agent performance, activity, and history API endpoints #811
- `ProviderHealthSummary.health_status` is a `@computed_field` derived from `error_rate_percent_24h` -- per CLAUDE.md convention, prevents inconsistent state
- `asyncio.TaskGroup` used for parallel fan-out in department health (active count + cost records + agent ID resolution run concurrently; snapshots fan out per-agent)
Pre-PR review
Pre-reviewed by 6 agents (code-reviewer, test-analyzer, issue-resolution-verifier, silent-failure-hunter, async-concurrency-reviewer, type-design-analyzer). 15 findings addressed:
- `tuple[object, ...]` to `tuple[CostRecord, ...]`
- `active_agent_count <= agent_count` model validator
Test plan
- `uv run python -m pytest tests/unit/providers/test_health.py -m unit -n auto` -- 30 tests pass
- `uv run python -m pytest tests/unit/api/controllers/test_departments_health.py -m unit -n auto` -- 9 tests pass
- `uv run python -m pytest tests/unit/api/controllers/test_provider_health.py -m unit -n auto` -- 6 tests pass
- `uv run python -m pytest tests/unit/api/controllers/test_activities.py -m unit -n auto` -- 12 tests pass (incl. graceful degradation)
- `uv run mypy src/ tests/` -- 0 errors
- `uv run ruff check src/ tests/` -- 0 errors

Closes #773
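A minimal sketch of the tracker described in the summary: an in-memory, append-only list guarded by an `asyncio.Lock`, with a 24-hour summary whose `health_status` is derived rather than stored. It uses plain dataclasses and a `@property` in place of the PR's Pydantic models and `@computed_field`. The field names, the up/degraded/down thresholds, and treating a provider with no calls as "up" are all assumptions, and pruning is omitted.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical thresholds; the PR does not state the real cut-offs.
DEGRADED_AT_PERCENT = 10.0
DOWN_AT_PERCENT = 50.0


@dataclass(frozen=True)
class ProviderHealthRecord:
    provider: str
    timestamp: datetime
    success: bool
    response_time_ms: float


@dataclass(frozen=True)
class ProviderHealthSummary:
    provider: str
    call_count_24h: int
    error_rate_percent_24h: float
    avg_response_time_ms: float | None

    @property
    def health_status(self) -> str:
        # Derived on access, so it can never disagree with the error rate.
        if self.error_rate_percent_24h >= DOWN_AT_PERCENT:
            return "down"
        if self.error_rate_percent_24h >= DEGRADED_AT_PERCENT:
            return "degraded"
        return "up"


class ProviderHealthTracker:
    """In-memory, append-only store of call outcomes guarded by a lock."""

    def __init__(self) -> None:
        self._records: list[ProviderHealthRecord] = []
        self._lock = asyncio.Lock()

    async def record(self, rec: ProviderHealthRecord) -> None:
        async with self._lock:
            self._records.append(rec)

    async def summarize(self, provider: str, now: datetime) -> ProviderHealthSummary:
        cutoff = now - timedelta(hours=24)
        async with self._lock:
            # Copy the 24h window under the lock; compute outside it.
            window = [
                r for r in self._records
                if r.provider == provider and r.timestamp >= cutoff
            ]
        calls = len(window)
        errors = sum(1 for r in window if not r.success)
        return ProviderHealthSummary(
            provider=provider,
            call_count_24h=calls,
            error_rate_percent_24h=100.0 * errors / calls if calls else 0.0,
            avg_response_time_ms=(
                sum(r.response_time_ms for r in window) / calls if calls else None
            ),
        )
```

Holding the lock only while copying the window keeps the critical section short; the arithmetic runs after the lock is released.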
🤖 Generated with Claude Code
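The activities endpoint's filtering and pagination can be sketched the same way. This assumes a simplified, hypothetical `ActivityEvent` with only `event_type`, `agent_id`, and `timestamp` fields (the real model in `hr/activity.py` is not shown), plain sorting in place of `merge_activity_timeline()`, and offset/limit pagination, which the PR does not specify. The event-type strings in the example are illustrative. The parameter is named `event_type` rather than `type` to avoid shadowing the builtin.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

ALLOWED_WINDOWS_HOURS = (24, 48, 168)


@dataclass(frozen=True)
class ActivityEvent:
    """Simplified, hypothetical stand-in for the model in hr/activity.py."""

    event_type: str
    agent_id: str
    timestamp: datetime


def activity_page(
    lifecycle_events: list[ActivityEvent],
    task_events: list[ActivityEvent],
    *,
    now: datetime,
    last_n_hours: int = 24,
    event_type: str | None = None,
    agent_id: str | None = None,
    offset: int = 0,
    limit: int = 50,
) -> list[ActivityEvent]:
    """Merge both sources newest-first, filter, then slice one page."""
    if last_n_hours not in ALLOWED_WINDOWS_HOURS:
        # Stand-in for request validation in a web handler.
        raise ValueError(f"last_n_hours must be one of {ALLOWED_WINDOWS_HOURS}")
    cutoff = now - timedelta(hours=last_n_hours)
    merged = sorted(
        lifecycle_events + task_events, key=lambda e: e.timestamp, reverse=True
    )
    matching = [
        e
        for e in merged
        if e.timestamp >= cutoff
        and (event_type is None or e.event_type == event_type)
        and (agent_id is None or e.agent_id == agent_id)
    ]
    return matching[offset : offset + limit]
```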