Skip to content

chore: backend hardening -- eviction/size-caps and model validation#911

Merged
Aureliolo merged 11 commits intomainfrom
feat/backend-hardening
Mar 29, 2026
Merged

chore: backend hardening -- eviction/size-caps and model validation#911
Aureliolo merged 11 commits intomainfrom
feat/backend-hardening

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Summary

Closes #844
Closes #817
Closes #837

Test plan

  • uv run python -m pytest tests/ -n auto -- 10978 passed
  • uv run ruff check src/ tests/ -- all clean
  • uv run mypy src/ tests/ -- no issues in 1233 files
  • Pre-reviewed by 11 agents, 7 findings addressed (logging parity, threshold validation, time-range warning logs, boundary tests)

🤖 Generated with Claude Code

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 25993c39-0813-444f-83d1-565c0c496c4b

📥 Commits

Reviewing files that changed from the base of the PR and between 26f86b4 and 7bdbad4.

📒 Files selected for processing (1)
  • tests/unit/providers/test_health.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649 native lazy annotations.
Use except A, B: syntax (no parentheses) for exception handling -- PEP 758 except syntax on Python 3.14 (enforced by ruff).
All public functions and classes require type hints. mypy strict mode is enforced.
Docstrings must use Google style and are required on public classes and functions (enforced by ruff D rules).
Create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens).
Use NotBlankStr (from core.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators.
Line length must be 88 characters (enforced by ruff).
Handle errors explicitly, never silently swallow.
Validate at system boundaries (user input, external APIs, config files).

Files:

  • tests/unit/providers/test_health.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Test code must use test-provider, test-small-001, etc. (never real vendor names).
Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to mark tests.
Minimum 80% code coverage (enforced in CI).
Async tests use asyncio_mode = "auto" -- no manual @pytest.mark.asyncio needed.
Global test timeout is 30 seconds per test (configured in pyproject.toml -- do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed).
Always run pytest with -n auto for parallelism via pytest-xdist, never run tests sequentially.
Prefer @pytest.mark.parametrize for testing similar cases.
Use Hypothesis (@given + @settings) for property-based testing in Python. Profiles: ci (50 examples, default) and dev (1000 examples), controlled via HYPOTHESIS_PROFILE env var.
For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).
Never skip, dismiss, or ignore flaky tests -- always fix them fully and fundamentally.

Files:

  • tests/unit/providers/test_health.py
🔇 Additional comments (6)
tests/unit/providers/test_health.py (6)

433-441: LGTM!

The boundary test correctly validates that records timestamped exactly at the 24-hour cutoff are retained, matching the production code's r.timestamp >= cutoff filter in _prune_before. Using a fixed datetime ensures deterministic behavior.


447-475: LGTM!

The test correctly validates the auto-prune behavior when exceeding the threshold. The two-phase assertion (checking calls_last_24h == 6 for filtering, then prune_expired() == 0 to prove eviction already occurred) effectively distinguishes between filtering and actual pruning, addressing the prior review feedback.


476-494: LGTM!

The test correctly validates that auto-pruning does not occur when record count is below the threshold. The follow-up prune_expired() returning 1 proves the expired record was retained in memory, addressing the prior review feedback about strengthening the assertion.


496-515: LGTM!

The test correctly validates the boundary condition where record count equals the threshold exactly. Since _snapshot uses strict greater-than (>), 10 records with a threshold of 10 should not trigger auto-pruning. The assertion that prune_expired() == 5 confirms the expired records were retained.


517-529: LGTM!

The test correctly validates the edge case where all records are expired and exceed the threshold. Auto-pruning evicts all records, resulting in calls_last_24h == 0 and subsequent prune_expired() returning 0.


531-537: LGTM!

The parametrized test correctly validates that the ProviderHealthTracker constructor rejects invalid auto_prune_threshold values (0 and -1) with the expected error message. Using ids for clearer test output is a nice touch.


Walkthrough

This PR adds bounded-memory behavior and telemetry across in-memory stores. CostTracker and ProviderHealthTracker gain TTL-based pruning with a new auto_prune_threshold constructor parameter and prune_expired logic; snapshotting may auto-prune when thresholds are exceeded. DelegationRecordStore and ToolInvocationTracker switch from unbounded lists to bounded deques with a max_records constructor parameter and emit a one-time eviction warning on overflow. Task and AcceptanceCriterion set allow_inf_nan=False in their Pydantic configs. New observability event constants were added, and unit tests cover eviction, pruning, and validation behaviors.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'chore: backend hardening -- eviction/size-caps and model validation' accurately captures the main themes of the changeset: introducing eviction mechanisms and capacity limits across multiple trackers, plus adding model validation for non-finite values.
Description check ✅ Passed The pull request description is well-related to the changeset, explicitly referencing all three linked issues (#844, #817, #837) and summarizing the main changes: allow_inf_nan validation, TTL auto-eviction for trackers, and FIFO eviction for activity stores.
Linked Issues check ✅ Passed The code changes comprehensively address all objectives from the three linked issues: allow_inf_nan=False added to Task/AcceptanceCriterion [#844], TTL auto-eviction implemented in ProviderHealthTracker and CostTracker [#817], and FIFO eviction via deque added to ToolInvocationTracker and DelegationRecordStore [#837].
Out of Scope Changes check ✅ Passed All file modifications directly support the three linked objectives. CLAUDE.md guidance update reflects the new allow_inf_nan requirement, and setup API tests comprehensively validate affected functionality without introducing unrelated changes.
Docstring Coverage ✅ Passed Docstring coverage is 48.42% which is sufficient. The required threshold is 40.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 29, 2026 08:38 — with GitHub Actions Inactive
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 29, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 7bdbad4.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces memory management and record eviction strategies across several tracking services, including CostTracker, DelegationRecordStore, ProviderHealthTracker, and ToolInvocationTracker. It implements time-based pruning for budget and health records and FIFO eviction using deques for delegation and tool invocation records. Additionally, Pydantic models for tasks were updated to disallow infinity and NaN values. Review feedback suggests reducing code duplication by extracting common pruning logic into helper methods and improving testability by allowing reference times to be passed into snapshot methods.

Comment on lines +425 to +437
if len(self._records) > self._auto_prune_threshold:
cutoff = datetime.now(UTC) - timedelta(
hours=_COST_WINDOW_HOURS,
)
before = len(self._records)
self._records = [r for r in self._records if r.timestamp >= cutoff]
pruned = before - len(self._records)
if pruned:
logger.info(
BUDGET_RECORDS_AUTO_PRUNED,
pruned=pruned,
remaining=len(self._records),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The pruning logic here is nearly identical to the logic in the prune_expired method. This duplication can make future maintenance more difficult and error-prone.

To improve this, you could extract the common logic into a private, non-locking helper method. Both prune_expired and _snapshot could then call this helper from within their async with self._lock: blocks, keeping the code DRY (Don't Repeat Yourself).

For example:

def _prune_records_unlocked(self, cutoff: datetime) -> int:
    """Prunes records older than the cutoff. Assumes lock is held."""
    before = len(self._records)
    self._records = [r for r in self._records if r.timestamp >= cutoff]
    return before - len(self._records)

Comment on lines 255 to 275
async def _snapshot(self) -> tuple[ProviderHealthRecord, ...]:
"""Return an immutable snapshot of all current records."""
"""Return an immutable snapshot of all current records.

When the record count exceeds the auto-prune threshold,
expired records are removed before the snapshot is taken.
"""
async with self._lock:
if len(self._records) > self._auto_prune_threshold:
cutoff = datetime.now(UTC) - timedelta(
hours=_HEALTH_WINDOW_HOURS,
)
before = len(self._records)
self._records = [r for r in self._records if r.timestamp >= cutoff]
pruned = before - len(self._records)
if pruned:
logger.info(
PROVIDER_HEALTH_AUTO_PRUNED,
pruned=pruned,
remaining=len(self._records),
)
return tuple(self._records)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This method can be improved for maintainability and testability:

  1. Code Duplication: The pruning logic is nearly identical to that in prune_expired. This could be extracted into a private helper to avoid repetition.
  2. Testability: The method uses datetime.now(UTC) directly, making it hard to test deterministically. Since callers like get_summary already accept a now parameter, it should be passed down to _snapshot.

Here is a suggestion that addresses the testability issue. After applying it, you could further refactor to address the code duplication.

    async def _snapshot(self, *, now: datetime | None = None) -> tuple[ProviderHealthRecord, ...]:
        """Return an immutable snapshot of all current records.

        When the record count exceeds the auto-prune threshold,
        expired records are removed before the snapshot is taken.
        """
        async with self._lock:
            if len(self._records) > self._auto_prune_threshold:
                ref = now or datetime.now(UTC)
                cutoff = ref - timedelta(
                    hours=_HEALTH_WINDOW_HOURS,
                )
                before = len(self._records)
                self._records = [r for r in self._records if r.timestamp >= cutoff]
                pruned = before - len(self._records)
                if pruned:
                    logger.info(
                        PROVIDER_HEALTH_AUTO_PRUNED,
                        pruned=pruned,
                        remaining=len(self._records),
                    )
            return tuple(self._records)

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 29, 2026

Codecov Report

❌ Patch coverage is 98.80952% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 92.17%. Comparing base (7f04676) to head (7bdbad4).
⚠️ Report is 2 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/budget/tracker.py 95.83% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #911      +/-   ##
==========================================
+ Coverage   92.15%   92.17%   +0.02%     
==========================================
  Files         596      596              
  Lines       31438    31508      +70     
  Branches     3043     3055      +12     
==========================================
+ Hits        28971    29043      +72     
+ Misses       1945     1943       -2     
  Partials      522      522              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/tracker.py`:
- Around line 93-99: Validate the auto_prune_threshold passed into the
constructor (auto_prune_threshold) so it cannot be zero or negative: in the
initializer where self._auto_prune_threshold is set, check that
auto_prune_threshold is an integer >= 1 and raise a ValueError (or TypeError if
not int) with a clear message; this prevents _snapshot() from performing an
unconditional or overly frequent prune scan and ensures the
_auto_prune_threshold invariant is enforced at the system boundary.

In `@src/synthorg/providers/health.py`:
- Around line 255-274: The auto-prune in _snapshot currently uses
datetime.now(UTC) which can evict records that callers asking for a different
reference time expect to keep; change _snapshot(self) -> tuple[...] to accept an
optional now: datetime parameter (defaulting to datetime.now(UTC)) and compute
cutoff = now - timedelta(hours=_HEALTH_WINDOW_HOURS) so pruning honors the
caller-supplied reference; update callers (e.g., get_summary and
get_all_summaries) to pass their `now` through to _snapshot, and ensure
_records, _auto_prune_threshold and the logging (PROVIDER_HEALTH_AUTO_PRUNED)
remain unchanged otherwise.

In `@src/synthorg/tools/invocation_tracker.py`:
- Around line 63-67: The current record() logic emits
logger.warning(TOOL_INVOCATION_EVICTED, ...) every time the deque self._records
is full, producing continuous warnings; modify record() to avoid repeated
warnings by either lowering severity or emitting only once: add a boolean flag
(e.g. self._eviction_warned) initialized False, set it True and call
logger.warning(TOOL_INVOCATION_EVICTED, max_records=...) only the first time
self._records reaches maxlen, and for subsequent evictions either log at
info/debug or skip logging entirely; ensure the flag is referenced and updated
inside the same scope where self._records and TOOL_INVOCATION_EVICTED are used
so behavior is consistent.

In `@tests/unit/communication/delegation/test_record_store.py`:
- Around line 222-228: Replace the two separate tests
test_max_records_zero_rejected and test_max_records_negative_rejected with a
single parametrized test that uses `@pytest.mark.parametrize` over the invalid
values (e.g., 0 and -1) and asserts that constructing
DelegationRecordStore(max_records=value) raises ValueError with the same match
"max_records must be >= 1"; keep the test name descriptive (e.g.,
test_max_records_invalid_rejected) and reuse the existing exception assertion
and message match.

In `@tests/unit/providers/test_health.py`:
- Around line 462-475: The test test_snapshot_no_prune_below_threshold currently
checks only calls_last_24h but not that expired records remain in memory; modify
the test to also assert the tracker's internal storage still contains both
entries after recording when auto_prune_threshold=10 — e.g. after two await
tracker.record(...) calls and before/after await tracker.get_summary(...), check
ProviderHealthTracker's internal list (the attribute holding records used by
record/get_summary, e.g., _records or whatever internal container is used) has
length 2 and that one of those entries has a timestamp older than 24h and the
other within 24h, while preserving the existing assertion that
summary.calls_last_24h == 1. Ensure you reference the same symbols:
test_snapshot_no_prune_below_threshold, ProviderHealthTracker, record,
get_summary, and calls_last_24h.

In `@tests/unit/tools/test_invocation_tracker.py`:
- Around line 140-146: Merge the two nearly identical tests
test_max_records_zero_rejected and test_max_records_negative_rejected into a
single parametrized test using pytest.mark.parametrize; call
ToolInvocationTracker(max_records=...) for each invalid value (e.g., 0 and -1)
and assert ValueError with match "max_records must be >= 1". Replace both
functions with one test function (e.g., test_max_records_invalid_rejected)
decorated with `@pytest.mark.parametrize`("invalid_max", [0, -1]) and keep the
with pytest.raises(...) block inside to exercise the ToolInvocationTracker
constructor.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d51c6aab-8733-4fd9-b8e3-adf8f61c2a3c

📥 Commits

Reviewing files that changed from the base of the PR and between 41fbedc and 1588d4d.

📒 Files selected for processing (14)
  • src/synthorg/budget/tracker.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/core/task.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/providers/health.py
  • src/synthorg/tools/invocation_tracker.py
  • tests/unit/budget/test_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/core/test_task.py
  • tests/unit/providers/test_health.py
  • tests/unit/tools/test_invocation_tracker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (2)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use PEP 758 except syntax: except A, B: (no parentheses) on Python 3.14. Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).
Every module with business logic must include: from synthorg.observability import get_logger then logger = get_logger(__name__). Always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Use structured kwargs: logger.info(EVENT, key=value) — never logger.info("msg %s", val). Never use import logging or print() in application code.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and wrap with MappingProxyType for read-only enforcement. For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines. Handle errors exp...

Files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/core/task.py
  • src/synthorg/providers/health.py
  • src/synthorg/tools/invocation_tracker.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/budget/tracker.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Maintain 80% minimum code coverage (enforced in CI). Always run pytest with -n auto for parallel execution. Use @pytest.mark.parametrize for testing similar cases. Use asyncio_mode = "auto" (no manual @pytest.mark.asyncio needed).
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests or project code—use generic names like test-provider, test-small-001, example-provider, example-large-001, large/medium/small. Vendor names may only appear in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets.
Never skip, dismiss, or ignore flaky tests—always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() instead of widening margins. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/core/test_task.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/budget/test_tracker.py
  • tests/unit/providers/test_health.py
🧠 Learnings (24)
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` from `core.types` for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/core/task.py
🔇 Additional comments (6)
tests/unit/core/test_task.py (1)

371-378: Nice targeted regression coverage.

This hits inf, -inf, and nan without duplicating setup.

tests/unit/communication/delegation/test_record_store.py (1)

194-221: Good eviction coverage for FIFO semantics.

These tests clearly validate both overflow eviction order and non-eviction under capacity.

tests/unit/providers/test_health.py (1)

441-460: Threshold-triggered auto-prune test is solid.

This case validates the intended prune path and expected 24h-call count after pruning.

tests/unit/tools/test_invocation_tracker.py (1)

116-139: Eviction behavior tests are well targeted.

The assertions verify FIFO retention order and below-threshold non-eviction correctly.

tests/unit/budget/test_tracker.py (1)

464-527: Prune and auto-prune test coverage is comprehensive.

These additions cover removal count semantics, empty/no-expiry behavior, and threshold-triggered snapshot pruning well.

src/synthorg/communication/delegation/record_store.py (1)

48-77: Bounded deque migration and eviction logging look correct.

The constructor guard, deque(maxlen=...) storage, and pre-append eviction warning are consistent with FIFO cap behavior.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CLAUDE.md`:
- Line 321: Several ConfigDict declarations across the codebase are missing the
new convention allow_inf_nan=False; update every ConfigDict in the listed
modules (the remaining declarations in providers/models.py, tools/base.py,
settings/models.py, templates/schema.py) to include allow_inf_nan=False (Task
and AcceptanceCriterion are already updated), and if you don't want to change
them now create an issue to track updating these remaining ConfigDict
occurrences so they’re handled in a follow-up PR.

In `@src/synthorg/budget/tracker.py`:
- Around line 455-461: The fast-path in _prune_before assumes self._records[0]
is the oldest but record() simply appends, so the early-return can be incorrect
when inserts are out-of-order; fix by either (A) removing the fast-path and
always filtering self._records in _prune_before, or (B) ensure chronological
order by updating record() to insert/merge by timestamp (or sort self._records)
so the leading element is guaranteed oldest; reference functions/attributes:
_prune_before, record, and self._records and choose the approach that fits
expected insertion patterns.

In `@src/synthorg/observability/events/tool.py`:
- Line 64: The constant TOOL_INVOCATION_TIME_RANGE_INVALID uses an inconsistent
event name; update its value from "tool.invocation.time_range_invalid" to the
dotted convention "tool.invocation.time_range.invalid" so it matches other
events (domain.subject.qualifier) and avoids splitting observability
queries/alerts; ensure any downstream uses or tests referring to
TOOL_INVOCATION_TIME_RANGE_INVALID still work after the string change.

In `@src/synthorg/providers/health.py`:
- Around line 303-308: The fast-path in _prune_before assumes self._records is
in chronological order by checking self._records[0].timestamp; change this to a
defensive check that does not rely on ordering (e.g. compute the minimum
timestamp or use all(...) to determine if every record.timestamp >= cutoff)
before returning 0, then proceed with the existing list comprehension prune;
reference: _prune_before and self._records (compare behavior to
CostTracker._prune_before).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 833278b4-c906-4c43-ace9-c2034ca40a10

📥 Commits

Reviewing files that changed from the base of the PR and between 1588d4d and 12a1e64.

📒 Files selected for processing (14)
  • CLAUDE.md
  • src/synthorg/budget/tracker.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/providers/health.py
  • src/synthorg/tools/invocation_tracker.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/budget/test_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/core/test_task.py
  • tests/unit/providers/test_health.py
  • tests/unit/tools/test_invocation_tracker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation must follow Markdown conventions: use consistent heading levels, link to relevant sections, include code examples where appropriate, and keep content up-to-date with implementation changes.

Files:

  • CLAUDE.md
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use PEP 758 except syntax: except A, B: (no parentheses) on Python 3.14. Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).
Every module with business logic must include: from synthorg.observability import get_logger then logger = get_logger(__name__). Always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Use structured kwargs: logger.info(EVENT, key=value) — never logger.info("msg %s", val). Never use import logging or print() in application code.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and wrap with MappingProxyType for read-only enforcement. For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines. Handle errors exp...

Files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/providers/health.py
  • src/synthorg/tools/invocation_tracker.py
  • src/synthorg/budget/tracker.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Maintain 80% minimum code coverage (enforced in CI). Always run pytest with -n auto for parallel execution. Use @pytest.mark.parametrize for testing similar cases. Use asyncio_mode = "auto" (no manual @pytest.mark.asyncio needed).
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests or project code—use generic names like test-provider, test-small-001, example-provider, example-large-001, large/medium/small. Vendor names may only appear in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets.
Never skip, dismiss, or ignore flaky tests—always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() instead of widening margins. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/core/test_task.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/providers/test_health.py
  • tests/unit/budget/test_tracker.py
🧠 Learnings (37)
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` from `core.types` for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T10:40:25.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T10:40:25.144Z
Learning: Applies to **/*.py : Docstrings: Google style, required on public classes and functions (enforced by ruff D rules).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, use `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to **/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T23:05:29.577Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T23:05:29.577Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/health.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly

Applied to files:

  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/api/controllers/test_setup.py
  • tests/unit/budget/test_tracker.py
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to tests/**/*.py : Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests or project code—use generic names like `test-provider`, `test-small-001`, `example-provider`, `example-large-001`, `large`/`medium`/`small`. Vendor names may only appear in: (1) Operations design page, (2) `.claude/` files, (3) third-party import paths, (4) provider presets.

Applied to files:

  • tests/unit/api/controllers/test_setup.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/tools/invocation_tracker.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/tools/invocation_tracker.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/tools/invocation_tracker.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/tracker.py
🔇 Additional comments (22)
tests/unit/api/controllers/test_setup.py (1)

1076-1107: Deterministic randomize-name test is a solid hardening change.

Using monkeypatch to force generate_auto_name and asserting exact values in both POST and follow-up GET makes this test stable and meaningfully verifies persistence behavior.

Based on learnings: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic instead of widening timing margins.

src/synthorg/observability/events/delegation.py (1)

31-32: Good event-surface extension for delegation tracking.

These constants cleanly support the new eviction and invalid-range logging paths without introducing string literals downstream.

src/synthorg/communication/delegation/record_store.py (2)

52-82: Bounded FIFO storage and one-time eviction warning are implemented well.

max_records validation plus deque(maxlen=...) keeps memory bounded while preserving insertion-order semantics.


189-195: Nice error-path observability on invalid time ranges.

Logging DELEGATION_TIME_RANGE_INVALID immediately before raising makes failures diagnosable without changing API behavior.

tests/unit/communication/delegation/test_record_store.py (2)

198-244: Eviction boundary coverage is strong.

The suite checks below/equal/over-capacity and max_records=1, which directly validates FIFO retention behavior.


222-225: Good consolidation of invalid constructor cases.

Using one parametrized test for 0 and -1 keeps the failure contract focused and reduces duplication.

tests/unit/tools/test_invocation_tracker.py (2)

116-162: Eviction behavior tests are comprehensive and targeted.

The cases validate FIFO ordering and edge behavior (below, exact, over, and max_records=1) for the new bounded storage.


140-143: Parametrized invalid-input coverage looks good.

This keeps constructor rejection checks concise while preserving explicit value coverage.

src/synthorg/tools/invocation_tracker.py (2)

45-72: Tracker hardening implementation is solid.

max_records validation, bounded deque storage, and warning-on-first-eviction provide a clean cap without log spam.


124-136: Time-range validation helper is clean and observable.

Centralizing validation with a structured warning before raising improves consistency and debuggability.

src/synthorg/budget/tracker.py (3)

97-116: LGTM!

Constructor validation at lines 104-106 properly enforces auto_prune_threshold >= 1 at the system boundary, addressing the previous review feedback. The error message is clear and actionable.


142-164: LGTM!

The prune_expired() method is well-designed:

  • Accepts optional now for testability
  • Uses shared _prune_before() helper
  • Logs with structured event constant
  • Returns count for caller visibility

428-453: LGTM!

The _snapshot() method correctly:

  • Accepts now parameter for consistent reference time
  • Only auto-prunes when threshold is exceeded (not at threshold)
  • Logs auto-prune events separately from manual prune
  • Returns immutable tuple snapshot
tests/unit/budget/test_tracker.py (2)

464-502: LGTM!

Comprehensive test coverage for prune_expired():

  • Standard pruning case
  • Empty tracker edge case
  • No-expiry case
  • Exact cutoff boundary verification (record at cutoff retained)

The tests use explicit timestamps avoiding timing-based flakiness.


508-572: LGTM!

The TestCostTrackerAutoEviction class thoroughly validates auto-pruning behavior:

  • Threshold exceeded (12 records, threshold 10)
  • Below threshold (no prune)
  • Exactly at threshold (no prune - correct > semantics)
  • All-expired edge case
  • Invalid constructor values via parametrize
tests/unit/core/test_task.py (1)

371-378: LGTM!

Good parametrized test covering all three non-finite float cases (inf, -inf, nan) with clear test IDs. This validates the allow_inf_nan=False ConfigDict setting on the Task model.

src/synthorg/observability/events/provider.py (1)

82-85: LGTM!

New event constants follow the established naming pattern and provide proper observability for both manual and automatic pruning paths.

tests/unit/providers/test_health.py (2)

433-441: LGTM!

The exact-cutoff boundary test correctly verifies that a record timestamped exactly at the 24-hour cutoff is retained (not pruned), confirming the >= semantics.


447-527: LGTM!

The TestAutoEviction class provides comprehensive coverage that mirrors the CostTracker tests:

  • Auto-prune when threshold exceeded
  • No prune below threshold (with verification via prune_expired)
  • No prune at exact threshold
  • All-expired edge case
  • Invalid constructor values

The assertion at lines 489-491 addresses the previous review feedback about strengthening the "no prune below threshold" test.

src/synthorg/providers/health.py (3)

57-57: LGTM!

ProviderHealthRecord correctly includes allow_inf_nan=False in its ConfigDict, aligning with the updated convention and ensuring response_time_ms rejects non-finite values.


169-181: LGTM!

Good additions:

  • __slots__ improves memory efficiency for tracker instances
  • Constructor validation enforces auto_prune_threshold >= 1
  • Pattern matches CostTracker for consistency

276-301: LGTM!

The _snapshot() method correctly:

  • Accepts now parameter (addresses previous review feedback)
  • Uses consistent reference time for cutoff calculation
  • Auto-prunes only when > threshold (not >=)
  • Logs with distinct event constant for auto-prune

- **Immutability**: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction + `MappingProxyType` wrapping for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, rely on `frozen=True` for field reassignment prevention and `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).
- **Config vs runtime state**: frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state that evolves (e.g. agent execution state, task progress). Never mix static config fields with mutable runtime fields in one model.
- **Models**: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Adopted conventions: use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` (from `core.types`) for all identifier/name fields -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators.
- **Models**: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Adopted conventions: use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time; use `@computed_field` for derived values instead of storing + validating redundant fields (e.g. `TokenUsage.total_tokens`); use `NotBlankStr` (from `core.types`) for all identifier/name fields -- including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants -- instead of manual whitespace validators.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

New allow_inf_nan=False convention is not applied consistently across codebase.

The convention update is well-documented, but the relevant code snippets indicate 18+ existing ConfigDict declarations that don't include allow_inf_nan=False:

  • src/synthorg/providers/models.py (9 declarations)
  • src/synthorg/tools/base.py (1 declaration)
  • src/synthorg/settings/models.py (3 declarations)
  • src/synthorg/templates/schema.py (6 declarations)

This PR only updated Task and AcceptanceCriterion. Consider tracking the remaining models for a follow-up PR to ensure the codebase fully aligns with this convention.

Do you want me to open an issue to track updating the remaining ConfigDict declarations to include allow_inf_nan=False?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CLAUDE.md` at line 321, Several ConfigDict declarations across the codebase
are missing the new convention allow_inf_nan=False; update every ConfigDict in
the listed modules (the remaining declarations in providers/models.py,
tools/base.py, settings/models.py, templates/schema.py) to include
allow_inf_nan=False (Task and AcceptanceCriterion are already updated), and if
you don't want to change them now create an issue to track updating these
remaining ConfigDict occurrences so they’re handled in a follow-up PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/providers/health.py (1)

237-242: ⚠️ Potential issue | 🟠 Major

Clamp historical summaries to the caller’s reference time.

Lines 241 and 268 only apply the lower bound (timestamp >= cutoff). If a caller asks for now=<historical datetime>, records newer than that reference time are still included, so the returned summary is not actually “as of” now. Add the missing upper bound in both filters.

Proposed fix
         recent = [
             r
             for r in snapshot
-            if r.provider_name == provider_name and r.timestamp >= cutoff
+            if r.provider_name == provider_name
+            and cutoff <= r.timestamp <= ref
         ]
@@
         by_provider: dict[str, list[ProviderHealthRecord]] = defaultdict(list)
         for r in snapshot:
-            if r.timestamp >= cutoff:
+            if cutoff <= r.timestamp <= ref:
                 by_provider[r.provider_name].append(r)

Also applies to: 265-269

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/providers/health.py` around lines 237 - 242, The filtering for
historical "as of" snapshots only applies the lower bound (r.timestamp >=
cutoff) and must also clamp to the reference time; update both filters that
build recent and recent_by_type after calling self._snapshot(now=ref) to include
an upper bound using the reference variable (e.g., require cutoff <= r.timestamp
<= ref or r.timestamp <= ref), ensuring you use the same ref/now passed to
_snapshot and apply the <= ref check in the list comprehensions that reference
provider_name and when computing max timestamps per type.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/tracker.py`:
- Around line 61-62: The 7-day retention constant _COST_WINDOW_HOURS is
hard-coded and pruning logic tied to _AUTO_PRUNE_THRESHOLD discards older
records while build_summary() accepts arbitrary start/end ranges, causing silent
undercounting for month-to-date or longer queries; either make the retention
window configurable (expose _COST_WINDOW_HOURS as a configurable parameter and
use it wherever pruning and retention checks occur) or ensure retention aligns
with the longest supported reporting period, and if you intentionally keep 7-day
retention then modify build_summary() to validate the requested start/end
against the retention window and raise/reject queries that exceed it (update all
uses of _COST_WINDOW_HOURS and the pruning code that depends on it and ensure
budget_total_monthly / budget_used_percent calculations only run when data for
the full requested range is present).

---

Outside diff comments:
In `@src/synthorg/providers/health.py`:
- Around line 237-242: The filtering for historical "as of" snapshots only
applies the lower bound (r.timestamp >= cutoff) and must also clamp to the
reference time; update both filters that build recent and recent_by_type after
calling self._snapshot(now=ref) to include an upper bound using the reference
variable (e.g., require cutoff <= r.timestamp <= ref or r.timestamp <= ref),
ensuring you use the same ref/now passed to _snapshot and apply the <= ref check
in the list comprehensions that reference provider_name and when computing max
timestamps per type.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f17b93ef-0b1a-4cce-a440-475174e1affe

📥 Commits

Reviewing files that changed from the base of the PR and between 12a1e64 and b33f4fb.

📒 Files selected for processing (3)
  • src/synthorg/budget/tracker.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/providers/health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (1)
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use PEP 758 except syntax: except A, B: (no parentheses) on Python 3.14. Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).
Every module with business logic must include: from synthorg.observability import get_logger then logger = get_logger(__name__). Always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Use structured kwargs: logger.info(EVENT, key=value) — never logger.info("msg %s", val). Never use import logging or print() in application code.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and wrap with MappingProxyType for read-only enforcement. For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines. Handle errors exp...

Files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/budget/tracker.py
  • src/synthorg/providers/health.py
🧠 Learnings (14)
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/budget/tracker.py
  • src/synthorg/providers/health.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

  • src/synthorg/providers/health.py
🔇 Additional comments (1)
src/synthorg/observability/events/tool.py (1)

63-64: Looks good — event constants are consistent and correctly named.

Both new constants follow the existing tool.* naming scheme, and TOOL_INVOCATION_TIME_RANGE_INVALID now matches the dotted convention (time_range.invalid).

As per coding guidelines, event names should be defined as constants in domain-specific modules under synthorg.observability.events.

Comment on lines +61 to +62
_COST_WINDOW_HOURS = 168 # 7 days
_AUTO_PRUNE_THRESHOLD = 100_000
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Seven-day eviction will undercount longer-range budget summaries.

Lines 155 and 445 permanently discard records older than 168 hours, but build_summary() still accepts arbitrary start/end ranges and computes budget_total_monthly / budget_used_percent from whatever survives. Once a deployment crosses the prune threshold, month-to-date and any longer-range summaries will silently undercount. Make the retention window configurable (or align it to the longest supported reporting period); if 7 days is intentional, reject older query ranges instead of returning partial data.

Also applies to: 75-81, 154-157, 443-446

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/tracker.py` around lines 61 - 62, The 7-day retention
constant _COST_WINDOW_HOURS is hard-coded and pruning logic tied to
_AUTO_PRUNE_THRESHOLD discards older records while build_summary() accepts
arbitrary start/end ranges, causing silent undercounting for month-to-date or
longer queries; either make the retention window configurable (expose
_COST_WINDOW_HOURS as a configurable parameter and use it wherever pruning and
retention checks occur) or ensure retention aligns with the longest supported
reporting period, and if you intentionally keep 7-day retention then modify
build_summary() to validate the requested start/end against the retention window
and raise/reject queries that exceed it (update all uses of _COST_WINDOW_HOURS
and the pruning code that depends on it and ensure budget_total_monthly /
budget_used_percent calculations only run when data for the full requested range
is present).

Aureliolo and others added 9 commits March 29, 2026 12:00
…#844)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…cordStore (#837)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pre-reviewed by 11 agents, 7 findings addressed:
- Add logging to ProviderHealthTracker.prune_expired (parity with CostTracker)
- Validate auto_prune_threshold >= 1 in both trackers
- Log WARNING before raising ValueError in time-range validation
- Add boundary tests (threshold exact, max_records=1)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r crash

The lambda mock was missing the `seed` kwarg that setup_agents.py
passes, causing a TypeError that crashed the xdist worker silently.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… Gemini

- Fix stale "append-only" docstrings across 4 source files (now documents
  TTL-based eviction and size-cap eviction respectively)
- Add missing Args/Raises to CostTracker and ProviderHealthTracker docstrings
- Extract _prune_before helper to DRY duplicated pruning logic
- Thread `now` parameter to _snapshot() for deterministic auto-prune cutoffs
- Add O(1) fast-path in _prune_before (skip scan when oldest record is fresh)
- Fix eviction warning log spam: warn once per tracker lifetime, not per append
- Add __slots__ to CostTracker, DelegationRecordStore, ToolInvocationTracker
- Extract _validate_time_range helper in ToolInvocationTracker (consistent
  with CostTracker and DelegationRecordStore)
- Update concurrency note in DelegationRecordStore to reference deque semantics
- Document prune window durations in prune_expired docstrings (168h, 24h)
- Document allow_inf_nan=False convention in CLAUDE.md Models section
- Parametrize duplicate zero/negative validation tests across 4 test files
- Add match= to inf/nan rejection test for precision
- Add boundary tests: exact cutoff retention, all-records-expired, exact-max
- Strengthen auto-prune test to verify identity of surviving records

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ProviderHealthTracker retains __slots__ (pre-existing, its tests use
real instances).  CostTracker, DelegationRecordStore, and
ToolInvocationTracker cannot use __slots__ because downstream tests
mock them with MagicMock which requires dynamic attribute assignment.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unsafe fast-path in _prune_before (CostTracker +
  ProviderHealthTracker) that assumed chronological insertion order;
  records appended via record() may have out-of-order timestamps
- Fix event name "tool.invocation.time_range_invalid" to use dotted
  convention "tool.invocation.time_range.invalid" matching budget and
  delegation event patterns
- Created #913 to track adding allow_inf_nan=False to remaining ~150
  ConfigDict declarations across the codebase

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- CostTracker.build_summary() now logs a warning when the requested
  date range extends beyond the 168-hour retention window, alerting
  operators that auto-pruned data may cause undercounting
- ProviderHealthTracker.get_summary() and get_all_summaries() now clamp
  records to the reference time (cutoff <= timestamp <= ref), preventing
  records newer than the reference from leaking into historical queries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/api/controllers/test_setup.py (1)

479-485: 🧹 Nitpick | 🔵 Trivial

Run the registry assertion on the TestClient app loop.

Creating a fresh event loop here disconnects the assertion from the loop that owns app_state.agent_registry and any loop-bound primitives it uses. That makes these tests brittle even though they pass today. Prefer an async test or the Litestar TestClient portal/helper that executes the coroutine on the application's loop.

Also applies to: 633-639

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/api/controllers/test_setup.py` around lines 479 - 485, The test
currently creates a fresh asyncio loop and runs
app_state.agent_registry.agent_count() on it, which disconnects that call from
the application's event loop and its loop-bound primitives; instead, execute the
coroutine on the TestClient/app loop by using the Litestar TestClient helper
(e.g., TestClient.run_sync or the TestClient portal) to run
app_state.agent_registry.agent_count() so the call runs in the application's
loop and you can remove asyncio.new_event_loop() / loop.run_until_complete() /
loop.close() surrounding the agent_count call.
♻️ Duplicate comments (1)
src/synthorg/budget/tracker.py (1)

62-63: ⚠️ Potential issue | 🟠 Major

The 7-day retention window still returns silently partial summaries.

This only logs when start predates the retention cutoff, but build_summary() still returns normal totals and budget_used_percent afterward. Once _snapshot() has pruned, any month-to-date or other long-range report becomes an undercount that callers cannot detect. Either reject ranges earlier than the retained window, or make the retention period configurable/aligned with the longest supported reporting period.

Also applies to: 299-309

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/tracker.py` around lines 62 - 63, The retention window
constant _COST_WINDOW_HOURS (168) causes silent undercounts because _snapshot()
prunes old data but build_summary() and budget_used_percent still return totals
for ranges that start before the retention cutoff; change behavior so callers
can detect this: either make the retention period configurable and used as the
authoritative max-range, or validate range inputs (e.g., in build_summary() and
any month-to-date/long-range report methods) and raise/return an explicit error
when the requested start is earlier than now() -
timedelta(hours=_COST_WINDOW_HOURS). Ensure _snapshot() and the validation use
the same _COST_WINDOW_HOURS (or the new config) so they stay aligned and callers
no longer receive silent partial summaries.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/budget/tracker.py`:
- Around line 143-165: The prune_expired() routine is only invoked on read paths
so expired entries accumulate on write-heavy nodes; fix by adding opportunistic
or periodic pruning: update the record() method in the same budget tracker to
opportunistically call prune_expired() (or inline _prune_before(cutoff)) under
the existing self._lock whenever appending a record and the record count or age
exceeds a small threshold, and/or add a background prune loop that calls
prune_expired() on an interval (implement a _start_prune_loop coroutine and
start it from your service lifecycle bootstrap where services are started).
Ensure you use the existing methods (_prune_before, prune_expired) and locks to
avoid races and avoid blocking the write path by scheduling async background
calls where appropriate.

In `@src/synthorg/communication/delegation/record_store.py`:
- Around line 38-43: record_sync lacks protection against concurrent calls from
other OS threads and _snapshot iterates the deque without that protection; add a
dedicated threading lock (e.g. self._thread_lock = threading.Lock()) and use it
to guard deque mutations and snapshot iteration: acquire self._thread_lock in
record_sync before appending the DelegationRecord and in _snapshot while
copying/iterating the deque, keep the existing asyncio._lock for async-reader
coordination, and import threading; this provides thread-safety without changing
the async lock semantics.

In `@src/synthorg/tools/invocation_tracker.py`:
- Around line 50-52: Before raising the ValueError for invalid max_records in
invocation_tracker.py, emit a warning that includes the rejected value and
context; e.g., call the module/class logger (use the same logger used elsewhere
in this file) to logger.warning("Rejected max_records", extra={"max_records":
max_records, "reason": "must be >= 1"}) or similar structured message
immediately before raise ValueError(msg) so the invalid value and rationale are
recorded.

In `@tests/unit/providers/test_health.py`:
- Around line 453-472: The test currently only verifies get_summary's 24h
filtering but not that expired entries were actually evicted from the tracker's
storage; update the test for ProviderHealthTracker by adding an explicit call to
prune_expired(now=self._NOW) after summary = await tracker.get_summary(...) and
assert the prune_expired return (or the change in tracker._records length)
equals the expected number evicted (0 for >threshold auto-prune case and the
expected count at the exact-threshold boundary); apply the same additional
assertion pattern to the other test block around the 493-520 region to prove
pruning is a side effect, referencing ProviderHealthTracker, get_summary,
prune_expired, and the internal _records for locating the code.

---

Outside diff comments:
In `@tests/unit/api/controllers/test_setup.py`:
- Around line 479-485: The test currently creates a fresh asyncio loop and runs
app_state.agent_registry.agent_count() on it, which disconnects that call from
the application's event loop and its loop-bound primitives; instead, execute the
coroutine on the TestClient/app loop by using the Litestar TestClient helper
(e.g., TestClient.run_sync or the TestClient portal) to run
app_state.agent_registry.agent_count() so the call runs in the application's
loop and you can remove asyncio.new_event_loop() / loop.run_until_complete() /
loop.close() surrounding the agent_count call.

---

Duplicate comments:
In `@src/synthorg/budget/tracker.py`:
- Around line 62-63: The retention window constant _COST_WINDOW_HOURS (168)
causes silent undercounts because _snapshot() prunes old data but
build_summary() and budget_used_percent still return totals for ranges that
start before the retention cutoff; change behavior so callers can detect this:
either make the retention period configurable and used as the authoritative
max-range, or validate range inputs (e.g., in build_summary() and any
month-to-date/long-range report methods) and raise/return an explicit error when
the requested start is earlier than now() - timedelta(hours=_COST_WINDOW_HOURS).
Ensure _snapshot() and the validation use the same _COST_WINDOW_HOURS (or the
new config) so they stay aligned and callers no longer receive silent partial
summaries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c340253f-1d44-4b9c-961c-818005f0aefa

📥 Commits

Reviewing files that changed from the base of the PR and between b33f4fb and e4649ce.

📒 Files selected for processing (16)
  • CLAUDE.md
  • src/synthorg/budget/tracker.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/core/task.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/providers/health.py
  • src/synthorg/tools/invocation_tracker.py
  • tests/unit/api/controllers/test_setup.py
  • tests/unit/budget/test_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/core/test_task.py
  • tests/unit/providers/test_health.py
  • tests/unit/tools/test_invocation_tracker.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation must follow Markdown conventions: use consistent heading levels, link to relevant sections, include code examples where appropriate, and keep content up-to-date with implementation changes.

Files:

  • CLAUDE.md
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Use PEP 758 except syntax: except A, B: (no parentheses) on Python 3.14. Do NOT use from __future__ import annotations—Python 3.14 has PEP 649 native lazy annotations.
All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).
Every module with business logic must include: from synthorg.observability import get_logger then logger = get_logger(__name__). Always use constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api). Use structured kwargs: logger.info(EVENT, key=value) — never logger.info("msg %s", val). Never use import logging or print() in application code.
Use Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use @computed_field for derived values instead of storing redundant fields. Use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction and wrap with MappingProxyType for read-only enforcement. For dict/list fields in frozen Pydantic models, use copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).
Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Keep functions under 50 lines and files under 800 lines. Handle errors exp...

Files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/core/task.py
  • src/synthorg/observability/events/budget.py
  • src/synthorg/tools/invocation_tracker.py
  • src/synthorg/communication/delegation/record_store.py
  • src/synthorg/providers/health.py
  • src/synthorg/budget/tracker.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Maintain 80% minimum code coverage (enforced in CI). Always run pytest with -n auto for parallel execution. Use @pytest.mark.parametrize for testing similar cases. Use asyncio_mode = "auto" (no manual @pytest.mark.asyncio needed).
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests or project code—use generic names like test-provider, test-small-001, example-provider, example-large-001, large/medium/small. Vendor names may only appear in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets.
Never skip, dismiss, or ignore flaky tests—always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() instead of widening margins. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/core/test_task.py
  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
  • tests/unit/providers/test_health.py
  • tests/unit/budget/test_tracker.py
  • tests/unit/api/controllers/test_setup.py
🧠 Learnings (44)
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
  • src/synthorg/core/task.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` from `core.types` for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • CLAUDE.md
  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • CLAUDE.md
  • src/synthorg/core/task.py
📚 Learning: 2026-03-16T10:40:25.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T10:40:25.144Z
Learning: Applies to **/*.py : Docstrings: Google style, required on public classes and functions (enforced by ruff D rules).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, `BaseTool`), use `copy.deepcopy()` at construction and wrap with `MappingProxyType` for read-only enforcement. For `dict`/`list` fields in frozen Pydantic models, use `copy.deepcopy()` at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, persistence serialization).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : All public functions and classes must have type hints. Use mypy strict mode. Google-style docstrings are required on all public classes and functions (enforced by ruff D rules).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to **/*.py : Config vs runtime state: use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • CLAUDE.md
  • src/synthorg/core/task.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to **/*.py : Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T23:05:29.577Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T23:05:29.577Z
Learning: Applies to **/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to **/*.py : Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/delegation.py
  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/tool.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/tool.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/providers/**/*.py : Providers: LLM provider abstraction (LiteLLM adapter), auth types (api_key/oauth/custom_header/none), presets (PROVIDER_PRESETS), runtime CRUD (ProviderManagementService with asyncio.Lock serialization), hot-reload via AppState swap.

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/providers/health.py
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.) — import directly

Applied to files:

  • src/synthorg/observability/events/provider.py
  • src/synthorg/observability/events/budget.py
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity. Use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; separate mutable-via-copy models (using `model_copy(update=...)`) for runtime state

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence).

Applied to files:

  • src/synthorg/core/task.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Prefer `pytest.mark.parametrize` for testing similar cases.

Applied to files:

  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Parametrize: Prefer pytest.mark.parametrize for testing similar cases.

Applied to files:

  • tests/unit/tools/test_invocation_tracker.py
  • tests/unit/communication/delegation/test_record_store.py
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget package (budget/): cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError)

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/budget/**/*.py : Budget tracking includes pre-flight/in-flight checks, auto-downgrade, billing periods, cost tiers, quota/subscription. CFO includes anomaly detection, efficiency analysis, downgrade recommendations.

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/providers/health.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Budget: Cost tracking, budget enforcement (pre-flight/in-flight checks, auto-downgrade), billing periods, cost tiers, quota/subscription tracking, CFO cost optimization (anomaly detection, efficiency analysis, downgrade recommendations, approval decisions), spending reports, budget errors (BudgetExhaustedError, DailyLimitExceededError, QuotaExhaustedError).

Applied to files:

  • src/synthorg/observability/events/budget.py
  • src/synthorg/budget/tracker.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/tools/invocation_tracker.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/tools/invocation_tracker.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/api/controllers/test_setup.py
📚 Learning: 2026-03-28T21:29:12.755Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-28T21:29:12.755Z
Learning: Applies to tests/**/*.py : Use markers `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Maintain 80% minimum code coverage (enforced in CI). Always run pytest with `-n auto` for parallel execution. Use `pytest.mark.parametrize` for testing similar cases. Use `asyncio_mode = "auto"` (no manual `pytest.mark.asyncio` needed).

Applied to files:

  • tests/unit/api/controllers/test_setup.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to tests/**/*.py : Test markers: `pytest.mark.unit`, `pytest.mark.integration`, `pytest.mark.e2e`, `pytest.mark.slow`. Coverage: 80% minimum. Async: `asyncio_mode = 'auto'` — no manual `pytest.mark.asyncio` needed. Timeout: 30 seconds per test. Parallelism: `pytest-xdist` via `-n auto` — ALWAYS include `-n auto` when running pytest, never run tests sequentially.

Applied to files:

  • tests/unit/api/controllers/test_setup.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/api/controllers/test_setup.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/api/**/*.py : API package (api/): Litestar REST + WebSocket with controllers, guards, channels, JWT + API key + WS ticket auth, approval gate integration, coordination endpoint, collaboration endpoint, settings endpoint, provider management endpoint (CRUD + test + presets), backup endpoint, RFC 9457 structured errors, AppState hot-reload slots, service auto-wiring (Phase 1 at construction, Phase 2 on startup), lifecycle helpers

Applied to files:

  • tests/unit/api/controllers/test_setup.py
🔇 Additional comments (9)
CLAUDE.md (1)

321-321: LGTM!

The updated convention clearly documents the requirement for allow_inf_nan=False in all ConfigDict declarations. Per the commit messages, issue #913 has been opened to track applying this convention to the remaining ConfigDict declarations across the codebase.

tests/unit/core/test_task.py (1)

371-378: LGTM!

Well-structured parametrized test for the new allow_inf_nan=False validation. The test correctly covers all three non-finite float edge cases (+inf, -inf, NaN) and the error message match aligns with Pydantic's validation output.

src/synthorg/observability/events/provider.py (1)

82-85: LGTM!

Event constants are correctly defined with proper dotted naming convention (provider.health.pruned, provider.health.auto_pruned) matching the existing patterns in this module.

Minor observation: The section comment uses # -- ... -- style while other sections in this file use the box-drawing characters (# ── ... ──). Consider aligning for visual consistency, though this is purely cosmetic.

src/synthorg/observability/events/delegation.py (1)

31-32: LGTM!

Event constants are correctly defined with proper dotted naming convention, consistent with the existing patterns in this module. The context snippet confirms these constants are properly imported and used by record_store.py.

src/synthorg/observability/events/tool.py (1)

63-64: LGTM!

Event constants are correctly defined. The TOOL_INVOCATION_TIME_RANGE_INVALID constant now uses the proper dotted naming convention (tool.invocation.time_range.invalid), consistent with other similar events like DELEGATION_TIME_RANGE_INVALID and BUDGET_TIME_RANGE_INVALID.

src/synthorg/core/task.py (2)

34-34: LGTM!

Adding allow_inf_nan=False to AcceptanceCriterion is defensive and aligns with the new codebase-wide convention, even though this model currently has no float fields.


79-79: LGTM!

The allow_inf_nan=False addition correctly prevents non-finite values in the budget_limit field, which is the primary motivation for this change. This is properly covered by the new parametrized test in test_task.py.

tests/unit/communication/delegation/test_record_store.py (1)

189-244: LGTM!

Comprehensive test coverage for the new FIFO eviction behavior:

  • Eviction when exceeding max_records
  • No eviction below and at exact capacity
  • Constructor validation for invalid max_records values (properly parametrized per guidelines)
  • Edge case for max_records=1

The tests correctly verify the deque(maxlen=N) behavior where the oldest records are automatically evicted on append when at capacity.

src/synthorg/observability/events/budget.py (1)

37-41: LGTM!

Event constants are correctly defined with proper dotted naming convention. The context snippet confirms these constants are properly imported and used by tracker.py for the new TTL-based pruning and retention warning behaviors.

Comment on lines +143 to +165
async def prune_expired(self, *, now: datetime | None = None) -> int:
"""Remove records older than the 168-hour (7-day) cost window.

Call periodically from long-running services to bound
memory growth.

Args:
now: Reference time. Defaults to current UTC time.

Returns:
Number of records removed.
"""
ref = now or datetime.now(UTC)
cutoff = ref - timedelta(hours=_COST_WINDOW_HOURS)
async with self._lock:
pruned = self._prune_before(cutoff)
if pruned:
logger.info(
BUDGET_RECORDS_PRUNED,
pruned=pruned,
remaining=len(self._records),
)
return pruned
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Expired records are only pruned on read paths.

record() never checks the TTL, _snapshot() only runs when somebody queries, and the current wiring in src/synthorg/api/auto_wire.py:156-167, src/synthorg/api/lifecycle.py:157-200, and src/synthorg/api/app.py:192-210 does not schedule prune_expired(). A write-heavy/read-light node will therefore keep accumulating expired entries indefinitely, which misses the memory-bounding goal of this change. Add a periodic prune loop or opportunistic pruning from record() once the threshold is crossed.

Also applies to: 439-464

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/budget/tracker.py` around lines 143 - 165, The prune_expired()
routine is only invoked on read paths so expired entries accumulate on
write-heavy nodes; fix by adding opportunistic or periodic pruning: update the
record() method in the same budget tracker to opportunistically call
prune_expired() (or inline _prune_before(cutoff)) under the existing self._lock
whenever appending a record and the record count or age exceeds a small
threshold, and/or add a background prune loop that calls prune_expired() on an
interval (implement a _start_prune_loop coroutine and start it from your service
lifecycle bootstrap where services are started). Ensure you use the existing
methods (_prune_before, prune_expired) and locks to avoid races and avoid
blocking the write path by scheduling async background calls where appropriate.

Comment on lines 38 to +43
Concurrency note: ``record_sync`` does not acquire ``_lock``.
The lock serialises concurrent async readers only. Append-only
semantics and cooperative asyncio scheduling make single-call
sync writes safe (``list.append`` cannot be interrupted).
The lock serialises concurrent async readers only. Cooperative
asyncio scheduling and deque's internal maxlen enforcement make
single-call sync writes safe (``deque.append`` cannot be
interrupted). The eviction warning is best-effort under this
model.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n src/synthorg/communication/delegation/record_store.py | head -100

Repository: Aureliolo/synthorg

Length of output: 4014


🏁 Script executed:

cat -n src/synthorg/communication/delegation/record_store.py | tail -80

Repository: Aureliolo/synthorg

Length of output: 3192


🏁 Script executed:

rg "record_sync" --type py -B 2 -A 2

Repository: Aureliolo/synthorg

Length of output: 12305


🏁 Script executed:

sed -n '1,50p' src/synthorg/communication/delegation/service.py | head -60

Repository: Aureliolo/synthorg

Length of output: 1643


🏁 Script executed:

sed -n '40,150p' src/synthorg/communication/delegation/service.py

Repository: Aureliolo/synthorg

Length of output: 3669


🏁 Script executed:

sed -n '130,200p' src/synthorg/communication/delegation/service.py

Repository: Aureliolo/synthorg

Length of output: 2485


🏁 Script executed:

rg "\.delegate\(" --type py -B 3 -A 1 | head -60

Repository: Aureliolo/synthorg

Length of output: 4975


🏁 Script executed:

rg "DelegationService\(" --type py -B 2 -A 5 | head -80

Repository: Aureliolo/synthorg

Length of output: 2717


🏁 Script executed:

rg "service\.delegate|\.delegate\(" --type py src/ | head -30

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

rg "delegate\(" --type py src/ -A 2 -B 2

Repository: Aureliolo/synthorg

Length of output: 430


🏁 Script executed:

rg "\.delegate\(" src/ --type py -B 5 -A 3

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

fd -e py src/ | xargs grep -l "DelegationService" | head -5

Repository: Aureliolo/synthorg

Length of output: 383


🏁 Script executed:

fd -e py . src/ | xargs grep -l "DelegationService" 2>/dev/null | head -5

Repository: Aureliolo/synthorg

Length of output: 257


🏁 Script executed:

cat -n src/synthorg/communication/conflict_resolution/service.py | head -150

Repository: Aureliolo/synthorg

Length of output: 5684


🏁 Script executed:

rg "to_thread|ThreadPoolExecutor|threading\." src/synthorg/communication/delegation/ --type py

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

rg "async def.*delegate" src/ --type py -A 10 | head -50

Repository: Aureliolo/synthorg

Length of output: 1837


Clarify the design constraint on record_sync() or add enforcement mechanism.

record_sync() indeed lacks synchronization with _snapshot(): an asyncio.Lock only coordinates coroutines on the event loop, not separate OS threads. If record_sync() is called from a different thread while _snapshot() iterates the deque, concurrent mutation can cause RuntimeError or a torn snapshot.

The current docstring states the method is "safe to call from sync code under asyncio cooperative scheduling," which implicitly constrains usage to the event loop thread. However, this constraint is undocumented in the signature and unenforced at runtime. Either:

  1. Add a thread-safe lock shared by both record_sync() and _snapshot(), or
  2. Explicitly document the event-loop-thread-only requirement and (optionally) add a runtime check to prevent off-thread calls.
Context: lines 38–43 and 66–88
Concurrency note: ``record_sync`` does not acquire ``_lock``.
The lock serialises concurrent async readers only.  Cooperative
asyncio scheduling and deque's internal maxlen enforcement make
single-call sync writes safe (``deque.append`` cannot be
interrupted).  The eviction warning is best-effort under this
model.
def record_sync(self, delegation: DelegationRecord) -> None:
    """Append a delegation record (sync, for cooperative scheduling).

    Safe to call from sync code under asyncio cooperative scheduling
    since a plain deque append cannot be interrupted by another
    coroutine.
    ...
    """
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/communication/delegation/record_store.py` around lines 38 - 43,
record_sync lacks protection against concurrent calls from other OS threads and
_snapshot iterates the deque without that protection; add a dedicated threading
lock (e.g. self._thread_lock = threading.Lock()) and use it to guard deque
mutations and snapshot iteration: acquire self._thread_lock in record_sync
before appending the DelegationRecord and in _snapshot while copying/iterating
the deque, keep the existing asyncio._lock for async-reader coordination, and
import threading; this provides thread-safety without changing the async lock
semantics.

Comment on lines +50 to +52
if max_records < 1:
msg = f"max_records must be >= 1, got {max_records}"
raise ValueError(msg)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log rejected max_records values before raising.

This new constructor error path is silent today, so bad retention config can fail without any structured warning in logs. Please emit a warning with the rejected value before raising ValueError.

Based on learnings, "All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/tools/invocation_tracker.py` around lines 50 - 52, Before
raising the ValueError for invalid max_records in invocation_tracker.py, emit a
warning that includes the rejected value and context; e.g., call the
module/class logger (use the same logger used elsewhere in this file) to
logger.warning("Rejected max_records", extra={"max_records": max_records,
"reason": "must be >= 1"}) or similar structured message immediately before
raise ValueError(msg) so the invalid value and rationale are recorded.

Add prune_expired() assertions after get_summary() calls to prove
auto-prune actually evicted expired records (removed==0) and that
no-prune paths preserved expired records (removed==5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
tests/unit/providers/test_health.py (1)

517-527: ⚠️ Potential issue | 🟡 Minor

Assert auto-prune side effect in the all-expired case.

At Line 525-526, calls_last_24h == 0 can still pass via filtering alone. Add a post-summary prune_expired assertion to prove _snapshot() actually evicted records when len(_records) > auto_prune_threshold.

♻️ Proposed test hardening
     summary = await tracker.get_summary("test-provider", now=self._NOW)
     assert summary.calls_last_24h == 0
+    removed = await tracker.prune_expired(now=self._NOW)
+    assert removed == 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/providers/test_health.py` around lines 517 - 527, The test
test_snapshot_all_records_expired must assert the auto-prune side-effect: after
creating 6 expired records on a ProviderHealthTracker(auto_prune_threshold=5)
and calling get_summary, verify that pruning actually occurred by asserting the
tracker’s internal storage was reduced—either call the tracker.prune_expired()
(or the public pruning API) and assert tracker._records is empty (or
len(tracker._records) <= auto_prune_threshold) to prove _snapshot() evicted
records when len(_records) > auto_prune_threshold.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/unit/providers/test_health.py`:
- Around line 517-527: The test test_snapshot_all_records_expired must assert
the auto-prune side-effect: after creating 6 expired records on a
ProviderHealthTracker(auto_prune_threshold=5) and calling get_summary, verify
that pruning actually occurred by asserting the tracker’s internal storage was
reduced—either call the tracker.prune_expired() (or the public pruning API) and
assert tracker._records is empty (or len(tracker._records) <=
auto_prune_threshold) to prove _snapshot() evicted records when len(_records) >
auto_prune_threshold.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1588bba-7f03-4f98-9683-796ed0bb5814

📥 Commits

Reviewing files that changed from the base of the PR and between e4649ce and 26f86b4.

📒 Files selected for processing (1)
  • tests/unit/providers/test_health.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use markers @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow. Maintain 80% minimum code coverage (enforced in CI). Always run pytest with -n auto for parallel execution. Use @pytest.mark.parametrize for testing similar cases. Use asyncio_mode = "auto" (no manual @pytest.mark.asyncio needed).
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in tests or project code—use generic names like test-provider, test-small-001, example-provider, example-large-001, large/medium/small. Vendor names may only appear in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets.
Never skip, dismiss, or ignore flaky tests—always fix them fundamentally. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() instead of widening margins. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/providers/test_health.py
🔇 Additional comments (1)
tests/unit/providers/test_health.py (1)

453-515: Good coverage improvements for threshold/boundary behavior.

The new tests around threshold exceeded vs. below/exact threshold and invalid auto_prune_threshold values are solid and materially improve regression protection.

Also applies to: 528-534

Add prune_expired() assertion to test_snapshot_all_records_expired to
prove auto-prune evicted all expired records (removed==0), consistent
with the other two auto-eviction tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 29, 2026 11:34 — with GitHub Actions Inactive
@Aureliolo Aureliolo merged commit 81253d9 into main Mar 29, 2026
31 checks passed
@Aureliolo Aureliolo deleted the feat/backend-hardening branch March 29, 2026 11:40
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview March 29, 2026 11:40 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Mar 30, 2026
🤖 I have created a release *beep* *boop*
---
#MAJOR CHANGES; We got a somewhat working webui :)

##
[0.5.0](v0.4.9...v0.5.0)
(2026-03-30)


### Features

* add analytics trends and budget forecast API endpoints
([#798](#798))
([16b61f5](16b61f5))
* add department policies to default templates
([#852](#852))
([7a41548](7a41548))
* add remaining activity event types (task_started, tool_used,
delegation, cost_incurred)
([#832](#832))
([4252fac](4252fac))
* agent performance, activity, and history API endpoints
([#811](#811))
([9b75c1d](9b75c1d))
* Agent Profiles and Detail pages (biography, career, performance)
([#874](#874))
([62d7880](62d7880))
* app shell, Storybook, and CI/CD pipeline
([#819](#819))
([d4dde90](d4dde90))
* Approvals page with risk grouping, urgency indicators, batch actions
([#889](#889))
([4e9673d](4e9673d))
* Budget Panel page (P&L dashboard, breakdown charts, forecast)
([#890](#890))
([b63b0f1](b63b0f1))
* build infrastructure layer (API client, auth, WebSocket)
([#815](#815))
([9f01d3e](9f01d3e))
* CLI global options infrastructure, UI modes, exit codes, env vars
([#891](#891))
([fef4fc5](fef4fc5))
* CodeMirror editor and theme preferences toggle
([#905](#905),
[#807](#807))
([#909](#909))
([41fbedc](41fbedc))
* Company page (department/agent management)
([#888](#888))
([cfb88b0](cfb88b0))
* comprehensive hint coverage across all CLI commands
([#900](#900))
([937974e](937974e))
* config system extensions, per-command flags for
init/start/stop/status/logs
([#895](#895))
([32f83fe](32f83fe))
* configurable currency system replacing hardcoded USD
([#854](#854))
([b372551](b372551))
* Dashboard page (metric cards, activity feed, budget burn)
([#861](#861))
([7d519d5](7d519d5))
* department health, provider status, and activity feed endpoints
([#818](#818))
([6d5f196](6d5f196))
* design tokens and core UI components
([#833](#833))
([ed887f2](ed887f2))
* extend approval, meeting, and budget API responses
([#834](#834))
([31472bf](31472bf))
* frontend polish -- real-time UX, accessibility, responsive,
performance ([#790](#790),
[#792](#792),
[#791](#791),
[#793](#793))
([#917](#917))
([f04a537](f04a537))
* implement human roles and access control levels
([#856](#856))
([d6d8a06](d6d8a06))
* implement semantic conflict detection in workspace merge
([#860](#860))
([d97283b](d97283b))
* interaction components and animation patterns
([#853](#853))
([82d4b01](82d4b01))
* Login page + first-run bootstrap + Company page
([#789](#789),
[#888](#888))
([#896](#896))
([8758e8d](8758e8d))
* Meetings page with timeline viz, token bars, contribution formatting
([#788](#788))
([#904](#904))
([b207f46](b207f46))
* Messages page with threading, channel badges, sender indicators
([#787](#787))
([#903](#903))
([28293ad](28293ad))
* Org Chart force-directed view and drag-drop reassignment
([#872](#872),
[#873](#873))
([#912](#912))
([a68a938](a68a938))
* Org Chart page (living nodes, status, CRUD, department health)
([#870](#870))
([0acbdae](0acbdae))
* per-command flags for remaining commands, auto-behavior wiring,
help/discoverability
([#897](#897))
([3f7afa2](3f7afa2))
* Providers page with backend rework -- health, CRUD, subscription auth
([#893](#893))
([9f8dd98](9f8dd98))
* scaffold React + Vite + TypeScript + Tailwind project
([#799](#799))
([bd151aa](bd151aa))
* Settings page with search, dependency indicators, grouped rendering
([#784](#784))
([#902](#902))
([a7b9870](a7b9870))
* Setup Wizard rebuild with template comparison, cost estimator, theme
customization ([#879](#879))
([ae8b50b](ae8b50b))
* setup wizard UX -- template filters, card metadata, provider form
reuse ([#910](#910))
([7f04676](7f04676))
* setup wizard UX overhaul -- mode choice, step reorder, provider fixes
([#907](#907))
([ee964c4](ee964c4))
* structured ModelRequirement in template agent configs
([#795](#795))
([7433548](7433548))
* Task Board page (rich Kanban, filtering, dependency viz)
([#871](#871))
([04a19b0](04a19b0))


### Bug Fixes

* align frontend types with backend and debounce WS refetches
([#916](#916))
([134c11b](134c11b))
* auto-cleanup targets newly pulled images instead of old ones
([#884](#884))
([50e6591](50e6591))
* correct wipe backup-skip flow and harden error handling
([#808](#808))
([c05860f](c05860f))
* improve provider setup in wizard, subscription auth, dashboard bugs
([#914](#914))
([87bf8e6](87bf8e6))
* improve update channel detection and add config get command
([#814](#814))
([6b137f0](6b137f0))
* resolve all ESLint warnings, add zero-warnings enforcement
([#899](#899))
([079b46a](079b46a))
* subscription auth uses api_key, base URL optional for cloud providers
([#915](#915))
([f0098dd](f0098dd))


### Refactoring

* semantic analyzer cleanup -- shared filtering, concurrency, extraction
([#908](#908))
([81372bf](81372bf))


### Documentation

* brand identity and UX design system from
[#765](#765) exploration
([#804](#804))
([389a9f4](389a9f4))
* page structure and information architecture for v0.5.0 dashboard
([#809](#809))
([f8d6d4a](f8d6d4a))
* write UX design guidelines with WCAG-verified color system
([#816](#816))
([4a4594e](4a4594e))


### Tests

* add unit tests for agent hooks and page components
([#875](#875))
([#901](#901))
([1d81546](1d81546))


### CI/CD

* bump actions/deploy-pages from 4.0.5 to 5.0.0 in the major group
([#831](#831))
([01c19de](01c19de))
* bump astral-sh/setup-uv from 7.6.0 to 8.0.0 in
/.github/actions/setup-python-uv in the all group
([#920](#920))
([5f6ba54](5f6ba54))
* bump codecov/codecov-action from 5.5.3 to 6.0.0 in the major group
([#868](#868))
([f22a181](f22a181))
* bump github/codeql-action from 4.34.1 to 4.35.0 in the all group
([#883](#883))
([87a4890](87a4890))
* bump sigstore/cosign-installer from 4.1.0 to 4.1.1 in the
minor-and-patch group
([#830](#830))
([7a69050](7a69050))
* bump the all group with 3 updates
([#923](#923))
([ff27c8e](ff27c8e))
* bump wrangler from 4.76.0 to 4.77.0 in /.github in the minor-and-patch
group ([#822](#822))
([07d43eb](07d43eb))
* bump wrangler from 4.77.0 to 4.78.0 in /.github in the all group
([#882](#882))
([f84118d](f84118d))


### Maintenance

* add design system enforcement hook and component inventory
([#846](#846))
([15abc43](15abc43))
* add dev-only auth bypass for frontend testing
([#885](#885))
([6cdcd8a](6cdcd8a))
* add pre-push rebase check hook
([#855](#855))
([b637a04](b637a04))
* backend hardening -- eviction/size-caps and model validation
([#911](#911))
([81253d9](81253d9))
* bump axios from 1.13.6 to 1.14.0 in /web in the all group across 1
directory ([#922](#922))
([b1b0232](b1b0232))
* bump brace-expansion from 5.0.4 to 5.0.5 in /web
([#862](#862))
([ba4a565](ba4a565))
* bump eslint-plugin-react-refresh from 0.4.26 to 0.5.2 in /web
([#801](#801))
([7574bb5](7574bb5))
* bump faker from 40.11.0 to 40.11.1 in the minor-and-patch group
([#803](#803))
([14d322e](14d322e))
* bump https://github.com/astral-sh/ruff-pre-commit from v0.15.7 to
0.15.8 ([#864](#864))
([f52901e](f52901e))
* bump nginxinc/nginx-unprivileged from `6582a34` to `f99cc61` in
/docker/web in the all group
([#919](#919))
([df85e4f](df85e4f))
* bump nginxinc/nginx-unprivileged from `ccbac1a` to `6582a34` in
/docker/web ([#800](#800))
([f4e9450](f4e9450))
* bump node from `44bcbf4` to `71be405` in /docker/sandbox
([#827](#827))
([91bec67](91bec67))
* bump node from `5209bca` to `cf38e1f` in /docker/web
([#863](#863))
([66d6043](66d6043))
* bump picomatch in /site
([#842](#842))
([5f20bcc](5f20bcc))
* bump recharts 2-&gt;3 and @types/node 22-&gt;25 in /web
([#802](#802))
([a908800](a908800))
* Bump requests from 2.32.5 to 2.33.0
([#843](#843))
([41daf69](41daf69))
* bump smol-toml from 1.6.0 to 1.6.1 in /site
([#826](#826))
([3e5dbe4](3e5dbe4))
* bump the all group with 3 updates
([#921](#921))
([7bace0b](7bace0b))
* bump the minor-and-patch group across 1 directory with 2 updates
([#829](#829))
([93e611f](93e611f))
* bump the minor-and-patch group across 1 directory with 3 updates
([#841](#841))
([7010c8e](7010c8e))
* bump the minor-and-patch group across 1 directory with 3 updates
([#869](#869))
([548cee5](548cee5))
* bump the minor-and-patch group in /site with 2 updates
([#865](#865))
([9558101](9558101))
* bump the minor-and-patch group with 2 updates
([#867](#867))
([4830706](4830706))
* consolidate Dependabot groups to 1 PR per ecosystem
([06d2556](06d2556))
* consolidate Dependabot groups to 1 PR per ecosystem
([#881](#881))
([06d2556](06d2556))
* improve worktree skill with full dep sync and status enhancements
([#906](#906))
([772c625](772c625))
* remove Vue remnants and document framework decision
([#851](#851))
([bf2adf6](bf2adf6))
* update web dependencies and fix brace-expansion CVE
([#880](#880))
([a7a0ed6](a7a0ed6))
* upgrade to Storybook 10 and TypeScript 6
([#845](#845))
([52d95f2](52d95f2))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Aureliolo added a commit that referenced this pull request Mar 31, 2026
…ns (#943)

## Summary

- Add `allow_inf_nan=False` to all 260 remaining `ConfigDict`
declarations across 92 files in `src/synthorg/`
- All 421 `ConfigDict` declarations in the codebase now consistently
reject `NaN`/`Inf` in numeric fields at validation time
- Defensive application to models without numeric fields guards against
future float field additions

## Details

PR #911 established the `allow_inf_nan=False` convention and applied it
to `Task` and `AcceptanceCriterion`. This PR completes the rollout to
all remaining models across: api/, budget/, communication/, config/,
core/, engine/, hr/, memory/ (already done), observability/,
persistence/, providers/, security/, settings/, templates/, and tools/.

Four patterns were updated:
- `ConfigDict(frozen=True)` (240 declarations)
- `ConfigDict(frozen=True, extra="forbid")` (18 declarations)
- `ConfigDict(frozen=True, arbitrary_types_allowed=True)` (1
declaration)
- `ConfigDict(frozen=True, populate_by_name=True)` (1 declaration)

## Test plan

- [x] `uv run ruff check src/ tests/` -- all checks passed
- [x] `uv run ruff format src/ tests/` -- all files already formatted
- [x] `uv run mypy src/ tests/` -- no issues in 1,250 files
- [x] `uv run python -m pytest tests/ -n auto` -- 11,259 passed, 9
skipped
- [x] Pre-commit and pre-push hooks passed

Closes #913

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant