Skip to content

feat: memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty, distillation capture)#1071

Merged
Aureliolo merged 13 commits intomainfrom
feat/memory-consolidation-upgrades
Apr 5, 2026
Merged

feat: memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty, distillation capture)#1071
Aureliolo merged 13 commits intomainfrom
feat/memory-consolidation-upgrades

Conversation

@Aureliolo
Copy link
Copy Markdown
Owner

Implements issue #704 end-to-end. Adopts four patterns from arXiv:2603.17621 ("Complementary Reinforcement Learning") without the RL training loop.

Summary

  • LLMConsolidationStrategy -- LLM-based semantic dedup/synthesis behind the existing ConsolidationStrategy protocol. Groups entries by category, processes groups in parallel via asyncio.TaskGroup, keeps the highest-relevance entry per group (with most-recent as tiebreaker), and feeds the rest to an LLM. Falls back to truncated concatenation on RetryExhaustedError / empty response.
  • Distillation capture -- capture_distillation records trajectory summary + outcome + memory-tool-invocations as an EPISODIC entry tagged "distillation" at task completion. Wired into AgentEngine._post_execution_pipeline behind distillation_capture_enabled=False opt-in.
  • Trajectory context wiring -- When distillation entries exist for the agent, LLMConsolidationStrategy embeds their trajectory summaries in the synthesis system prompt so the LLM has context about what the agent was trying to accomplish.
  • Search-and-Ask retrieval -- ToolBasedInjectionStrategy runs an iterative retrieve -> check sufficiency -> reformulate -> re-retrieve loop (up to max_reformulation_rounds) when query_reformulation_enabled=True and both a QueryReformulator and SufficiencyChecker are provided. Results merged across rounds by ID.
  • Diversity penalty (MMR) -- apply_diversity_penalty in memory/ranking.py re-ranks results balancing relevance vs redundancy with pre-computed bigram cache to keep complexity at O(n**2).

Hardening from pre-PR review (12 agents, 45+ findings addressed)

  • Fail-closed memory filter -- filter exceptions now return no messages instead of silently leaking unfiltered memories (was a privacy-posture regression).
  • LLM strategy resilience -- explicit RetryExhaustedError handling, partial delete failure tolerance (wrap each delete in try/except, record successful IDs only), concat fallback tagged "concat-fallback" distinct from "llm-synthesized".
  • Search-and-Ask error isolation -- sufficiency checker and reformulator calls wrapped in per-round error isolation; emit MEMORY_REFORMULATION_FAILED / MEMORY_SUFFICIENCY_CHECK_FAILED events; degrade to current cumulative entries rather than propagating.
  • MMR sentinel -- -math.inf instead of -1.0 so the algorithm is correct even when MMR scores saturate at the theoretical minimum.
  • Field naming -- DistillationRequest.retrieved_memory_ids renamed to memory_tool_invocations (field stored tool names, not memory IDs).
  • Explicit type narrowing -- replaced assert with explicit guards in _reformulation_loop (asserts are stripped under python -O).
  • Error prefix consistency -- all user-facing error strings in tool_retriever.py prefixed with ERROR_PREFIX = "Error:" so memory/tools.py detection via startswith() is reliable.
  • Invalid category surfacing -- LLM-hallucinated categories are logged at WARNING and reported back in the tool response for self-correction.
  • Pre-raise logging at all MemoryError / RecursionError sites per CLAUDE.md error rule.
  • Config validators -- diversity_penalty_enabled warns when combined with a strategy that ignores it (e.g. TOOL_BASED); stale reformulation "not yet wired" docstrings updated.

Tests

+1,025 lines of new/updated tests, 25 net new test cases:

  • tests/unit/engine/test_agent_engine_distillation.py (new) -- captures on completion, captures on error, disabled by default, no-op without backend
  • tests/unit/memory/test_ranking_diversity.py (new, split from test_ranking.py which was over 800 lines) -- bigram jaccard + MMR re-ranking + all-negative-score determinism
  • test_llm_strategy.py -- RecursionError propagation, tiebreaker, None-relevance handling, fallback truncation, verbatim synthesis, partial delete failure, trajectory context fetch + disabled mode + lookup failure, parallel category processing
  • test_distillation.py -- RecursionError propagation, store request shape assertion
  • test_tool_retriever.py -- reformulation dedup across rounds, sufficiency checker error handling, reformulator error handling, invalid category surfacing
  • test_retrieval_config.py -- diversity/TOOL_BASED warn validator
  • test_retriever.py -- memory filter fail-closed behavior
  • test_memory_tools.py -- updated to ERROR_PREFIX-prefixed constants

Test plan

  • uv run ruff check src/ tests/ -- clean
  • uv run mypy src/ tests/ -- 1538 source files, no issues
  • uv run python -m pytest tests/ -n 8 -- 14310 passed, 9 skipped (up from 14285 baseline)
  • Pre-push hooks (mypy affected modules + pytest unit tests) passed

Docs

  • docs/design/memory.md -- reformulation wired-in (was marked as not-yet-wired), new "Consolidation Strategies" subsection, "Distillation Capture" subsection, "Diversity Re-ranking (MMR)" subsection, fail-closed memory filter note
  • CLAUDE.md -- memory package structure updated with new subpackage details, logging events list extended with events.consolidation examples (LLM_STRATEGY_SYNTHESIZED, DISTILLATION_CAPTURED)

Review coverage

Pre-reviewed by 12 agents: docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, async-concurrency-reviewer, issue-resolution-verifier. Followed by polish pass with code-simplifier. Every valid finding was addressed.

Closes #704

Copilot AI review requested due to automatic review settings April 4, 2026 21:19
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3593670e-9382-4532-9d1a-b0b5ade01b54

📥 Commits

Reviewing files that changed from the base of the PR and between d215af2 and 2f41295.

📒 Files selected for processing (5)
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one model
Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Errors: handle explicitly, never silently swallow
Validate: at system boundaries (user input, external APIs, config files)
Use ruff check and ruff format for li...

Files:

  • tests/unit/memory/test_ranking_diversity.py
  • src/synthorg/memory/consolidation/models.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use test-provider, test-small-001, etc. instead of real vendor names
Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Async: asyncio_mode = "auto" -- no manual @pytest.mark.asyncio needed
Timeout: 30 seconds per test (global in pyproject.toml -- do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed)
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases
Property-based testing with Hypothesis: use @given + @settings, Hypothesis profiles in tests/conftest.py with ci, dev, fuzz, extreme profiles controlled via HYPOTHESIS_PROFILE env var
When Hypothesis finds a failure, it is a real bug -- fix the underlying bug and add an explicit @example(...) decorator to the test
Flaky tests: NEVER skip, dismiss, or ignore -- always fix them fully. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/consolidation/test_llm_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code (exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr))
Variable name for logger: always logger (not _logger, not log)
Event names: always use constants from the domain-specific module under synthorg.observability.events (imported directly, never as string literals)
Structured kwargs: always logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases

Files:

  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
🧠 Learnings (31)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to **/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time

Applied to files:

  • tests/unit/memory/test_ranking_diversity.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Use frozen Pydantic models for config/identity; use separate mutable-via-copy models (via `model_copy(update=...)`) for runtime state that evolves

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to **/*.py : Async concurrency: prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to tests/**/*.py : For tasks that must block indefinitely until cancelled (e.g. simulating a slow provider or stubborn coroutine), use `asyncio.Event().wait()` instead of `asyncio.sleep(large_number)` -- it is cancellation-safe

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-31T20:07:03.035Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T20:07:03.035Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising; all state transitions must log at INFO; DEBUG for object creation, internal flow, entry/exit of key functions

Applied to files:

  • src/synthorg/memory/tool_retriever.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, and key function entry/exit

Applied to files:

  • src/synthorg/memory/tool_retriever.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions.

Applied to files:

  • src/synthorg/memory/tool_retriever.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising. All state transitions must log at INFO. DEBUG for object creation, internal flow, entry/exit of key functions. Pure data models, enums, and re-exports do NOT need logging.

Applied to files:

  • src/synthorg/memory/tool_retriever.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
🔇 Additional comments (27)
src/synthorg/memory/consolidation/models.py (3)

85-106: Well-designed breaking change with clear documentation and fail-fast behavior.

The shift from summary_id to summary_ids with extra="forbid" is a sound design choice: callers that previously passed summary_id= will get an immediate ValidationError rather than silent data loss. The detailed docstring explaining the migration path and rationale (Lines 90-97) is excellent for maintainability.

The plain @property (not @computed_field) for summary_id correctly avoids round-trip serialization issues with extra="forbid".


130-138: Duplicate validation for summary_ids is a good defensive addition.

The check at Lines 136-138 ensures no duplicate summary IDs slip through, maintaining referential integrity with the backend. This mirrors the existing removed_ids duplicate check and is consistent with the overall validation strategy.


184-199: Non-serialized summary_id property correctly preserves backward compatibility for readers.

Exposing summary_id as a plain @property rather than @computed_field ensures it won't appear in model_dump() output, preventing the extra="forbid" guard from rejecting round-tripped data. The docstring thoroughly explains this design decision.

src/synthorg/memory/consolidation/llm_strategy.py (7)

43-50: Clear rationale for the _MIN_GROUP_THRESHOLD = 3 constraint.

The inline documentation (Lines 44-50) explains why group_threshold=2 is rejected: it would leave only one entry for synthesis after selection, resulting in paraphrasing rather than meaningful consolidation. This addresses a past review concern and prevents a subtle misconfiguration.


199-214: Correctly short-circuits distillation lookup when no groups need processing.

Building groups before fetching trajectory context (Lines 204-213) avoids unnecessary backend round-trips for below-threshold batches. This addresses the past review concern about paying lookup costs on pure no-ops.


274-294: Good exception unwrapping with multi-task failure logging.

The except* handlers correctly log the full ExceptionGroup information (exception count and types) before unwrapping, addressing the past review concern about lost diagnostic information when multiple groups fail. The severity-based log level selection is appropriate.


350-360: Correctly sorts by created_at to ensure recency ordering.

Since MemoryBackend.retrieve() returns relevance-ordered results (not recency-ordered), the local sort by created_at descending (Lines 355-360) ensures the synthesis prompt receives the most recent trajectory context. The over-fetch factor (* 4) provides enough candidates for the sort.


381-436: Store-first-then-delete ordering prevents data loss.

The control flow in _process_group correctly synthesizes and stores the summary (Lines 409-420) before attempting deletes (Lines 431-435). This ensures that if store() fails, no originals are deleted. The summarized list tracks only entries that were actually included in the prompt, addressing the past review concern about deleting entries omitted due to truncation.


501-569: Well-structured synthesis with clear fallback semantics.

The _synthesize method returns a (summary, used_llm, summarized_entries) triple that cleanly separates the success path from fallback. The summarized list (returned from _build_user_prompt) ensures the caller only deletes entries that were actually represented in the summary, preventing data loss when prompt truncation occurs.


762-779: Guard against empty entries in fallback summary.

The early return at Lines 769-770 handles the edge case where entries is empty, preventing an IndexError on Line 771 when accessing entries[0].category. This was noted as a post-push fix in the PR comments.

tests/unit/memory/consolidation/test_llm_strategy.py (7)

74-78: Good protocol conformance test.

The isinstance check against ConsolidationStrategy verifies that LLMConsolidationStrategy satisfies the protocol contract at runtime.


81-93: Thorough validation tests for group_threshold constraint.

The tests cover the boundary cases: rejection at 2 and 1, acceptance at 3. This ensures the _MIN_GROUP_THRESHOLD = 3 invariant is enforced.


272-302: Critical regression guard: no deletes on synthesis failure.

This test (Lines 272-302) verifies the store-first-then-delete ordering invariant. If synthesis raises a non-retryable error, no entries should be deleted. The explicit backend.delete.assert_not_called() and backend.store.assert_not_called() assertions are exactly what's needed.


304-328: Additional regression guard: no deletes on store failure.

This test complements the synthesis failure test by verifying that a store() crash also prevents any deletes. Together, these tests lock in the correct ordering.


568-600: Correct partial delete failure handling test.

The test verifies that when one delete fails mid-loop, the strategy continues, stores the summary, and only reports successfully deleted IDs. The side_effect=[True, MemoryStoreError(...), True] pattern accurately simulates partial failure.


744-820: Proper concurrency verification using barrier pattern.

This test addresses the past review concern about proving parallelism. The asyncio.Event barrier pattern ensures that:

  1. Sequential execution would deadlock (second call never starts, first waits forever)
  2. Parallel TaskGroup execution satisfies the barrier and passes

The max_in_flight assertion at Lines 814-818 provides a clear failure message explaining what went wrong.


21-39: Clean helper function for test data construction.

The _make_entry helper with sensible defaults reduces boilerplate across tests. The # noqa: PLR0913 suppression is appropriate for a test helper with many optional parameters.

tests/unit/memory/test_ranking_diversity.py (1)

1-293: Excellent test coverage for MMR diversity re-ranking.

This test module comprehensively validates both the word-bigram Jaccard similarity utility and the MMR-based diversity penalty re-ranker. Highlights:

  • Edge-case rigor: Empty inputs, single-word strings (no bigrams possible), identical content, disjoint sets, and lambda boundary values (0.0, 1.0) are all covered.
  • Determinism regression: Lines 273–293 explicitly guard against sentinel bugs where all MMR scores are negative, ensuring input-order preservation when entries tie.
  • Custom similarity verification: Lines 179–217 use a spy pattern to confirm that similarity_fn overrides are both invoked and produce observable ranking changes.
  • Validation completeness: Lines 231–245 parametrize both out-of-range finite values (-0.1, 1.1) and non-finite inputs (nan, inf, -inf), locking down the ValueError guard in production code.

The helpers (_make_entry, _make_scored) provide convenient, deterministic test fixtures, and all assertions are clear and precise.

src/synthorg/memory/tool_retriever.py (9)

44-54: LGTM — Error constants are well-documented and maintain consistency with the downstream matcher.

The comment block clearly warns maintainers about the coupling with memory/tools.py::_is_error_response(), and all user-facing error strings follow the ERROR_PREFIX convention for reliable startswith() detection.


122-180: LGTM — Malformed category inputs are now surfaced for LLM self-correction.

The function correctly rejects non-list shapes and invalid category values by returning them in the rejected_values tuple rather than silently broadening the search. The WARNING logs provide operator visibility into hallucinated categories.


182-228: LGTM — Merge implementation correctly addresses the cross-round ranking concern.

The function now re-sorts by relevance descending so that higher-quality results from later reformulation rounds can displace lower-quality earlier results before truncation. The first_seen tiebreaker ensures deterministic output.


267-276: LGTM — Non-string query values are now rejected up-front.

The isinstance(query_raw, str) check prevents downstream crashes on LLM-hallucinated inputs like {"query": 123} before any string operations are attempted.


301-341: LGTM — Docstring now correctly documents the constructor validation.

The Raises section and the Args descriptions accurately reflect the fail-fast ValueError when config.query_reformulation_enabled is True but collaborators are missing.

Also applies to: 363-373


529-540: LGTM — System errors are now logged with context before propagating.

The except builtins.MemoryError, RecursionError: block logs at ERROR with agent_id, query_length, limit, and exc_info=True before re-raising, fulfilling the observability requirement. As per coding guidelines, "All error paths must log at WARNING or ERROR with context before raising."


603-669: LGTM — Reformulation loop is well-structured with proper error isolation.

The initial retrieve logs system errors before propagating (lines 636-647), all exit paths truncate results to the requested limit, and the loop terminates cleanly on sufficiency, reformulator exhaustion, or max rounds.


745-873: LGTM — Consistent error isolation pattern across all helper methods.

Each helper (_check_sufficiency, _reformulate, _retrieve_round) follows the same pattern: log system errors at ERROR with context before re-raising, log non-system errors at WARNING and return None for graceful loop degradation. The static method pattern for _check_sufficiency and _reformulate is appropriate since they don't need instance state.


881-890: LGTM — Recall handler mirrors the search handler's validation and error handling.

Non-string memory_id values are rejected up-front (lines 886-887), and system errors are logged with context before propagating (lines 907-915). The guard-clause pattern with multiple returns is appropriate for early validation.

Also applies to: 906-916


Walkthrough

Adds MMR-based diversity re-ranking to the ContextInjection retrieval path with new diversity_penalty_enabled/diversity_lambda config and validation, and makes memory filtering fail-closed. Implements an iterative Search-and-Ask tool-based retrieval loop with query reformulation and sufficiency checks, plus per-round result merging. Introduces distillation capture (DistillationRequest, capture_distillation) and optional wiring into AgentEngine. Adds LLMConsolidationStrategy, changes consolidation results to summary_ids, expands observability event constants, and includes extensive unit tests and doc updates.

Suggested labels

autorelease: tagged

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 4, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 2f41295.
Ensure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several advanced memory management features, including an LLM-based consolidation strategy that synthesizes related memories using task 'distillation' context, an iterative 'Search-and-Ask' loop for tool-based retrieval, and MMR-based diversity re-ranking to reduce redundancy. It also implements a 'fail-closed' policy for memory filters to enhance privacy. Review feedback highlights critical improvements: reordering operations in the LLM consolidation strategy to prevent data loss, applying memory filters before diversity re-ranking for efficiency, ensuring tool-based search respects result limits after iterative rounds, and capping total input size for LLM synthesis to avoid context window overflows.

Comment on lines +299 to +334
removed_ids: list[NotBlankStr] = []
for entry in to_remove:
try:
await self._backend.delete(agent_id, entry.id)
except MemoryError, RecursionError:
raise
except Exception as exc:
logger.warning(
LLM_STRATEGY_FALLBACK,
agent_id=agent_id,
category=category.value,
entry_id=entry.id,
reason="delete_failed",
error=str(exc),
error_type=type(exc).__name__,
exc_info=True,
)
continue
removed_ids.append(entry.id)

synthesized, used_llm = await self._synthesize(
to_remove,
agent_id=agent_id,
category=category,
trajectory_context=trajectory_context,
)
tag = _LLM_SYNTHESIZED_TAG if used_llm else _CONCAT_FALLBACK_TAG
store_request = MemoryStoreRequest(
category=category,
content=synthesized,
metadata=MemoryMetadata(
source="consolidation",
tags=("consolidated", tag),
),
)
new_id = await self._backend.store(agent_id, store_request)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current implementation deletes original memory entries before performing LLM synthesis and storing the resulting summary. If the LLM call fails (e.g., due to a non-retryable ProviderError) or if the backend storage of the summary fails, the original information is permanently lost.

It is safer to perform synthesis and store the summary first, and only then proceed with deleting the original entries. This ensures that even if deletion fails (partially or fully), the information is preserved in either the original or summarized form. Since the strategy already handles partial delete failures gracefully, moving deletion to the end of the method would eliminate the risk of data loss on LLM/storage errors.

Comment on lines 294 to 300
if self._config.diversity_penalty_enabled:
ranked = apply_diversity_penalty(
ranked,
diversity_lambda=self._config.diversity_lambda,
)

if self._memory_filter is not None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Diversity re-ranking (MMR) is currently applied before the memory filter. It is more efficient and semantically correct to apply the filter first.

  1. Efficiency: Filtering first reduces the number of entries passed to the $O(N^2)$ MMR algorithm.
  2. Correctness: Diversity should be optimized among the set of memories that are actually eligible for injection (e.g., non-inferable ones). Re-ranking before filtering may result in a less diverse final set if the filter removes the most diverse candidates that were prioritized by MMR.

result_count=len(entries),
reason="max_rounds_reached",
)
return entries
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The Search-and-Ask loop merges results from multiple rounds but does not truncate the final list to the requested limit. If an agent requests a limit of 10, but reformulation rounds find additional unique entries, the tool response will exceed the requested limit. The final result should be truncated to limit after merging to respect the tool's contract.

        return entries[:limit]

Comment on lines +376 to +379
user_content = "\n---\n".join(
f"[{e.category.value}] {e.content[:_MAX_ENTRY_INPUT_CHARS]}"
for e in entries
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While individual entries are truncated to _MAX_ENTRY_INPUT_CHARS, there is no limit on the total length of user_content sent to the LLM. If a large number of entries are consolidated at once (e.g., a group of 50+ entries), the combined prompt could exceed the LLM's context window or result in extremely high costs. Consider adding a cap on the total number of entries processed per group or a total character count limit for the synthesis input.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements issue #704 by expanding the memory system with LLM-driven consolidation, distillation capture, iterative Search-and-Ask retrieval, and MMR-based diversity re-ranking, plus resilience/privacy hardening and corresponding test coverage.

Changes:

  • Added LLMConsolidationStrategy with per-category parallel processing and optional trajectory (distillation) context injection.
  • Added distillation capture at task completion (opt-in via AgentEngine) and wiring to store trajectory/outcome/tool-invocation summaries.
  • Added Search-and-Ask query reformulation loop for tool-based retrieval and MMR diversity re-ranking for context-based retrieval, with config validation + expanded unit tests.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/memory/test_tool_retriever.py Adds tests for Search-and-Ask reformulation loop, invalid category surfacing, and result merging.
tests/unit/memory/test_retriever.py Updates filter failure behavior test to verify fail-closed privacy posture.
tests/unit/memory/test_retrieval_config.py Adds config validation tests for diversity params and reformulation strategy constraints/warnings.
tests/unit/memory/test_ranking.py Notes test split for diversity ranking to keep file size manageable.
tests/unit/memory/test_ranking_diversity.py New tests for bigram Jaccard similarity + MMR diversity re-ranking behavior and edge cases.
tests/unit/memory/test_memory_tools.py Updates error-prefix matching tests for tool error classification.
tests/unit/memory/consolidation/test_llm_strategy.py New test suite covering LLM consolidation behavior, fallbacks, deletes, and distillation context wiring.
tests/unit/memory/consolidation/test_distillation.py New tests for distillation request model + capture logic and stored request shape.
tests/unit/engine/test_agent_engine_distillation.py New integration tests ensuring distillation capture is opt-in and behaves on success/error/no-backend.
src/synthorg/observability/events/memory.py Adds events for reformulation rounds/sufficiency and diversity re-ranking.
src/synthorg/observability/events/consolidation.py Adds events for LLM consolidation outcomes and distillation capture.
src/synthorg/memory/tool_retriever.py Implements Search-and-Ask loop, category rejection surfacing, result de-duping, and standardized error prefixes.
src/synthorg/memory/retriever.py Adds MMR diversity re-ranking hook and changes filter failures to fail closed (return empty).
src/synthorg/memory/retrieval_config.py Adds diversity settings + validators and enforces reformulation only for TOOL_BASED strategy.
src/synthorg/memory/ranking.py Adds bigram similarity + MMR-based apply_diversity_penalty implementation.
src/synthorg/memory/consolidation/llm_strategy.py Introduces LLM-based consolidation strategy with distillation-context prompt injection and fallbacks.
src/synthorg/memory/consolidation/distillation.py Introduces distillation capture utilities and storage of trajectory/outcome/tool-invocation summaries.
src/synthorg/memory/consolidation/init.py Exports new consolidation/distillation components from the package.
src/synthorg/engine/agent_engine.py Adds distillation_capture_enabled opt-in and post-execution distillation capture hook.
docs/design/memory.md Documents new consolidation strategies, distillation capture, diversity re-ranking, and Search-and-Ask behavior.
CLAUDE.md Updates memory package structure notes and logging event guidance/examples.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +280 to +284
Deletes originals BEFORE storing the summary so a mid-delete
failure aborts before creating an orphan summary. Individual
delete failures are tolerated best-effort: the loop continues,
logs the failure, and only successfully-deleted entry IDs are
returned in ``removed_ids``.
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _process_group, originals are deleted before the synthesized summary is stored. If backend.store() fails (transient DB/network issue, validation, etc.), the consolidation will permanently delete memories and then raise, leaving no summary and causing data loss. Consider storing the summary first (or storing a “pending” summary) and only deleting originals after a successful store; if deletes fail, log and keep originals (duplication is safer than loss).

Copilot uses AI. Check for mistakes.
Comment on lines +92 to +100
total_tokens = sum(t.total_tokens for t in turns)
tool_call_count = sum(len(turn.tool_calls_made) for turn in turns)
unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made})

parts = [f"{len(turns)} turns, {total_tokens} tokens"]
if unique_tools:
parts.append(f"tools: {', '.join(unique_tools)}")
if tool_call_count:
parts.append(f"{tool_call_count} tool calls total")
Copy link

Copilot AI Apr 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_trajectory_summary() always renders "{n} turns" (e.g. "1 turns"), which is grammatically incorrect and becomes user-visible content stored in distillation memories. Consider using singular/plural rendering ("1 turn" vs "N turns") for the turns and tool-call counts.

Suggested change
total_tokens = sum(t.total_tokens for t in turns)
tool_call_count = sum(len(turn.tool_calls_made) for turn in turns)
unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made})
parts = [f"{len(turns)} turns, {total_tokens} tokens"]
if unique_tools:
parts.append(f"tools: {', '.join(unique_tools)}")
if tool_call_count:
parts.append(f"{tool_call_count} tool calls total")
turn_count = len(turns)
total_tokens = sum(t.total_tokens for t in turns)
tool_call_count = sum(len(turn.tool_calls_made) for turn in turns)
unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made})
turn_label = "turn" if turn_count == 1 else "turns"
tool_call_label = "tool call" if tool_call_count == 1 else "tool calls"
parts = [f"{turn_count} {turn_label}, {total_tokens} tokens"]
if unique_tools:
parts.append(f"tools: {', '.join(unique_tools)}")
if tool_call_count:
parts.append(f"{tool_call_count} {tool_call_label} total")

Copilot uses AI. Check for mistakes.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 4, 2026

Codecov Report

❌ Patch coverage is 94.90909% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.62%. Comparing base (af19382) to head (2f41295).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/synthorg/memory/consolidation/llm_strategy.py 91.59% 7 Missing and 3 partials ⚠️
src/synthorg/memory/ranking.py 90.90% 1 Missing and 1 partial ⚠️
src/synthorg/memory/tool_retriever.py 93.54% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1071      +/-   ##
==========================================
+ Coverage   89.59%   89.62%   +0.02%     
==========================================
  Files         742      744       +2     
  Lines       43408    43625     +217     
  Branches     4359     4379      +20     
==========================================
+ Hits        38892    39099     +207     
- Misses       3745     3751       +6     
- Partials      771      775       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/synthorg/memory/retriever.py (1)

269-328: ⚠️ Potential issue | 🟠 Major

Move diversity reranking before the final cap and after filtering.

MemoryQuery.limit is already max_memories on Line 272, and both ranking paths trim to max_memories before control reaches this block. In the linear path that makes apply_diversity_penalty() reorder-only—it can never replace a redundant top-K item with a more diverse candidate below the cutoff. Because the filter runs afterwards, filtered-out entries can also act as MMR anchors and skew the final in-budget set. Fetch a larger candidate pool, apply the filter, then run diversity before the final truncation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/retriever.py` around lines 269 - 328, The current flow
limits MemoryQuery to self._config.max_memories before ranking, so
apply_diversity_penalty (and RRF vs linear pipelines: _execute_rrf_pipeline,
_execute_linear_pipeline) can only reorder within the already-truncated top-K
and filtered-out items still influence results; instead request a larger
candidate pool (e.g. set MemoryQuery.limit to a higher candidate_limit — either
a new config like candidate_pool_size or compute candidate_limit =
self._config.max_memories * N when building query), run the chosen pipeline
(_execute_rrf_pipeline/_execute_linear_pipeline) to get the candidate set, then
run the filter via self._memory_filter.filter_for_injection on that candidate
set, then apply_diversity_penalty(ranked,
diversity_lambda=self._config.diversity_lambda) if
self._config.diversity_penalty_enabled, and only after that truncate the final
list back to self._config.max_memories before returning; update related
logging/exception handling around self._memory_filter accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/memory/consolidation/distillation.py`:
- Around line 187-210: The EPISODIC store currently only saves
outcome/trajectory/count; update the store payload to persist the full
DistillationRequest (including task_id and memory_tool_invocations) by
serializing the DistillationRequest instance and using that serialized string as
MemoryStoreRequest.content before calling backend.store; prefer adding a
to_dict()/serialize() method on DistillationRequest (or reusing one if present)
and set MemoryMetadata(tags=(DISTILLATION_TAG,)) as before so downstream readers
can correlate by task_id and distinguish search_memory vs recall_memory and keep
capture_distillation() under the 50-line limit by moving serialization into
DistillationRequest.

In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 297-336: The code currently deletes selected entries via
self._backend.delete (in the _select_entries/for entry loop) before calling
self._backend.store, risking unrecoverable data loss if store fails; change the
sequence to synthesize and call self._backend.store first (using the same
MemoryStoreRequest, tag logic with _LLM_SYNTHESIZED_TAG/_CONCAT_FALLBACK_TAG),
capture the new_id, then attempt to delete the original entries with
self._backend.delete and populate removed_ids only for successful deletes; on
delete exceptions log the same LLM_STRATEGY_FALLBACK info
(agent_id/category/entry_id/reason/error/error_type) and continue so store
remains intact if deletes fail.

In `@src/synthorg/memory/ranking.py`:
- Around line 472-480: The diversity_lambda validation currently emits
MEMORY_DIVERSITY_RERANKED on error; change it to a distinct validation-failure
event (e.g., MEMORY_DIVERSITY_VALIDATION_FAILED) to match the pattern used by
MEMORY_RRF_VALIDATION_FAILED in fuse_ranked_lists: update the logger.warning
call in the diversity_lambda check (the block that constructs msg and calls
logger.warning with param="diversity_lambda", value=diversity_lambda,
reason=msg) to use the new event constant, add that constant (named
MEMORY_DIVERSITY_VALIDATION_FAILED) alongside the other memory event constants,
and keep the existing fields and subsequent raise ValueError unchanged so logs
and alerts can distinguish validation failures from successful reranks.

In `@tests/unit/engine/test_agent_engine_distillation.py`:
- Around line 101-106: Current tests call the private
engine._post_execution_pipeline(...) directly which bypasses the full flow; add
at least one integration-style test that calls the public engine.run(...)
instead, mocking/stubbing the execution loop and any external dependencies
(e.g., task execution, identity, distillation storage) so run() executes
deterministically, then assert the same end-to-end outcomes currently verified
for _post_execution_pipeline; apply the same change to the other similar tests
(around the other ranges) so one or more tests exercise run() path while keeping
the existing unit-level calls for fast iteration.

In `@tests/unit/memory/consolidation/test_distillation.py`:
- Around line 299-303: The test function
test_error_without_message_falls_through is declared async but only calls the
synchronous helper build_outcome and does not await anything; make it a regular
(non-async) test function by removing the async keyword so the test runner
treats it as a normal synchronous test and avoids warnings or misbehavior.

---

Outside diff comments:
In `@src/synthorg/memory/retriever.py`:
- Around line 269-328: The current flow limits MemoryQuery to
self._config.max_memories before ranking, so apply_diversity_penalty (and RRF vs
linear pipelines: _execute_rrf_pipeline, _execute_linear_pipeline) can only
reorder within the already-truncated top-K and filtered-out items still
influence results; instead request a larger candidate pool (e.g. set
MemoryQuery.limit to a higher candidate_limit — either a new config like
candidate_pool_size or compute candidate_limit = self._config.max_memories * N
when building query), run the chosen pipeline
(_execute_rrf_pipeline/_execute_linear_pipeline) to get the candidate set, then
run the filter via self._memory_filter.filter_for_injection on that candidate
set, then apply_diversity_penalty(ranked,
diversity_lambda=self._config.diversity_lambda) if
self._config.diversity_penalty_enabled, and only after that truncate the final
list back to self._config.max_memories before returning; update related
logging/exception handling around self._memory_filter accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: d5714550-a1d8-44e0-a655-57c45be51a70

📥 Commits

Reviewing files that changed from the base of the PR and between 7af94de and 9e9fec1.

📒 Files selected for processing (21)
  • CLAUDE.md
  • docs/design/memory.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Agent
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Analyze (python)
  • GitHub Check: Dependency Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions: allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields; use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_retriever.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/consolidation.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_retrieval_config.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/ranking.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow for test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases
Test vendor names must use test-provider, test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Control via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must have from synthorg.observability import get_logger then logger = get_logger(__name__)
Always use logger as the variable name (not _logger, not log)
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation is in docs/ (Markdown, built with Zensical, config: mkdocs.yml). Design spec in docs/design/ (12 pages). Architecture in docs/architecture/. Roadmap in docs/roadmap/. Security in docs/security.md. Licensing in docs/licensing.md.

Files:

  • docs/design/memory.md
🧠 Learnings (30)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • CLAUDE.md
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/ranking.py
  • docs/design/memory.md
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • tests/unit/engine/test_agent_engine_distillation.py
  • CLAUDE.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • tests/unit/engine/test_agent_engine_distillation.py
  • CLAUDE.md
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
🔇 Additional comments (41)
src/synthorg/memory/ranking.py (3)

498-534: LGTM!

The MMR re-ranking implementation is well-structured: precomputes bigrams once to avoid O(n²·k) re-tokenization, uses -math.inf as the sentinel (fixing the issue noted in PR objectives), and correctly tracks positions in the shrinking remaining_indices list. The algorithm matches the MMR formula documented in the docstring.


549-585: LGTM!

The generic MMR implementation correctly mirrors the bigram-cached version while supporting arbitrary similarity functions. The structure is clean and maintains the same MMR semantics.


401-436: LGTM!

The bigram utilities handle edge cases correctly (empty strings, single words) and implement Jaccard similarity accurately. Case-insensitivity via .lower() is appropriate for text similarity.

tests/unit/memory/test_tool_retriever.py (4)

349-378: LGTM!

The test correctly verifies that reformulation is disabled by default - even with reformulator and sufficiency checker configured, the loop doesn't engage when query_reformulation_enabled is not set. Good assertion on call counts.


427-457: LGTM!

Excellent test of the core reformulation flow. The side_effect setup correctly simulates the "insufficient → reformulate → sufficient" path, and assertions verify both cross-round result merging and the expected call counts.


459-492: LGTM!

The test correctly validates the max rounds boundary: with max_reformulation_rounds=2 and perpetually insufficient results, the loop performs 3 retrieves (initial + 2 reformulated) and exits after exhausting the round budget.


664-705: LGTM!

Thorough unit tests for _merge_results covering empty inputs, disjoint concatenation, relevance-based collision resolution, and None relevance handling. The bidirectional collision tests (lines 681-697) verify the function is order-independent.

tests/unit/memory/test_retrieval_config.py (3)

218-279: LGTM!

Comprehensive tests for diversity penalty configuration: default values, boundary conditions (0.0, 1.0), rejection of invalid values (<0, >1, NaN), and appropriate warning logs when diversity_lambda is set while diversity penalty is disabled. The capture_logs pattern correctly validates observability behavior.


291-305: LGTM!

Tests correctly validate the strategy-conditional reformulation behavior: enabling reformulation with CONTEXT strategy raises ValueError, while TOOL_BASED succeeds.


364-396: LGTM!

Tests correctly validate strategy consistency: diversity penalty with TOOL_BASED emits a warning (since it only applies to ContextInjectionStrategy), while CONTEXT strategy produces no warning.

src/synthorg/engine/agent_engine.py (3)

707-711: LGTM!

Distillation capture is correctly wired into the post-execution pipeline after procedural memory and before returning the result. The placement ensures trajectory context is captured for all termination reasons.


264-265: LGTM!

The distillation_capture_enabled parameter follows the established opt-in pattern with a safe False default, ensuring backward compatibility as confirmed by the relevant code snippet showing existing instantiations without this parameter.

Also applies to: 335-335


714-743: No action needed.

The error handling in capture_distillation correctly implements the documented behavior: non-system exceptions are caught at line 220 and swallowed without re-raising (line 229), while system errors (MemoryError, RecursionError) are explicitly re-raised at line 219. The docstring in _try_capture_distillation accurately reflects this implementation, and the direct await is appropriate because error handling already occurs internally within capture_distillation.

			> Likely an incorrect or invalid review comment.
CLAUDE.md (2)

101-101: LGTM!

Documentation accurately reflects the new memory package capabilities: MMR diversity re-ranking, iterative Search-and-Ask reformulation, LLMConsolidationStrategy with parallel processing and trajectory context injection, and DistillationRequest capture.


148-148: LGTM!

Event name documentation updated to include the new consolidation events (LLM_STRATEGY_SYNTHESIZED, DISTILLATION_CAPTURED) and memory events (MEMORY_DIVERSITY_RERANKED, MEMORY_REFORMULATION_ROUND), maintaining the structured logging inventory.

tests/unit/engine/test_agent_engine_distillation.py (2)

85-139: LGTM!

Tests comprehensively validate distillation capture: successful completion stores with "distillation" tag/source and "Task completed" content, while error termination captures "Task failed" with the error message. Good coverage of both success and failure paths.


141-181: LGTM!

Tests correctly validate opt-in semantics: default disabled (store not called), and enabled without backend completes without raising. These edge cases ensure the feature degrades gracefully.

tests/unit/memory/test_ranking_diversity.py (3)

57-84: LGTM!

Excellent edge case coverage for _bigram_jaccard: identical strings (1.0), completely different (0.0), empty strings (0.0), single words (0.0 - no bigrams possible), partial overlap (1/3), and case insensitivity. The partial overlap test with explicit set calculation in comments aids readability.


236-256: LGTM!

Critical test validating the MMR sentinel fix from PR objectives. With diversity_lambda=0.0 and identical content (all MMR scores are -1.0), using -math.inf as the sentinel ensures deterministic selection rather than the loop jamming. This directly tests the hardening fix mentioned in the commit messages.


210-234: LGTM!

This test validates the core value proposition of diversity re-ranking: with moderate lambda (0.3), the "diverse" entry (0.85 relevance, unrelated content) beats the "redundant" entry (0.9 relevance, similar content to base), demonstrating coverage promotion over pure relevance ordering.

tests/unit/memory/consolidation/test_llm_strategy.py (5)

80-89: LGTM!

Tests correctly validate constructor constraints: group_threshold < 2 raises ValueError, while group_threshold=2 (the minimum) is accepted. This matches the documented requirement for meaningful consolidation (at least 2 entries to merge).


217-378: LGTM!

Comprehensive error handling tests: retryable errors (RateLimitError) and unexpected errors (RuntimeError) fall back to concatenation, non-retryable errors (AuthenticationError) and system errors (MemoryError, RecursionError) propagate. Empty and whitespace-only LLM responses also trigger fallback. This matches the resilience patterns documented in CLAUDE.md.


560-648: LGTM!

Thorough tests for trajectory context integration: enabled path verifies distillation entries are fetched and injected into the LLM prompt, disabled path skips lookup entirely, and lookup failures degrade gracefully without blocking consolidation. This validates the key integration between distillation capture and LLM consolidation.


526-558: LGTM!

Critical edge case test: partial delete failures (1 of 3 fails) result in consolidated_count=2 (only successful deletions counted), and summary storage proceeds despite the failure. This validates the graceful degradation documented in PR objectives ("partial-delete tolerance").


649-692: 🧹 Nitpick | 🔵 Trivial

Consider adding explicit parallelism verification.

The test validates multi-category consolidation outcomes (correct call counts and consolidated_count), but doesn't directly verify that groups are processed in parallel via TaskGroup as documented. Consider adding timing assertions or using asyncio.Event coordination to confirm concurrent execution, though the current functional assertions are sufficient for correctness validation.

⛔ Skipped due to learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
src/synthorg/memory/retrieval_config.py (3)

116-131: LGTM! Well-designed diversity configuration fields.

The diversity_penalty_enabled and diversity_lambda fields follow established patterns: frozen model with allow_inf_nan=False, appropriate bounds (ge=0.0, le=1.0), and clear descriptions matching the MMR formula semantics (1.0 = pure relevance, 0.0 = max diversity).


192-229: LGTM! Validator logic is sound and user-friendly.

Both diversity validators correctly:

  • Use model_fields_set to avoid warning when diversity_lambda was never explicitly set (Line 198)
  • Log warnings with full context (field, value, strategy, reason) before returning
  • Only warn (not raise) since these are configuration mismatches, not hard errors

The strategy consistency check at Line 216 correctly identifies that TOOL_BASED strategy doesn't apply the MMR re-ranking step.


231-249: LGTM! Validator properly enforces tool-based requirement.

The validator correctly:

  • Returns early when reformulation is disabled (Line 234-235)
  • Allows TOOL_BASED strategy (Line 236-237)
  • Raises ValueError with a clear message including the actual strategy value for other strategies
  • Logs before raising per project conventions

This aligns with the documented behavior that Search-and-Ask reformulation is only wired into ToolBasedInjectionStrategy.

docs/design/memory.md (3)

415-459: Excellent documentation of the consolidation subsystem.

The documentation clearly explains:

  • Three strategy implementations with distinct behaviors
  • LLMConsolidationStrategy parallel processing via asyncio.TaskGroup
  • Distillation capture lifecycle and field sources (Line 439-445)
  • The memory_tool_invocations clarification (tool names, not entry IDs) prevents future confusion
  • How distillation entries flow into consolidation via tag queries

741-808: Well-documented MMR diversity re-ranking.

The documentation accurately describes:

  • MMR formula: lambda * combined_score - (1 - lambda) * max_sim_to_selected
  • Trade-off semantics: 1.0 = pure relevance, 0.0 = max diversity
  • Bigram Jaccard as default similarity function
  • O(n**2) complexity note with precomputation explanation
  • Warning about combining with strategies that ignore it

Minor note: The documentation correctly warns users via model_validator (Line 806-808), matching the implementation.


840-848: Clear Search-and-Ask loop documentation.

Documents the iterative reformulation behavior:

  • Requires both QueryReformulator and SufficiencyChecker
  • Merges results by entry ID (higher relevance wins)
  • Graceful degradation on failures
  • Explicitly notes MMR doesn't apply in tool-based path

This matches the implementation in tool_retriever.py.

src/synthorg/memory/tool_retriever.py (4)

164-199: LGTM! Correct merge-by-ID logic with stable ordering.

The _merge_results function correctly:

  • Preserves insertion order via separate order list
  • Handles None relevance as 0.0 (Lines 191-192)
  • Keeps higher-relevance entry on duplicates (Line 193-194)
  • Returns stable tuple maintaining "first seen" order

The implementation matches the documented behavior in docs/design/memory.md.


459-500: LGTM! Defensive guard with graceful fallback.

The method correctly implements fail-safe behavior:

  • Checks both query_reformulation_enabled flag AND presence of both collaborators (Lines 481-485)
  • Falls back to simple retrieval when any condition is unmet
  • This matches the documented "silently no-ops if either is missing" contract in retrieval_config.py

This defensive pattern prevents runtime errors even if the config validator is bypassed.


502-627: Well-structured reformulation loop with proper error isolation.

The implementation correctly:

  • Propagates system errors (builtins.MemoryError, RecursionError) at Lines 546-547, 573-574
  • Isolates non-system exceptions and degrades gracefully (Lines 548-557, 575-584)
  • Logs with full context (agent_id, round, error, error_type, exc_info)
  • Handles reformulator returning None or unchanged query (Lines 586-598)
  • Merges results across rounds (Line 617)

The loop body is lengthy but cohesive—it handles a single concern (the Search-and-Ask protocol) with clear phase transitions. Splitting would reduce readability without improving maintainability.


121-161: LGTM! Improved category parsing with operator visibility.

The updated _parse_categories:

  • Returns rejected values for caller surfacing (Line 141)
  • Logs invalid categories at WARNING with agent_id context (Lines 153-159)
  • Enables the LLM to self-correct via the feedback in Line 453-456

Good observability improvement.

tests/unit/memory/consolidation/test_distillation.py (2)

68-115: LGTM! Comprehensive model validation tests.

Tests cover all key DistillationRequest behaviors:

  • Field population including memory_tool_invocations
  • Frozen model mutation rejection via ValidationError
  • Default value for memory_tool_invocations
  • NotBlankStr validation rejecting whitespace agent_id

208-297: Excellent error propagation test coverage.

The async tests correctly verify:

  • Successful capture returns populated DistillationRequest (Lines 210-232)
  • MemoryStoreError returns None (Lines 234-249)
  • MemoryError propagates (Lines 251-263)
  • RecursionError propagates (Lines 285-297)

This matches the documented "non-critical" behavior for domain errors and "propagate" behavior for system errors.

src/synthorg/memory/consolidation/llm_strategy.py (4)

193-218: LGTM! Proper TaskGroup exception unwrapping.

The except* blocks correctly unwrap ExceptionGroup to surface the original exception type, maintaining consistent error handling semantics whether processing is parallel or sequential. The unwrapping of MemoryError, RecursionError, and ProviderError ensures callers see expected exception types.


382-463: LGTM! Comprehensive fallback handling in _synthesize.

The error handling correctly:

  • Propagates system errors (MemoryError, RecursionError) at Lines 404-413
  • Falls back on RetryExhaustedError (Lines 414-428)
  • Falls back on retryable ProviderError for edge cases (Lines 429-445)
  • Propagates non-retryable ProviderError (Lines 446-449)
  • Falls back on unexpected exceptions with WARNING log and traceback (Lines 450-462)

This matches the documented "falls back to truncated concatenation" behavior.


241-269: LGTM! Graceful degradation for trajectory context lookup.

The method correctly:

  • Short-circuits when include_distillation_context is False (Lines 251-252)
  • Propagates system errors (Line 259-260)
  • Degrades silently on other failures, returning empty context (Lines 261-269)

This ensures consolidation proceeds even if distillation data is unavailable.


516-539: LGTM! Correct best-entry selection logic.

The _select_entries method correctly:

  • Treats None relevance as 0.0 (Line 535)
  • Uses created_at as tiebreaker (Line 536)
  • Returns the best entry and remaining entries for removal (Line 539)

The tuple comparison (relevance, created_at) provides stable deterministic ordering.

Comment on lines +187 to +210
request = DistillationRequest(
agent_id=agent_id,
task_id=task_id,
trajectory_summary=trajectory,
outcome=outcome,
memory_tool_invocations=tool_invocations,
created_at=datetime.now(UTC),
)

store_content = (
f"Distillation: {outcome}\n"
f"Trajectory: {trajectory}\n"
f"Memory lookups: {len(tool_invocations)}"
)

store_request = MemoryStoreRequest(
category=MemoryCategory.EPISODIC,
content=store_content,
metadata=MemoryMetadata(
source="distillation",
tags=(DISTILLATION_TAG,),
),
)
await backend.store(agent_id, store_request)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Persist the full distillation payload.

The stored EPISODIC entry only keeps outcome, trajectory, and a lookup count. task_id and the concrete memory_tool_invocations never reach the backend, so downstream readers cannot correlate entries by task or tell whether the agent used search_memory vs recall_memory. This is also a good seam to move serialization onto DistillationRequest, which would bring capture_distillation() back under the repo’s 50-line limit.

🛠️ Suggested direction
-        store_content = (
-            f"Distillation: {outcome}\n"
-            f"Trajectory: {trajectory}\n"
-            f"Memory lookups: {len(tool_invocations)}"
-        )
+        invocations_text = ", ".join(tool_invocations) if tool_invocations else "none"
+        store_content = (
+            f"Task ID: {task_id}\n"
+            f"Outcome: {outcome}\n"
+            f"Trajectory: {trajectory}\n"
+            f"Memory tool invocations: {invocations_text}"
+        )
As per coding guidelines "Functions must be less than 50 lines".
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
request = DistillationRequest(
agent_id=agent_id,
task_id=task_id,
trajectory_summary=trajectory,
outcome=outcome,
memory_tool_invocations=tool_invocations,
created_at=datetime.now(UTC),
)
store_content = (
f"Distillation: {outcome}\n"
f"Trajectory: {trajectory}\n"
f"Memory lookups: {len(tool_invocations)}"
)
store_request = MemoryStoreRequest(
category=MemoryCategory.EPISODIC,
content=store_content,
metadata=MemoryMetadata(
source="distillation",
tags=(DISTILLATION_TAG,),
),
)
await backend.store(agent_id, store_request)
request = DistillationRequest(
agent_id=agent_id,
task_id=task_id,
trajectory_summary=trajectory,
outcome=outcome,
memory_tool_invocations=tool_invocations,
created_at=datetime.now(UTC),
)
invocations_text = ", ".join(tool_invocations) if tool_invocations else "none"
store_content = (
f"Task ID: {task_id}\n"
f"Outcome: {outcome}\n"
f"Trajectory: {trajectory}\n"
f"Memory tool invocations: {invocations_text}"
)
store_request = MemoryStoreRequest(
category=MemoryCategory.EPISODIC,
content=store_content,
metadata=MemoryMetadata(
source="distillation",
tags=(DISTILLATION_TAG,),
),
)
await backend.store(agent_id, store_request)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/distillation.py` around lines 187 - 210,
The EPISODIC store currently only saves outcome/trajectory/count; update the
store payload to persist the full DistillationRequest (including task_id and
memory_tool_invocations) by serializing the DistillationRequest instance and
using that serialized string as MemoryStoreRequest.content before calling
backend.store; prefer adding a to_dict()/serialize() method on
DistillationRequest (or reusing one if present) and set
MemoryMetadata(tags=(DISTILLATION_TAG,)) as before so downstream readers can
correlate by task_id and distinguish search_memory vs recall_memory and keep
capture_distillation() under the 50-line limit by moving serialization into
DistillationRequest.

Comment on lines +101 to +106
await engine._post_execution_pipeline(
result,
identity,
str(identity.id),
"task-dist-001",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider testing through the public run() method for integration coverage.

The tests call engine._post_execution_pipeline(...) directly (a private method). While this provides targeted unit testing of the distillation wiring, it bypasses the full execution flow. Consider adding at least one integration-style test that exercises run() with a mocked execution loop to verify end-to-end behavior.

That said, the current tests effectively validate the wiring logic and are valuable for quick iteration.

Also applies to: 129-134, 154-159, 176-180

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/engine/test_agent_engine_distillation.py` around lines 101 - 106,
Current tests call the private engine._post_execution_pipeline(...) directly
which bypasses the full flow; add at least one integration-style test that calls
the public engine.run(...) instead, mocking/stubbing the execution loop and any
external dependencies (e.g., task execution, identity, distillation storage) so
run() executes deterministically, then assert the same end-to-end outcomes
currently verified for _post_execution_pipeline; apply the same change to the
other similar tests (around the other ranges) so one or more tests exercise
run() path while keeping the existing unit-level calls for fast iteration.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 5, 2026

All contributors have signed the CLA. Thank you!
Posted by the CLA Assistant Lite bot.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 210-217: The except* handlers in the llm_strategy consolidation
code currently unwrap an ExceptionGroup named eg and re-raise only
eg.exceptions[0], losing info from other exceptions; before re-raising, log the
full ExceptionGroup so operators can diagnose multi-group failures (e.g., call
the module/class logger with logger.exception or logger.error(...,
exc_info=True) passing eg or its string) in each of the except* MemoryError /
RecursionError / ProviderError / Exception handlers, then continue to raise
eg.exceptions[0] to preserve the existing behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1c1efc0-34e6-43ea-b50e-4f1d1ac2cc22

📥 Commits

Reviewing files that changed from the base of the PR and between 9e9fec1 and bec2e5b.

📒 Files selected for processing (5)
  • CLAUDE.md
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Build Sandbox
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions: allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields; use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must have from synthorg.observability import get_logger then logger = get_logger(__name__)
Always use logger as the variable name (not _logger, not log)
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow for test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases
Test vendor names must use test-provider, test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Control via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/memory/consolidation/test_llm_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
🧠 Learnings (33)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • CLAUDE.md
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/ranking.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
🔇 Additional comments (20)
src/synthorg/memory/consolidation/llm_strategy.py (7)

1-41: LGTM! Module setup follows project conventions.

Logger initialization, imports, and module docstring are correctly structured. The noqa: TC001 annotations appropriately suppress type-checking-only import warnings for protocol types used at runtime.


42-74: LGTM! Constants are well-documented with clear rationale.

The intentional duplication of _DISTILLATION_TAG (Lines 49-54) is properly justified to avoid pulling engine execution protocol into consolidation strategy. Tag constants distinguish LLM synthesis from fallback paths.


76-144: LGTM! Well-structured class initialization with proper validation.

The class docstring thoroughly documents behavior, error handling, and fallback paths. The group_threshold validation (Line 130-135) correctly enforces the minimum of 2 with a clear error message.


241-269: LGTM! Best-effort trajectory context fetch with proper degradation.

The method correctly short-circuits when disabled (Line 251-252), propagates system errors, and degrades gracefully on retrieval failures while logging context for debugging.


307-348: LGTM! Store-before-delete ordering prevents data loss.

The implementation correctly synthesizes and stores (Lines 311-326) before attempting deletions (Lines 328-346). Partial delete failures are logged and tolerated, returning only successfully deleted IDs—recoverable on the next consolidation pass.


350-485: LGTM! Comprehensive synthesis with well-defined fallback semantics.

Error handling covers all documented paths:

  • RetryExhaustedError → fallback (Lines 426-440)
  • Retryable ProviderError → fallback (Lines 441-457)
  • Non-retryable ProviderError → propagate (Lines 458-461)
  • Empty/whitespace response → fallback (Lines 476-485)
  • Unexpected exceptions → fallback with traceback (Lines 462-474)

Input truncation (Line 389) guards against oversized LLM inputs.


487-551: LGTM! Helper methods are well-implemented.

  • _build_system_prompt (Lines 487-507): Properly truncates trajectory context per entry.
  • _fallback_summary (Lines 509-526): Correctly truncates long content.
  • _select_entries (Lines 528-551): Deterministic selection using (relevance_score, created_at) tuple key with None0.0 coercion.
tests/unit/memory/consolidation/test_llm_strategy.py (5)

1-71: LGTM! Well-structured test helpers and fixtures.

Helper functions provide sensible defaults for building test data. The _make_strategy helper properly initializes mock dependencies with appropriate return values. Test model names follow the test-* convention.


73-88: LGTM! Initialization tests cover boundary conditions.

Protocol conformance check (Line 77) and threshold validation tests (Lines 82-88) properly verify the class interface and constructor constraints.


91-214: LGTM! Core consolidation behavior thoroughly tested.

Tests cover:

  • Empty input handling (Lines 93-97)
  • Below-threshold groups skipped (Lines 99-103)
  • Full consolidation flow with assertions on counts, IDs, and mock calls (Lines 105-141)
  • Multi-category independence (Lines 143-186)
  • Selection logic for highest relevance (Lines 188-214)

217-409: LGTM! Error handling tests are comprehensive.

Critical safety test test_no_deletes_on_synthesis_failure (Lines 267-297) verifies the store-before-delete ordering by asserting neither delete nor store are called when synthesis raises non-retryable errors. All fallback paths and propagation behaviors are properly verified.


412-724: LGTM! Detailed behavior tests cover edge cases thoroughly.

Notable coverage includes:

  • Tiebreaker determinism with equal relevance scores (Lines 414-450)
  • None relevance coercion (Lines 452-473)
  • Truncation in fallback summaries (Lines 475-505)
  • Verbatim LLM response storage with correct tags (Lines 507-556)
  • Partial delete failure graceful degradation (Lines 558-590)
  • Trajectory context integration and failure modes (Lines 592-679)
src/synthorg/observability/events/memory.py (1)

132-142: LGTM! New event constants follow naming conventions.

The new constants follow the memory.<entity>.<action> pattern:

  • MEMORY_REFORMULATION_ROUND / MEMORY_REFORMULATION_SUFFICIENT for Search-and-Ask
  • MEMORY_DIVERSITY_RERANKED / MEMORY_DIVERSITY_RERANK_FAILED for MMR diversity

Section comments provide clear organization.

CLAUDE.md (2)

101-101: LGTM! Memory package documentation accurately reflects new capabilities.

The updated description comprehensively covers:

  • MMR diversity re-ranking with bigram cache
  • Tool-based Search-and-Ask with reformulation
  • Fail-closed memory filter
  • LLMConsolidationStrategy with TaskGroup parallelism
  • Distillation capture with EPISODIC tagging

148-148: LGTM! Event constants documentation updated correctly.

New events LLM_STRATEGY_SYNTHESIZED, DISTILLATION_CAPTURED, MEMORY_DIVERSITY_RERANKED, MEMORY_DIVERSITY_RERANK_FAILED, and MEMORY_REFORMULATION_ROUND are properly listed with their source modules.

src/synthorg/memory/ranking.py (5)

51-99: LGTM! ScoredMemory documentation clarifies fusion-mode semantics.

The updated docstring (Lines 54-75) clearly explains how relevance_score, recency_score, and combined_score should be interpreted depending on whether the instance was produced by LINEAR ranking or RRF fusion.


399-437: LGTM! Bigram helpers are correctly implemented.

  • _word_bigrams (Lines 402-415): Returns frozenset for efficient caching, handles short texts gracefully.
  • _bigram_jaccard (Lines 418-437): Proper empty-set handling prevents division-by-zero.

440-496: LGTM! Validation now uses distinct failure event.

The diversity_lambda validation (Lines 473-481) correctly emits MEMORY_DIVERSITY_RERANK_FAILED for invalid parameters, addressing the past review comment about distinguishing validation failures from successful re-ranks. The early return for len(scored) <= 1 (Lines 483-484) is an appropriate optimization.


499-535: LGTM! Cached bigram MMR implementation is efficient.

Pre-computing bigrams once (Line 505) and using _bigram_jaccard_cached avoids O(n² × k) re-tokenization. The -math.inf sentinel (Line 511) ensures correct initialization. DEBUG-level logging (Lines 529-533) is appropriate for per-call details.


538-586: LGTM! Generic MMR path provides flexibility.

_bigram_jaccard_cached (Lines 538-547) correctly operates on pre-computed frozensets. _mmr_rerank_generic (Lines 550-586) mirrors the cached implementation structure, enabling custom similarity functions while maintaining the same MMR selection logic.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 707-711: The distillation capture is only invoked inside
_post_execution_pipeline, so early-return paths in run() (specifically when
calling _handle_budget_error() and _handle_fatal_error()) skip creating
distillation entries; update run() to ensure _try_capture_distillation(...) is
always called before any return by either (a) centralizing result creation into
a single exit path and calling _try_capture_distillation(execution_result,
agent_id, task_id) there, or (b) invoking _try_capture_distillation immediately
after building the result returned by _handle_budget_error() and
_handle_fatal_error(); locate uses of _try_capture_distillation,
_post_execution_pipeline, _handle_budget_error, and _handle_fatal_error in run()
and adjust control flow so distillation runs for all completion outcomes.

In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 161-164: The consolidation loop currently overwrites
ConsolidationResult.summary_id on each category and emits
LLM_STRATEGY_SYNTHESIZED before calling store(), losing per-group IDs; fix by
collecting per-group ids as you create them (e.g., summary_ids = []), call the
group store() first and append the returned id to summary_ids, emit
LLM_STRATEGY_SYNTHESIZED immediately after store() and include the summary_id
field, and finally set ConsolidationResult.summary_id to the most recent id but
also attach the full summary_ids list (or include it in STRATEGY_COMPLETE) so
callers retain visibility into every summary produced; update references to
ConsolidationResult.summary_id and the LLM_STRATEGY_SYNTHESIZED emission site
accordingly.
- Around line 457-477: In the except ProviderError as exc block: add a
structured log call before re-raising non-retryable provider errors so agent_id,
category, entry_count and model are recorded; mirror the existing logger fields
used in the retryable branch (use LLM_STRATEGY_FALLBACK, agent_id,
category.value, entry_count=len(entries), model=self._model, error=str(exc),
error_type=type(exc).__name__) and set a distinct reason like
"non_retryable_provider_error" at WARNING or ERROR level, then re-raise the
exception.
- Around line 42-43: The current constants _DEFAULT_GROUP_THRESHOLD and
_MIN_GROUP_THRESHOLD allow group_threshold=2 which results in _select_entries()
producing a pair that leaves one entry kept and one in to_remove so
_synthesize() no longer has multiple entries to deduplicate against; fix this by
enforcing a minimum of 3 (set _MIN_GROUP_THRESHOLD = 3 and validate
group_threshold on input) or, if pairs must be supported, modify the
consolidation flow so that when group_threshold == 2 the retained entry is
passed into _synthesize() as reference/context for deduplication (update the
code paths that call _select_entries(), _synthesize(), and the grouping logic
mentioned around the other occurrences to ensure the kept entry is included in
the LLM prompt).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 69f968bb-750f-41dd-bc9c-0ec7e01a8bea

📥 Commits

Reviewing files that changed from the base of the PR and between bec2e5b and f97ccd3.

📒 Files selected for processing (3)
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_distillation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Build Backend
  • GitHub Check: Build Web
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions: allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields; use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • tests/unit/memory/consolidation/test_distillation.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow for test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases
Test vendor names must use test-provider, test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Control via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/memory/consolidation/test_distillation.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/consolidation/test_distillation.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must have from synthorg.observability import get_logger then logger = get_logger(__name__)
Always use logger as the variable name (not _logger, not log)
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/llm_strategy.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
🔇 Additional comments (1)
tests/unit/memory/consolidation/test_distillation.py (1)

214-331: Nice coverage around distillation capture failure handling.

These tests exercise the success path, backend-store fallback, and MemoryError/RecursionError propagation, which gives the new capture contract solid regression protection.

Comment on lines +707 to +711
await self._try_capture_distillation(
execution_result,
agent_id,
task_id,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Distillation is skipped on the direct-return completion paths.

The new hook only runs inside _post_execution_pipeline(). run() still returns directly from _handle_budget_error() and _handle_fatal_error() (Lines 500-524), so pre-flight budget stops and fatal setup/pipeline failures never produce distillation entries. If this feature is meant to capture task completion outcomes, move the capture step to a single return path in run() or invoke it after those results are built.

Also applies to: 714-743

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 707 - 711, The distillation
capture is only invoked inside _post_execution_pipeline, so early-return paths
in run() (specifically when calling _handle_budget_error() and
_handle_fatal_error()) skip creating distillation entries; update run() to
ensure _try_capture_distillation(...) is always called before any return by
either (a) centralizing result creation into a single exit path and calling
_try_capture_distillation(execution_result, agent_id, task_id) there, or (b)
invoking _try_capture_distillation immediately after building the result
returned by _handle_budget_error() and _handle_fatal_error(); locate uses of
_try_capture_distillation, _post_execution_pipeline, _handle_budget_error, and
_handle_fatal_error in run() and adjust control flow so distillation runs for
all completion outcomes.

Copy link
Copy Markdown
Owner Author

PR Review Summary — feat/memory-consolidation-upgrades

Full review pipeline completed (local agents + CodeRabbit + Gemini + Copilot). Two commits pushed.


Fixes Applied

Commit f8e8bac — CodeRabbit findings

Fix File Source
Log ExceptionGroup before re-raising in each except* handler in consolidate() — operators can now diagnose multi-task failures where >1 category group fails concurrently memory/consolidation/llm_strategy.py CodeRabbit #4058984940
Run MemoryFilterStrategy before MMR diversity re-ranking — prevents privacy-excluded entries from acting as MMR anchors and suppressing visible-but-similar candidates memory/retriever.py CodeRabbit #4058864502
Update both pipeline step tables + Diversity Re-ranking prose to reflect filter-first ordering docs/design/memory.md Follow-on
Store tool invocation names in distillation content instead of bare count (e.g. Memory lookups: search_memory, recall_memory instead of Memory lookups: 2) memory/consolidation/distillation.py CodeRabbit #4058864502
Update test assertion for new content format tests/unit/memory/consolidation/test_distillation.py Follow-on
Remove spurious async from test_error_without_message_falls_through (sync test, no awaits) — note: a parallel review agent had already moved this test to the correct class; this commit resolved the resulting merge conflict tests/unit/memory/consolidation/test_distillation.py CodeRabbit async cleanup

Commit 68c3aaa — local agent test-quality findings

Fix File
Grammar bug in build_trajectory_summary: fix "1 turns""1 turn" (singular/plural) memory/consolidation/distillation.py
Update test assertion to "1 turn" to match corrected output tests/unit/memory/consolidation/test_distillation.py
Hoist 5 repeated per-method _merge_results deferred imports to module level tests/unit/memory/test_tool_retriever.py
Collapse test_empty_llm_response_falls_back + test_whitespace_llm_response_falls_back into a single @pytest.mark.parametrize test per CLAUDE.md conventions tests/unit/memory/consolidation/test_llm_strategy.py
Hoist deferred MemoryCategory import inside test_store_request_shape to module level tests/unit/memory/consolidation/test_distillation.py

Findings NOT Fixed (with rationale)

Finding Reason
"Store first, delete second" concern (CodeRabbit) False positive_process_group already stores before deleting; docstring documents this invariant explicitly
Serialize full DistillationRequest as content Design choice — human-readable text is appropriate for LLM consumption; serialized JSON would be less useful for synthesis
Fetch larger candidate pool before MMR truncation Requires backend API changes (candidate_limit separate from max_memories); valid follow-up
Add engine.run() integration tests Unit tests targeting _post_execution_pipeline directly are correct for this test file; public-API integration tests belong in a dedicated integration test suite

Checks

  • ruff check src/ tests/ — ✅ all passed
  • mypy src/synthorg/memory/ src/synthorg/engine/agent_engine.py — ✅ no issues (133 files)
  • pytest⚠️ pre-existing Python 3.14 RC2 + Pydantic AssertionError environment issue confirmed present on main before this PR; not caused by these changes

Generated by Claude Code

@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 5, 2026 06:53 — with GitHub Actions Inactive
Copy link
Copy Markdown
Owner Author

Follow-up fixes — commit e659544

Two additional correctness issues surfaced by a post-push conventions/resilience review agent. Both are fixed and pushed.


memory/consolidation/llm_strategy.py_fallback_summary IndexError guard

_fallback_summary accessed entries[0].category.value unconditionally on the header line. If _synthesize is ever called with an empty list (reachable when a backend deduplication bug yields all-same-ID entries, leaving to_remove empty), this raises IndexError: list index out of range.

Fix: Added early-return guard before the header line:

if not entries:
    return ""

engine/agent_engine.py_try_capture_distillation NotBlankStr coercion

_try_capture_distillation declared agent_id: str and task_id: str but passed them directly to capture_distillation(agent_id: NotBlankStr, task_id: NotBlankStr). Since NotBlankStr is Annotated[str, StringConstraints(...)], Python's runtime doesn't enforce the constraint on function arguments — a blank agent_id would silently bypass DistillationRequest's NotBlankStr field validator.

Fix: Added TypeAdapter(NotBlankStr).validate_python() coercion at the call site, so blank-string caller bugs are caught at the _try_capture_distillation boundary rather than propagating undetected into the model.


Both ruff and mypy pass clean after these changes.


Generated by Claude Code

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/synthorg/memory/consolidation/llm_strategy.py (3)

487-507: ⚠️ Potential issue | 🟡 Minor

Log non-retryable provider failures before re-raising.

This branch propagates auth/validation-style provider errors with no category/model context, so operators only get the coarse task-group log upstream.

💡 Minimal fix
         except ProviderError as exc:
             if exc.is_retryable:
                 logger.warning(
                     LLM_STRATEGY_FALLBACK,
                     agent_id=agent_id,
@@
                 )
                 return self._fallback_summary(entries), False
             # Non-retryable provider errors (auth, validation, etc.)
             # propagate -- they are bugs the caller must see, not
             # conditions to paper over with a fallback summary.
+            logger.error(
+                LLM_STRATEGY_FALLBACK,
+                agent_id=agent_id,
+                category=category.value,
+                entry_count=len(entries),
+                model=self._model,
+                error=str(exc),
+                error_type=type(exc).__name__,
+                reason="non_retryable_provider_error",
+                exc_info=True,
+            )
             raise
As per coding guidelines "Log all error paths at WARNING or ERROR level with context before raising".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 487 - 507, In
the except ProviderError block inside the consolidation logic (the handler that
currently checks exc.is_retryable), add a warning or error log call that logs
non-retryable provider errors with the same context fields used for retryable
cases (agent_id, category.value, entry_count=len(entries), model=self._model,
error=str(exc), error_type=type(exc).__name__) before re-raising; keep using the
logger and a clear message (similar to LLM_STRATEGY_FALLBACK) so operators get
category/model context for ProviderError paths prior to the raise.

161-164: ⚠️ Potential issue | 🟠 Major

Emit LLM_STRATEGY_SYNTHESIZED only after store() succeeds.

The success event fires inside _synthesize() before persistence and without summary_id. If store() fails, the logs still report a synthesized summary, and multi-category runs still have no per-group persisted ID beyond the last ConsolidationResult.summary_id.

💡 Suggested direction
-            if response.content and response.content.strip():
-                logger.info(
-                    LLM_STRATEGY_SYNTHESIZED,
-                    agent_id=agent_id,
-                    category=category.value,
-                    entry_count=len(entries),
-                    input_length=len(user_content),
-                    output_length=len(response.content),
-                    model=self._model,
-                    trajectory_context_count=len(trajectory_context),
-                )
-                return response.content.strip(), True
+            if response.content and response.content.strip():
+                return response.content.strip(), True
...
         new_id = await self._backend.store(agent_id, store_request)
+        if used_llm:
+            logger.info(
+                LLM_STRATEGY_SYNTHESIZED,
+                agent_id=agent_id,
+                category=category.value,
+                entry_count=len(to_remove),
+                summary_id=new_id,
+                model=self._model,
+                trajectory_context_count=len(trajectory_context),
+            )

Also applies to: 250-258, 355-364, 451-461

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 161 - 164,
The code currently emits the LLM_STRATEGY_SYNTHESIZED log inside _synthesize()
before persisting the summary, which can produce success logs without a
persisted summary_id; move the emission so it occurs only after store()
completes successfully and include the persisted summary_id from
ConsolidationResult.summary_id in the event payload; update every synthesize
call site that currently logs before persistence (e.g., the
LLM_STRATEGY_SYNTHESIZED emission in _synthesize() and the other similar emit
locations) to emit after the corresponding store() returns successfully and
handle/store any store() errors so the event is not emitted on failure.

42-43: ⚠️ Potential issue | 🟠 Major

group_threshold=2 never performs real consolidation.

For a 2-entry group, _select_entries() keeps one entry and _synthesize() sees only the other. That leaves the kept original plus a summary of a single removed entry, so there is no deduplication against the retained entry and no net count reduction.

💡 Simplest fix
-_MIN_GROUP_THRESHOLD = 2
+_MIN_GROUP_THRESHOLD = 3
...
-        ValueError: If ``group_threshold`` is less than 2.
+        ValueError: If ``group_threshold`` is less than 3.

Also applies to: 115-117, 130-135, 344-353

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 42 - 43, The
group_threshold of 2 cannot actually consolidate because _select_entries() will
keep one entry and _synthesize() will only see the other, producing a summary
that doesn't dedupe the retained entry; change the consolidation minimum to 3 by
updating _MIN_GROUP_THRESHOLD and _DEFAULT_GROUP_THRESHOLD to 3 and ensure any
parameter validation or clamping (where group_threshold is accepted/used)
enforces >= _MIN_GROUP_THRESHOLD so calls into _select_entries() and
_synthesize() always operate on groups of 3+ entries.
src/synthorg/memory/consolidation/distillation.py (1)

42-47: ⚠️ Potential issue | 🟠 Major

Persist task_id with the distillation entry.

DistillationRequest says downstream readers can correlate these entries by task, but the serialized payload drops task_id. Once this is written to the backend, consumers only see outcome/trajectory/tool names, so task-level correlation is impossible.

💡 Minimal fix
-        store_content = (
-            f"Distillation: {outcome}\n"
-            f"Trajectory: {trajectory}\n"
-            f"Memory lookups: {tool_names}"
-        )
+        store_content = (
+            f"Task ID: {task_id}\n"
+            f"Outcome: {outcome}\n"
+            f"Trajectory: {trajectory}\n"
+            f"Memory tool invocations: {tool_names}"
+        )

Also applies to: 196-209

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 146-269: The consolidate method is over the 50-line limit; split
its responsibilities into smaller methods: extract the grouping logic into e.g.
a new _build_groups(entries) -> list[tuple[MemoryCategory, list[MemoryEntry]]],
move the TaskGroup orchestration and exception-unwrapping block into a new
_run_groups_taskgroup(groups, agent_id, trajectory_context) ->
list[tuple[NotBlankStr, list[NotBlankStr]]] that creates tasks calling the
existing _process_group and returns task results, and factor the final folding
of group_results into a small _fold_group_results(group_results) ->
ConsolidationResult (or values used to construct it). Keep calls to
_fetch_trajectory_context and _process_group intact and update consolidate to
orchestrate by calling these new helpers so the top-level consolidate remains
under 50 lines.

---

Duplicate comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 487-507: In the except ProviderError block inside the
consolidation logic (the handler that currently checks exc.is_retryable), add a
warning or error log call that logs non-retryable provider errors with the same
context fields used for retryable cases (agent_id, category.value,
entry_count=len(entries), model=self._model, error=str(exc),
error_type=type(exc).__name__) before re-raising; keep using the logger and a
clear message (similar to LLM_STRATEGY_FALLBACK) so operators get category/model
context for ProviderError paths prior to the raise.
- Around line 161-164: The code currently emits the LLM_STRATEGY_SYNTHESIZED log
inside _synthesize() before persisting the summary, which can produce success
logs without a persisted summary_id; move the emission so it occurs only after
store() completes successfully and include the persisted summary_id from
ConsolidationResult.summary_id in the event payload; update every synthesize
call site that currently logs before persistence (e.g., the
LLM_STRATEGY_SYNTHESIZED emission in _synthesize() and the other similar emit
locations) to emit after the corresponding store() returns successfully and
handle/store any store() errors so the event is not emitted on failure.
- Around line 42-43: The group_threshold of 2 cannot actually consolidate
because _select_entries() will keep one entry and _synthesize() will only see
the other, producing a summary that doesn't dedupe the retained entry; change
the consolidation minimum to 3 by updating _MIN_GROUP_THRESHOLD and
_DEFAULT_GROUP_THRESHOLD to 3 and ensure any parameter validation or clamping
(where group_threshold is accepted/used) enforces >= _MIN_GROUP_THRESHOLD so
calls into _select_entries() and _synthesize() always operate on groups of 3+
entries.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 274d0b7e-cf20-406d-b226-b40c46ad2f88

📥 Commits

Reviewing files that changed from the base of the PR and between f97ccd3 and f8e8bac.

📒 Files selected for processing (5)
  • docs/design/memory.md
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions: allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields; use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/consolidation/distillation.py
  • tests/unit/memory/consolidation/test_distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must have from synthorg.observability import get_logger then logger = get_logger(__name__)
Always use logger as the variable name (not _logger, not log)
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow for test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases
Test vendor names must use test-provider, test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Control via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/memory/consolidation/test_distillation.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/consolidation/test_distillation.py
docs/**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Documentation is in docs/ (Markdown, built with Zensical, config: mkdocs.yml). Design spec in docs/design/ (12 pages). Architecture in docs/architecture/. Roadmap in docs/roadmap/. Security in docs/security.md. Licensing in docs/licensing.md.

Files:

  • docs/design/memory.md
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/consolidation/distillation.py
  • docs/design/memory.md
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`) include: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
🔇 Additional comments (1)
src/synthorg/memory/retriever.py (1)

294-339: Good filter-before-MMR ordering.

Applying the memory filter before apply_diversity_penalty() keeps excluded entries from acting as similarity anchors, so the fail-closed privacy boundary still defines the visible candidate set.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (5)
src/synthorg/memory/consolidation/llm_strategy.py (4)

487-507: ⚠️ Potential issue | 🟡 Minor

Log the non-retryable provider path before re-raising.

Auth/validation failures leave this branch without structured agent_id / category / model context unless an upstream caller happens to log them. Emit a warning/error here before raise.

As per coding guidelines "Log all error paths at WARNING or ERROR level with context before raising".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 487 - 507,
The non-retryable ProviderError path should log the error with the same
structured context before re-raising: inside the except ProviderError as exc
block (the branch after checking exc.is_retryable), call logger.error (or
warning per guidelines) with LLM_STRATEGY_FALLBACK and include agent_id,
category.value, entry_count=len(entries), model=self._model, error=str(exc),
error_type=type(exc).__name__, and reason="non_retryable_provider_error" so the
failure is recorded with context before the function re-raises the exception.

146-269: 🛠️ Refactor suggestion | 🟠 Major

Split consolidate() below the 50-line limit.

This method still combines grouping, TaskGroup orchestration, exception unwrapping, and result folding in one body, and it remains far above the repo’s 50-line limit. Extract the grouping / parallel execution / fold steps into a few cohesive private methods.

As per coding guidelines "Functions must be less than 50 lines, files must be less than 800 lines".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 146 - 269,
The consolidate method is over the 50-line limit and mixes grouping, TaskGroup
orchestration, exception unwrapping, and result folding; refactor by extracting
responsibilities into small private helpers: implement _group_entries(entries)
to perform the sorted+groupby logic and return groups_to_process, implement
_execute_group_tasks(groups_to_process, agent_id, trajectory_context) to run the
asyncio.TaskGroup, handle the except* branches (MemoryError, RecursionError,
ProviderError, Exception) and return the list of task results, and implement
_fold_group_results(group_results) to build removed_ids and summary_id and
return a ConsolidationResult; then simplify consolidate() to call
_group_entries, _fetch_trajectory_context, _execute_group_tasks,
_fold_group_results and perform only logging and returning the result (keep
existing symbol names: consolidate, _fetch_trajectory_context, STRATEGY_START,
STRATEGY_COMPLETE, LLM_STRATEGY_FALLBACK).

42-43: ⚠️ Potential issue | 🟠 Major

Disallow pairwise groups or pass the kept entry into synthesis.

With group_threshold=2, _select_entries() keeps one memory and _process_group() sends only the lone remainder to _synthesize(). The model never sees both entries, so it cannot deduplicate against the retained exemplar; this devolves into summarizing a single entry and can preserve duplicates beside the kept memory. Either raise the minimum to 3, or include the kept entry as read-only context when pairs are allowed.

Also applies to: 119-135, 344-353

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 42 - 43, The
grouping logic allows pairwise groups (group_threshold=2) which causes
_process_group to call _synthesize with only the lone remainder and prevents
deduplication; fix this by disallowing pairwise groups: change
_MIN_GROUP_THRESHOLD from 2 to 3 (update _DEFAULT_GROUP_THRESHOLD if needed) and
ensure all checks that validate or clamp group_threshold (_select_entries,
_process_group and any places referenced around the other occurrences) enforce
>= _MIN_GROUP_THRESHOLD so you never produce groups of size 2; alternatively, if
you must allow pairs, modify _process_group to pass the kept exemplar into
_synthesize as read-only context (add a parameter to _synthesize and include the
retained entry when calling) so the model can deduplicate against it.

161-164: ⚠️ Potential issue | 🟠 Major

Don’t drop the extra summary IDs from multi-category runs.

Each processed category can create its own stored summary, but the fold on Lines 252-258 overwrites summary_id on every iteration. The docstring’s fallback does not hold either: LLM_STRATEGY_SYNTHESIZED is emitted inside _synthesize() before store() and never includes the backend-assigned ID. Expose all created IDs in the result, or emit per-group IDs immediately after each store().

Also applies to: 250-258, 355-363, 451-460

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 161 - 164,
The ConsolidationResult currently only returns the last created summary ID
(ConsolidationResult.summary_id) because the loop in the consolidation path
overwrites summary_id each iteration; update the consolidation flow in
llm_strategy.py (the loop around the fold at the block that updates summary_id
on each iteration) so that all backend-assigned IDs produced by store() are
preserved and returned—either add a new field like
ConsolidationResult.summary_ids (list) and append each ID after calling store(),
or emit/record the per-group ID immediately after store() rather than relying on
the LLM_STRATEGY_SYNTHESIZED event (which is logged before store() and lacks the
backend ID). Make the same change in the other similar folding sections
referenced (the blocks around lines 250–258, 355–363, 451–460) and update any
callers of ConsolidationResult to consume summary_ids or fallback to the
existing summary_id for backward compatibility.
src/synthorg/engine/agent_engine.py (1)

707-711: ⚠️ Potential issue | 🟠 Major

Capture distillation on the non-pipeline exits too.

This hook still only runs through _post_execution_pipeline(). Lines 500-524 in run() return directly from _handle_budget_error() and _handle_fatal_error(), so pre-flight budget stops and fatal setup/pipeline failures never create distillation entries even though this helper’s docstring says budget exhaustions are captured. Centralize the final return path in run(), or call _try_capture_distillation(result.execution_result, agent_id, task_id) before those returns.

Also applies to: 722-728

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 707 - 711, The run() flow
currently skips calling _try_capture_distillation for early exits from
_handle_budget_error and _handle_fatal_error (and similar non-pipeline paths),
so budget/fatal failures never create distillation entries; modify run() to
ensure _try_capture_distillation(result.execution_result, agent_id, task_id) is
invoked on those code paths (either by centralizing final return logic so every
exit goes through _post_execution_pipeline() or by inserting a direct call to
_try_capture_distillation immediately before each return from
_handle_budget_error/_handle_fatal_error and the similar early-return blocks
around lines ~722-728), keeping the same arguments
(execution_result/agent_id/task_id) and preserving existing error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/memory/test_tool_retriever.py`:
- Around line 380-397: Update the test_no_reformulator_is_single_shot case to
assert the sufficiency checker is not invoked when reformulator=None: ensure the
AsyncMock passed as sufficiency_checker (use a named AsyncMock like
AsyncMock(name="sufficiency_checker")) is asserted as not awaited/not called
after calling ToolBasedInjectionStrategy.handle_tool_call, in addition to the
existing assert that backend.retrieve was called once; this verifies the
reformulation loop is fully bypassed.
- Around line 677-695: Combine the three nearly identical tests into one
parameterized pytest test that calls _merge_results with pairs of entries
created by _make_entry and asserts the winner; use `@pytest.mark.parametrize` on a
new test function (e.g., test_collision_relevance_resolution) with cases
covering (low vs high), (high vs low), (None vs positive) and add the two
suggested edge cases: (existing beats None) and (equal scores keeps existing).
Ensure each param supplies (existing_entry, new_entry, expected_relevance) and
the body asserts len(result) == 1 and result[0].relevance_score ==
expected_relevance while still using _merge_results and _make_entry as the
referenced helpers.

---

Duplicate comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 707-711: The run() flow currently skips calling
_try_capture_distillation for early exits from _handle_budget_error and
_handle_fatal_error (and similar non-pipeline paths), so budget/fatal failures
never create distillation entries; modify run() to ensure
_try_capture_distillation(result.execution_result, agent_id, task_id) is invoked
on those code paths (either by centralizing final return logic so every exit
goes through _post_execution_pipeline() or by inserting a direct call to
_try_capture_distillation immediately before each return from
_handle_budget_error/_handle_fatal_error and the similar early-return blocks
around lines ~722-728), keeping the same arguments
(execution_result/agent_id/task_id) and preserving existing error handling.

In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 487-507: The non-retryable ProviderError path should log the error
with the same structured context before re-raising: inside the except
ProviderError as exc block (the branch after checking exc.is_retryable), call
logger.error (or warning per guidelines) with LLM_STRATEGY_FALLBACK and include
agent_id, category.value, entry_count=len(entries), model=self._model,
error=str(exc), error_type=type(exc).__name__, and
reason="non_retryable_provider_error" so the failure is recorded with context
before the function re-raises the exception.
- Around line 146-269: The consolidate method is over the 50-line limit and
mixes grouping, TaskGroup orchestration, exception unwrapping, and result
folding; refactor by extracting responsibilities into small private helpers:
implement _group_entries(entries) to perform the sorted+groupby logic and return
groups_to_process, implement _execute_group_tasks(groups_to_process, agent_id,
trajectory_context) to run the asyncio.TaskGroup, handle the except* branches
(MemoryError, RecursionError, ProviderError, Exception) and return the list of
task results, and implement _fold_group_results(group_results) to build
removed_ids and summary_id and return a ConsolidationResult; then simplify
consolidate() to call _group_entries, _fetch_trajectory_context,
_execute_group_tasks, _fold_group_results and perform only logging and returning
the result (keep existing symbol names: consolidate, _fetch_trajectory_context,
STRATEGY_START, STRATEGY_COMPLETE, LLM_STRATEGY_FALLBACK).
- Around line 42-43: The grouping logic allows pairwise groups
(group_threshold=2) which causes _process_group to call _synthesize with only
the lone remainder and prevents deduplication; fix this by disallowing pairwise
groups: change _MIN_GROUP_THRESHOLD from 2 to 3 (update _DEFAULT_GROUP_THRESHOLD
if needed) and ensure all checks that validate or clamp group_threshold
(_select_entries, _process_group and any places referenced around the other
occurrences) enforce >= _MIN_GROUP_THRESHOLD so you never produce groups of size
2; alternatively, if you must allow pairs, modify _process_group to pass the
kept exemplar into _synthesize as read-only context (add a parameter to
_synthesize and include the retained entry when calling) so the model can
deduplicate against it.
- Around line 161-164: The ConsolidationResult currently only returns the last
created summary ID (ConsolidationResult.summary_id) because the loop in the
consolidation path overwrites summary_id each iteration; update the
consolidation flow in llm_strategy.py (the loop around the fold at the block
that updates summary_id on each iteration) so that all backend-assigned IDs
produced by store() are preserved and returned—either add a new field like
ConsolidationResult.summary_ids (list) and append each ID after calling store(),
or emit/record the per-group ID immediately after store() rather than relying on
the LLM_STRATEGY_SYNTHESIZED event (which is logged before store() and lacks the
backend ID). Make the same change in the other similar folding sections
referenced (the blocks around lines 250–258, 355–363, 451–460) and update any
callers of ConsolidationResult to consume summary_ids or fallback to the
existing summary_id for backward compatibility.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4c5db05a-067e-4438-8975-2277cf0b824a

📥 Commits

Reviewing files that changed from the base of the PR and between f8e8bac and e659544.

📒 Files selected for processing (6)
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_tool_retriever.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations in Python -- Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (via model_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions: allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields; use @computed_field for derived values instead of storing redundant fields; use NotBlankStr for all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)

Files:

  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, and @pytest.mark.slow for test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer @pytest.mark.parametrize for testing similar cases
Test vendor names must use test-provider, test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with @given + @settings. Hypothesis profiles configured in tests/conftest.py: ci (deterministic, max_examples=10 + derandomize=True), dev (1000 examples), fuzz (10,000 examples, no deadline), extreme (500,000 examples, no deadline). Control via HYPOTHESIS_PROFILE env var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit @example(...) decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number).

Files:

  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/consolidation/test_llm_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic must have from synthorg.observability import get_logger then logger = get_logger(__name__)
Always use logger as the variable name (not _logger, not log)
Always use event name constants from domain-specific modules under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Vendor names allowed only in: (1) Operations design page, (2) .claude/ files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must use test-provider, test-small-001, etc.

Files:

  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/engine/agent_engine.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/engine/agent_engine.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • tests/unit/memory/test_tool_retriever.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`) include: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Functions must be less than 50 lines, files must be less than 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
🔇 Additional comments (3)
tests/unit/memory/test_tool_retriever.py (3)

12-12: LGTM on importing _merge_results for direct unit testing.

Testing internal helper functions directly is appropriate here to verify the merge-by-ID logic in isolation.


339-346: Clean helper for reformulation config.

Reduces boilerplate across the reformulation tests while correctly setting required fields.


349-640: Excellent coverage of the Search-and-Ask reformulation loop.

The tests comprehensively verify:

  • Guard paths (disabled, missing reformulator, early sufficiency)
  • Multi-round iteration with proper result merging
  • Termination conditions (None return, stable query, max rounds)
  • Deduplication by ID with higher-relevance wins
  • Error isolation returning current entries without leaking internal errors

The error isolation tests (lines 582-639) particularly stand out — they validate the fail-safe behavior documented in the implementation where exceptions from sufficiency_checker or reformulator degrade gracefully.

Comment on lines +380 to +397
async def test_no_reformulator_is_single_shot(self) -> None:
"""Reformulation skipped when reformulator is None."""
entries = (_make_entry(entry_id="a"),)
backend = _make_backend(entries)

strategy = ToolBasedInjectionStrategy(
backend=backend,
config=_reformulation_config(),
reformulator=None,
sufficiency_checker=AsyncMock(),
)

await strategy.handle_tool_call(
tool_name="search_memory",
arguments={"query": "find me memories"},
agent_id="agent-1",
)
assert backend.retrieve.call_count == 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add assertion that sufficiency checker is not invoked when reformulator is missing.

The test verifies single-shot retrieval but doesn't confirm that the sufficiency checker is bypassed entirely. Per the implementation, when reformulator=None, the reformulation loop is skipped completely.

🧪 Proposed enhancement
         strategy = ToolBasedInjectionStrategy(
             backend=backend,
             config=_reformulation_config(),
             reformulator=None,
             sufficiency_checker=sufficiency,
         )

         await strategy.handle_tool_call(
             tool_name="search_memory",
             arguments={"query": "find me memories"},
             agent_id="agent-1",
         )
         assert backend.retrieve.call_count == 1
+        sufficiency.check_sufficiency.assert_not_called()

Also update the mock setup to use a named mock for consistency:

+        sufficiency = AsyncMock()
+        sufficiency.check_sufficiency = AsyncMock(return_value=True)
+
         strategy = ToolBasedInjectionStrategy(
             backend=backend,
             config=_reformulation_config(),
             reformulator=None,
-            sufficiency_checker=AsyncMock(),
+            sufficiency_checker=sufficiency,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/memory/test_tool_retriever.py` around lines 380 - 397, Update the
test_no_reformulator_is_single_shot case to assert the sufficiency checker is
not invoked when reformulator=None: ensure the AsyncMock passed as
sufficiency_checker (use a named AsyncMock like
AsyncMock(name="sufficiency_checker")) is asserted as not awaited/not called
after calling ToolBasedInjectionStrategy.handle_tool_call, in addition to the
existing assert that backend.retrieve was called once; this verifies the
reformulation loop is fully bypassed.

Adds MMR-style diversity penalty to reduce redundancy in memory retrieval results.
Promotes coverage over near-duplicate content.

- apply_diversity_penalty() in ranking.py: iterative MMR selection
  balancing relevance (combined_score) against max similarity to
  already-selected entries
- _bigram_jaccard() similarity using word-level bigrams (no external deps,
  pluggable via similarity_fn parameter for future embedding-based variants)
- diversity_penalty_enabled and diversity_lambda config fields on
  MemoryRetrievalConfig (lambda=1.0 pure relevance, 0.0 max diversity)
- Wired into ContextInjectionStrategy._execute_pipeline after ranking,
  before memory filter
- MEMORY_DIVERSITY_RERANKED event constant

Part of #704.
Adds LLMConsolidationStrategy implementing the ConsolidationStrategy
protocol. Uses a CompletionProvider to semantically merge memory entries
within each category group, producing a synthesized summary that
preserves all unique facts while removing redundancy.

Addresses SimpleConsolidationStrategy's lack of semantic deduplication.

- LLMConsolidationStrategy: groups entries by category, keeps the
  highest-relevance entry per group (recency tiebreaker), feeds the rest
  to an LLM for synthesis
- Synthesis prompt instructs the model to identify overlapping info,
  merge semantically related facts, and preserve specific details
- Error handling follows AbstractiveSummarizer pattern: re-raise
  MemoryError/RecursionError, propagate non-retryable ProviderError,
  gracefully fall back to concatenation for retryable/unexpected errors
- Summary stored with metadata.tags = (consolidated, llm-synthesized)
- LLM_STRATEGY_SYNTHESIZED and LLM_STRATEGY_FALLBACK event constants
- Exported from synthorg.memory.consolidation

Part of #704.
Adds DistillationRequest model and capture function for outcome-driven
memory curation at task completion. Captures trajectory summary, outcome,
and retrieved memory tool usage.

Feeds into the consolidation pipeline -- stored as an EPISODIC memory
entry tagged with distillation for future consolidation runs.

- DistillationRequest: frozen Pydantic model with agent_id, task_id,
  trajectory_summary, outcome, retrieved_memory_ids, created_at
- build_trajectory_summary: summarizes turn count, tool usage, and
  total tokens from turn records
- build_outcome: human-readable description from TerminationReason
  and error_message
- extract_memory_ids: extracts search_memory/recall_memory tool calls
  from turn records (non-memory tools ignored)
- capture_distillation: non-critical capture function that stores
  the distillation entry; returns None on failure, logs warning,
  propagates MemoryError/RecursionError
- DISTILLATION_CAPTURED and DISTILLATION_CAPTURE_FAILED event constants
- Exported from synthorg.memory.consolidation

Engine integration point documented but not wired: agent_engine.py
_post_execution_pipeline alongside _try_procedural_memory.

Part of #704.
Wires the existing QueryReformulator and SufficiencyChecker protocols
into ToolBasedInjectionStrategy to enable iterative Search-and-Ask
retrieval: search, check sufficiency, reformulate, re-search.

The protocols and LLM implementations already existed in
memory/reformulation.py; the config fields query_reformulation_enabled
and max_reformulation_rounds were reserved. This activates them.

- ToolBasedInjectionStrategy accepts optional reformulator and
  sufficiency_checker kwargs
- _retrieve_with_reformulation orchestrates initial search + reformulation
  loop when enabled
- _reformulation_loop runs up to max_reformulation_rounds:
  check sufficiency -> reformulate -> re-search -> merge results
- _merge_results unions entries by ID, keeping higher-relevance on collision
- Loop stops early on: sufficient results, reformulator returning None,
  reformulator returning the same query, or max rounds reached
- Config validator relaxed: query_reformulation_enabled=True now allowed
  when strategy=TOOL_BASED (still rejected for CONTEXT/SELF_EDITING)
- MEMORY_REFORMULATION_ROUND and MEMORY_REFORMULATION_SUFFICIENT events

Part of #704.
…ry consolidation

Pre-reviewed by 12 agents, 45+ findings addressed.

## Scope additions (issue #704 end-to-end wiring)

- Wire capture_distillation into AgentEngine._post_execution_pipeline behind
  distillation_capture_enabled=False opt-in flag
- Fetch distillation entries (tagged 'distillation') in LLMConsolidationStrategy
  and embed trajectory summaries in the synthesis system prompt
- Rename DistillationRequest.retrieved_memory_ids -> memory_tool_invocations
  (field stored tool names, not memory IDs)

## Code fixes

- llm_strategy: parallel per-category processing via asyncio.TaskGroup
- llm_strategy: delete-before-store + per-delete try/except for partial failure
  safety; successful IDs only reported in ConsolidationResult
- llm_strategy: explicit RetryExhaustedError handling with retryable
  ProviderError fallback for tests; non-retryable errors propagate
- llm_strategy: concat fallback tagged 'concat-fallback' (not 'llm-synthesized')
- llm_strategy: per-entry input cap to guard provider context limits
- ranking: MMR sentinel = -math.inf (not -1.0) + pre-computed bigram cache
- ranking: pre-raise log for apply_diversity_penalty ValueError
- retrieval_config: diversity_penalty/TOOL_BASED warn validator, stale reformulation
  docstrings updated, personal_boost/RRF warning clarified
- retriever: fail-closed on memory filter exceptions (no longer silently leaks
  unfiltered memories when a privacy filter raises)
- tool_retriever: Search-and-Ask loop now wraps sufficiency_checker and
  reformulator calls in error isolation, emits MEMORY_REFORMULATION_FAILED /
  MEMORY_SUFFICIENCY_CHECK_FAILED, adds max-rounds-exhausted log
- tool_retriever: explicit guard (was assert) for type narrowing
- tool_retriever: all error strings prefixed with ERROR_PREFIX; rejected
  invalid categories surfaced back to the LLM for self-correction
- tool_retriever: docstring covers reformulator/sufficiency_checker params
- Pre-raise error logs at MemoryError/RecursionError handlers per CLAUDE.md

## Tests

- RecursionError propagation for distillation + llm_strategy
- capture_distillation store request shape assertion
- LLMConsolidationStrategy: tiebreaker, None relevance, fallback truncation,
  synthesis content verbatim, fallback tag, partial delete failure
- LLMConsolidationStrategy: trajectory context fetch, disabled mode, lookup failure
- AgentEngine distillation capture wiring (new test file)
- Reformulation dedup across rounds, sufficiency_checker/reformulator errors
- Invalid category surfacing
- MMR all-negative-score deterministic behavior
- Diversity/TOOL_BASED warn validator
- Memory filter fail-closed behavior
- test_ranking.py split into test_ranking.py + test_ranking_diversity.py
  (was 823 lines, now 627 + 256)

## Docs

- docs/design/memory.md: reformulation wired-in, LLMConsolidationStrategy,
  DistillationRequest capture, MMR diversity re-ranking, fail-closed filter
- CLAUDE.md: memory package structure updated, events.consolidation examples

Closes #704
**Data safety (MAJOR):** Fix potential data loss in
`LLMConsolidationStrategy._process_group` where entries were deleted
BEFORE synthesis/store.  A non-retryable `ProviderError` (e.g.
`AuthenticationError`) raised by `_synthesize` would silently drop
already-deleted entries with no summary stored.  Reorder to
synthesize-and-store first, then delete — synthesis failure now leaves
all originals intact.  Update docstring and add
`test_no_deletes_on_synthesis_failure` to document the safety
guarantee.

**Observability (MEDIUM):** Promote `LLM_STRATEGY_SYNTHESIZED` log
call from `logger.debug` to `logger.info` in `_synthesize`.  Per-group
synthesis is a significant memory-mutation event (originals are deleted
and a summary is stored); operators need INFO-level visibility for
production auditing.

**Misleading event name (MAJOR):** `apply_diversity_penalty` was
logging a validation error (`diversity_lambda out of range`) using the
success event `MEMORY_DIVERSITY_RERANKED`.  Add new constant
`MEMORY_DIVERSITY_RERANK_FAILED` to `events/memory.py`, update the
import in `ranking.py`, and update CLAUDE.md event reference list.

https://claude.ai/code/session_015u2sAJYren3pzTAKqeqw3Q
- llm_strategy.py: add logger.error before re-raising MemoryError/
  RecursionError in _fetch_trajectory_context and _process_group
  (CLAUDE.md: all error paths must log before raising)
- agent_engine.py: remove incorrect `# pragma: no cover` from the
  `_memory_backend is None` guard in _try_capture_distillation; the
  path is exercised by test_distillation_without_backend_is_noop
- test_distillation.py: move test_error_without_message_falls_through
  from TestCaptureDistillation to TestBuildOutcome and convert from
  async def to def (tests a synchronous function)

https://claude.ai/code/session_018t73D4YLoCph1fLFqX6gR6
- llm_strategy.py: log ExceptionGroup before re-raising in except* handlers
  so operators can diagnose multi-task failures (CodeRabbit follow-up #4058984940)
- retriever.py: run MemoryFilterStrategy BEFORE MMR diversity re-ranking so
  privacy-excluded entries are not used as MMR anchors (CodeRabbit #4058864502)
- distillation.py: store tool invocation names in content instead of bare count
  so LLM consolidation has richer trajectory context
- test_distillation.py: remove spurious async from synchronous test helper
  test_error_without_message_falls_through (CodeRabbit async test cleanup)
- docs/design/memory.md: update pipeline step ordering to match implementation

https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
- distillation.py: fix grammar bug -- produce "1 turn" (singular) vs
  "N turns" (plural) in build_trajectory_summary; previously always
  produced "1 turns" for single-turn executions
- test_distillation.py: update assertion to match corrected output,
  hoist deferred MemoryCategory import to module level, fix import order
- test_tool_retriever.py: hoist repeated per-method _merge_results
  imports to module level (5 identical deferred imports removed)
- test_llm_strategy.py: collapse test_empty_llm_response_falls_back
  and test_whitespace_llm_response_falls_back into a single
  @pytest.mark.parametrize test per CLAUDE.md conventions

https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
…tillation capture

Two correctness fixes surfaced by post-push review agent analysis:

- memory/consolidation/llm_strategy.py: _fallback_summary accessed
  entries[0].category.value unconditionally; if _synthesize is called
  with an empty list (possible when a backend dedup bug yields all-same-ID
  entries), this raises IndexError. Guard with early `if not entries: return ""`
  before the header line.

- engine/agent_engine.py: _try_capture_distillation declared agent_id/task_id
  as plain str but passed them to capture_distillation which requires
  NotBlankStr. Added TypeAdapter(NotBlankStr).validate_python() coercion so
  blank-string caller bugs are caught at the boundary rather than silently
  bypassing the NotBlankStr constraint on DistillationRequest fields.

https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
Addresses 33 findings from local review agents (14) + CodeRabbit,
Gemini, Copilot on PR #1071. Follow-up items tracked in #1075.

## Behavior + correctness

- `_MIN_GROUP_THRESHOLD` raised from 2 to 3: at threshold 2,
  `_select_entries` kept 1 entry and `_synthesize` received 1, so the
  LLM could not dedup against the retained entry -- "consolidation"
  was just a paraphrase of a single entry.
- Non-retryable `ProviderError` in `_synthesize` is now logged at ERROR
  with agent/category/model context before re-raise (was silently
  propagated with no log).
- `LLM_STRATEGY_SYNTHESIZED` emits AFTER `backend.store()` succeeds,
  includes the persisted `summary_id`, fires from `_process_group`
  rather than inside `_synthesize`.
- `ConsolidationResult.summary_id` -> `summary_ids: tuple[NotBlankStr, ...]`
  so multi-category runs surface every summary produced, not just the
  last one; scalar `summary_id` retained as a `@computed_field`
  returning the last element. `SimpleConsolidationStrategy` and
  `DualModeConsolidationStrategy` updated to populate the plural field.
- `_fetch_trajectory_context` failure path promoted from DEBUG to
  WARNING (ops invisibility at default INFO log level).
- `apply_diversity_penalty` success path promoted from DEBUG to INFO
  to match other state-transition events; added `similarity` field so
  bigram-cached vs custom paths are distinguishable.
- New `LLM_STRATEGY_ERROR` event; `LLM_STRATEGY_FALLBACK` is now
  reserved for genuine concat-fallback returns, and error paths
  (TaskGroup unwrap, trajectory fetch, delete failures, non-retryable
  provider errors) emit the dedicated error event so dashboards keyed
  on fallback do not fire on unrelated errors.
- `distillation.capture_distillation` now persists `Task ID:` in the
  EPISODIC store content so downstream readers can correlate entries
  by task, and the docstring is explicit about the system-error
  propagation contract.
- `ToolBasedInjectionStrategy.__init__` now raises when
  `config.query_reformulation_enabled` is set but either
  `reformulator` or `sufficiency_checker` is missing (silent no-op
  was a configuration trap).
- `MemoryRetrievalConfig` diversity-penalty validator now RAISES on
  non-CONTEXT strategy (was WARN) -- symmetric with the reformulation
  validator; silent no-op on deployment was worse than a hard error.
- Search-and-Ask loop: backend-retrieve failures mid-loop degrade to
  the accumulated entries (was propagating and discarding them), and
  results are truncated to `limit` on return so the tool contract is
  honoured regardless of how many rounds ran.
- Added `MEMORY_REFORMULATION_EXHAUSTED` event for max-rounds and
  reformulator-stopped terminal states (was conflated with per-round
  `MEMORY_REFORMULATION_ROUND`).
- `apply_diversity_penalty` filter-before-MMR ordering preserved
  (already in base branch) and now extracted into a dedicated
  `_filter_or_fail_closed` helper for clarity.
- `_try_capture_distillation` skip paths now log at DEBUG
  (`capture_disabled`, `no_memory_backend`) so operators can debug
  missing distillation entries.

## Security hardening

- Prompt-injection defence: each memory entry in the synthesis user
  prompt is wrapped in `<entry category="...">...</entry>` with
  internal delimiter escaping, and the system prompt explicitly
  instructs the model to treat tag content as data, not instructions.
  Trajectory context snippets get the same treatment.
- Total concatenated user-content length capped at ~20000 chars
  (`_MAX_TOTAL_USER_CONTENT_CHARS`) with WARNING-logged truncation so
  oversized groups do not blow out context windows or cost budgets.

## Refactors for CLAUDE.md function-length rule (<50 lines)

- `LLMConsolidationStrategy.consolidate` (124 -> thin orchestrator over
  `_build_groups`, `_run_groups`, `_log_taskgroup_failure`,
  `_assemble_result`).
- `_synthesize` (~130 -> split across `_build_user_prompt`,
  `_call_llm`, `_handle_provider_error`, `_build_system_prompt`,
  `_fallback_summary`).
- `_process_group` split into `_store_summary` + `_delete_consolidated`.
- `capture_distillation` split into `_build_distillation_request` +
  `_render_store_content`.
- `ToolBasedInjectionStrategy._reformulation_loop` split into
  `_run_reformulation_step` + `_check_sufficiency` + `_reformulate` +
  `_retrieve_round` helpers.
- `_handle_search` split with `_safe_search` error-isolation helper.
- `ContextInjectionStrategy._execute_pipeline` extracts
  `_filter_or_fail_closed`.

## Documentation

- `AgentEngine` class docstring now documents `distillation_capture_enabled`
  and notes that `memory_backend` is used by both procedural memory
  and distillation.
- `MemoryRetrievalConfig` docstring Attributes block includes the four
  new fields (`diversity_penalty_enabled`, `diversity_lambda`,
  `query_reformulation_enabled`, `max_reformulation_rounds`).
- `docs/design/memory.md` Consolidation Strategies row expanded with
  fallback paths, kept-entry behaviour, trajectory context limits,
  prompt-injection defence, and multi-summary accessor note.
- Module docstring of `distillation.py` clarifies "tool *names*, not
  entry IDs".
- `TestExtractMemoryIds` test class renamed to
  `TestExtractMemoryToolInvocations` (stale name after field rename).

## Tests

- New `tests/unit/memory/test_retriever_diversity.py` with
  integration tests that verify `diversity_penalty_enabled=True` is
  actually wired into `ContextInjectionStrategy._execute_pipeline`
  and produces a different ordering from the disabled case.
- New `test_store_failure_after_synthesis_propagates_no_deletes` test
  guards the synthesize-before-delete invariant against future
  regressions.
- New `test_trajectory_context_entry_limit_and_truncation` test pins
  `_MAX_TRAJECTORY_CONTEXT_ENTRIES=5` and per-entry truncation to
  `_MAX_TRAJECTORY_CHARS_PER_ENTRY=500`.
- Updated `test_llm_strategy.py` threshold tests for the new
  `_MIN_GROUP_THRESHOLD=3` rule.
- Updated `test_retrieval_config.py` diversity validator test for the
  new raise semantics.
- Updated `test_tool_retriever.py` to exercise the new
  ctor-time reformulator/sufficiency-checker guard.
- Updated `test_distillation.py` store-shape assertion for the new
  `Task ID:` line.
- Added `test_multi_summary` + `test_summary_ids_reject_duplicates`
  coverage for the `ConsolidationResult.summary_ids` plural field.

## Test + lint status

- `uv run ruff check src/ tests/` -- clean
- `uv run mypy src/ tests/` -- 1539 source files, no issues
- `uv run python -m pytest tests/unit/ -n 8` -- 14154 passed, 5 skipped

Closes #704.
@Aureliolo Aureliolo force-pushed the feat/memory-consolidation-upgrades branch from e659544 to 65df4b3 Compare April 5, 2026 12:28
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 5, 2026 12:29 — with GitHub Actions Inactive
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/synthorg/memory/consolidation/service.py (1)

116-122: 🛠️ Refactor suggestion | 🟠 Major

Use model_copy(update=...) to evolve ConsolidationResult instead of field-by-field reconstruction.

This manual rebuild is drift-prone: summary_ids was previously dropped here, and the next field added to ConsolidationResult will be easy to lose again. Using model_copy(update=...) preserves all fields automatically and aligns with the established pattern throughout the codebase for evolving frozen Pydantic models.

Suggested change
-                result = ConsolidationResult(
-                    removed_ids=result.removed_ids,
-                    summary_ids=result.summary_ids,
-                    archived_count=archived,
-                    mode_assignments=result.mode_assignments,
-                    archival_index=index,
-                )
+                result = result.model_copy(
+                    update={
+                        "archived_count": archived,
+                        "archival_index": index,
+                    }
+                )

Per coding guidelines: "Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/service.py` around lines 116 - 122, Replace
the manual field-by-field reconstruction of the ConsolidationResult with a
model_copy(update=...) call on the existing result object to preserve all fields
automatically; e.g. call result = result.model_copy(update={"archived_count":
archived, "archival_index": index}) instead of creating a new
ConsolidationResult(...), so fields like summary_ids and any future fields are
not accidentally dropped (use the existing result variable and the
ConsolidationResult model_copy method).
src/synthorg/memory/tool_retriever.py (1)

239-256: ⚠️ Potential issue | 🟠 Major

Validate query as a string before returning it.

This helper can currently hand back 123 or {...} as query_text because it only checks truthiness plus str(...).strip(). _handle_search() then calls len(query_text) on Line 441 and can raise before _safe_search() converts the bad tool call into Error: query must be a non-empty string. Normalize to a stripped str or reject non-strings here.

🛠️ Suggested fix
-    query_text = arguments.get("query", "")
-    if not query_text or not str(query_text).strip():
+    query_raw = arguments.get("query", "")
+    if not isinstance(query_raw, str):
+        return None, 0, None, ()
+    query_text = query_raw.strip()
+    if not query_text:
         return None, 0, None, ()

Also applies to: 430-441

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/tool_retriever.py` around lines 239 - 256, The helper
currently only checks truthiness and str(...).strip(), allowing non-string query
values like ints or dicts to slip through and later cause len(query_text) errors
in _handle_search; update the validation around query_text so you first check
its type and normalize: if arguments.get("query") is not an instance of str,
return the same failure tuple (None, 0, None, ()); otherwise set query_text =
arguments["query"].strip() and then proceed with the empty check and remaining
logic; reference the local variable query_text and the downstream callers like
_handle_search/_safe_search when making this change.
♻️ Duplicate comments (2)
src/synthorg/engine/agent_engine.py (1)

510-534: ⚠️ Potential issue | 🟠 Major

Budget and fatal exits still bypass distillation capture.

_try_capture_distillation() only runs from _post_execution_pipeline(). The BudgetExhaustedError and generic fatal branches return directly from run(), so those completion outcomes still never create distillation entries.

🛠️ One straightforward fix
             except BudgetExhaustedError as exc:
-                return self._handle_budget_error(
+                result = self._handle_budget_error(
                     exc=exc,
                     identity=identity,
                     task=task,
                     agent_id=agent_id,
                     task_id=task_id,
                     duration_seconds=time.monotonic() - start,
                     ctx=ctx,
                     system_prompt=system_prompt,
                 )
+                await self._try_capture_distillation(
+                    result.execution_result,
+                    agent_id,
+                    task_id,
+                )
+                return result
             except Exception as exc:
-                return await self._handle_fatal_error(
+                result = await self._handle_fatal_error(
                     exc=exc,
                     identity=identity,
                     task=task,
                     agent_id=agent_id,
                     task_id=task_id,
                     duration_seconds=time.monotonic() - start,
                     ctx=ctx,
                     system_prompt=system_prompt,
                     completion_config=completion_config,
                     effective_autonomy=effective_autonomy,
                     provider=provider,
                 )
+                await self._try_capture_distillation(
+                    result.execution_result,
+                    agent_id,
+                    task_id,
+                )
+                return result

Also applies to: 717-721

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/engine/agent_engine.py` around lines 510 - 534, The
BudgetExhaustedError and generic Exception branches in run() return via
_handle_budget_error and _handle_fatal_error without ever invoking
_try_capture_distillation(), so distillation entries are never created; update
those exception handlers in run() to call _try_capture_distillation(...) (with
the same args used in normal completion) before returning, or refactor so both
_handle_budget_error and _handle_fatal_error themselves invoke
_try_capture_distillation() (or call _post_execution_pipeline() which calls
_try_capture_distillation()), ensuring that any early-return paths
(BudgetExhaustedError branch and the generic Exception branch) perform the same
distillation capture as the normal flow.
tests/unit/memory/test_tool_retriever.py (1)

691-709: 🧹 Nitpick | 🔵 Trivial

Collapse the collision cases into one parametrized test.

These three methods only vary the input scores. One parametrized test keeps the same coverage and is easier to extend with the remaining edge cases here.

♻️ Proposed refactor
-    def test_higher_relevance_wins_on_collision(self) -> None:
-        low = _make_entry(entry_id="dup", relevance_score=0.3)
-        high = _make_entry(entry_id="dup", relevance_score=0.9)
-        result = _merge_results((low,), (high,))
-        assert len(result) == 1
-        assert result[0].relevance_score == 0.9
-
-    def test_existing_wins_when_new_is_lower(self) -> None:
-        high = _make_entry(entry_id="dup", relevance_score=0.9)
-        low = _make_entry(entry_id="dup", relevance_score=0.3)
-        result = _merge_results((high,), (low,))
-        assert len(result) == 1
-        assert result[0].relevance_score == 0.9
-
-    def test_none_relevance_treated_as_zero(self) -> None:
-        none_rel = _make_entry(entry_id="dup", relevance_score=None)
-        with_rel = _make_entry(entry_id="dup", relevance_score=0.1)
-        result = _merge_results((none_rel,), (with_rel,))
-        assert result[0].relevance_score == 0.1
+    `@pytest.mark.parametrize`(
+        ("existing_score", "new_score", "expected_score"),
+        [
+            pytest.param(0.3, 0.9, 0.9, id="new_higher_wins"),
+            pytest.param(0.9, 0.3, 0.9, id="existing_higher_kept"),
+            pytest.param(None, 0.1, 0.1, id="none_treated_as_zero"),
+        ],
+    )
+    def test_collision_relevance_resolution(
+        self,
+        existing_score: float | None,
+        new_score: float | None,
+        expected_score: float | None,
+    ) -> None:
+        existing = _make_entry(entry_id="dup", relevance_score=existing_score)
+        new = _make_entry(entry_id="dup", relevance_score=new_score)
+        result = _merge_results((existing,), (new,))
+        assert len(result) == 1
+        assert result[0].relevance_score == expected_score

As per coding guidelines, "Parametrize: Prefer @pytest.mark.parametrize for testing similar cases".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/memory/test_tool_retriever.py` around lines 691 - 709, Collapse
the three tests into one parametrized pytest test that exercises _merge_results
via inputs built with _make_entry: use `@pytest.mark.parametrize` with cases for
(existing_entries, new_entries, expected_score) covering (low then high => 0.9),
(high then low => 0.9), and (none then 0.1 => 0.1); in the test call
_merge_results(existing_entries, new_entries), assert len(result) == 1 and
result[0].relevance_score == expected_score so all three behaviors (higher
relevance wins, existing wins when new is lower, None treated as zero) are
covered in a single data-driven test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/design/memory.md`:
- Around line 419-423: Clarify the failure contract for LLMConsolidationStrategy
by explicitly separating which errors trigger the concat-fallback path versus
which errors propagate: state that concat-fallback is invoked for
RetryExhaustedError, retryable ProviderError, empty LLM response, and unexpected
non-ProviderError exceptions (logged at WARNING and tagged "concat-fallback"),
whereas non-retryable ProviderError is logged at ERROR and propagated, and
system errors (MemoryError, RecursionError) likewise propagate; reference
LLMConsolidationStrategy, RetryExhaustedError, ProviderError, the
"llm-synthesized" tag, and the "concat-fallback" tag when making the wording
change.

In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 378-405: The code currently deletes every entry returned by
_select_entries even when _build_user_prompt dropped some due to
_MAX_TOTAL_USER_CONTENT_CHARS; update the flow so only entries actually included
in the LLM prompt are deleted and omitted entries are preserved or processed in
subsequent chunks. Concretely: have _build_user_prompt (or _select_entries)
return both included_entries and omitted_entries (or make _synthesize return
which items were consumed), change _process_group to pass the included subset
into _synthesize and only pass the included_ids into _delete_consolidated; if
omitted_entries remain, loop to synthesize/delete them in additional chunks or
queue them for a fallback summary. Apply the same change to the other occurrence
noted (around the second block referencing lines 543-569) so deletion only
targets summarized items and omitted items are retained/processed.
- Around line 330-335: The retrieve call assumes recency ordering but
MemoryBackend.retrieve() returns relevance-ordered; change the logic around
MemoryQuery/ self._backend.retrieve to fetch the distillation entries
(remove/raise the limit so you get all relevant metadata results), then locally
sort the returned entries by their metadata.created_at in descending order and
slice to _MAX_TRAJECTORY_CONTEXT_ENTRIES before returning; reference
MemoryQuery, _DISTILLATION_TAG, _MAX_TRAJECTORY_CONTEXT_ENTRIES, and
self._backend.retrieve and ensure you use the metadata.created_at field for the
sort.

In `@src/synthorg/memory/consolidation/models.py`:
- Around line 109-112: Add a Pydantic "before" validator on the summary_ids
field to accept legacy summary_id input and map it into summary_ids;
specifically, add a `@field_validator`("summary_ids", mode="before") (or
equivalent) on the ConsolidationResult model that checks incoming data for a
"summary_id" key (or a single string), converts it into a one-element tuple of
NotBlankStr when summary_ids is empty/not provided, and otherwise returns the
original summary_ids value unchanged so existing construction via summary_ids
continues to work.

In `@src/synthorg/memory/ranking.py`:
- Around line 473-481: The guard in apply_diversity_penalty() allows
float("nan") through because NaN comparisons are false; explicitly check for NaN
(e.g., using math.isnan(diversity_lambda) or the self-inequality check) before
the existing range check and treat it like an invalid value: call logger.warning
with MEMORY_DIVERSITY_RERANK_FAILED and the same param/value/reason information
and then raise ValueError(msg) so NaN no longer bypasses the validation for
diversity_lambda.

In `@src/synthorg/memory/retrieval_config.py`:
- Around line 153-161: The docstring for the iterative reformulation flag is out
of date: replace the phrase that it "silently no-ops if either is missing" with
wording that it will raise a constructor-time error when either `reformulator`
or `sufficiency_checker` is not provided (fail-fast behavior) for
`ToolBasedInjectionStrategy` construction; keep the rest of the description
about enabling iterative reformulation, `max_reformulation_rounds`, requirement
that this applies only to the `TOOL_BASED` strategy, and that a validator
rejects this flag with strategies other than `TOOL_BASED`, and reference
`ToolBasedInjectionStrategy` and the constructor validation behavior to locate
the relevant code.

In `@src/synthorg/memory/tool_retriever.py`:
- Around line 165-216: The current _merge_results keeps first-round ordering and
appends later unseen entries so _truncate_entries may drop better later results;
change the merge to perform a cross-round re-ranking before truncation: combine
existing and new into a single map by id (as _merge_results already does), then
produce a list of unique MemoryEntry values sorted by (relevance_score or 0.0)
descending with a stable tiebreaker (e.g., original appearance order or a
timestamp) so later higher-relevance unseen entries can displace lower-relevance
earlier ones, and then let _truncate_entries enforce the caller limit; update
_merge_results (and any callers relying on its ordering) to accept this
re-ranked ordering while still deduping by MemoryEntry.id and preserving
stability for equal scores.

In `@tests/unit/memory/test_ranking_diversity.py`:
- Around line 179-192: The test test_custom_similarity_fn_used is a false
positive because single-word contents yield default bigram Jaccard 0.0; update
the test so the inputs are multi-word (e.g., "foo bar" vs "baz bar") or
otherwise ensure default similarity would be non-zero, then assert that passing
similarity_fn=lambda _x,_y: 1.0 changes ranking; modify the call that constructs
entries via _make_scored (entries a and b) and the apply_diversity_penalty
invocation to use multi-word content so the custom similarity_fn is actually
exercised.

In `@tests/unit/memory/test_retriever_diversity.py`:
- Around line 146-169: The test test_diversity_penalty_disabled_skips_mmr
currently only checks ordering and can pass even if MMR ran; update it to spy on
synthorg.memory.retriever.apply_diversity_penalty and assert it is never called
when MemoryRetrievalConfig(diversity_penalty_enabled=False) is used. Use a
mocking/spying tool (e.g., unittest.mock.patch or pytest-mock) to patch
apply_diversity_penalty before creating ContextInjectionStrategy and calling
strategy.prepare_messages, then after the call assert the spy/mock's
assert_not_called (or equivalent) to verify apply_diversity_penalty was skipped.
Ensure the spy references the exact symbol name apply_diversity_penalty to catch
calls from ContextInjectionStrategy.

---

Outside diff comments:
In `@src/synthorg/memory/consolidation/service.py`:
- Around line 116-122: Replace the manual field-by-field reconstruction of the
ConsolidationResult with a model_copy(update=...) call on the existing result
object to preserve all fields automatically; e.g. call result =
result.model_copy(update={"archived_count": archived, "archival_index": index})
instead of creating a new ConsolidationResult(...), so fields like summary_ids
and any future fields are not accidentally dropped (use the existing result
variable and the ConsolidationResult model_copy method).

In `@src/synthorg/memory/tool_retriever.py`:
- Around line 239-256: The helper currently only checks truthiness and
str(...).strip(), allowing non-string query values like ints or dicts to slip
through and later cause len(query_text) errors in _handle_search; update the
validation around query_text so you first check its type and normalize: if
arguments.get("query") is not an instance of str, return the same failure tuple
(None, 0, None, ()); otherwise set query_text = arguments["query"].strip() and
then proceed with the empty check and remaining logic; reference the local
variable query_text and the downstream callers like _handle_search/_safe_search
when making this change.

---

Duplicate comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 510-534: The BudgetExhaustedError and generic Exception branches
in run() return via _handle_budget_error and _handle_fatal_error without ever
invoking _try_capture_distillation(), so distillation entries are never created;
update those exception handlers in run() to call _try_capture_distillation(...)
(with the same args used in normal completion) before returning, or refactor so
both _handle_budget_error and _handle_fatal_error themselves invoke
_try_capture_distillation() (or call _post_execution_pipeline() which calls
_try_capture_distillation()), ensuring that any early-return paths
(BudgetExhaustedError branch and the generic Exception branch) perform the same
distillation capture as the normal flow.

In `@tests/unit/memory/test_tool_retriever.py`:
- Around line 691-709: Collapse the three tests into one parametrized pytest
test that exercises _merge_results via inputs built with _make_entry: use
`@pytest.mark.parametrize` with cases for (existing_entries, new_entries,
expected_score) covering (low then high => 0.9), (high then low => 0.9), and
(none then 0.1 => 0.1); in the test call _merge_results(existing_entries,
new_entries), assert len(result) == 1 and result[0].relevance_score ==
expected_score so all three behaviors (higher relevance wins, existing wins when
new is lower, None treated as zero) are covered in a single data-driven test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 28d77bd7-7f7b-4252-a669-72e19566a682

📥 Commits

Reviewing files that changed from the base of the PR and between e659544 and 65df4b3.

📒 Files selected for processing (29)
  • CLAUDE.md
  • docs/design/memory.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/distillation.py
  • src/synthorg/memory/consolidation/dual_mode_strategy.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/consolidation/service.py
  • src/synthorg/memory/consolidation/simple_strategy.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/consolidation/test_models.py
  • tests/unit/memory/consolidation/test_service.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Build Sandbox
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one model
Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Errors: handle explicitly, never silently swallow
Validate: at system boundaries (user input, external APIs, config files)
Use ruff check and ruff format for li...

Files:

  • src/synthorg/memory/consolidation/service.py
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/consolidation/test_service.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/test_retriever.py
  • src/synthorg/memory/tools.py
  • src/synthorg/memory/consolidation/dual_mode_strategy.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/simple_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_ranking_diversity.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/test_retrieval_config.py
  • src/synthorg/observability/events/consolidation.py
  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/consolidation/test_models.py
  • src/synthorg/engine/agent_engine.py
  • tests/unit/memory/consolidation/test_distillation.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/distillation.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code (exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr))
Variable name for logger: always logger (not _logger, not log)
Event names: always use constants from the domain-specific module under synthorg.observability.events (imported directly, never as string literals)
Structured kwargs: always logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases

Files:

  • src/synthorg/memory/consolidation/service.py
  • src/synthorg/memory/tools.py
  • src/synthorg/memory/consolidation/dual_mode_strategy.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/simple_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/distillation.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/consolidation/service.py
  • src/synthorg/memory/tools.py
  • src/synthorg/memory/consolidation/dual_mode_strategy.py
  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/simple_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/retriever.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/distillation.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use test-provider, test-small-001, etc. instead of real vendor names
Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Async: asyncio_mode = "auto" -- no manual @pytest.mark.asyncio needed
Timeout: 30 seconds per test (global in pyproject.toml -- do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed)
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases
Property-based testing with Hypothesis: use @given + @settings, Hypothesis profiles in tests/conftest.py with ci, dev, fuzz, extreme profiles controlled via HYPOTHESIS_PROFILE env var
When Hypothesis finds a failure, it is a real bug -- fix the underlying bug and add an explicit @example(...) decorator to the test
Flaky tests: NEVER skip, dismiss, or ignore -- always fix them fully. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/consolidation/test_service.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/consolidation/test_models.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/consolidation/test_service.py
  • tests/unit/memory/test_memory_tools.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/engine/test_agent_engine_distillation.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/consolidation/test_models.py
  • tests/unit/memory/consolidation/test_distillation.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
🧠 Learnings (48)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)

Applied to files:

  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability includes structured logging via `get_logger(__name__)`, correlation tracking, and log sinks.

Applied to files:

  • src/synthorg/memory/tools.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging

Applied to files:

  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module

Applied to files:

  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.

Applied to files:

  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.

Applied to files:

  • src/synthorg/memory/tools.py
  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • src/synthorg/memory/consolidation/__init__.py
  • src/synthorg/memory/consolidation/simple_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • src/synthorg/memory/retriever.py
  • CLAUDE.md
  • src/synthorg/observability/events/memory.py
  • tests/unit/memory/test_retriever_diversity.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/tool_retriever.py
  • docs/design/memory.md
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/distillation.py
  • tests/unit/memory/test_tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)

Applied to files:

  • tests/unit/engine/test_agent_engine_distillation.py
  • CLAUDE.md
  • src/synthorg/engine/agent_engine.py
  • src/synthorg/memory/consolidation/distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...

Applied to files:

  • tests/unit/engine/test_agent_engine_distillation.py
  • CLAUDE.md
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).

Applied to files:

  • tests/unit/engine/test_agent_engine_distillation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`

Applied to files:

  • CLAUDE.md
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under `synthorg.observability.events` (imported directly, never as string literals)

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/observability/events/memory.py
  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/synthorg/observability/events/consolidation.py
  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals

Applied to files:

  • src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.

Applied to files:

  • src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory

Applied to files:

  • src/synthorg/memory/consolidation/distillation.py
🪛 LanguageTool
docs/design/memory.md

[grammar] ~427-~427: Ensure spelling is correct
Context: ... 3 -- smaller groups cannot meaningfully dedup against the retained entry), `temperatu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (1)
src/synthorg/memory/tools.py (1)

37-45: No action required—the concern is based on a false premise.

The code is safe from the original concern. All successful search_memory and recall_memory responses pass through _format_entries(), which formats entries as [{category}]{score} {memory_content} or returns "No memories found." for empty results. User-controlled memory content never appears at the start of the response string. Error paths are explicitly marked with ERROR_PREFIX or one of the predefined constants. The test test_no_memories_found_is_not_error() confirms that the empty-results success case is correctly classified as non-error, and the prefix-based check is both sufficient and correct.

Comment on lines +109 to 112
summary_ids: tuple[NotBlankStr, ...] = Field(
default=(),
description="IDs of every summary entry produced during the run",
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
python -m pip install --quiet 'pydantic>=2,<3'
python - <<'PY'
from pydantic import BaseModel, computed_field

class ConsolidationResult(BaseModel):
    summary_ids: tuple[str, ...] = ()

    `@computed_field`
    `@property`
    def summary_id(self) -> str | None:
        return self.summary_ids[-1] if self.summary_ids else None

result = ConsolidationResult(summary_id="legacy-id")
print(result.model_dump())
assert result.summary_ids == ()
assert result.summary_id is None
print("verified: legacy summary_id input is silently ignored")
PY

Repository: Aureliolo/synthorg

Length of output: 155


🏁 Script executed:

# First, locate and examine the file in question
fd "consolidation/models.py" --type f

Repository: Aureliolo/synthorg

Length of output: 44


🏁 Script executed:

# Search for ConsolidationResult and summary_id usage patterns in the codebase
rg "ConsolidationResult" -A 3 -B 1 --type py | head -100

Repository: Aureliolo/synthorg

Length of output: 7403


🏁 Script executed:

# Search for any calls using summary_id parameter
rg "summary_id\s*=" -B 2 -A 1 --type py | head -100

Repository: Aureliolo/synthorg

Length of output: 3455


🏁 Script executed:

# Try different search approaches for the models file
find . -type f -name "models.py" -path "*/consolidation/*" 2>/dev/null

Repository: Aureliolo/synthorg

Length of output: 107


🏁 Script executed:

# Also search with rg for the file content around line 109-112
rg "summary_ids.*tuple\[NotBlankStr" -B 5 -A 5 --type py

Repository: Aureliolo/synthorg

Length of output: 877


🏁 Script executed:

# Check test_models.py to see what tests exist
rg "test_models\.py" --files | head -5

Repository: Aureliolo/synthorg

Length of output: 117


🏁 Script executed:

# Read the models.py file to see the full ConsolidationResult class
cat -n ./src/synthorg/memory/consolidation/models.py | head -200

Repository: Aureliolo/synthorg

Length of output: 8726


🏁 Script executed:

# Check for test_models.py more directly
find . -name "test_models.py" -path "*/consolidation/*" 2>/dev/null

Repository: Aureliolo/synthorg

Length of output: 110


🏁 Script executed:

# Search for any usage pattern that passes summary_id as a constructor argument
rg "ConsolidationResult\s*\(" -A 3 --type py | grep -E "(summary_id\s*=|ConsolidationResult\s*\()" | head -30

Repository: Aureliolo/synthorg

Length of output: 2724


🏁 Script executed:

# Read test_models.py to see what backward compatibility tests exist
cat -n ./tests/unit/memory/consolidation/test_models.py | head -250

Repository: Aureliolo/synthorg

Length of output: 9784


🏁 Script executed:

# Also search for any validators in the file that might handle legacy summary_id
rg "model_validator|summary_id" ./src/synthorg/memory/consolidation/models.py -B 2 -A 5

Repository: Aureliolo/synthorg

Length of output: 2829


Add a mode="before" validator to handle legacy summary_id construction.

The current implementation silently ignores ConsolidationResult(summary_id="value"), resulting in summary_ids=() and summary_id=None. While no current code in the codebase uses this pattern, the docstring claims backward compatibility "with callers reading" the property—implying construction with summary_id= may have been part of the API contract. To preserve any legacy calling code, add a mode="before" validator that maps summary_id input to summary_ids.

♻️ Compatibility shim
+    `@model_validator`(mode="before")
+    `@classmethod`
+    def _coerce_legacy_summary_id(cls, data: object) -> object:
+        if not isinstance(data, dict):
+            return data
+        legacy_summary_id = data.get("summary_id")
+        if legacy_summary_id is None:
+            return data
+        if "summary_ids" in data:
+            msg = "provide either summary_id or summary_ids, not both"
+            raise ValueError(msg)
+        return {**data, "summary_ids": (legacy_summary_id,)}
+
     `@model_validator`(mode="after")
     def _validate_archival_consistency(self) -> Self:  # noqa: C901
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/synthorg/memory/consolidation/models.py` around lines 109 - 112, Add a
Pydantic "before" validator on the summary_ids field to accept legacy summary_id
input and map it into summary_ids; specifically, add a
`@field_validator`("summary_ids", mode="before") (or equivalent) on the
ConsolidationResult model that checks incoming data for a "summary_id" key (or a
single string), converts it into a one-element tuple of NotBlankStr when
summary_ids is empty/not provided, and otherwise returns the original
summary_ids value unchanged so existing construction via summary_ids continues
to work.

## Critical

- Prompt-cap overflow no longer deletes unsummarized entries:
  ``_build_user_prompt`` now returns ``(prompt_text, included_entries)``
  so ``_process_group`` knows exactly which entries were in the LLM
  prompt and only deletes those.  Dropped entries remain on the
  backend for the next consolidation pass instead of being silently
  lost when the group exceeds ``_MAX_TOTAL_USER_CONTENT_CHARS``.
  ``_synthesize`` now returns ``(summary, used_llm, summarized)``.

## Major

- ``_merge_results`` sorts by relevance after merging so later
  reformulation rounds can actually surface improved matches.  The
  previous policy preserved first-round order and appended unseen
  later hits at the end, which meant ``_truncate_entries`` always
  dropped them -- Search-and-Ask was effectively a no-op when round 1
  already returned ``limit`` entries.
- ``ConsolidationResult.model_config`` gained ``extra='forbid'`` so
  legacy callers passing the retired ``summary_id=`` kwarg hit a
  hard ``ValidationError`` instead of silently dropping the value
  (the pre-alpha ``feedback_no_backward_compat`` rule applies).
- ``_fetch_trajectory_context`` over-fetches 4x then sorts locally by
  ``created_at`` descending before slicing to
  ``_MAX_TRAJECTORY_CONTEXT_ENTRIES``.  The ``MemoryBackend.retrieve``
  contract is relevance-ordered, not recency-ordered, so the previous
  slice could feed stale task distillations into the synthesis prompt.

## Minor

- ``apply_diversity_penalty`` now rejects ``NaN``/``Inf`` for
  ``diversity_lambda`` via ``math.isfinite``.  Previously ``float('nan')``
  slipped past the ``<0 or >1`` guard because both comparisons are
  false, and MMR silently returned a "successful" rerank.
- ``MemoryRetrievalConfig.query_reformulation_enabled`` field
  description now documents the fail-fast constructor contract (was
  still saying "silently no-ops if either is missing", which
  contradicts the ctor validator landed in the previous commit).
- ``docs/design/memory.md`` Consolidation Strategies row separates
  concat-fallback paths from propagating paths; previously
  "retryable ProviderError surfaced directly" was inside the
  fallback list and "non-retryable propagates" was in the same
  sentence, making the failure contract ambiguous.

## Test updates

- ``test_retriever_diversity.py``: switch to ``monkeypatch``-based
  stubs for ``apply_diversity_penalty`` so both enabled and disabled
  cases prove the correct call path directly.  The previous enabled
  test relied on MMR tie-break details and was brittle; the disabled
  test would pass even if the flag were ignored.
- ``test_ranking_diversity.py::test_custom_similarity_fn_used``:
  replaced the no-op spy with one that records calls AND makes the
  reordering observable.  The previous test would pass even if
  ``similarity_fn`` were ignored entirely (both sides evaluated to
  0.0 via the default bigram Jaccard on single-word inputs).
- ``test_tool_retriever.py::TestMergeResults``: parametrized the
  collision tests into a single table-driven test, and added
  ``test_merge_reorders_by_relevance`` as a regression guard for the
  new relevance-sorted merge policy.
- ``test_llm_strategy.py::test_trajectory_context_entry_limit_and_truncation``:
  relaxed the ``query.limit`` assertion to ``>=`` since the new
  over-fetch factor means the backend query asks for more than the
  final slice size.

## Status

- ``uv run ruff check src/ tests/`` -- clean
- ``uv run mypy src/ tests/`` -- 1539 source files, no issues
- ``uv run python -m pytest tests/unit/ -n 8`` -- 14157 passed, 5 skipped
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
src/synthorg/memory/consolidation/models.py (1)

106-106: ⚠️ Potential issue | 🟠 Major

summary_id no longer round-trips through model_dump().

@computed_field serializes summary_id, but extra="forbid" rejects that same key on input. ConsolidationResult.model_validate(result.model_dump()) will now fail, which is a nasty surprise for any persistence or copy-through-JSON path. Either keep summary_id as a plain @property, or accept a narrow summary_id -> summary_ids shim during validation.

Expected result: the dumped payload contains summary_id, and re-validating it raises an extra-field validation error.

#!/bin/bash
python - <<'PY'
from pydantic import BaseModel, ConfigDict, ValidationError, computed_field

class ConsolidationResult(BaseModel):
    model_config = ConfigDict(extra="forbid")
    summary_ids: tuple[str, ...] = ()

    `@computed_field`
    `@property`
    def summary_id(self) -> str | None:
        return self.summary_ids[-1] if self.summary_ids else None

result = ConsolidationResult(summary_ids=("summary-1",))
payload = result.model_dump()
print(payload)

try:
    ConsolidationResult.model_validate(payload)
except ValidationError as exc:
    print(type(exc).__name__)
    print(exc)
PY

Also applies to: 184-193

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 199-205: The code calls _fetch_trajectory_context() before
determining whether any consolidation work exists, causing unnecessary backend
lookups; change the order so you call self._build_groups(entries) first, assign
to groups_to_process, return early if groups_to_process is empty (no-op), and
only then call await self._fetch_trajectory_context(agent_id) and proceed to
await self._run_groups(groups_to_process, agent_id, trajectory_context); update
references to trajectory_context and group_results accordingly to avoid
executing the fetch when there is nothing to consolidate.

In `@src/synthorg/memory/tool_retriever.py`:
- Around line 301-305: Update the public docstring for the class/constructor to
reflect the actual constructor contract: state that if
query_reformulation_enabled is True then both a QueryReformulator (reformulator)
and a sufficiency_checker are required and __init__ will raise a ValueError when
either collaborator is missing; do the same change for the second docstring
block that documents these args (mentions around the existing reformulator and
sufficiency_checker descriptions) and reference the __init__ behavior and the
query_reformulation_enabled flag explicitly.
- Around line 250-252: _parse_search_args() currently allows truthy non-string
values (e.g., {"query": 123}) which later causes errors in _handle_search();
update _parse_search_args() so it rejects non-string query values up-front:
replace the existing check with an explicit isinstance(query_text, str) test and
only accept when isinstance(...) and str(query_text).strip() is non-empty,
otherwise return the same sentinel (None, 0, None, ()). Apply the identical
change to the other parsing block mentioned (the similar logic around lines
448-453) so both _parse_search_args() and the second parser reject non-strings
before downstream code (_handle_search()/_safe_search()) uses len() or other
string ops.
- Around line 489-497: The except handlers that currently do "except
builtins.MemoryError, RecursionError: raise" need to log the error with
contextual fields (agent_id, tool, round, query_text/limit/categories where
available) at WARNING or ERROR before re-raising; update the handler around the
call to _retrieve_with_reformulation (and the analogous handlers at the other
mentioned sites) to catch (builtins.MemoryError, RecursionError) as e, call the
module logger (e.g., processLogger or the file's logger) with a clear message
and the context dict, then re-raise the original exception; also add the same
logging immediately before the unguarded initial retrieve in
_reformulation_loop() so fatal errors include agent/tool/round context.

In `@tests/unit/memory/consolidation/test_llm_strategy.py`:
- Around line 743-786: Update the
test_multi_category_groups_processed_in_parallel to actually assert parallelism
by making provider.complete block on an asyncio.Event barrier: create two
asyncio.Event instances (e.g., entered_event and release_event) and a shared
counter in the test; set provider.complete to a side_effect that increments the
counter, sets entered_event when counter==2, then await release_event.wait()
before returning _make_response(); after calling strategy.consolidate (or
immediately before releasing), await entered_event.wait() to ensure both group
calls have entered, assert the counter==2 (and provider.complete.call_count==2),
then set release_event to let both completes proceed and finally assert
backend.store.call_count and result.consolidated_count as before. This uses
provider.complete, asyncio.Event, and strategy.consolidate to enforce and verify
concurrency.

In `@tests/unit/memory/test_ranking_diversity.py`:
- Around line 231-234: The test test_invalid_lambda_raises only checks
out-of-range finite values; update its parametrization to include non-finite
lambdas (float("nan"), float("inf"), -float("inf")) so
apply_diversity_penalty(...) exercise the production guard that rejects NaN/Inf;
modify the `@pytest.mark.parametrize`("bad_lambda", ...) tuple in the
test_ranking_diversity.py test_invalid_lambda_raises to add these three
non-finite values alongside the existing bad_lambda cases to ensure the
ValueError branch for non-finite inputs is covered.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c47c5062-7d39-4dba-aee7-8f98e3615882

📥 Commits

Reviewing files that changed from the base of the PR and between 65df4b3 and d215af2.

📒 Files selected for processing (10)
  • docs/design/memory.md
  • src/synthorg/memory/consolidation/llm_strategy.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/tool_retriever.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build Sandbox
  • GitHub Check: Build Web
  • GitHub Check: Build Backend
  • GitHub Check: Test (Python 3.14)
  • GitHub Check: Dependency Review
  • GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: No from __future__ import annotations -- Python 3.14 has PEP 649
Use PEP 758 except syntax: use except A, B: (no parentheses) -- ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement
For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using model_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one model
Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use allow_inf_nan=False in all ConfigDict declarations to reject NaN/Inf in numeric fields at validation time
Use @computed_field for derived values instead of storing + validating redundant fields (e.g. TokenUsage.total_tokens)
Use NotBlankStr (from core.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Errors: handle explicitly, never silently swallow
Validate: at system boundaries (user input, external APIs, config files)
Use ruff check and ruff format for li...

Files:

  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_ranking_diversity.py
  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/tool_retriever.py
  • tests/unit/memory/test_tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Tests must use test-provider, test-small-001, etc. instead of real vendor names
Markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow
Async: asyncio_mode = "auto" -- no manual @pytest.mark.asyncio needed
Timeout: 30 seconds per test (global in pyproject.toml -- do not add per-file pytest.mark.timeout(30) markers; non-default overrides like timeout(60) are allowed)
Parametrize: Prefer @pytest.mark.parametrize for testing similar cases
Property-based testing with Hypothesis: use @given + @settings, Hypothesis profiles in tests/conftest.py with ci, dev, fuzz, extreme profiles controlled via HYPOTHESIS_PROFILE env var
When Hypothesis finds a failure, it is a real bug -- fix the underlying bug and add an explicit @example(...) decorator to the test
Flaky tests: NEVER skip, dismiss, or ignore -- always fix them fully. For timing-sensitive tests, mock time.monotonic() and asyncio.sleep() to make them deterministic. For tasks that must block indefinitely until cancelled, use asyncio.Event().wait() instead of asyncio.sleep(large_number)

Files:

  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_tool_retriever.py

⚙️ CodeRabbit configuration file

Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare @settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which @given() honors automatically.

Files:

  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_ranking_diversity.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • tests/unit/memory/test_tool_retriever.py
src/synthorg/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/synthorg/**/*.py: Every module with business logic MUST have: from synthorg.observability import get_logger then logger = get_logger(__name__)
Never use import logging / logging.getLogger() / print() in application code (exception: observability/setup.py, observability/sinks.py, observability/syslog_handler.py, and observability/http_handler.py may use stdlib logging and print(..., file=sys.stderr))
Variable name for logger: always logger (not _logger, not log)
Event names: always use constants from the domain-specific module under synthorg.observability.events (imported directly, never as string literals)
Structured kwargs: always logger.info(EVENT, key=value) -- never logger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases

Files:

  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py

⚙️ CodeRabbit configuration file

This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.

Files:

  • src/synthorg/memory/consolidation/models.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • src/synthorg/memory/tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
🧠 Learnings (24)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)

Applied to files:

  • tests/unit/memory/test_retriever_diversity.py
  • tests/unit/memory/test_ranking_diversity.py
  • src/synthorg/memory/ranking.py
  • src/synthorg/memory/retrieval_config.py
  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/tool_retriever.py
  • tests/unit/memory/test_tool_retriever.py
  • src/synthorg/memory/consolidation/llm_strategy.py
  • docs/design/memory.md
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking

Applied to files:

  • tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`

Applied to files:

  • tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins

Applied to files:

  • tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time

Applied to files:

  • src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.

Applied to files:

  • tests/unit/memory/consolidation/test_llm_strategy.py
  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • docs/design/memory.md
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • docs/design/memory.md
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • docs/design/memory.md
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
  • docs/design/memory.md
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines

Applied to files:

  • src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.

Applied to files:

  • docs/design/memory.md
🪛 LanguageTool
docs/design/memory.md

[grammar] ~427-~427: Ensure spelling is correct
Context: ... 3 -- smaller groups cannot meaningfully dedup against the retained entry), `temperatu...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (6)
src/synthorg/memory/consolidation/llm_strategy.py (1)

393-421: Store-first plus summarized-only deletion looks right.

This preserves originals on synthesis/store failures and avoids deleting entries that were dropped by prompt capping.

tests/unit/memory/test_retriever_diversity.py (1)

48-171: Nice deterministic wiring test.

Patching apply_diversity_penalty directly makes the call path and ordering contract observable without depending on MMR tie-break details.

src/synthorg/memory/ranking.py (1)

440-595: The MMR hardening here looks solid.

The finite-value validation plus the -math.inf sentinel closes the NaN and all-negative-score edge cases without changing the function's pure contract.

src/synthorg/memory/retrieval_config.py (1)

134-275: Fail-closed config validation is a good fit here.

The ignored-knob warnings and strategy mismatch errors make these new retrieval flags much harder to miswire.

tests/unit/memory/test_tool_retriever.py (1)

691-730: Nice merge regression coverage.

Parameterizing the collision cases and asserting post-merge relevance order gives the reformulation loop a strong guard against first-round bias.

tests/unit/memory/consolidation/test_llm_strategy.py (1)

271-328: These store/delete ordering regressions are valuable.

The no-delete-before-store cases pin down the main data-preservation invariant cleanly.

## Bug fixes

- ``ConsolidationResult.summary_id`` is now a plain ``@property`` (not
  ``@computed_field``) so it is NOT emitted by ``model_dump()``.  With
  ``extra="forbid"``, a computed-field ``summary_id`` in the dump
  payload would have broken ``model_validate(result.model_dump())``
  round-trips, which is a nasty surprise for any persistence or
  copy-through-JSON code path.  Callers that read ``.summary_id``
  directly continue to work unchanged.
- ``LLMConsolidationStrategy.consolidate`` now builds groups BEFORE
  calling ``_fetch_trajectory_context``, so a below-threshold batch
  short-circuits with zero backend calls.  Previously every call --
  even ones with nothing to consolidate -- hit the distillation
  lookup and could trip its WARNING degradation path on what should
  have been a pure no-op.
- ``_parse_categories`` no longer silently broadens retrieval when
  ``categories`` is present but has the wrong shape (e.g. a bare
  string ``"episodic"`` instead of ``["episodic"]``).  Malformed
  inputs are returned in ``rejected_values`` so ``_handle_search``
  surfaces them back to the LLM for self-correction, and a WARNING
  is logged for operators.
- ``_parse_search_args`` and ``_handle_recall`` reject non-string
  ``query`` / ``memory_id`` values up-front with an ``isinstance``
  guard.  Previously an LLM-hallucinated ``{"query": 123}`` would
  pass the truthy check and crash on ``len(query_text)`` downstream.

## Observability

- All ``except builtins.MemoryError, RecursionError: raise`` branches
  in ``tool_retriever.py`` now log at ERROR with full context
  (``agent_id``, ``tool``, ``round``, query/memory_id metadata) before
  re-raising.  Per CLAUDE.md, "all error paths must log at WARNING or
  ERROR with context before raising" -- the system-error branches
  were skipping this, so fatal OOM/recursion failures disappeared
  from observability at exactly the site where the context was still
  available.  Sites updated:
  - ``_safe_search`` (``_retrieve_with_reformulation`` call)
  - ``_reformulation_loop`` initial retrieve (was unguarded)
  - ``_check_sufficiency``, ``_reformulate``, ``_retrieve_round``
  - ``_handle_recall`` (``backend.get`` call)
- ``ToolBasedInjectionStrategy`` class docstring Args block updated:
  ``reformulator`` and ``sufficiency_checker`` are no longer
  "Optional ... silently no-ops if either is missing"; they are
  required when ``config.query_reformulation_enabled`` is True and
  the constructor raises ``ValueError`` otherwise.  ``Raises:``
  section added to match.

## Test updates

- ``test_multi_category_groups_processed_in_parallel`` now uses an
  ``asyncio.Event``-gated provider barrier that tracks concurrent
  in-flight calls and only releases when the expected concurrency
  level is reached.  A sequential refactor of ``consolidate`` would
  deadlock on the barrier (the second call would never enter), so
  the test actually catches regressions away from ``TaskGroup``.
  Previously the test only checked ``call_count == 2``, which would
  also pass for sequential execution.
- ``test_invalid_lambda_raises`` parametrization extended with
  ``float('nan')``, ``float('inf')``, and ``-float('inf')`` to lock
  down the new ``math.isfinite`` guard in ``apply_diversity_penalty``.

## Status

- ``uv run ruff check src/ tests/`` -- clean
- ``uv run mypy src/ tests/`` -- 1539 source files, no issues
- ``uv run python -m pytest tests/unit/ -n 8`` -- 14160 passed, 5 skipped
@Aureliolo
Copy link
Copy Markdown
Owner Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 5, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Aureliolo Aureliolo merged commit 174e2be into main Apr 5, 2026
34 checks passed
@Aureliolo Aureliolo deleted the feat/memory-consolidation-upgrades branch April 5, 2026 13:57
@Aureliolo Aureliolo temporarily deployed to cloudflare-preview April 5, 2026 13:57 — with GitHub Actions Inactive
Aureliolo added a commit that referenced this pull request Apr 6, 2026
🤖 I have created a release *beep* *boop*
---


##
[0.6.2](v0.6.1...v0.6.2)
(2026-04-06)


### Features

* add issue analyzer script for priority/scope management
([#1084](#1084))
([1ccba27](1ccba27))
* config fixes and deferred improvements from PR
[#1058](#1058) review
([#1067](#1067))
([2cac2d3](2cac2d3)),
closes [#1061](#1061)
[#1060](#1060)
* cumulative risk-unit action budgets
([#806](#806)) and automated
reporting ([#245](#245))
([#1063](#1063))
([4689816](4689816))
* fine-tuning pipeline + CompositeBackend + workflow lifecycle
([#1065](#1065))
([85b05bc](85b05bc)),
closes [#1001](#1001)
[#850](#850)
[#1058](#1058)
* memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity
penalty, distillation capture)
([#1071](#1071))
([174e2be](174e2be)),
closes [#704](#704)
* migrate web dashboard from Radix UI to Base UI, activate CSP nonce,
rebuild org chart page, and fix agent routing
([#1083](#1083))
([ebc6921](ebc6921))
* v0.7.0 engine foundations -- structured failure diagnosis + auditable
decisions ([#1072](#1072))
([d341d37](d341d37))
* workflow templates and versioning with diff and rollback
([#1069](#1069))
([7af94de](7af94de)),
closes [#1006](#1006)
[#1008](#1008)


### Documentation

* unify REST API docs under /docs/openapi/ and patch sitemap
([#1073](#1073))
([af19382](af19382))


### Maintenance

* bump hypothesis from 6.151.10 to 6.151.11 in the all group
([#1086](#1086))
([3176318](3176318))
* bump nginxinc/nginx-unprivileged from `f99cc61` to `601c823` in
/docker/web in the all group
([#1085](#1085))
([5eb99ac](5eb99ac))
* bump the all group in /web with 3 updates
([#1087](#1087))
([8deae44](8deae44))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Aureliolo added a commit that referenced this pull request Apr 6, 2026
## Summary

Memory consolidation follow-ups from PR #1071 review and embedding cost
tracking.

### #1075: Memory Consolidation Follow-ups (18 of 22 items)

**Code polish (D16-D22):**
- `SynthesisOutcome` StrEnum replaces bare `bool` in `_synthesize`
return
- `_select_entries` returns immutable `tuple` instead of `list`
- `MemoryToolName` StrEnum narrows
`DistillationRequest.memory_tool_invocations`
- Defensive `asyncio.CancelledError` re-raise in all `except Exception`
blocks
- Final-round sufficiency check in `_reformulation_loop` with
`MEMORY_REFORMULATION_FINAL_CHECK` event
- Remove misleading `MEMORY_RETRIEVAL_START` log from
`ContextInjectionStrategy.__init__`

**Architectural (A1, A2, A4):**
- `LLMConsolidationConfig` frozen Pydantic model replaces loose
`__init__` kwargs and module constants (9 fields with validation)
- `candidate_pool_multiplier` on `MemoryRetrievalConfig` for MMR
diversity re-ranking (over-fetch, filter+MMR, truncate)
- `scoring_strategy` discriminator field on `ScoredMemory` (LINEAR vs
RRF provenance)

**File splits (B6, B7, B8):**
- Extract `try_capture_distillation` and `try_procedural_memory` into
`engine/post_execution/memory_hooks.py`
- Move `TestHybridSearchPipeline` to `test_retriever_hybrid.py`
- Extract `_compute_pool_limit()` helper

**Test hygiene (C10-C13, C15):**
- `AsyncMock(spec=...)` discipline across all memory test files
- Replace `call_args[0][N]` positional indexing with `.args[N]`
- Consolidate `_make_strategy` fixture with `**config_overrides`
- Promote `_merge_results` and `_bigram_jaccard` to public APIs
- Log assertions on distillation error paths via
`structlog.testing.capture_logs()`

**Verified as already done:** A3 (summary_ids), C14 (parallelism proof)

**Deferred:** A5 (DistillationRequest downstream), C9 (integration
test), D19 (NotBlankStr no-ops)

### #997: Track Embedding Model Costs in Budget System

- Add `EMBEDDING` to `LLMCallCategory` enum
- `EmbeddingCostConfig` with per-model pricing and chars-per-token
heuristic
- Instrument `Mem0MemoryBackend.store()` and `retrieve()` to record
`CostRecord` with `call_category=EMBEDDING`
- Extend `CategoryBreakdown` with embedding cost/tokens/count fields
- Dashboard: embedding category in types, utils, `CategoryBreakdown`
component with `bg-accent-dim` design token
- Budget enforcement: record-only (no gating -- embedding is on the
critical path)

## Test Plan

- 14,479 Python tests pass (unit + integration), 9 expected skips
- 2,405 web dashboard tests pass
- mypy strict: 0 errors across 1,558 source files
- ruff + ESLint: 0 warnings

## Review Coverage

Pre-reviewed by 5 agents (code-reviewer, type-design-analyzer,
silent-failure-hunter, issue-resolution-verifier, conventions-enforcer).
13 findings addressed in final fix commit.

Closes #1075
Closes #997
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty)

2 participants