Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📜 Recent review details⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🧰 Additional context used📓 Path-based instructions (4)**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
tests/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
⚙️ CodeRabbit configuration file
Files:
src/synthorg/**/*.py📄 CodeRabbit inference engine (CLAUDE.md)
Files:
src/**/*.py⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (31)📓 Common learnings📚 Learning: 2026-03-19T07:12:14.508ZApplied to files:
📚 Learning: 2026-04-05T07:25:14.510ZApplied to files:
📚 Learning: 2026-03-17T22:08:13.456ZApplied to files:
📚 Learning: 2026-03-15T19:14:27.144ZApplied to files:
📚 Learning: 2026-03-20T21:44:04.528ZApplied to files:
📚 Learning: 2026-03-17T11:41:02.964ZApplied to files:
📚 Learning: 2026-03-26T15:18:16.848ZApplied to files:
📚 Learning: 2026-03-15T19:14:27.144ZApplied to files:
📚 Learning: 2026-03-15T19:14:27.144ZApplied to files:
📚 Learning: 2026-03-15T19:14:27.144ZApplied to files:
📚 Learning: 2026-03-15T18:42:17.990ZApplied to files:
📚 Learning: 2026-04-01T15:36:39.993ZApplied to files:
📚 Learning: 2026-03-17T22:08:13.456ZApplied to files:
📚 Learning: 2026-04-01T09:39:21.584ZApplied to files:
📚 Learning: 2026-04-05T07:25:14.510ZApplied to files:
📚 Learning: 2026-03-17T18:52:05.142ZApplied to files:
📚 Learning: 2026-04-01T15:36:39.993ZApplied to files:
📚 Learning: 2026-03-15T16:55:07.730ZApplied to files:
📚 Learning: 2026-03-15T18:38:44.202ZApplied to files:
📚 Learning: 2026-03-31T20:07:03.035ZApplied to files:
📚 Learning: 2026-03-16T07:22:28.134ZApplied to files:
📚 Learning: 2026-03-17T22:08:13.456ZApplied to files:
📚 Learning: 2026-03-14T16:18:57.267ZApplied to files:
📚 Learning: 2026-03-17T06:43:14.114ZApplied to files:
📚 Learning: 2026-03-16T19:13:34.746ZApplied to files:
📚 Learning: 2026-04-01T17:49:14.133ZApplied to files:
📚 Learning: 2026-03-20T11:18:48.128ZApplied to files:
📚 Learning: 2026-03-20T08:28:32.845ZApplied to files:
📚 Learning: 2026-04-01T06:12:03.047ZApplied to files:
📚 Learning: 2026-03-16T07:22:28.134ZApplied to files:
🔇 Additional comments (27)
WalkthroughAdds MMR-based diversity re-ranking to the ContextInjection retrieval path with new Suggested labels
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches. Re-running this action after a short time may resolve the issue. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
There was a problem hiding this comment.
Code Review
This pull request introduces several advanced memory management features, including an LLM-based consolidation strategy that synthesizes related memories using task 'distillation' context, an iterative 'Search-and-Ask' loop for tool-based retrieval, and MMR-based diversity re-ranking to reduce redundancy. It also implements a 'fail-closed' policy for memory filters to enhance privacy. Review feedback highlights critical improvements: reordering operations in the LLM consolidation strategy to prevent data loss, applying memory filters before diversity re-ranking for efficiency, ensuring tool-based search respects result limits after iterative rounds, and capping total input size for LLM synthesis to avoid context window overflows.
| removed_ids: list[NotBlankStr] = [] | ||
| for entry in to_remove: | ||
| try: | ||
| await self._backend.delete(agent_id, entry.id) | ||
| except MemoryError, RecursionError: | ||
| raise | ||
| except Exception as exc: | ||
| logger.warning( | ||
| LLM_STRATEGY_FALLBACK, | ||
| agent_id=agent_id, | ||
| category=category.value, | ||
| entry_id=entry.id, | ||
| reason="delete_failed", | ||
| error=str(exc), | ||
| error_type=type(exc).__name__, | ||
| exc_info=True, | ||
| ) | ||
| continue | ||
| removed_ids.append(entry.id) | ||
|
|
||
| synthesized, used_llm = await self._synthesize( | ||
| to_remove, | ||
| agent_id=agent_id, | ||
| category=category, | ||
| trajectory_context=trajectory_context, | ||
| ) | ||
| tag = _LLM_SYNTHESIZED_TAG if used_llm else _CONCAT_FALLBACK_TAG | ||
| store_request = MemoryStoreRequest( | ||
| category=category, | ||
| content=synthesized, | ||
| metadata=MemoryMetadata( | ||
| source="consolidation", | ||
| tags=("consolidated", tag), | ||
| ), | ||
| ) | ||
| new_id = await self._backend.store(agent_id, store_request) |
There was a problem hiding this comment.
The current implementation deletes original memory entries before performing LLM synthesis and storing the resulting summary. If the LLM call fails (e.g., due to a non-retryable ProviderError) or if the backend storage of the summary fails, the original information is permanently lost.
It is safer to perform synthesis and store the summary first, and only then proceed with deleting the original entries. This ensures that even if deletion fails (partially or fully), the information is preserved in either the original or summarized form. Since the strategy already handles partial delete failures gracefully, moving deletion to the end of the method would eliminate the risk of data loss on LLM/storage errors.
src/synthorg/memory/retriever.py
Outdated
| if self._config.diversity_penalty_enabled: | ||
| ranked = apply_diversity_penalty( | ||
| ranked, | ||
| diversity_lambda=self._config.diversity_lambda, | ||
| ) | ||
|
|
||
| if self._memory_filter is not None: |
There was a problem hiding this comment.
Diversity re-ranking (MMR) is currently applied before the memory filter. It is more efficient and semantically correct to apply the filter first.
-
Efficiency: Filtering first reduces the number of entries passed to the
$O(N^2)$ MMR algorithm. - Correctness: Diversity should be optimized among the set of memories that are actually eligible for injection (e.g., non-inferable ones). Re-ranking before filtering may result in a less diverse final set if the filter removes the most diverse candidates that were prioritized by MMR.
| result_count=len(entries), | ||
| reason="max_rounds_reached", | ||
| ) | ||
| return entries |
There was a problem hiding this comment.
The Search-and-Ask loop merges results from multiple rounds but does not truncate the final list to the requested limit. If an agent requests a limit of 10, but reformulation rounds find additional unique entries, the tool response will exceed the requested limit. The final result should be truncated to limit after merging to respect the tool's contract.
return entries[:limit]| user_content = "\n---\n".join( | ||
| f"[{e.category.value}] {e.content[:_MAX_ENTRY_INPUT_CHARS]}" | ||
| for e in entries | ||
| ) |
There was a problem hiding this comment.
While individual entries are truncated to _MAX_ENTRY_INPUT_CHARS, there is no limit on the total length of user_content sent to the LLM. If a large number of entries are consolidated at once (e.g., a group of 50+ entries), the combined prompt could exceed the LLM's context window or result in extremely high costs. Consider adding a cap on the total number of entries processed per group or a total character count limit for the synthesis input.
There was a problem hiding this comment.
Pull request overview
Implements issue #704 by expanding the memory system with LLM-driven consolidation, distillation capture, iterative Search-and-Ask retrieval, and MMR-based diversity re-ranking, plus resilience/privacy hardening and corresponding test coverage.
Changes:
- Added
LLMConsolidationStrategywith per-category parallel processing and optional trajectory (distillation) context injection. - Added distillation capture at task completion (opt-in via
AgentEngine) and wiring to store trajectory/outcome/tool-invocation summaries. - Added Search-and-Ask query reformulation loop for tool-based retrieval and MMR diversity re-ranking for context-based retrieval, with config validation + expanded unit tests.
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/memory/test_tool_retriever.py | Adds tests for Search-and-Ask reformulation loop, invalid category surfacing, and result merging. |
| tests/unit/memory/test_retriever.py | Updates filter failure behavior test to verify fail-closed privacy posture. |
| tests/unit/memory/test_retrieval_config.py | Adds config validation tests for diversity params and reformulation strategy constraints/warnings. |
| tests/unit/memory/test_ranking.py | Notes test split for diversity ranking to keep file size manageable. |
| tests/unit/memory/test_ranking_diversity.py | New tests for bigram Jaccard similarity + MMR diversity re-ranking behavior and edge cases. |
| tests/unit/memory/test_memory_tools.py | Updates error-prefix matching tests for tool error classification. |
| tests/unit/memory/consolidation/test_llm_strategy.py | New test suite covering LLM consolidation behavior, fallbacks, deletes, and distillation context wiring. |
| tests/unit/memory/consolidation/test_distillation.py | New tests for distillation request model + capture logic and stored request shape. |
| tests/unit/engine/test_agent_engine_distillation.py | New integration tests ensuring distillation capture is opt-in and behaves on success/error/no-backend. |
| src/synthorg/observability/events/memory.py | Adds events for reformulation rounds/sufficiency and diversity re-ranking. |
| src/synthorg/observability/events/consolidation.py | Adds events for LLM consolidation outcomes and distillation capture. |
| src/synthorg/memory/tool_retriever.py | Implements Search-and-Ask loop, category rejection surfacing, result de-duping, and standardized error prefixes. |
| src/synthorg/memory/retriever.py | Adds MMR diversity re-ranking hook and changes filter failures to fail closed (return empty). |
| src/synthorg/memory/retrieval_config.py | Adds diversity settings + validators and enforces reformulation only for TOOL_BASED strategy. |
| src/synthorg/memory/ranking.py | Adds bigram similarity + MMR-based apply_diversity_penalty implementation. |
| src/synthorg/memory/consolidation/llm_strategy.py | Introduces LLM-based consolidation strategy with distillation-context prompt injection and fallbacks. |
| src/synthorg/memory/consolidation/distillation.py | Introduces distillation capture utilities and storage of trajectory/outcome/tool-invocation summaries. |
| src/synthorg/memory/consolidation/init.py | Exports new consolidation/distillation components from the package. |
| src/synthorg/engine/agent_engine.py | Adds distillation_capture_enabled opt-in and post-execution distillation capture hook. |
| docs/design/memory.md | Documents new consolidation strategies, distillation capture, diversity re-ranking, and Search-and-Ask behavior. |
| CLAUDE.md | Updates memory package structure notes and logging event guidance/examples. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Deletes originals BEFORE storing the summary so a mid-delete | ||
| failure aborts before creating an orphan summary. Individual | ||
| delete failures are tolerated best-effort: the loop continues, | ||
| logs the failure, and only successfully-deleted entry IDs are | ||
| returned in ``removed_ids``. |
There was a problem hiding this comment.
In _process_group, originals are deleted before the synthesized summary is stored. If backend.store() fails (transient DB/network issue, validation, etc.), the consolidation will permanently delete memories and then raise, leaving no summary and causing data loss. Consider storing the summary first (or storing a “pending” summary) and only deleting originals after a successful store; if deletes fail, log and keep originals (duplication is safer than loss).
| total_tokens = sum(t.total_tokens for t in turns) | ||
| tool_call_count = sum(len(turn.tool_calls_made) for turn in turns) | ||
| unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made}) | ||
|
|
||
| parts = [f"{len(turns)} turns, {total_tokens} tokens"] | ||
| if unique_tools: | ||
| parts.append(f"tools: {', '.join(unique_tools)}") | ||
| if tool_call_count: | ||
| parts.append(f"{tool_call_count} tool calls total") |
There was a problem hiding this comment.
build_trajectory_summary() always renders "{n} turns" (e.g. "1 turns"), which is grammatically incorrect and becomes user-visible content stored in distillation memories. Consider using singular/plural rendering ("1 turn" vs "N turns") for the turns and tool-call counts.
| total_tokens = sum(t.total_tokens for t in turns) | |
| tool_call_count = sum(len(turn.tool_calls_made) for turn in turns) | |
| unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made}) | |
| parts = [f"{len(turns)} turns, {total_tokens} tokens"] | |
| if unique_tools: | |
| parts.append(f"tools: {', '.join(unique_tools)}") | |
| if tool_call_count: | |
| parts.append(f"{tool_call_count} tool calls total") | |
| turn_count = len(turns) | |
| total_tokens = sum(t.total_tokens for t in turns) | |
| tool_call_count = sum(len(turn.tool_calls_made) for turn in turns) | |
| unique_tools = sorted({tool for turn in turns for tool in turn.tool_calls_made}) | |
| turn_label = "turn" if turn_count == 1 else "turns" | |
| tool_call_label = "tool call" if tool_call_count == 1 else "tool calls" | |
| parts = [f"{turn_count} {turn_label}, {total_tokens} tokens"] | |
| if unique_tools: | |
| parts.append(f"tools: {', '.join(unique_tools)}") | |
| if tool_call_count: | |
| parts.append(f"{tool_call_count} {tool_call_label} total") |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1071 +/- ##
==========================================
+ Coverage 89.59% 89.62% +0.02%
==========================================
Files 742 744 +2
Lines 43408 43625 +217
Branches 4359 4379 +20
==========================================
+ Hits 38892 39099 +207
- Misses 3745 3751 +6
- Partials 771 775 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/synthorg/memory/retriever.py (1)
269-328:⚠️ Potential issue | 🟠 MajorMove diversity reranking before the final cap and after filtering.
MemoryQuery.limitis alreadymax_memorieson Line 272, and both ranking paths trim tomax_memoriesbefore control reaches this block. In the linear path that makesapply_diversity_penalty()reorder-only—it can never replace a redundant top-K item with a more diverse candidate below the cutoff. Because the filter runs afterwards, filtered-out entries can also act as MMR anchors and skew the final in-budget set. Fetch a larger candidate pool, apply the filter, then run diversity before the final truncation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/retriever.py` around lines 269 - 328, The current flow limits MemoryQuery to self._config.max_memories before ranking, so apply_diversity_penalty (and RRF vs linear pipelines: _execute_rrf_pipeline, _execute_linear_pipeline) can only reorder within the already-truncated top-K and filtered-out items still influence results; instead request a larger candidate pool (e.g. set MemoryQuery.limit to a higher candidate_limit — either a new config like candidate_pool_size or compute candidate_limit = self._config.max_memories * N when building query), run the chosen pipeline (_execute_rrf_pipeline/_execute_linear_pipeline) to get the candidate set, then run the filter via self._memory_filter.filter_for_injection on that candidate set, then apply_diversity_penalty(ranked, diversity_lambda=self._config.diversity_lambda) if self._config.diversity_penalty_enabled, and only after that truncate the final list back to self._config.max_memories before returning; update related logging/exception handling around self._memory_filter accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/memory/consolidation/distillation.py`:
- Around line 187-210: The EPISODIC store currently only saves
outcome/trajectory/count; update the store payload to persist the full
DistillationRequest (including task_id and memory_tool_invocations) by
serializing the DistillationRequest instance and using that serialized string as
MemoryStoreRequest.content before calling backend.store; prefer adding a
to_dict()/serialize() method on DistillationRequest (or reusing one if present)
and set MemoryMetadata(tags=(DISTILLATION_TAG,)) as before so downstream readers
can correlate by task_id and distinguish search_memory vs recall_memory and keep
capture_distillation() under the 50-line limit by moving serialization into
DistillationRequest.
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 297-336: The code currently deletes selected entries via
self._backend.delete (in the _select_entries/for entry loop) before calling
self._backend.store, risking unrecoverable data loss if store fails; change the
sequence to synthesize and call self._backend.store first (using the same
MemoryStoreRequest, tag logic with _LLM_SYNTHESIZED_TAG/_CONCAT_FALLBACK_TAG),
capture the new_id, then attempt to delete the original entries with
self._backend.delete and populate removed_ids only for successful deletes; on
delete exceptions log the same LLM_STRATEGY_FALLBACK info
(agent_id/category/entry_id/reason/error/error_type) and continue so store
remains intact if deletes fail.
In `@src/synthorg/memory/ranking.py`:
- Around line 472-480: The diversity_lambda validation currently emits
MEMORY_DIVERSITY_RERANKED on error; change it to a distinct validation-failure
event (e.g., MEMORY_DIVERSITY_VALIDATION_FAILED) to match the pattern used by
MEMORY_RRF_VALIDATION_FAILED in fuse_ranked_lists: update the logger.warning
call in the diversity_lambda check (the block that constructs msg and calls
logger.warning with param="diversity_lambda", value=diversity_lambda,
reason=msg) to use the new event constant, add that constant (named
MEMORY_DIVERSITY_VALIDATION_FAILED) alongside the other memory event constants,
and keep the existing fields and subsequent raise ValueError unchanged so logs
and alerts can distinguish validation failures from successful reranks.
In `@tests/unit/engine/test_agent_engine_distillation.py`:
- Around line 101-106: Current tests call the private
engine._post_execution_pipeline(...) directly which bypasses the full flow; add
at least one integration-style test that calls the public engine.run(...)
instead, mocking/stubbing the execution loop and any external dependencies
(e.g., task execution, identity, distillation storage) so run() executes
deterministically, then assert the same end-to-end outcomes currently verified
for _post_execution_pipeline; apply the same change to the other similar tests
(around the other ranges) so one or more tests exercise run() path while keeping
the existing unit-level calls for fast iteration.
In `@tests/unit/memory/consolidation/test_distillation.py`:
- Around line 299-303: The test function
test_error_without_message_falls_through is declared async but only calls the
synchronous helper build_outcome and does not await anything; make it a regular
(non-async) test function by removing the async keyword so the test runner
treats it as a normal synchronous test and avoids warnings or misbehavior.
---
Outside diff comments:
In `@src/synthorg/memory/retriever.py`:
- Around line 269-328: The current flow limits MemoryQuery to
self._config.max_memories before ranking, so apply_diversity_penalty (and RRF vs
linear pipelines: _execute_rrf_pipeline, _execute_linear_pipeline) can only
reorder within the already-truncated top-K and filtered-out items still
influence results; instead request a larger candidate pool (e.g. set
MemoryQuery.limit to a higher candidate_limit — either a new config like
candidate_pool_size or compute candidate_limit = self._config.max_memories * N
when building query), run the chosen pipeline
(_execute_rrf_pipeline/_execute_linear_pipeline) to get the candidate set, then
run the filter via self._memory_filter.filter_for_injection on that candidate
set, then apply_diversity_penalty(ranked,
diversity_lambda=self._config.diversity_lambda) if
self._config.diversity_penalty_enabled, and only after that truncate the final
list back to self._config.max_memories before returning; update related
logging/exception handling around self._memory_filter accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: d5714550-a1d8-44e0-a655-57c45be51a70
📒 Files selected for processing (21)
CLAUDE.mddocs/design/memory.mdsrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/retriever.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/observability/events/memory.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_memory_tools.pytests/unit/memory/test_ranking.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_retriever.pytests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Agent
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Test (Python 3.14)
- GitHub Check: Analyze (python)
- GitHub Check: Dependency Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python -- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (viamodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions:allow_inf_nan=Falsein all ConfigDict declarations to reject NaN/Inf in numeric fields; use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/test_retriever.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/observability/events/memory.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/consolidation.pytests/unit/memory/test_memory_tools.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_ranking_diversity.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_retrieval_config.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/ranking.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowfor test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases
Test vendor names must usetest-provider,test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with@given+@settings. Hypothesis profiles configured intests/conftest.py:ci(deterministic,max_examples=10+derandomize=True),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline). Control viaHYPOTHESIS_PROFILEenv var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit@example(...)decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/test_retriever.pytests/unit/memory/test_memory_tools.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_ranking_diversity.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/test_retriever.pytests/unit/memory/test_memory_tools.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_ranking_diversity.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Always useloggeras the variable name (not_logger, notlog)
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api,TOOL_INVOKE_STARTfromevents.tool). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging withlogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/memory/consolidation/__init__.pysrc/synthorg/observability/events/memory.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/consolidation/__init__.pysrc/synthorg/observability/events/memory.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation is in
docs/(Markdown, built with Zensical, config:mkdocs.yml). Design spec indocs/design/(12 pages). Architecture indocs/architecture/. Roadmap indocs/roadmap/. Security indocs/security.md. Licensing indocs/licensing.md.
Files:
docs/design/memory.md
🧠 Learnings (30)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
src/synthorg/memory/consolidation/__init__.pysrc/synthorg/observability/events/memory.pysrc/synthorg/memory/retriever.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_ranking_diversity.pyCLAUDE.mdsrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/ranking.pydocs/design/memory.mdsrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Use event constants from `synthorg.observability.events.<domain>` (e.g., `API_REQUEST_STARTED` from `events.api`); import directly and log with structured kwargs: `logger.info(EVENT, key=value)`, never interpolated strings
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
tests/unit/engine/test_agent_engine_distillation.pyCLAUDE.mdsrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
tests/unit/engine/test_agent_engine_distillation.pyCLAUDE.mdsrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
🔇 Additional comments (41)
src/synthorg/memory/ranking.py (3)
498-534: LGTM!The MMR re-ranking implementation is well-structured: precomputes bigrams once to avoid O(n²·k) re-tokenization, uses
-math.infas the sentinel (fixing the issue noted in PR objectives), and correctly tracks positions in the shrinkingremaining_indiceslist. The algorithm matches the MMR formula documented in the docstring.
549-585: LGTM!The generic MMR implementation correctly mirrors the bigram-cached version while supporting arbitrary similarity functions. The structure is clean and maintains the same MMR semantics.
401-436: LGTM!The bigram utilities handle edge cases correctly (empty strings, single words) and implement Jaccard similarity accurately. Case-insensitivity via
.lower()is appropriate for text similarity.tests/unit/memory/test_tool_retriever.py (4)
349-378: LGTM!The test correctly verifies that reformulation is disabled by default - even with reformulator and sufficiency checker configured, the loop doesn't engage when
query_reformulation_enabledis not set. Good assertion on call counts.
427-457: LGTM!Excellent test of the core reformulation flow. The side_effect setup correctly simulates the "insufficient → reformulate → sufficient" path, and assertions verify both cross-round result merging and the expected call counts.
459-492: LGTM!The test correctly validates the max rounds boundary: with
max_reformulation_rounds=2and perpetually insufficient results, the loop performs 3 retrieves (initial + 2 reformulated) and exits after exhausting the round budget.
664-705: LGTM!Thorough unit tests for
_merge_resultscovering empty inputs, disjoint concatenation, relevance-based collision resolution, andNonerelevance handling. The bidirectional collision tests (lines 681-697) verify the function is order-independent.tests/unit/memory/test_retrieval_config.py (3)
218-279: LGTM!Comprehensive tests for diversity penalty configuration: default values, boundary conditions (0.0, 1.0), rejection of invalid values (<0, >1, NaN), and appropriate warning logs when
diversity_lambdais set while diversity penalty is disabled. Thecapture_logspattern correctly validates observability behavior.
291-305: LGTM!Tests correctly validate the strategy-conditional reformulation behavior: enabling reformulation with
CONTEXTstrategy raisesValueError, whileTOOL_BASEDsucceeds.
364-396: LGTM!Tests correctly validate strategy consistency: diversity penalty with
TOOL_BASEDemits a warning (since it only applies toContextInjectionStrategy), whileCONTEXTstrategy produces no warning.src/synthorg/engine/agent_engine.py (3)
707-711: LGTM!Distillation capture is correctly wired into the post-execution pipeline after procedural memory and before returning the result. The placement ensures trajectory context is captured for all termination reasons.
264-265: LGTM!The
distillation_capture_enabledparameter follows the established opt-in pattern with a safeFalsedefault, ensuring backward compatibility as confirmed by the relevant code snippet showing existing instantiations without this parameter.Also applies to: 335-335
714-743: No action needed.The error handling in
capture_distillationcorrectly implements the documented behavior: non-system exceptions are caught at line 220 and swallowed without re-raising (line 229), while system errors (MemoryError,RecursionError) are explicitly re-raised at line 219. The docstring in_try_capture_distillationaccurately reflects this implementation, and the direct await is appropriate because error handling already occurs internally withincapture_distillation.> Likely an incorrect or invalid review comment.CLAUDE.md (2)
101-101: LGTM!Documentation accurately reflects the new memory package capabilities: MMR diversity re-ranking, iterative Search-and-Ask reformulation,
LLMConsolidationStrategywith parallel processing and trajectory context injection, andDistillationRequestcapture.
148-148: LGTM!Event name documentation updated to include the new consolidation events (
LLM_STRATEGY_SYNTHESIZED,DISTILLATION_CAPTURED) and memory events (MEMORY_DIVERSITY_RERANKED,MEMORY_REFORMULATION_ROUND), maintaining the structured logging inventory.tests/unit/engine/test_agent_engine_distillation.py (2)
85-139: LGTM!Tests comprehensively validate distillation capture: successful completion stores with "distillation" tag/source and "Task completed" content, while error termination captures "Task failed" with the error message. Good coverage of both success and failure paths.
141-181: LGTM!Tests correctly validate opt-in semantics: default disabled (store not called), and enabled without backend completes without raising. These edge cases ensure the feature degrades gracefully.
tests/unit/memory/test_ranking_diversity.py (3)
57-84: LGTM!Excellent edge case coverage for
_bigram_jaccard: identical strings (1.0), completely different (0.0), empty strings (0.0), single words (0.0 - no bigrams possible), partial overlap (1/3), and case insensitivity. The partial overlap test with explicit set calculation in comments aids readability.
236-256: LGTM!Critical test validating the MMR sentinel fix from PR objectives. With
diversity_lambda=0.0and identical content (all MMR scores are -1.0), using-math.infas the sentinel ensures deterministic selection rather than the loop jamming. This directly tests the hardening fix mentioned in the commit messages.
210-234: LGTM!This test validates the core value proposition of diversity re-ranking: with moderate lambda (0.3), the "diverse" entry (0.85 relevance, unrelated content) beats the "redundant" entry (0.9 relevance, similar content to base), demonstrating coverage promotion over pure relevance ordering.
tests/unit/memory/consolidation/test_llm_strategy.py (5)
80-89: LGTM!Tests correctly validate constructor constraints:
group_threshold < 2raisesValueError, whilegroup_threshold=2(the minimum) is accepted. This matches the documented requirement for meaningful consolidation (at least 2 entries to merge).
217-378: LGTM!Comprehensive error handling tests: retryable errors (RateLimitError) and unexpected errors (RuntimeError) fall back to concatenation, non-retryable errors (AuthenticationError) and system errors (MemoryError, RecursionError) propagate. Empty and whitespace-only LLM responses also trigger fallback. This matches the resilience patterns documented in CLAUDE.md.
560-648: LGTM!Thorough tests for trajectory context integration: enabled path verifies distillation entries are fetched and injected into the LLM prompt, disabled path skips lookup entirely, and lookup failures degrade gracefully without blocking consolidation. This validates the key integration between distillation capture and LLM consolidation.
526-558: LGTM!Critical edge case test: partial delete failures (1 of 3 fails) result in
consolidated_count=2(only successful deletions counted), and summary storage proceeds despite the failure. This validates the graceful degradation documented in PR objectives ("partial-delete tolerance").
649-692: 🧹 Nitpick | 🔵 TrivialConsider adding explicit parallelism verification.
The test validates multi-category consolidation outcomes (correct call counts and consolidated_count), but doesn't directly verify that groups are processed in parallel via
TaskGroupas documented. Consider adding timing assertions or usingasyncio.Eventcoordination to confirm concurrent execution, though the current functional assertions are sufficient for correctness validation.⛔ Skipped due to learnings
Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T22:08:13.456Z Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-04-01T09:39:21.584Z Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-04-04T19:50:52.893Z Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.Learnt from: CR Repo: Aureliolo/synthorg PR: 0 File: CLAUDE.md:0-0 Timestamp: 2026-03-17T18:52:05.142Z Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.src/synthorg/memory/retrieval_config.py (3)
116-131: LGTM! Well-designed diversity configuration fields.The
diversity_penalty_enabledanddiversity_lambdafields follow established patterns: frozen model withallow_inf_nan=False, appropriate bounds (ge=0.0, le=1.0), and clear descriptions matching the MMR formula semantics (1.0 = pure relevance, 0.0 = max diversity).
192-229: LGTM! Validator logic is sound and user-friendly.Both diversity validators correctly:
- Use
model_fields_setto avoid warning whendiversity_lambdawas never explicitly set (Line 198)- Log warnings with full context (
field,value,strategy,reason) before returning- Only warn (not raise) since these are configuration mismatches, not hard errors
The strategy consistency check at Line 216 correctly identifies that
TOOL_BASEDstrategy doesn't apply the MMR re-ranking step.
231-249: LGTM! Validator properly enforces tool-based requirement.The validator correctly:
- Returns early when reformulation is disabled (Line 234-235)
- Allows
TOOL_BASEDstrategy (Line 236-237)- Raises
ValueErrorwith a clear message including the actual strategy value for other strategies- Logs before raising per project conventions
This aligns with the documented behavior that Search-and-Ask reformulation is only wired into
ToolBasedInjectionStrategy.docs/design/memory.md (3)
415-459: Excellent documentation of the consolidation subsystem.The documentation clearly explains:
- Three strategy implementations with distinct behaviors
LLMConsolidationStrategyparallel processing viaasyncio.TaskGroup- Distillation capture lifecycle and field sources (Line 439-445)
- The
memory_tool_invocationsclarification (tool names, not entry IDs) prevents future confusion- How distillation entries flow into consolidation via tag queries
741-808: Well-documented MMR diversity re-ranking.The documentation accurately describes:
- MMR formula:
lambda * combined_score - (1 - lambda) * max_sim_to_selected- Trade-off semantics: 1.0 = pure relevance, 0.0 = max diversity
- Bigram Jaccard as default similarity function
O(n**2)complexity note with precomputation explanation- Warning about combining with strategies that ignore it
Minor note: The documentation correctly warns users via
model_validator(Line 806-808), matching the implementation.
840-848: Clear Search-and-Ask loop documentation.Documents the iterative reformulation behavior:
- Requires both
QueryReformulatorandSufficiencyChecker- Merges results by entry ID (higher relevance wins)
- Graceful degradation on failures
- Explicitly notes MMR doesn't apply in tool-based path
This matches the implementation in
tool_retriever.py.src/synthorg/memory/tool_retriever.py (4)
164-199: LGTM! Correct merge-by-ID logic with stable ordering.The
_merge_resultsfunction correctly:
- Preserves insertion order via separate
orderlist- Handles
Nonerelevance as0.0(Lines 191-192)- Keeps higher-relevance entry on duplicates (Line 193-194)
- Returns stable tuple maintaining "first seen" order
The implementation matches the documented behavior in
docs/design/memory.md.
459-500: LGTM! Defensive guard with graceful fallback.The method correctly implements fail-safe behavior:
- Checks both
query_reformulation_enabledflag AND presence of both collaborators (Lines 481-485)- Falls back to simple retrieval when any condition is unmet
- This matches the documented "silently no-ops if either is missing" contract in
retrieval_config.pyThis defensive pattern prevents runtime errors even if the config validator is bypassed.
502-627: Well-structured reformulation loop with proper error isolation.The implementation correctly:
- Propagates system errors (
builtins.MemoryError,RecursionError) at Lines 546-547, 573-574- Isolates non-system exceptions and degrades gracefully (Lines 548-557, 575-584)
- Logs with full context (
agent_id,round,error,error_type,exc_info)- Handles reformulator returning
Noneor unchanged query (Lines 586-598)- Merges results across rounds (Line 617)
The loop body is lengthy but cohesive—it handles a single concern (the Search-and-Ask protocol) with clear phase transitions. Splitting would reduce readability without improving maintainability.
121-161: LGTM! Improved category parsing with operator visibility.The updated
_parse_categories:
- Returns rejected values for caller surfacing (Line 141)
- Logs invalid categories at WARNING with
agent_idcontext (Lines 153-159)- Enables the LLM to self-correct via the feedback in Line 453-456
Good observability improvement.
tests/unit/memory/consolidation/test_distillation.py (2)
68-115: LGTM! Comprehensive model validation tests.Tests cover all key
DistillationRequestbehaviors:
- Field population including
memory_tool_invocations- Frozen model mutation rejection via
ValidationError- Default value for
memory_tool_invocationsNotBlankStrvalidation rejecting whitespaceagent_id
208-297: Excellent error propagation test coverage.The async tests correctly verify:
- Successful capture returns populated
DistillationRequest(Lines 210-232)MemoryStoreErrorreturnsNone(Lines 234-249)MemoryErrorpropagates (Lines 251-263)RecursionErrorpropagates (Lines 285-297)This matches the documented "non-critical" behavior for domain errors and "propagate" behavior for system errors.
src/synthorg/memory/consolidation/llm_strategy.py (4)
193-218: LGTM! Proper TaskGroup exception unwrapping.The
except*blocks correctly unwrapExceptionGroupto surface the original exception type, maintaining consistent error handling semantics whether processing is parallel or sequential. The unwrapping ofMemoryError,RecursionError, andProviderErrorensures callers see expected exception types.
382-463: LGTM! Comprehensive fallback handling in _synthesize.The error handling correctly:
- Propagates system errors (
MemoryError,RecursionError) at Lines 404-413- Falls back on
RetryExhaustedError(Lines 414-428)- Falls back on retryable
ProviderErrorfor edge cases (Lines 429-445)- Propagates non-retryable
ProviderError(Lines 446-449)- Falls back on unexpected exceptions with WARNING log and traceback (Lines 450-462)
This matches the documented "falls back to truncated concatenation" behavior.
241-269: LGTM! Graceful degradation for trajectory context lookup.The method correctly:
- Short-circuits when
include_distillation_contextis False (Lines 251-252)- Propagates system errors (Line 259-260)
- Degrades silently on other failures, returning empty context (Lines 261-269)
This ensures consolidation proceeds even if distillation data is unavailable.
516-539: LGTM! Correct best-entry selection logic.The
_select_entriesmethod correctly:
- Treats
Nonerelevance as0.0(Line 535)- Uses
created_atas tiebreaker (Line 536)- Returns the best entry and remaining entries for removal (Line 539)
The tuple comparison
(relevance, created_at)provides stable deterministic ordering.
| request = DistillationRequest( | ||
| agent_id=agent_id, | ||
| task_id=task_id, | ||
| trajectory_summary=trajectory, | ||
| outcome=outcome, | ||
| memory_tool_invocations=tool_invocations, | ||
| created_at=datetime.now(UTC), | ||
| ) | ||
|
|
||
| store_content = ( | ||
| f"Distillation: {outcome}\n" | ||
| f"Trajectory: {trajectory}\n" | ||
| f"Memory lookups: {len(tool_invocations)}" | ||
| ) | ||
|
|
||
| store_request = MemoryStoreRequest( | ||
| category=MemoryCategory.EPISODIC, | ||
| content=store_content, | ||
| metadata=MemoryMetadata( | ||
| source="distillation", | ||
| tags=(DISTILLATION_TAG,), | ||
| ), | ||
| ) | ||
| await backend.store(agent_id, store_request) |
There was a problem hiding this comment.
Persist the full distillation payload.
The stored EPISODIC entry only keeps outcome, trajectory, and a lookup count. task_id and the concrete memory_tool_invocations never reach the backend, so downstream readers cannot correlate entries by task or tell whether the agent used search_memory vs recall_memory. This is also a good seam to move serialization onto DistillationRequest, which would bring capture_distillation() back under the repo’s 50-line limit.
🛠️ Suggested direction
- store_content = (
- f"Distillation: {outcome}\n"
- f"Trajectory: {trajectory}\n"
- f"Memory lookups: {len(tool_invocations)}"
- )
+ invocations_text = ", ".join(tool_invocations) if tool_invocations else "none"
+ store_content = (
+ f"Task ID: {task_id}\n"
+ f"Outcome: {outcome}\n"
+ f"Trajectory: {trajectory}\n"
+ f"Memory tool invocations: {invocations_text}"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| request = DistillationRequest( | |
| agent_id=agent_id, | |
| task_id=task_id, | |
| trajectory_summary=trajectory, | |
| outcome=outcome, | |
| memory_tool_invocations=tool_invocations, | |
| created_at=datetime.now(UTC), | |
| ) | |
| store_content = ( | |
| f"Distillation: {outcome}\n" | |
| f"Trajectory: {trajectory}\n" | |
| f"Memory lookups: {len(tool_invocations)}" | |
| ) | |
| store_request = MemoryStoreRequest( | |
| category=MemoryCategory.EPISODIC, | |
| content=store_content, | |
| metadata=MemoryMetadata( | |
| source="distillation", | |
| tags=(DISTILLATION_TAG,), | |
| ), | |
| ) | |
| await backend.store(agent_id, store_request) | |
| request = DistillationRequest( | |
| agent_id=agent_id, | |
| task_id=task_id, | |
| trajectory_summary=trajectory, | |
| outcome=outcome, | |
| memory_tool_invocations=tool_invocations, | |
| created_at=datetime.now(UTC), | |
| ) | |
| invocations_text = ", ".join(tool_invocations) if tool_invocations else "none" | |
| store_content = ( | |
| f"Task ID: {task_id}\n" | |
| f"Outcome: {outcome}\n" | |
| f"Trajectory: {trajectory}\n" | |
| f"Memory tool invocations: {invocations_text}" | |
| ) | |
| store_request = MemoryStoreRequest( | |
| category=MemoryCategory.EPISODIC, | |
| content=store_content, | |
| metadata=MemoryMetadata( | |
| source="distillation", | |
| tags=(DISTILLATION_TAG,), | |
| ), | |
| ) | |
| await backend.store(agent_id, store_request) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/memory/consolidation/distillation.py` around lines 187 - 210,
The EPISODIC store currently only saves outcome/trajectory/count; update the
store payload to persist the full DistillationRequest (including task_id and
memory_tool_invocations) by serializing the DistillationRequest instance and
using that serialized string as MemoryStoreRequest.content before calling
backend.store; prefer adding a to_dict()/serialize() method on
DistillationRequest (or reusing one if present) and set
MemoryMetadata(tags=(DISTILLATION_TAG,)) as before so downstream readers can
correlate by task_id and distinguish search_memory vs recall_memory and keep
capture_distillation() under the 50-line limit by moving serialization into
DistillationRequest.
| await engine._post_execution_pipeline( | ||
| result, | ||
| identity, | ||
| str(identity.id), | ||
| "task-dist-001", | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider testing through the public run() method for integration coverage.
The tests call engine._post_execution_pipeline(...) directly (a private method). While this provides targeted unit testing of the distillation wiring, it bypasses the full execution flow. Consider adding at least one integration-style test that exercises run() with a mocked execution loop to verify end-to-end behavior.
That said, the current tests effectively validate the wiring logic and are valuable for quick iteration.
Also applies to: 129-134, 154-159, 176-180
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/engine/test_agent_engine_distillation.py` around lines 101 - 106,
Current tests call the private engine._post_execution_pipeline(...) directly
which bypasses the full flow; add at least one integration-style test that calls
the public engine.run(...) instead, mocking/stubbing the execution loop and any
external dependencies (e.g., task execution, identity, distillation storage) so
run() executes deterministically, then assert the same end-to-end outcomes
currently verified for _post_execution_pipeline; apply the same change to the
other similar tests (around the other ranges) so one or more tests exercise
run() path while keeping the existing unit-level calls for fast iteration.
|
All contributors have signed the CLA. Thank you! |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 210-217: The except* handlers in the llm_strategy consolidation
code currently unwrap an ExceptionGroup named eg and re-raise only
eg.exceptions[0], losing info from other exceptions; before re-raising, log the
full ExceptionGroup so operators can diagnose multi-group failures (e.g., call
the module/class logger with logger.exception or logger.error(...,
exc_info=True) passing eg or its string) in each of the except* MemoryError /
RecursionError / ProviderError / Exception handlers, then continue to raise
eg.exceptions[0] to preserve the existing behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: e1c1efc0-34e6-43ea-b50e-4f1d1ac2cc22
📒 Files selected for processing (5)
CLAUDE.mdsrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.pysrc/synthorg/observability/events/memory.pytests/unit/memory/consolidation/test_llm_strategy.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Build Sandbox
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python -- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (viamodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions:allow_inf_nan=Falsein all ConfigDict declarations to reject NaN/Inf in numeric fields; use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/synthorg/observability/events/memory.pytests/unit/memory/consolidation/test_llm_strategy.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Always useloggeras the variable name (not_logger, notlog)
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api,TOOL_INVOKE_STARTfromevents.tool). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging withlogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/observability/events/memory.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/observability/events/memory.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowfor test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases
Test vendor names must usetest-provider,test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with@given+@settings. Hypothesis profiles configured intests/conftest.py:ci(deterministic,max_examples=10+derandomize=True),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline). Control viaHYPOTHESIS_PROFILEenv var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit@example(...)decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/memory/consolidation/test_llm_strategy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/consolidation/test_llm_strategy.py
🧠 Learnings (33)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
src/synthorg/observability/events/memory.pytests/unit/memory/consolidation/test_llm_strategy.pyCLAUDE.mdsrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/ranking.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/observability/events/memory.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must have `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
🔇 Additional comments (20)
src/synthorg/memory/consolidation/llm_strategy.py (7)
1-41: LGTM! Module setup follows project conventions.Logger initialization, imports, and module docstring are correctly structured. The
noqa: TC001annotations appropriately suppress type-checking-only import warnings for protocol types used at runtime.
42-74: LGTM! Constants are well-documented with clear rationale.The intentional duplication of
_DISTILLATION_TAG(Lines 49-54) is properly justified to avoid pulling engine execution protocol into consolidation strategy. Tag constants distinguish LLM synthesis from fallback paths.
76-144: LGTM! Well-structured class initialization with proper validation.The class docstring thoroughly documents behavior, error handling, and fallback paths. The
group_thresholdvalidation (Line 130-135) correctly enforces the minimum of 2 with a clear error message.
241-269: LGTM! Best-effort trajectory context fetch with proper degradation.The method correctly short-circuits when disabled (Line 251-252), propagates system errors, and degrades gracefully on retrieval failures while logging context for debugging.
307-348: LGTM! Store-before-delete ordering prevents data loss.The implementation correctly synthesizes and stores (Lines 311-326) before attempting deletions (Lines 328-346). Partial delete failures are logged and tolerated, returning only successfully deleted IDs—recoverable on the next consolidation pass.
350-485: LGTM! Comprehensive synthesis with well-defined fallback semantics.Error handling covers all documented paths:
RetryExhaustedError→ fallback (Lines 426-440)- Retryable
ProviderError→ fallback (Lines 441-457)- Non-retryable
ProviderError→ propagate (Lines 458-461)- Empty/whitespace response → fallback (Lines 476-485)
- Unexpected exceptions → fallback with traceback (Lines 462-474)
Input truncation (Line 389) guards against oversized LLM inputs.
487-551: LGTM! Helper methods are well-implemented.
_build_system_prompt(Lines 487-507): Properly truncates trajectory context per entry._fallback_summary(Lines 509-526): Correctly truncates long content._select_entries(Lines 528-551): Deterministic selection using(relevance_score, created_at)tuple key withNone→0.0coercion.tests/unit/memory/consolidation/test_llm_strategy.py (5)
1-71: LGTM! Well-structured test helpers and fixtures.Helper functions provide sensible defaults for building test data. The
_make_strategyhelper properly initializes mock dependencies with appropriate return values. Test model names follow thetest-*convention.
73-88: LGTM! Initialization tests cover boundary conditions.Protocol conformance check (Line 77) and threshold validation tests (Lines 82-88) properly verify the class interface and constructor constraints.
91-214: LGTM! Core consolidation behavior thoroughly tested.Tests cover:
- Empty input handling (Lines 93-97)
- Below-threshold groups skipped (Lines 99-103)
- Full consolidation flow with assertions on counts, IDs, and mock calls (Lines 105-141)
- Multi-category independence (Lines 143-186)
- Selection logic for highest relevance (Lines 188-214)
217-409: LGTM! Error handling tests are comprehensive.Critical safety test
test_no_deletes_on_synthesis_failure(Lines 267-297) verifies the store-before-delete ordering by asserting neitherdeletenorstoreare called when synthesis raises non-retryable errors. All fallback paths and propagation behaviors are properly verified.
412-724: LGTM! Detailed behavior tests cover edge cases thoroughly.Notable coverage includes:
- Tiebreaker determinism with equal relevance scores (Lines 414-450)
Nonerelevance coercion (Lines 452-473)- Truncation in fallback summaries (Lines 475-505)
- Verbatim LLM response storage with correct tags (Lines 507-556)
- Partial delete failure graceful degradation (Lines 558-590)
- Trajectory context integration and failure modes (Lines 592-679)
src/synthorg/observability/events/memory.py (1)
132-142: LGTM! New event constants follow naming conventions.The new constants follow the
memory.<entity>.<action>pattern:
MEMORY_REFORMULATION_ROUND/MEMORY_REFORMULATION_SUFFICIENTfor Search-and-AskMEMORY_DIVERSITY_RERANKED/MEMORY_DIVERSITY_RERANK_FAILEDfor MMR diversitySection comments provide clear organization.
CLAUDE.md (2)
101-101: LGTM! Memory package documentation accurately reflects new capabilities.The updated description comprehensively covers:
- MMR diversity re-ranking with bigram cache
- Tool-based Search-and-Ask with reformulation
- Fail-closed memory filter
- LLMConsolidationStrategy with TaskGroup parallelism
- Distillation capture with EPISODIC tagging
148-148: LGTM! Event constants documentation updated correctly.New events
LLM_STRATEGY_SYNTHESIZED,DISTILLATION_CAPTURED,MEMORY_DIVERSITY_RERANKED,MEMORY_DIVERSITY_RERANK_FAILED, andMEMORY_REFORMULATION_ROUNDare properly listed with their source modules.src/synthorg/memory/ranking.py (5)
51-99: LGTM! ScoredMemory documentation clarifies fusion-mode semantics.The updated docstring (Lines 54-75) clearly explains how
relevance_score,recency_score, andcombined_scoreshould be interpreted depending on whether the instance was produced by LINEAR ranking or RRF fusion.
399-437: LGTM! Bigram helpers are correctly implemented.
_word_bigrams(Lines 402-415): Returnsfrozensetfor efficient caching, handles short texts gracefully._bigram_jaccard(Lines 418-437): Proper empty-set handling prevents division-by-zero.
440-496: LGTM! Validation now uses distinct failure event.The
diversity_lambdavalidation (Lines 473-481) correctly emitsMEMORY_DIVERSITY_RERANK_FAILEDfor invalid parameters, addressing the past review comment about distinguishing validation failures from successful re-ranks. The early return forlen(scored) <= 1(Lines 483-484) is an appropriate optimization.
499-535: LGTM! Cached bigram MMR implementation is efficient.Pre-computing bigrams once (Line 505) and using
_bigram_jaccard_cachedavoids O(n² × k) re-tokenization. The-math.infsentinel (Line 511) ensures correct initialization. DEBUG-level logging (Lines 529-533) is appropriate for per-call details.
538-586: LGTM! Generic MMR path provides flexibility.
_bigram_jaccard_cached(Lines 538-547) correctly operates on pre-computed frozensets._mmr_rerank_generic(Lines 550-586) mirrors the cached implementation structure, enabling custom similarity functions while maintaining the same MMR selection logic.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 707-711: The distillation capture is only invoked inside
_post_execution_pipeline, so early-return paths in run() (specifically when
calling _handle_budget_error() and _handle_fatal_error()) skip creating
distillation entries; update run() to ensure _try_capture_distillation(...) is
always called before any return by either (a) centralizing result creation into
a single exit path and calling _try_capture_distillation(execution_result,
agent_id, task_id) there, or (b) invoking _try_capture_distillation immediately
after building the result returned by _handle_budget_error() and
_handle_fatal_error(); locate uses of _try_capture_distillation,
_post_execution_pipeline, _handle_budget_error, and _handle_fatal_error in run()
and adjust control flow so distillation runs for all completion outcomes.
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 161-164: The consolidation loop currently overwrites
ConsolidationResult.summary_id on each category and emits
LLM_STRATEGY_SYNTHESIZED before calling store(), losing per-group IDs; fix by
collecting per-group ids as you create them (e.g., summary_ids = []), call the
group store() first and append the returned id to summary_ids, emit
LLM_STRATEGY_SYNTHESIZED immediately after store() and include the summary_id
field, and finally set ConsolidationResult.summary_id to the most recent id but
also attach the full summary_ids list (or include it in STRATEGY_COMPLETE) so
callers retain visibility into every summary produced; update references to
ConsolidationResult.summary_id and the LLM_STRATEGY_SYNTHESIZED emission site
accordingly.
- Around line 457-477: In the except ProviderError as exc block: add a
structured log call before re-raising non-retryable provider errors so agent_id,
category, entry_count and model are recorded; mirror the existing logger fields
used in the retryable branch (use LLM_STRATEGY_FALLBACK, agent_id,
category.value, entry_count=len(entries), model=self._model, error=str(exc),
error_type=type(exc).__name__) and set a distinct reason like
"non_retryable_provider_error" at WARNING or ERROR level, then re-raise the
exception.
- Around line 42-43: The current constants _DEFAULT_GROUP_THRESHOLD and
_MIN_GROUP_THRESHOLD allow group_threshold=2 which results in _select_entries()
producing a pair that leaves one entry kept and one in to_remove so
_synthesize() no longer has multiple entries to deduplicate against; fix this by
enforcing a minimum of 3 (set _MIN_GROUP_THRESHOLD = 3 and validate
group_threshold on input) or, if pairs must be supported, modify the
consolidation flow so that when group_threshold == 2 the retained entry is
passed into _synthesize() as reference/context for deduplication (update the
code paths that call _select_entries(), _synthesize(), and the grouping logic
mentioned around the other occurrences to ensure the kept entry is included in
the LLM prompt).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 69f968bb-750f-41dd-bc9c-0ec7e01a8bea
📒 Files selected for processing (3)
src/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_distillation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Build Backend
- GitHub Check: Build Web
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python -- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (viamodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions:allow_inf_nan=Falsein all ConfigDict declarations to reject NaN/Inf in numeric fields; use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/memory/consolidation/test_distillation.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowfor test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases
Test vendor names must usetest-provider,test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with@given+@settings. Hypothesis profiles configured intests/conftest.py:ci(deterministic,max_examples=10+derandomize=True),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline). Control viaHYPOTHESIS_PROFILEenv var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit@example(...)decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/memory/consolidation/test_distillation.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/consolidation/test_distillation.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Always useloggeras the variable name (not_logger, notlog)
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api,TOOL_INVOKE_STARTfromevents.tool). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging withlogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/llm_strategy.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
🔇 Additional comments (1)
tests/unit/memory/consolidation/test_distillation.py (1)
214-331: Nice coverage around distillation capture failure handling.These tests exercise the success path, backend-store fallback, and
MemoryError/RecursionErrorpropagation, which gives the new capture contract solid regression protection.
| await self._try_capture_distillation( | ||
| execution_result, | ||
| agent_id, | ||
| task_id, | ||
| ) |
There was a problem hiding this comment.
Distillation is skipped on the direct-return completion paths.
The new hook only runs inside _post_execution_pipeline(). run() still returns directly from _handle_budget_error() and _handle_fatal_error() (Lines 500-524), so pre-flight budget stops and fatal setup/pipeline failures never produce distillation entries. If this feature is meant to capture task completion outcomes, move the capture step to a single return path in run() or invoke it after those results are built.
Also applies to: 714-743
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/engine/agent_engine.py` around lines 707 - 711, The distillation
capture is only invoked inside _post_execution_pipeline, so early-return paths
in run() (specifically when calling _handle_budget_error() and
_handle_fatal_error()) skip creating distillation entries; update run() to
ensure _try_capture_distillation(...) is always called before any return by
either (a) centralizing result creation into a single exit path and calling
_try_capture_distillation(execution_result, agent_id, task_id) there, or (b)
invoking _try_capture_distillation immediately after building the result
returned by _handle_budget_error() and _handle_fatal_error(); locate uses of
_try_capture_distillation, _post_execution_pipeline, _handle_budget_error, and
_handle_fatal_error in run() and adjust control flow so distillation runs for
all completion outcomes.
PR Review Summary —
|
| Fix | File | Source |
|---|---|---|
Log ExceptionGroup before re-raising in each except* handler in consolidate() — operators can now diagnose multi-task failures where >1 category group fails concurrently |
memory/consolidation/llm_strategy.py |
CodeRabbit #4058984940 |
Run MemoryFilterStrategy before MMR diversity re-ranking — prevents privacy-excluded entries from acting as MMR anchors and suppressing visible-but-similar candidates |
memory/retriever.py |
CodeRabbit #4058864502 |
| Update both pipeline step tables + Diversity Re-ranking prose to reflect filter-first ordering | docs/design/memory.md |
Follow-on |
Store tool invocation names in distillation content instead of bare count (e.g. Memory lookups: search_memory, recall_memory instead of Memory lookups: 2) |
memory/consolidation/distillation.py |
CodeRabbit #4058864502 |
| Update test assertion for new content format | tests/unit/memory/consolidation/test_distillation.py |
Follow-on |
Remove spurious async from test_error_without_message_falls_through (sync test, no awaits) — note: a parallel review agent had already moved this test to the correct class; this commit resolved the resulting merge conflict |
tests/unit/memory/consolidation/test_distillation.py |
CodeRabbit async cleanup |
Commit 68c3aaa — local agent test-quality findings
| Fix | File |
|---|---|
Grammar bug in build_trajectory_summary: fix "1 turns" → "1 turn" (singular/plural) |
memory/consolidation/distillation.py |
Update test assertion to "1 turn" to match corrected output |
tests/unit/memory/consolidation/test_distillation.py |
Hoist 5 repeated per-method _merge_results deferred imports to module level |
tests/unit/memory/test_tool_retriever.py |
Collapse test_empty_llm_response_falls_back + test_whitespace_llm_response_falls_back into a single @pytest.mark.parametrize test per CLAUDE.md conventions |
tests/unit/memory/consolidation/test_llm_strategy.py |
Hoist deferred MemoryCategory import inside test_store_request_shape to module level |
tests/unit/memory/consolidation/test_distillation.py |
Findings NOT Fixed (with rationale)
| Finding | Reason |
|---|---|
| "Store first, delete second" concern (CodeRabbit) | False positive — _process_group already stores before deleting; docstring documents this invariant explicitly |
Serialize full DistillationRequest as content |
Design choice — human-readable text is appropriate for LLM consumption; serialized JSON would be less useful for synthesis |
| Fetch larger candidate pool before MMR truncation | Requires backend API changes (candidate_limit separate from max_memories); valid follow-up |
Add engine.run() integration tests |
Unit tests targeting _post_execution_pipeline directly are correct for this test file; public-API integration tests belong in a dedicated integration test suite |
Checks
ruff check src/ tests/— ✅ all passedmypy src/synthorg/memory/ src/synthorg/engine/agent_engine.py— ✅ no issues (133 files)pytest—⚠️ pre-existing Python 3.14 RC2 + PydanticAssertionErrorenvironment issue confirmed present onmainbefore this PR; not caused by these changes
Generated by Claude Code
Follow-up fixes — commit
|
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/synthorg/memory/consolidation/llm_strategy.py (3)
487-507:⚠️ Potential issue | 🟡 MinorLog non-retryable provider failures before re-raising.
This branch propagates auth/validation-style provider errors with no category/model context, so operators only get the coarse task-group log upstream.
As per coding guidelines "Log all error paths at WARNING or ERROR level with context before raising".💡 Minimal fix
except ProviderError as exc: if exc.is_retryable: logger.warning( LLM_STRATEGY_FALLBACK, agent_id=agent_id, @@ ) return self._fallback_summary(entries), False # Non-retryable provider errors (auth, validation, etc.) # propagate -- they are bugs the caller must see, not # conditions to paper over with a fallback summary. + logger.error( + LLM_STRATEGY_FALLBACK, + agent_id=agent_id, + category=category.value, + entry_count=len(entries), + model=self._model, + error=str(exc), + error_type=type(exc).__name__, + reason="non_retryable_provider_error", + exc_info=True, + ) raise🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 487 - 507, In the except ProviderError block inside the consolidation logic (the handler that currently checks exc.is_retryable), add a warning or error log call that logs non-retryable provider errors with the same context fields used for retryable cases (agent_id, category.value, entry_count=len(entries), model=self._model, error=str(exc), error_type=type(exc).__name__) before re-raising; keep using the logger and a clear message (similar to LLM_STRATEGY_FALLBACK) so operators get category/model context for ProviderError paths prior to the raise.
161-164:⚠️ Potential issue | 🟠 MajorEmit
LLM_STRATEGY_SYNTHESIZEDonly afterstore()succeeds.The success event fires inside
_synthesize()before persistence and withoutsummary_id. Ifstore()fails, the logs still report a synthesized summary, and multi-category runs still have no per-group persisted ID beyond the lastConsolidationResult.summary_id.💡 Suggested direction
- if response.content and response.content.strip(): - logger.info( - LLM_STRATEGY_SYNTHESIZED, - agent_id=agent_id, - category=category.value, - entry_count=len(entries), - input_length=len(user_content), - output_length=len(response.content), - model=self._model, - trajectory_context_count=len(trajectory_context), - ) - return response.content.strip(), True + if response.content and response.content.strip(): + return response.content.strip(), True ... new_id = await self._backend.store(agent_id, store_request) + if used_llm: + logger.info( + LLM_STRATEGY_SYNTHESIZED, + agent_id=agent_id, + category=category.value, + entry_count=len(to_remove), + summary_id=new_id, + model=self._model, + trajectory_context_count=len(trajectory_context), + )Also applies to: 250-258, 355-364, 451-461
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 161 - 164, The code currently emits the LLM_STRATEGY_SYNTHESIZED log inside _synthesize() before persisting the summary, which can produce success logs without a persisted summary_id; move the emission so it occurs only after store() completes successfully and include the persisted summary_id from ConsolidationResult.summary_id in the event payload; update every synthesize call site that currently logs before persistence (e.g., the LLM_STRATEGY_SYNTHESIZED emission in _synthesize() and the other similar emit locations) to emit after the corresponding store() returns successfully and handle/store any store() errors so the event is not emitted on failure.
42-43:⚠️ Potential issue | 🟠 Major
group_threshold=2never performs real consolidation.For a 2-entry group,
_select_entries()keeps one entry and_synthesize()sees only the other. That leaves the kept original plus a summary of a single removed entry, so there is no deduplication against the retained entry and no net count reduction.💡 Simplest fix
-_MIN_GROUP_THRESHOLD = 2 +_MIN_GROUP_THRESHOLD = 3 ... - ValueError: If ``group_threshold`` is less than 2. + ValueError: If ``group_threshold`` is less than 3.Also applies to: 115-117, 130-135, 344-353
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 42 - 43, The group_threshold of 2 cannot actually consolidate because _select_entries() will keep one entry and _synthesize() will only see the other, producing a summary that doesn't dedupe the retained entry; change the consolidation minimum to 3 by updating _MIN_GROUP_THRESHOLD and _DEFAULT_GROUP_THRESHOLD to 3 and ensure any parameter validation or clamping (where group_threshold is accepted/used) enforces >= _MIN_GROUP_THRESHOLD so calls into _select_entries() and _synthesize() always operate on groups of 3+ entries.src/synthorg/memory/consolidation/distillation.py (1)
42-47:⚠️ Potential issue | 🟠 MajorPersist
task_idwith the distillation entry.
DistillationRequestsays downstream readers can correlate these entries by task, but the serialized payload dropstask_id. Once this is written to the backend, consumers only see outcome/trajectory/tool names, so task-level correlation is impossible.💡 Minimal fix
- store_content = ( - f"Distillation: {outcome}\n" - f"Trajectory: {trajectory}\n" - f"Memory lookups: {tool_names}" - ) + store_content = ( + f"Task ID: {task_id}\n" + f"Outcome: {outcome}\n" + f"Trajectory: {trajectory}\n" + f"Memory tool invocations: {tool_names}" + )Also applies to: 196-209
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 146-269: The consolidate method is over the 50-line limit; split
its responsibilities into smaller methods: extract the grouping logic into e.g.
a new _build_groups(entries) -> list[tuple[MemoryCategory, list[MemoryEntry]]],
move the TaskGroup orchestration and exception-unwrapping block into a new
_run_groups_taskgroup(groups, agent_id, trajectory_context) ->
list[tuple[NotBlankStr, list[NotBlankStr]]] that creates tasks calling the
existing _process_group and returns task results, and factor the final folding
of group_results into a small _fold_group_results(group_results) ->
ConsolidationResult (or values used to construct it). Keep calls to
_fetch_trajectory_context and _process_group intact and update consolidate to
orchestrate by calling these new helpers so the top-level consolidate remains
under 50 lines.
---
Duplicate comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 487-507: In the except ProviderError block inside the
consolidation logic (the handler that currently checks exc.is_retryable), add a
warning or error log call that logs non-retryable provider errors with the same
context fields used for retryable cases (agent_id, category.value,
entry_count=len(entries), model=self._model, error=str(exc),
error_type=type(exc).__name__) before re-raising; keep using the logger and a
clear message (similar to LLM_STRATEGY_FALLBACK) so operators get category/model
context for ProviderError paths prior to the raise.
- Around line 161-164: The code currently emits the LLM_STRATEGY_SYNTHESIZED log
inside _synthesize() before persisting the summary, which can produce success
logs without a persisted summary_id; move the emission so it occurs only after
store() completes successfully and include the persisted summary_id from
ConsolidationResult.summary_id in the event payload; update every synthesize
call site that currently logs before persistence (e.g., the
LLM_STRATEGY_SYNTHESIZED emission in _synthesize() and the other similar emit
locations) to emit after the corresponding store() returns successfully and
handle/store any store() errors so the event is not emitted on failure.
- Around line 42-43: The group_threshold of 2 cannot actually consolidate
because _select_entries() will keep one entry and _synthesize() will only see
the other, producing a summary that doesn't dedupe the retained entry; change
the consolidation minimum to 3 by updating _MIN_GROUP_THRESHOLD and
_DEFAULT_GROUP_THRESHOLD to 3 and ensure any parameter validation or clamping
(where group_threshold is accepted/used) enforces >= _MIN_GROUP_THRESHOLD so
calls into _select_entries() and _synthesize() always operate on groups of 3+
entries.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 274d0b7e-cf20-406d-b226-b40c46ad2f88
📒 Files selected for processing (5)
docs/design/memory.mdsrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/retriever.pytests/unit/memory/consolidation/test_distillation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Test (Python 3.14)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python -- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (viamodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions:allow_inf_nan=Falsein all ConfigDict declarations to reject NaN/Inf in numeric fields; use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
src/synthorg/memory/retriever.pysrc/synthorg/memory/consolidation/distillation.pytests/unit/memory/consolidation/test_distillation.pysrc/synthorg/memory/consolidation/llm_strategy.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Always useloggeras the variable name (not_logger, notlog)
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api,TOOL_INVOKE_STARTfromevents.tool). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging withlogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/memory/retriever.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/retriever.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowfor test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases
Test vendor names must usetest-provider,test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with@given+@settings. Hypothesis profiles configured intests/conftest.py:ci(deterministic,max_examples=10+derandomize=True),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline). Control viaHYPOTHESIS_PROFILEenv var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit@example(...)decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/memory/consolidation/test_distillation.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/consolidation/test_distillation.py
docs/**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Documentation is in
docs/(Markdown, built with Zensical, config:mkdocs.yml). Design spec indocs/design/(12 pages). Architecture indocs/architecture/. Roadmap indocs/roadmap/. Security indocs/security.md. Licensing indocs/licensing.md.
Files:
docs/design/memory.md
🧠 Learnings (14)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
src/synthorg/memory/retriever.pysrc/synthorg/memory/consolidation/distillation.pydocs/design/memory.mdsrc/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`) include: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
🔇 Additional comments (1)
src/synthorg/memory/retriever.py (1)
294-339: Good filter-before-MMR ordering.Applying the memory filter before
apply_diversity_penalty()keeps excluded entries from acting as similarity anchors, so the fail-closed privacy boundary still defines the visible candidate set.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (5)
src/synthorg/memory/consolidation/llm_strategy.py (4)
487-507:⚠️ Potential issue | 🟡 MinorLog the non-retryable provider path before re-raising.
Auth/validation failures leave this branch without structured
agent_id/category/modelcontext unless an upstream caller happens to log them. Emit a warning/error here beforeraise.As per coding guidelines "Log all error paths at WARNING or ERROR level with context before raising".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 487 - 507, The non-retryable ProviderError path should log the error with the same structured context before re-raising: inside the except ProviderError as exc block (the branch after checking exc.is_retryable), call logger.error (or warning per guidelines) with LLM_STRATEGY_FALLBACK and include agent_id, category.value, entry_count=len(entries), model=self._model, error=str(exc), error_type=type(exc).__name__, and reason="non_retryable_provider_error" so the failure is recorded with context before the function re-raises the exception.
146-269: 🛠️ Refactor suggestion | 🟠 MajorSplit
consolidate()below the 50-line limit.This method still combines grouping, TaskGroup orchestration, exception unwrapping, and result folding in one body, and it remains far above the repo’s 50-line limit. Extract the grouping / parallel execution / fold steps into a few cohesive private methods.
As per coding guidelines "Functions must be less than 50 lines, files must be less than 800 lines".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 146 - 269, The consolidate method is over the 50-line limit and mixes grouping, TaskGroup orchestration, exception unwrapping, and result folding; refactor by extracting responsibilities into small private helpers: implement _group_entries(entries) to perform the sorted+groupby logic and return groups_to_process, implement _execute_group_tasks(groups_to_process, agent_id, trajectory_context) to run the asyncio.TaskGroup, handle the except* branches (MemoryError, RecursionError, ProviderError, Exception) and return the list of task results, and implement _fold_group_results(group_results) to build removed_ids and summary_id and return a ConsolidationResult; then simplify consolidate() to call _group_entries, _fetch_trajectory_context, _execute_group_tasks, _fold_group_results and perform only logging and returning the result (keep existing symbol names: consolidate, _fetch_trajectory_context, STRATEGY_START, STRATEGY_COMPLETE, LLM_STRATEGY_FALLBACK).
42-43:⚠️ Potential issue | 🟠 MajorDisallow pairwise groups or pass the kept entry into synthesis.
With
group_threshold=2,_select_entries()keeps one memory and_process_group()sends only the lone remainder to_synthesize(). The model never sees both entries, so it cannot deduplicate against the retained exemplar; this devolves into summarizing a single entry and can preserve duplicates beside the kept memory. Either raise the minimum to 3, or include the kept entry as read-only context when pairs are allowed.Also applies to: 119-135, 344-353
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 42 - 43, The grouping logic allows pairwise groups (group_threshold=2) which causes _process_group to call _synthesize with only the lone remainder and prevents deduplication; fix this by disallowing pairwise groups: change _MIN_GROUP_THRESHOLD from 2 to 3 (update _DEFAULT_GROUP_THRESHOLD if needed) and ensure all checks that validate or clamp group_threshold (_select_entries, _process_group and any places referenced around the other occurrences) enforce >= _MIN_GROUP_THRESHOLD so you never produce groups of size 2; alternatively, if you must allow pairs, modify _process_group to pass the kept exemplar into _synthesize as read-only context (add a parameter to _synthesize and include the retained entry when calling) so the model can deduplicate against it.
161-164:⚠️ Potential issue | 🟠 MajorDon’t drop the extra summary IDs from multi-category runs.
Each processed category can create its own stored summary, but the fold on Lines 252-258 overwrites
summary_idon every iteration. The docstring’s fallback does not hold either:LLM_STRATEGY_SYNTHESIZEDis emitted inside_synthesize()beforestore()and never includes the backend-assigned ID. Expose all created IDs in the result, or emit per-group IDs immediately after eachstore().Also applies to: 250-258, 355-363, 451-460
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/llm_strategy.py` around lines 161 - 164, The ConsolidationResult currently only returns the last created summary ID (ConsolidationResult.summary_id) because the loop in the consolidation path overwrites summary_id each iteration; update the consolidation flow in llm_strategy.py (the loop around the fold at the block that updates summary_id on each iteration) so that all backend-assigned IDs produced by store() are preserved and returned—either add a new field like ConsolidationResult.summary_ids (list) and append each ID after calling store(), or emit/record the per-group ID immediately after store() rather than relying on the LLM_STRATEGY_SYNTHESIZED event (which is logged before store() and lacks the backend ID). Make the same change in the other similar folding sections referenced (the blocks around lines 250–258, 355–363, 451–460) and update any callers of ConsolidationResult to consume summary_ids or fallback to the existing summary_id for backward compatibility.src/synthorg/engine/agent_engine.py (1)
707-711:⚠️ Potential issue | 🟠 MajorCapture distillation on the non-pipeline exits too.
This hook still only runs through
_post_execution_pipeline(). Lines 500-524 inrun()return directly from_handle_budget_error()and_handle_fatal_error(), so pre-flight budget stops and fatal setup/pipeline failures never create distillation entries even though this helper’s docstring says budget exhaustions are captured. Centralize the final return path inrun(), or call_try_capture_distillation(result.execution_result, agent_id, task_id)before those returns.Also applies to: 722-728
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/agent_engine.py` around lines 707 - 711, The run() flow currently skips calling _try_capture_distillation for early exits from _handle_budget_error and _handle_fatal_error (and similar non-pipeline paths), so budget/fatal failures never create distillation entries; modify run() to ensure _try_capture_distillation(result.execution_result, agent_id, task_id) is invoked on those code paths (either by centralizing final return logic so every exit goes through _post_execution_pipeline() or by inserting a direct call to _try_capture_distillation immediately before each return from _handle_budget_error/_handle_fatal_error and the similar early-return blocks around lines ~722-728), keeping the same arguments (execution_result/agent_id/task_id) and preserving existing error handling.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/memory/test_tool_retriever.py`:
- Around line 380-397: Update the test_no_reformulator_is_single_shot case to
assert the sufficiency checker is not invoked when reformulator=None: ensure the
AsyncMock passed as sufficiency_checker (use a named AsyncMock like
AsyncMock(name="sufficiency_checker")) is asserted as not awaited/not called
after calling ToolBasedInjectionStrategy.handle_tool_call, in addition to the
existing assert that backend.retrieve was called once; this verifies the
reformulation loop is fully bypassed.
- Around line 677-695: Combine the three nearly identical tests into one
parameterized pytest test that calls _merge_results with pairs of entries
created by _make_entry and asserts the winner; use `@pytest.mark.parametrize` on a
new test function (e.g., test_collision_relevance_resolution) with cases
covering (low vs high), (high vs low), (None vs positive) and add the two
suggested edge cases: (existing beats None) and (equal scores keeps existing).
Ensure each param supplies (existing_entry, new_entry, expected_relevance) and
the body asserts len(result) == 1 and result[0].relevance_score ==
expected_relevance while still using _merge_results and _make_entry as the
referenced helpers.
---
Duplicate comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 707-711: The run() flow currently skips calling
_try_capture_distillation for early exits from _handle_budget_error and
_handle_fatal_error (and similar non-pipeline paths), so budget/fatal failures
never create distillation entries; modify run() to ensure
_try_capture_distillation(result.execution_result, agent_id, task_id) is invoked
on those code paths (either by centralizing final return logic so every exit
goes through _post_execution_pipeline() or by inserting a direct call to
_try_capture_distillation immediately before each return from
_handle_budget_error/_handle_fatal_error and the similar early-return blocks
around lines ~722-728), keeping the same arguments
(execution_result/agent_id/task_id) and preserving existing error handling.
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 487-507: The non-retryable ProviderError path should log the error
with the same structured context before re-raising: inside the except
ProviderError as exc block (the branch after checking exc.is_retryable), call
logger.error (or warning per guidelines) with LLM_STRATEGY_FALLBACK and include
agent_id, category.value, entry_count=len(entries), model=self._model,
error=str(exc), error_type=type(exc).__name__, and
reason="non_retryable_provider_error" so the failure is recorded with context
before the function re-raises the exception.
- Around line 146-269: The consolidate method is over the 50-line limit and
mixes grouping, TaskGroup orchestration, exception unwrapping, and result
folding; refactor by extracting responsibilities into small private helpers:
implement _group_entries(entries) to perform the sorted+groupby logic and return
groups_to_process, implement _execute_group_tasks(groups_to_process, agent_id,
trajectory_context) to run the asyncio.TaskGroup, handle the except* branches
(MemoryError, RecursionError, ProviderError, Exception) and return the list of
task results, and implement _fold_group_results(group_results) to build
removed_ids and summary_id and return a ConsolidationResult; then simplify
consolidate() to call _group_entries, _fetch_trajectory_context,
_execute_group_tasks, _fold_group_results and perform only logging and returning
the result (keep existing symbol names: consolidate, _fetch_trajectory_context,
STRATEGY_START, STRATEGY_COMPLETE, LLM_STRATEGY_FALLBACK).
- Around line 42-43: The grouping logic allows pairwise groups
(group_threshold=2) which causes _process_group to call _synthesize with only
the lone remainder and prevents deduplication; fix this by disallowing pairwise
groups: change _MIN_GROUP_THRESHOLD from 2 to 3 (update _DEFAULT_GROUP_THRESHOLD
if needed) and ensure all checks that validate or clamp group_threshold
(_select_entries, _process_group and any places referenced around the other
occurrences) enforce >= _MIN_GROUP_THRESHOLD so you never produce groups of size
2; alternatively, if you must allow pairs, modify _process_group to pass the
kept exemplar into _synthesize as read-only context (add a parameter to
_synthesize and include the retained entry when calling) so the model can
deduplicate against it.
- Around line 161-164: The ConsolidationResult currently only returns the last
created summary ID (ConsolidationResult.summary_id) because the loop in the
consolidation path overwrites summary_id each iteration; update the
consolidation flow in llm_strategy.py (the loop around the fold at the block
that updates summary_id on each iteration) so that all backend-assigned IDs
produced by store() are preserved and returned—either add a new field like
ConsolidationResult.summary_ids (list) and append each ID after calling store(),
or emit/record the per-group ID immediately after store() rather than relying on
the LLM_STRATEGY_SYNTHESIZED event (which is logged before store() and lacks the
backend ID). Make the same change in the other similar folding sections
referenced (the blocks around lines 250–258, 355–363, 451–460) and update any
callers of ConsolidationResult to consume summary_ids or fallback to the
existing summary_id for backward compatibility.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4c5db05a-067e-4438-8975-2277cf0b824a
📒 Files selected for processing (6)
src/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_tool_retriever.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotationsin Python -- Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax (no parentheses) for exception handling in Python 3.14 per PEP 758 -- ruff enforces this
All public functions and methods must have type hints, with mypy strict mode enforcement
Google-style docstrings are required on all public classes and functions, enforced by ruff D rules
Create new objects rather than mutating existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement.
Use frozen Pydantic models for config and identity; use separate mutable-via-copy models (viamodel_copy(update=...)) for runtime state that evolves. Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with conventions:allow_inf_nan=Falsein all ConfigDict declarations to reject NaN/Inf in numeric fields; use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStrfor all identifier and name fields including optional and tuple variants, instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Maintain a line length of 88 characters (enforced by ruff)
Functions must be less than 50 lines, files must be less than 800 lines
Handle errors explicitly, never silently swallow exceptions
Validate at system boundaries (user input, external APIs, config files)
Files:
tests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/engine/agent_engine.pytests/unit/memory/consolidation/test_llm_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e, and@pytest.mark.slowfor test markers
Maintain 80% minimum code coverage (enforced in CI)
Prefer@pytest.mark.parametrizefor testing similar cases
Test vendor names must usetest-provider,test-small-001, etc. instead of real vendor names
Use Python Hypothesis for property-based testing with@given+@settings. Hypothesis profiles configured intests/conftest.py:ci(deterministic,max_examples=10+derandomize=True),dev(1000 examples),fuzz(10,000 examples, no deadline),extreme(500,000 examples, no deadline). Control viaHYPOTHESIS_PROFILEenv var.
When Hypothesis finds a failing example, it is a real bug. Fix the underlying bug and add an explicit@example(...)decorator to the test so the case is permanently covered in CI.
Never skip, dismiss, or ignore flaky tests. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number).
Files:
tests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/consolidation/test_llm_strategy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/consolidation/test_llm_strategy.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic must havefrom synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Always useloggeras the variable name (not_logger, notlog)
Always use event name constants from domain-specific modules undersynthorg.observability.events(e.g.,API_REQUEST_STARTEDfromevents.api,TOOL_INVOKE_STARTfromevents.tool). Import directly:from synthorg.observability.events.<domain> import EVENT_CONSTANT
Use structured logging withlogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising
Log all state transitions at INFO level
Use DEBUG level logging for object creation, internal flow, and entry/exit of key functions
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Vendor names allowed only in: (1) Operations design page, (2).claude/files, (3) third-party import paths, (4) provider presets which are user-facing runtime data. Tests must usetest-provider,test-small-001, etc.
Files:
src/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/engine/agent_engine.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/engine/agent_engine.py
🧠 Learnings (22)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
tests/unit/memory/test_tool_retriever.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to tests/**/*.py : Fix flaky tests completely and fundamentally; for timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (`is_retryable=True`) include: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to src/synthorg/**/*.py : Log all error paths at WARNING or ERROR level with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-04T19:50:52.893Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-04T19:50:52.893Z
Learning: Applies to **/*.py : Functions must be less than 50 lines, files must be less than 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-04-01T09:39:21.584Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T09:39:21.584Z
Learning: Applies to **/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g., multiple tool invocations, parallel agent calls); prefer structured concurrency over bare `create_task`
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T18:52:05.142Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T18:52:05.142Z
Learning: Applies to **/*.py : Async concurrency: prefer asyncio.TaskGroup for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare create_task.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/engine/coordination/**/*.py : Task coordination uses multi-agent pipeline with 4 dispatchers (SAS/centralized/decentralized/context-dependent), wave execution, and workspace lifecycle integration.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.py
🔇 Additional comments (3)
tests/unit/memory/test_tool_retriever.py (3)
12-12: LGTM on importing_merge_resultsfor direct unit testing.Testing internal helper functions directly is appropriate here to verify the merge-by-ID logic in isolation.
339-346: Clean helper for reformulation config.Reduces boilerplate across the reformulation tests while correctly setting required fields.
349-640: Excellent coverage of the Search-and-Ask reformulation loop.The tests comprehensively verify:
- Guard paths (disabled, missing reformulator, early sufficiency)
- Multi-round iteration with proper result merging
- Termination conditions (
Nonereturn, stable query, max rounds)- Deduplication by ID with higher-relevance wins
- Error isolation returning current entries without leaking internal errors
The error isolation tests (lines 582-639) particularly stand out — they validate the fail-safe behavior documented in the implementation where exceptions from
sufficiency_checkerorreformulatordegrade gracefully.
| async def test_no_reformulator_is_single_shot(self) -> None: | ||
| """Reformulation skipped when reformulator is None.""" | ||
| entries = (_make_entry(entry_id="a"),) | ||
| backend = _make_backend(entries) | ||
|
|
||
| strategy = ToolBasedInjectionStrategy( | ||
| backend=backend, | ||
| config=_reformulation_config(), | ||
| reformulator=None, | ||
| sufficiency_checker=AsyncMock(), | ||
| ) | ||
|
|
||
| await strategy.handle_tool_call( | ||
| tool_name="search_memory", | ||
| arguments={"query": "find me memories"}, | ||
| agent_id="agent-1", | ||
| ) | ||
| assert backend.retrieve.call_count == 1 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Add assertion that sufficiency checker is not invoked when reformulator is missing.
The test verifies single-shot retrieval but doesn't confirm that the sufficiency checker is bypassed entirely. Per the implementation, when reformulator=None, the reformulation loop is skipped completely.
🧪 Proposed enhancement
strategy = ToolBasedInjectionStrategy(
backend=backend,
config=_reformulation_config(),
reformulator=None,
sufficiency_checker=sufficiency,
)
await strategy.handle_tool_call(
tool_name="search_memory",
arguments={"query": "find me memories"},
agent_id="agent-1",
)
assert backend.retrieve.call_count == 1
+ sufficiency.check_sufficiency.assert_not_called()Also update the mock setup to use a named mock for consistency:
+ sufficiency = AsyncMock()
+ sufficiency.check_sufficiency = AsyncMock(return_value=True)
+
strategy = ToolBasedInjectionStrategy(
backend=backend,
config=_reformulation_config(),
reformulator=None,
- sufficiency_checker=AsyncMock(),
+ sufficiency_checker=sufficiency,
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/memory/test_tool_retriever.py` around lines 380 - 397, Update the
test_no_reformulator_is_single_shot case to assert the sufficiency checker is
not invoked when reformulator=None: ensure the AsyncMock passed as
sufficiency_checker (use a named AsyncMock like
AsyncMock(name="sufficiency_checker")) is asserted as not awaited/not called
after calling ToolBasedInjectionStrategy.handle_tool_call, in addition to the
existing assert that backend.retrieve was called once; this verifies the
reformulation loop is fully bypassed.
Adds MMR-style diversity penalty to reduce redundancy in memory retrieval results. Promotes coverage over near-duplicate content. - apply_diversity_penalty() in ranking.py: iterative MMR selection balancing relevance (combined_score) against max similarity to already-selected entries - _bigram_jaccard() similarity using word-level bigrams (no external deps, pluggable via similarity_fn parameter for future embedding-based variants) - diversity_penalty_enabled and diversity_lambda config fields on MemoryRetrievalConfig (lambda=1.0 pure relevance, 0.0 max diversity) - Wired into ContextInjectionStrategy._execute_pipeline after ranking, before memory filter - MEMORY_DIVERSITY_RERANKED event constant Part of #704.
Adds LLMConsolidationStrategy implementing the ConsolidationStrategy protocol. Uses a CompletionProvider to semantically merge memory entries within each category group, producing a synthesized summary that preserves all unique facts while removing redundancy. Addresses SimpleConsolidationStrategy's lack of semantic deduplication. - LLMConsolidationStrategy: groups entries by category, keeps the highest-relevance entry per group (recency tiebreaker), feeds the rest to an LLM for synthesis - Synthesis prompt instructs the model to identify overlapping info, merge semantically related facts, and preserve specific details - Error handling follows AbstractiveSummarizer pattern: re-raise MemoryError/RecursionError, propagate non-retryable ProviderError, gracefully fall back to concatenation for retryable/unexpected errors - Summary stored with metadata.tags = (consolidated, llm-synthesized) - LLM_STRATEGY_SYNTHESIZED and LLM_STRATEGY_FALLBACK event constants - Exported from synthorg.memory.consolidation Part of #704.
Adds DistillationRequest model and capture function for outcome-driven memory curation at task completion. Captures trajectory summary, outcome, and retrieved memory tool usage. Feeds into the consolidation pipeline -- stored as an EPISODIC memory entry tagged with distillation for future consolidation runs. - DistillationRequest: frozen Pydantic model with agent_id, task_id, trajectory_summary, outcome, retrieved_memory_ids, created_at - build_trajectory_summary: summarizes turn count, tool usage, and total tokens from turn records - build_outcome: human-readable description from TerminationReason and error_message - extract_memory_ids: extracts search_memory/recall_memory tool calls from turn records (non-memory tools ignored) - capture_distillation: non-critical capture function that stores the distillation entry; returns None on failure, logs warning, propagates MemoryError/RecursionError - DISTILLATION_CAPTURED and DISTILLATION_CAPTURE_FAILED event constants - Exported from synthorg.memory.consolidation Engine integration point documented but not wired: agent_engine.py _post_execution_pipeline alongside _try_procedural_memory. Part of #704.
Wires the existing QueryReformulator and SufficiencyChecker protocols into ToolBasedInjectionStrategy to enable iterative Search-and-Ask retrieval: search, check sufficiency, reformulate, re-search. The protocols and LLM implementations already existed in memory/reformulation.py; the config fields query_reformulation_enabled and max_reformulation_rounds were reserved. This activates them. - ToolBasedInjectionStrategy accepts optional reformulator and sufficiency_checker kwargs - _retrieve_with_reformulation orchestrates initial search + reformulation loop when enabled - _reformulation_loop runs up to max_reformulation_rounds: check sufficiency -> reformulate -> re-search -> merge results - _merge_results unions entries by ID, keeping higher-relevance on collision - Loop stops early on: sufficient results, reformulator returning None, reformulator returning the same query, or max rounds reached - Config validator relaxed: query_reformulation_enabled=True now allowed when strategy=TOOL_BASED (still rejected for CONTEXT/SELF_EDITING) - MEMORY_REFORMULATION_ROUND and MEMORY_REFORMULATION_SUFFICIENT events Part of #704.
…ry consolidation Pre-reviewed by 12 agents, 45+ findings addressed. ## Scope additions (issue #704 end-to-end wiring) - Wire capture_distillation into AgentEngine._post_execution_pipeline behind distillation_capture_enabled=False opt-in flag - Fetch distillation entries (tagged 'distillation') in LLMConsolidationStrategy and embed trajectory summaries in the synthesis system prompt - Rename DistillationRequest.retrieved_memory_ids -> memory_tool_invocations (field stored tool names, not memory IDs) ## Code fixes - llm_strategy: parallel per-category processing via asyncio.TaskGroup - llm_strategy: delete-before-store + per-delete try/except for partial failure safety; successful IDs only reported in ConsolidationResult - llm_strategy: explicit RetryExhaustedError handling with retryable ProviderError fallback for tests; non-retryable errors propagate - llm_strategy: concat fallback tagged 'concat-fallback' (not 'llm-synthesized') - llm_strategy: per-entry input cap to guard provider context limits - ranking: MMR sentinel = -math.inf (not -1.0) + pre-computed bigram cache - ranking: pre-raise log for apply_diversity_penalty ValueError - retrieval_config: diversity_penalty/TOOL_BASED warn validator, stale reformulation docstrings updated, personal_boost/RRF warning clarified - retriever: fail-closed on memory filter exceptions (no longer silently leaks unfiltered memories when a privacy filter raises) - tool_retriever: Search-and-Ask loop now wraps sufficiency_checker and reformulator calls in error isolation, emits MEMORY_REFORMULATION_FAILED / MEMORY_SUFFICIENCY_CHECK_FAILED, adds max-rounds-exhausted log - tool_retriever: explicit guard (was assert) for type narrowing - tool_retriever: all error strings prefixed with ERROR_PREFIX; rejected invalid categories surfaced back to the LLM for self-correction - tool_retriever: docstring covers reformulator/sufficiency_checker params - Pre-raise error logs at MemoryError/RecursionError handlers per CLAUDE.md ## Tests - RecursionError propagation for distillation + llm_strategy - capture_distillation store request shape assertion - LLMConsolidationStrategy: tiebreaker, None relevance, fallback truncation, synthesis content verbatim, fallback tag, partial delete failure - LLMConsolidationStrategy: trajectory context fetch, disabled mode, lookup failure - AgentEngine distillation capture wiring (new test file) - Reformulation dedup across rounds, sufficiency_checker/reformulator errors - Invalid category surfacing - MMR all-negative-score deterministic behavior - Diversity/TOOL_BASED warn validator - Memory filter fail-closed behavior - test_ranking.py split into test_ranking.py + test_ranking_diversity.py (was 823 lines, now 627 + 256) ## Docs - docs/design/memory.md: reformulation wired-in, LLMConsolidationStrategy, DistillationRequest capture, MMR diversity re-ranking, fail-closed filter - CLAUDE.md: memory package structure updated, events.consolidation examples Closes #704
**Data safety (MAJOR):** Fix potential data loss in `LLMConsolidationStrategy._process_group` where entries were deleted BEFORE synthesis/store. A non-retryable `ProviderError` (e.g. `AuthenticationError`) raised by `_synthesize` would silently drop already-deleted entries with no summary stored. Reorder to synthesize-and-store first, then delete — synthesis failure now leaves all originals intact. Update docstring and add `test_no_deletes_on_synthesis_failure` to document the safety guarantee. **Observability (MEDIUM):** Promote `LLM_STRATEGY_SYNTHESIZED` log call from `logger.debug` to `logger.info` in `_synthesize`. Per-group synthesis is a significant memory-mutation event (originals are deleted and a summary is stored); operators need INFO-level visibility for production auditing. **Misleading event name (MAJOR):** `apply_diversity_penalty` was logging a validation error (`diversity_lambda out of range`) using the success event `MEMORY_DIVERSITY_RERANKED`. Add new constant `MEMORY_DIVERSITY_RERANK_FAILED` to `events/memory.py`, update the import in `ranking.py`, and update CLAUDE.md event reference list. https://claude.ai/code/session_015u2sAJYren3pzTAKqeqw3Q
- llm_strategy.py: add logger.error before re-raising MemoryError/ RecursionError in _fetch_trajectory_context and _process_group (CLAUDE.md: all error paths must log before raising) - agent_engine.py: remove incorrect `# pragma: no cover` from the `_memory_backend is None` guard in _try_capture_distillation; the path is exercised by test_distillation_without_backend_is_noop - test_distillation.py: move test_error_without_message_falls_through from TestCaptureDistillation to TestBuildOutcome and convert from async def to def (tests a synchronous function) https://claude.ai/code/session_018t73D4YLoCph1fLFqX6gR6
- llm_strategy.py: log ExceptionGroup before re-raising in except* handlers so operators can diagnose multi-task failures (CodeRabbit follow-up #4058984940) - retriever.py: run MemoryFilterStrategy BEFORE MMR diversity re-ranking so privacy-excluded entries are not used as MMR anchors (CodeRabbit #4058864502) - distillation.py: store tool invocation names in content instead of bare count so LLM consolidation has richer trajectory context - test_distillation.py: remove spurious async from synchronous test helper test_error_without_message_falls_through (CodeRabbit async test cleanup) - docs/design/memory.md: update pipeline step ordering to match implementation https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
- distillation.py: fix grammar bug -- produce "1 turn" (singular) vs "N turns" (plural) in build_trajectory_summary; previously always produced "1 turns" for single-turn executions - test_distillation.py: update assertion to match corrected output, hoist deferred MemoryCategory import to module level, fix import order - test_tool_retriever.py: hoist repeated per-method _merge_results imports to module level (5 identical deferred imports removed) - test_llm_strategy.py: collapse test_empty_llm_response_falls_back and test_whitespace_llm_response_falls_back into a single @pytest.mark.parametrize test per CLAUDE.md conventions https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
…tillation capture Two correctness fixes surfaced by post-push review agent analysis: - memory/consolidation/llm_strategy.py: _fallback_summary accessed entries[0].category.value unconditionally; if _synthesize is called with an empty list (possible when a backend dedup bug yields all-same-ID entries), this raises IndexError. Guard with early `if not entries: return ""` before the header line. - engine/agent_engine.py: _try_capture_distillation declared agent_id/task_id as plain str but passed them to capture_distillation which requires NotBlankStr. Added TypeAdapter(NotBlankStr).validate_python() coercion so blank-string caller bugs are caught at the boundary rather than silently bypassing the NotBlankStr constraint on DistillationRequest fields. https://claude.ai/code/session_011YwDu4DnVXM4yt3w3SVY1t
Addresses 33 findings from local review agents (14) + CodeRabbit, Gemini, Copilot on PR #1071. Follow-up items tracked in #1075. ## Behavior + correctness - `_MIN_GROUP_THRESHOLD` raised from 2 to 3: at threshold 2, `_select_entries` kept 1 entry and `_synthesize` received 1, so the LLM could not dedup against the retained entry -- "consolidation" was just a paraphrase of a single entry. - Non-retryable `ProviderError` in `_synthesize` is now logged at ERROR with agent/category/model context before re-raise (was silently propagated with no log). - `LLM_STRATEGY_SYNTHESIZED` emits AFTER `backend.store()` succeeds, includes the persisted `summary_id`, fires from `_process_group` rather than inside `_synthesize`. - `ConsolidationResult.summary_id` -> `summary_ids: tuple[NotBlankStr, ...]` so multi-category runs surface every summary produced, not just the last one; scalar `summary_id` retained as a `@computed_field` returning the last element. `SimpleConsolidationStrategy` and `DualModeConsolidationStrategy` updated to populate the plural field. - `_fetch_trajectory_context` failure path promoted from DEBUG to WARNING (ops invisibility at default INFO log level). - `apply_diversity_penalty` success path promoted from DEBUG to INFO to match other state-transition events; added `similarity` field so bigram-cached vs custom paths are distinguishable. - New `LLM_STRATEGY_ERROR` event; `LLM_STRATEGY_FALLBACK` is now reserved for genuine concat-fallback returns, and error paths (TaskGroup unwrap, trajectory fetch, delete failures, non-retryable provider errors) emit the dedicated error event so dashboards keyed on fallback do not fire on unrelated errors. - `distillation.capture_distillation` now persists `Task ID:` in the EPISODIC store content so downstream readers can correlate entries by task, and the docstring is explicit about the system-error propagation contract. - `ToolBasedInjectionStrategy.__init__` now raises when `config.query_reformulation_enabled` is set but either `reformulator` or `sufficiency_checker` is missing (silent no-op was a configuration trap). - `MemoryRetrievalConfig` diversity-penalty validator now RAISES on non-CONTEXT strategy (was WARN) -- symmetric with the reformulation validator; silent no-op on deployment was worse than a hard error. - Search-and-Ask loop: backend-retrieve failures mid-loop degrade to the accumulated entries (was propagating and discarding them), and results are truncated to `limit` on return so the tool contract is honoured regardless of how many rounds ran. - Added `MEMORY_REFORMULATION_EXHAUSTED` event for max-rounds and reformulator-stopped terminal states (was conflated with per-round `MEMORY_REFORMULATION_ROUND`). - `apply_diversity_penalty` filter-before-MMR ordering preserved (already in base branch) and now extracted into a dedicated `_filter_or_fail_closed` helper for clarity. - `_try_capture_distillation` skip paths now log at DEBUG (`capture_disabled`, `no_memory_backend`) so operators can debug missing distillation entries. ## Security hardening - Prompt-injection defence: each memory entry in the synthesis user prompt is wrapped in `<entry category="...">...</entry>` with internal delimiter escaping, and the system prompt explicitly instructs the model to treat tag content as data, not instructions. Trajectory context snippets get the same treatment. - Total concatenated user-content length capped at ~20000 chars (`_MAX_TOTAL_USER_CONTENT_CHARS`) with WARNING-logged truncation so oversized groups do not blow out context windows or cost budgets. ## Refactors for CLAUDE.md function-length rule (<50 lines) - `LLMConsolidationStrategy.consolidate` (124 -> thin orchestrator over `_build_groups`, `_run_groups`, `_log_taskgroup_failure`, `_assemble_result`). - `_synthesize` (~130 -> split across `_build_user_prompt`, `_call_llm`, `_handle_provider_error`, `_build_system_prompt`, `_fallback_summary`). - `_process_group` split into `_store_summary` + `_delete_consolidated`. - `capture_distillation` split into `_build_distillation_request` + `_render_store_content`. - `ToolBasedInjectionStrategy._reformulation_loop` split into `_run_reformulation_step` + `_check_sufficiency` + `_reformulate` + `_retrieve_round` helpers. - `_handle_search` split with `_safe_search` error-isolation helper. - `ContextInjectionStrategy._execute_pipeline` extracts `_filter_or_fail_closed`. ## Documentation - `AgentEngine` class docstring now documents `distillation_capture_enabled` and notes that `memory_backend` is used by both procedural memory and distillation. - `MemoryRetrievalConfig` docstring Attributes block includes the four new fields (`diversity_penalty_enabled`, `diversity_lambda`, `query_reformulation_enabled`, `max_reformulation_rounds`). - `docs/design/memory.md` Consolidation Strategies row expanded with fallback paths, kept-entry behaviour, trajectory context limits, prompt-injection defence, and multi-summary accessor note. - Module docstring of `distillation.py` clarifies "tool *names*, not entry IDs". - `TestExtractMemoryIds` test class renamed to `TestExtractMemoryToolInvocations` (stale name after field rename). ## Tests - New `tests/unit/memory/test_retriever_diversity.py` with integration tests that verify `diversity_penalty_enabled=True` is actually wired into `ContextInjectionStrategy._execute_pipeline` and produces a different ordering from the disabled case. - New `test_store_failure_after_synthesis_propagates_no_deletes` test guards the synthesize-before-delete invariant against future regressions. - New `test_trajectory_context_entry_limit_and_truncation` test pins `_MAX_TRAJECTORY_CONTEXT_ENTRIES=5` and per-entry truncation to `_MAX_TRAJECTORY_CHARS_PER_ENTRY=500`. - Updated `test_llm_strategy.py` threshold tests for the new `_MIN_GROUP_THRESHOLD=3` rule. - Updated `test_retrieval_config.py` diversity validator test for the new raise semantics. - Updated `test_tool_retriever.py` to exercise the new ctor-time reformulator/sufficiency-checker guard. - Updated `test_distillation.py` store-shape assertion for the new `Task ID:` line. - Added `test_multi_summary` + `test_summary_ids_reject_duplicates` coverage for the `ConsolidationResult.summary_ids` plural field. ## Test + lint status - `uv run ruff check src/ tests/` -- clean - `uv run mypy src/ tests/` -- 1539 source files, no issues - `uv run python -m pytest tests/unit/ -n 8` -- 14154 passed, 5 skipped Closes #704.
e659544 to
65df4b3
Compare
There was a problem hiding this comment.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/synthorg/memory/consolidation/service.py (1)
116-122: 🛠️ Refactor suggestion | 🟠 MajorUse
model_copy(update=...)to evolveConsolidationResultinstead of field-by-field reconstruction.This manual rebuild is drift-prone:
summary_idswas previously dropped here, and the next field added toConsolidationResultwill be easy to lose again. Usingmodel_copy(update=...)preserves all fields automatically and aligns with the established pattern throughout the codebase for evolving frozen Pydantic models.Suggested change
- result = ConsolidationResult( - removed_ids=result.removed_ids, - summary_ids=result.summary_ids, - archived_count=archived, - mode_assignments=result.mode_assignments, - archival_index=index, - ) + result = result.model_copy( + update={ + "archived_count": archived, + "archival_index": index, + } + )Per coding guidelines: "Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (using
model_copy(update=...)) for runtime state that evolves".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/consolidation/service.py` around lines 116 - 122, Replace the manual field-by-field reconstruction of the ConsolidationResult with a model_copy(update=...) call on the existing result object to preserve all fields automatically; e.g. call result = result.model_copy(update={"archived_count": archived, "archival_index": index}) instead of creating a new ConsolidationResult(...), so fields like summary_ids and any future fields are not accidentally dropped (use the existing result variable and the ConsolidationResult model_copy method).src/synthorg/memory/tool_retriever.py (1)
239-256:⚠️ Potential issue | 🟠 MajorValidate
queryas a string before returning it.This helper can currently hand back
123or{...}asquery_textbecause it only checks truthiness plusstr(...).strip()._handle_search()then callslen(query_text)on Line 441 and can raise before_safe_search()converts the bad tool call intoError: query must be a non-empty string.Normalize to a strippedstror reject non-strings here.🛠️ Suggested fix
- query_text = arguments.get("query", "") - if not query_text or not str(query_text).strip(): + query_raw = arguments.get("query", "") + if not isinstance(query_raw, str): + return None, 0, None, () + query_text = query_raw.strip() + if not query_text: return None, 0, None, ()Also applies to: 430-441
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/memory/tool_retriever.py` around lines 239 - 256, The helper currently only checks truthiness and str(...).strip(), allowing non-string query values like ints or dicts to slip through and later cause len(query_text) errors in _handle_search; update the validation around query_text so you first check its type and normalize: if arguments.get("query") is not an instance of str, return the same failure tuple (None, 0, None, ()); otherwise set query_text = arguments["query"].strip() and then proceed with the empty check and remaining logic; reference the local variable query_text and the downstream callers like _handle_search/_safe_search when making this change.
♻️ Duplicate comments (2)
src/synthorg/engine/agent_engine.py (1)
510-534:⚠️ Potential issue | 🟠 MajorBudget and fatal exits still bypass distillation capture.
_try_capture_distillation()only runs from_post_execution_pipeline(). TheBudgetExhaustedErrorand generic fatal branches return directly fromrun(), so those completion outcomes still never create distillation entries.🛠️ One straightforward fix
except BudgetExhaustedError as exc: - return self._handle_budget_error( + result = self._handle_budget_error( exc=exc, identity=identity, task=task, agent_id=agent_id, task_id=task_id, duration_seconds=time.monotonic() - start, ctx=ctx, system_prompt=system_prompt, ) + await self._try_capture_distillation( + result.execution_result, + agent_id, + task_id, + ) + return result except Exception as exc: - return await self._handle_fatal_error( + result = await self._handle_fatal_error( exc=exc, identity=identity, task=task, agent_id=agent_id, task_id=task_id, duration_seconds=time.monotonic() - start, ctx=ctx, system_prompt=system_prompt, completion_config=completion_config, effective_autonomy=effective_autonomy, provider=provider, ) + await self._try_capture_distillation( + result.execution_result, + agent_id, + task_id, + ) + return resultAlso applies to: 717-721
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/synthorg/engine/agent_engine.py` around lines 510 - 534, The BudgetExhaustedError and generic Exception branches in run() return via _handle_budget_error and _handle_fatal_error without ever invoking _try_capture_distillation(), so distillation entries are never created; update those exception handlers in run() to call _try_capture_distillation(...) (with the same args used in normal completion) before returning, or refactor so both _handle_budget_error and _handle_fatal_error themselves invoke _try_capture_distillation() (or call _post_execution_pipeline() which calls _try_capture_distillation()), ensuring that any early-return paths (BudgetExhaustedError branch and the generic Exception branch) perform the same distillation capture as the normal flow.tests/unit/memory/test_tool_retriever.py (1)
691-709: 🧹 Nitpick | 🔵 TrivialCollapse the collision cases into one parametrized test.
These three methods only vary the input scores. One parametrized test keeps the same coverage and is easier to extend with the remaining edge cases here.
♻️ Proposed refactor
- def test_higher_relevance_wins_on_collision(self) -> None: - low = _make_entry(entry_id="dup", relevance_score=0.3) - high = _make_entry(entry_id="dup", relevance_score=0.9) - result = _merge_results((low,), (high,)) - assert len(result) == 1 - assert result[0].relevance_score == 0.9 - - def test_existing_wins_when_new_is_lower(self) -> None: - high = _make_entry(entry_id="dup", relevance_score=0.9) - low = _make_entry(entry_id="dup", relevance_score=0.3) - result = _merge_results((high,), (low,)) - assert len(result) == 1 - assert result[0].relevance_score == 0.9 - - def test_none_relevance_treated_as_zero(self) -> None: - none_rel = _make_entry(entry_id="dup", relevance_score=None) - with_rel = _make_entry(entry_id="dup", relevance_score=0.1) - result = _merge_results((none_rel,), (with_rel,)) - assert result[0].relevance_score == 0.1 + `@pytest.mark.parametrize`( + ("existing_score", "new_score", "expected_score"), + [ + pytest.param(0.3, 0.9, 0.9, id="new_higher_wins"), + pytest.param(0.9, 0.3, 0.9, id="existing_higher_kept"), + pytest.param(None, 0.1, 0.1, id="none_treated_as_zero"), + ], + ) + def test_collision_relevance_resolution( + self, + existing_score: float | None, + new_score: float | None, + expected_score: float | None, + ) -> None: + existing = _make_entry(entry_id="dup", relevance_score=existing_score) + new = _make_entry(entry_id="dup", relevance_score=new_score) + result = _merge_results((existing,), (new,)) + assert len(result) == 1 + assert result[0].relevance_score == expected_scoreAs per coding guidelines, "Parametrize: Prefer
@pytest.mark.parametrizefor testing similar cases".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/memory/test_tool_retriever.py` around lines 691 - 709, Collapse the three tests into one parametrized pytest test that exercises _merge_results via inputs built with _make_entry: use `@pytest.mark.parametrize` with cases for (existing_entries, new_entries, expected_score) covering (low then high => 0.9), (high then low => 0.9), and (none then 0.1 => 0.1); in the test call _merge_results(existing_entries, new_entries), assert len(result) == 1 and result[0].relevance_score == expected_score so all three behaviors (higher relevance wins, existing wins when new is lower, None treated as zero) are covered in a single data-driven test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/design/memory.md`:
- Around line 419-423: Clarify the failure contract for LLMConsolidationStrategy
by explicitly separating which errors trigger the concat-fallback path versus
which errors propagate: state that concat-fallback is invoked for
RetryExhaustedError, retryable ProviderError, empty LLM response, and unexpected
non-ProviderError exceptions (logged at WARNING and tagged "concat-fallback"),
whereas non-retryable ProviderError is logged at ERROR and propagated, and
system errors (MemoryError, RecursionError) likewise propagate; reference
LLMConsolidationStrategy, RetryExhaustedError, ProviderError, the
"llm-synthesized" tag, and the "concat-fallback" tag when making the wording
change.
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 378-405: The code currently deletes every entry returned by
_select_entries even when _build_user_prompt dropped some due to
_MAX_TOTAL_USER_CONTENT_CHARS; update the flow so only entries actually included
in the LLM prompt are deleted and omitted entries are preserved or processed in
subsequent chunks. Concretely: have _build_user_prompt (or _select_entries)
return both included_entries and omitted_entries (or make _synthesize return
which items were consumed), change _process_group to pass the included subset
into _synthesize and only pass the included_ids into _delete_consolidated; if
omitted_entries remain, loop to synthesize/delete them in additional chunks or
queue them for a fallback summary. Apply the same change to the other occurrence
noted (around the second block referencing lines 543-569) so deletion only
targets summarized items and omitted items are retained/processed.
- Around line 330-335: The retrieve call assumes recency ordering but
MemoryBackend.retrieve() returns relevance-ordered; change the logic around
MemoryQuery/ self._backend.retrieve to fetch the distillation entries
(remove/raise the limit so you get all relevant metadata results), then locally
sort the returned entries by their metadata.created_at in descending order and
slice to _MAX_TRAJECTORY_CONTEXT_ENTRIES before returning; reference
MemoryQuery, _DISTILLATION_TAG, _MAX_TRAJECTORY_CONTEXT_ENTRIES, and
self._backend.retrieve and ensure you use the metadata.created_at field for the
sort.
In `@src/synthorg/memory/consolidation/models.py`:
- Around line 109-112: Add a Pydantic "before" validator on the summary_ids
field to accept legacy summary_id input and map it into summary_ids;
specifically, add a `@field_validator`("summary_ids", mode="before") (or
equivalent) on the ConsolidationResult model that checks incoming data for a
"summary_id" key (or a single string), converts it into a one-element tuple of
NotBlankStr when summary_ids is empty/not provided, and otherwise returns the
original summary_ids value unchanged so existing construction via summary_ids
continues to work.
In `@src/synthorg/memory/ranking.py`:
- Around line 473-481: The guard in apply_diversity_penalty() allows
float("nan") through because NaN comparisons are false; explicitly check for NaN
(e.g., using math.isnan(diversity_lambda) or the self-inequality check) before
the existing range check and treat it like an invalid value: call logger.warning
with MEMORY_DIVERSITY_RERANK_FAILED and the same param/value/reason information
and then raise ValueError(msg) so NaN no longer bypasses the validation for
diversity_lambda.
In `@src/synthorg/memory/retrieval_config.py`:
- Around line 153-161: The docstring for the iterative reformulation flag is out
of date: replace the phrase that it "silently no-ops if either is missing" with
wording that it will raise a constructor-time error when either `reformulator`
or `sufficiency_checker` is not provided (fail-fast behavior) for
`ToolBasedInjectionStrategy` construction; keep the rest of the description
about enabling iterative reformulation, `max_reformulation_rounds`, requirement
that this applies only to the `TOOL_BASED` strategy, and that a validator
rejects this flag with strategies other than `TOOL_BASED`, and reference
`ToolBasedInjectionStrategy` and the constructor validation behavior to locate
the relevant code.
In `@src/synthorg/memory/tool_retriever.py`:
- Around line 165-216: The current _merge_results keeps first-round ordering and
appends later unseen entries so _truncate_entries may drop better later results;
change the merge to perform a cross-round re-ranking before truncation: combine
existing and new into a single map by id (as _merge_results already does), then
produce a list of unique MemoryEntry values sorted by (relevance_score or 0.0)
descending with a stable tiebreaker (e.g., original appearance order or a
timestamp) so later higher-relevance unseen entries can displace lower-relevance
earlier ones, and then let _truncate_entries enforce the caller limit; update
_merge_results (and any callers relying on its ordering) to accept this
re-ranked ordering while still deduping by MemoryEntry.id and preserving
stability for equal scores.
In `@tests/unit/memory/test_ranking_diversity.py`:
- Around line 179-192: The test test_custom_similarity_fn_used is a false
positive because single-word contents yield default bigram Jaccard 0.0; update
the test so the inputs are multi-word (e.g., "foo bar" vs "baz bar") or
otherwise ensure default similarity would be non-zero, then assert that passing
similarity_fn=lambda _x,_y: 1.0 changes ranking; modify the call that constructs
entries via _make_scored (entries a and b) and the apply_diversity_penalty
invocation to use multi-word content so the custom similarity_fn is actually
exercised.
In `@tests/unit/memory/test_retriever_diversity.py`:
- Around line 146-169: The test test_diversity_penalty_disabled_skips_mmr
currently only checks ordering and can pass even if MMR ran; update it to spy on
synthorg.memory.retriever.apply_diversity_penalty and assert it is never called
when MemoryRetrievalConfig(diversity_penalty_enabled=False) is used. Use a
mocking/spying tool (e.g., unittest.mock.patch or pytest-mock) to patch
apply_diversity_penalty before creating ContextInjectionStrategy and calling
strategy.prepare_messages, then after the call assert the spy/mock's
assert_not_called (or equivalent) to verify apply_diversity_penalty was skipped.
Ensure the spy references the exact symbol name apply_diversity_penalty to catch
calls from ContextInjectionStrategy.
---
Outside diff comments:
In `@src/synthorg/memory/consolidation/service.py`:
- Around line 116-122: Replace the manual field-by-field reconstruction of the
ConsolidationResult with a model_copy(update=...) call on the existing result
object to preserve all fields automatically; e.g. call result =
result.model_copy(update={"archived_count": archived, "archival_index": index})
instead of creating a new ConsolidationResult(...), so fields like summary_ids
and any future fields are not accidentally dropped (use the existing result
variable and the ConsolidationResult model_copy method).
In `@src/synthorg/memory/tool_retriever.py`:
- Around line 239-256: The helper currently only checks truthiness and
str(...).strip(), allowing non-string query values like ints or dicts to slip
through and later cause len(query_text) errors in _handle_search; update the
validation around query_text so you first check its type and normalize: if
arguments.get("query") is not an instance of str, return the same failure tuple
(None, 0, None, ()); otherwise set query_text = arguments["query"].strip() and
then proceed with the empty check and remaining logic; reference the local
variable query_text and the downstream callers like _handle_search/_safe_search
when making this change.
---
Duplicate comments:
In `@src/synthorg/engine/agent_engine.py`:
- Around line 510-534: The BudgetExhaustedError and generic Exception branches
in run() return via _handle_budget_error and _handle_fatal_error without ever
invoking _try_capture_distillation(), so distillation entries are never created;
update those exception handlers in run() to call _try_capture_distillation(...)
(with the same args used in normal completion) before returning, or refactor so
both _handle_budget_error and _handle_fatal_error themselves invoke
_try_capture_distillation() (or call _post_execution_pipeline() which calls
_try_capture_distillation()), ensuring that any early-return paths
(BudgetExhaustedError branch and the generic Exception branch) perform the same
distillation capture as the normal flow.
In `@tests/unit/memory/test_tool_retriever.py`:
- Around line 691-709: Collapse the three tests into one parametrized pytest
test that exercises _merge_results via inputs built with _make_entry: use
`@pytest.mark.parametrize` with cases for (existing_entries, new_entries,
expected_score) covering (low then high => 0.9), (high then low => 0.9), and
(none then 0.1 => 0.1); in the test call _merge_results(existing_entries,
new_entries), assert len(result) == 1 and result[0].relevance_score ==
expected_score so all three behaviors (higher relevance wins, existing wins when
new is lower, None treated as zero) are covered in a single data-driven test.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 28d77bd7-7f7b-4252-a669-72e19566a682
📒 Files selected for processing (29)
CLAUDE.mddocs/design/memory.mdsrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/distillation.pysrc/synthorg/memory/consolidation/dual_mode_strategy.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/models.pysrc/synthorg/memory/consolidation/service.pysrc/synthorg/memory/consolidation/simple_strategy.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/retriever.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/tools.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/observability/events/memory.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/consolidation/test_models.pytests/unit/memory/consolidation/test_service.pytests/unit/memory/test_memory_tools.pytests/unit/memory/test_ranking.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_retriever.pytests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Test (Python 3.14)
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Build Sandbox
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations-- Python 3.14 has PEP 649
Use PEP 758 except syntax: useexcept A, B:(no parentheses) -- ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one model
Models: Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Useallow_inf_nan=Falsein allConfigDictdeclarations to rejectNaN/Infin numeric fields at validation time
Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Async concurrency: preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Errors: handle explicitly, never silently swallow
Validate: at system boundaries (user input, external APIs, config files)
Use ruff check and ruff format for li...
Files:
src/synthorg/memory/consolidation/service.pytests/unit/memory/test_ranking.pytests/unit/memory/consolidation/test_service.pytests/unit/memory/test_memory_tools.pytests/unit/memory/test_retriever.pysrc/synthorg/memory/tools.pysrc/synthorg/memory/consolidation/dual_mode_strategy.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/simple_strategy.pysrc/synthorg/memory/consolidation/models.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_ranking_diversity.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/memory.pytests/unit/memory/test_retrieval_config.pysrc/synthorg/observability/events/consolidation.pytests/unit/memory/test_retriever_diversity.pytests/unit/memory/consolidation/test_models.pysrc/synthorg/engine/agent_engine.pytests/unit/memory/consolidation/test_distillation.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/distillation.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_llm_strategy.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/setup.py,observability/sinks.py,observability/syslog_handler.py, andobservability/http_handler.pymay use stdlibloggingandprint(..., file=sys.stderr))
Variable name for logger: alwayslogger(not_logger, notlog)
Event names: always use constants from the domain-specific module undersynthorg.observability.events(imported directly, never as string literals)
Structured kwargs: alwayslogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases
Files:
src/synthorg/memory/consolidation/service.pysrc/synthorg/memory/tools.pysrc/synthorg/memory/consolidation/dual_mode_strategy.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/simple_strategy.pysrc/synthorg/memory/consolidation/models.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/distillation.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/consolidation/service.pysrc/synthorg/memory/tools.pysrc/synthorg/memory/consolidation/dual_mode_strategy.pysrc/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/simple_strategy.pysrc/synthorg/memory/consolidation/models.pysrc/synthorg/memory/retriever.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/distillation.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must usetest-provider,test-small-001, etc. instead of real vendor names
Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Async:asyncio_mode = "auto"-- no manual@pytest.mark.asyncioneeded
Timeout: 30 seconds per test (global inpyproject.toml-- do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed)
Parametrize: Prefer@pytest.mark.parametrizefor testing similar cases
Property-based testing with Hypothesis: use@given+@settings, Hypothesis profiles intests/conftest.pywithci,dev,fuzz,extremeprofiles controlled viaHYPOTHESIS_PROFILEenv var
When Hypothesis finds a failure, it is a real bug -- fix the underlying bug and add an explicit@example(...)decorator to the test
Flaky tests: NEVER skip, dismiss, or ignore -- always fix them fully. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number)
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/consolidation/test_service.pytests/unit/memory/test_memory_tools.pytests/unit/memory/test_retriever.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_retriever_diversity.pytests/unit/memory/consolidation/test_models.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_llm_strategy.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/consolidation/test_service.pytests/unit/memory/test_memory_tools.pytests/unit/memory/test_retriever.pytests/unit/engine/test_agent_engine_distillation.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_retriever_diversity.pytests/unit/memory/consolidation/test_models.pytests/unit/memory/consolidation/test_distillation.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_llm_strategy.py
🧠 Learnings (48)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability package (observability/): structured logging, correlation tracking, log sinks; event constants organized by domain under observability/events/ (e.g., events.api, events.tool, events.git, events.context_budget, events.backup)
Applied to files:
src/synthorg/memory/tools.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-17T06:30:14.180Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:30:14.180Z
Learning: Applies to src/synthorg/observability/**/*.py : Observability includes structured logging via `get_logger(__name__)`, correlation tracking, and log sinks.
Applied to files:
src/synthorg/memory/tools.py
📚 Learning: 2026-03-31T16:09:24.320Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T16:09:24.320Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from `synthorg.observability.events.<domain>` modules (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly and use in structured logging
Applied to files:
src/synthorg/memory/tools.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly from the domain module
Applied to files:
src/synthorg/memory/tools.pysrc/synthorg/observability/events/memory.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-18T21:23:23.586Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-18T21:23:23.586Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under synthorg.observability.events (e.g., API_REQUEST_STARTED from events.api, TOOL_INVOKE_START from events.tool). Import directly from synthorg.observability.events.<domain>.
Applied to files:
src/synthorg/memory/tools.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from synthorg.observability.events domain-specific modules (e.g., PROVIDER_CALL_START from events.provider). Import directly: from synthorg.observability.events.<domain> import EVENT_CONSTANT.
Applied to files:
src/synthorg/memory/tools.pysrc/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
src/synthorg/memory/consolidation/__init__.pysrc/synthorg/memory/consolidation/simple_strategy.pytests/unit/memory/test_ranking_diversity.pysrc/synthorg/memory/retriever.pyCLAUDE.mdsrc/synthorg/observability/events/memory.pytests/unit/memory/test_retriever_diversity.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/tool_retriever.pydocs/design/memory.mdsrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/distillation.pytests/unit/memory/test_tool_retriever.pytests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/engine/**/*.py : Engine package (engine/): agent orchestration, parallel execution, task decomposition, routing, TaskEngine (centralized single-writer), task lifecycle/recovery/shutdown, workspace isolation, coordination (4 dispatchers: SAS/centralized/decentralized/context-dependent, wave execution), approval gates (escalation detection, context parking/resume), stagnation detection (ToolRepetitionDetector, corrective prompt injection), AgentRuntimeState (execution status), context budget management, conversation compaction (oldest-turns summarizer)
Applied to files:
tests/unit/engine/test_agent_engine_distillation.pyCLAUDE.mdsrc/synthorg/engine/agent_engine.pysrc/synthorg/memory/consolidation/distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Engine: Agent orchestration, execution loops, parallel execution, task decomposition, routing, task assignment, centralized single-writer task state engine (TaskEngine), task lifecycle, recovery, shutdown, workspace isolation, coordination (multi-agent pipeline: TopologyDispatcher protocol, 4 dispatchers — SAS/centralized/decentralized/context-dependent, wave execution, workspace lifecycle integration, CoordinationSectionConfig company config bridge, build_coordinator factory), coordination error classification, prompt policy validation, checkpoint recovery (checkpoint/, per-turn persistence, heartbeat detection, CheckpointRecoveryStrategy), approval gate (escalation detection, context parking/resume, EscalationInfo/ResumePayload models), stagnation detection (stagnation/, StagnationDetector protocol, ToolRepetitionDetector, dual-signal analysis, corrective prompt injection), agent runtime state (AgentRuntimeState, lightweight per-agent execution status for dashboard queries and recove...
Applied to files:
tests/unit/engine/test_agent_engine_distillation.pyCLAUDE.mdsrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to tests/**/*.py : Test markers: pytest.mark.unit, pytest.mark.integration, pytest.mark.e2e, pytest.mark.slow. Coverage: 80% minimum (enforced in CI).
Applied to files:
tests/unit/engine/test_agent_engine_distillation.py
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/**/*.py : Package structure: src/synthorg/ organized as: api/ (REST+WebSocket, Litestar), auth/ (auth subpackage), backup/ (scheduled/manual backups), budget/ (cost tracking, CFO), cli/ (superseded by Go CLI), communication/ (message bus, meetings), config/ (YAML loading), core/ (domain models, resilience config), engine/ (orchestration, task state, coordination, approval gates, stagnation detection, context budget, compaction), hr/ (hiring, performance, promotion), memory/ (pluggable backend, Mem0, retrieval, consolidation), persistence/ (operational data, SQLite, settings), observability/ (logging, correlation, sinks), providers/ (LLM abstraction, LiteLLM, auth types, presets, runtime CRUD), settings/ (runtime-editable, typed definitions, encryption, config bridge), security/ (SecOps, rule engine, output scanning, progressive trust, autonomy levels), templates/ (company templates, personalities), tools/ (registry, built-in tools, git, sandbox, code_runner, MCP...
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/core/**/*.py : Core module must contain shared domain models, base classes, resilience config (RetryConfig, RateLimiterConfig)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-14T15:43:05.601Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T15:43:05.601Z
Learning: Applies to docs/** : Docs source in docs/ (Markdown, built with Zensical); design spec in docs/design/ (7 pages: index, agents, organization, communication, engine, memory, operations)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:13:44.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:13:44.964Z
Learning: Applies to src/synthorg/hr/**/*.py : HR package (hr/): hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, LLM calibration, collaboration overrides, trend detection), promotion/demotion (criteria evaluation, approval strategies, model mapping)
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Documentation source in `docs/` (Markdown, built with Zensical). Design spec in `docs/design/` (7 pages: index, agents, organization, communication, engine, memory, operations). Architecture in `docs/architecture/` (overview, tech-stack, decision log). Roadmap in `docs/roadmap/`. Security in `docs/security.md`. Licensing in `docs/licensing.md`. Reference in `docs/reference/`. REST API reference in `docs/rest-api.md`. Library reference in `docs/api/` (auto-generated from docstrings). Custom templates in `docs/overrides/`. Config in `mkdocs.yml`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to docs/design/*.md : Design spec pages: 7 pages in `docs/design/` — index, agents, organization, communication, engine, memory, operations
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/hr/**/*.py : HR engine must provide: hiring, firing, onboarding, offboarding, agent registry, performance tracking (task metrics, collaboration scoring, trend detection), promotion/demotion
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger` (not `_logger`, not `log`).
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-17T06:43:14.114Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T06:43:14.114Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use `import logging` / `logging.getLogger()` / `print()` in application code. Variable name: always `logger`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`. Never use import logging / logging.getLogger() / print() in application code.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have `from synthorg.observability import get_logger` followed by `logger = get_logger(__name__)`.
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic MUST have: `from synthorg.observability import get_logger` then `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import `from synthorg.observability import get_logger` and define `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-03-19T11:33:01.580Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T11:33:01.580Z
Learning: Applies to src/synthorg/**/*.py : Every module with business logic must import logger via `from synthorg.observability import get_logger` and initialize with `logger = get_logger(__name__)`
Applied to files:
CLAUDE.md
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from the domain-specific module under `synthorg.observability.events` (imported directly, never as string literals)
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from the domain-specific module under `synthorg.observability.events` in logging calls
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:28:13.207Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:28:13.207Z
Learning: Applies to src/synthorg/**/*.py : Event names: always use constants from domain-specific modules under synthorg.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget, etc.). Import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/observability/events/memory.pysrc/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`); import directly: `from synthorg.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/synthorg/observability/events/consolidation.pysrc/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-20T21:44:04.528Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T21:44:04.528Z
Learning: Applies to src/synthorg/**/*.py : Always use event name constants from domain-specific modules under `synthorg.observability.events` (e.g., `API_REQUEST_STARTED` from `events.api`, `TOOL_INVOKE_START` from `events.tool`); import directly rather than using string literals
Applied to files:
src/synthorg/observability/events/consolidation.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`.
Applied to files:
src/synthorg/engine/agent_engine.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/consolidation/test_distillation.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pytests/unit/memory/consolidation/test_llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-04-05T07:25:14.510Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-05T07:25:14.510Z
Learning: Applies to **/*.py : Functions: < 50 lines, files < 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T06:24:56.341Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T06:24:56.341Z
Learning: Applies to src/synthorg/memory/**/*.py : Use MemoryBackend protocol with pluggable backends (Mem0 adapter available at backends/mem0/) for persistent agent memory
Applied to files:
src/synthorg/memory/consolidation/distillation.py
🪛 LanguageTool
docs/design/memory.md
[grammar] ~427-~427: Ensure spelling is correct
Context: ... 3 -- smaller groups cannot meaningfully dedup against the retained entry), `temperatu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (1)
src/synthorg/memory/tools.py (1)
37-45: No action required—the concern is based on a false premise.The code is safe from the original concern. All successful
search_memoryandrecall_memoryresponses pass through_format_entries(), which formats entries as[{category}]{score} {memory_content}or returns"No memories found."for empty results. User-controlled memory content never appears at the start of the response string. Error paths are explicitly marked withERROR_PREFIXor one of the predefined constants. The testtest_no_memories_found_is_not_error()confirms that the empty-results success case is correctly classified as non-error, and the prefix-based check is both sufficient and correct.
| summary_ids: tuple[NotBlankStr, ...] = Field( | ||
| default=(), | ||
| description="IDs of every summary entry produced during the run", | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python -m pip install --quiet 'pydantic>=2,<3'
python - <<'PY'
from pydantic import BaseModel, computed_field
class ConsolidationResult(BaseModel):
summary_ids: tuple[str, ...] = ()
`@computed_field`
`@property`
def summary_id(self) -> str | None:
return self.summary_ids[-1] if self.summary_ids else None
result = ConsolidationResult(summary_id="legacy-id")
print(result.model_dump())
assert result.summary_ids == ()
assert result.summary_id is None
print("verified: legacy summary_id input is silently ignored")
PYRepository: Aureliolo/synthorg
Length of output: 155
🏁 Script executed:
# First, locate and examine the file in question
fd "consolidation/models.py" --type fRepository: Aureliolo/synthorg
Length of output: 44
🏁 Script executed:
# Search for ConsolidationResult and summary_id usage patterns in the codebase
rg "ConsolidationResult" -A 3 -B 1 --type py | head -100Repository: Aureliolo/synthorg
Length of output: 7403
🏁 Script executed:
# Search for any calls using summary_id parameter
rg "summary_id\s*=" -B 2 -A 1 --type py | head -100Repository: Aureliolo/synthorg
Length of output: 3455
🏁 Script executed:
# Try different search approaches for the models file
find . -type f -name "models.py" -path "*/consolidation/*" 2>/dev/nullRepository: Aureliolo/synthorg
Length of output: 107
🏁 Script executed:
# Also search with rg for the file content around line 109-112
rg "summary_ids.*tuple\[NotBlankStr" -B 5 -A 5 --type pyRepository: Aureliolo/synthorg
Length of output: 877
🏁 Script executed:
# Check test_models.py to see what tests exist
rg "test_models\.py" --files | head -5Repository: Aureliolo/synthorg
Length of output: 117
🏁 Script executed:
# Read the models.py file to see the full ConsolidationResult class
cat -n ./src/synthorg/memory/consolidation/models.py | head -200Repository: Aureliolo/synthorg
Length of output: 8726
🏁 Script executed:
# Check for test_models.py more directly
find . -name "test_models.py" -path "*/consolidation/*" 2>/dev/nullRepository: Aureliolo/synthorg
Length of output: 110
🏁 Script executed:
# Search for any usage pattern that passes summary_id as a constructor argument
rg "ConsolidationResult\s*\(" -A 3 --type py | grep -E "(summary_id\s*=|ConsolidationResult\s*\()" | head -30Repository: Aureliolo/synthorg
Length of output: 2724
🏁 Script executed:
# Read test_models.py to see what backward compatibility tests exist
cat -n ./tests/unit/memory/consolidation/test_models.py | head -250Repository: Aureliolo/synthorg
Length of output: 9784
🏁 Script executed:
# Also search for any validators in the file that might handle legacy summary_id
rg "model_validator|summary_id" ./src/synthorg/memory/consolidation/models.py -B 2 -A 5Repository: Aureliolo/synthorg
Length of output: 2829
Add a mode="before" validator to handle legacy summary_id construction.
The current implementation silently ignores ConsolidationResult(summary_id="value"), resulting in summary_ids=() and summary_id=None. While no current code in the codebase uses this pattern, the docstring claims backward compatibility "with callers reading" the property—implying construction with summary_id= may have been part of the API contract. To preserve any legacy calling code, add a mode="before" validator that maps summary_id input to summary_ids.
♻️ Compatibility shim
+ `@model_validator`(mode="before")
+ `@classmethod`
+ def _coerce_legacy_summary_id(cls, data: object) -> object:
+ if not isinstance(data, dict):
+ return data
+ legacy_summary_id = data.get("summary_id")
+ if legacy_summary_id is None:
+ return data
+ if "summary_ids" in data:
+ msg = "provide either summary_id or summary_ids, not both"
+ raise ValueError(msg)
+ return {**data, "summary_ids": (legacy_summary_id,)}
+
`@model_validator`(mode="after")
def _validate_archival_consistency(self) -> Self: # noqa: C901🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/synthorg/memory/consolidation/models.py` around lines 109 - 112, Add a
Pydantic "before" validator on the summary_ids field to accept legacy summary_id
input and map it into summary_ids; specifically, add a
`@field_validator`("summary_ids", mode="before") (or equivalent) on the
ConsolidationResult model that checks incoming data for a "summary_id" key (or a
single string), converts it into a one-element tuple of NotBlankStr when
summary_ids is empty/not provided, and otherwise returns the original
summary_ids value unchanged so existing construction via summary_ids continues
to work.
## Critical
- Prompt-cap overflow no longer deletes unsummarized entries:
``_build_user_prompt`` now returns ``(prompt_text, included_entries)``
so ``_process_group`` knows exactly which entries were in the LLM
prompt and only deletes those. Dropped entries remain on the
backend for the next consolidation pass instead of being silently
lost when the group exceeds ``_MAX_TOTAL_USER_CONTENT_CHARS``.
``_synthesize`` now returns ``(summary, used_llm, summarized)``.
## Major
- ``_merge_results`` sorts by relevance after merging so later
reformulation rounds can actually surface improved matches. The
previous policy preserved first-round order and appended unseen
later hits at the end, which meant ``_truncate_entries`` always
dropped them -- Search-and-Ask was effectively a no-op when round 1
already returned ``limit`` entries.
- ``ConsolidationResult.model_config`` gained ``extra='forbid'`` so
legacy callers passing the retired ``summary_id=`` kwarg hit a
hard ``ValidationError`` instead of silently dropping the value
(the pre-alpha ``feedback_no_backward_compat`` rule applies).
- ``_fetch_trajectory_context`` over-fetches 4x then sorts locally by
``created_at`` descending before slicing to
``_MAX_TRAJECTORY_CONTEXT_ENTRIES``. The ``MemoryBackend.retrieve``
contract is relevance-ordered, not recency-ordered, so the previous
slice could feed stale task distillations into the synthesis prompt.
## Minor
- ``apply_diversity_penalty`` now rejects ``NaN``/``Inf`` for
``diversity_lambda`` via ``math.isfinite``. Previously ``float('nan')``
slipped past the ``<0 or >1`` guard because both comparisons are
false, and MMR silently returned a "successful" rerank.
- ``MemoryRetrievalConfig.query_reformulation_enabled`` field
description now documents the fail-fast constructor contract (was
still saying "silently no-ops if either is missing", which
contradicts the ctor validator landed in the previous commit).
- ``docs/design/memory.md`` Consolidation Strategies row separates
concat-fallback paths from propagating paths; previously
"retryable ProviderError surfaced directly" was inside the
fallback list and "non-retryable propagates" was in the same
sentence, making the failure contract ambiguous.
## Test updates
- ``test_retriever_diversity.py``: switch to ``monkeypatch``-based
stubs for ``apply_diversity_penalty`` so both enabled and disabled
cases prove the correct call path directly. The previous enabled
test relied on MMR tie-break details and was brittle; the disabled
test would pass even if the flag were ignored.
- ``test_ranking_diversity.py::test_custom_similarity_fn_used``:
replaced the no-op spy with one that records calls AND makes the
reordering observable. The previous test would pass even if
``similarity_fn`` were ignored entirely (both sides evaluated to
0.0 via the default bigram Jaccard on single-word inputs).
- ``test_tool_retriever.py::TestMergeResults``: parametrized the
collision tests into a single table-driven test, and added
``test_merge_reorders_by_relevance`` as a regression guard for the
new relevance-sorted merge policy.
- ``test_llm_strategy.py::test_trajectory_context_entry_limit_and_truncation``:
relaxed the ``query.limit`` assertion to ``>=`` since the new
over-fetch factor means the backend query asks for more than the
final slice size.
## Status
- ``uv run ruff check src/ tests/`` -- clean
- ``uv run mypy src/ tests/`` -- 1539 source files, no issues
- ``uv run python -m pytest tests/unit/ -n 8`` -- 14157 passed, 5 skipped
There was a problem hiding this comment.
Actionable comments posted: 7
♻️ Duplicate comments (1)
src/synthorg/memory/consolidation/models.py (1)
106-106:⚠️ Potential issue | 🟠 Major
summary_idno longer round-trips throughmodel_dump().
@computed_fieldserializessummary_id, butextra="forbid"rejects that same key on input.ConsolidationResult.model_validate(result.model_dump())will now fail, which is a nasty surprise for any persistence or copy-through-JSON path. Either keepsummary_idas a plain@property, or accept a narrowsummary_id -> summary_idsshim during validation.Expected result: the dumped payload contains
summary_id, and re-validating it raises an extra-field validation error.#!/bin/bash python - <<'PY' from pydantic import BaseModel, ConfigDict, ValidationError, computed_field class ConsolidationResult(BaseModel): model_config = ConfigDict(extra="forbid") summary_ids: tuple[str, ...] = () `@computed_field` `@property` def summary_id(self) -> str | None: return self.summary_ids[-1] if self.summary_ids else None result = ConsolidationResult(summary_ids=("summary-1",)) payload = result.model_dump() print(payload) try: ConsolidationResult.model_validate(payload) except ValidationError as exc: print(type(exc).__name__) print(exc) PYAlso applies to: 184-193
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/synthorg/memory/consolidation/llm_strategy.py`:
- Around line 199-205: The code calls _fetch_trajectory_context() before
determining whether any consolidation work exists, causing unnecessary backend
lookups; change the order so you call self._build_groups(entries) first, assign
to groups_to_process, return early if groups_to_process is empty (no-op), and
only then call await self._fetch_trajectory_context(agent_id) and proceed to
await self._run_groups(groups_to_process, agent_id, trajectory_context); update
references to trajectory_context and group_results accordingly to avoid
executing the fetch when there is nothing to consolidate.
In `@src/synthorg/memory/tool_retriever.py`:
- Around line 301-305: Update the public docstring for the class/constructor to
reflect the actual constructor contract: state that if
query_reformulation_enabled is True then both a QueryReformulator (reformulator)
and a sufficiency_checker are required and __init__ will raise a ValueError when
either collaborator is missing; do the same change for the second docstring
block that documents these args (mentions around the existing reformulator and
sufficiency_checker descriptions) and reference the __init__ behavior and the
query_reformulation_enabled flag explicitly.
- Around line 250-252: _parse_search_args() currently allows truthy non-string
values (e.g., {"query": 123}) which later causes errors in _handle_search();
update _parse_search_args() so it rejects non-string query values up-front:
replace the existing check with an explicit isinstance(query_text, str) test and
only accept when isinstance(...) and str(query_text).strip() is non-empty,
otherwise return the same sentinel (None, 0, None, ()). Apply the identical
change to the other parsing block mentioned (the similar logic around lines
448-453) so both _parse_search_args() and the second parser reject non-strings
before downstream code (_handle_search()/_safe_search()) uses len() or other
string ops.
- Around line 489-497: The except handlers that currently do "except
builtins.MemoryError, RecursionError: raise" need to log the error with
contextual fields (agent_id, tool, round, query_text/limit/categories where
available) at WARNING or ERROR before re-raising; update the handler around the
call to _retrieve_with_reformulation (and the analogous handlers at the other
mentioned sites) to catch (builtins.MemoryError, RecursionError) as e, call the
module logger (e.g., processLogger or the file's logger) with a clear message
and the context dict, then re-raise the original exception; also add the same
logging immediately before the unguarded initial retrieve in
_reformulation_loop() so fatal errors include agent/tool/round context.
In `@tests/unit/memory/consolidation/test_llm_strategy.py`:
- Around line 743-786: Update the
test_multi_category_groups_processed_in_parallel to actually assert parallelism
by making provider.complete block on an asyncio.Event barrier: create two
asyncio.Event instances (e.g., entered_event and release_event) and a shared
counter in the test; set provider.complete to a side_effect that increments the
counter, sets entered_event when counter==2, then await release_event.wait()
before returning _make_response(); after calling strategy.consolidate (or
immediately before releasing), await entered_event.wait() to ensure both group
calls have entered, assert the counter==2 (and provider.complete.call_count==2),
then set release_event to let both completes proceed and finally assert
backend.store.call_count and result.consolidated_count as before. This uses
provider.complete, asyncio.Event, and strategy.consolidate to enforce and verify
concurrency.
In `@tests/unit/memory/test_ranking_diversity.py`:
- Around line 231-234: The test test_invalid_lambda_raises only checks
out-of-range finite values; update its parametrization to include non-finite
lambdas (float("nan"), float("inf"), -float("inf")) so
apply_diversity_penalty(...) exercise the production guard that rejects NaN/Inf;
modify the `@pytest.mark.parametrize`("bad_lambda", ...) tuple in the
test_ranking_diversity.py test_invalid_lambda_raises to add these three
non-finite values alongside the existing bad_lambda cases to ensure the
ValueError branch for non-finite inputs is covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c47c5062-7d39-4dba-aee7-8f98e3615882
📒 Files selected for processing (10)
docs/design/memory.mdsrc/synthorg/memory/consolidation/llm_strategy.pysrc/synthorg/memory/consolidation/models.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/tool_retriever.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_tool_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: Build Sandbox
- GitHub Check: Build Web
- GitHub Check: Build Backend
- GitHub Check: Test (Python 3.14)
- GitHub Check: Dependency Review
- GitHub Check: Analyze (python)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Nofrom __future__ import annotations-- Python 3.14 has PEP 649
Use PEP 758 except syntax: useexcept A, B:(no parentheses) -- ruff enforces this on Python 3.14
Type hints: all public functions, mypy strict mode
Docstrings: Google style, required on public classes/functions (enforced by ruff D rules)
Immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries,BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement
Fordict/listfields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries (tool execution, LLM provider serialization, inter-agent delegation, serializing for persistence)
Config vs runtime state: frozen Pydantic models for config/identity; separate mutable-via-copy models (usingmodel_copy(update=...)) for runtime state that evolves
Never mix static config fields with mutable runtime fields in one model
Models: Pydantic v2 (BaseModel,model_validator,computed_field,ConfigDict). Useallow_inf_nan=Falsein allConfigDictdeclarations to rejectNaN/Infin numeric fields at validation time
Use@computed_fieldfor derived values instead of storing + validating redundant fields (e.g.TokenUsage.total_tokens)
UseNotBlankStr(fromcore.types) for all identifier/name fields -- including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants -- instead of manual whitespace validators
Async concurrency: preferasyncio.TaskGroupfor fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over barecreate_task.
Line length: 88 characters (ruff)
Functions: < 50 lines, files < 800 lines
Errors: handle explicitly, never silently swallow
Validate: at system boundaries (user input, external APIs, config files)
Use ruff check and ruff format for li...
Files:
tests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_ranking_diversity.pysrc/synthorg/memory/consolidation/models.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pytests/unit/memory/consolidation/test_llm_strategy.pysrc/synthorg/memory/tool_retriever.pytests/unit/memory/test_tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Tests must usetest-provider,test-small-001, etc. instead of real vendor names
Markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slow
Async:asyncio_mode = "auto"-- no manual@pytest.mark.asyncioneeded
Timeout: 30 seconds per test (global inpyproject.toml-- do not add per-filepytest.mark.timeout(30)markers; non-default overrides liketimeout(60)are allowed)
Parametrize: Prefer@pytest.mark.parametrizefor testing similar cases
Property-based testing with Hypothesis: use@given+@settings, Hypothesis profiles intests/conftest.pywithci,dev,fuzz,extremeprofiles controlled viaHYPOTHESIS_PROFILEenv var
When Hypothesis finds a failure, it is a real bug -- fix the underlying bug and add an explicit@example(...)decorator to the test
Flaky tests: NEVER skip, dismiss, or ignore -- always fix them fully. For timing-sensitive tests, mocktime.monotonic()andasyncio.sleep()to make them deterministic. For tasks that must block indefinitely until cancelled, useasyncio.Event().wait()instead ofasyncio.sleep(large_number)
Files:
tests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_tool_retriever.py
⚙️ CodeRabbit configuration file
Test files do not require Google-style docstrings on classes or functions -- ruff D rules are only enforced on src/. A bare
@settings() decorator with no arguments on Hypothesis property tests is a no-op and should not be suggested -- the HYPOTHESIS_PROFILE env var controls example counts via registered profiles, which@given() honors automatically.
Files:
tests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_ranking_diversity.pytests/unit/memory/consolidation/test_llm_strategy.pytests/unit/memory/test_tool_retriever.py
src/synthorg/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/synthorg/**/*.py: Every module with business logic MUST have:from synthorg.observability import get_loggerthenlogger = get_logger(__name__)
Never useimport logging/logging.getLogger()/print()in application code (exception:observability/setup.py,observability/sinks.py,observability/syslog_handler.py, andobservability/http_handler.pymay use stdlibloggingandprint(..., file=sys.stderr))
Variable name for logger: alwayslogger(not_logger, notlog)
Event names: always use constants from the domain-specific module undersynthorg.observability.events(imported directly, never as string literals)
Structured kwargs: alwayslogger.info(EVENT, key=value)-- neverlogger.info("msg %s", val)
All error paths must log at WARNING or ERROR with context before raising
All state transitions must log at INFO
DEBUG for object creation, internal flow, entry/exit of key functions
Pure data models, enums, and re-exports do NOT need logging
Vendor-agnostic everywhere: NEVER use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases
Files:
src/synthorg/memory/consolidation/models.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
src/**/*.py
⚙️ CodeRabbit configuration file
This project uses Python 3.14+ with PEP 758 except syntax: "except A, B:" (comma-separated, no parentheses) is correct and mandatory -- do NOT flag it as a typo or suggest parenthesized form. The "except builtins.MemoryError, RecursionError: raise" pattern is intentional project convention for system-error propagation. When evaluating the 50-line function limit, count only the function body excluding the signature lines, decorators, and docstring. Functions 1-5 lines over due to docstrings or multi-line signatures should not be flagged. Do not suggest extracting single-use helper functions called exactly once -- this reduces readability without improving maintainability.
Files:
src/synthorg/memory/consolidation/models.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pysrc/synthorg/memory/tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.py
🧠 Learnings (24)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
📚 Learning: 2026-03-19T07:12:14.508Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-19T07:12:14.508Z
Learning: Applies to src/synthorg/memory/**/*.py : Memory package (memory/): pluggable MemoryBackend protocol, backends/ (Mem0 adapter), retrieval pipeline (ranking, RRF fusion, injection, formatting, non-inferable filtering), shared org memory (org/), consolidation/archival (density-aware: DensityClassifier, AbstractiveSummarizer, ExtractivePreserver, DualModeConsolidationStrategy)
Applied to files:
tests/unit/memory/test_retriever_diversity.pytests/unit/memory/test_ranking_diversity.pysrc/synthorg/memory/ranking.pysrc/synthorg/memory/retrieval_config.pytests/unit/memory/consolidation/test_llm_strategy.pysrc/synthorg/memory/tool_retriever.pytests/unit/memory/test_tool_retriever.pysrc/synthorg/memory/consolidation/llm_strategy.pydocs/design/memory.md
📚 Learning: 2026-04-02T07:18:02.381Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-02T07:18:02.381Z
Learning: Applies to tests/**/*.py : Never skip, dismiss, or ignore flaky tests; fix them fully and fundamentally by mocking `time.monotonic()` and `asyncio.sleep()` for determinism or using `asyncio.Event().wait()` for indefinite blocking
Applied to files:
tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-31T21:07:37.470Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-31T21:07:37.470Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally; mock `time.monotonic()` and `asyncio.sleep()` for determinism; use `asyncio.Event().wait()` for indefinite blocking instead of `asyncio.sleep(large_number)`
Applied to files:
tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to tests/**/*.py : NEVER skip, dismiss, or ignore flaky tests — always fix them fully and fundamentally. For timing-sensitive tests, mock `time.monotonic()` and `asyncio.sleep()` to make them deterministic instead of widening timing margins
Applied to files:
tests/unit/memory/test_retriever_diversity.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. For derived values use `computed_field` instead of storing + validating redundant fields. Use `NotBlankStr` (from `core.types`) for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T11:41:02.964Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T11:41:02.964Z
Learning: Applies to src/**/*.py : Models: Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `computed_field` for derived values instead of storing + validating redundant fields. Use `NotBlankStr` for all identifier/name fields — including optional (`NotBlankStr | None`) and tuple (`tuple[NotBlankStr, ...]`) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-26T15:18:16.848Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-26T15:18:16.848Z
Learning: Applies to src/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`. Use `computed_field` for derived values instead of storing redundant fields. Use `NotBlankStr` for all identifier/name fields.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 with adopted conventions: use computed_field for derived values instead of storing + validating redundant fields; use NotBlankStr from core.types for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to **/*.py : Models: Pydantic v2 (BaseModel, model_validator, computed_field, ConfigDict). Use computed_field for derived values instead of storing + validating redundant fields. Use NotBlankStr (from core.types) for all identifier/name fields — including optional (NotBlankStr | None) and tuple (tuple[NotBlankStr, ...]) variants — instead of manual whitespace validators.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T19:14:27.144Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T19:14:27.144Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 BaseModel, model_validator, computed_field, ConfigDict.
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-15T18:42:17.990Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:42:17.990Z
Learning: Applies to src/synthorg/**/*.py : Use Pydantic v2 conventions: `BaseModel`, `model_validator`, `computed_field`, `ConfigDict`
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-04-01T15:36:39.993Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T15:36:39.993Z
Learning: Applies to src/**/*.py : Use Pydantic v2 (`BaseModel`, `model_validator`, `computed_field`, `ConfigDict`). Use `allow_inf_nan=False` in all `ConfigDict` declarations to reject `NaN`/`Inf` in numeric fields at validation time
Applied to files:
src/synthorg/memory/consolidation/models.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : Prefer `asyncio.TaskGroup` for fan-out/fan-in parallel operations in new code (e.g. multiple tool invocations, parallel agent calls). Prefer structured concurrency over bare `create_task`. Existing code is being migrated incrementally.
Applied to files:
tests/unit/memory/consolidation/test_llm_strategy.pysrc/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T19:13:34.746Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T19:13:34.746Z
Learning: Applies to src/synthorg/providers/**/*.py : Retryable errors (is_retryable=True): RateLimitError, ProviderTimeoutError, ProviderConnectionError, ProviderInternalError. Non-retryable errors raise immediately without retry. RetryExhaustedError signals that all retries failed — the engine layer catches this to trigger fallback chains.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pydocs/design/memory.md
📚 Learning: 2026-04-01T17:49:14.133Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T17:49:14.133Z
Learning: Applies to src/synthorg/{providers,engine}/**/*.py : Retryable errors are `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`; non-retryable errors raise immediately; `RetryExhaustedError` signals all retries failed
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pydocs/design/memory.md
📚 Learning: 2026-03-14T16:18:57.267Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-14T16:18:57.267Z
Learning: Applies to src/ai_company/!(observability)/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T11:18:48.128Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T11:18:48.128Z
Learning: Applies to src/synthorg/**/*.py : Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pydocs/design/memory.md
📚 Learning: 2026-03-15T16:55:07.730Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T16:55:07.730Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-15T18:38:44.202Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-15T18:38:44.202Z
Learning: Applies to src/synthorg/**/*.py : All error paths must log at WARNING or ERROR with context before raising
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-20T08:28:32.845Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-20T08:28:32.845Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors: `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately.
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.pydocs/design/memory.md
📚 Learning: 2026-04-01T06:12:03.047Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-01T06:12:03.047Z
Learning: Applies to {**/*.py,web/src/**/*.{ts,tsx}} : Keep functions under 50 lines and files under 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-16T07:22:28.134Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-16T07:22:28.134Z
Learning: Applies to **/*.py : Functions must be less than 50 lines; files must be less than 800 lines
Applied to files:
src/synthorg/memory/consolidation/llm_strategy.py
📚 Learning: 2026-03-17T22:08:13.456Z
Learnt from: CR
Repo: Aureliolo/synthorg PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-17T22:08:13.456Z
Learning: Applies to src/synthorg/**/*.py : `RetryConfig` and `RateLimiterConfig` are set per-provider in `ProviderConfig`. Retryable errors (`is_retryable=True`): `RateLimitError`, `ProviderTimeoutError`, `ProviderConnectionError`, `ProviderInternalError`. Non-retryable errors raise immediately without retry. `RetryExhaustedError` signals that all retries failed — the engine layer catches this to trigger fallback chains. Rate limiter respects `RateLimitError.retry_after` from providers — automatically pauses future requests.
Applied to files:
docs/design/memory.md
🪛 LanguageTool
docs/design/memory.md
[grammar] ~427-~427: Ensure spelling is correct
Context: ... 3 -- smaller groups cannot meaningfully dedup against the retained entry), `temperatu...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🔇 Additional comments (6)
src/synthorg/memory/consolidation/llm_strategy.py (1)
393-421: Store-first plus summarized-only deletion looks right.This preserves originals on synthesis/store failures and avoids deleting entries that were dropped by prompt capping.
tests/unit/memory/test_retriever_diversity.py (1)
48-171: Nice deterministic wiring test.Patching
apply_diversity_penaltydirectly makes the call path and ordering contract observable without depending on MMR tie-break details.src/synthorg/memory/ranking.py (1)
440-595: The MMR hardening here looks solid.The finite-value validation plus the
-math.infsentinel closes the NaN and all-negative-score edge cases without changing the function's pure contract.src/synthorg/memory/retrieval_config.py (1)
134-275: Fail-closed config validation is a good fit here.The ignored-knob warnings and strategy mismatch errors make these new retrieval flags much harder to miswire.
tests/unit/memory/test_tool_retriever.py (1)
691-730: Nice merge regression coverage.Parameterizing the collision cases and asserting post-merge relevance order gives the reformulation loop a strong guard against first-round bias.
tests/unit/memory/consolidation/test_llm_strategy.py (1)
271-328: These store/delete ordering regressions are valuable.The no-delete-before-store cases pin down the main data-preservation invariant cleanly.
## Bug fixes
- ``ConsolidationResult.summary_id`` is now a plain ``@property`` (not
``@computed_field``) so it is NOT emitted by ``model_dump()``. With
``extra="forbid"``, a computed-field ``summary_id`` in the dump
payload would have broken ``model_validate(result.model_dump())``
round-trips, which is a nasty surprise for any persistence or
copy-through-JSON code path. Callers that read ``.summary_id``
directly continue to work unchanged.
- ``LLMConsolidationStrategy.consolidate`` now builds groups BEFORE
calling ``_fetch_trajectory_context``, so a below-threshold batch
short-circuits with zero backend calls. Previously every call --
even ones with nothing to consolidate -- hit the distillation
lookup and could trip its WARNING degradation path on what should
have been a pure no-op.
- ``_parse_categories`` no longer silently broadens retrieval when
``categories`` is present but has the wrong shape (e.g. a bare
string ``"episodic"`` instead of ``["episodic"]``). Malformed
inputs are returned in ``rejected_values`` so ``_handle_search``
surfaces them back to the LLM for self-correction, and a WARNING
is logged for operators.
- ``_parse_search_args`` and ``_handle_recall`` reject non-string
``query`` / ``memory_id`` values up-front with an ``isinstance``
guard. Previously an LLM-hallucinated ``{"query": 123}`` would
pass the truthy check and crash on ``len(query_text)`` downstream.
## Observability
- All ``except builtins.MemoryError, RecursionError: raise`` branches
in ``tool_retriever.py`` now log at ERROR with full context
(``agent_id``, ``tool``, ``round``, query/memory_id metadata) before
re-raising. Per CLAUDE.md, "all error paths must log at WARNING or
ERROR with context before raising" -- the system-error branches
were skipping this, so fatal OOM/recursion failures disappeared
from observability at exactly the site where the context was still
available. Sites updated:
- ``_safe_search`` (``_retrieve_with_reformulation`` call)
- ``_reformulation_loop`` initial retrieve (was unguarded)
- ``_check_sufficiency``, ``_reformulate``, ``_retrieve_round``
- ``_handle_recall`` (``backend.get`` call)
- ``ToolBasedInjectionStrategy`` class docstring Args block updated:
``reformulator`` and ``sufficiency_checker`` are no longer
"Optional ... silently no-ops if either is missing"; they are
required when ``config.query_reformulation_enabled`` is True and
the constructor raises ``ValueError`` otherwise. ``Raises:``
section added to match.
## Test updates
- ``test_multi_category_groups_processed_in_parallel`` now uses an
``asyncio.Event``-gated provider barrier that tracks concurrent
in-flight calls and only releases when the expected concurrency
level is reached. A sequential refactor of ``consolidate`` would
deadlock on the barrier (the second call would never enter), so
the test actually catches regressions away from ``TaskGroup``.
Previously the test only checked ``call_count == 2``, which would
also pass for sequential execution.
- ``test_invalid_lambda_raises`` parametrization extended with
``float('nan')``, ``float('inf')``, and ``-float('inf')`` to lock
down the new ``math.isfinite`` guard in ``apply_diversity_penalty``.
## Status
- ``uv run ruff check src/ tests/`` -- clean
- ``uv run mypy src/ tests/`` -- 1539 source files, no issues
- ``uv run python -m pytest tests/unit/ -n 8`` -- 14160 passed, 5 skipped
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
🤖 I have created a release *beep* *boop* --- ## [0.6.2](v0.6.1...v0.6.2) (2026-04-06) ### Features * add issue analyzer script for priority/scope management ([#1084](#1084)) ([1ccba27](1ccba27)) * config fixes and deferred improvements from PR [#1058](#1058) review ([#1067](#1067)) ([2cac2d3](2cac2d3)), closes [#1061](#1061) [#1060](#1060) * cumulative risk-unit action budgets ([#806](#806)) and automated reporting ([#245](#245)) ([#1063](#1063)) ([4689816](4689816)) * fine-tuning pipeline + CompositeBackend + workflow lifecycle ([#1065](#1065)) ([85b05bc](85b05bc)), closes [#1001](#1001) [#850](#850) [#1058](#1058) * memory consolidation upgrades (LLM Merge, Search-and-Ask, diversity penalty, distillation capture) ([#1071](#1071)) ([174e2be](174e2be)), closes [#704](#704) * migrate web dashboard from Radix UI to Base UI, activate CSP nonce, rebuild org chart page, and fix agent routing ([#1083](#1083)) ([ebc6921](ebc6921)) * v0.7.0 engine foundations -- structured failure diagnosis + auditable decisions ([#1072](#1072)) ([d341d37](d341d37)) * workflow templates and versioning with diff and rollback ([#1069](#1069)) ([7af94de](7af94de)), closes [#1006](#1006) [#1008](#1008) ### Documentation * unify REST API docs under /docs/openapi/ and patch sitemap ([#1073](#1073)) ([af19382](af19382)) ### Maintenance * bump hypothesis from 6.151.10 to 6.151.11 in the all group ([#1086](#1086)) ([3176318](3176318)) * bump nginxinc/nginx-unprivileged from `f99cc61` to `601c823` in /docker/web in the all group ([#1085](#1085)) ([5eb99ac](5eb99ac)) * bump the all group in /web with 3 updates ([#1087](#1087)) ([8deae44](8deae44)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
## Summary Memory consolidation follow-ups from PR #1071 review and embedding cost tracking. ### #1075: Memory Consolidation Follow-ups (18 of 22 items) **Code polish (D16-D22):** - `SynthesisOutcome` StrEnum replaces bare `bool` in `_synthesize` return - `_select_entries` returns immutable `tuple` instead of `list` - `MemoryToolName` StrEnum narrows `DistillationRequest.memory_tool_invocations` - Defensive `asyncio.CancelledError` re-raise in all `except Exception` blocks - Final-round sufficiency check in `_reformulation_loop` with `MEMORY_REFORMULATION_FINAL_CHECK` event - Remove misleading `MEMORY_RETRIEVAL_START` log from `ContextInjectionStrategy.__init__` **Architectural (A1, A2, A4):** - `LLMConsolidationConfig` frozen Pydantic model replaces loose `__init__` kwargs and module constants (9 fields with validation) - `candidate_pool_multiplier` on `MemoryRetrievalConfig` for MMR diversity re-ranking (over-fetch, filter+MMR, truncate) - `scoring_strategy` discriminator field on `ScoredMemory` (LINEAR vs RRF provenance) **File splits (B6, B7, B8):** - Extract `try_capture_distillation` and `try_procedural_memory` into `engine/post_execution/memory_hooks.py` - Move `TestHybridSearchPipeline` to `test_retriever_hybrid.py` - Extract `_compute_pool_limit()` helper **Test hygiene (C10-C13, C15):** - `AsyncMock(spec=...)` discipline across all memory test files - Replace `call_args[0][N]` positional indexing with `.args[N]` - Consolidate `_make_strategy` fixture with `**config_overrides` - Promote `_merge_results` and `_bigram_jaccard` to public APIs - Log assertions on distillation error paths via `structlog.testing.capture_logs()` **Verified as already done:** A3 (summary_ids), C14 (parallelism proof) **Deferred:** A5 (DistillationRequest downstream), C9 (integration test), D19 (NotBlankStr no-ops) ### #997: Track Embedding Model Costs in Budget System - Add `EMBEDDING` to `LLMCallCategory` enum - `EmbeddingCostConfig` with per-model pricing and chars-per-token heuristic - Instrument `Mem0MemoryBackend.store()` and `retrieve()` to record `CostRecord` with `call_category=EMBEDDING` - Extend `CategoryBreakdown` with embedding cost/tokens/count fields - Dashboard: embedding category in types, utils, `CategoryBreakdown` component with `bg-accent-dim` design token - Budget enforcement: record-only (no gating -- embedding is on the critical path) ## Test Plan - 14,479 Python tests pass (unit + integration), 9 expected skips - 2,405 web dashboard tests pass - mypy strict: 0 errors across 1,558 source files - ruff + ESLint: 0 warnings ## Review Coverage Pre-reviewed by 5 agents (code-reviewer, type-design-analyzer, silent-failure-hunter, issue-resolution-verifier, conventions-enforcer). 13 findings addressed in final fix commit. Closes #1075 Closes #997
Implements issue #704 end-to-end. Adopts four patterns from arXiv:2603.17621 ("Complementary Reinforcement Learning") without the RL training loop.
Summary
ConsolidationStrategyprotocol. Groups entries by category, processes groups in parallel viaasyncio.TaskGroup, keeps the highest-relevance entry per group (with most-recent as tiebreaker), and feeds the rest to an LLM. Falls back to truncated concatenation onRetryExhaustedError/ empty response.capture_distillationrecords trajectory summary + outcome + memory-tool-invocations as an EPISODIC entry tagged"distillation"at task completion. Wired intoAgentEngine._post_execution_pipelinebehinddistillation_capture_enabled=Falseopt-in.LLMConsolidationStrategyembeds their trajectory summaries in the synthesis system prompt so the LLM has context about what the agent was trying to accomplish.ToolBasedInjectionStrategyruns an iterativeretrieve -> check sufficiency -> reformulate -> re-retrieveloop (up tomax_reformulation_rounds) whenquery_reformulation_enabled=Trueand both aQueryReformulatorandSufficiencyCheckerare provided. Results merged across rounds by ID.apply_diversity_penaltyinmemory/ranking.pyre-ranks results balancing relevance vs redundancy with pre-computed bigram cache to keep complexity atO(n**2).Hardening from pre-PR review (12 agents, 45+ findings addressed)
RetryExhaustedErrorhandling, partial delete failure tolerance (wrap each delete in try/except, record successful IDs only), concat fallback tagged"concat-fallback"distinct from"llm-synthesized".MEMORY_REFORMULATION_FAILED/MEMORY_SUFFICIENCY_CHECK_FAILEDevents; degrade to current cumulative entries rather than propagating.-math.infinstead of-1.0so the algorithm is correct even when MMR scores saturate at the theoretical minimum.DistillationRequest.retrieved_memory_idsrenamed tomemory_tool_invocations(field stored tool names, not memory IDs).assertwith explicit guards in_reformulation_loop(asserts are stripped underpython -O).tool_retriever.pyprefixed withERROR_PREFIX = "Error:"somemory/tools.pydetection viastartswith()is reliable.MemoryError/RecursionErrorsites per CLAUDE.md error rule.diversity_penalty_enabledwarns when combined with a strategy that ignores it (e.g.TOOL_BASED); stale reformulation "not yet wired" docstrings updated.Tests
+1,025 lines of new/updated tests, 25 net new test cases:
tests/unit/engine/test_agent_engine_distillation.py(new) -- captures on completion, captures on error, disabled by default, no-op without backendtests/unit/memory/test_ranking_diversity.py(new, split fromtest_ranking.pywhich was over 800 lines) -- bigram jaccard + MMR re-ranking + all-negative-score determinismtest_llm_strategy.py-- RecursionError propagation, tiebreaker, None-relevance handling, fallback truncation, verbatim synthesis, partial delete failure, trajectory context fetch + disabled mode + lookup failure, parallel category processingtest_distillation.py-- RecursionError propagation, store request shape assertiontest_tool_retriever.py-- reformulation dedup across rounds, sufficiency checker error handling, reformulator error handling, invalid category surfacingtest_retrieval_config.py-- diversity/TOOL_BASED warn validatortest_retriever.py-- memory filter fail-closed behaviortest_memory_tools.py-- updated to ERROR_PREFIX-prefixed constantsTest plan
uv run ruff check src/ tests/-- cleanuv run mypy src/ tests/-- 1538 source files, no issuesuv run python -m pytest tests/ -n 8-- 14310 passed, 9 skipped (up from 14285 baseline)Docs
docs/design/memory.md-- reformulation wired-in (was marked as not-yet-wired), new "Consolidation Strategies" subsection, "Distillation Capture" subsection, "Diversity Re-ranking (MMR)" subsection, fail-closed memory filter noteCLAUDE.md-- memory package structure updated with new subpackage details, logging events list extended withevents.consolidationexamples (LLM_STRATEGY_SYNTHESIZED,DISTILLATION_CAPTURED)Review coverage
Pre-reviewed by 12 agents: docs-consistency, code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, conventions-enforcer, async-concurrency-reviewer, issue-resolution-verifier. Followed by polish pass with code-simplifier. Every valid finding was addressed.
Closes #704