feat: add memory retrieval, ranking, and context injection pipeline (#41)#184
feat: add memory retrieval, ranking, and context injection pipeline (#41)#184
Conversation
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (20)
📝 WalkthroughSummary by CodeRabbit
WalkthroughAdds a memory retrieval and injection pipeline: pluggable injection strategy surface, deterministic ranking (relevance + recency), token-budget-aware formatting, a ContextInjectionStrategy orchestrator with parallel fetch + robust error handling, retrieval configuration and observability events, plus unit and integration tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Agent
participant ContextInjectionStrategy
participant MemoryBackend
participant SharedKnowledgeStore
participant RankingEngine as Ranking<br/>Engine
participant Formatter
participant LLM
Agent->>ContextInjectionStrategy: prepare_messages(agent_id, query_text, token_budget)
rect rgba(100, 150, 255, 0.5)
Note over ContextInjectionStrategy: Fetch Phase
par Parallel Fetch
ContextInjectionStrategy->>MemoryBackend: retrieve(agent_id, MemoryQuery)
MemoryBackend-->>ContextInjectionStrategy: personal_memories
and
ContextInjectionStrategy->>SharedKnowledgeStore: search_shared(query)
SharedKnowledgeStore-->>ContextInjectionStrategy: shared_memories
end
end
rect rgba(200, 150, 100, 0.5)
Note over RankingEngine: Ranking Phase
ContextInjectionStrategy->>RankingEngine: rank_memories(personal, shared, config)
RankingEngine->>RankingEngine: compute relevance/recency scores
RankingEngine->>RankingEngine: apply boost, merge, filter, sort
RankingEngine-->>ContextInjectionStrategy: ranked_memories
end
rect rgba(150, 200, 100, 0.5)
Note over Formatter: Format Phase
ContextInjectionStrategy->>Formatter: format_memory_context(ranked, token_budget)
Formatter->>Formatter: greedy pack into budget
Formatter->>Formatter: build ChatMessage with delimiters
Formatter-->>ContextInjectionStrategy: ChatMessage[]
end
ContextInjectionStrategy-->>Agent: ChatMessage[]
Agent->>LLM: [system: memories...] + prompt
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
✨ Simplify code
Comment |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the agent's memory system by introducing a robust and configurable pipeline for retrieving, ranking, and injecting memories into the agent's operational context. The changes enable agents to access relevant and timely information efficiently, improving their decision-making and overall performance. The new architecture is designed for extensibility, allowing for future integration of more advanced memory interaction patterns. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a comprehensive and well-designed memory retrieval pipeline, featuring a new ContextInjectionStrategy for orchestrating memory retrieval, ranking, and context formatting, supported by configuration models, protocols, and utility functions. The implementation demonstrates robustness, excellent error handling, efficient parallel data fetching, and thorough testing. However, a critical security vulnerability has been identified: the formatting of memories into the agent's context is susceptible to prompt injection. This is due to a lack of proper handling for memory content that might contain block delimiters, which could allow attackers to manipulate agent behavior via stored memories. Additionally, a minor refactoring to improve type inference and remove a type: ignore is suggested for enhanced maintainability.
| return () | ||
|
|
||
| body = "\n".join(included_lines) | ||
| block = f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}" |
There was a problem hiding this comment.
The memory context is formatted using fixed delimiters (--- AGENT MEMORY --- and --- END MEMORY ---) without sanitizing the memory content. An attacker who can influence the content of a stored memory (e.g., through a previous conversation) could include the string --- END MEMORY --- to break out of the memory block and inject arbitrary instructions into the agent's prompt. This is a classic prompt injection vector.
Recommendation: Escape or redact the delimiter strings within the memory content before formatting. For example:
safe_content = content.replace(MEMORY_BLOCK_END, "[DELIMITER_REDACTED]")
src/ai_company/memory/retriever.py
Outdated
| should_fetch_shared = ( | ||
| self._config.include_shared and self._shared_store is not None | ||
| ) | ||
|
|
||
| personal_coro = _safe_call( | ||
| self._backend.retrieve(agent_id, query), | ||
| source="personal", | ||
| agent_id=agent_id, | ||
| ) | ||
|
|
||
| if should_fetch_shared: | ||
| shared_coro = _safe_call( | ||
| self._shared_store.search_shared( # type: ignore[union-attr] | ||
| query, | ||
| exclude_agent=agent_id, | ||
| ), | ||
| source="shared", | ||
| agent_id=agent_id, | ||
| ) |
There was a problem hiding this comment.
The use of the should_fetch_shared flag and the subsequent type: ignore[union-attr] can be simplified. By moving the condition directly into the if statement, we can help the type checker (mypy) infer that self._shared_store is not None within the block. This improves code clarity and removes the need for the type: ignore.
personal_coro = _safe_call(
self._backend.retrieve(agent_id, query),
source="personal",
agent_id=agent_id,
)
if self._config.include_shared and self._shared_store is not None:
shared_coro = _safe_call(
self._shared_store.search_shared(
query,
exclude_agent=agent_id,
),
source="shared",
agent_id=agent_id,
)
Greptile SummaryThis PR delivers the full memory retrieval pipeline for M5: a Key observations:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Caller
participant CIS as ContextInjectionStrategy
participant SC as _safe_call
participant MB as MemoryBackend
participant SS as SharedKnowledgeStore
participant RM as rank_memories()
participant FMC as format_memory_context()
Caller->>CIS: prepare_messages(agent_id, query_text, token_budget, categories?)
CIS->>CIS: _execute_pipeline(...)
CIS->>CIS: build MemoryQuery(limit=max_memories)
par Parallel fetch (TaskGroup)
CIS->>SC: _safe_call(backend.retrieve(...))
SC->>MB: retrieve(agent_id, query)
MB-->>SC: tuple[MemoryEntry, ...]
SC-->>CIS: personal_entries (or () on error)
and
CIS->>SC: _safe_call(shared_store.search_shared(...))
SC->>SS: search_shared(query, exclude_agent=agent_id)
SS-->>SC: tuple[MemoryEntry, ...]
SC-->>CIS: shared_entries (or () on error)
end
CIS->>RM: rank_memories(personal, config, now, shared)
Note over RM: Score → merge → filter(min_relevance) → sort↓ → truncate(max_memories)
RM-->>CIS: tuple[ScoredMemory, ...] sorted by combined_score
CIS->>FMC: format_memory_context(ranked, estimator, token_budget, injection_point)
Note over FMC: Greedy pack: delimiter overhead + per-line cost + separator cost
FMC-->>CIS: tuple[ChatMessage, ...] (0 or 1 message)
CIS-->>Caller: tuple[ChatMessage, ...]
Last reviewed commit: f189842 |
There was a problem hiding this comment.
Pull request overview
This PR introduces a full memory retrieval pipeline that fetches personal + shared memories, ranks them by relevance/recency, formats them under a token budget, and returns injected ChatMessage context via a pluggable MemoryInjectionStrategy protocol.
Changes:
- Added
ContextInjectionStrategypipeline (retrieve → rank → token-budget format → inject) with error isolation and shared-store merging. - Implemented ranking (
rank_memories) and formatting (format_memory_context) utilities plus retrieval configuration (MemoryRetrievalConfig) and injection-related protocols/enums. - Added extensive unit + integration test coverage and updated docs/exports to reflect the new memory injection capabilities.
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/memory/test_retriever.py | Unit tests for the retrieval/injection pipeline behavior and degradation paths |
| tests/unit/memory/test_retrieval_config.py | Unit tests for retrieval config defaults/validation/serialization |
| tests/unit/memory/test_ranking.py | Unit tests for scoring/ranking and shared+personal merging/truncation |
| tests/unit/memory/test_injection.py | Unit tests for injection enums and protocols + token estimator |
| tests/unit/memory/test_init.py | Updates __all__ export count assertion |
| tests/unit/memory/test_formatter.py | Unit tests for token-budget formatting and injection point mapping |
| tests/unit/memory/test_config.py | Ensures CompanyMemoryConfig includes retrieval config and supports overrides |
| tests/integration/memory/test_retriever_integration.py | End-to-end integration tests for retrieve→rank→format pipeline |
| src/ai_company/observability/events/memory.py | Adds structured logging event names for retrieval pipeline stages |
| src/ai_company/memory/retriever.py | Implements ContextInjectionStrategy orchestration + parallel shared retrieval |
| src/ai_company/memory/retrieval_config.py | Adds frozen Pydantic config for weighting/thresholds/strategy/injection point |
| src/ai_company/memory/ranking.py | Adds scoring model and ranking functions with relevance+recency weighting |
| src/ai_company/memory/injection.py | Adds MemoryInjectionStrategy/TokenEstimator protocols and enums |
| src/ai_company/memory/formatter.py | Adds greedy token-budget packing into delimited injected message blocks |
| src/ai_company/memory/config.py | Adds retrieval config to CompanyMemoryConfig |
| src/ai_company/memory/init.py | Re-exports new injection/retrieval/ranking components via ai_company.memory |
| README.md | Updates Memory Interface milestone description |
| DESIGN_SPEC.md | Documents memory injection strategies and ranking/formatting behavior |
| CLAUDE.md | Updates repo structure notes to include retrieval pipeline components |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| filtered = [s for s in scored if s.combined_score >= config.min_relevance] | ||
| filtered.sort(key=lambda s: s.combined_score, reverse=True) | ||
|
|
||
| result = tuple(filtered[: config.max_memories]) | ||
|
|
||
| logger.debug( | ||
| MEMORY_RANKING_COMPLETE, | ||
| total_candidates=len(scored), | ||
| after_filter=len(result), | ||
| min_relevance=config.min_relevance, | ||
| ) |
There was a problem hiding this comment.
The debug log field after_filter is computed from len(result), but result is already truncated to config.max_memories. This makes the metric misleading (it’s “after truncate”, not “after min_relevance filter”). Consider logging both counts (e.g., after_filter=len(filtered) and after_truncate=len(result)) or renaming the field to reflect truncation.
src/ai_company/memory/formatter.py
Outdated
| delimiter_text = f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}" | ||
| delimiter_tokens = estimator.estimate_tokens(delimiter_text) | ||
|
|
||
| remaining = token_budget - delimiter_tokens | ||
| if remaining <= 0: | ||
| logger.debug( |
There was a problem hiding this comment.
Token-budget enforcement underestimates the final output size: delimiter_tokens is computed from f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}", but the produced block later is f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}", and the newline separators added between lines / before the end delimiter aren’t included in any estimate_tokens(...) call. Consider accounting for separator/newline token costs (or estimating tokens on the exact final block before returning and trimming accordingly) so token_budget is actually respected under the same estimator.
There was a problem hiding this comment.
Actionable comments posted: 11
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1591-1593: The spec paragraph currently hard-codes that pipeline
output is formatted as a SYSTEM message, which conflicts with the
implementation’s configurable injection-point/strategy selection; update the
paragraph to describe that after Pipeline: MemoryBackend.retrieve() -> rank by
relevance+recency -> filter by min_relevance -> greedy token-budget packing ->
format with delimiters, the final packed context is injected at the configured
message role (injection-point) rather than always as SYSTEM, and mention that
the injection role is selectable by the configured strategy so readers know the
contract supports non-SYSTEM injection (e.g., user-configured system, assistant,
or custom role).
In `@README.md`:
- Line 30: Replace the vendor-specific name "Mem0" in the README entry "Memory
Backends (M5) - Mem0 adapter backend ([ADR-001]...)" with a generic phrase
(e.g., "vendor-specific adapter" or "proprietary adapter") while keeping the ADR
reference ([ADR-001]) intact; update the string in README.md so it reads like
"Memory Backends (M5) - vendor-specific adapter backend ([ADR-001])" to conform
to the repo’s vendor-agnostic naming rule and leave vendor details inside
ADR-001.
In `@src/ai_company/memory/formatter.py`:
- Around line 78-79: The token-estimate for delimiters underestimates by one
newline because delimiter_text is built as
f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}" while the actual block is formatted
as f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}" (two newlines when body
is present); update the calculation that produces delimiter_tokens (and any
similar estimate logic) to mirror the real formatting—either build
delimiter_text with two newlines (e.g., include an extra '\n') or conditionally
add the second newline when body may be non-empty so that MEMORY_BLOCK_START,
MEMORY_BLOCK_END, delimiter_text, delimiter_tokens and the final block
formatting all agree.
In `@src/ai_company/memory/retrieval_config.py`:
- Around line 87-90: The InjectionPoint enum and the retrieval config need to
support dual injection; add a BOTH member to InjectionPoint and update the
injection_point Field in retrieval_config (symbol: injection_point) to accept
InjectionPoint.BOTH (or a meaningful default) and any validation logic so the
formatter/retriever contract can handle both-system-and-user injection; ensure
code paths that switch on InjectionPoint (e.g., formatter/retriever code that
reads injection_point) are updated to treat BOTH as injecting into both SYSTEM
and USER contexts.
- Around line 38-40: Add validation to the RetrievalConfig Pydantic model to
reject future-only InjectionStrategy values (tool_based and self_editing) during
config parsing: in the RetrievalConfig class add a `@validator`("strategy") method
that checks the supplied InjectionStrategy enum and raises a ValueError if it
equals InjectionStrategy.TOOL_BASED or InjectionStrategy.SELF_EDITING, leaving
InjectionStrategy.CONTEXT allowed; ensure the error message is clear so invalid
configs fail at validation rather than at runtime.
In `@src/ai_company/memory/retriever.py`:
- Around line 207-229: Change the two debug-level logs that emit
MEMORY_RETRIEVAL_SKIPPED to INFO so state transitions are visible: in the branch
guarding "if not personal_entries and not shared_entries" and in the branch
after calling rank_memories where "if not ranked" is true, replace
logger.debug(...) with logger.info(...) keeping the same message key and kwargs
(MEMORY_RETRIEVAL_SKIPPED, agent_id=agent_id, reason=...) so the event and its
reason are logged at INFO.
- Around line 142-154: The code currently calls self._execute_pipeline(...) even
when token_budget <= 0; short-circuit this by checking the token_budget right
after logging (or before the try) and immediately return an empty tuple if
token_budget is non-positive to avoid unnecessary backend/shared-store I/O;
update the method containing the logger.info and try block (the retriever method
that passes agent_id, query_text, token_budget, categories into
_execute_pipeline) to perform this check and return () instead of invoking
_execute_pipeline when token_budget <= 0.
- Around line 114-117: The constructor currently uses "token_estimator or
DefaultTokenEstimator()" which will ignore any valid but falsey custom
estimator; change this to an explicit None check so only None triggers the
default. In the initializer that sets self._estimator (look for token_estimator
and DefaultTokenEstimator in retriever.py/__init__), replace the truthiness
fallback with something like: if token_estimator is None then assign
DefaultTokenEstimator(), else assign token_estimator, ensuring falsey-but-valid
estimators are preserved.
In `@tests/unit/memory/test_init.py`:
- Around line 16-17: Update the test_all_has_expected_count to assert the actual
exported names instead of just their count: replace the len(...) assertion with
a set equality check comparing memory_module.__all__ to a literal set of
expected exported symbol names (the exact public API entries). Keep the test
name (test_all_has_expected_count) and the reference to memory_module.__all__ so
the change is localized; list all expected names in the test to ensure the
public API surface is pinned.
In `@tests/unit/memory/test_injection.py`:
- Around line 34-36: The test test_invalid_string_raises currently asserts
ValueError with a brittle message match; update the with pytest.raises(...) call
in test_invalid_string_raises so it no longer depends on Python's StrEnum
message format—either remove the match argument entirely and only assert
ValueError is raised when calling InjectionStrategy("invalid"), or replace the
match with a more flexible pattern that targets the invalid input string (e.g.,
a regex matching "invalid") to assert the message contains the invalid value
rather than relying on the full StrEnum error text.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: dee9386f-ed93-4910-9144-8620f907d97e
📒 Files selected for processing (20)
CLAUDE.mdDESIGN_SPEC.mdREADME.mdsrc/ai_company/memory/__init__.pysrc/ai_company/memory/config.pysrc/ai_company/memory/formatter.pysrc/ai_company/memory/injection.pysrc/ai_company/memory/ranking.pysrc/ai_company/memory/retrieval_config.pysrc/ai_company/memory/retriever.pysrc/ai_company/observability/events/memory.pytests/integration/memory/__init__.pytests/integration/memory/test_retriever_integration.pytests/unit/memory/test_config.pytests/unit/memory/test_formatter.pytests/unit/memory/test_init.pytests/unit/memory/test_injection.pytests/unit/memory/test_ranking.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Agent
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
!(DESIGN_SPEC.md|.claude/**|**/litellm/**)
📄 CodeRabbit inference engine (CLAUDE.md)
Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names:
example-provider,example-large-001,example-medium-001,example-small-001,large/medium/smallas aliases. Tests must usetest-provider,test-small-001, etc.
Files:
CLAUDE.mdtests/unit/memory/test_ranking.pytests/unit/memory/test_injection.pysrc/ai_company/observability/events/memory.pysrc/ai_company/memory/injection.pysrc/ai_company/memory/ranking.pysrc/ai_company/memory/__init__.pysrc/ai_company/memory/formatter.pytests/unit/memory/test_config.pytests/unit/memory/test_retrieval_config.pyREADME.mdtests/unit/memory/test_formatter.pysrc/ai_company/memory/config.pyDESIGN_SPEC.mdtests/unit/memory/test_retriever.pysrc/ai_company/memory/retrieval_config.pysrc/ai_company/memory/retriever.pytests/unit/memory/test_init.pytests/integration/memory/test_retriever_integration.py
**/*.md
📄 CodeRabbit inference engine (CLAUDE.md)
Always read
DESIGN_SPEC.mdbefore implementing any feature or planning any issue — the spec is the mandatory starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge. When a spec section is referenced, read that section verbatim. When approved deviations occur, updateDESIGN_SPEC.mdto reflect the new reality.
Files:
CLAUDE.mdREADME.mdDESIGN_SPEC.md
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Do not usefrom __future__ import annotations— Python 3.14 has PEP 649 native lazy annotations
Useexcept A, B:syntax without parentheses (notexcept (A, B):) — PEP 758 except syntax, enforced by ruff on Python 3.14
Add type hints to all public functions and classes — mypy strict mode enforced
Use Google-style docstrings (required on all public classes and functions) — enforced by ruff D rules
Enforce immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), usecopy.deepcopy()at construction +MappingProxyTypewrapping for read-only enforcement. For dict/list fields in frozen Pydantic models, rely onfrozen=Truefor field reassignment prevention andcopy.deepcopy()at system boundaries
Separate config (frozen Pydantic models) from runtime state (mutable-via-copy models usingmodel_copy(update=...)). Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with adopted conventions: use@computed_fieldfor derived values instead of storing redundant fields; useNotBlankStr(fromcore.types) for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Preferasyncio.TaskGroupfor fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over barecreate_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate input at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (ruff configured)
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/test_injection.pysrc/ai_company/observability/events/memory.pysrc/ai_company/memory/injection.pysrc/ai_company/memory/ranking.pysrc/ai_company/memory/__init__.pysrc/ai_company/memory/formatter.pytests/unit/memory/test_config.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_formatter.pysrc/ai_company/memory/config.pytests/unit/memory/test_retriever.pysrc/ai_company/memory/retrieval_config.pysrc/ai_company/memory/retriever.pytests/unit/memory/test_init.pytests/integration/memory/test_retriever_integration.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/**/*.py: Use pytest markers:@pytest.mark.unit,@pytest.mark.integration,@pytest.mark.e2e,@pytest.mark.slowto categorize tests
Maintain 80% minimum code coverage (enforced in CI with--cov-fail-under=80)
Useasyncio_mode = "auto"in pytest configuration — no manual@pytest.mark.asyncioneeded on async tests
Set test timeout to 30 seconds per test
Usepytest-xdistvia-n autofor parallel test execution
Prefer@pytest.mark.parametrizefor testing similar cases instead of multiple nearly-identical tests
Files:
tests/unit/memory/test_ranking.pytests/unit/memory/test_injection.pytests/unit/memory/test_config.pytests/unit/memory/test_retrieval_config.pytests/unit/memory/test_formatter.pytests/unit/memory/test_retriever.pytests/unit/memory/test_init.pytests/integration/memory/test_retriever_integration.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Every module with business logic must have:from ai_company.observability import get_loggerfollowed bylogger = get_logger(__name__)— never useimport logging/logging.getLogger()/print()
Use event name constants from domain-specific modules underai_company.observability.events(e.g.,PROVIDER_CALL_STARTfromevents.provider,BUDGET_RECORD_ADDEDfromevents.budget). Import directly:from ai_company.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with kwargs: alwayslogger.info(EVENT, key=value)— neverlogger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions
Files:
src/ai_company/observability/events/memory.pysrc/ai_company/memory/injection.pysrc/ai_company/memory/ranking.pysrc/ai_company/memory/__init__.pysrc/ai_company/memory/formatter.pysrc/ai_company/memory/config.pysrc/ai_company/memory/retrieval_config.pysrc/ai_company/memory/retriever.py
🧠 Learnings (2)
📚 Learning: 2026-03-09T10:20:23.072Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T10:20:23.072Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`
Applied to files:
src/ai_company/observability/events/memory.py
📚 Learning: 2026-03-09T10:20:23.072Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T10:20:23.072Z
Learning: Applies to **/*.py : Separate config (frozen Pydantic models) from runtime state (mutable-via-copy models using `model_copy(update=...)`). Never mix static config fields with mutable runtime fields in one model.
Applied to files:
src/ai_company/memory/retrieval_config.py
🧬 Code graph analysis (9)
tests/unit/memory/test_ranking.py (2)
src/ai_company/core/enums.py (1)
MemoryCategory(100-107)src/ai_company/memory/ranking.py (4)
ScoredMemory(26-58)compute_combined_score(87-104)compute_recency_score(61-84)rank_memories(161-207)
tests/unit/memory/test_injection.py (1)
src/ai_company/memory/injection.py (10)
DefaultTokenEstimator(71-91)InjectionPoint(37-46)InjectionStrategy(23-34)MemoryInjectionStrategy(95-145)TokenEstimator(50-68)estimate_tokens(57-68)estimate_tokens(78-91)prepare_messages(105-125)get_tool_definitions(127-136)strategy_name(139-145)
src/ai_company/memory/formatter.py (5)
src/ai_company/memory/injection.py (4)
InjectionPoint(37-46)TokenEstimator(50-68)estimate_tokens(57-68)estimate_tokens(78-91)src/ai_company/memory/ranking.py (1)
ScoredMemory(26-58)src/ai_company/providers/enums.py (1)
MessageRole(6-12)src/ai_company/providers/models.py (1)
ChatMessage(138-210)tests/unit/memory/test_injection.py (1)
estimate_tokens(63-64)
tests/unit/memory/test_retrieval_config.py (2)
src/ai_company/memory/injection.py (2)
InjectionPoint(37-46)InjectionStrategy(23-34)src/ai_company/memory/retrieval_config.py (1)
MemoryRetrievalConfig(20-109)
tests/unit/memory/test_formatter.py (6)
src/ai_company/core/enums.py (1)
MemoryCategory(100-107)src/ai_company/memory/formatter.py (1)
format_memory_context(53-138)src/ai_company/memory/injection.py (2)
DefaultTokenEstimator(71-91)InjectionPoint(37-46)src/ai_company/memory/models.py (2)
MemoryEntry(82-150)MemoryMetadata(20-52)src/ai_company/memory/ranking.py (1)
ScoredMemory(26-58)src/ai_company/providers/enums.py (1)
MessageRole(6-12)
src/ai_company/memory/config.py (1)
src/ai_company/memory/retrieval_config.py (1)
MemoryRetrievalConfig(20-109)
tests/unit/memory/test_retriever.py (8)
src/ai_company/core/enums.py (1)
MemoryCategory(100-107)src/ai_company/memory/errors.py (2)
MemoryRetrievalError(25-26)MemoryError(13-14)src/ai_company/memory/injection.py (5)
DefaultTokenEstimator(71-91)MemoryInjectionStrategy(95-145)strategy_name(139-145)get_tool_definitions(127-136)prepare_messages(105-125)src/ai_company/memory/models.py (3)
MemoryEntry(82-150)MemoryMetadata(20-52)MemoryQuery(153-230)src/ai_company/memory/retrieval_config.py (1)
MemoryRetrievalConfig(20-109)src/ai_company/memory/retriever.py (4)
ContextInjectionStrategy(93-324)strategy_name(318-324)get_tool_definitions(309-315)prepare_messages(119-175)src/ai_company/providers/enums.py (1)
MessageRole(6-12)tests/integration/memory/test_retriever_integration.py (2)
retrieve(58-66)store(51-56)
src/ai_company/memory/retrieval_config.py (2)
src/ai_company/memory/injection.py (2)
InjectionPoint(37-46)InjectionStrategy(23-34)src/ai_company/observability/_logger.py (1)
get_logger(8-28)
src/ai_company/memory/retriever.py (8)
src/ai_company/memory/formatter.py (1)
format_memory_context(53-138)src/ai_company/memory/injection.py (2)
DefaultTokenEstimator(71-91)TokenEstimator(50-68)src/ai_company/memory/models.py (2)
MemoryQuery(153-230)MemoryEntry(82-150)src/ai_company/memory/ranking.py (1)
rank_memories(161-207)src/ai_company/core/enums.py (1)
MemoryCategory(100-107)src/ai_company/memory/protocol.py (1)
MemoryBackend(20-182)src/ai_company/memory/retrieval_config.py (1)
MemoryRetrievalConfig(20-109)src/ai_company/memory/shared.py (1)
SharedKnowledgeStore(18-82)
🪛 LanguageTool
README.md
[typographical] ~24-~24: To join two clauses or introduce examples, consider using an em dash.
Context: ...a migrations - Memory Interface (M5) - Pluggable MemoryBackend protocol with ...
(DASH_RULE)
[typographical] ~25-~25: To join two clauses or introduce examples, consider using an em dash.
Context: ...) - Coordination Error Taxonomy (M5) - Post-execution classification pipeline d...
(DASH_RULE)
[typographical] ~26-~26: To join two clauses or introduce examples, consider using an em dash.
Context: ...n failures - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...
(DASH_RULE)
🔇 Additional comments (10)
tests/unit/memory/test_injection.py (1)
1-139: LGTM!Well-structured test suite covering enum values, protocol conformance, and token estimator behavior. Good use of
@pytest.mark.parametrizefor the various lengths test case.tests/unit/memory/test_formatter.py (1)
1-237: LGTM!Comprehensive test coverage for the memory formatter including edge cases (empty input, zero/negative budgets, budget too small), greedy packing behavior, injection point handling, and delimiter wrapping. Good use of the helper function to reduce boilerplate.
src/ai_company/memory/formatter.py (2)
34-50: LGTM!Clean helper function with clear formatting logic. The shared prefix handling and score formatting are well-implemented.
91-118: LGTM!The greedy packing algorithm is well-documented and correctly implements the "skip large, include small" behavior. Good DEBUG logging for skipped memories and budget exhaustion scenarios.
tests/unit/memory/test_ranking.py (1)
1-397: LGTM!Excellent test coverage for the ranking module including:
- Exponential decay behavior with various decay rates
- Edge cases (zero age, future timestamps, high decay)
- Combined score weighted calculations
- ScoredMemory model validation (range, frozen, inf/nan rejection)
- Full rank_memories workflow with filtering, sorting, merging, and truncation
Good use of
pytest.approxfor floating-point comparisons and@pytest.mark.parametrizefor validation edge cases.src/ai_company/observability/events/memory.py (1)
53-62: LGTM!New event constants follow the established
memory.<entity>.<action>naming convention. Thenoqa: S105suppression on Line 62 is appropriate since "TOKEN" inMEMORY_TOKEN_BUDGET_EXCEEDEDis a false positive from bandit's hardcoded password detection.tests/unit/memory/test_retriever.py (1)
1-400: LGTM!Comprehensive unit test coverage for ContextInjectionStrategy including:
- Protocol conformance (isinstance, strategy_name, get_tool_definitions)
- Query construction and parameter propagation
- Shared memory merging with include_shared flag
- Graceful degradation with proper error classification (swallow recoverable errors, propagate MemoryError/RecursionError)
- Token budget handling
The graceful degradation tests (Lines 218-368) are particularly well-designed, ensuring critical system errors propagate while recoverable errors degrade gracefully.
src/ai_company/memory/injection.py (1)
1-145: LGTM!Well-designed module with clean separation of concerns:
TokenEstimatorprotocol decouples token estimation from engine dependenciesMemoryInjectionStrategyprotocol enables pluggable injection strategies- Both protocols are
@runtime_checkableenabling isinstance checksDefaultTokenEstimatorprovides a reasonable heuristic fallback- Good use of
TYPE_CHECKINGto avoid import cyclesDocumentation clearly indicates which strategies are implemented vs. planned.
tests/integration/memory/test_retriever_integration.py (2)
27-95: LGTM!Well-designed in-memory test doubles that implement the necessary interface methods. The
InMemoryBackendcorrectly filters byagent_idandcategories, and respectsquery.limit. The semantic search (query.text) is intentionally not implemented since these are integration tests focused on pipeline behavior rather than search algorithm correctness.
148-386: LGTM!Excellent end-to-end integration tests covering:
- Full pipeline producing formatted output with memory block markers
- Shared memory inclusion with
[shared]prefix- Empty result handling
- Recency-based ranking order verification
- Token budget enforcement affecting content size
- Category filtering
These tests validate the complete retrieve → rank → format pipeline with realistic scenarios.
| filtered = [s for s in scored if s.combined_score >= config.min_relevance] | ||
| filtered.sort(key=lambda s: s.combined_score, reverse=True) | ||
|
|
||
| result = tuple(filtered[: config.max_memories]) | ||
|
|
||
| logger.debug( | ||
| MEMORY_RANKING_COMPLETE, | ||
| total_candidates=len(scored), | ||
| after_filter=len(result), | ||
| min_relevance=config.min_relevance, | ||
| ) |
There was a problem hiding this comment.
Fix the relevance floor and ranking completion telemetry.
Line 195 applies config.min_relevance to combined_score, so recency can push a low-relevance memory above the filter. Line 203 then reports the truncated tuple as after_filter, and Line 200 emits the completion event at DEBUG, which both under-reports and hides ranking outcomes.
Proposed fix
- filtered = [s for s in scored if s.combined_score >= config.min_relevance]
+ filtered = [s for s in scored if s.relevance_score >= config.min_relevance]
filtered.sort(key=lambda s: s.combined_score, reverse=True)
result = tuple(filtered[: config.max_memories])
- logger.debug(
+ logger.info(
MEMORY_RANKING_COMPLETE,
total_candidates=len(scored),
- after_filter=len(result),
+ after_filter=len(filtered),
+ returned=len(result),
min_relevance=config.min_relevance,
)As per coding guidelines, "Log all state transitions at INFO level".
) Implement the full retrieval pipeline between MemoryBackend.retrieve() and AgentEngine.run(memory_messages=...): query backend, re-rank by relevance+recency, enforce token budget, merge personal+shared, and format as ChatMessage tuples. - MemoryInjectionStrategy protocol with three planned strategies - ContextInjectionStrategy (context injection, implemented) - MemoryRetrievalConfig with weight validation - ScoredMemory ranking with exponential recency decay - Token-budget-aware formatter with greedy packing - Parallel fetch with error isolation (TaskGroup) - Graceful degradation (domain errors → empty, builtins propagate) - DESIGN_SPEC.md §7.7 documenting all three injection strategies
Pre-reviewed by 9 agents, 29 findings addressed: - Fix TaskGroup ExceptionGroup wrapping (except* unwrap) - Consolidate _safe_retrieve_* into single _safe_call helper - Add max_memories truncation after personal+shared merge - Add error_type kwarg to except Exception handlers - Add defensive ValueError for unknown InjectionPoint variants - Fix DefaultTokenEstimator to return min 1 for non-empty text - Update DESIGN_SPEC §15.3, §7.7 protocol params, CLAUDE.md, README - Add missing docstring attributes, Raises sections, boost docs - Add 26 new tests (validation, error propagation, edge cases)
…ers (#41) Source fixes: - Critical: clamp combined_score to prevent Pydantic ValidationError from float tolerance - Critical: handle ExceptionGroup wrapping system-level errors in prepare_messages - Major: add _validate_supported_strategy rejecting unimplemented strategies - Major: short-circuit prepare_messages on non-positive token budget - Major: promote MEMORY_RETRIEVAL_SKIPPED from debug to info level - Medium: remove type:ignore by narrowing SharedKnowledgeStore via local variable - Medium: fix delimiter token accounting and add per-line separator cost - Medium: sanitize memory content against delimiter injection - Medium: log warning on unsupported injection point before raising ValueError - Minor: add clarifying comment to except* ExceptionGroup unwrapping Docs fixes: - DESIGN_SPEC §7.7: say "configured message role" not "SYSTEM message" - DESIGN_SPEC §1.4: fix #41 description (retrieval pipeline, not Mem0 adapter) - README: replace vendor name "Mem0" with generic reference - injection.py: add cross-module reference to ContextInjectionStrategy Test fixes: - Add test for both backends failing simultaneously - Add test for unsupported InjectionPoint ValueError - Add min_relevance exact boundary test - Replace fragile count assertion with set equality in test_init - Fix match pattern in test_injection to avoid false positives - Update test_retrieval_config for new strategy validation
717444c to
f189842
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ), | ||
| ) | ||
| if system_errors is not None: | ||
| raise system_errors from exc |
There was a problem hiding this comment.
prepare_messages() docstring says it re-raises bare builtins.MemoryError / RecursionError, but this branch currently re-raises an ExceptionGroup subgroup (raise system_errors). Consider unwrapping and raising a single underlying exception (or switching to except* where applicable) to match the documented behavior.
| raise system_errors from exc | |
| # Re-raise a bare builtins.MemoryError / RecursionError | |
| # to match the documented behavior, instead of raising | |
| # an ExceptionGroup subgroup. | |
| for error in system_errors.exceptions: | |
| if isinstance(error, (builtins_MemoryError, RecursionError)): | |
| raise error from exc |
| system_errors = exc.subgroup( | ||
| lambda e: isinstance( | ||
| e, | ||
| builtins_MemoryError | RecursionError, |
There was a problem hiding this comment.
isinstance() does not accept PEP 604 union types (A | B) as the second argument; this will raise TypeError if this ExceptionGroup path is hit. Use a tuple instead (e.g., isinstance(e, (builtins_MemoryError, RecursionError))).
| builtins_MemoryError | RecursionError, | |
| (builtins_MemoryError, RecursionError), |
| def __init__( | ||
| self, | ||
| *, | ||
| backend: MemoryBackend, | ||
| config: MemoryRetrievalConfig, | ||
| shared_store: SharedKnowledgeStore | None = None, | ||
| token_estimator: TokenEstimator | None = None, | ||
| ) -> None: | ||
| """Initialise the context injection strategy. | ||
|
|
||
| Args: | ||
| backend: Memory backend for personal memories. | ||
| config: Retrieval pipeline configuration. | ||
| shared_store: Optional shared knowledge store. | ||
| token_estimator: Optional custom token estimator. | ||
| """ | ||
| self._backend = backend | ||
| self._config = config | ||
| self._shared_store = shared_store | ||
| self._estimator = ( | ||
| token_estimator if token_estimator is not None else DefaultTokenEstimator() | ||
| ) | ||
| logger.debug( | ||
| MEMORY_RETRIEVAL_START, | ||
| strategy="context_injection", | ||
| backend=backend.backend_name | ||
| if hasattr(backend, "backend_name") | ||
| else type(backend).__qualname__, | ||
| has_shared_store=shared_store is not None, | ||
| ) |
There was a problem hiding this comment.
Silent misconfiguration: include_shared=True with no shared_store
When config.include_shared=True (the default) but shared_store=None, shared memory retrieval is silently skipped. An operator who keeps the default config but omits the shared_store argument to the constructor will receive no indication that shared memories are absent from context — the has_shared_store=False field in the debug log is easy to miss.
A warning at construction time whenever this mismatch occurs would make the misconfiguration immediately visible:
if config.include_shared and shared_store is None:
logger.warning(
"include_shared is True but no shared_store was provided; "
"shared memories will not be retrieved",
strategy="context_injection",
)Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/memory/retriever.py
Line: 101-130
Comment:
**Silent misconfiguration: `include_shared=True` with no `shared_store`**
When `config.include_shared=True` (the default) but `shared_store=None`, shared memory retrieval is silently skipped. An operator who keeps the default config but omits the `shared_store` argument to the constructor will receive no indication that shared memories are absent from context — the `has_shared_store=False` field in the debug log is easy to miss.
A warning at construction time whenever this mismatch occurs would make the misconfiguration immediately visible:
```python
if config.include_shared and shared_store is None:
logger.warning(
"include_shared is True but no shared_store was provided; "
"shared memories will not be retrieved",
strategy="context_injection",
)
```
How can I resolve this? If you propose a fix, please make it concise.| async def prepare_messages( | ||
| self, | ||
| agent_id: NotBlankStr, | ||
| query_text: NotBlankStr, | ||
| token_budget: int, | ||
| *, | ||
| categories: frozenset[MemoryCategory] | None = None, | ||
| ) -> tuple[ChatMessage, ...]: | ||
| """Full pipeline: retrieve → rank → budget-fit → format. | ||
|
|
||
| Returns empty tuple on any failure (graceful degradation). | ||
| Never raises domain memory errors to the caller. | ||
| Re-raises ``builtins.MemoryError`` and ``RecursionError``. | ||
|
|
||
| Args: | ||
| agent_id: The agent requesting memories. | ||
| query_text: Text for semantic retrieval. | ||
| token_budget: Maximum tokens for memory content. | ||
| categories: Optional category filter. | ||
|
|
||
| Returns: | ||
| Tuple of ``ChatMessage`` instances (may be empty). | ||
| """ |
There was a problem hiding this comment.
Docstring inaccurate for concurrent system-error case
The docstring states:
Re-raises
builtins.MemoryErrorandRecursionError.
This holds when only one backend fails with a system-level error — _fetch_memories's except* block raises a bare exception that reaches the except builtins_MemoryError: raise guard here.
However, when both concurrent backends (personal + shared) fail with system-level errors simultaneously, Python's except* semantics collect the re-raised exceptions from both handlers and synthesise a new ExceptionGroup([MemoryError(…), RecursionError(…)]). Because bare except MemoryError: does not match an ExceptionGroup, this falls through to except Exception as exc: → the isinstance(exc, ExceptionGroup) branch → raise system_errors from exc (line 200), which re-raises an ExceptionGroup — not a bare exception.
A caller using a bare except MemoryError: guard above a prepare_messages call will silently swallow this concurrent-failure ExceptionGroup, defeating the intended propagation contract. The docstring should document this distinction:
Re-raises ``builtins.MemoryError`` and ``RecursionError`` as bare
exceptions when a single source fails. If both the personal backend
and shared store simultaneously raise system-level errors, re-raises
an ``ExceptionGroup`` wrapping those exceptions.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/ai_company/memory/retriever.py
Line: 132-154
Comment:
**Docstring inaccurate for concurrent system-error case**
The docstring states:
> Re-raises ``builtins.MemoryError`` and ``RecursionError``.
This holds when only **one** backend fails with a system-level error — `_fetch_memories`'s `except*` block raises a bare exception that reaches the `except builtins_MemoryError: raise` guard here.
However, when **both** concurrent backends (personal + shared) fail with system-level errors simultaneously, Python's `except*` semantics collect the re-raised exceptions from both handlers and synthesise a new `ExceptionGroup([MemoryError(…), RecursionError(…)])`. Because bare `except MemoryError:` does not match an `ExceptionGroup`, this falls through to `except Exception as exc:` → the `isinstance(exc, ExceptionGroup)` branch → `raise system_errors from exc` (line 200), which re-raises an `ExceptionGroup` — not a bare exception.
A caller using a bare `except MemoryError:` guard above a `prepare_messages` call will silently swallow this concurrent-failure `ExceptionGroup`, defeating the intended propagation contract. The docstring should document this distinction:
```
Re-raises ``builtins.MemoryError`` and ``RecursionError`` as bare
exceptions when a single source fails. If both the personal backend
and shared store simultaneously raise system-level errors, re-raises
an ``ExceptionGroup`` wrapping those exceptions.
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
ContextInjectionStrategyorchestrates backend query → ranking → token-budget formatting → ChatMessage injectionrank_memories()with relevance+recency weighted scoring, personal boost, exponential decay, min_relevance filtering, and max_memories truncationMemoryInjectionStrategyprotocol (context/tool-based/self-editing),TokenEstimatorprotocol (avoids memory→engine import cycle)MemoryRetrievalConfig(frozen Pydantic model) with weight-sum validation,InjectionStrategy/InjectionPointenums_safe_call), per-pipeline (prepare_messages), withbuiltins.MemoryError/RecursionErrorpropagation throughexcept*ExceptionGroup unwrappingCloses #41
Pre-PR Review
Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency). 29 findings addressed:
except*), DESIGN_SPEC §15.3 missing filesmax_memoriestruncation after merge,error_typein log kwargs, docstring accuracy, docs updatesLogging-audit and resilience-audit found zero issues.
Test plan
🤖 Generated with Claude Code