
feat: add memory retrieval, ranking, and context injection pipeline (#41) #184

Merged

Aureliolo merged 3 commits into main from feat/memory-retrieval on Mar 9, 2026
Conversation

@Aureliolo
Owner

Summary

  • Memory retrieval pipeline: ContextInjectionStrategy orchestrates backend query → ranking → token-budget formatting → ChatMessage injection
  • Ranking engine: rank_memories() with relevance+recency weighted scoring, personal boost, exponential decay, min_relevance filtering, and max_memories truncation
  • Token-budget formatter: Greedy packing of ranked memories into delimited ChatMessage blocks with configurable injection point (SYSTEM/USER)
  • Pluggable protocols: MemoryInjectionStrategy protocol (context/tool-based/self-editing), TokenEstimator protocol (avoids memory→engine import cycle)
  • Configuration: MemoryRetrievalConfig (frozen Pydantic model) with weight-sum validation, InjectionStrategy/InjectionPoint enums
  • Graceful degradation: Triple-layer error isolation — per-source (_safe_call), per-pipeline (prepare_messages), with builtins.MemoryError/RecursionError propagation through except* ExceptionGroup unwrapping
  • DESIGN_SPEC.md §7.7: New section documenting injection strategies, ranking algorithm, and protocol
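The per-source isolation layer described above can be sketched as follows. Names (`_safe_call`, `MemoryEntry`, the fetchers) mirror the PR description, but the bodies are simplified stand-ins, not the actual implementation; the PR uses `asyncio.TaskGroup` with `except*` unwrapping, while plain `gather` is used here to keep the sketch short.

```python
import asyncio
from dataclasses import dataclass

@dataclass(frozen=True)
class MemoryEntry:
    content: str
    relevance: float

async def _safe_call(coro, source: str):
    """Per-source isolation: domain errors degrade to an empty result."""
    try:
        return await coro
    except (MemoryError, RecursionError):
        raise  # system-level errors must propagate, never be swallowed
    except Exception:
        return ()  # graceful degradation: this source contributes nothing

async def fetch_personal():
    return (MemoryEntry("deploys use blue/green", 0.9),)

async def fetch_shared():
    raise ConnectionError("shared store unreachable")  # simulated outage

async def prepare_messages():
    personal, shared = await asyncio.gather(
        _safe_call(fetch_personal(), "personal"),
        _safe_call(fetch_shared(), "shared"),
    )
    # Ranking and token-budget formatting would follow here.
    return personal + shared
```

Running `asyncio.run(prepare_messages())` returns only the personal entry; the shared-store failure degrades silently instead of failing the whole pipeline.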

Closes #41

Pre-PR Review

Pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency). 29 findings addressed:

| Severity | Count | Key fixes |
| --- | --- | --- |
| Critical | 2 | TaskGroup ExceptionGroup unwrapping (`except*`), DESIGN_SPEC §15.3 missing files |
| Major | 16 | max_memories truncation after merge, error_type in log kwargs, docstring accuracy, docs updates |
| Medium | 8 | Test coverage gaps, docstring clarifications |
| Minor | 3 | Token estimator min-1, summary logging, redundant comment |

Logging-audit and resilience-audit found zero issues.

Test plan

  • 4580 tests pass (26 new tests added), 6 skipped (platform-specific)
  • 96.31% coverage (80% minimum)
  • ruff check + format clean
  • mypy strict clean
  • pre-commit hooks pass
  • CI pipeline (lint + type-check + test + coverage)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 9, 2026 11:40
@github-actions
Contributor

github-actions bot commented Mar 9, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 9, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c1df873f-674f-4e5d-bd29-1767280b5ac8

📥 Commits

Reviewing files that changed from the base of the PR and between 717444c and f189842.

📒 Files selected for processing (20)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/memory/__init__.py
  • src/ai_company/memory/config.py
  • src/ai_company/memory/formatter.py
  • src/ai_company/memory/injection.py
  • src/ai_company/memory/ranking.py
  • src/ai_company/memory/retrieval_config.py
  • src/ai_company/memory/retriever.py
  • src/ai_company/observability/events/memory.py
  • tests/integration/memory/__init__.py
  • tests/integration/memory/test_retriever_integration.py
  • tests/unit/memory/test_config.py
  • tests/unit/memory/test_formatter.py
  • tests/unit/memory/test_init.py
  • tests/unit/memory/test_injection.py
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever.py

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Memory retrieval pipeline with a context-injection MVP, ranked relevance/recency, and token-budget-aware formatting for prompt injection.
    • Pluggable injection strategy surface (defaults to context injection) and a resilient fetch → rank → format flow.
  • Documentation

    • Design spec and README updated to describe the retrieval/injection pipeline and configuration.
  • Observability

    • New memory lifecycle events for retrieval, ranking, formatting, and token-budget signals.
  • Tests

    • Extensive unit and integration tests covering ranking, formatting, config validation, strategy behavior, and end-to-end retrieval.

Walkthrough

Adds a memory retrieval and injection pipeline: pluggable injection strategy surface, deterministic ranking (relevance + recency), token-budget-aware formatting, a ContextInjectionStrategy orchestrator with parallel fetch + robust error handling, retrieval configuration and observability events, plus unit and integration tests.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Design & Docs**: `CLAUDE.md`, `DESIGN_SPEC.md`, `README.md` | Docs updated to describe the retrieval/ranking/formatting pipeline, `MemoryInjectionStrategy` surface, retrieval config, and planned strategies. |
| **Memory Public Surface**: `src/ai_company/memory/__init__.py`, `src/ai_company/memory/config.py` | Public exports expanded to include new modules/types; `CompanyMemoryConfig` gains `retrieval: MemoryRetrievalConfig`. |
| **Injection Types & Estimator**: `src/ai_company/memory/injection.py` | New enums/protocols: `InjectionStrategy`, `InjectionPoint`, the `TokenEstimator` protocol, `DefaultTokenEstimator`, and the `MemoryInjectionStrategy` protocol. |
| **Retrieval Configuration**: `src/ai_company/memory/retrieval_config.py` | New frozen `MemoryRetrievalConfig` model with validation (weight sum, supported strategy), defaults, and observability on validation failure. |
| **Ranking Pipeline**: `src/ai_company/memory/ranking.py` | New `ScoredMemory` model, recency/relevance scoring utilities, `_score_entry`, and the `rank_memories` workflow with filtering, truncation, and ranking observability. |
| **Formatting**: `src/ai_company/memory/formatter.py` | New `format_memory_context()` that sanitizes and greedy-packs `ScoredMemory` items into a token budget and returns `ChatMessage`(s) for the specified `InjectionPoint`. |
| **Retriever / Strategy**: `src/ai_company/memory/retriever.py` | New `ContextInjectionStrategy` orchestrating query build, parallel fetch (personal + shared), safe error handling (`_safe_call`), ranking, budgeting, formatting, and memory lifecycle events. |
| **Observability Events**: `src/ai_company/observability/events/memory.py` | Added retrieval/format/ranking lifecycle event constants (start, complete, degraded, skipped, ranking/format events, token budget and invalid injection point events). |
| **Tests — Unit**: `tests/unit/memory/*` | Comprehensive unit tests added/updated for injection enums/protocols/estimator, ranking, retrieval config validation, formatter behavior, retriever pipeline, and public exports/config surface. |
| **Tests — Integration**: `tests/integration/memory/test_retriever_integration.py` | End-to-end integration tests with in-memory backend/shared-store doubles validating the fetch→rank→format pipeline, shared memory inclusion, categories, recency, and token budgets. |
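The token-budget formatting described in the Formatting cohort can be illustrated with a minimal greedy packer. The delimiter strings follow the review discussion below; the chars-per-token heuristic and the function signature are assumptions for this sketch, not the PR's exact code.

```python
MEMORY_BLOCK_START = "--- AGENT MEMORY ---"
MEMORY_BLOCK_END = "--- END MEMORY ---"

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly four characters per token, minimum 1.
    return max(1, len(text) // 4)

def format_memory_context(ranked: list[str], token_budget: int) -> str:
    """Greedy-pack ranked memory strings into a single delimited block."""
    # Reserve budget for the fixed delimiters up front.
    used = estimate_tokens(MEMORY_BLOCK_START) + estimate_tokens(MEMORY_BLOCK_END)
    included: list[str] = []
    for content in ranked:  # ranked best-first, so packing is greedy by score
        cost = estimate_tokens(content)
        if used + cost > token_budget:
            break  # budget exhausted; drop remaining, lower-ranked entries
        included.append(content)
        used += cost
    if not included:
        return ""  # nothing fits: inject no message at all
    return f"{MEMORY_BLOCK_START}\n" + "\n".join(included) + f"\n{MEMORY_BLOCK_END}"
```

With a tight budget, only the highest-ranked entries that fit are emitted, and an empty result means no memory message is injected at all.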

Sequence Diagram(s)

sequenceDiagram
    participant Agent
    participant ContextInjectionStrategy
    participant MemoryBackend
    participant SharedKnowledgeStore
    participant RankingEngine as Ranking<br/>Engine
    participant Formatter
    participant LLM

    Agent->>ContextInjectionStrategy: prepare_messages(agent_id, query_text, token_budget)
    
    rect rgba(100, 150, 255, 0.5)
        Note over ContextInjectionStrategy: Fetch Phase
        par Parallel Fetch
            ContextInjectionStrategy->>MemoryBackend: retrieve(agent_id, MemoryQuery)
            MemoryBackend-->>ContextInjectionStrategy: personal_memories
        and
            ContextInjectionStrategy->>SharedKnowledgeStore: search_shared(query)
            SharedKnowledgeStore-->>ContextInjectionStrategy: shared_memories
        end
    end

    rect rgba(200, 150, 100, 0.5)
        Note over RankingEngine: Ranking Phase
        ContextInjectionStrategy->>RankingEngine: rank_memories(personal, shared, config)
        RankingEngine->>RankingEngine: compute relevance/recency scores
        RankingEngine->>RankingEngine: apply boost, merge, filter, sort
        RankingEngine-->>ContextInjectionStrategy: ranked_memories
    end

    rect rgba(150, 200, 100, 0.5)
        Note over Formatter: Format Phase
        ContextInjectionStrategy->>Formatter: format_memory_context(ranked, token_budget)
        Formatter->>Formatter: greedy pack into budget
        Formatter->>Formatter: build ChatMessage with delimiters
        Formatter-->>ContextInjectionStrategy: ChatMessage[]
    end

    ContextInjectionStrategy-->>Agent: ChatMessage[]
    Agent->>LLM: [system: memories...] + prompt

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation / Resolution |
| --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.38%, below the required 80.00% threshold. Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly and concisely summarizes the main change: adding the memory retrieval, ranking, and context injection pipeline; it directly reflects the primary objective and aligns with the core changes throughout the changeset. |
| Description check | ✅ Passed | Description comprehensively covers key implementation details (retrieval pipeline, ranking engine, token-budget formatter, protocols, configuration, graceful degradation, and documentation) and relates directly to the changeset. |
| Linked Issues check | ✅ Passed | The code changes implement all major acceptance criteria from #41: retrieval pipeline with ranking, token budget enforcement, context injection with configurable points, shared knowledge support, and comprehensive test coverage (>80%). |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the memory retrieval, ranking, and injection pipeline as defined in #41; no extraneous modifications to unrelated subsystems or dependencies detected. |


@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the agent's memory system by introducing a robust and configurable pipeline for retrieving, ranking, and injecting memories into the agent's operational context. The changes enable agents to access relevant and timely information efficiently, improving their decision-making and overall performance. The new architecture is designed for extensibility, allowing for future integration of more advanced memory interaction patterns.

Highlights

  • Memory Retrieval Pipeline: Introduced a comprehensive memory retrieval pipeline orchestrated by ContextInjectionStrategy. This pipeline handles querying memory backends, ranking retrieved memories, formatting them to fit a token budget, and injecting them as ChatMessage objects into the agent's context.
  • Advanced Ranking Engine: Implemented a sophisticated rank_memories() function that scores memories based on a weighted combination of relevance and recency. It includes features like a personal boost for individual memories, exponential decay for recency, filtering by a minimum relevance threshold, and truncation to a maximum number of memories.
  • Token-Budgeted Context Formatting: Developed a greedy packing mechanism that formats ranked memories into delimited ChatMessage blocks. This ensures that the injected context adheres to a specified token budget, with configurable injection points (SYSTEM or USER roles).
  • Pluggable Protocols for Flexibility: Defined MemoryInjectionStrategy and TokenEstimator protocols. This allows for future expansion with different memory injection methods (e.g., tool-based or self-editing) and flexible token estimation, while preventing import cycles.
  • Robust Configuration and Error Handling: Added MemoryRetrievalConfig, a frozen Pydantic model, for managing retrieval pipeline settings, including weight-sum validation. The system incorporates triple-layer error isolation with _safe_call and prepare_messages, gracefully degrading on most failures while propagating critical builtins.MemoryError and RecursionError.
  • Updated Design Documentation: A new section (7.7) was added to DESIGN_SPEC.md to thoroughly document the memory injection strategies, the ranking algorithm, and the associated protocols, providing clear architectural guidance.
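The weighted relevance+recency scoring highlighted above might look roughly like this. The function names mirror the PR summary, but the default weights, decay rate, and the clamp to 1.0 are illustrative assumptions.

```python
import math
from datetime import datetime, timedelta, timezone

def compute_recency_score(created_at: datetime, now: datetime,
                          decay_rate: float = 0.1) -> float:
    """Exponential decay by age in days: score halves every ln(2)/decay_rate days."""
    age_days = max(0.0, (now - created_at).total_seconds() / 86400.0)
    return math.exp(-decay_rate * age_days)

def compute_combined_score(relevance: float, recency: float,
                           relevance_weight: float = 0.7,
                           recency_weight: float = 0.3,
                           personal_boost: float = 0.0) -> float:
    """Weighted sum plus a personal boost, clamped so scores stay within [0, 1]."""
    score = relevance_weight * relevance + recency_weight * recency + personal_boost
    return min(score, 1.0)
```

A day-old memory then outranks a month-old one at equal relevance, and the clamp keeps a boosted perfect score from exceeding 1.0.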
Changelog
  • CLAUDE.md
    • Updated the description of the memory/ directory to reflect the addition of the retrieval pipeline, including ranking, injection, and context formatting capabilities.
  • DESIGN_SPEC.md
    • Added a new section (7.7) detailing memory injection strategies, including context injection, tool-based retrieval, and self-editing memory.
    • Documented the memory ranking algorithm, covering relevance, recency, combined scoring, filtering, and sorting.
    • Included the definition of the MemoryInjectionStrategy protocol within the design specification.
    • Updated the file structure diagram to include new memory-related modules.
  • README.md
    • Updated the 'Memory Interface (M5)' feature description to highlight the new context injection retrieval pipeline, encompassing ranking and token-budget formatting.
  • src/ai_company/memory/__init__.py
    • Updated the package docstring to mention the inclusion of the retrieval pipeline and MemoryInjectionStrategy.
    • Added imports and re-exports for newly introduced modules and classes, such as formatter, injection, ranking, retrieval_config, retriever, ContextInjectionStrategy, DefaultTokenEstimator, InjectionPoint, InjectionStrategy, MemoryInjectionStrategy, ScoredMemory, MemoryRetrievalConfig, and TokenEstimator.
  • src/ai_company/memory/config.py
    • Imported MemoryRetrievalConfig.
    • Added a retrieval field of type MemoryRetrievalConfig to CompanyMemoryConfig to allow configuration of the memory retrieval pipeline.
  • src/ai_company/memory/formatter.py
    • Added a new module formatter.py responsible for converting ranked memories into ChatMessage instances.
    • Implemented greedy packing logic to respect token budgets when formatting memory content.
    • Defined constants for memory block delimiters (MEMORY_BLOCK_START, MEMORY_BLOCK_END) and a mapping for InjectionPoint to MessageRole.
  • src/ai_company/memory/injection.py
    • Added a new module injection.py to define memory injection protocols and enums.
    • Introduced InjectionStrategy (CONTEXT, TOOL_BASED, SELF_EDITING) and InjectionPoint (SYSTEM, USER) enumerations.
    • Defined the TokenEstimator structural protocol and provided a DefaultTokenEstimator heuristic implementation.
    • Established the MemoryInjectionStrategy protocol for pluggable memory delivery mechanisms.
  • src/ai_company/memory/ranking.py
    • Added a new module ranking.py containing functions for scoring and sorting memory entries.
    • Defined the ScoredMemory Pydantic model to hold memory entries with computed relevance, recency, and combined scores.
    • Implemented compute_recency_score for exponential decay based on age and compute_combined_score for weighted score aggregation.
    • Provided the rank_memories function to orchestrate scoring, merging personal and shared memories, filtering by relevance, sorting, and truncating.
  • src/ai_company/memory/retrieval_config.py
    • Added a new module retrieval_config.py defining MemoryRetrievalConfig, a frozen Pydantic model for configuring the memory retrieval pipeline.
    • Included fields for strategy selection, relevance/recency weights, decay rate, personal boost, minimum relevance, maximum memories, shared memory inclusion, default relevance, and injection point.
    • Implemented a model validator to ensure that relevance_weight and recency_weight sum to 1.0.
  • src/ai_company/memory/retriever.py
    • Added a new module retriever.py which implements the ContextInjectionStrategy.
    • Orchestrated the full memory retrieval pipeline: querying backends, ranking memories, and formatting them for injection.
    • Implemented _safe_call for error isolation during memory fetching, gracefully handling domain-specific errors while propagating system-level exceptions.
    • Enabled parallel fetching of personal and shared memories using asyncio.TaskGroup.
  • src/ai_company/observability/events/memory.py
    • Added new logging event constants for memory retrieval start/complete, degradation, skipping, ranking completion, formatting completion, and token budget exceeding.
  • tests/integration/memory/test_retriever_integration.py
    • Added new integration tests for the ContextInjectionStrategy to verify the end-to-end memory retrieval, ranking, and formatting pipeline.
    • Included tests for shared memory integration, recency-based ranking, token budget enforcement, and category filtering using mock in-memory backends.
  • tests/unit/memory/test_config.py
    • Updated unit tests to assert the presence and type of the new retrieval configuration within CompanyMemoryConfig.
    • Added a test case for configuring custom retrieval settings.
  • tests/unit/memory/test_formatter.py
    • Added new unit tests for format_memory_context to cover various scenarios including empty memories, single/multiple entries, label presence, token budget limits, injection points, and greedy packing behavior.
  • tests/unit/memory/test_init.py
    • Updated the test for __all__ exports to reflect the increased number of re-exported symbols from the new memory modules.
  • tests/unit/memory/test_injection.py
    • Added new unit tests for InjectionStrategy and InjectionPoint enums, TokenEstimator protocol, DefaultTokenEstimator functionality, and MemoryInjectionStrategy protocol compliance.
  • tests/unit/memory/test_ranking.py
    • Added new unit tests for memory ranking functions, including compute_recency_score, compute_combined_score, and rank_memories.
    • Verified ScoredMemory model validation, personal boost application, default relevance handling, shared entry merging, sorting logic, filtering, and truncation.
  • tests/unit/memory/test_retrieval_config.py
    • Added new unit tests for MemoryRetrievalConfig covering default values, validation rules (weight sum, field ranges, NaN/Inf checks), strategy and injection point settings, and serialization/deserialization.
  • tests/unit/memory/test_retriever.py
    • Added new unit tests for ContextInjectionStrategy to ensure protocol compliance, correct prepare_messages behavior, shared store integration, graceful degradation for various error types, and propagation of critical system errors.
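The frozen configuration model with its weight-sum validator, as described in the changelog above, can be sketched as follows. Field names follow the PR description; the defaults and the float tolerance are assumptions.

```python
import math

from pydantic import BaseModel, ConfigDict, Field, model_validator

class MemoryRetrievalConfig(BaseModel):
    """Frozen retrieval config; instances are immutable after validation."""
    model_config = ConfigDict(frozen=True)

    relevance_weight: float = Field(default=0.7, ge=0.0, le=1.0)
    recency_weight: float = Field(default=0.3, ge=0.0, le=1.0)
    decay_rate: float = Field(default=0.1, gt=0.0)
    min_relevance: float = Field(default=0.0, ge=0.0, le=1.0)
    max_memories: int = Field(default=10, ge=1)
    include_shared: bool = True

    @model_validator(mode="after")
    def _check_weight_sum(self) -> "MemoryRetrievalConfig":
        # Reject configs whose weights do not sum to 1.0 (within float tolerance).
        if not math.isclose(self.relevance_weight + self.recency_weight, 1.0):
            raise ValueError("relevance_weight + recency_weight must sum to 1.0")
        return self
```

Because Pydantic's `ValidationError` subclasses `ValueError`, a bad weight pair surfaces as a `ValueError` at construction time and the frozen config rejects mutation afterwards.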
Activity
  • The pull request was pre-reviewed by 9 agents (code-reviewer, python-reviewer, pr-test-analyzer, silent-failure-hunter, comment-analyzer, type-design-analyzer, logging-audit, resilience-audit, docs-consistency).
  • 29 findings were addressed during the pre-review process, including 2 critical, 16 major, 8 medium, and 3 minor issues.
  • Critical fixes included TaskGroup ExceptionGroup unwrapping and addressing missing files in DESIGN_SPEC §15.3.
  • Major fixes involved max_memories truncation after merge, error_type in log kwargs, and docstring/documentation updates.
  • Logging-audit and resilience-audit found zero issues, indicating a robust design in these areas.
  • The test plan confirms 4580 tests pass (with 26 new tests added), maintaining 96.31% coverage (above the 80% minimum).
  • All ruff checks, formatting, mypy strict checks, and pre-commit hooks passed successfully.
  • The CI pipeline (lint + type-check + test + coverage) is pending completion.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive and well-designed memory retrieval pipeline, featuring a new ContextInjectionStrategy for orchestrating memory retrieval, ranking, and context formatting, supported by configuration models, protocols, and utility functions. The implementation demonstrates robustness, excellent error handling, efficient parallel data fetching, and thorough testing. However, a critical security vulnerability has been identified: the formatting of memories into the agent's context is susceptible to prompt injection. This is due to a lack of proper handling for memory content that might contain block delimiters, which could allow attackers to manipulate agent behavior via stored memories. Additionally, a minor refactoring to improve type inference and remove a type: ignore is suggested for enhanced maintainability.

return ()

body = "\n".join(included_lines)
block = f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}"
Severity: high (security)

The memory context is formatted using fixed delimiters (--- AGENT MEMORY --- and --- END MEMORY ---) without sanitizing the memory content. An attacker who can influence the content of a stored memory (e.g., through a previous conversation) could include the string --- END MEMORY --- to break out of the memory block and inject arbitrary instructions into the agent's prompt. This is a classic prompt injection vector.

Recommendation: Escape or redact the delimiter strings within the memory content before formatting. For example:

safe_content = content.replace(MEMORY_BLOCK_END, "[DELIMITER_REDACTED]")
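Expanding that recommendation into a self-contained sketch (the helper name is hypothetical; only the delimiter strings come from the diff):

```python
MEMORY_BLOCK_START = "--- AGENT MEMORY ---"
MEMORY_BLOCK_END = "--- END MEMORY ---"

def sanitize_memory_content(content: str) -> str:
    """Redact delimiter strings so stored memories cannot close the block early."""
    # Redact the end marker first: it is the one that breaks out of the block.
    for delimiter in (MEMORY_BLOCK_END, MEMORY_BLOCK_START):
        content = content.replace(delimiter, "[DELIMITER_REDACTED]")
    return content
```

Applied before greedy packing, this neutralizes the injection vector while leaving benign memory content untouched.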

Comment on lines +273 to +291
        should_fetch_shared = (
            self._config.include_shared and self._shared_store is not None
        )

        personal_coro = _safe_call(
            self._backend.retrieve(agent_id, query),
            source="personal",
            agent_id=agent_id,
        )

        if should_fetch_shared:
            shared_coro = _safe_call(
                self._shared_store.search_shared(  # type: ignore[union-attr]
                    query,
                    exclude_agent=agent_id,
                ),
                source="shared",
                agent_id=agent_id,
            )

Severity: medium

The use of the should_fetch_shared flag and the subsequent type: ignore[union-attr] can be simplified. By moving the condition directly into the if statement, we can help the type checker (mypy) infer that self._shared_store is not None within the block. This improves code clarity and removes the need for the type: ignore.

        personal_coro = _safe_call(
            self._backend.retrieve(agent_id, query),
            source="personal",
            agent_id=agent_id,
        )

        if self._config.include_shared and self._shared_store is not None:
            shared_coro = _safe_call(
                self._shared_store.search_shared(
                    query,
                    exclude_agent=agent_id,
                ),
                source="shared",
                agent_id=agent_id,
            )

@greptile-apps

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR delivers the full memory retrieval pipeline for M5: a ContextInjectionStrategy that orchestrates backend query → relevance+recency ranking → greedy token-budget formatting → ChatMessage injection. The implementation is well-structured, with clean separation across ranking.py, formatter.py, injection.py, retrieval_config.py, and retriever.py, and is backed by 26 new tests (unit + integration) at 96% coverage.

Key observations:

  • Silent misconfiguration: ContextInjectionStrategy.__init__ emits no warning when config.include_shared=True (the default) but shared_store=None. An operator who omits shared_store while keeping the default config will silently receive personal-only memories with no diagnostic signal beyond a has_shared_store=False debug field.
  • Docstring inaccuracy for concurrent system errors: prepare_messages documents "Re-raises builtins.MemoryError and RecursionError", which holds for single-backend failures. However, when both the personal backend and the shared store simultaneously raise system-level errors, Python's except* semantics collect both re-raised exceptions into a new ExceptionGroup. This group falls through to the isinstance(exc, ExceptionGroup) branch and is re-raised as an ExceptionGroup, not a bare exception — a behavioural detail that bare except MemoryError: guards at the call site would miss.

Confidence Score: 3/5

  • Safe to merge with the two identified issues fixed first; core pipeline logic is sound and well-tested.
  • The memory retrieval pipeline is well-implemented with strong test coverage (96%, 26 new tests). The compute_combined_score clamp correctly prevents validation overflows, and the token-budget accounting is conservative. However, two production-readiness issues must be addressed: (1) the silent misconfiguration when include_shared=True but shared_store=None could cause operators to receive unexpected personal-only results without diagnostic warning, and (2) the docstring inaccuracy for concurrent system errors could cause call-site error handlers to silently swallow ExceptionGroup exceptions that should propagate. Both are low-risk but worth resolving before the shared-store path is relied upon in production.
  • src/ai_company/memory/retriever.py — add warning for silent misconfiguration in __init__, and update prepare_messages docstring to document ExceptionGroup propagation.

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/ai_company/memory/retriever.py | Core pipeline orchestrator implementing ContextInjectionStrategy; two issues identified: (1) silent misconfiguration when include_shared=True but shared_store=None, with no warning logged at construction time; (2) docstring inaccuracy regarding concurrent system-error ExceptionGroup propagation. Both issues require targeted fixes before production use. |
| src/ai_company/memory/ranking.py | Functionally pure ranking module; compute_combined_score correctly clamps to 1.0 (line 109), and the separate after_filter/after_truncation log fields are clearly distinguishable. No issues found. |
| src/ai_company/memory/formatter.py | Token-budget greedy packer; separator cost accounting is correctly implemented (lines 105-109). Budget arithmetic is conservative and correct. |
| src/ai_company/memory/injection.py | Protocol definitions and enums are clean; no issues with the pluggable strategy interface. |
| src/ai_company/memory/retrieval_config.py | Frozen Pydantic config with weight-sum and strategy validators; defaults are sensible and all constraints are properly enforced. |
| tests/integration/memory/test_retriever_integration.py | Comprehensive end-to-end tests covering the full pipeline, shared memories, token budget limits, recency ranking, and category filtering. Mock backends are minimal but sufficient. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant CIS as ContextInjectionStrategy
    participant SC as _safe_call
    participant MB as MemoryBackend
    participant SS as SharedKnowledgeStore
    participant RM as rank_memories()
    participant FMC as format_memory_context()

    Caller->>CIS: prepare_messages(agent_id, query_text, token_budget, categories?)
    CIS->>CIS: _execute_pipeline(...)
    CIS->>CIS: build MemoryQuery(limit=max_memories)

    par Parallel fetch (TaskGroup)
        CIS->>SC: _safe_call(backend.retrieve(...))
        SC->>MB: retrieve(agent_id, query)
        MB-->>SC: tuple[MemoryEntry, ...]
        SC-->>CIS: personal_entries (or () on error)
    and
        CIS->>SC: _safe_call(shared_store.search_shared(...))
        SC->>SS: search_shared(query, exclude_agent=agent_id)
        SS-->>SC: tuple[MemoryEntry, ...]
        SC-->>CIS: shared_entries (or () on error)
    end

    CIS->>RM: rank_memories(personal, config, now, shared)
    Note over RM: Score → merge → filter(min_relevance) → sort↓ → truncate(max_memories)
    RM-->>CIS: tuple[ScoredMemory, ...] sorted by combined_score

    CIS->>FMC: format_memory_context(ranked, estimator, token_budget, injection_point)
    Note over FMC: Greedy pack: delimiter overhead + per-line cost + separator cost
    FMC-->>CIS: tuple[ChatMessage, ...] (0 or 1 message)

    CIS-->>Caller: tuple[ChatMessage, ...]

Last reviewed commit: f189842


Copilot AI left a comment


Pull request overview

This PR introduces a full memory retrieval pipeline that fetches personal + shared memories, ranks them by relevance/recency, formats them under a token budget, and returns injected ChatMessage context via a pluggable MemoryInjectionStrategy protocol.

Changes:

  • Added ContextInjectionStrategy pipeline (retrieve → rank → token-budget format → inject) with error isolation and shared-store merging.
  • Implemented ranking (rank_memories) and formatting (format_memory_context) utilities plus retrieval configuration (MemoryRetrievalConfig) and injection-related protocols/enums.
  • Added extensive unit + integration test coverage and updated docs/exports to reflect the new memory injection capabilities.
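The pluggable protocol surface mentioned above can be sketched with a structural `Protocol`. The method signature and the roughly-four-characters-per-token heuristic are assumptions inferred from the PR description (the "token estimator min-1" fix), not the exact source.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TokenEstimator(Protocol):
    def estimate(self, text: str) -> int: ...

class DefaultTokenEstimator:
    """Heuristic estimator: roughly four characters per token, minimum 1."""
    def estimate(self, text: str) -> int:
        return max(1, len(text) // 4)
```

Because the protocol is structural, any object with a matching `estimate` method satisfies it, which is what lets the memory package avoid importing the engine's real tokenizer and breaking the memory→engine import cycle.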

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/unit/memory/test_retriever.py Unit tests for the retrieval/injection pipeline behavior and degradation paths
tests/unit/memory/test_retrieval_config.py Unit tests for retrieval config defaults/validation/serialization
tests/unit/memory/test_ranking.py Unit tests for scoring/ranking and shared+personal merging/truncation
tests/unit/memory/test_injection.py Unit tests for injection enums and protocols + token estimator
tests/unit/memory/test_init.py Updates __all__ export count assertion
tests/unit/memory/test_formatter.py Unit tests for token-budget formatting and injection point mapping
tests/unit/memory/test_config.py Ensures CompanyMemoryConfig includes retrieval config and supports overrides
tests/integration/memory/test_retriever_integration.py End-to-end integration tests for retrieve→rank→format pipeline
src/ai_company/observability/events/memory.py Adds structured logging event names for retrieval pipeline stages
src/ai_company/memory/retriever.py Implements ContextInjectionStrategy orchestration + parallel shared retrieval
src/ai_company/memory/retrieval_config.py Adds frozen Pydantic config for weighting/thresholds/strategy/injection point
src/ai_company/memory/ranking.py Adds scoring model and ranking functions with relevance+recency weighting
src/ai_company/memory/injection.py Adds MemoryInjectionStrategy/TokenEstimator protocols and enums
src/ai_company/memory/formatter.py Adds greedy token-budget packing into delimited injected message blocks
src/ai_company/memory/config.py Adds retrieval config to CompanyMemoryConfig
src/ai_company/memory/__init__.py Re-exports new injection/retrieval/ranking components via ai_company.memory
README.md Updates Memory Interface milestone description
DESIGN_SPEC.md Documents memory injection strategies and ranking/formatting behavior
CLAUDE.md Updates repo structure notes to include retrieval pipeline components


Comment on lines +195 to +205
filtered = [s for s in scored if s.combined_score >= config.min_relevance]
filtered.sort(key=lambda s: s.combined_score, reverse=True)

result = tuple(filtered[: config.max_memories])

logger.debug(
    MEMORY_RANKING_COMPLETE,
    total_candidates=len(scored),
    after_filter=len(result),
    min_relevance=config.min_relevance,
)

Copilot AI Mar 9, 2026


The debug log field after_filter is computed from len(result), but result is already truncated to config.max_memories. This makes the metric misleading (it’s “after truncate”, not “after min_relevance filter”). Consider logging both counts (e.g., after_filter=len(filtered) and after_truncate=len(result)) or renaming the field to reflect truncation.

Comment on lines +78 to +83
delimiter_text = f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}"
delimiter_tokens = estimator.estimate_tokens(delimiter_text)

remaining = token_budget - delimiter_tokens
if remaining <= 0:
    logger.debug(

Copilot AI Mar 9, 2026


Token-budget enforcement underestimates the final output size: delimiter_tokens is computed from f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}", but the produced block later is f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}", and the newline separators added between lines / before the end delimiter aren’t included in any estimate_tokens(...) call. Consider accounting for separator/newline token costs (or estimating tokens on the exact final block before returning and trimming accordingly) so token_budget is actually respected under the same estimator.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Around line 1591-1593: The spec paragraph currently hard-codes that pipeline
output is formatted as a SYSTEM message, which conflicts with the
implementation’s configurable injection-point/strategy selection; update the
paragraph to describe that after Pipeline: MemoryBackend.retrieve() -> rank by
relevance+recency -> filter by min_relevance -> greedy token-budget packing ->
format with delimiters, the final packed context is injected at the configured
message role (injection-point) rather than always as SYSTEM, and mention that
the injection role is selectable by the configured strategy so readers know the
contract supports non-SYSTEM injection (e.g., user-configured system, assistant,
or custom role).

In `@README.md`:
- Line 30: Replace the vendor-specific name "Mem0" in the README entry "Memory
Backends (M5) - Mem0 adapter backend ([ADR-001]...)" with a generic phrase
(e.g., "vendor-specific adapter" or "proprietary adapter") while keeping the ADR
reference ([ADR-001]) intact; update the string in README.md so it reads like
"Memory Backends (M5) - vendor-specific adapter backend ([ADR-001])" to conform
to the repo’s vendor-agnostic naming rule and leave vendor details inside
ADR-001.

In `@src/ai_company/memory/formatter.py`:
- Around line 78-79: The token-estimate for delimiters underestimates by one
newline because delimiter_text is built as
f"{MEMORY_BLOCK_START}\n{MEMORY_BLOCK_END}" while the actual block is formatted
as f"{MEMORY_BLOCK_START}\n{body}\n{MEMORY_BLOCK_END}" (two newlines when body
is present); update the calculation that produces delimiter_tokens (and any
similar estimate logic) to mirror the real formatting—either build
delimiter_text with two newlines (e.g., include an extra '\n') or conditionally
add the second newline when body may be non-empty so that MEMORY_BLOCK_START,
MEMORY_BLOCK_END, delimiter_text, delimiter_tokens and the final block
formatting all agree.

In `@src/ai_company/memory/retrieval_config.py`:
- Around line 87-90: The InjectionPoint enum and the retrieval config need to
support dual injection; add a BOTH member to InjectionPoint and update the
injection_point Field in retrieval_config (symbol: injection_point) to accept
InjectionPoint.BOTH (or a meaningful default) and any validation logic so the
formatter/retriever contract can handle both-system-and-user injection; ensure
code paths that switch on InjectionPoint (e.g., formatter/retriever code that
reads injection_point) are updated to treat BOTH as injecting into both SYSTEM
and USER contexts.
- Around line 38-40: Add validation to the RetrievalConfig Pydantic model to
reject future-only InjectionStrategy values (tool_based and self_editing) during
config parsing: in the RetrievalConfig class add a Pydantic v2 @field_validator("strategy") method
that checks the supplied InjectionStrategy enum and raises a ValueError if it
equals InjectionStrategy.TOOL_BASED or InjectionStrategy.SELF_EDITING, leaving
InjectionStrategy.CONTEXT allowed; ensure the error message is clear so invalid
configs fail at validation rather than at runtime.
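The repo would express this as a Pydantic v2 `@field_validator` on the frozen config model; the same fail-at-construction behavior can be shown dependency-free with a stdlib dataclass (enum members mirror the PR, everything else is illustrative):

```python
from dataclasses import dataclass
from enum import Enum


class InjectionStrategy(Enum):
    CONTEXT = "context"
    TOOL_BASED = "tool_based"        # planned, not yet implemented
    SELF_EDITING = "self_editing"    # planned, not yet implemented


_IMPLEMENTED = frozenset({InjectionStrategy.CONTEXT})


@dataclass(frozen=True)
class RetrievalConfig:
    strategy: InjectionStrategy = InjectionStrategy.CONTEXT

    def __post_init__(self) -> None:
        # Fail at config construction, not at runtime, for planned-only values.
        if self.strategy not in _IMPLEMENTED:
            raise ValueError(
                f"injection strategy {self.strategy.value!r} is not implemented; "
                "only 'context' is currently supported"
            )
```

Invalid configs now surface at parse time with a clear message, rather than deep inside the retrieval pipeline.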

In `@src/ai_company/memory/retriever.py`:
- Around line 207-229: Change the two debug-level logs that emit
MEMORY_RETRIEVAL_SKIPPED to INFO so state transitions are visible: in the branch
guarding "if not personal_entries and not shared_entries" and in the branch
after calling rank_memories where "if not ranked" is true, replace
logger.debug(...) with logger.info(...) keeping the same message key and kwargs
(MEMORY_RETRIEVAL_SKIPPED, agent_id=agent_id, reason=...) so the event and its
reason are logged at INFO.
- Around line 142-154: The code currently calls self._execute_pipeline(...) even
when token_budget <= 0; short-circuit this by checking the token_budget right
after logging (or before the try) and immediately return an empty tuple if
token_budget is non-positive to avoid unnecessary backend/shared-store I/O;
update the method containing the logger.info and try block (the retriever method
that passes agent_id, query_text, token_budget, categories into
_execute_pipeline) to perform this check and return () instead of invoking
_execute_pipeline when token_budget <= 0.
- Around line 114-117: The constructor currently uses "token_estimator or
DefaultTokenEstimator()" which will ignore any valid but falsey custom
estimator; change this to an explicit None check so only None triggers the
default. In the initializer that sets self._estimator (look for token_estimator
and DefaultTokenEstimator in retriever.py/__init__), replace the truthiness
fallback with something like: if token_estimator is None then assign
DefaultTokenEstimator(), else assign token_estimator, ensuring falsey-but-valid
estimators are preserved.

In `@tests/unit/memory/test_init.py`:
- Around line 16-17: Update the test_all_has_expected_count to assert the actual
exported names instead of just their count: replace the len(...) assertion with
a set equality check comparing memory_module.__all__ to a literal set of
expected exported symbol names (the exact public API entries). Keep the test
name (test_all_has_expected_count) and the reference to memory_module.__all__ so
the change is localized; list all expected names in the test to ensure the
public API surface is pinned.

In `@tests/unit/memory/test_injection.py`:
- Around line 34-36: The test test_invalid_string_raises currently asserts
ValueError with a brittle message match; update the with pytest.raises(...) call
in test_invalid_string_raises so it no longer depends on Python's StrEnum
message format—either remove the match argument entirely and only assert
ValueError is raised when calling InjectionStrategy("invalid"), or replace the
match with a more flexible pattern that targets the invalid input string (e.g.,
a regex matching "invalid") to assert the message contains the invalid value
rather than relying on the full StrEnum error text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: dee9386f-ed93-4910-9144-8620f907d97e

📥 Commits

Reviewing files that changed from the base of the PR and between 26b3108 and a63053f.

📒 Files selected for processing (20)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • src/ai_company/memory/__init__.py
  • src/ai_company/memory/config.py
  • src/ai_company/memory/formatter.py
  • src/ai_company/memory/injection.py
  • src/ai_company/memory/ranking.py
  • src/ai_company/memory/retrieval_config.py
  • src/ai_company/memory/retriever.py
  • src/ai_company/observability/events/memory.py
  • tests/integration/memory/__init__.py
  • tests/integration/memory/test_retriever_integration.py
  • tests/unit/memory/test_config.py
  • tests/unit/memory/test_formatter.py
  • tests/unit/memory/test_init.py
  • tests/unit/memory/test_injection.py
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_retriever.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
!(DESIGN_SPEC.md|.claude/**|**/litellm/**)

📄 CodeRabbit inference engine (CLAUDE.md)

Never use real vendor names (Anthropic, OpenAI, Claude, GPT, etc.) in project-owned code, docstrings, comments, tests, or config examples. Use generic names: example-provider, example-large-001, example-medium-001, example-small-001, large/medium/small as aliases. Tests must use test-provider, test-small-001, etc.

Files:

  • CLAUDE.md
  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_injection.py
  • src/ai_company/observability/events/memory.py
  • src/ai_company/memory/injection.py
  • src/ai_company/memory/ranking.py
  • src/ai_company/memory/__init__.py
  • src/ai_company/memory/formatter.py
  • tests/unit/memory/test_config.py
  • tests/unit/memory/test_retrieval_config.py
  • README.md
  • tests/unit/memory/test_formatter.py
  • src/ai_company/memory/config.py
  • DESIGN_SPEC.md
  • tests/unit/memory/test_retriever.py
  • src/ai_company/memory/retrieval_config.py
  • src/ai_company/memory/retriever.py
  • tests/unit/memory/test_init.py
  • tests/integration/memory/test_retriever_integration.py
**/*.md

📄 CodeRabbit inference engine (CLAUDE.md)

Always read DESIGN_SPEC.md before implementing any feature or planning any issue — the spec is the mandatory starting point for architecture, data models, and behavior. If implementation deviates from the spec, alert the user and explain why — user decides whether to proceed or update the spec. Do NOT silently diverge. When a spec section is referenced, read that section verbatim. When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality.

Files:

  • CLAUDE.md
  • README.md
  • DESIGN_SPEC.md
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Do not use from __future__ import annotations — Python 3.14 has PEP 649 native lazy annotations
Use except A, B: syntax without parentheses (not except (A, B):) — PEP 758 except syntax, enforced by ruff on Python 3.14
Add type hints to all public functions and classes — mypy strict mode enforced
Use Google-style docstrings (required on all public classes and functions) — enforced by ruff D rules
Enforce immutability: create new objects, never mutate existing ones. For non-Pydantic internal collections (registries, BaseTool), use copy.deepcopy() at construction + MappingProxyType wrapping for read-only enforcement. For dict/list fields in frozen Pydantic models, rely on frozen=True for field reassignment prevention and copy.deepcopy() at system boundaries
Separate config (frozen Pydantic models) from runtime state (mutable-via-copy models using model_copy(update=...)). Never mix static config fields with mutable runtime fields in one model.
Use Pydantic v2 with adopted conventions: use @computed_field for derived values instead of storing redundant fields; use NotBlankStr (from core.types) for all identifier/name fields (including optional and tuple variants) instead of manual whitespace validators
Prefer asyncio.TaskGroup for fan-out/fan-in parallel operations (e.g., multiple tool invocations, parallel agent calls) — prefer structured concurrency over bare create_task
Keep functions under 50 lines and files under 800 lines
Handle errors explicitly — never silently swallow exceptions
Validate input at system boundaries (user input, external APIs, config files)
Set line length to 88 characters (ruff configured)

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_injection.py
  • src/ai_company/observability/events/memory.py
  • src/ai_company/memory/injection.py
  • src/ai_company/memory/ranking.py
  • src/ai_company/memory/__init__.py
  • src/ai_company/memory/formatter.py
  • tests/unit/memory/test_config.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_formatter.py
  • src/ai_company/memory/config.py
  • tests/unit/memory/test_retriever.py
  • src/ai_company/memory/retrieval_config.py
  • src/ai_company/memory/retriever.py
  • tests/unit/memory/test_init.py
  • tests/integration/memory/test_retriever_integration.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/**/*.py: Use pytest markers: @pytest.mark.unit, @pytest.mark.integration, @pytest.mark.e2e, @pytest.mark.slow to categorize tests
Maintain 80% minimum code coverage (enforced in CI with --cov-fail-under=80)
Use asyncio_mode = "auto" in pytest configuration — no manual @pytest.mark.asyncio needed on async tests
Set test timeout to 30 seconds per test
Use pytest-xdist via -n auto for parallel test execution
Prefer @pytest.mark.parametrize for testing similar cases instead of multiple nearly-identical tests

Files:

  • tests/unit/memory/test_ranking.py
  • tests/unit/memory/test_injection.py
  • tests/unit/memory/test_config.py
  • tests/unit/memory/test_retrieval_config.py
  • tests/unit/memory/test_formatter.py
  • tests/unit/memory/test_retriever.py
  • tests/unit/memory/test_init.py
  • tests/integration/memory/test_retriever_integration.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Every module with business logic must have: from ai_company.observability import get_logger followed by logger = get_logger(__name__) — never use import logging / logging.getLogger() / print()
Use event name constants from domain-specific modules under ai_company.observability.events (e.g., PROVIDER_CALL_START from events.provider, BUDGET_RECORD_ADDED from events.budget). Import directly: from ai_company.observability.events.<domain> import EVENT_CONSTANT
Use structured logging with kwargs: always logger.info(EVENT, key=value) — never logger.info("msg %s", val)
Log all error paths at WARNING or ERROR level with context before raising exceptions
Log all state transitions at INFO level
Use DEBUG logging for object creation, internal flow, and entry/exit of key functions

Files:

  • src/ai_company/observability/events/memory.py
  • src/ai_company/memory/injection.py
  • src/ai_company/memory/ranking.py
  • src/ai_company/memory/__init__.py
  • src/ai_company/memory/formatter.py
  • src/ai_company/memory/config.py
  • src/ai_company/memory/retrieval_config.py
  • src/ai_company/memory/retriever.py
🧠 Learnings (2)
📚 Learning: 2026-03-09T10:20:23.072Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T10:20:23.072Z
Learning: Applies to src/**/*.py : Use event name constants from domain-specific modules under `ai_company.observability.events` (e.g., `PROVIDER_CALL_START` from `events.provider`, `BUDGET_RECORD_ADDED` from `events.budget`). Import directly: `from ai_company.observability.events.<domain> import EVENT_CONSTANT`

Applied to files:

  • src/ai_company/observability/events/memory.py
📚 Learning: 2026-03-09T10:20:23.072Z
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-09T10:20:23.072Z
Learning: Applies to **/*.py : Separate config (frozen Pydantic models) from runtime state (mutable-via-copy models using `model_copy(update=...)`). Never mix static config fields with mutable runtime fields in one model.

Applied to files:

  • src/ai_company/memory/retrieval_config.py
🧬 Code graph analysis (9)
tests/unit/memory/test_ranking.py (2)
src/ai_company/core/enums.py (1)
  • MemoryCategory (100-107)
src/ai_company/memory/ranking.py (4)
  • ScoredMemory (26-58)
  • compute_combined_score (87-104)
  • compute_recency_score (61-84)
  • rank_memories (161-207)
tests/unit/memory/test_injection.py (1)
src/ai_company/memory/injection.py (10)
  • DefaultTokenEstimator (71-91)
  • InjectionPoint (37-46)
  • InjectionStrategy (23-34)
  • MemoryInjectionStrategy (95-145)
  • TokenEstimator (50-68)
  • estimate_tokens (57-68)
  • estimate_tokens (78-91)
  • prepare_messages (105-125)
  • get_tool_definitions (127-136)
  • strategy_name (139-145)
src/ai_company/memory/formatter.py (5)
src/ai_company/memory/injection.py (4)
  • InjectionPoint (37-46)
  • TokenEstimator (50-68)
  • estimate_tokens (57-68)
  • estimate_tokens (78-91)
src/ai_company/memory/ranking.py (1)
  • ScoredMemory (26-58)
src/ai_company/providers/enums.py (1)
  • MessageRole (6-12)
src/ai_company/providers/models.py (1)
  • ChatMessage (138-210)
tests/unit/memory/test_injection.py (1)
  • estimate_tokens (63-64)
tests/unit/memory/test_retrieval_config.py (2)
src/ai_company/memory/injection.py (2)
  • InjectionPoint (37-46)
  • InjectionStrategy (23-34)
src/ai_company/memory/retrieval_config.py (1)
  • MemoryRetrievalConfig (20-109)
tests/unit/memory/test_formatter.py (6)
src/ai_company/core/enums.py (1)
  • MemoryCategory (100-107)
src/ai_company/memory/formatter.py (1)
  • format_memory_context (53-138)
src/ai_company/memory/injection.py (2)
  • DefaultTokenEstimator (71-91)
  • InjectionPoint (37-46)
src/ai_company/memory/models.py (2)
  • MemoryEntry (82-150)
  • MemoryMetadata (20-52)
src/ai_company/memory/ranking.py (1)
  • ScoredMemory (26-58)
src/ai_company/providers/enums.py (1)
  • MessageRole (6-12)
src/ai_company/memory/config.py (1)
src/ai_company/memory/retrieval_config.py (1)
  • MemoryRetrievalConfig (20-109)
tests/unit/memory/test_retriever.py (8)
src/ai_company/core/enums.py (1)
  • MemoryCategory (100-107)
src/ai_company/memory/errors.py (2)
  • MemoryRetrievalError (25-26)
  • MemoryError (13-14)
src/ai_company/memory/injection.py (5)
  • DefaultTokenEstimator (71-91)
  • MemoryInjectionStrategy (95-145)
  • strategy_name (139-145)
  • get_tool_definitions (127-136)
  • prepare_messages (105-125)
src/ai_company/memory/models.py (3)
  • MemoryEntry (82-150)
  • MemoryMetadata (20-52)
  • MemoryQuery (153-230)
src/ai_company/memory/retrieval_config.py (1)
  • MemoryRetrievalConfig (20-109)
src/ai_company/memory/retriever.py (4)
  • ContextInjectionStrategy (93-324)
  • strategy_name (318-324)
  • get_tool_definitions (309-315)
  • prepare_messages (119-175)
src/ai_company/providers/enums.py (1)
  • MessageRole (6-12)
tests/integration/memory/test_retriever_integration.py (2)
  • retrieve (58-66)
  • store (51-56)
src/ai_company/memory/retrieval_config.py (2)
src/ai_company/memory/injection.py (2)
  • InjectionPoint (37-46)
  • InjectionStrategy (23-34)
src/ai_company/observability/_logger.py (1)
  • get_logger (8-28)
src/ai_company/memory/retriever.py (8)
src/ai_company/memory/formatter.py (1)
  • format_memory_context (53-138)
src/ai_company/memory/injection.py (2)
  • DefaultTokenEstimator (71-91)
  • TokenEstimator (50-68)
src/ai_company/memory/models.py (2)
  • MemoryQuery (153-230)
  • MemoryEntry (82-150)
src/ai_company/memory/ranking.py (1)
  • rank_memories (161-207)
src/ai_company/core/enums.py (1)
  • MemoryCategory (100-107)
src/ai_company/memory/protocol.py (1)
  • MemoryBackend (20-182)
src/ai_company/memory/retrieval_config.py (1)
  • MemoryRetrievalConfig (20-109)
src/ai_company/memory/shared.py (1)
  • SharedKnowledgeStore (18-82)
🪛 LanguageTool
README.md

[typographical] ~24-~24: To join two clauses or introduce examples, consider using an em dash.
Context: ...a migrations - Memory Interface (M5) - Pluggable MemoryBackend protocol with ...

(DASH_RULE)


[typographical] ~25-~25: To join two clauses or introduce examples, consider using an em dash.
Context: ...) - Coordination Error Taxonomy (M5) - Post-execution classification pipeline d...

(DASH_RULE)


[typographical] ~26-~26: To join two clauses or introduce examples, consider using an em dash.
Context: ...n failures - Budget Enforcement (M5) - BudgetEnforcer service with pre-flight...

(DASH_RULE)

🔇 Additional comments (10)
tests/unit/memory/test_injection.py (1)

1-139: LGTM!

Well-structured test suite covering enum values, protocol conformance, and token estimator behavior. Good use of @pytest.mark.parametrize for the various lengths test case.

tests/unit/memory/test_formatter.py (1)

1-237: LGTM!

Comprehensive test coverage for the memory formatter including edge cases (empty input, zero/negative budgets, budget too small), greedy packing behavior, injection point handling, and delimiter wrapping. Good use of the helper function to reduce boilerplate.

src/ai_company/memory/formatter.py (2)

34-50: LGTM!

Clean helper function with clear formatting logic. The shared prefix handling and score formatting are well-implemented.


91-118: LGTM!

The greedy packing algorithm is well-documented and correctly implements the "skip large, include small" behavior. Good DEBUG logging for skipped memories and budget exhaustion scenarios.

tests/unit/memory/test_ranking.py (1)

1-397: LGTM!

Excellent test coverage for the ranking module including:

  • Exponential decay behavior with various decay rates
  • Edge cases (zero age, future timestamps, high decay)
  • Combined score weighted calculations
  • ScoredMemory model validation (range, frozen, inf/nan rejection)
  • Full rank_memories workflow with filtering, sorting, merging, and truncation

Good use of pytest.approx for floating-point comparisons and @pytest.mark.parametrize for validation edge cases.
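A compact model of the scoring behavior these tests pin down — the weights, decay rate, and thresholds below are illustrative, not the project's defaults, and the signatures are assumptions:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scored:
    content: str
    relevance_score: float   # backend similarity in [0, 1]
    age_hours: float
    combined_score: float = 0.0


def recency_score(age_hours: float, decay_rate: float) -> float:
    """Exponential decay: 1.0 at age zero, approaching 0 as memories age."""
    return math.exp(-decay_rate * max(age_hours, 0.0))


def rank(memories: list[Scored], *, relevance_weight: float = 0.7,
         recency_weight: float = 0.3, decay_rate: float = 0.01,
         min_relevance: float = 0.3, max_memories: int = 5) -> list[Scored]:
    scored = [
        Scored(m.content, m.relevance_score, m.age_hours,
               combined_score=min(1.0,  # clamp against float tolerance
                   relevance_weight * m.relevance_score
                   + recency_weight * recency_score(m.age_hours, decay_rate)))
        for m in memories
    ]
    kept = [s for s in scored if s.combined_score >= min_relevance]
    kept.sort(key=lambda s: s.combined_score, reverse=True)
    return kept[:max_memories]
```

Score, filter, sort, truncate — the same shape as the pipeline under test, with the clamp guarding against weighted sums drifting a float-epsilon above 1.0.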

src/ai_company/observability/events/memory.py (1)

53-62: LGTM!

New event constants follow the established memory.<entity>.<action> naming convention. The noqa: S105 suppression on Line 62 is appropriate since "TOKEN" in MEMORY_TOKEN_BUDGET_EXCEEDED is a false positive from bandit's hardcoded password detection.

tests/unit/memory/test_retriever.py (1)

1-400: LGTM!

Comprehensive unit test coverage for ContextInjectionStrategy including:

  • Protocol conformance (isinstance, strategy_name, get_tool_definitions)
  • Query construction and parameter propagation
  • Shared memory merging with include_shared flag
  • Graceful degradation with proper error classification (swallow recoverable errors, propagate MemoryError/RecursionError)
  • Token budget handling

The graceful degradation tests (Lines 218-368) are particularly well-designed, ensuring critical system errors propagate while recoverable errors degrade gracefully.

src/ai_company/memory/injection.py (1)

1-145: LGTM!

Well-designed module with clean separation of concerns:

  • TokenEstimator protocol decouples token estimation from engine dependencies
  • MemoryInjectionStrategy protocol enables pluggable injection strategies
  • Both protocols are @runtime_checkable enabling isinstance checks
  • DefaultTokenEstimator provides a reasonable heuristic fallback
  • Good use of TYPE_CHECKING to avoid import cycles

Documentation clearly indicates which strategies are implemented vs. planned.
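The structural-typing mechanism behind this decoupling, in miniature — the protocol shape mirrors `TokenEstimator`, while the chars/4 heuristic is an assumption:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TokenEstimator(Protocol):
    """Anything with a matching estimate_tokens method satisfies this."""

    def estimate_tokens(self, text: str) -> int: ...


class CharEstimator:
    """Does NOT inherit from TokenEstimator -- conformance is structural,
    which is what breaks the memory -> engine import cycle."""

    def estimate_tokens(self, text: str) -> int:
        return max(1, len(text) // 4) if text else 0
```

The engine can hand the memory package any estimator without either package importing the other.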

tests/integration/memory/test_retriever_integration.py (2)

27-95: LGTM!

Well-designed in-memory test doubles that implement the necessary interface methods. The InMemoryBackend correctly filters by agent_id and categories, and respects query.limit. The semantic search (query.text) is intentionally not implemented since these are integration tests focused on pipeline behavior rather than search algorithm correctness.


148-386: LGTM!

Excellent end-to-end integration tests covering:

  • Full pipeline producing formatted output with memory block markers
  • Shared memory inclusion with [shared] prefix
  • Empty result handling
  • Recency-based ranking order verification
  • Token budget enforcement affecting content size
  • Category filtering

These tests validate the complete retrieve → rank → format pipeline with realistic scenarios.

Comment on lines +195 to +205
filtered = [s for s in scored if s.combined_score >= config.min_relevance]
filtered.sort(key=lambda s: s.combined_score, reverse=True)

result = tuple(filtered[: config.max_memories])

logger.debug(
    MEMORY_RANKING_COMPLETE,
    total_candidates=len(scored),
    after_filter=len(result),
    min_relevance=config.min_relevance,
)


⚠️ Potential issue | 🟠 Major

Fix the relevance floor and ranking completion telemetry.

Line 195 applies config.min_relevance to combined_score, so recency can push a low-relevance memory above the filter. Line 203 then reports the truncated tuple as after_filter, and Line 200 emits the completion event at DEBUG, which both under-reports and hides ranking outcomes.

Proposed fix
-    filtered = [s for s in scored if s.combined_score >= config.min_relevance]
+    filtered = [s for s in scored if s.relevance_score >= config.min_relevance]
     filtered.sort(key=lambda s: s.combined_score, reverse=True)

     result = tuple(filtered[: config.max_memories])

-    logger.debug(
+    logger.info(
         MEMORY_RANKING_COMPLETE,
         total_candidates=len(scored),
-        after_filter=len(result),
+        after_filter=len(filtered),
+        returned=len(result),
         min_relevance=config.min_relevance,
     )

As per coding guidelines, "Log all state transitions at INFO level".
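A quick numeric check of why a combined-score floor leaks low-relevance memories — the weights here are illustrative, chosen only to make the leak visible:

```python
# Weights and threshold mirror the shape of MemoryRetrievalConfig;
# the specific numbers are illustrative only.
relevance_weight, recency_weight = 0.7, 0.3
min_relevance = 0.35

relevance, recency = 0.1, 1.0  # weakly relevant but brand-new memory
combined = relevance_weight * relevance + recency_weight * recency  # = 0.37

# Filtering on combined_score admits the memory despite low relevance...
passes_combined = combined >= min_relevance
# ...while filtering on relevance_score (the proposed fix) rejects it.
passes_relevance = relevance >= min_relevance
```

Recency alone contributes 0.3 to the combined score, so any fresh memory clears a floor of 0.35 almost regardless of relevance — exactly the behavior the review flags.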


Implement the full retrieval pipeline between MemoryBackend.retrieve()
and AgentEngine.run(memory_messages=...): query backend, re-rank by
relevance+recency, enforce token budget, merge personal+shared, and
format as ChatMessage tuples.

- MemoryInjectionStrategy protocol with three planned strategies
- ContextInjectionStrategy (context injection, implemented)
- MemoryRetrievalConfig with weight validation
- ScoredMemory ranking with exponential recency decay
- Token-budget-aware formatter with greedy packing
- Parallel fetch with error isolation (TaskGroup)
- Graceful degradation (domain errors → empty, builtins propagate)
- DESIGN_SPEC.md §7.7 documenting all three injection strategies
Pre-reviewed by 9 agents, 29 findings addressed:
- Fix TaskGroup ExceptionGroup wrapping (except* unwrap)
- Consolidate _safe_retrieve_* into single _safe_call helper
- Add max_memories truncation after personal+shared merge
- Add error_type kwarg to except Exception handlers
- Add defensive ValueError for unknown InjectionPoint variants
- Fix DefaultTokenEstimator to return min 1 for non-empty text
- Update DESIGN_SPEC §15.3, §7.7 protocol params, CLAUDE.md, README
- Add missing docstring attributes, Raises sections, boost docs
- Add 26 new tests (validation, error propagation, edge cases)
…ers (#41)

Source fixes:
- Critical: clamp combined_score to prevent Pydantic ValidationError from float tolerance
- Critical: handle ExceptionGroup wrapping system-level errors in prepare_messages
- Major: add _validate_supported_strategy rejecting unimplemented strategies
- Major: short-circuit prepare_messages on non-positive token budget
- Major: promote MEMORY_RETRIEVAL_SKIPPED from debug to info level
- Medium: remove type:ignore by narrowing SharedKnowledgeStore via local variable
- Medium: fix delimiter token accounting and add per-line separator cost
- Medium: sanitize memory content against delimiter injection
- Medium: log warning on unsupported injection point before raising ValueError
- Minor: add clarifying comment to except* ExceptionGroup unwrapping

Docs fixes:
- DESIGN_SPEC §7.7: say "configured message role" not "SYSTEM message"
- DESIGN_SPEC §1.4: fix #41 description (retrieval pipeline, not Mem0 adapter)
- README: replace vendor name "Mem0" with generic reference
- injection.py: add cross-module reference to ContextInjectionStrategy

Test fixes:
- Add test for both backends failing simultaneously
- Add test for unsupported InjectionPoint ValueError
- Add min_relevance exact boundary test
- Replace fragile count assertion with set equality in test_init
- Fix match pattern in test_injection to avoid false positives
- Update test_retrieval_config for new strategy validation
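The delimiter-sanitization fix listed under source fixes can be sketched as follows — the delimiter values are placeholders, not the project's real markers:

```python
MEMORY_BLOCK_START = "=== MEMORY CONTEXT ==="      # placeholder values; the
MEMORY_BLOCK_END = "=== END MEMORY CONTEXT ==="    # real markers live in formatter.py


def sanitize_content(content: str) -> str:
    """Strip block markers from stored content so a malicious or accidental
    memory cannot close the injected block early or forge a new one."""
    for marker in (MEMORY_BLOCK_START, MEMORY_BLOCK_END):
        content = content.replace(marker, "")
    return content
```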
Copilot AI review requested due to automatic review settings March 9, 2026 12:12
@Aureliolo Aureliolo force-pushed the feat/memory-retrieval branch from 717444c to f189842 Compare March 9, 2026 12:12
@Aureliolo Aureliolo merged commit 873b0aa into main Mar 9, 2026
11 checks passed
@Aureliolo Aureliolo deleted the feat/memory-retrieval branch March 9, 2026 12:13

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 20 changed files in this pull request and generated 2 comments.



),
)
if system_errors is not None:
    raise system_errors from exc

Copilot AI Mar 9, 2026


prepare_messages() docstring says it re-raises bare builtins.MemoryError / RecursionError, but this branch currently re-raises an ExceptionGroup subgroup (raise system_errors). Consider unwrapping and raising a single underlying exception (or switching to except* where applicable) to match the documented behavior.

Suggested change:

    raise system_errors from exc

becomes:

    # Re-raise a bare builtins.MemoryError / RecursionError
    # to match the documented behavior, instead of raising
    # an ExceptionGroup subgroup.
    for error in system_errors.exceptions:
        if isinstance(error, (builtins_MemoryError, RecursionError)):
            raise error from exc

system_errors = exc.subgroup(
    lambda e: isinstance(
        e,
        builtins_MemoryError | RecursionError,

Copilot AI Mar 9, 2026


isinstance() does not accept PEP 604 union types (A | B) as the second argument; this will raise TypeError if this ExceptionGroup path is hit. Use a tuple instead (e.g., isinstance(e, (builtins_MemoryError, RecursionError))).

Suggested change:

    builtins_MemoryError | RecursionError,

becomes:

    (builtins_MemoryError, RecursionError),

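
One hedged caveat on that finding: since Python 3.10, `isinstance()` does accept PEP 604 union types as its second argument, so on the 3.11+ interpreters that `except*` already requires the union spelling would not raise TypeError; the tuple form is still the safer, universally compatible choice. A quick check:

```python
import sys

# The tuple form works on every supported Python version.
print(isinstance(MemoryError(), (MemoryError, RecursionError)))

# On 3.10+, the PEP 604 union form is accepted by isinstance() too.
if sys.version_info >= (3, 10):
    print(isinstance(MemoryError(), MemoryError | RecursionError))
```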
Comment on lines +101 to +130
def __init__(
    self,
    *,
    backend: MemoryBackend,
    config: MemoryRetrievalConfig,
    shared_store: SharedKnowledgeStore | None = None,
    token_estimator: TokenEstimator | None = None,
) -> None:
    """Initialise the context injection strategy.

    Args:
        backend: Memory backend for personal memories.
        config: Retrieval pipeline configuration.
        shared_store: Optional shared knowledge store.
        token_estimator: Optional custom token estimator.
    """
    self._backend = backend
    self._config = config
    self._shared_store = shared_store
    self._estimator = (
        token_estimator if token_estimator is not None else DefaultTokenEstimator()
    )
    logger.debug(
        MEMORY_RETRIEVAL_START,
        strategy="context_injection",
        backend=backend.backend_name
        if hasattr(backend, "backend_name")
        else type(backend).__qualname__,
        has_shared_store=shared_store is not None,
    )


Silent misconfiguration: include_shared=True with no shared_store

When config.include_shared=True (the default) but shared_store=None, shared memory retrieval is silently skipped. An operator who keeps the default config but omits the shared_store argument to the constructor will receive no indication that shared memories are absent from context — the has_shared_store=False field in the debug log is easy to miss.

A warning at construction time whenever this mismatch occurs would make the misconfiguration immediately visible:

if config.include_shared and shared_store is None:
    logger.warning(
        "include_shared is True but no shared_store was provided; "
        "shared memories will not be retrieved",
        strategy="context_injection",
    )
Path: src/ai_company/memory/retriever.py

Comment on lines +132 to +154
async def prepare_messages(
    self,
    agent_id: NotBlankStr,
    query_text: NotBlankStr,
    token_budget: int,
    *,
    categories: frozenset[MemoryCategory] | None = None,
) -> tuple[ChatMessage, ...]:
    """Full pipeline: retrieve → rank → budget-fit → format.

    Returns empty tuple on any failure (graceful degradation).
    Never raises domain memory errors to the caller.
    Re-raises ``builtins.MemoryError`` and ``RecursionError``.

    Args:
        agent_id: The agent requesting memories.
        query_text: Text for semantic retrieval.
        token_budget: Maximum tokens for memory content.
        categories: Optional category filter.

    Returns:
        Tuple of ``ChatMessage`` instances (may be empty).
    """


Docstring inaccurate for concurrent system-error case

The docstring states:

Re-raises builtins.MemoryError and RecursionError.

This holds when only one backend fails with a system-level error — _fetch_memories's except* block raises a bare exception that reaches the except builtins_MemoryError: raise guard here.

However, when both concurrent backends (personal + shared) fail with system-level errors simultaneously, Python's except* semantics collect the re-raised exceptions from both handlers and synthesise a new ExceptionGroup([MemoryError(…), RecursionError(…)]). Because bare except MemoryError: does not match an ExceptionGroup, this falls through to except Exception as exc: → the isinstance(exc, ExceptionGroup) branch → raise system_errors from exc (line 200), which re-raises an ExceptionGroup — not a bare exception.

A caller using a bare except MemoryError: guard above a prepare_messages call will silently swallow this concurrent-failure ExceptionGroup, defeating the intended propagation contract. The docstring should document this distinction:

Re-raises ``builtins.MemoryError`` and ``RecursionError`` as bare
exceptions when a single source fails.  If both the personal backend
and shared store simultaneously raise system-level errors, re-raises
an ``ExceptionGroup`` wrapping those exceptions.


Development

Successfully merging this pull request may close these issues.

Implement memory retrieval, ranking, and context injection

2 participants