docs: add ADR-001 memory layer evaluation and selection#178

Merged
Aureliolo merged 4 commits into main from docs/memory-layer-evaluation
Mar 8, 2026
Conversation

@Aureliolo
Owner

Summary

Key Findings

  • Letta/Cognee eliminated: Python <3.14 constraint (conservative bounds, not technical — on watch list)
  • Supermemory eliminated: proprietary engine, SDK-only open source, no self-hosting
  • Kuzu archived Oct 2025 — optional in all candidates, not a blocker, but not recommended for new projects
  • Protocol-based architecture: any backend swappable via config, decision is never final

Test Plan

  • Verify ADR-001 renders correctly on GitHub
  • Verify DESIGN_SPEC.md §15.2 update is accurate
  • No code changes — docs only

Closes #39

Evaluated 16+ agent memory candidates across gate checks (local-first,
license, Docker, Python 3.14 compat, isolation, embeddings) and scored
criteria (memory type coverage, retrieval quality, graph capability,
stability, protocol fit, async support, etc.).

Decision: Mem0 as initial backend (in-process, Qdrant embedded + SQLite,
persistent to Docker volume) behind pluggable MemoryBackend protocol.
Custom stack (Neo4j + Qdrant external) as planned future upgrade.
Cognee/Letta on watch list pending Python 3.14 support.

Updated DESIGN_SPEC.md §15.2 to reflect the decision.
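A minimal sketch of what config-driven backend selection could look like behind the protocol; the class names, registry, and constructor fields below are hypothetical illustrations, not the actual implementation:

```python
from typing import Any, Callable


class Mem0Backend:
    """Placeholder for the Phase 1 Mem0-backed implementation."""

    def __init__(self, data_dir: str = "/data/memory") -> None:
        self.data_dir = data_dir


class CustomStackBackend:
    """Placeholder for the future Neo4j + external Qdrant stack."""

    def __init__(self, neo4j_url: str, qdrant_url: str) -> None:
        self.neo4j_url = neo4j_url
        self.qdrant_url = qdrant_url


# Registry keyed by the config's `backend` field; swapping the memory
# layer is then a config change, not a code change.
BACKENDS: dict[str, Callable[..., Any]] = {
    "mem0": Mem0Backend,
    "custom": CustomStackBackend,
}


def build_backend(config: dict[str, Any]) -> Any:
    """Instantiate whichever backend the config names."""
    cfg = dict(config)
    kind = cfg.pop("backend")
    if kind not in BACKENDS:
        raise ValueError(f"unknown memory backend: {kind!r}")
    return BACKENDS[kind](**cfg)
```

For example, `build_backend({"backend": "mem0", "data_dir": "/data/memory"})` yields the initial backend, and switching to the custom stack later is a one-line config edit.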
Copilot AI review requested due to automatic review settings March 8, 2026 21:52
@github-actions
Contributor

github-actions bot commented Mar 8, 2026

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@coderabbitai

coderabbitai bot commented Mar 8, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6cdab757-270b-4ea6-855f-236bb9b1cf96

📥 Commits

Reviewing files that changed from the base of the PR and between fa20c33 and 248e9e2.

📒 Files selected for processing (4)
  • CLAUDE.md
  • DESIGN_SPEC.md
  • README.md
  • docs/decisions/ADR-001-memory-layer.md

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Memory configuration now specifies an initial in-process memory backend (Mem0) and exposes storage settings (data_dir, vector_store, history_store) for persistence and vector storage.
  • Documentation

    • Updated system architecture and product docs to reflect Mem0 as the initial memory backend, a documented upgrade path to a custom stack, and an architectural decision record (ADR-001) describing selection and phased rollout.

Walkthrough

Replaces "memory TBD" with Mem0 as the initial in-process memory backend (configurable via the pluggable MemoryBackend protocol; see ADR-001), adds persistence/storage fields to the memory config, documents the decision and migration path to a custom Neo4j+Qdrant stack in ADR-001, and updates package/README docs and tech-stack references accordingly.
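The new storage fields could be modeled as a small typed config object; the field names come from the summary above, while the defaults and the `history_path` helper are assumptions for illustration:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MemoryStorageConfig:
    """Sketch of the memory.storage settings; defaults are illustrative."""
    data_dir: Path = Path("/data/memory")
    vector_store: str = "qdrant_embedded"  # embedded Qdrant in the initial phase
    history_store: str = "sqlite"          # SQLite history DB in the initial phase

    def history_path(self) -> Path:
        """Where the SQLite history file would live under data_dir."""
        return self.data_dir / "history.db"
```

Keeping everything under one `data_dir` matches the single mounted Docker volume described in the PR.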

Changes

Cohort / File(s) Summary
Design Spec
DESIGN_SPEC.md
Replaced memory placeholder with Mem0 (initial) behind a pluggable MemoryBackend (ADR-001); added memory.storage schema (data_dir, vector_store, history_store); updated Technology Stack §15.2 to show Mem0 → custom stack path and ADR reference.
ADR / Decision Record
docs/decisions/ADR-001-memory-layer.md
New ADR documenting selection of Mem0 as initial in-process backend, evaluation of candidates, gate criteria, memory protocol concepts (MemoryBackend, MemoryCapabilities), per-agent isolation, phased rollout (Mem0 → graph enablement → custom backend), config schemas, and swappability policy.
Docs & Packaging
CLAUDE.md, README.md
Updated package/module description and README status/tech-stack text to reference Mem0 initial backend, ADR-001, and M5 milestone for Memory & Budget. Minor phrasing and milestone sequencing updates.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • Copilot
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title 'docs: add ADR-001 memory layer evaluation and selection' accurately and concisely describes the main change: adding a decision record document for memory layer evaluation and selection. |
| Description check | ✅ Passed | The description clearly details the evaluation process, gate checks, scoring criteria, key findings, the Mem0 decision, and downstream impacts—all directly related to the changeset. |
| Linked Issues check | ✅ Passed | The PR comprehensively meets all acceptance criteria from issue #39: evaluated 16+ candidates, applied gate checks, produced scored comparison, documented decision in ADR-001, assessed organizational memory compatibility, and updated DESIGN_SPEC §15.2. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: ADR-001 creation and DESIGN_SPEC.md update directly address issue #39 objectives with no unrelated code modifications introduced. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request formalizes the architectural decision for the project's agent memory layer by introducing ADR-001. It outlines a thorough evaluation of various memory solutions against defined architectural constraints and project requirements. The outcome is the strategic adoption of Mem0 as the initial, easily deployable in-process backend, with a clear roadmap for transitioning to a more robust custom stack in the future. This approach ensures flexibility through a pluggable protocol, addressing critical dependencies for upcoming development milestones.

Highlights

  • Memory Layer Selection: A comprehensive evaluation of over 16 agent memory candidates was conducted, leading to the selection of Mem0 as the initial backend.
  • Initial Backend Decision: Mem0 was chosen as the initial in-process memory backend, utilizing embedded Qdrant and SQLite for persistence, due to its maturity and ease of deployment.
  • Future Backend Strategy: A custom stack comprising Neo4j and external Qdrant is planned as a future upgrade to provide more comprehensive memory type coverage and control.
  • Protocol-Based Architecture: The memory layer is designed with a pluggable MemoryBackend protocol, ensuring that different backends can be swapped via configuration without impacting consumers.
  • Eliminated Candidates: Several candidates like Letta and Cognee were eliminated due to Python 3.14 compatibility issues, while Supermemory and Graphlit were excluded for being proprietary or cloud-only.
  • Documentation Update: The DESIGN_SPEC.md was updated to reflect the memory layer decision, and a new Architecture Decision Record (ADR-001) was added to detail the evaluation process.
Changelog
  • DESIGN_SPEC.md
    • Updated the 'Agent Memory' entry to specify Mem0 as the initial backend and a custom stack as the future upgrade, including details on persistence and configurability.
  • docs/decisions/ADR-001-memory-layer.md
    • Added a new Architecture Decision Record (ADR-001) documenting the comprehensive evaluation and selection process for the agent memory layer.
    • Detailed the context, architectural constraints, and requirements for the memory module, including five memory types and four persistence levels.
    • Presented a long list of over 16 candidate memory solutions and their key features.
    • Outlined gate check definitions and results, leading to the elimination of several candidates based on criteria like local-first, license, Docker, and Python 3.14+ compatibility.
    • Provided a scored evaluation of the remaining candidates (Mem0, Graphiti, Custom Stack) across 11 criteria, including memory type coverage, retrieval quality, and stability.
    • Documented the decision to use Mem0 as the initial in-process backend and a custom stack (Neo4j + Qdrant external) as the target future architecture.
    • Included architectural diagrams for both the initial Mem0 in-process setup and the future custom stack with external services.
    • Explained the configuration approach for memory backends and the mechanism for per-agent isolation.
    • Detailed the impact of this decision on related issues (Design individual agent memory interface: working, episodic, semantic, procedural (DESIGN_SPEC §7.1-7.3) #32 Memory Interface Design, Implement pluggable PersistenceBackend protocol with SQLite backend (DESIGN_SPEC §7.5) #36 Persistence, Implement shared organizational memory with OrgMemoryBackend protocol (DESIGN_SPEC §7.4) #125 Org Memory Backends) and strategies for embedding providers and graph databases.
    • Identified potential risks and their mitigations, and discussed alternatives considered, such as using a custom stack initially or other candidates like Cognee and Letta.
    • Emphasized backend swappability as a core design principle and listed candidates on a watch list for future reconsideration.
    • Provided component version references for key technologies and detailed reasons for the elimination of specific candidates.
Activity
  • The PR description includes a test plan, but no actual activity such as comments, reviews, or CI/CD results were provided in the context.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive Architecture Decision Record (ADR) for the agent memory layer, along with an update to the main design specification to reflect the decision. The ADR is exceptionally well-researched, detailed, and clearly structured, evaluating multiple candidates and providing a solid rationale for selecting Mem0 as the initial backend. The inclusion of architecture diagrams, configuration examples, and analysis of consequences is excellent. I have one minor correction to suggest in the ADR to resolve a small contradiction, but otherwise, this is a fantastic piece of documentation.

Note: Security Review has been skipped due to the limited scope of the PR.


The protocol-based architecture means **the memory layer decision is never final**.
Any backend that satisfies the `MemoryBackend` protocol can be added as an alternative
implementation. The custom stack is the **initial backend**, not the only one.
Contributor


medium

There's a small contradiction here. This line states that the 'custom stack' is the initial backend, but the 'Decision' section (line 199) clearly selects 'Mem0'. This should be corrected to avoid confusion and maintain consistency throughout the document.

Suggested change:
- implementation. The custom stack is the **initial backend**, not the only one.
+ implementation. **Mem0** is the **initial backend**, not the only one.

@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Mar 8, 2026

Greptile Summary

This PR closes issue #39 by adding ADR-001, a comprehensive evaluation of 16+ agent memory candidates for the memory/ module, and updating DESIGN_SPEC.md, README.md, and CLAUDE.md to reflect the decision. The ADR applies gate checks (local-first, license, Docker, Python 3.14, isolation, embeddings) to narrow 16 candidates to 3 viable finalists (Mem0, Graphiti, Custom Stack), then applies a weighted scoring rubric (S1–S11, 100 points) to select Mem0 in-process (Qdrant embedded + SQLite) as the M5-Phase 1 backend, with a Custom Stack (Neo4j + Qdrant external) as the planned Phase 3 upgrade — all behind a pluggable MemoryBackend protocol.
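The gate-then-score flow described above can be sketched as two small helpers. The criterion and gate names below are abbreviations invented for illustration, and the sample numbers are illustrative (S2 = 12 and S10 = 3 for Mem0 do appear in the ADR excerpts quoted elsewhere in this thread):

```python
def passes_gates(checks: dict[str, bool]) -> bool:
    """A candidate survives the gate phase only if every check passes."""
    return all(checks.values())


def weighted_total(scores: dict[str, int], max_points: dict[str, int]) -> int:
    """Sum per-criterion scores, capping each at its maximum weight."""
    return sum(min(scores.get(c, 0), cap) for c, cap in max_points.items())


# Hypothetical excerpt of the rubric: three of the eleven S-criteria.
max_points = {"S2_retrieval": 15, "S10_footprint": 5, "S11_ops_complexity": 5}
mem0_scores = {"S2_retrieval": 12, "S10_footprint": 3, "S11_ops_complexity": 3}
mem0_gates = {"local_first": True, "license": True, "docker": True, "py314": True}
```

Candidates that fail any gate never reach scoring, which is how 16+ candidates narrowed to 3 finalists before the 100-point rubric was applied.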

Key strengths:

  • ADR structure is thorough and well-reasoned; gate checks, scoring, and alternatives sections are clearly documented
  • DESIGN_SPEC.md §15.2 updates correctly replace all "TBD" placeholders and align the config YAML snippet and risk register with ADR-001
  • Decision rationale is transparent and decision-making is traceable
  • Protocol-based architecture ensures future backend swappability

Minor documentation gaps:

  • mem0ai itself is absent from the Component Version References table, which directly contradicts the risk mitigation advice to "Pin mem0ai version" — the canonical version to pin is only discoverable by searching the evaluation tables rather than the normative reference section
  • Typo in the appendix: "Memari" does not match any candidate in the long list (likely intended to be "Memary", position 11 in the long list)

Confidence Score: 4/5

  • Safe to merge — docs-only PR with no code changes; minor documentation polish items remain but none are blocking.
  • The ADR is thorough, well-structured, and internally consistent. The DESIGN_SPEC.md and README updates are accurate and minimal. Two non-blocking documentation issues remain: (1) mem0ai is absent from the Component Version References table despite being the selected backend and having a stated risk mitigation to "pin" its version, contradicting the normative reference section design; (2) a "Memari" typo in the appendix that should be "Memary" based on the long list. Neither issue affects the correctness of the decision or the downstream implementation.
  • docs/decisions/ADR-001-memory-layer.md — Component Version References table (missing mem0ai row) and appendix typo

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent Code] -->|calls| B[MemoryBackend Protocol]
    B -->|Phase 1 M5| C[Mem0MemoryBackend]
    B -->|Phase 3 future| D[CustomMemoryBackend]

    C --> E[Mem0 Engine mem0ai]
    C --> F[Qdrant Embedded in-process]
    C --> G[SQLite History in-process]

    D --> H[Neo4j CE Docker]
    D --> I[Qdrant External Docker]

    F -->|persists to| J[Docker Volume /data/memory]
    G -->|persists to| J

    E --> F
    E --> G

    C -.->|enable via config flag| K[Neo4j CE Phase 2 optional graph]

Last reviewed commit: 248e9e2


Copilot AI left a comment


Pull request overview

Adds an Architecture Decision Record (ADR) documenting the evaluation and selection of an initial agent memory backend for the planned memory/ module, and updates the design spec tech stack to reflect the decision (issue #39).

Changes:

  • Added ADR-001 with gate checks + weighted scoring across memory-layer candidates and a decision to start with Mem0 behind a pluggable MemoryBackend protocol.
  • Updated DESIGN_SPEC.md §15.2 to replace “TBD” with the Mem0→custom-stack plan and persistence notes.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
docs/decisions/ADR-001-memory-layer.md New ADR capturing candidate discovery, gating, scoring, decision, and rollout plan for the memory layer.
DESIGN_SPEC.md Updates the technology stack table entry for Agent Memory to reflect ADR-001’s decision and planned evolution.


Comment on lines +316 to +320
│ │ │ working → in-process │ │ │
│ │ │ episodic → Qdrant (external) │ │ │
│ │ │ semantic → Neo4j (external) + Qdrant │ │ │
│ │ │ procedur → Qdrant (external) │ │ │
│ │ │ social → Neo4j (external) │ │ │

Copilot AI Mar 8, 2026


The ASCII architecture diagram for the custom backend has a typo: “procedur → Qdrant (external)” should be “procedural”. Fixing this avoids confusion and keeps terminology consistent with the 5 memory types used elsewhere in the ADR/spec.

Suggested change:
  │ │ │ working → in-process │ │ │
  │ │ │ episodic → Qdrant (external) │ │ │
  │ │ │ semantic → Neo4j (external) + Qdrant │ │ │
- │ │ │ procedur → Qdrant (external) │ │ │
+ │ │ │ procedural → Qdrant (external) │ │ │
  │ │ │ social → Neo4j (external) │ │ │

Comment on lines +520 to +528
implementation. The custom stack is the **initial backend**, not the only one.

Future backends can be added without modifying existing code:

| Candidate | Trigger to Revisit | What It Would Replace |
|-----------|-------------------|----------------------|
| **Cognee** | Adds Python 3.14 support | Could replace custom graph+vector layer with unified cognify pipeline |
| **Letta** | Adds Python 3.14 support + standalone memory extraction | Could power self-editing memory for advanced agents |
| **Mem0** | If 5-type taxonomy support improves | Could replace custom vector layer with managed extraction |

Copilot AI Mar 8, 2026


This section says “The custom stack is the initial backend”, but earlier the ADR’s explicit decision is “Initial Backend: Mem0”. This looks like a copy/paste error and reverses the decision; please update it so the “initial backend” terminology is consistent throughout (and ensure the backend replacement table that follows matches the chosen initial backend).

Suggested change:
- implementation. The custom stack is the **initial backend**, not the only one.
+ implementation. Mem0 is the **initial backend**, not the only one.
  Future backends can be added without modifying existing code:
- | Candidate | Trigger to Revisit | What It Would Replace |
- |-----------|-------------------|----------------------|
- | **Cognee** | Adds Python 3.14 support | Could replace custom graph+vector layer with unified cognify pipeline |
- | **Letta** | Adds Python 3.14 support + standalone memory extraction | Could power self-editing memory for advanced agents |
- | **Mem0** | If 5-type taxonomy support improves | Could replace custom vector layer with managed extraction |
+ | Candidate | Trigger to Revisit | Role in the Architecture |
+ |-----------|-------------------|---------------------------|
+ | **Cognee** | Adds Python 3.14 support | Could provide a unified graph+vector pipeline behind the memory protocol |
+ | **Letta** | Adds Python 3.14 support + standalone memory extraction | Could power self-editing memory for advanced agents |

Comment on lines +167 to +168
| **S10** Resource footprint (5) | **3** — 3 containers (FastAPI + PostgreSQL + Neo4j) for full graph | **3** — graph DB container + heavy LLM usage during ingestion | **3** — 2 containers (Neo4j + Qdrant) + embedded FastEmbed |
| **S11** Operational complexity (5) | **3** — 3 containers, OpenAI defaults need reconfiguration for local | **2** — graph DB + high LLM cost per episode ingestion (1000+ API calls per 10k chars reported) | **4** — 2 well-understood containers, standard config |

Copilot AI Mar 8, 2026


The Mem0 scores for resource footprint/operational complexity are described as “3 containers (FastAPI + PostgreSQL + Neo4j)”, but later the ADR’s chosen initial deployment is explicitly in-process (Qdrant embedded + SQLite) with no external services. Either these scores should be updated to reflect the evaluated configuration(s), or the table should clarify it’s scoring Mem0’s full optional service/graph setup rather than the selected initial in-process setup.

Suggested change:
- | **S10** Resource footprint (5) | **3** — 3 containers (FastAPI + PostgreSQL + Neo4j) for full graph | **3** — graph DB container + heavy LLM usage during ingestion | **3** — 2 containers (Neo4j + Qdrant) + embedded FastEmbed |
- | **S11** Operational complexity (5) | **3** — 3 containers, OpenAI defaults need reconfiguration for local | **2** — graph DB + high LLM cost per episode ingestion (1000+ API calls per 10k chars reported) | **4** — 2 well-understood containers, standard config |
+ | **S10** Resource footprint (5) | **3** — assuming full optional graph stack: 3 containers (FastAPI + PostgreSQL + Neo4j) for full graph; in-process/embedded modes are evaluated separately in the deployment section | **3** — graph DB container + heavy LLM usage during ingestion | **3** — 2 containers (Neo4j + Qdrant) + embedded FastEmbed |
+ | **S11** Operational complexity (5) | **3** — assuming full optional graph stack: 3 containers, OpenAI defaults need reconfiguration for local; in-process/embedded modes are evaluated separately in the deployment section | **2** — graph DB + high LLM cost per episode ingestion (1000+ API calls per 10k chars reported) | **4** — 2 well-understood containers, standard config |

DESIGN_SPEC.md Outdated
| **API Framework** | FastAPI | Async-native, WebSocket support, auto OpenAPI docs, high performance, type-safe with Pydantic |
| **LLM Abstraction** | LiteLLM | 100+ providers, unified API, built-in cost tracking, retries/fallbacks |
| **Agent Memory** | TBD (candidates: Mem0, Zep, Letta, Cognee, custom) + SQLite | Memory layer library TBD after evaluation. SQLite for structured data. Upgrade to Postgres later |
| **Agent Memory** | Mem0 (initial) → custom stack (future) + SQLite | Mem0 in-process as initial backend behind pluggable `MemoryBackend` protocol (ADR-001). Qdrant embedded + SQLite for persistence. Custom stack (Neo4j + Qdrant external) as future upgrade. Config-driven backend selection |

Copilot AI Mar 8, 2026


This table row now documents a concrete Agent Memory decision, but the surrounding section header still reads “Technology Stack (Candidates - TBD After Research)”. Consider updating the section title (or adding a short note) so the spec doesn’t imply the tech stack is still TBD after the ADR decision.

Comment on lines +25 to +33
1. Memory/storage runs in **separate container(s)** from the main Python app
2. Does NOT have to be Python — any technology, containerized
3. Main app uses a **thin async Python client** behind a **pluggable protocol**
4. **Capability discovery** — protocol exposes what each backend supports
5. Multiple containers are fine (e.g., graph DB + vector store)
6. **Graph DB**: both Neo4j (server) and embedded options should be evaluated
7. **Embeddings**: implementation detail of the memory layer — just verify
configurable providers (local + cloud)


Copilot AI Mar 8, 2026


The ADR lists a key architecture constraint that memory/storage must run in separate container(s) from the main Python app, but the decision section later selects an in-process Mem0 deployment inside the ai-company container. This is internally inconsistent and makes it unclear whether the constraint changed or the decision is overriding it; please reconcile by updating the constraint list (or adding an explicit rationale/exception) so the ADR documents a single, coherent deployment approach.

Suggested change:
- 1. Memory/storage runs in **separate container(s)** from the main Python app
+ 1. **Target architecture**: memory/storage runs in **separate container(s)** from the main Python app. **MVP exception**: an in‑process / in‑container deployment (e.g., Mem0 inside the `ai-company` container) is allowed as long as it preserves the same protocol boundary and can be moved out‑of‑process without major refactors.
  2. Does NOT have to be Python — any technology, containerized
- 3. Main app uses a **thin async Python client** behind a **pluggable protocol**
+ 3. Main app uses a **thin async Python client** behind a **pluggable protocol**, which MUST work for both in‑process libraries and remote services so we can transparently move memory/storage into separate container(s) later.
  4. **Capability discovery** — protocol exposes what each backend supports
  5. Multiple containers are fine (e.g., graph DB + vector store)
  6. **Graph DB**: both Neo4j (server) and embedded options should be evaluated
  7. **Embeddings**: implementation detail of the memory layer — just verify configurable providers (local + cloud)
+ > Note: The original user constraint was "memory/storage runs in separate container(s) from the main Python app." This ADR refines that to allow an in‑process Mem0 deployment for the initial milestone, while still designing for a future move to separate containers via the shared protocol/client.


| Candidate | G1 | G2 | G3 | G4 | G5 | G6 | G7 | Result |
|-----------|----|----|----|----|----|----|----|----|
| **Mem0** | PASS | PASS (Apache 2.0) | PASS (3 containers) | PASS (v1.0.5, Mar 2026) | PASS (user/agent/app/run_id) | PASS (11+ providers) | PASS (`>=3.9,<4.0`) | **PASS** |

Copilot AI Mar 8, 2026


In the gate results, Mem0 is marked as Docker PASS with “3 containers”, but the ADR’s decision is explicitly “Mem0 (in-process, persistent)” with no external services needed (until optional graph). The gate row should be updated to reflect the chosen deployment model (or clarify that the upstream reference architecture uses 3 containers while our initial configuration is in-process) to avoid conflicting guidance.

Suggested change:
- | **Mem0** | PASS | PASS (Apache 2.0) | PASS (3 containers) | PASS (v1.0.5, Mar 2026) | PASS (user/agent/app/run_id) | PASS (11+ providers) | PASS (`>=3.9,<4.0`) | **PASS** |
+ | **Mem0** | PASS | PASS (Apache 2.0) | PASS (in-process; upstream ref: 3 containers) | PASS (v1.0.5, Mar 2026) | PASS (user/agent/app/run_id) | PASS (11+ providers) | PASS (`>=3.9,<4.0`) | **PASS** |
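The per-agent isolation this gate checks (Mem0 exposes user/agent/app/run_id identifiers for it) boils down to namespacing records by an agent identifier. A toy illustration of the idea, not Mem0's actual API:

```python
class AgentScopedMemory:
    """Per-agent isolation sketch: every record is namespaced by agent_id,
    so one agent can never read another's memories. Mem0's
    user/agent/app/run identifiers serve the same scoping purpose."""

    def __init__(self) -> None:
        self._records: dict[str, list[str]] = {}

    def add(self, agent_id: str, fact: str) -> None:
        self._records.setdefault(agent_id, []).append(fact)

    def recall(self, agent_id: str) -> list[str]:
        # Only the caller's namespace is visible; no cross-agent reads.
        return list(self._records.get(agent_id, []))
```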

Replaced planned-heavy feature list with implemented vs not-yet-implemented
split reflecting M0-M4 completion. Updated memory layer from TBD to Mem0
(ADR-001). Updated status line and tech stack accordingly.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@DESIGN_SPEC.md`:
- Line 2329: Update the Technology cell in the "Agent Memory" table to remove
ambiguity by explicitly showing which backends apply to each phase; locate the
"Agent Memory" row (the Technology column currently reading "Mem0 (initial) →
custom stack (future) + SQLite") and change it to a clearer phrase such as "Mem0
(Qdrant + SQLite) → custom (Neo4j + Qdrant)" so readers know SQLite is used in
the initial phase as well; keep the Rationale/ADR-001 text unchanged.

In `@docs/decisions/ADR-001-memory-layer.md`:
- Line 159: Replace the two minor typography issues: change the string "vs
OpenAI" to "vs. OpenAI" (add the period after "vs") and change "~512MB+" to
"~512 MB+" (insert a space between the number and the unit) in
ADR-001-memory-layer.md; search for the exact strings "vs OpenAI" and "~512MB+"
to locate and update them.
- Around line 455-456: Replace the existing sentence starting with "Kuzu NOT
recommended" (the line reading "Kuzu NOT recommended: Archived October 2025.
Mem0's Kuzu backend has open concurrency bugs. Use Neo4j or FalkorDB instead.")
with the revised wording: state that Kuzu was archived October 10, 2025 and
explain that its architectural concurrency model (single Database per process
with Connection reuse) is not suited for Mem0's multi-threaded context, and
recommend Neo4j or FalkorDB which handle concurrent access patterns instead;
remove the unsupported phrase "open concurrency bugs" and instead reference
Mem0-specific thread/resource leak issues only if documented elsewhere.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 96ad4357-91a6-40bf-af95-3a75b1c3ffda

📥 Commits

Reviewing files that changed from the base of the PR and between c5ca929 and fa20c33.

📒 Files selected for processing (2)
  • DESIGN_SPEC.md
  • docs/decisions/ADR-001-memory-layer.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Agent
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
DESIGN_SPEC.md

📄 CodeRabbit inference engine (CLAUDE.md)

When approved deviations occur, update DESIGN_SPEC.md to reflect the new reality

Files:

  • DESIGN_SPEC.md
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
Repo: Aureliolo/ai-company PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-03-08T19:07:25.519Z
Learning: Always read `DESIGN_SPEC.md` before implementing any feature or planning any issue — the design spec is the starting point for architecture, data models, and behavior

Applied to files:

  • DESIGN_SPEC.md
🪛 LanguageTool
docs/decisions/ADR-001-memory-layer.md

[typographical] ~159-~159: In American English, use a period after an abbreviation.
Context: ... Retrieval quality (15) | 12 — +26% vs OpenAI Memory on LOCOMO. Well benchmark...

(MISSING_PERIOD_AFTER_ABBREVIATION)


[grammar] ~159-~159: Ensure spelling is correct
Context: ...l | 10 — depends on implementation. Qdrant + Neo4j individually excellent | | **S3...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~162-~162: Since ownership is already implied, this phrasing may be redundant.
Context: ...ucture. Needs adapter layer that fights its own abstractions | 7 — GraphDriver ABC ...

(PRP_OWN)


[style] ~217-~217: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ... Docker container. No external services needed. Persists to mounted volumes. 3. **Pyth...

(EN_REPEATEDWORDS_NEED)


[style] ~222-~222: This word has been used in one of the immediately preceding sentences. Using a synonym could make your text more interesting to read, unless the repetition is intentional.
Context: ...hout graph, enable via config flag when needed. 7. Low adapter overhead: Thin wrap...

(EN_REPEATEDWORDS_NEED)


[grammar] ~225-~225: Ensure spelling is correct
Context: ...k lines) behind our protocol. ### What Mem0 Does NOT Cover (known gaps, accepted fo...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[typographical] ~477-~477: Insert a space between the numerical value and the unit symbol.
Context: ... | | Neo4j CE resource footprint (JVM, ~512MB+ RAM) | Likely | Low | Deferred to Phas...

(UNIT_SPACE)


[grammar] ~490-~490: Ensure spelling is correct
Context: ...d reveals real-world requirements. ### Graphiti for Temporal KG + Custom for Rest Appe...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[grammar] ~504-~504: Please add a punctuation mark at the end of paragraph.
Context: ...e backend behind our protocol once 3.14 lands ### Letta (OS-Inspired Memory) Archit...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~512-~512: Please add a punctuation mark at the end of paragraph.
Context: ...e conflicts with our pluggable protocol design --- ## Backend Swappability (Key Desi...

(PUNCTUATION_PARAGRAPH_END)


[grammar] ~577-~577: Please add a punctuation mark at the end of paragraph.
Context: ...s added — strongest alternative backend candidate ### memU — ELIMINATED (G2: AGPL-3.0) ...

(PUNCTUATION_PARAGRAPH_END)

🔇 Additional comments (2)
docs/decisions/ADR-001-memory-layer.md (2)

95-98: No issues found. All Python version constraints in the gate check results table (lines 95-98) are accurate and match current PyPI metadata:

  • Mem0: >=3.9,<4.0
  • Graphiti: >=3.10,<4
  • Letta: <3.14,>=3.11
  • Cognee: >=3.10,<3.14

547-556: Component version numbers and Python 3.14 compatibility are accurate.

Verification confirms all documented versions match current PyPI releases: neo4j 6.1.0, qdrant-client 1.17.0, and fastembed 0.7.4. All three packages support Python 3.14 via published classifiers and have requires_python constraints compatible with 3.14 (>=3.9 and >=3.10 respectively).

| Criterion | Mem0 | Graphiti | Custom Stack |
|-----------|------|---------|-------------|
| **S1** Memory types (15) | **9** — episodic+semantic+procedural+short-term. No explicit social/working. Flat fact model needs wrapping | **7** — episodic+semantic+social via graph. No procedural/working | **15** — full control, maps directly to all 5 types |
| **S2** Retrieval quality (15) | **12** — +26% vs OpenAI Memory on LOCOMO. Well benchmarked | **11** — +18.5% accuracy, 90% latency reduction. Graph traversal powerful | **10** — depends on implementation. Qdrant + Neo4j individually excellent |

🧹 Nitpick | 🔵 Trivial

Optional: Minor style improvements.

A few minor style points flagged by static analysis:

  • Line 159: "vs OpenAI" → consider "vs. OpenAI" (period after abbreviation in formal writing)
  • Line 477: "~512MB+" → add space: "~512 MB+" (typography convention)

These are entirely optional nitpicks and don't affect the content quality.


async def store(self, agent_id: str, memory: MemoryEntry) -> str: ...
async def retrieve(self, agent_id: str, query: MemoryQuery) -> list[MemoryEntry]: ...
async def delete(self, agent_id: str, memory_id: str) -> bool: ...
async def list_memories(self, agent_id: str, ...) -> list[MemoryEntry]: ...

**Invalid Python syntax in protocol stub**

`...` (Ellipsis literal) cannot be used as a function parameter — this would raise a `SyntaxError` if anyone tries to copy this snippet directly into the codebase. The Ellipsis is only valid as a *body* (`async def method(): ...`), not as a parameter placeholder.

Use `**kwargs: Any` or name the actual parameters explicitly:

Suggested change
async def list_memories(self, agent_id: str, ...) -> list[MemoryEntry]: ...
async def list_memories(self, agent_id: str, **kwargs: Any) -> list[MemoryEntry]: ...
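As a quick sanity check, the distinction can be shown in a self-contained sketch (the `MemoryEntry` placeholder and method names here are illustrative, mirroring the stub quoted above, not the real protocol):

```python
from dataclasses import dataclass
from typing import Any, Protocol


@dataclass
class MemoryEntry:
    """Minimal placeholder so the stub type-checks; the real model is richer."""
    content: str


class MemoryBackend(Protocol):
    """Hypothetical re-statement of the protocol shape quoted above."""

    # Ellipsis is valid here: it serves as the *body* of the stub method.
    async def delete(self, agent_id: str, memory_id: str) -> bool: ...

    # The flagged line must use real parameters (or **kwargs), never `...`
    # inside the parameter list; that form is rejected at parse time.
    async def list_memories(self, agent_id: str, **kwargs: Any) -> list[MemoryEntry]: ...
```

Any concrete backend with matching method signatures satisfies this protocol structurally; no inheritance is required.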

Fixes from code-reviewer, docs-consistency, issue-resolution-verifier agents
plus Gemini, Greptile, Copilot, and CodeRabbit external reviewers:

- Fix ADR "custom stack is the initial backend" contradiction (→ Mem0)
- Update 6 stale "TBD" references in DESIGN_SPEC.md to reflect ADR-001
- Update CLAUDE.md package structure memory/ comment
- Update README.md memory layer and milestone status
- Clarify architecture constraint #1 (MVP exception for in-process)
- Add Zep→Graphiti pivot context note
- Fix "procedur" typo in ASCII diagram
- Clarify Kuzu concurrency issues (architectural, not bugs)
- Clarify S10/S11 scores and G3 gate for in-process vs full stack
- Add docs/decisions/ to project structure listing
- Fix minor typography (vs. abbreviation, unit spacing, paragraph endings)
- Update swappability table to reflect Mem0-first ordering
- Mark Open Question #14 as resolved
- Update risk mitigation and extensibility notes
Copilot AI review requested due to automatic review settings March 8, 2026 22:04
@Aureliolo Aureliolo merged commit db3026f into main Mar 8, 2026
9 of 10 checks passed
@Aureliolo Aureliolo deleted the docs/memory-layer-evaluation branch March 8, 2026 22:04

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



Comment on lines +342 to +349
Memory config lives in the **same config schema** as all other settings
(`RootConfig` in `config/schema.py`), following the same Pydantic validation and
YAML loading patterns. Per-agent overrides via `AgentConfig.memory` (already exists
as a raw dict field). When the dynamic config system is built, memory config
participates like every other config section.

```yaml
# Company-wide defaults (in RootConfig)

Copilot AI Mar 8, 2026


The ADR states that company-wide memory config lives in `RootConfig` (`config/schema.py`), but the current `RootConfig` model does not define a top-level `memory` field, so this wording (and the YAML example immediately below) would not validate if copied into a real config today. Consider clarifying that this is a planned schema addition (or pointing to the actual current location for memory config, e.g. per-agent `AgentConfig.memory` / `AgentIdentity.memory`).

Suggested change
Memory config lives in the **same config schema** as all other settings
(`RootConfig` in `config/schema.py`), following the same Pydantic validation and
YAML loading patterns. Per-agent overrides via `AgentConfig.memory` (already exists
as a raw dict field). When the dynamic config system is built, memory config
participates like every other config section.
```yaml
# Company-wide defaults (in RootConfig)
Memory config is intended to live in the **same config schema** as all other settings
(`RootConfig` in `config/schema.py`), following the same Pydantic validation and
YAML loading patterns. As of this ADR, `RootConfig` does **not** yet define a top-
level `memory` field; memory is configured per agent via `AgentConfig.memory` /
`AgentIdentity.memory` (currently a raw dict field). This ADR defines the target
company-wide config shape so that, when the dynamic config system is built, memory
config can participate like every other config section.
```yaml
# Planned company-wide defaults (future RootConfig.memory)

Comment on lines +371 to +376
senior_dev:
memory:
level: "full"
graph:
enabled: true # this agent gets graph memory
intern:

Copilot AI Mar 8, 2026


The YAML example shows `agents:` as a mapping (`senior_dev:`, `intern:`), but the current config schema/templates use agents as a list of agent objects (and `RootConfig.agents` is `tuple[AgentConfig, ...]`). If this snippet is meant to be copy-pastable, it should follow the list shape (e.g., each agent item includes `name`/`role`/etc. with an embedded `memory:` override).

Suggested change
senior_dev:
memory:
level: "full"
graph:
enabled: true # this agent gets graph memory
intern:
- name: "senior_dev"
memory:
level: "full"
graph:
enabled: true # this agent gets graph memory
- name: "intern"
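The list shape matters because a `tuple[AgentConfig, ...]` field expects a YAML sequence, so each agent names itself rather than serving as a mapping key. A stdlib-only sketch (hypothetical `AgentConfig` fields; the real Pydantic schema lives in `config/schema.py`):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class AgentConfig:
    """Illustrative subset of the real schema; only the fields used here."""
    name: str
    memory: dict[str, Any] = field(default_factory=dict)


def load_agents(raw: list[dict[str, Any]]) -> tuple[AgentConfig, ...]:
    # Mirrors RootConfig.agents being tuple[AgentConfig, ...]:
    # the input is a *list* of agent objects, each carrying its own name.
    return tuple(AgentConfig(**item) for item in raw)


agents = load_agents([
    {"name": "senior_dev", "memory": {"level": "full", "graph": {"enabled": True}}},
    {"name": "intern", "memory": {"level": "lite"}},
])
```

With the mapping shape, `senior_dev` would arrive as a dict key instead of a `name` field and the per-item construction above would fail.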

- Most complex. Potentially overkill for small companies or local-first use

> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The memory layer candidate (currently evaluating Mem0 and alternatives — see §15.2) may provide graph memory capabilities natively, reducing implementation effort for Backends 2-3.
> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backends 2-3.

Copilot AI Mar 8, 2026


This sentence suggests Mem0's optional graph support could reduce implementation effort for both OrgMemory backends 2 and 3, but ADR-001 later notes Backend 3 (temporal KG) is not supported by Mem0 beyond basic timestamps. Consider narrowing this to Backend 2 (GraphRAG) and clarifying that Backend 3 still requires a custom temporal model (or Graphiti) to avoid contradicting ADR-001.

Suggested change
> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backends 2-3.
> **Extensibility:** All backends implement the `OrgMemoryBackend` protocol (`query(context) → list[OrgFact]`, `write(fact, author)`, `list_policies()`). The MVP ships with Backend 1; Backends 2 and 3 are research directions that may be explored if the default approach proves insufficient. The selected memory layer backend Mem0 (ADR-001) provides optional graph memory via Neo4j/FalkorDB, which could reduce implementation effort for Backend 2 (GraphRAG). Backend 3 (`temporal_kg`) still requires a custom temporal knowledge-graph model or an external system (e.g., Graphiti) layered on top of Mem0’s basic timestamp support.
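For illustration only, the `OrgMemoryBackend` protocol named in the extensibility note could be sketched as follows; the `OrgFact` fields and the toy in-memory backend are assumptions for the sketch, not the shipped Backend 1:

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class OrgFact:
    """Illustrative shape; the real model may carry more metadata."""
    content: str
    author: str


class OrgMemoryBackend(Protocol):
    # The three operations listed in the extensibility note.
    def query(self, context: str) -> list[OrgFact]: ...
    def write(self, fact: OrgFact, author: str) -> None: ...
    def list_policies(self) -> list[str]: ...


class InMemoryOrgBackend:
    """Toy stand-in for a default backend, kept deliberately simple."""

    def __init__(self) -> None:
        self._facts: list[OrgFact] = []

    def query(self, context: str) -> list[OrgFact]:
        # Naive substring match; a real backend would do ranked retrieval.
        return [f for f in self._facts if context in f.content]

    def write(self, fact: OrgFact, author: str) -> None:
        self._facts.append(fact)

    def list_policies(self) -> list[str]:
        return []
```

Because the protocol is structural, a GraphRAG or temporal-KG backend can later replace `InMemoryOrgBackend` without callers changing.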

Comment on lines +557 to +568
| Component | Version | PyPI / Docker | Python 3.14 |
|-----------|---------|---------------|-------------|
| Neo4j CE | 5.x | `neo4j:community` (Docker) | N/A (JVM) |
| neo4j (driver) | 6.1.0 | `neo4j` (PyPI) | Confirmed (classifier) |
| Qdrant | 1.13.x | `qdrant/qdrant` (Docker) | N/A (Rust) |
| qdrant-client | 1.17.0 | `qdrant-client` (PyPI) | Confirmed (classifier) |
| FastEmbed | 0.7.4 | `fastembed` (PyPI) | Confirmed (classifier) |
| LiteLLM | (existing dep) | `litellm` (PyPI) | In use |
| SQLite | (stdlib) | Built-in | Yes |

---


`mem0ai` is missing from the Component Version References table, despite being the primary selected backend. This contradicts the risk mitigation documented in the Risks section (line 485), which explicitly states "Pin `mem0ai` version." Without a version entry in the normative reference section, the canonical pin target is undefined.

The gate check results mention v1.0.5 (Mar 2026) inline, but that is buried in the evaluation table rather than the authoritative version reference section.

Consider adding a row:

Suggested change
| Component | Version | PyPI / Docker | Python 3.14 |
|-----------|---------|---------------|-------------|
| Neo4j CE | 5.x | `neo4j:community` (Docker) | N/A (JVM) |
| neo4j (driver) | 6.1.0 | `neo4j` (PyPI) | Confirmed (classifier) |
| Qdrant | 1.13.x | `qdrant/qdrant` (Docker) | N/A (Rust) |
| qdrant-client | 1.17.0 | `qdrant-client` (PyPI) | Confirmed (classifier) |
| FastEmbed | 0.7.4 | `fastembed` (PyPI) | Confirmed (classifier) |
| LiteLLM | (existing dep) | `litellm` (PyPI) | In use |
| SQLite | (stdlib) | Built-in | Yes |
---
| **mem0ai** | 1.0.5 | `mem0ai` (PyPI) | Confirmed (`>=3.9,<4.0`) |
| Neo4j CE | 5.x | `neo4j:community` (Docker) | N/A (JVM) |


### Other Tier 2-3 candidates

OpenMemory, Memari, A-MEM, SimpleMem, LangMem, memsearch — all interesting but

Typo: "Memari" doesn't match any candidate in the long list. The long list includes Memary (MIT-licensed graph+vector candidate at position 11, ~2.5k stars), not "Memari". This appears to be a misspelling.

Suggested change
OpenMemory, Memari, A-MEM, SimpleMem, LangMem, memsearch — all interesting but
OpenMemory, Memary, A-MEM, SimpleMem, LangMem, memsearch — all interesting but

Aureliolo added a commit that referenced this pull request Mar 9, 2026
- Fix pre-flight check_can_execute skipping daily limit when
  total_monthly <= 0 (enforcer.py)
- Replace module-level assert with RuntimeError for _ALERT_LEVEL_ORDER
  exhaustiveness check (enforcer.py)
- Simplify redundant except clause (DailyLimitExceededError is subclass
  of BudgetExhaustedError) (enforcer.py)
- Fix turn_range semantic inconsistency: use turn.turn_number for
  turn-based findings instead of raw enumerate index (detectors.py)
- Update ErrorFinding.turn_range docstring to clarify position semantics
  across detector types (models.py)
- Change MemoryBackend.backend_name return type to NotBlankStr (protocol.py)
- Add MemoryNotFoundError docstring clarifying protocol vs impl usage (errors.py)
- Add vector_store/history_store validation against known values (config.py)
- Fix DESIGN_SPEC TOC ordering (7.4 before 7.5)
- Fix ADR-001: add mem0ai to version table, fix Memari->Memary typo,
  fix YAML agents example (mapping->list), add RootConfig note
- Add test for ClassificationResult._validate_findings_match_categories
- Add Windows path traversal test cases for MemoryStorageConfig
- Fix misleading docstring in integration test file
Aureliolo added a commit that referenced this pull request Mar 10, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.1](ai-company-v0.1.0...ai-company-v0.1.1) (2026-03-10)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#180](#180))
([46cfdd4](46cfdd4))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32))
([46cfdd4](46cfdd4))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59))
([ff3a2ba](ff3a2ba))
* implement YAML config loader with Pydantic validation
([#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([39005f9](39005f9))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Aureliolo added a commit that referenced this pull request Mar 11, 2026
🤖 I have created a release *beep* *boop*
---


## [0.1.0](v0.0.0...v0.1.0) (2026-03-11)


### Features

* add autonomy levels and approval timeout policies
([#42](#42),
[#126](#126))
([#197](#197))
([eecc25a](eecc25a))
* add CFO cost optimization service with anomaly detection, reports, and
approval decisions
([#186](#186))
([a7fa00b](a7fa00b))
* add code quality toolchain (ruff, mypy, pre-commit, dependabot)
([#63](#63))
([36681a8](36681a8))
* add configurable cost tiers and subscription/quota-aware tracking
([#67](#67))
([#185](#185))
([9baedfa](9baedfa))
* add container packaging, Docker Compose, and CI pipeline
([#269](#269))
([435bdfe](435bdfe)),
closes [#267](#267)
* add coordination error taxonomy classification pipeline
([#146](#146))
([#181](#181))
([70c7480](70c7480))
* add cost-optimized, hierarchical, and auction assignment strategies
([#175](#175))
([ce924fa](ce924fa)),
closes [#173](#173)
* add design specification, license, and project setup
([8669a09](8669a09))
* add env var substitution and config file auto-discovery
([#77](#77))
([7f53832](7f53832))
* add FastestStrategy routing + vendor-agnostic cleanup
([#140](#140))
([09619cb](09619cb)),
closes [#139](#139)
* add HR engine and performance tracking
([#45](#45),
[#47](#47))
([#193](#193))
([2d091ea](2d091ea))
* add issue auto-search and resolution verification to PR review skill
([#119](#119))
([deecc39](deecc39))
* add mandatory JWT + API key authentication
([#256](#256))
([c279cfe](c279cfe))
* add memory retrieval, ranking, and context injection pipeline
([#41](#41))
([873b0aa](873b0aa))
* add pluggable MemoryBackend protocol with models, config, and events
([#32](#32),
[#180](#180))
([46cfdd4](46cfdd4))
* add pluggable output scan response policies
([#263](#263))
([b9907e8](b9907e8))
* add pluggable PersistenceBackend protocol with SQLite implementation
([#36](#36))
([f753779](f753779))
* add progressive trust and promotion/demotion subsystems
([#43](#43),
[#49](#49))
([3a87c08](3a87c08))
* add retry handler, rate limiter, and provider resilience
([#100](#100))
([b890545](b890545))
* add SecOps security agent with rule engine, audit log, and ToolInvoker
integration ([#40](#40))
([83b7b6c](83b7b6c))
* add shared org memory and memory consolidation/archival
([#125](#125),
[#48](#48))
([4a0832b](4a0832b))
* design unified provider interface
([#86](#86))
([3e23d64](3e23d64))
* expand template presets, rosters, and add inheritance
([#80](#80),
[#81](#81),
[#84](#84))
([15a9134](15a9134))
* implement agent runtime state vs immutable config split
([#115](#115))
([4cb1ca5](4cb1ca5))
* implement AgentEngine core orchestrator
([#11](#11))
([#143](#143))
([f2eb73a](f2eb73a))
* implement AuditRepository for security audit log persistence
([#279](#279))
([94bc29f](94bc29f))
* implement basic tool system (registry, invocation, results)
([#15](#15))
([c51068b](c51068b))
* implement built-in file system tools
([#18](#18))
([325ef98](325ef98))
* implement communication foundation — message bus, dispatcher, and
messenger ([#157](#157))
([8e71bfd](8e71bfd))
* implement company template system with 7 built-in presets
([#85](#85))
([cbf1496](cbf1496))
* implement conflict resolution protocol
([#122](#122))
([#166](#166))
([e03f9f2](e03f9f2))
* implement core entity and role system models
([#69](#69))
([acf9801](acf9801))
* implement crash recovery with fail-and-reassign strategy
([#149](#149))
([e6e91ed](e6e91ed))
* implement engine extensions — Plan-and-Execute loop and call
categorization
([#134](#134),
[#135](#135))
([#159](#159))
([9b2699f](9b2699f))
* implement enterprise logging system with structlog
([#73](#73))
([2f787e5](2f787e5))
* implement graceful shutdown with cooperative timeout strategy
([#130](#130))
([6592515](6592515))
* implement hierarchical delegation and loop prevention
([#12](#12),
[#17](#17))
([6be60b6](6be60b6))
* implement LiteLLM driver and provider registry
([#88](#88))
([ae3f18b](ae3f18b)),
closes [#4](#4)
* implement LLM decomposition strategy and workspace isolation
([#174](#174))
([aa0eefe](aa0eefe))
* implement meeting protocol system
([#123](#123))
([ee7caca](ee7caca))
* implement message and communication domain models
([#74](#74))
([560a5d2](560a5d2))
* implement model routing engine
([#99](#99))
([d3c250b](d3c250b))
* implement parallel agent execution
([#22](#22))
([#161](#161))
([65940b3](65940b3))
* implement per-call cost tracking service
([#7](#7))
([#102](#102))
([c4f1f1c](c4f1f1c))
* implement personality injection and system prompt construction
([#105](#105))
([934dd85](934dd85))
* implement single-task execution lifecycle
([#21](#21))
([#144](#144))
([c7e64e4](c7e64e4))
* implement subprocess sandbox for tool execution isolation
([#131](#131))
([#153](#153))
([3c8394e](3c8394e))
* implement task assignment subsystem with pluggable strategies
([#172](#172))
([c7f1b26](c7f1b26)),
closes [#26](#26)
[#30](#30)
* implement task decomposition and routing engine
([#14](#14))
([9c7fb52](9c7fb52))
* implement Task, Project, Artifact, Budget, and Cost domain models
([#71](#71))
([81eabf1](81eabf1))
* implement tool permission checking
([#16](#16))
([833c190](833c190))
* implement YAML config loader with Pydantic validation
([#59](#59),
[#75](#75))
([ff3a2ba](ff3a2ba))
* initialize project with uv, hatchling, and src layout
([#62](#62))
([39005f9](39005f9))
* Litestar REST API, WebSocket feed, and approval queue (M6)
([#189](#189))
([29fcd08](29fcd08))
* make TokenUsage.total_tokens a computed field
([#118](#118))
([c0bab18](c0bab18)),
closes [#109](#109)
* parallel tool execution in ToolInvoker.invoke_all
([#137](#137))
([58517ee](58517ee))
* testing framework, CI pipeline, and M0 gap fixes
([#64](#64))
([f581749](f581749))
* wire all modules into observability system
([#97](#97))
([f7a0617](f7a0617))


### Bug Fixes

* address Greptile post-merge review findings from PRs
[#170](https://github.com/Aureliolo/ai-company/issues/170)-[#175](https://github.com/Aureliolo/ai-company/issues/175)
([#176](#176))
([c5ca929](c5ca929))
* address post-merge review feedback from PRs
[#164](https://github.com/Aureliolo/ai-company/issues/164)-[#167](https://github.com/Aureliolo/ai-company/issues/167)
([#170](#170))
([3bf897a](3bf897a)),
closes [#169](#169)
* enforce strict mypy on test files
([#89](#89))
([aeeff8c](aeeff8c))
* harden Docker sandbox, MCP bridge, and code runner
([#50](#50),
[#53](#53))
([d5e1b6e](d5e1b6e))
* harden git tools security + code quality improvements
([#150](#150))
([000a325](000a325))
* harden subprocess cleanup, env filtering, and shutdown resilience
([#155](#155))
([d1fe1fb](d1fe1fb))
* incorporate post-merge feedback + pre-PR review fixes
([#164](#164))
([c02832a](c02832a))
* pre-PR review fixes for post-merge findings
([#183](#183))
([26b3108](26b3108))
* resolve circular imports, bump litellm, fix release tag format
([#286](#286))
([a6659b5](a6659b5))
* strengthen immutability for BaseTool schema and ToolInvoker boundaries
([#117](#117))
([7e5e861](7e5e861))


### Performance

* harden non-inferable principle implementation
([#195](#195))
([02b5f4e](02b5f4e)),
closes [#188](#188)


### Refactoring

* adopt NotBlankStr across all models
([#108](#108))
([#120](#120))
([ef89b90](ef89b90))
* extract _SpendingTotals base class from spending summary models
([#111](#111))
([2f39c1b](2f39c1b))
* harden BudgetEnforcer with error handling, validation extraction, and
review fixes
([#182](#182))
([c107bf9](c107bf9))
* harden personality profiles, department validation, and template
rendering ([#158](#158))
([10b2299](10b2299))
* pre-PR review improvements for ExecutionLoop + ReAct loop
([#124](#124))
([8dfb3c0](8dfb3c0))
* split events.py into per-domain event modules
([#136](#136))
([e9cba89](e9cba89))


### Documentation

* add ADR-001 memory layer evaluation and selection
([#178](#178))
([db3026f](db3026f)),
closes [#39](#39)
* add agent scaling research findings to DESIGN_SPEC
([#145](#145))
([57e487b](57e487b))
* add CLAUDE.md, contributing guide, and dev documentation
([#65](#65))
([55c1025](55c1025)),
closes [#54](#54)
* add crash recovery, sandboxing, analytics, and testing decisions
([#127](#127))
([5c11595](5c11595))
* address external review feedback with MVP scope and new protocols
([#128](#128))
([3b30b9a](3b30b9a))
* expand design spec with pluggable strategy protocols
([#121](#121))
([6832db6](6832db6))
* finalize 23 design decisions (ADR-002)
([#190](#190))
([8c39742](8c39742))
* update project docs for M2.5 conventions and add docs-consistency
review agent
([#114](#114))
([99766ee](99766ee))


### Tests

* add e2e single agent integration tests
([#24](#24))
([#156](#156))
([f566fb4](f566fb4))
* add provider adapter integration tests
([#90](#90))
([40a61f4](40a61f4))


### CI/CD

* add Release Please for automated versioning and GitHub Releases
([#278](#278))
([a488758](a488758))
* bump actions/checkout from 4 to 6
([#95](#95))
([1897247](1897247))
* bump actions/upload-artifact from 4 to 7
([#94](#94))
([27b1517](27b1517))
* bump anchore/scan-action from 6.5.1 to 7.3.2
([#271](#271))
([80a1c15](80a1c15))
* bump docker/build-push-action from 6.19.2 to 7.0.0
([#273](#273))
([dd0219e](dd0219e))
* bump docker/login-action from 3.7.0 to 4.0.0
([#272](#272))
([33d6238](33d6238))
* bump docker/metadata-action from 5.10.0 to 6.0.0
([#270](#270))
([baee04e](baee04e))
* bump docker/setup-buildx-action from 3.12.0 to 4.0.0
([#274](#274))
([5fc06f7](5fc06f7))
* bump sigstore/cosign-installer from 3.9.1 to 4.1.0
([#275](#275))
([29dd16c](29dd16c))
* harden CI/CD pipeline
([#92](#92))
([ce4693c](ce4693c))
* split vulnerability scans into critical-fail and high-warn tiers
([#277](#277))
([aba48af](aba48af))


### Maintenance

* add /worktree skill for parallel worktree management
([#171](#171))
([951e337](951e337))
* add design spec context loading to research-link skill
([8ef9685](8ef9685))
* add post-merge-cleanup skill
([#70](#70))
([f913705](f913705))
* add pre-pr-review skill and update CLAUDE.md
([#103](#103))
([92e9023](92e9023))
* add research-link skill and rename skill files to SKILL.md
([#101](#101))
([651c577](651c577))
* bump aiosqlite from 0.21.0 to 0.22.1
([#191](#191))
([3274a86](3274a86))
* bump pyyaml from 6.0.2 to 6.0.3 in the minor-and-patch group
([#96](#96))
([0338d0c](0338d0c))
* bump ruff from 0.15.4 to 0.15.5
([a49ee46](a49ee46))
* fix M0 audit items
([#66](#66))
([c7724b5](c7724b5))
* **main:** release ai-company 0.1.1
([#282](#282))
([2f4703d](2f4703d))
* pin setup-uv action to full SHA
([#281](#281))
([4448002](4448002))
* post-audit cleanup — PEP 758, loggers, bug fixes, refactoring, tests,
hookify rules
([#148](#148))
([c57a6a9](c57a6a9))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

---------

Signed-off-by: Aurelio <19254254+Aureliolo@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Evaluate memory layer candidates: Mem0, Zep, Letta, Cognee, custom
