Open Source · Local-First · MCP + UI

mnemos

μνῆμος — of memory

Reliable scoped memory for coding agents.

Mnemos keeps project, workspace, and global memory separate; survives restarts in a single local SQLite file; ships a guided mnemos ui control plane for setup and host config; and plugs into Claude Code, Claude Desktop, generic MCP hosts, and documented Codex setups. Biomimetic retrieval and consolidation stay under the hood, so memory stays compact and adaptive without extra services.

Current agent memory breaks under scope and drift.

Most agent memory tools either dump every interaction into one growing log or append new facts forever without cleaning up old ones. That works for demos. It breaks when an agent moves between repos, drags stale instructions forward, or has to answer from memory under latency limits.

Mnemos v1 is aimed at a narrower problem: safe scoped memory for solo coding-agent workflows. Project, workspace, and global knowledge stay partitioned. Retrieval stays compact. Consolidation turns raw episodes into durable facts instead of one more append-only transcript.
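The partition rule can be sketched in a few lines (hypothetical helper and field names, not the actual Mnemos API): a memory is only retrievable when its scope is on the caller's allow-list, and non-global memories must also match the active scope id.

```python
def visible(memory, current_scope_id, allowed_scopes):
    """Illustrative scope check: can this stored memory be retrieved
    from the current context? (Sketch only, not the Mnemos API.)"""
    if memory["scope"] not in allowed_scopes:
        return False
    if memory["scope"] == "global":
        return True  # global facts are visible everywhere, if allowed
    # project/workspace memories additionally require a matching scope id
    return memory["scope_id"] == current_scope_id

fact = {"scope": "project", "scope_id": "repo-alpha", "content": "Use uv"}
print(visible(fact, "repo-alpha", {"project", "global"}))  # True
print(visible(fact, "repo-beta", {"project", "global"}))   # False: no cross-project bleed
```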

Common Pattern
Append-Only Memory Layer
  • Mixed project context in one pool
  • Contradictions accumulate over time
  • Retrieval quality depends on transcript volume
  • Operational readiness is hard to inspect
  • Host support is often implied, not verified
  • Easy to demo, harder to trust every day
Mnemos v1 Target
Scoped, Local-First Memory
  • Project, workspace, and global scope boundaries
  • Reconsolidation rewrites stale facts
  • Single-file SQLite persistence with built-in graph edges
  • doctor and mnemos_health expose readiness
  • Claude Code, Claude Desktop, and generic MCP are Tier 1
  • Codex is documented via MCP plus AGENTS.md

Biomimetic internals for practical agent memory.

The product promise is narrow: reliable scoped memory for solo coding-agent workflows. These neuroscience-inspired modules are how Mnemos keeps retrieval compact, adaptive, and less append-only than standard RAG.

MODULE 01

Surprisal Gate

Predictive coding (Active Inference) — the brain only permanently encodes prediction errors, ignoring what it already expects.

Vector databases encode everything equally. A mundane greeting consumes the same conceptual weight and storage as a critical instruction like "My production server is down."

A fast local model runs as a background Prediction Engine, constantly predicting user intent. When input arrives, the semantic divergence (cosine distance) between the prediction and the actual input is computed. Low-divergence input is discarded; high-divergence input is stored with an elevated salience weight — only surprises become long-term memories.
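The gate logic above can be sketched in a few lines (the threshold value and function names are illustrative assumptions, not the Mnemos implementation):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def surprisal_gate(predicted, actual, threshold=0.3):
    """Return (store?, salience). Low divergence is discarded;
    high divergence is stored with an elevated salience weight."""
    divergence = cosine_distance(predicted, actual)
    if divergence < threshold:
        return False, 0.0
    return True, divergence

# Input close to the prediction: discarded
print(surprisal_gate([1.0, 0.0], [0.99, 0.05]))
# Surprising input: stored, salience equals the divergence
print(surprisal_gate([1.0, 0.0], [0.1, 0.9]))
```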

MODULE 02

Mutable RAG

Memory reconsolidation — each time a memory is recalled, it enters an unstable, labile state and is physically rewritten with current context before it restabilizes.

RAG is append-only. If a user says "I use React" in 2025 and "I'm migrating to Rust" in 2026, standard RAG retrieves both facts, forcing the LLM to waste tokens resolving the contradiction in its context window.

Retrieved memory chunks are flagged as "labile." After each conversational turn, an async background agent evaluates whether the retrieved fact has changed given the new context. If it has, the stored chunk is overwritten — not appended to — with a synthesized, updated chunk. The AI's beliefs naturally drift and adapt without accumulating contradictory junk.
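A toy sketch of the overwrite-on-drift mechanic (hypothetical class and method names, not the Mnemos API; the change evaluation is model-driven in practice and passed in here as a flag):

```python
from dataclasses import dataclass

@dataclass
class MemoryChunk:
    content: str
    labile: bool = False  # flagged while the memory is "unstable"

class MutableStore:
    def __init__(self):
        self.chunks = {}

    def retrieve(self, key):
        # Recall destabilizes the memory: mark it labile
        chunk = self.chunks[key]
        chunk.labile = True
        return chunk

    def reconsolidate(self, key, has_changed, updated_content=None):
        # Async background step after the turn: overwrite in place
        # if the fact drifted, then restabilize
        chunk = self.chunks[key]
        if chunk.labile and has_changed:
            chunk.content = updated_content  # overwrite, not append
        chunk.labile = False

store = MutableStore()
store.chunks["stack"] = MemoryChunk("User builds frontends in React")
store.retrieve("stack")
store.reconsolidate("stack", has_changed=True,
                    updated_content="User migrated from React to Rust")
print(store.chunks["stack"].content)  # User migrated from React to Rust
```

The store ends up holding one current fact rather than two contradictory ones, so retrieval never forces the LLM to arbitrate between them.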

MODULE 03

Affective Router

State-dependent memory (amygdala filter) — memories encoded in a given emotional state are easier to recall when the current state matches, so the brain naturally surfaces contextually relevant experiences.

Standard embedding models retrieve based purely on semantic text similarity. A critical, urgent constraint ("PROD IS DOWN") carries the same retrieval weight as a trivial passing comment.

Every interaction is classified on three axes — Valence (−1.0 to 1.0), Arousal (0.0–1.0), and Complexity (0.0–1.0) — and appended as a CognitiveState metadata vector. The retrieval formula blends semantic similarity (70%) with affective state match (30%). A panicked user surfaces past crisis resolutions, not just semantically similar code snippets.
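The blended score can be sketched as follows. The 70/30 split comes from the text; the state-distance normalization (valence spans 2.0, the other axes 1.0, averaged) is an illustrative assumption:

```python
def blended_score(semantic_sim, query_state, memory_state,
                  semantic_weight=0.7, affect_weight=0.3):
    """Blend semantic similarity with affective state match.
    States are (valence, arousal, complexity) tuples; the distance is
    normalized so a perfect state match scores 1.0."""
    dist = (abs(query_state[0] - memory_state[0]) / 2.0  # valence spans 2.0
            + abs(query_state[1] - memory_state[1])      # arousal spans 1.0
            + abs(query_state[2] - memory_state[2])) / 3.0
    affect_match = 1.0 - dist
    return semantic_weight * semantic_sim + affect_weight * affect_match

# A panicked query: negative valence, high arousal
panic = (-0.9, 0.95, 0.6)
crisis_memory = (-0.8, 0.9, 0.7)  # past incident, matching state
calm_snippet = (0.2, 0.1, 0.3)    # semantically closer, but calm
print(blended_score(0.80, panic, crisis_memory))
print(blended_score(0.85, panic, calm_snippet))
```

With these numbers the past crisis resolution outranks the calm snippet despite a lower raw semantic similarity, which is exactly the re-ranking behavior described above.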

MODULE 04

Sleep Daemon

Hippocampal-neocortical transfer — during deep sleep, the brain replays episodic memories, extracts generalized semantic rules into the neocortex, then prunes the raw episodes to reclaim capacity.

Raw transcripts accumulate fast when every session is kept forever. Scoped stores get noisy, retrieval slows down, and the facts that matter stay buried inside turn-by-turn logs.

A per-scope episodic buffer collects recent interactions. On idle, sleep consolidation extracts durable facts and preferences into long-term storage, preserves the originating scope, and prunes the episodic trace. The result is smaller, cleaner memory without cross-project bleed.
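A minimal sketch of that per-scope buffer (illustrative names; the real extraction step is model-driven and is stubbed here with a callback):

```python
from collections import defaultdict

class EpisodicBuffer:
    """Sketch of per-scope sleep consolidation: collect raw turns,
    then on idle extract durable facts and prune the episodes."""
    def __init__(self):
        self.episodes = defaultdict(list)   # scope -> raw turns
        self.long_term = defaultdict(list)  # scope -> durable facts

    def record(self, scope, turn):
        self.episodes[scope].append(turn)

    def consolidate(self, extract_fn):
        for scope, turns in self.episodes.items():
            # Facts land in the same scope they came from: no bleed
            self.long_term[scope].extend(extract_fn(turns))
        self.episodes.clear()  # prune the raw episodic traces

buf = EpisodicBuffer()
buf.record("project:repo-alpha", "user: use uv for tooling")
buf.record("project:repo-alpha", "agent: ok, switching to uv")
buf.consolidate(lambda turns: ["Prefers uv for Python tooling"])
print(buf.long_term["project:repo-alpha"], len(buf.episodes))
```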

MODULE 05

Spreading Activation

Collins & Loftus associative networks — hearing "server" pre-activates "AWS," "downtime," and "Nginx" in human neural networks, priming associated concepts before they're consciously needed.

Vector search is a discrete point-in-space lookup. It retrieves exact mathematical matches but completely misses the broader associative "train of thought" — unless explicitly queried by name.

When a node is retrieved via vector search, activation energy (1.0) is injected into it and propagates along graph edges to connected nodes, decaying by 20% per hop. The LLM receives the directly retrieved node plus all adjacent nodes above the activation threshold — creating a fluid, moving spotlight of context that delivers human-like associative intuition.
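The propagation step can be sketched as a breadth-first spread with per-hop decay (decay of 20% per hop is from the text; the 0.3 activation threshold and the graph contents are illustrative assumptions):

```python
def spread_activation(graph, seed, decay=0.8, threshold=0.3):
    """Inject activation 1.0 at the retrieved node and propagate it
    along edges, decaying 20% per hop; return nodes above threshold."""
    activation = {seed: 1.0}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                energy = activation[node] * decay
                # Only strengthen; monotone updates keep cycles finite
                if energy > activation.get(neighbor, 0.0):
                    activation[neighbor] = energy
                    if energy >= threshold:
                        next_frontier.append(neighbor)
        frontier = next_frontier
    return {n: round(e, 2) for n, e in activation.items() if e >= threshold}

graph = {
    "server": ["AWS", "downtime"],
    "AWS": ["Nginx"],
    "downtime": ["incident-log"],
}
print(spread_activation(graph, "server"))
```

Retrieving "server" thus also surfaces "AWS", "downtime", and their neighbors at reduced weight, which is the associative spotlight described above.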

Encode → Retrieve → Consolidate

The MnemosEngine composes all five modules into a coherent pipeline. Each module is independently usable — or let the engine orchestrate the full sequence.

ENCODE
  • Input: interaction arrives
  • Surprisal Gate: filters the mundane
  • Affective Router: tags emotion
  • Spreading Activation: links into the graph
  • Store: memory persisted
RETRIEVE
  • Query: user prompt
  • Spreading Activation: graph traversal
  • Affective Router: re-ranks results
  • Mutable RAG: reconsolidates
  • Results: returned to the LLM
CONSOLIDATE
  • Idle: system quiet
  • Sleep Daemon: triggers
  • Facts extracted: preferences and patterns
  • Episodes pruned
  • Knowledge graph updated

Start local in minutes.

Start with the control plane. It writes canonical config around the built-in SQLite store, applies host setup for Claude Code, Cursor, or Codex, and runs the built-in smoke checks. Use the Python API directly only when you want a manual or embedded integration.

install & guided onboarding
# Install with MCP support
pip install "mnemos-memory[mcp]"

# Launch the control plane
mnemos ui

# In the UI:
# 1. Choose your model provider
# 2. Review the local SQLite memory path
# 3. Apply Claude Code, Cursor, or Codex config
# 4. Run the smoke check and start using Mnemos
python api (advanced / manual)
import asyncio
from mnemos import Interaction, MnemosEngine

async def main():
    engine = MnemosEngine()

    # Encode: run the interaction through the full pipeline
    await engine.process(
        Interaction(role="user", content="Use uv for Python tooling in this repo.")
    )

    # Retrieve: top-k memories for a query
    memories = await engine.retrieve("python tooling", top_k=3)
    for memory in memories:
        print(memory.content)

    # Consolidate: trigger sleep-style fact extraction and pruning
    await engine.consolidate()

asyncio.run(main())
scoped cli + codex workflow
mnemos-cli doctor

mnemos-cli store "Use uv for Python tooling" \
  --scope project --scope-id repo-alpha

mnemos-cli retrieve "tooling preferences" \
  --current-scope project \
  --scope-id repo-alpha \
  --allowed-scopes project,global

mnemos-cli antigravity codex
# add the generated policy to AGENTS.md
# keep Codex on retrieve -> work -> store -> consolidate

Supported hosts and current release posture.

Tier 1 means real end-to-end validation in a supported workflow. Tier 2 means documented and usable, but not yet promoted to release-blocking support.

Client | Tier | Status | Notes
Claude Code | Tier 1 | Supported | Primary install path via plugin; uses the built-in single-file SQLite store.
Claude Desktop | Tier 1 | Supported | Minimal tested stdio config ships in the repo docs.
Generic MCP host | Tier 1 | Supported | Verified against the live stdio server, not just static config.
Codex | Tier 2 | Documented | MCP plus AGENTS.md setup is documented; promotion needs verified daily-use E2E validation.
Cursor | Tier 2 | Best effort | Config and antigravity docs exist, but support is not release-blocking.
Windsurf | Tier 2 | Best effort | Config is documented, but not yet part of the release-blocking validation set.

Mnemos is much closer to a public open-source release now, but the v1 claim gate still requires external pilot evidence: at least two pilot users or projects, two weeks of daily use, and zero blocker incidents for scope leakage, startup failure, or data corruption.