feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction by LucidPaths · Pull Request #4480 · NousResearch/hermes-agent

LucidPaths · 2026-04-01T18:50:57Z

feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction

Summary

Replaces the flat-file memory store with a full SQLite-backed knowledge system featuring FTS5 full-text search, optional embedding-based semantic search, a relationship graph, automatic knowledge extraction from conversations, session memory injection, and a 5-gate consolidation scheduler with tiered lifecycle management. The migration from flat files is automatic and backward-compatible — existing memories are imported on first load with no user action required.

What Changed

New Files

| File | Lines | Purpose |
|---|---|---|
| `tools/memory_engine.py` | 2,010 | Core SQLite engine: FTS5 search, embedding index, relationship graph, tiered lifecycle (active → archive → tombstone), importance scoring, budget enforcement, cursor-based pagination |
| `agent/memory_extractor.py` | 428 | Automatic knowledge extraction from conversation turns via auxiliary LLM: identifies facts, preferences, procedures, events; classifies and deduplicates before storage |
| `agent/session_memory.py` | 301 | Session-scoped memory: injects relevant memories into system prompt at session start, tracks what was surfaced to avoid repetition |
| `agent/memory_consolidator.py` | 266 | 5-gate consolidation scheduler: staleness check, duplicate merge, importance decay, archive promotion, and purge; runs on configurable intervals |
| `agent/yake.py` | 215 | Pure-Python YAKE keyword extraction (no external dependencies) for memory tagging and search enhancement |
| `MEMORY_V2_PLAN.md` | 422 | Implementation plan and architecture documentation |
| `tests/tools/test_memory_engine.py` | 448 | MemoryEngine unit tests: CRUD, FTS5 search, graph operations, lifecycle transitions, budget enforcement |
| `tests/agent/test_memory_extractor.py` | 347 | Extraction pipeline tests: LLM output parsing, deduplication, classification |
| `tests/agent/test_memory_consolidator.py` | 179 | Consolidation gate tests: scheduling, staleness, merge behavior |
| `tests/agent/test_session_memory.py` | 136 | Session memory injection and retrieval tests |
| `tests/agent/test_yake.py` | 98 | YAKE keyword extraction accuracy tests |

Modified Files

| File | Change |
|---|---|
| `tools/memory_tool.py` | Rewired to use `MemoryEngine` backend; added `search` action with hybrid (FTS5 + embedding) retrieval; added `graph` action for relationship queries; preserved all existing tool signatures for backward compatibility |
| `hermes_state.py` | Added memory engine initialization, session memory hooks, consolidator lifecycle management; state object now carries `memory_engine` and `session_memory` instances |
| `run_agent.py` | Wired extraction into post-turn hook, session memory into prompt assembly, consolidator into idle/shutdown; added embedding provider resolution for Hermes provider stack |
| `tests/tools/test_memory_tool.py` | Minor fixture updates for new engine backend |

Architecture

┌─────────────────────────────────────────────────┐
│                  run_agent.py                     │
│  (post-turn extraction, session inject, idle      │
│   consolidation)                                  │
├──────────┬──────────────┬───────────────┬────────┤
│ Extractor│ SessionMemory│ Consolidator  │  YAKE  │
├──────────┴──────────────┴───────────────┴────────┤
│              tools/memory_engine.py                │
│  ┌──────────┬────────────┬──────────┬──────────┐ │
│  │ SQLite   │ FTS5 Index │Embeddings│  Graph   │ │
│  │(memories)│(full-text) │(semantic)│(relations)│ │
│  └──────────┴────────────┴──────────┴──────────┘ │
│              Tiered Lifecycle                      │
│         active → archive → tombstone               │
└─────────────────────────────────────────────────┘

For full architecture details, see MEMORY_V2_PLAN.md.

Key Features

- SQLite + FTS5 full-text search: sub-millisecond keyword search across all memories, no external services
- Optional embedding-based semantic search: hybrid retrieval combining FTS5 BM25 scores with cosine similarity; adapts to Hermes provider stack (local Ollama, OpenAI, Anthropic)
- Relationship graph: memories can link to each other with typed edges (related_to, contradicts, supersedes, part_of); queryable via the memory tool's `graph` action
- Automatic knowledge extraction: auxiliary LLM extracts facts, preferences, procedures, and events from conversation turns; deduplicates against existing knowledge
- Session memory injection: relevant memories auto-injected into system prompt at session start based on context similarity
- 5-gate consolidation: background scheduler handles staleness detection, duplicate merging, importance decay, archive promotion, and tombstone purge
- Tiered lifecycle: memories progress through active → archive → tombstone states with configurable retention budgets
- Importance scoring: memories scored by access frequency, recency, and explicit user signals; budget enforcement evicts lowest-importance entries first
- Pure-Python YAKE keywords: zero-dependency keyword extraction for automatic tagging
- Cursor-based pagination: efficient traversal of large memory sets
- Flat-file migration: existing V1 memories automatically imported on first engine initialization
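
As an illustration of the FTS5 keyword search named in the list above, here is a minimal sketch using Python's stdlib `sqlite3`. The table name `memories_fts`, the helper names, and the choice to quote every token are assumptions made for this example; they are not taken from `tools/memory_engine.py`.

```python
import re
import sqlite3


def to_fts5_query(text: str) -> str:
    """Turn free text into a safe FTS5 MATCH expression.

    Word tokens are pulled out with a regex and each one is double-quoted,
    so apostrophes, quotes, and FTS5 operators in user input cannot cause
    a syntax error. Tokens are OR-ed together.
    """
    tokens = re.findall(r"\w+", text.lower())
    return " OR ".join(f'"{t}"' for t in tokens)


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with a single FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(content)"
    )
    return conn


def search(conn: sqlite3.Connection, text: str, limit: int = 5) -> list[str]:
    """Return matching rows, best BM25 match first."""
    query = to_fts5_query(text)
    if not query:
        return []
    rows = conn.execute(
        "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
        "ORDER BY bm25(memories_fts) LIMIT ?",  # lower bm25() means a better match
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

SQLite's `bm25()` returns smaller values for better matches, which is why the sketch sorts in ascending order.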

Breaking Changes

None. The flat-file to SQLite migration is automatic and backward-compatible:

- On first load, MemoryEngine detects existing flat-file memories and imports them into SQLite
- All existing memory_tool.py actions (store, recall, list, delete) continue to work with the same signatures
- New actions (search, graph) are additive
- If the SQLite DB is deleted, flat files still serve as a fallback source for re-import

Testing

| Test File | Coverage | Lines |
|---|---|---|
| `tests/tools/test_memory_engine.py` | Core engine CRUD, FTS5, graph, lifecycle | 448 |
| `tests/agent/test_memory_extractor.py` | Extraction pipeline, parsing, dedup | 347 |
| `tests/agent/test_memory_consolidator.py` | 5-gate scheduling, merge, decay | 179 |
| `tests/agent/test_session_memory.py` | Session injection, retrieval | 136 |
| `tests/agent/test_yake.py` | Keyword extraction accuracy | 98 |
| `tests/tools/test_memory_tool.py` | Tool interface compat (updated) | ±10 |
| Total | | ~1,218 |

Provenance

This implementation draws from multiple sources:

- HiveMind / MAGMA: The tiered lifecycle model (active → archive → tombstone), importance scoring with decay, and relationship graph patterns are inspired by memory.rs in the MAGMA framework
- Claude Code: The autoDream consolidation pattern, automatic extraction from conversation turns, and session memory injection patterns originate from Claude Code's memory system
- Original work: SQLite FTS5 hybrid search engine, YAKE keyword extraction integration, Hermes provider stack adaptation for embeddings, budget enforcement with cursor pagination, and the 5-gate consolidation scheduler are original to this implementation

Commit Log

3406abcf wire Ollama cloud: nemotron-3-super fallback, ministral-3:3b auxiliary, OLLAMA_API_KEY resolution
57085719 memory v2: close all gaps — local embeddings, LLM reranker, procedures, events
d9313997 memory v2: lifecycle hardening — importance scoring, budget enforcement, purge, cursor persistence
6d1f78fe fix(memory): activate by default + adapt embeddings to Hermes provider stack
27b53b25 feat(memory): embeddings pipeline, hybrid search, graph tools, full wiring
1bf81da0 feat(memory): complete memory system — YAKE, graph, chunking, classification, session memory, extraction hardening
3752e1dd feat(memory): add memory consolidation with 5-gate scheduling
a443d1d4 feat(memory): add automatic memory extraction via auxiliary LLM
1da2b76f feat(memory): wire MemoryEngine into MemoryStore + add search action + type taxonomy
0d8c7637 feat(memory): add MemoryEngine — SQLite-backed memory with FTS5 search and tiered lifecycle
d66132e2 docs: memory system v2 implementation plan
83b107df feat(prompt): extensible system prompt + credential redaction (local governance)

Stats

 MEMORY_V2_PLAN.md                       |  422 +++++++
 agent/memory_consolidator.py            |  266 ++++
 agent/memory_extractor.py               |  428 +++++++
 agent/session_memory.py                 |  301 +++++
 agent/yake.py                           |  215 ++++
 hermes_state.py                         |  404 +++++--
 run_agent.py                            | 1352 ++++++++++++++++++---
 tests/agent/test_memory_consolidator.py |  179 +++
 tests/agent/test_memory_extractor.py    |  347 ++++++
 tests/agent/test_session_memory.py      |  136 +++
 tests/agent/test_yake.py                |   98 ++
 tests/tools/test_memory_engine.py       |  448 +++++++
 tests/tools/test_memory_tool.py         |   10 +-
 tools/memory_engine.py                  | 2010 +++++++++++++++++++++++++++++++
 tools/memory_tool.py                    |  461 ++++---
 15 files changed, 6685 insertions(+), 392 deletions(-)

Net delta: +6,293 lines

…governance)

Port of local governance features to upstream v0.5.0 base:
- load_rules(): ~/.hermes/rules/*.md injection into system prompt
- load_samples(): ~/.hermes/samples/*.md behavioral examples
- load_working_state(): ~/.hermes/working_state.md cross-session context
- atexit snapshot of working_state.md to checkpoints/
- Credential redaction in gateway and cron delivery

Dropped: lifecycle hook wiring (superseded by upstream NousResearch#3542)

Cannibalized from:
- HiveMind (SQLite schema, hybrid search, tiers, YAKE, lifecycle, decay)
- Claude Code leaked (auto-extraction, autoDream, type taxonomy, relevance selection)

6-phase plan: schema -> tool upgrade -> search -> prompt integration -> extraction -> consolidation

~1,300 new lines + ~180 modified across 5 new files and 4 modified files

…h and tiered lifecycle

Phase 1 of memory system v2. New MemoryEngine class provides:
- SQLite storage with WAL mode for concurrent access
- FTS5 full-text search with BM25 ranking
- Hybrid search: BM25 * recency_decay * strength * tier_weight * type_boost
- 5 memory types: general, preference, correction, project, reference
- 4 memory tiers: active, archived, consolidated, superseded
- Power-law recency decay (from HiveMind)
- Logarithmic strength reinforcement (from HiveMind)
- Automatic stale archival (90 days + low strength)
- Supersession tracking (newer memory replaces older)
- Frozen snapshot pattern for prompt cache stability
- Migration from flat MEMORY.md/USER.md files
- Memory manifest for extraction dedup (from Claude Code)
- Exact-match duplicate detection
- Type-tagged prompt formatting

38 new tests, all passing. Full suite: 7360 passed. Zero new dependencies (sqlite3 + uuid are stdlib).
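
To make the Phase 1 scoring product (BM25 * recency_decay * strength * tier_weight * type_boost) concrete, here is a minimal sketch. The commit message names the factors but not their formulas or constants, so the decay exponent, the logarithmic strength form, and both weight tables below are illustrative assumptions.

```python
import math

# Hypothetical weights: the commit message lists the tiers and types
# but does not give the actual multipliers.
TIER_WEIGHT = {"active": 1.0, "consolidated": 0.9, "archived": 0.5, "superseded": 0.1}
TYPE_BOOST = {"correction": 1.3, "preference": 1.2, "project": 1.1,
              "reference": 1.0, "general": 1.0}


def recency_decay(age_days: float, exponent: float = 0.5) -> float:
    """Power-law decay: 1.0 for a new memory, shrinking slowly with age."""
    return (1.0 + max(age_days, 0.0)) ** -exponent


def strength_factor(access_count: int) -> float:
    """Logarithmic reinforcement: each extra hit adds less than the last."""
    return 1.0 + 0.1 * math.log1p(max(access_count, 0))


def score(bm25_relevance: float, age_days: float, access_count: int,
          tier: str, mem_type: str) -> float:
    """Multiply the five factors; bm25_relevance is assumed to be positive."""
    return (bm25_relevance
            * recency_decay(age_days)
            * strength_factor(access_count)
            * TIER_WEIGHT[tier]
            * TYPE_BOOST[mem_type])
```

Because SQLite's `bm25()` returns negative numbers for good matches, a caller would need to negate or rescale it before passing it in as `bm25_relevance`.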

…+ type taxonomy

Phase 2+3+4 of memory system v2:

MemoryStore compatibility layer:
- When engine='sqlite' (default): delegates to MemoryEngine (SQLite + FTS5)
- When engine='flat': falls back to legacy MEMORY.md flat files
- Automatic migration from flat files on first SQLite run
- MemoryEngine init failure gracefully falls back to flat mode

Memory tool upgrades:
- New 'search' action with search_query parameter (FTS5 + hybrid scoring)
- New 'type' parameter: preference/correction/project/reference/general
- Search reinforces accessed memories (strength increases on hit)
- All existing actions (add/replace/remove) work identically

Config additions (memory section):
- engine: 'sqlite' or 'flat'
- auto_extract: false (Phase 5 placeholder)
- extract_interval: 3
- consolidation_enabled: false (Phase 6 placeholder)

run_agent.py:
- Creates MemoryEngine when engine='sqlite', passes to MemoryStore
- Forwards type and search_query params in tool dispatch

All 32 existing memory tests pass unchanged (backward compat verified). Full suite: 7360 passed, 0 failures.
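
The engine selection and graceful fallback described for the MemoryStore compatibility layer might look roughly like the sketch below. `FlatStore`, `SqliteStore`, and `make_store` are hypothetical stand-ins invented for this example; only the `memory.engine` config key and its two values come from the commit message.

```python
import sqlite3


class FlatStore:
    """Stand-in for the legacy MEMORY.md backend (hypothetical)."""
    name = "flat"


class SqliteStore:
    """Stand-in for the SQLite-backed engine (hypothetical)."""
    name = "sqlite"

    def __init__(self, path: str) -> None:
        # sqlite3.connect raises sqlite3.OperationalError when the
        # database file cannot be opened.
        self.conn = sqlite3.connect(path)


def make_store(config: dict, db_path: str):
    """Pick a backend from config, falling back to flat files on failure."""
    engine = config.get("memory", {}).get("engine", "sqlite")
    if engine == "flat":
        return FlatStore()
    try:
        return SqliteStore(db_path)
    except sqlite3.Error:
        # Engine init failed: keep the agent usable with the legacy store.
        return FlatStore()
```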

Phase 5 of memory system v2. Post-response hook that extracts durable memories using a lightweight auxiliary LLM call (not a full agent fork).

Architecture (cannibalized from Claude Code extractMemories):
- Runs in background thread after every N responses (extract_interval)
- Pre-injects manifest of existing memories to prevent duplicates
- Structured JSON output: target, type, content per extracted memory
- Processes last 20 messages with 8KB budget
- Handles malformed JSON, code fences, empty/NONE responses gracefully
- Source tagged as 'extraction' for provenance tracking

Config: memory.auto_extract (default: false), memory.extract_interval (default: 3)

Requires: engine='sqlite' + auxiliary_client available

13 new tests covering extraction logic, dedup, error handling. Full suite: 7373 passed, 0 failures.
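
The tolerant parsing described above (malformed JSON, code fences, empty or NONE responses) could be handled along these lines. `parse_extraction` is a hypothetical helper written for this example, not the function in `agent/memory_extractor.py`; the three required keys are the ones named in the commit message.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this block readable
_FENCED = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", re.DOTALL)
REQUIRED_KEYS = ("target", "type", "content")


def parse_extraction(raw: str) -> list[dict]:
    """Parse an auxiliary-LLM reply into memory dicts, returning [] on any problem."""
    text = raw.strip()
    if not text or text.upper() == "NONE":
        return []
    fenced = _FENCED.match(text)
    if fenced:
        text = fenced.group(1)  # drop the surrounding code fence
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        data = [data]  # accept a single object as a one-item list
    if not isinstance(data, list):
        return []
    # Keep only entries where every required key is a non-empty string.
    return [
        item for item in data
        if isinstance(item, dict)
        and all(isinstance(item.get(k), str) and item[k] for k in REQUIRED_KEYS)
    ]
```

Returning an empty list on every failure path means a bad model reply costs one skipped extraction rather than an exception in the background thread.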

Phase 6 of memory system v2. Periodic memory maintenance system cannibalized from Claude Code (autoDream) and HiveMind.

5-gate scheduling (cheapest first, from Claude Code):
1. Feature enabled? (config check)
2. Time since last consolidation >= threshold (default 24h)
3. Session count since last run >= threshold (default 5)
4. Concurrent lock (via metadata)
5. Auxiliary LLM available

Consolidation actions (LLM-directed):
- merge: combine duplicate/overlapping memories, supersede originals
- update: fix stale content (relative dates, outdated facts)
- archive: mark low-value memories for archival
- Automatic stale archival (90 days + low strength, from HiveMind)

Metadata tracking: last_consolidation timestamp, session counter.

Designed to run via Hermes cron: hermes cron create --schedule '0 4 * * *'

11 new tests covering gates, merge/archive actions, metadata updates. Full suite: 7384 passed, 0 failures.
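
The five gates can be read as an ordered chain of early returns. The sketch below assumes a simple state object and illustrative gate names; the default thresholds (24 hours, 5 sessions) are the ones stated in the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsolidationState:
    """Inputs to the gate check (hypothetical field names)."""
    enabled: bool
    last_run_ts: float          # Unix timestamp of the last consolidation
    sessions_since_last: int
    lock_held: bool
    llm_available: bool


def first_failed_gate(state: ConsolidationState, now: float,
                      min_hours: float = 24.0,
                      min_sessions: int = 5) -> Optional[str]:
    """Return the name of the first blocking gate, or None if all five pass.

    Gates are checked cheapest first, so a disabled feature never reaches
    the lock or LLM checks.
    """
    if not state.enabled:
        return "disabled"
    if now - state.last_run_ts < min_hours * 3600:
        return "too_soon"
    if state.sessions_since_last < min_sessions:
        return "too_few_sessions"
    if state.lock_held:
        return "locked"
    if not state.llm_available:
        return "no_llm"
    return None
```

In a real scheduler the later gates would likely be function calls rather than precomputed booleans, so that the early returns actually skip the expensive work.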

…ication, session memory, extraction hardening

The full memory v2.5 implementation. Ported from HiveMind (Rust) and Claude Code (TypeScript):

FROM HIVEMIND:
- YAKE keyword extraction (agent/yake.py): 5-feature scoring, n-gram candidates, dedup, full stopword list. Direct Rust->Python transliteration.
- Cosine similarity: pure Python, handles edge cases (empty/mismatched/zero-norm)
- Chunking: 1600 char max, 320 char overlap, line-boundary aware
- Topic auto-classification: keyword-based (tech/project/personal word lists)
- Graph tables: edges (typed, weighted), entities (name, type, metadata)
- Auto-edge creation: keyword overlap -> related_to edges
- Graph-augmented search: 1-hop BFS expansion with 0.5x weight boost
- Content-hash embedding cache stub (ready for fastembed provider)

FROM CLAUDE CODE:
- Extraction hardening: cursor tracking (only new messages), mutual exclusion (skip if agent wrote this turn), trailing run stash (coalesced execution)
- Session memory (agent/session_memory.py): 9-section structured notes, token+tool_call thresholds, LLM-generated summaries
- Type taxonomy depth: per-type when_to_save guidance, WHAT_NOT_TO_SAVE block
- Staleness caveats: memories >7d get '(Xd old — verify)' suffix
- Trusting Recall: verify files exist before recommending from memory

Schema v2: +chunks, +embeddings, +edges, +entities tables with FTS5 triggers. Auto-classification on add(). Auto-keyword extraction on add(). Auto-chunking for content >500 chars. Graph traversal in search results.

+1,853 lines across 10 files. 131 memory-specific tests, 7420 total suite.
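
A minimal sketch of the chunking rule (1600 characters max, 320 characters of overlap, line-boundary aware) follows. It is one possible reading of that one-line description, not a port of the HiveMind routine: here only the end of each chunk is snapped back to a newline, and the start of the next chunk is simply `overlap` characters earlier.

```python
def chunk_text(text: str, max_chars: int = 1600, overlap: int = 320) -> list[str]:
    """Split text into overlapping chunks, cutting after a newline when possible."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    if len(text) <= max_chars:
        return [text]
    chunks = []
    start = 0
    while True:
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Look for the last newline in the window, but only past the
            # overlap region so every iteration is guaranteed to advance.
            newline = text.rfind("\n", start + overlap + 1, end)
            if newline != -1:
                end = newline + 1
        chunks.append(text[start:end])
        if end == len(text):
            return chunks
        start = end - overlap  # the next chunk repeats the last `overlap` chars
```

Dropping the first `overlap` characters of every chunk after the first and concatenating the rest reproduces the original text, which is a convenient property to assert in tests.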

…iring

Final implementation phase:

EMBEDDINGS (from HiveMind):
- Real embedding generation via litellm (provider-agnostic)
- Graceful degradation: no API key -> returns [], falls back to BM25-only
- Content-hash caching in embeddings table (skip API for known content)
- Background embedding generation (fire-and-forget thread)
- numpy-accelerated cosine similarity with pure Python fallback

HYBRID SEARCH (from HiveMind formula):
- (0.7 * cosine + 0.3 * normalized_bm25) * recency * strength * tier * type
- Falls back to BM25-only when no embeddings available
- search_by_embedding() for direct vector search

NEAR-DUPLICATE UPGRADE:
- Cosine > 0.92 rejection (HiveMind threshold) when embeddings available
- Exact-match fallback when no embeddings

WIRING (run_agent.py):
- Session memory update on each turn (token + tool_call thresholds)
- Session memory injected into system prompt
- Consolidation session counter incremented at session end
- All wiring is best-effort (try/except, never breaks agent)

GRAPH TOOLS (memory_tool.py):
- graph_query action: get_related() and get_edges() with short-ID resolution
- entity_track action: track_entity() for entity CRUD
- MEMORY_SCHEMA updated with new actions

+328 lines. 7425 total tests passing.
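
The hybrid formula and the near-duplicate threshold can be sketched in pure Python as below. The function names are hypothetical, `bm25_norm` is assumed to be already scaled to [0, 1], and the four lifecycle multipliers (recency, strength, tier, type) are folded into a single `modifiers` argument for brevity.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity that returns 0.0 for empty, mismatched, or zero-norm input."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(query_vec: list[float], memory_vec: list[float],
                 bm25_norm: float, modifiers: float = 1.0) -> float:
    """(0.7 * cosine + 0.3 * normalized_bm25) * modifiers, BM25-only without vectors."""
    if not query_vec or not memory_vec:
        return bm25_norm * modifiers  # no embeddings available
    return (0.7 * cosine(query_vec, memory_vec) + 0.3 * bm25_norm) * modifiers


def is_near_duplicate(a: list[float], b: list[float], threshold: float = 0.92) -> bool:
    """Reject a new memory whose embedding is almost identical to an existing one."""
    return cosine(a, b) > threshold
```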

…r stack

Memory should be ACTIVE, not dormant behind flags. It's MY memory system.

- auto_extract: True by default (was False: why implement it and not use it?)
- consolidation_enabled: True by default (same)
- Embedding provider auto-detection from available API keys:
  - OPENAI_API_KEY -> text-embedding-3-small
  - OPENROUTER_API_KEY -> openrouter/openai/text-embedding-3-small
  - VOYAGE_API_KEY -> voyage/voyage-3-lite
  - No key -> graceful BM25 fallback (still works, just no vectors)
- Config: memory.embedding_model for explicit override
- All generate_embedding() calls now pass config for model resolution

7425 tests passing.
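
The key-based auto-detection can be expressed as an ordered lookup. In this sketch the function name `resolve_embedding_model` and the injectable `env` mapping are assumptions added for testability; the key-to-model pairs and the `memory.embedding_model` override are taken from the commit message.

```python
import os
from typing import Mapping, Optional

# Checked in order; the first key that is set wins.
KEY_TO_MODEL = [
    ("OPENAI_API_KEY", "text-embedding-3-small"),
    ("OPENROUTER_API_KEY", "openrouter/openai/text-embedding-3-small"),
    ("VOYAGE_API_KEY", "voyage/voyage-3-lite"),
]


def resolve_embedding_model(config: dict,
                            env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the embedding model to use, or None for BM25-only search."""
    env = os.environ if env is None else env
    override = config.get("memory", {}).get("embedding_model")
    if override:
        return override  # an explicit config setting beats auto-detection
    for key, model in KEY_TO_MODEL:
        if env.get(key):
            return model
    return None  # no key found: the caller falls back to BM25-only
```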

…nt, purge, cursor persistence

Extraction:
- Importance scoring (1-10) in extraction prompt, filter threshold >= 5
- Corrections/preferences get +1 importance bonus
- Hard cap: max 5 entries per extraction run
- Explicit 'do not extract' rules for noise (conversational artifacts, task-specific)
- Extractor cursor persisted to SQLite via memory_meta (survives restart)

Budget enforcement:
- MAX_ACTIVE_MEMORY=50, MAX_ACTIVE_USER=25 hard caps
- enforce_budget() archives weakest (lowest strength, oldest) when over cap
- Corrections/preferences protected from budget archival (sorted last)
- Called after every engine.add() and at session end

DB hygiene:
- purge_dead() hard-deletes superseded/archived entries >30 days old
- Cleans up orphaned chunks, embeddings, edges
- Runs at every session end

Lifecycle at session end (run_agent.py):
- archive_stale(): independent of consolidation now
- enforce_budget(): prevent runaway growth
- purge_dead(): prevent monotonic DB growth
- increment_session_count(): consolidation gating

Consolidation tuning:
- Session gate: 5 → 3 (matches intermittent usage pattern)
- Time gate: 24h → 12h
- Prompt includes budget caps and protection priority
- Prompt instructs: protect corrections/preferences, archive general first

Bug fixes:
- FTS5 query crash on apostrophes/quotes (regex tokenizer)
- DEDUP_THRESHOLD 5.0 → 8.0 (false supersession prevention)
- auto_extract default: False → True
- consolidation_enabled default: False → True

7425 tests pass, 9/9 custom verification tests pass.
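
The budget rule (archive the weakest and oldest entries first, with corrections and preferences sorted last) can be sketched as a single sort. The `Memory` dataclass and its field names are assumptions for this example; the real engine works on SQLite rows.

```python
from dataclasses import dataclass

PROTECTED_TYPES = {"correction", "preference"}
MAX_ACTIVE_MEMORY = 50  # cap quoted in the commit message


@dataclass
class Memory:
    """Minimal in-memory stand-in for a stored entry (hypothetical)."""
    id: str
    type: str
    strength: float
    created_ts: float
    tier: str = "active"


def enforce_budget(memories: list[Memory], cap: int = MAX_ACTIVE_MEMORY) -> list[str]:
    """Archive active entries beyond the cap and return their ids in eviction order."""
    active = [m for m in memories if m.tier == "active"]
    excess = len(active) - cap
    if excess <= 0:
        return []
    # Sort key: unprotected before protected, then lowest strength, then oldest.
    active.sort(key=lambda m: (m.type in PROTECTED_TYPES, m.strength, m.created_ts))
    victims = active[:excess]
    for memory in victims:
        memory.tier = "archived"
    return [memory.id for memory in victims]
```

Protected entries are still archived if nothing else is left to evict, which keeps the cap hard, as the commit message states.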

…s, events

Local embeddings (fastembed):
- BAAI/bge-small-en-v1.5 via ONNX (384 dims, ~50ms/query)
- First in cascade: local → OpenAI → OpenRouter → Voyage
- Module-level model cache (_LOCAL_EMBEDDER)
- Zero API keys needed: near-duplicate detection, hybrid search, embedding dedup all now ACTIVE by default
- _get_or_create_embedding model tracking fixed

LLM reranker (Claude Code port):
- rerank_with_llm() method on MemoryEngine
- Uses auxiliary_client for cheap model reranking
- search() accepts optional auxiliary_client parameter
- Falls back to score-based ranking when no client

Procedures table (HiveMind port):
- learn_procedure(name, description, tool_chain)
- reinforce_procedure(name, success): track success/fail counts
- get_procedures(): ordered by success rate
- find_procedure(name): LIKE match

Events table (HiveMind MAGMA port):
- log_event(type, summary, details, session_id)
- get_recent_events(type, limit, session_id)
- purge_old_events(max_age_days=90): wired into purge_dead()
- Event types: tool_success/failure, memory_write, session_start/end, consolidation, error, milestone

Test update:
- test_generate_embedding_graceful_failure → test_generate_embedding_works_locally (fastembed means embeddings work without API keys now)

7425 tests pass.
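
A rough sketch of the procedures table follows, using the three function names from the commit message. The schema, the JSON encoding of `tool_chain`, and the explicit `conn` argument are assumptions; in the PR these are described as part of the engine's API rather than free functions.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical procedures schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE procedures ("
        " name TEXT PRIMARY KEY, description TEXT, tool_chain TEXT,"
        " success_count INTEGER NOT NULL DEFAULT 0,"
        " fail_count INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def learn_procedure(conn, name: str, description: str, tool_chain: list[str]) -> None:
    """Insert a procedure, or update its text while keeping existing counts."""
    conn.execute("INSERT OR IGNORE INTO procedures(name) VALUES (?)", (name,))
    conn.execute(
        "UPDATE procedures SET description = ?, tool_chain = ? WHERE name = ?",
        (description, json.dumps(tool_chain), name),
    )


def reinforce_procedure(conn, name: str, success: bool) -> None:
    """Bump the success or failure counter for one procedure."""
    column = "success_count" if success else "fail_count"  # fixed names, safe to inline
    conn.execute(f"UPDATE procedures SET {column} = {column} + 1 WHERE name = ?", (name,))


def get_procedures(conn) -> list[tuple[str, float]]:
    """Return (name, success_rate) pairs, highest rate first."""
    return conn.execute(
        "SELECT name,"
        " CAST(success_count AS REAL) / MAX(success_count + fail_count, 1) AS rate"
        " FROM procedures ORDER BY rate DESC, name"
    ).fetchall()
```

The two-argument `MAX` is SQLite's scalar form, used here so an untried procedure gets a rate of 0.0 without dividing by zero.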

…y, OLLAMA_API_KEY resolution

Fallback chain (config.yaml):
- nemotron-3-super (120B MoE, 1.8s, top quality) as primary fallback
- devstral-2:123b (Mistral coding model) as secondary
- Both via Ollama cloud at ollama.com/v1

Auxiliary tasks routed to Ollama cloud:
- web_extract, session_search, skills_hub, approval, flush_memories → ministral-3:3b
- compression → ministral-3:8b (needs more capability for summarization)
- vision, mcp → unchanged (auto)

Provider resolution:
- OLLAMA_API_KEY added to custom provider API key cascade in auxiliary_client.py
- Fallback provider now passes base_url and api_key from config to resolve_provider_client

7425 tests pass.

LucidPaths · 2026-04-01T19:09:43Z

Retracting — needs cleanup before review. Dead code and unexercised subsystems identified. Will resubmit after trimming.

Story 9.1: access_count + last_hit_at reinforcement:
- Add reinforce_entry() to hermes_memory.py as new canonical sibling (Hard Invariant NousResearch#1 extension). Atomically bumps access_count, sets last_hit_at=now, pairs with raw-layer reinforce event. Body bytes unchanged (content-hash stable).
- Idempotency via raw-layer scan: same (entry_id, source) pair reported twice doesn't double-bump. Scans today+yesterday JSONL.
- Wire verify hook: preflight_verify_helper.py --match hit triggers reinforce_entry() for each cited ID. Fail-open on failure.
- read_entries() now returns access_count + last_hit_at fields.
- Ranker strength factor (1.0 + 0.1*log(1+access_count)) already wired from Epic 8 Story 8.3, cross-referenced here.

Story 9.2: Manifest-based dedup in trajectory recorder:
- Add build_manifest() to hermes_memory.py: lists up to 50 trajectory entries sorted by last_used_at desc.
- Add classify_trajectory_with_manifest(): sends manifest + failure pattern to LLM via hermes_llm.llm_call (Hard Invariant NousResearch#2). Pydantic-gated response (Hard Invariant NousResearch#11). Returns {action: reinforce, id} or {action: new, type, body}.
- Dedup prompt matches upstream PR NousResearch#4480 commit a443d1d shape.
- LLM calls reinforce_entry() for rematch (reuses 9.1's sibling).
- Telemetry: trajectory_outcome: reinforced-existing | new-entry.

Story 9.3: Skill-dream consumes hit-rate signal:
- Add build_hit_rate_report(): joins preflight telemetry with verify_citation events, groups by category, computes hit_rate. Gated on ≥ min_fires (default 20) per category.
- Add propose_category_weight_nudges(): applies hard thresholds:
  - hit_rate < 0.15 → nudge_down
  - hit_rate > 0.5 → nudge_up (cites top-3 by access_count)
  - hit_rate < 0.05 AND unrelated > 0.6 → domain blind spot

All are PROPOSALS only (ADR-2 / FR-14 / Hard Invariant NousResearch#4).

Tests: 36 new tests across 2 files (test_reinforce_entry.py: 11, test_manifest_dedup.py: 25). All pass.
Full suite: 152 pass, 6 pre-existing failures (unchanged), 0 regressions.
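
The hit-rate thresholds from Story 9.3 can be sketched as a small classifier. The function name `propose_nudge` and its count-based inputs are hypothetical, `unrelated` is assumed to mean the number of fires judged unrelated, and checking the blind-spot rule before nudge_down is an assumption, since the commit message does not say which rule wins when both apply.

```python
from typing import Optional


def propose_nudge(fires: int, hits: int, unrelated: int,
                  min_fires: int = 20) -> Optional[str]:
    """Return a proposal label for one category, or None if no rule applies."""
    if fires < min_fires:
        return None  # not enough data to judge this category
    hit_rate = hits / fires
    unrelated_rate = unrelated / fires
    if hit_rate < 0.05 and unrelated_rate > 0.6:
        return "domain_blind_spot"
    if hit_rate < 0.15:
        return "nudge_down"
    if hit_rate > 0.5:
        return "nudge_up"
    return None
```

As in the commit message, the return value is only a proposal; nothing in the sketch changes any weights.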

LucidPaths added 14 commits March 31, 2026 13:03

docs: Memory V2 architecture doc and PR description

5f5abf8

docs: rewrite PR description to match upstream template

948faaa

LucidPaths closed this Apr 1, 2026

LucidPaths mentioned this pull request Apr 1, 2026

feat: Memory V2 — SQLite-backed knowledge system with hybrid search, lifecycle, and auto-extraction #4488

Closed

12 tasks

OutThisLife mentioned this pull request Apr 23, 2026

fix(ui-tui): heal post-resize alt-screen drift #14640

Merged

LucidPaths wants to merge 14 commits into NousResearch:main from LucidPaths:feat/memory-system-v2
