feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction#4480

Closed
LucidPaths wants to merge 14 commits into NousResearch:main from LucidPaths:feat/memory-system-v2

@LucidPaths

Contributor

feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction

Summary

Replaces the flat-file memory store with a full SQLite-backed knowledge system featuring FTS5 full-text search, optional embedding-based semantic search, a relationship graph, automatic knowledge extraction from conversations, session memory injection, and a 5-gate consolidation scheduler with tiered lifecycle management. The migration from flat files is automatic and backward-compatible — existing memories are imported on first load with no user action required.


What Changed

New Files

| File | Lines | Purpose |
| --- | --- | --- |
| tools/memory_engine.py | 2,010 | Core SQLite engine: FTS5 search, embedding index, relationship graph, tiered lifecycle (active → archive → tombstone), importance scoring, budget enforcement, cursor-based pagination |
| agent/memory_extractor.py | 428 | Automatic knowledge extraction from conversation turns via auxiliary LLM: identifies facts, preferences, procedures, events; classifies and deduplicates before storage |
| agent/session_memory.py | 301 | Session-scoped memory: injects relevant memories into system prompt at session start, tracks what was surfaced to avoid repetition |
| agent/memory_consolidator.py | 266 | 5-gate consolidation scheduler: staleness check, duplicate merge, importance decay, archive promotion, and purge; runs on configurable intervals |
| agent/yake.py | 215 | Pure-Python YAKE keyword extraction (no external dependencies) for memory tagging and search enhancement |
| MEMORY_V2_PLAN.md | 422 | Implementation plan and architecture documentation |
| tests/tools/test_memory_engine.py | 448 | MemoryEngine unit tests: CRUD, FTS5 search, graph operations, lifecycle transitions, budget enforcement |
| tests/agent/test_memory_extractor.py | 347 | Extraction pipeline tests: LLM output parsing, deduplication, classification |
| tests/agent/test_memory_consolidator.py | 179 | Consolidation gate tests: scheduling, staleness, merge behavior |
| tests/agent/test_session_memory.py | 136 | Session memory injection and retrieval tests |
| tests/agent/test_yake.py | 98 | YAKE keyword extraction accuracy tests |

Modified Files

| File | Change |
| --- | --- |
| tools/memory_tool.py | Rewired to use MemoryEngine backend; added search action with hybrid (FTS5 + embedding) retrieval; added graph action for relationship queries; preserved all existing tool signatures for backward compat |
| hermes_state.py | Added memory engine initialization, session memory hooks, consolidator lifecycle management; state object now carries memory_engine and session_memory instances |
| run_agent.py | Wired extraction into post-turn hook, session memory into prompt assembly, consolidator into idle/shutdown; added embedding provider resolution for Hermes provider stack |
| tests/tools/test_memory_tool.py | Minor fixture updates for new engine backend |

Architecture

┌─────────────────────────────────────────────────┐
│                  run_agent.py                     │
│  (post-turn extraction, session inject, idle      │
│   consolidation)                                  │
├──────────┬──────────────┬───────────────┬────────┤
│ Extractor│ SessionMemory│ Consolidator  │  YAKE  │
├──────────┴──────────────┴───────────────┴────────┤
│              tools/memory_engine.py                │
│  ┌──────────┬────────────┬──────────┬──────────┐ │
│  │ SQLite   │ FTS5 Index │Embeddings│  Graph   │ │
│  │(memories)│(full-text) │(semantic)│(relations)│ │
│  └──────────┴────────────┴──────────┴──────────┘ │
│              Tiered Lifecycle                      │
│         active → archive → tombstone               │
└─────────────────────────────────────────────────┘

For full architecture details, see MEMORY_V2_PLAN.md.


Key Features

  • SQLite + FTS5 full-text search — sub-millisecond keyword search across all memories, no external services
  • Optional embedding-based semantic search — hybrid retrieval combining FTS5 BM25 scores with cosine similarity; adapts to Hermes provider stack (local Ollama, OpenAI, Anthropic)
  • Relationship graph — memories can link to each other with typed edges (related_to, contradicts, supersedes, part_of); queryable via memory graph tool action
  • Automatic knowledge extraction — auxiliary LLM extracts facts, preferences, procedures, and events from conversation turns; deduplicates against existing knowledge
  • Session memory injection — relevant memories auto-injected into system prompt at session start based on context similarity
  • 5-gate consolidation — background scheduler handles staleness detection, duplicate merging, importance decay, archive promotion, and tombstone purge
  • Tiered lifecycle — memories progress through active → archive → tombstone states with configurable retention budgets
  • Importance scoring — memories scored by access frequency, recency, and explicit user signals; budget enforcement evicts lowest-importance entries first
  • Pure-Python YAKE keywords — zero-dependency keyword extraction for automatic tagging
  • Cursor-based pagination — efficient traversal of large memory sets
  • Flat-file migration — existing V1 memories automatically imported on first engine initialization

Breaking Changes

None. The flat-file to SQLite migration is automatic and backward-compatible:

  • On first load, MemoryEngine detects existing flat-file memories and imports them into SQLite
  • All existing memory_tool.py actions (store, recall, list, delete) continue to work with the same signatures
  • New actions (search, graph) are additive
  • If the SQLite DB is deleted, flat files still serve as a fallback source for re-import

Testing

| Test File | Tests | Lines |
| --- | --- | --- |
| tests/tools/test_memory_engine.py | Core engine CRUD, FTS5, graph, lifecycle | 448 |
| tests/agent/test_memory_extractor.py | Extraction pipeline, parsing, dedup | 347 |
| tests/agent/test_memory_consolidator.py | 5-gate scheduling, merge, decay | 179 |
| tests/agent/test_session_memory.py | Session injection, retrieval | 136 |
| tests/agent/test_yake.py | Keyword extraction accuracy | 98 |
| tests/tools/test_memory_tool.py | Tool interface compat (updated) | ±10 |
| Total | | ~1,218 |

Provenance

This implementation draws from multiple sources:

  • HiveMind / MAGMA — The tiered lifecycle model (active → archive → tombstone), importance scoring with decay, and relationship graph patterns are inspired by memory.rs in the MAGMA framework
  • Claude Code — The autoDream consolidation pattern, automatic extraction from conversation turns, and session memory injection patterns originate from Claude Code's memory system
  • Original work — SQLite FTS5 hybrid search engine, YAKE keyword extraction integration, Hermes provider stack adaptation for embeddings, budget enforcement with cursor pagination, and the 5-gate consolidation scheduler are original to this implementation

Commit Log

3406abcf wire Ollama cloud: nemotron-3-super fallback, ministral-3:3b auxiliary, OLLAMA_API_KEY resolution
57085719 memory v2: close all gaps — local embeddings, LLM reranker, procedures, events
d9313997 memory v2: lifecycle hardening — importance scoring, budget enforcement, purge, cursor persistence
6d1f78fe fix(memory): activate by default + adapt embeddings to Hermes provider stack
27b53b25 feat(memory): embeddings pipeline, hybrid search, graph tools, full wiring
1bf81da0 feat(memory): complete memory system — YAKE, graph, chunking, classification, session memory, extraction hardening
3752e1dd feat(memory): add memory consolidation with 5-gate scheduling
a443d1d4 feat(memory): add automatic memory extraction via auxiliary LLM
1da2b76f feat(memory): wire MemoryEngine into MemoryStore + add search action + type taxonomy
0d8c7637 feat(memory): add MemoryEngine — SQLite-backed memory with FTS5 search and tiered lifecycle
d66132e2 docs: memory system v2 implementation plan
83b107df feat(prompt): extensible system prompt + credential redaction (local governance)

Stats

 MEMORY_V2_PLAN.md                       |  422 +++++++
 agent/memory_consolidator.py            |  266 ++++
 agent/memory_extractor.py               |  428 +++++++
 agent/session_memory.py                 |  301 +++++
 agent/yake.py                           |  215 ++++
 hermes_state.py                         |  404 +++++--
 run_agent.py                            | 1352 ++++++++++++++++++---
 tests/agent/test_memory_consolidator.py |  179 +++
 tests/agent/test_memory_extractor.py    |  347 ++++++
 tests/agent/test_session_memory.py      |  136 +++
 tests/agent/test_yake.py               |   98 ++
 tests/tools/test_memory_engine.py       |  448 +++++++
 tests/tools/test_memory_tool.py         |   10 +-
 tools/memory_engine.py                  | 2010 +++++++++++++++++++++++++++++++
 tools/memory_tool.py                    |  461 ++++---
 15 files changed, 6685 insertions(+), 392 deletions(-)

Net delta: +6,293 lines

LucidPaths added 14 commits March 31, 2026 13:03
…governance)

Port of local governance features to upstream v0.5.0 base:
- load_rules(): ~/.hermes/rules/*.md injection into system prompt
- load_samples(): ~/.hermes/samples/*.md behavioral examples
- load_working_state(): ~/.hermes/working_state.md cross-session context
- atexit snapshot of working_state.md to checkpoints/
- Credential redaction in gateway and cron delivery

Dropped: lifecycle hook wiring (superseded by upstream NousResearch#3542)

Cannibalized from:
- HiveMind (SQLite schema, hybrid search, tiers, YAKE, lifecycle, decay)
- Claude Code leaked (auto-extraction, autoDream, type taxonomy, relevance selection)

6-phase plan: schema -> tool upgrade -> search -> prompt integration -> extraction -> consolidation
~1,300 new lines + ~180 modified across 5 new files and 4 modified files
…h and tiered lifecycle

Phase 1 of memory system v2. New MemoryEngine class provides:
- SQLite storage with WAL mode for concurrent access
- FTS5 full-text search with BM25 ranking
- Hybrid search: BM25 * recency_decay * strength * tier_weight * type_boost
- 5 memory types: general, preference, correction, project, reference
- 4 memory tiers: active, archived, consolidated, superseded
- Power-law recency decay (from HiveMind)
- Logarithmic strength reinforcement (from HiveMind)
- Automatic stale archival (90 days + low strength)
- Supersession tracking (newer memory replaces older)
- Frozen snapshot pattern for prompt cache stability
- Migration from flat MEMORY.md/USER.md files
- Memory manifest for extraction dedup (from Claude Code)
- Exact-match duplicate detection
- Type-tagged prompt formatting

38 new tests, all passing. Full suite: 7360 passed.
Zero new dependencies (sqlite3 + uuid are stdlib).
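
The storage and ranking pattern listed above can be sketched with the standard library alone. This is an illustration, not the code in `tools/memory_engine.py`: the `memories_fts` table, its columns, and the helper names are assumptions, and the sketch presumes the local SQLite build ships with FTS5.

```python
import sqlite3
import uuid


def open_store(path=":memory:"):
    """Open a connection with a single FTS5 table (hypothetical schema)."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while one writer commits. An in-memory
    # database silently keeps its own journal mode, so this is harmless here.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts "
        "USING fts5(id UNINDEXED, type UNINDEXED, content)"
    )
    return conn


def add_memory(conn, content, mem_type="general"):
    """Store one memory and return its generated ID."""
    mem_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO memories_fts (id, type, content) VALUES (?, ?, ?)",
        (mem_id, mem_type, content),
    )
    conn.commit()
    return mem_id


def search(conn, query, limit=5):
    """Return (id, type, content, score) rows, best match first."""
    # bm25() yields smaller (more negative) values for better matches,
    # so ascending order puts the strongest hit on top.
    return conn.execute(
        "SELECT id, type, content, bm25(memories_fts) AS score "
        "FROM memories_fts WHERE memories_fts MATCH ? "
        "ORDER BY bm25(memories_fts) LIMIT ?",
        (query, limit),
    ).fetchall()
```

The commit multiplies the BM25 value by recency, strength, tier, and type factors; those are left out here to keep the FTS5 part visible.
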
…+ type taxonomy

Phase 2+3+4 of memory system v2:

MemoryStore compatibility layer:
- When engine='sqlite' (default): delegates to MemoryEngine (SQLite + FTS5)
- When engine='flat': falls back to legacy MEMORY.md flat files
- Automatic migration from flat files on first SQLite run
- MemoryEngine init failure gracefully falls back to flat mode
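
A rough sketch of the selection and fallback behavior described in this list. The two backend classes are hypothetical stand-ins for `MemoryEngine` and the legacy flat-file store, and `select_backend` is an invented name; only the `engine` config key and the fall-back-on-failure rule come from the commit message.

```python
import logging

logger = logging.getLogger(__name__)


class FlatFileBackend:
    """Stand-in for the legacy MEMORY.md store (hypothetical)."""
    name = "flat"


class SqliteBackend:
    """Stand-in for MemoryEngine (hypothetical); raises if it cannot start."""
    name = "sqlite"

    def __init__(self, db_path):
        if db_path is None:
            raise RuntimeError("no database path configured")
        self.db_path = db_path


def select_backend(config, db_path):
    """Pick a backend from config, degrading to flat files on any init error."""
    if config.get("engine", "sqlite") == "flat":
        return FlatFileBackend()
    try:
        return SqliteBackend(db_path)
    except Exception as exc:  # best-effort: memory must never break the agent
        logger.warning("SQLite memory engine unavailable, using flat files: %s", exc)
        return FlatFileBackend()
```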

Memory tool upgrades:
- New 'search' action with search_query parameter (FTS5 + hybrid scoring)
- New 'type' parameter: preference/correction/project/reference/general
- Search reinforces accessed memories (strength increases on hit)
- All existing actions (add/replace/remove) work identically

Config additions (memory section):
- engine: 'sqlite' or 'flat'
- auto_extract: false (Phase 5 placeholder)
- extract_interval: 3
- consolidation_enabled: false (Phase 6 placeholder)

run_agent.py:
- Creates MemoryEngine when engine='sqlite', passes to MemoryStore
- Forwards type and search_query params in tool dispatch

All 32 existing memory tests pass unchanged (backward compat verified).
Full suite: 7360 passed, 0 failures.

Phase 5 of memory system v2. Post-response hook that extracts durable
memories using a lightweight auxiliary LLM call (not a full agent fork).

Architecture (cannibalized from Claude Code extractMemories):
- Runs in background thread after every N responses (extract_interval)
- Pre-injects manifest of existing memories to prevent duplicates
- Structured JSON output: target, type, content per extracted memory
- Processes last 20 messages with 8KB budget
- Handles malformed JSON, code fences, empty/NONE responses gracefully
- Source tagged as 'extraction' for provenance tracking
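
The tolerant parsing step in this list (malformed JSON, code fences, empty or NONE responses) might look roughly like the sketch below. The function names are invented for illustration; the three required keys follow the "target, type, content" shape named above, and the real extractor may validate more or less than this.

```python
import json

FENCE = "`" * 3  # built indirectly so the marker never appears literally here
REQUIRED_KEYS = ("target", "type", "content")


def strip_code_fence(text):
    """Drop a surrounding Markdown code fence, if the model added one."""
    if not text.startswith(FENCE):
        return text
    lines = text.splitlines()[1:]  # first line is the opening fence plus any language tag
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines).strip()


def parse_extraction(raw):
    """Return a list of well-formed memory dicts; never raise on bad output."""
    text = (raw or "").strip()
    if not text or text.upper() == "NONE":
        return []
    try:
        data = json.loads(strip_code_fence(text))
    except json.JSONDecodeError:
        return []
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of an array
    if not isinstance(data, list):
        return []
    return [
        item for item in data
        if isinstance(item, dict)
        and all(isinstance(item.get(key), str) and item[key].strip() for key in REQUIRED_KEYS)
    ]
```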

Config: memory.auto_extract (default: false), memory.extract_interval (default: 3)
Requires: engine='sqlite' + auxiliary_client available

13 new tests covering extraction logic, dedup, error handling.
Full suite: 7373 passed, 0 failures.

Phase 6 of memory system v2. Periodic memory maintenance system
cannibalized from Claude Code (autoDream) and HiveMind.

5-gate scheduling (cheapest first, from Claude Code):
1. Feature enabled? (config check)
2. Time since last consolidation >= threshold (default 24h)
3. Session count since last run >= threshold (default 5)
4. Concurrent lock (via metadata)
5. Auxiliary LLM available
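
The cheapest-first ordering of the five gates can be sketched as a short-circuiting check. The `ConsolidationState` fields and gate names below are assumptions made for the example; the 24 h and 5-session defaults are the ones stated in this commit (a later commit lowers them to 12 h and 3).

```python
import time
from dataclasses import dataclass


@dataclass
class ConsolidationState:
    enabled: bool
    last_run: float           # Unix timestamp of the previous consolidation
    sessions_since_run: int
    lock_held: bool
    llm_available: bool


def first_closed_gate(state, now=None, min_hours=24.0, min_sessions=5):
    """Return the name of the first gate that blocks a run, or None if all pass."""
    now = time.time() if now is None else now
    # Ordered from cheapest to most expensive check; evaluation stops at the
    # first failure, so later gates are never touched unnecessarily.
    gates = (
        ("disabled", lambda: state.enabled),
        ("too_soon", lambda: now - state.last_run >= min_hours * 3600),
        ("too_few_sessions", lambda: state.sessions_since_run >= min_sessions),
        ("locked", lambda: not state.lock_held),
        ("no_auxiliary_llm", lambda: state.llm_available),
    )
    for name, passes in gates:
        if not passes():
            return name
    return None
```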

Consolidation actions (LLM-directed):
- merge: combine duplicate/overlapping memories, supersede originals
- update: fix stale content (relative dates, outdated facts)
- archive: mark low-value memories for archival
- Automatic stale archival (90 days + low strength, from HiveMind)

Metadata tracking: last_consolidation timestamp, session counter.
Designed to run via Hermes cron: hermes cron create --schedule '0 4 * * *'

11 new tests covering gates, merge/archive actions, metadata updates.
Full suite: 7384 passed, 0 failures.
…ication, session memory, extraction hardening

The full memory v2.5 implementation. Ported from HiveMind (Rust) and Claude Code (TypeScript):

FROM HIVEMIND:
- YAKE keyword extraction (agent/yake.py): 5-feature scoring, n-gram candidates,
  dedup, full stopword list. Direct Rust->Python transliteration.
- Cosine similarity: pure Python, handles edge cases (empty/mismatched/zero-norm)
- Chunking: 1600 char max, 320 char overlap, line-boundary aware
- Topic auto-classification: keyword-based (tech/project/personal word lists)
- Graph tables: edges (typed, weighted), entities (name, type, metadata)
- Auto-edge creation: keyword overlap -> related_to edges
- Graph-augmented search: 1-hop BFS expansion with 0.5x weight boost
- Content-hash embedding cache stub (ready for fastembed provider)
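
Of the items above, the chunking rule is the easiest to show in isolation. The sketch below is one possible reading of "1600 char max, 320 char overlap, line-boundary aware", not a transliteration of the HiveMind code: it ends each chunk just after a newline when one is available and starts the next chunk a fixed 320 characters earlier. The function name and that boundary policy are assumptions.

```python
def chunk_text(text, max_chars=1600, overlap=320):
    """Split text into overlapping chunks that prefer to end on a line break."""
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    chunks = []
    start = 0
    total = len(text)
    while start < total:
        end = min(start + max_chars, total)
        if end < total:
            # Look for the last newline in the window, but only past the overlap
            # region so that the next chunk always starts further along.
            cut = text.rfind("\n", start + overlap + 1, end)
            if cut != -1:
                end = cut + 1
        chunks.append(text[start:end])
        if end == total:
            break
        start = end - overlap
    return chunks
```

With this policy, dropping the first `overlap` characters of every chunk after the first and concatenating the rest reproduces the input exactly.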

FROM CLAUDE CODE:
- Extraction hardening: cursor tracking (only new messages), mutual exclusion
  (skip if agent wrote this turn), trailing run stash (coalesced execution)
- Session memory (agent/session_memory.py): 9-section structured notes,
  token+tool_call thresholds, LLM-generated summaries
- Type taxonomy depth: per-type when_to_save guidance, WHAT_NOT_TO_SAVE block
- Staleness caveats: memories >7d get '(Xd old — verify)' suffix
- Trusting Recall: verify files exist before recommending from memory

Schema v2: +chunks, +embeddings, +edges, +entities tables with FTS5 triggers.
Auto-classification on add(). Auto-keyword extraction on add(). Auto-chunking
for content >500 chars. Graph traversal in search results.

+1,853 lines across 10 files. 131 memory-specific tests, 7420 total suite.
…iring

Final implementation phase:

EMBEDDINGS (from HiveMind):
- Real embedding generation via litellm (provider-agnostic)
- Graceful degradation: no API key -> returns [], falls back to BM25-only
- Content-hash caching in embeddings table (skip API for known content)
- Background embedding generation (fire-and-forget thread)
- numpy-accelerated cosine similarity with pure Python fallback

HYBRID SEARCH (from HiveMind formula):
- (0.7 * cosine + 0.3 * normalized_bm25) * recency * strength * tier * type
- Falls back to BM25-only when no embeddings available
- search_by_embedding() for direct vector search
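
To make the formula concrete, here is a small sketch of how the factors could combine. Only the 0.7/0.3 weights and the overall product come from the commit message. The min-max BM25 normalization, the decay exponent, the tier and type weights, and the strength expression (borrowed from a later commit in this thread that references this PR) are all assumptions chosen for illustration.

```python
import math

TIER_WEIGHT = {"active": 1.0, "archived": 0.5}            # illustrative values
TYPE_BOOST = {"correction": 1.3, "preference": 1.2, "general": 1.0}  # illustrative values


def normalize_bm25(raw_scores):
    """Map FTS5 bm25() values (more negative is better) onto [0, 1]."""
    if not raw_scores:
        return []
    flipped = [-score for score in raw_scores]
    low, high = min(flipped), max(flipped)
    if high == low:
        return [1.0] * len(flipped)
    return [(score - low) / (high - low) for score in flipped]


def recency_decay(age_days, exponent=0.5):
    """Power-law decay: 1.0 for a brand-new memory, shrinking slowly with age."""
    return (1.0 + max(age_days, 0.0)) ** -exponent


def strength_factor(access_count):
    """Logarithmic reinforcement: each extra hit adds less than the previous one."""
    return 1.0 + 0.1 * math.log1p(access_count)


def hybrid_score(cosine, bm25_norm, age_days, access_count,
                 tier="active", mem_type="general"):
    relevance = 0.7 * cosine + 0.3 * bm25_norm
    return (relevance
            * recency_decay(age_days)
            * strength_factor(access_count)
            * TIER_WEIGHT.get(tier, 1.0)
            * TYPE_BOOST.get(mem_type, 1.0))
```

When no embedding is available, a caller following the commit's fallback rule would rank on the BM25 term alone instead of passing a cosine of zero.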

NEAR-DUPLICATE UPGRADE:
- Cosine > 0.92 rejection (HiveMind threshold) when embeddings available
- Exact-match fallback when no embeddings
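
A minimal sketch of the duplicate check described here, using a pure-Python cosine that returns 0.0 for empty, mismatched, or zero-norm vectors. The 0.92 threshold is from the commit; the function names and the `(content, embedding)` tuple layout are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of two vectors; 0.0 for empty, mismatched, or zero-norm input."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_near_duplicate(content, embedding, existing, threshold=0.92):
    """existing is an iterable of (content, embedding_or_None) pairs."""
    for other_content, other_embedding in existing:
        if embedding and other_embedding:
            if cosine_similarity(embedding, other_embedding) > threshold:
                return True
        elif content.strip() == other_content.strip():
            # No vectors on one side: fall back to exact text comparison.
            return True
    return False
```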

WIRING (run_agent.py):
- Session memory update on each turn (token + tool_call thresholds)
- Session memory injected into system prompt
- Consolidation session counter incremented at session end
- All wiring is best-effort (try/except, never breaks agent)

GRAPH TOOLS (memory_tool.py):
- graph_query action: get_related() and get_edges() with short-ID resolution
- entity_track action: track_entity() for entity CRUD
- MEMORY_SCHEMA updated with new actions

+328 lines. 7425 total tests passing.
…r stack

Memory should be ACTIVE, not dormant behind flags. It's MY memory system.

- auto_extract: True by default (was False — why implement it and not use it?)
- consolidation_enabled: True by default (same)
- Embedding provider auto-detection from available API keys:
  OPENAI_API_KEY -> text-embedding-3-small
  OPENROUTER_API_KEY -> openrouter/openai/text-embedding-3-small
  VOYAGE_API_KEY -> voyage/voyage-3-lite
  No key -> graceful BM25 fallback (still works, just no vectors)
- Config: memory.embedding_model for explicit override
- All generate_embedding() calls now pass config for model resolution
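
The key-based detection order above can be sketched as a simple lookup. The model strings and environment variable names are the ones listed in this commit; `resolve_embedding_model` is a hypothetical helper, and the `config` argument is assumed to be the `memory` section of the config.

```python
import os

# Checked in order; the first key that is set wins.
KEY_TO_MODEL = (
    ("OPENAI_API_KEY", "text-embedding-3-small"),
    ("OPENROUTER_API_KEY", "openrouter/openai/text-embedding-3-small"),
    ("VOYAGE_API_KEY", "voyage/voyage-3-lite"),
)


def resolve_embedding_model(config=None, env=None):
    """Return a model name, or None when search should stay BM25-only."""
    config = config or {}
    env = os.environ if env is None else env
    explicit = config.get("embedding_model")
    if explicit:
        return explicit  # explicit override beats auto-detection
    for key, model in KEY_TO_MODEL:
        if env.get(key):
            return model
    return None
```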

7425 tests passing.
…nt, purge, cursor persistence

Extraction:
- Importance scoring (1-10) in extraction prompt, filter threshold >= 5
- Corrections/preferences get +1 importance bonus
- Hard cap: max 5 entries per extraction run
- Explicit 'do not extract' rules for noise (conversational artifacts, task-specific)
- Extractor cursor persisted to SQLite via memory_meta (survives restart)
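
The filtering rules in this list (threshold of 5, +1 for corrections and preferences, at most 5 entries per run) might be applied as below. The dict shape, the cap of the bonus at 10, and the choice to keep the highest-scoring entries when more than five qualify are assumptions; the commit message does not say how ties or overflow are resolved.

```python
BONUS_TYPES = {"correction", "preference"}


def select_candidates(candidates, threshold=5, max_entries=5):
    """Keep extracted memories that clear the importance bar, best first."""
    scored = []
    for item in candidates:
        importance = int(item.get("importance", 0))
        if item.get("type") in BONUS_TYPES:
            importance = min(importance + 1, 10)  # bonus, kept on the 1-10 scale
        if importance >= threshold:
            scored.append((importance, item))
    # sorted() is stable, so entries with equal importance keep model order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:max_entries]]
```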

Budget enforcement:
- MAX_ACTIVE_MEMORY=50, MAX_ACTIVE_USER=25 hard caps
- enforce_budget() archives weakest (lowest strength, oldest) when over cap
- Corrections/preferences protected from budget archival (sorted last)
- Called after every engine.add() and at session end
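
A sketch of the eviction order this list describes: unprotected types go first, then lowest strength, then oldest. The `Memory` dataclass and its fields are invented for the example, and the real `enforce_budget()` works against SQLite rows rather than in-memory objects.

```python
from dataclasses import dataclass

PROTECTED_TYPES = {"correction", "preference"}


@dataclass
class Memory:
    id: str
    type: str
    strength: float
    created_at: float  # Unix timestamp; smaller means older
    tier: str = "active"


def enforce_budget(memories, max_active=50):
    """Archive the weakest active memories until the cap is met; return their IDs."""
    active = [m for m in memories if m.tier == "active"]
    excess = len(active) - max_active
    if excess <= 0:
        return []
    # False sorts before True, so protected types land at the end of the queue.
    order = sorted(
        active,
        key=lambda m: (m.type in PROTECTED_TYPES, m.strength, m.created_at),
    )
    archived = order[:excess]
    for memory in archived:
        memory.tier = "archived"
    return [memory.id for memory in archived]
```

Note that protected entries are only deferred, not exempt: if the cap cannot be met from general entries alone, this version will still archive them.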

DB hygiene:
- purge_dead() hard-deletes superseded/archived entries >30 days old
- Cleans up orphaned chunks, embeddings, edges
- Runs at every session end

Lifecycle at session end (run_agent.py):
- archive_stale() — independent of consolidation now
- enforce_budget() — prevent runaway growth
- purge_dead() — prevent monotonic DB growth
- increment_session_count() — consolidation gating

Consolidation tuning:
- Session gate: 5 → 3 (matches intermittent usage pattern)
- Time gate: 24h → 12h
- Prompt includes budget caps and protection priority
- Prompt instructs: protect corrections/preferences, archive general first

Bug fixes:
- FTS5 query crash on apostrophes/quotes (regex tokenizer)
- DEDUP_THRESHOLD 5.0 → 8.0 (false supersession prevention)
- auto_extract default: False → True
- consolidation_enabled default: False → True
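
The apostrophe and quote crash comes from passing raw user text to `MATCH`, where those characters are query syntax. One way to avoid it, sketched below, is to pull out word tokens with a regex and quote each one. This is an assumed approach consistent with the "regex tokenizer" note, not the actual fix; the table name `memories_fts` and the choice to join tokens with OR are illustrative.

```python
import re
import sqlite3

_TOKEN = re.compile(r"\w+")


def to_fts5_query(user_text):
    """Turn free text into a MATCH expression made only of quoted tokens."""
    tokens = _TOKEN.findall(user_text)
    # \w+ never captures quote characters, so wrapping each token in double
    # quotes cannot produce an unbalanced FTS5 string.
    return " OR ".join(f'"{token}"' for token in tokens)


def safe_search(conn, user_text, limit=10):
    """Search a hypothetical memories_fts table without raising on odd input."""
    query = to_fts5_query(user_text)
    if not query:
        return []  # an empty MATCH expression is itself a syntax error
    return conn.execute(
        "SELECT content FROM memories_fts WHERE memories_fts MATCH ? "
        "ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
```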

7425 tests pass, 9/9 custom verification tests pass.
…s, events

Local embeddings (fastembed):
- BAAI/bge-small-en-v1.5 via ONNX (384 dims, ~50ms/query)
- First in cascade: local → OpenAI → OpenRouter → Voyage
- Module-level model cache (_LOCAL_EMBEDDER)
- Zero API keys needed — near-duplicate detection, hybrid search,
  embedding dedup all now ACTIVE by default
- _get_or_create_embedding model tracking fixed

LLM reranker (Claude Code port):
- rerank_with_llm() method on MemoryEngine
- Uses auxiliary_client for cheap model reranking
- search() accepts optional auxiliary_client parameter
- Falls back to score-based ranking when no client

Procedures table (HiveMind port):
- learn_procedure(name, description, tool_chain)
- reinforce_procedure(name, success) — track success/fail counts
- get_procedures() — ordered by success rate
- find_procedure(name) — LIKE match

Events table (HiveMind MAGMA port):
- log_event(type, summary, details, session_id)
- get_recent_events(type, limit, session_id)
- purge_old_events(max_age_days=90) — wired into purge_dead()
- Event types: tool_success/failure, memory_write, session_start/end,
  consolidation, error, milestone

Test update:
- test_generate_embedding_graceful_failure → test_generate_embedding_works_locally
  (fastembed means embeddings work without API keys now)

7425 tests pass.
…y, OLLAMA_API_KEY resolution

Fallback chain (config.yaml):
- nemotron-3-super (120B MoE, 1.8s, top quality) as primary fallback
- devstral-2:123b (Mistral coding model) as secondary
- Both via Ollama cloud at ollama.com/v1

Auxiliary tasks routed to Ollama cloud:
- web_extract, session_search, skills_hub, approval, flush_memories → ministral-3:3b
- compression → ministral-3:8b (needs more capability for summarization)
- vision, mcp → unchanged (auto)

Provider resolution:
- OLLAMA_API_KEY added to custom provider API key cascade in auxiliary_client.py
- Fallback provider now passes base_url and api_key from config to resolve_provider_client

7425 tests pass.
@LucidPaths

Contributor Author

Retracting — needs cleanup before review. Dead code and unexercised subsystems identified. Will resubmit after trimming.

@LucidPaths LucidPaths closed this Apr 1, 2026
marozau pushed a commit to marozau/hermes-agent that referenced this pull request Jun 8, 2026
Story 9.1 — access_count + last_hit_at reinforcement:
- Add reinforce_entry() to hermes_memory.py as new canonical sibling
  (Hard Invariant #1 extension). Atomically bumps access_count,
  sets last_hit_at=now, pairs with raw-layer reinforce event.
  Body bytes unchanged (content-hash stable).
- Idempotency via raw-layer scan: same (entry_id, source) pair
  reported twice doesn't double-bump. Scans today+yesterday JSONL.
- Wire verify hook: preflight_verify_helper.py --match hit triggers
  reinforce_entry() for each cited ID. Fail-open on failure.
- read_entries() now returns access_count + last_hit_at fields.
- Ranker strength factor (1.0 + 0.1*log(1+access_count)) already
  wired from Epic 8 Story 8.3 — cross-referenced here.

Story 9.2 — Manifest-based dedup in trajectory recorder:
- Add build_manifest() to hermes_memory.py: lists up to 50
  trajectory entries sorted by last_used_at desc.
- Add classify_trajectory_with_manifest(): sends manifest + failure
  pattern to LLM via hermes_llm.llm_call (Hard Invariant #2).
  Pydantic-gated response (Hard Invariant #11).
  Returns {action: reinforce, id} or {action: new, type, body}.
- Dedup prompt matches upstream PR NousResearch#4480 commit a443d1d shape.
- LLM calls reinforce_entry() for rematch (reuses 9.1's sibling).
- Telemetry: trajectory_outcome: reinforced-existing | new-entry.

Story 9.3 — Skill-dream consumes hit-rate signal:
- Add build_hit_rate_report(): joins preflight telemetry with
  verify_citation events, groups by category, computes hit_rate.
  Gated on ≥ min_fires (default 20) per category.
- Add propose_category_weight_nudges(): applies hard thresholds:
  - hit_rate < 0.15 → nudge_down
  - hit_rate > 0.5 → nudge_up (cites top-3 by access_count)
  - hit_rate < 0.05 AND unrelated > 0.6 → domain blind spot
  All are PROPOSALS only (ADR-2 / FR-14 / Hard Invariant #4).
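
The thresholds listed for Story 9.3 can be read as a small decision function. The sketch below uses the stated cut-offs and the `min_fires` gate; the function name, the return labels, and the choice to test the blind-spot condition before the plain nudge-down (since it is the stricter case) are assumptions. As in the commit, the result is a proposal only and nothing is applied.

```python
def propose_nudge(hit_rate, unrelated_rate, fires, min_fires=20):
    """Return a proposal label for one category, or None if no change is suggested."""
    if fires < min_fires:
        return None  # too little data to judge this category
    if hit_rate < 0.05 and unrelated_rate > 0.6:
        return "domain_blind_spot"
    if hit_rate < 0.15:
        return "nudge_down"
    if hit_rate > 0.5:
        return "nudge_up"
    return None
```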

Tests: 36 new tests across 2 files (test_reinforce_entry.py: 11,
test_manifest_dedup.py: 25). All pass. Full suite: 152 pass,
6 pre-existing failures (unchanged), 0 regressions.