feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction#4480
Closed
LucidPaths wants to merge 14 commits into
added 14 commits
March 31, 2026 13:03
…governance)

Port of local governance features to upstream v0.5.0 base:
- load_rules(): ~/.hermes/rules/*.md injection into system prompt
- load_samples(): ~/.hermes/samples/*.md behavioral examples
- load_working_state(): ~/.hermes/working_state.md cross-session context
- atexit snapshot of working_state.md to checkpoints/
- Credential redaction in gateway and cron delivery

Dropped: lifecycle hook wiring (superseded by upstream NousResearch#3542)
Cannibalized from:
- HiveMind (SQLite schema, hybrid search, tiers, YAKE, lifecycle, decay)
- Claude Code leaked (auto-extraction, autoDream, type taxonomy, relevance selection)

6-phase plan: schema -> tool upgrade -> search -> prompt integration -> extraction -> consolidation

~1,300 new lines + ~180 modified across 5 new files and 4 modified files
…h and tiered lifecycle

Phase 1 of memory system v2. New MemoryEngine class provides:
- SQLite storage with WAL mode for concurrent access
- FTS5 full-text search with BM25 ranking
- Hybrid search: BM25 * recency_decay * strength * tier_weight * type_boost
- 5 memory types: general, preference, correction, project, reference
- 4 memory tiers: active, archived, consolidated, superseded
- Power-law recency decay (from HiveMind)
- Logarithmic strength reinforcement (from HiveMind)
- Automatic stale archival (90 days + low strength)
- Supersession tracking (newer memory replaces older)
- Frozen snapshot pattern for prompt cache stability
- Migration from flat MEMORY.md/USER.md files
- Memory manifest for extraction dedup (from Claude Code)
- Exact-match duplicate detection
- Type-tagged prompt formatting

38 new tests, all passing. Full suite: 7360 passed. Zero new dependencies (sqlite3 + uuid are stdlib).
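The multiplicative score named in this commit (BM25 * recency_decay * strength * tier_weight * type_boost) can be sketched roughly as below. The commit lists the factors but not their formulas or values, so the decay exponent, the weight tables, and the function names are all assumptions made for illustration. The sketch also assumes a BM25 value where higher means more relevant; SQLite's own bm25() returns lower-is-better values, so a real implementation would need to flip or rescale it.

```python
import math

# Hypothetical weights: the commit names these factors but not their values.
TIER_WEIGHT = {"active": 1.0, "consolidated": 0.9, "archived": 0.3, "superseded": 0.1}
TYPE_BOOST = {"correction": 1.3, "preference": 1.2, "project": 1.0,
              "reference": 1.0, "general": 0.9}


def recency_decay(age_days: float, exponent: float = 0.5) -> float:
    """Power-law decay: 1.0 for a brand-new memory, falling slowly with age."""
    return (1.0 + max(age_days, 0.0)) ** -exponent


def strength(access_count: int) -> float:
    """Logarithmic reinforcement: each extra hit adds less than the one before."""
    return 1.0 + math.log1p(max(access_count, 0))


def hybrid_score(bm25: float, age_days: float, access_count: int,
                 tier: str, mem_type: str) -> float:
    """Multiply the text relevance by every lifecycle factor."""
    return (bm25
            * recency_decay(age_days)
            * strength(access_count)
            * TIER_WEIGHT.get(tier, 1.0)
            * TYPE_BOOST.get(mem_type, 1.0))
```

Because the factors multiply, a weight near zero in any one of them (for example a superseded tier) suppresses a memory regardless of how well its text matches.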
…+ type taxonomy

Phase 2+3+4 of memory system v2:

MemoryStore compatibility layer:
- When engine='sqlite' (default): delegates to MemoryEngine (SQLite + FTS5)
- When engine='flat': falls back to legacy MEMORY.md flat files
- Automatic migration from flat files on first SQLite run
- MemoryEngine init failure gracefully falls back to flat mode

Memory tool upgrades:
- New 'search' action with search_query parameter (FTS5 + hybrid scoring)
- New 'type' parameter: preference/correction/project/reference/general
- Search reinforces accessed memories (strength increases on hit)
- All existing actions (add/replace/remove) work identically

Config additions (memory section):
- engine: 'sqlite' or 'flat'
- auto_extract: false (Phase 5 placeholder)
- extract_interval: 3
- consolidation_enabled: false (Phase 6 placeholder)

run_agent.py:
- Creates MemoryEngine when engine='sqlite', passes to MemoryStore
- Forwards type and search_query params in tool dispatch

All 32 existing memory tests pass unchanged (backward compat verified). Full suite: 7360 passed, 0 failures.
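For readers unfamiliar with FTS5, the kind of query a 'search' action could issue is sketched below. The table name, the single-column schema, and the helper names are hypothetical; the PR's actual schema is not shown on this page. Quoting each token is one possible way to keep user punctuation out of the MATCH syntax.

```python
import re
import sqlite3


def open_memory_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical one-column FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE memories_fts USING fts5(content)")
    return db


def add(db: sqlite3.Connection, content: str) -> int:
    """Index one memory and return its rowid."""
    cur = db.execute("INSERT INTO memories_fts (content) VALUES (?)", (content,))
    return cur.lastrowid


def search(db: sqlite3.Connection, query: str, limit: int = 5):
    """Return (content, bm25) rows, best match first."""
    # Keep only word characters and quote every token, so apostrophes or
    # quotes in the user's query never reach the FTS5 query parser.
    tokens = re.findall(r"\w+", query)
    if not tokens:
        return []
    match_expr = " OR ".join(f'"{t}"' for t in tokens)
    # bm25() is lower-is-better in SQLite, so ascending order ranks best first.
    return db.execute(
        "SELECT content, bm25(memories_fts) FROM memories_fts "
        "WHERE memories_fts MATCH ? ORDER BY bm25(memories_fts) LIMIT ?",
        (match_expr, limit),
    ).fetchall()
```

A call such as search(db, "user's timezone") then matches on the tokens user, s, and timezone without raising a syntax error.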
Phase 5 of memory system v2. Post-response hook that extracts durable memories using a lightweight auxiliary LLM call (not a full agent fork).

Architecture (cannibalized from Claude Code extractMemories):
- Runs in background thread after every N responses (extract_interval)
- Pre-injects manifest of existing memories to prevent duplicates
- Structured JSON output: target, type, content per extracted memory
- Processes last 20 messages with 8KB budget
- Handles malformed JSON, code fences, empty/NONE responses gracefully
- Source tagged as 'extraction' for provenance tracking

Config: memory.auto_extract (default: false), memory.extract_interval (default: 3)
Requires: engine='sqlite' + auxiliary_client available

13 new tests covering extraction logic, dedup, error handling. Full suite: 7373 passed, 0 failures.
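The tolerant parsing described above (malformed JSON, code fences, empty or NONE responses) might look something like this sketch. The function name, the default target of "memory", and the exact validation rules are assumptions; only the three output fields and the 'extraction' source tag come from the commit message.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
VALID_TYPES = {"general", "preference", "correction", "project", "reference"}


def _strip_fence(text: str) -> str:
    """Drop a leading fence line (with optional language tag) and a trailing one."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
    return "\n".join(lines).strip()


def parse_extraction(raw: str) -> list:
    """Turn an auxiliary-LLM reply into a list of memory dicts, never raising."""
    text = _strip_fence(raw or "")
    if not text or text.upper() == "NONE":
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # malformed output is treated as "nothing to extract"
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return []
    memories = []
    for item in data:
        if not isinstance(item, dict):
            continue
        content = item.get("content")
        if not isinstance(content, str) or not content.strip():
            continue
        mem_type = item.get("type")
        if not isinstance(mem_type, str) or mem_type not in VALID_TYPES:
            mem_type = "general"
        memories.append({
            "target": item.get("target", "memory"),  # assumed default target
            "type": mem_type,
            "content": content.strip(),
            "source": "extraction",
        })
    return memories
```

Returning an empty list on every failure path keeps the background thread from ever interrupting the main agent loop.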
Phase 6 of memory system v2. Periodic memory maintenance system cannibalized from Claude Code (autoDream) and HiveMind.

5-gate scheduling (cheapest first, from Claude Code):
1. Feature enabled? (config check)
2. Time since last consolidation >= threshold (default 24h)
3. Session count since last run >= threshold (default 5)
4. Concurrent lock (via metadata)
5. Auxiliary LLM available

Consolidation actions (LLM-directed):
- merge: combine duplicate/overlapping memories, supersede originals
- update: fix stale content (relative dates, outdated facts)
- archive: mark low-value memories for archival
- Automatic stale archival (90 days + low strength, from HiveMind)

Metadata tracking: last_consolidation timestamp, session counter.
Designed to run via Hermes cron: hermes cron create --schedule '0 4 * * *'

11 new tests covering gates, merge/archive actions, metadata updates. Full suite: 7384 passed, 0 failures.
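The cheapest-first gating listed above can be illustrated with a short sketch. The dataclass, the gate labels, and the function name are hypothetical; the gate order and the default thresholds (24 hours, 5 sessions) are taken from the commit message.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ConsolidationState:
    enabled: bool
    last_run_ts: float          # Unix timestamp of the last consolidation
    sessions_since_last: int
    lock_held: bool


def first_blocking_gate(state: ConsolidationState,
                        llm_available: Callable[[], bool],
                        now: Optional[float] = None,
                        min_hours: float = 24.0,
                        min_sessions: int = 5) -> Optional[str]:
    """Return the name of the first gate that fails, or None if all pass."""
    now = time.time() if now is None else now
    # Ordered cheapest first; the (possibly slow) LLM probe runs only when
    # every cheaper check has already passed.
    gates = [
        ("disabled", lambda: state.enabled),
        ("too_soon", lambda: now - state.last_run_ts >= min_hours * 3600),
        ("too_few_sessions", lambda: state.sessions_since_last >= min_sessions),
        ("locked", lambda: not state.lock_held),
        ("no_llm", llm_available),
    ]
    for name, passes in gates:
        if not passes():
            return name
    return None
```

Each gate is wrapped in a callable so that evaluation stops at the first failure, which is the point of ordering them by cost.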
…ication, session memory, extraction hardening

The full memory v2.5 implementation. Ported from HiveMind (Rust) and Claude Code (TypeScript):

FROM HIVEMIND:
- YAKE keyword extraction (agent/yake.py): 5-feature scoring, n-gram candidates, dedup, full stopword list. Direct Rust->Python transliteration.
- Cosine similarity: pure Python, handles edge cases (empty/mismatched/zero-norm)
- Chunking: 1600 char max, 320 char overlap, line-boundary aware
- Topic auto-classification: keyword-based (tech/project/personal word lists)
- Graph tables: edges (typed, weighted), entities (name, type, metadata)
- Auto-edge creation: keyword overlap -> related_to edges
- Graph-augmented search: 1-hop BFS expansion with 0.5x weight boost
- Content-hash embedding cache stub (ready for fastembed provider)

FROM CLAUDE CODE:
- Extraction hardening: cursor tracking (only new messages), mutual exclusion (skip if agent wrote this turn), trailing run stash (coalesced execution)
- Session memory (agent/session_memory.py): 9-section structured notes, token+tool_call thresholds, LLM-generated summaries
- Type taxonomy depth: per-type when_to_save guidance, WHAT_NOT_TO_SAVE block
- Staleness caveats: memories >7d get '(Xd old — verify)' suffix
- Trusting Recall: verify files exist before recommending from memory

Schema v2: +chunks, +embeddings, +edges, +entities tables with FTS5 triggers. Auto-classification on add(). Auto-keyword extraction on add(). Auto-chunking for content >500 chars. Graph traversal in search results.

+1,853 lines across 10 files. 131 memory-specific tests, 7420 total suite.
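As an illustration of the chunking parameters quoted above (1600 character maximum, 320 character overlap, line-boundary aware), one possible approach is sketched below. This is not the PR's code: the function name and the exact rule for choosing a cut point are assumptions.

```python
def chunk_text(text: str, max_chars: int = 1600, overlap: int = 320) -> list:
    """Split text into overlapping chunks, cutting at a newline when one is available."""
    if overlap < 0 or max_chars <= overlap:
        raise ValueError("max_chars must be larger than a non-negative overlap")
    chunks = []
    start = 0
    n = len(text)
    while start < n:
        end = min(start + max_chars, n)
        if end < n:
            # Prefer the last newline in the window, but only past the overlap
            # region, so every chunk still contributes some new text.
            newline = text.rfind("\n", start + overlap + 1, end)
            if newline != -1:
                end = newline + 1
        chunks.append(text[start:end])
        if end >= n:
            break
        # The next chunk re-reads the final `overlap` characters of this one.
        start = end - overlap
    return chunks
```

With this rule the original text can be rebuilt by concatenating the first chunk with every later chunk minus its first 320 characters, which is a convenient property to assert in tests.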
…iring

Final implementation phase:

EMBEDDINGS (from HiveMind):
- Real embedding generation via litellm (provider-agnostic)
- Graceful degradation: no API key -> returns [], falls back to BM25-only
- Content-hash caching in embeddings table (skip API for known content)
- Background embedding generation (fire-and-forget thread)
- numpy-accelerated cosine similarity with pure Python fallback

HYBRID SEARCH (from HiveMind formula):
- (0.7 * cosine + 0.3 * normalized_bm25) * recency * strength * tier * type
- Falls back to BM25-only when no embeddings available
- search_by_embedding() for direct vector search

NEAR-DUPLICATE UPGRADE:
- Cosine > 0.92 rejection (HiveMind threshold) when embeddings available
- Exact-match fallback when no embeddings

WIRING (run_agent.py):
- Session memory update on each turn (token + tool_call thresholds)
- Session memory injected into system prompt
- Consolidation session counter incremented at session end
- All wiring is best-effort (try/except, never breaks agent)

GRAPH TOOLS (memory_tool.py):
- graph_query action: get_related() and get_edges() with short-ID resolution
- entity_track action: track_entity() for entity CRUD
- MEMORY_SCHEMA updated with new actions

+328 lines. 7425 total tests passing.
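The relevance blend and the near-duplicate rule from this commit can be sketched in pure Python as follows. The 0.7/0.3 weights and the 0.92 threshold come from the commit message; the function names and the divide-by-maximum BM25 normalization are assumptions, since the page does not say how BM25 is normalized.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity that returns 0.0 for empty, mismatched, or zero-norm vectors."""
    if not a or not b or len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_bm25(scores) -> list:
    """Scale non-negative BM25 scores into [0, 1] by dividing by the best score (assumed scheme)."""
    top = max(scores, default=0.0)
    return [s / top if top > 0 else 0.0 for s in scores]


def relevance(query_vec, doc_vec, bm25_norm: float,
              w_vec: float = 0.7, w_text: float = 0.3) -> float:
    """Blend vector and text relevance; fall back to BM25 alone without embeddings."""
    if not query_vec or not doc_vec:
        return bm25_norm
    return w_vec * cosine(query_vec, doc_vec) + w_text * bm25_norm


def is_near_duplicate(new_vec, existing_vecs, threshold: float = 0.92) -> bool:
    """Reject a new memory whose embedding is too close to any stored one."""
    return any(cosine(new_vec, v) > threshold for v in existing_vecs)
```

The result of relevance() would then be multiplied by the recency, strength, tier, and type factors listed in the formula.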
…r stack

Memory should be ACTIVE, not dormant behind flags. It's MY memory system.
- auto_extract: True by default (was False; why implement it and not use it?)
- consolidation_enabled: True by default (same)
- Embedding provider auto-detection from available API keys:
  - OPENAI_API_KEY -> text-embedding-3-small
  - OPENROUTER_API_KEY -> openrouter/openai/text-embedding-3-small
  - VOYAGE_API_KEY -> voyage/voyage-3-lite
  - No key -> graceful BM25 fallback (still works, just no vectors)
- Config: memory.embedding_model for explicit override
- All generate_embedding() calls now pass config for model resolution

7425 tests passing.
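The key-based auto-detection described above amounts to a first-match lookup, sketched here. The environment variable names and model strings are from the commit message; the function name and the shape of the config dict are hypothetical.

```python
from typing import Mapping, Optional

# Checked in order; the first key that is set decides the model.
PROVIDER_CASCADE = [
    ("OPENAI_API_KEY", "text-embedding-3-small"),
    ("OPENROUTER_API_KEY", "openrouter/openai/text-embedding-3-small"),
    ("VOYAGE_API_KEY", "voyage/voyage-3-lite"),
]


def resolve_embedding_model(env: Mapping[str, str],
                            config: Optional[dict] = None) -> Optional[str]:
    """Pick an embedding model, or None to signal the BM25-only fallback."""
    # An explicit memory.embedding_model setting wins over auto-detection.
    override = ((config or {}).get("memory") or {}).get("embedding_model")
    if override:
        return override
    for key, model in PROVIDER_CASCADE:
        if env.get(key):
            return model
    return None
```

In practice the env argument would be os.environ; taking it as a parameter keeps the function easy to test.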
…nt, purge, cursor persistence

Extraction:
- Importance scoring (1-10) in extraction prompt, filter threshold >= 5
- Corrections/preferences get +1 importance bonus
- Hard cap: max 5 entries per extraction run
- Explicit 'do not extract' rules for noise (conversational artifacts, task-specific)
- Extractor cursor persisted to SQLite via memory_meta (survives restart)

Budget enforcement:
- MAX_ACTIVE_MEMORY=50, MAX_ACTIVE_USER=25 hard caps
- enforce_budget() archives weakest (lowest strength, oldest) when over cap
- Corrections/preferences protected from budget archival (sorted last)
- Called after every engine.add() and at session end

DB hygiene:
- purge_dead() hard-deletes superseded/archived entries >30 days old
- Cleans up orphaned chunks, embeddings, edges
- Runs at every session end

Lifecycle at session end (run_agent.py):
- archive_stale(): independent of consolidation now
- enforce_budget(): prevent runaway growth
- purge_dead(): prevent monotonic DB growth
- increment_session_count(): consolidation gating

Consolidation tuning:
- Session gate: 5 → 3 (matches intermittent usage pattern)
- Time gate: 24h → 12h
- Prompt includes budget caps and protection priority
- Prompt instructs: protect corrections/preferences, archive general first

Bug fixes:
- FTS5 query crash on apostrophes/quotes (regex tokenizer)
- DEDUP_THRESHOLD 5.0 → 8.0 (false supersession prevention)
- auto_extract default: False → True
- consolidation_enabled default: False → True

7425 tests pass, 9/9 custom verification tests pass.
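The budget rule above (archive the weakest and oldest entries first, with corrections and preferences sorted last) can be expressed as a single sort key. The sketch below works on in-memory objects rather than SQLite rows, and the Memory dataclass and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass

MAX_ACTIVE_MEMORY = 50
PROTECTED_TYPES = {"correction", "preference"}


@dataclass
class Memory:
    id: str
    type: str
    strength: float
    created_at: float           # Unix timestamp; smaller means older
    tier: str = "active"


def enforce_budget(memories, cap: int = MAX_ACTIVE_MEMORY) -> list:
    """Archive active memories beyond the cap and return the archived ids."""
    active = [m for m in memories if m.tier == "active"]
    excess = len(active) - cap
    if excess <= 0:
        return []
    # Sort key: unprotected types first, then lowest strength, then oldest.
    # Protected types are only reached if everything else is already gone.
    active.sort(key=lambda m: (m.type in PROTECTED_TYPES, m.strength, m.created_at))
    victims = active[:excess]
    for m in victims:
        m.tier = "archived"
    return [m.id for m in victims]
```

Note that under this rule protection is a priority, not a guarantee: if the cap is smaller than the number of protected entries, some of them are still archived.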
…s, events

Local embeddings (fastembed):
- BAAI/bge-small-en-v1.5 via ONNX (384 dims, ~50ms/query)
- First in cascade: local → OpenAI → OpenRouter → Voyage
- Module-level model cache (_LOCAL_EMBEDDER)
- Zero API keys needed: near-duplicate detection, hybrid search, embedding dedup all now ACTIVE by default
- _get_or_create_embedding model tracking fixed

LLM reranker (Claude Code port):
- rerank_with_llm() method on MemoryEngine
- Uses auxiliary_client for cheap model reranking
- search() accepts optional auxiliary_client parameter
- Falls back to score-based ranking when no client

Procedures table (HiveMind port):
- learn_procedure(name, description, tool_chain)
- reinforce_procedure(name, success): track success/fail counts
- get_procedures(): ordered by success rate
- find_procedure(name): LIKE match

Events table (HiveMind MAGMA port):
- log_event(type, summary, details, session_id)
- get_recent_events(type, limit, session_id)
- purge_old_events(max_age_days=90): wired into purge_dead()
- Event types: tool_success/failure, memory_write, session_start/end, consolidation, error, milestone

Test update:
- test_generate_embedding_graceful_failure → test_generate_embedding_works_locally (fastembed means embeddings work without API keys now)

7425 tests pass.
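To make the procedures table more concrete, here is a minimal sketch of the four functions named above as free functions over a sqlite3 connection. The function names come from the commit message, but the column layout, the JSON encoding of tool_chain, and the tie-breaking by name are assumptions; in the PR these are methods on MemoryEngine.

```python
import json
import sqlite3


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # Hypothetical schema; the PR's actual columns are not shown on this page.
    db.execute(
        "CREATE TABLE procedures ("
        " name TEXT PRIMARY KEY, description TEXT, tool_chain TEXT,"
        " success_count INTEGER NOT NULL DEFAULT 0,"
        " fail_count INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def learn_procedure(db, name, description, tool_chain):
    """Insert a procedure, or refresh its description while keeping its counts."""
    db.execute(
        "INSERT INTO procedures (name, description, tool_chain) VALUES (?, ?, ?) "
        "ON CONFLICT(name) DO UPDATE SET "
        "description = excluded.description, tool_chain = excluded.tool_chain",
        (name, description, json.dumps(tool_chain)),
    )


def reinforce_procedure(db, name, success):
    """Bump the success or failure counter for one procedure."""
    column = "success_count" if success else "fail_count"  # fixed names, never user input
    db.execute(f"UPDATE procedures SET {column} = {column} + 1 WHERE name = ?", (name,))


def get_procedures(db):
    """Return procedure names ordered by success rate, best first."""
    rows = db.execute(
        "SELECT name FROM procedures "
        "ORDER BY CAST(success_count AS REAL) / MAX(success_count + fail_count, 1) DESC, name"
    ).fetchall()
    return [r[0] for r in rows]


def find_procedure(db, name):
    """Substring lookup via LIKE."""
    rows = db.execute(
        "SELECT name FROM procedures WHERE name LIKE ? ORDER BY name", (f"%{name}%",)
    ).fetchall()
    return [r[0] for r in rows]
```

The two-argument MAX in the ORDER BY clause is SQLite's scalar maximum, used here only to avoid dividing by zero for procedures that have never run.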
…y, OLLAMA_API_KEY resolution

Fallback chain (config.yaml):
- nemotron-3-super (120B MoE, 1.8s, top quality) as primary fallback
- devstral-2:123b (Mistral coding model) as secondary
- Both via Ollama cloud at ollama.com/v1

Auxiliary tasks routed to Ollama cloud:
- web_extract, session_search, skills_hub, approval, flush_memories → ministral-3:3b
- compression → ministral-3:8b (needs more capability for summarization)
- vision, mcp → unchanged (auto)

Provider resolution:
- OLLAMA_API_KEY added to custom provider API key cascade in auxiliary_client.py
- Fallback provider now passes base_url and api_key from config to resolve_provider_client

7425 tests pass.
Retracting: needs cleanup before review. Dead code and unexercised subsystems identified. Will resubmit after trimming.
marozau pushed a commit to marozau/hermes-agent that referenced this pull request, Jun 8, 2026
Story 9.1 (access_count + last_hit_at reinforcement):
- Add reinforce_entry() to hermes_memory.py as new canonical sibling (Hard Invariant NousResearch#1 extension). Atomically bumps access_count, sets last_hit_at=now, pairs with raw-layer reinforce event. Body bytes unchanged (content-hash stable).
- Idempotency via raw-layer scan: same (entry_id, source) pair reported twice doesn't double-bump. Scans today+yesterday JSONL.
- Wire verify hook: preflight_verify_helper.py --match hit triggers reinforce_entry() for each cited ID. Fail-open on failure.
- read_entries() now returns access_count + last_hit_at fields.
- Ranker strength factor (1.0 + 0.1*log(1+access_count)) already wired from Epic 8 Story 8.3; cross-referenced here.

Story 9.2 (Manifest-based dedup in trajectory recorder):
- Add build_manifest() to hermes_memory.py: lists up to 50 trajectory entries sorted by last_used_at desc.
- Add classify_trajectory_with_manifest(): sends manifest + failure pattern to LLM via hermes_llm.llm_call (Hard Invariant NousResearch#2). Pydantic-gated response (Hard Invariant NousResearch#11). Returns {action: reinforce, id} or {action: new, type, body}.
- Dedup prompt matches upstream PR NousResearch#4480 commit a443d1d shape.
- LLM calls reinforce_entry() for rematch (reuses 9.1's sibling).
- Telemetry: trajectory_outcome: reinforced-existing | new-entry.

Story 9.3 (Skill-dream consumes hit-rate signal):
- Add build_hit_rate_report(): joins preflight telemetry with verify_citation events, groups by category, computes hit_rate. Gated on ≥ min_fires (default 20) per category.
- Add propose_category_weight_nudges(): applies hard thresholds:
  - hit_rate < 0.15 → nudge_down
  - hit_rate > 0.5 → nudge_up (cites top-3 by access_count)
  - hit_rate < 0.05 AND unrelated > 0.6 → domain blind spot
  All are PROPOSALS only (ADR-2 / FR-14 / Hard Invariant NousResearch#4).

Tests: 36 new tests across 2 files (test_reinforce_entry.py: 11, test_manifest_dedup.py: 25). All pass.
Full suite: 152 pass, 6 pre-existing failures (unchanged), 0 regressions.
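The strength factor and the hit-rate thresholds quoted in this commit can be written down in a few lines. The formula and the numeric thresholds are taken from the commit message; the function names, the interpretation of "unrelated" as a per-fire rate, and the choice to test the blind-spot condition before the plain nudge_down condition are assumptions.

```python
import math
from typing import Optional


def strength_factor(access_count: int) -> float:
    """Ranker strength factor: 1.0 + 0.1 * log(1 + access_count)."""
    return 1.0 + 0.1 * math.log(1 + access_count)


def propose_nudge(fires: int, hits: int, unrelated: int,
                  min_fires: int = 20) -> Optional[str]:
    """Classify one category; returns a proposal label or None (no change proposed)."""
    if fires < min_fires:
        return None  # not enough evidence for this category yet
    hit_rate = hits / fires
    unrelated_rate = unrelated / fires
    # The blind-spot condition is a special case of a low hit rate, so it is
    # checked first (assumed precedence).
    if hit_rate < 0.05 and unrelated_rate > 0.6:
        return "domain_blind_spot"
    if hit_rate < 0.15:
        return "nudge_down"
    if hit_rate > 0.5:
        return "nudge_up"
    return None
```

Consistent with the commit's note that all nudges are proposals only, the function returns a label and changes nothing itself.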
feat: Memory V2 — SQLite-backed knowledge system with hybrid search, graph, and auto-extraction
Summary
Replaces the flat-file memory store with a full SQLite-backed knowledge system featuring FTS5 full-text search, optional embedding-based semantic search, a relationship graph, automatic knowledge extraction from conversations, session memory injection, and a 5-gate consolidation scheduler with tiered lifecycle management. The migration from flat files is automatic and backward-compatible — existing memories are imported on first load with no user action required.
What Changed
New Files
- tools/memory_engine.py
- agent/memory_extractor.py
- agent/session_memory.py
- agent/memory_consolidator.py
- agent/yake.py
- MEMORY_V2_PLAN.md
- tests/tools/test_memory_engine.py
- tests/agent/test_memory_extractor.py
- tests/agent/test_memory_consolidator.py
- tests/agent/test_session_memory.py
- tests/agent/test_yake.py
Modified Files
- tools/memory_tool.py: MemoryEngine backend; added search action with hybrid (FTS5 + embedding) retrieval; added graph action for relationship queries; preserved all existing tool signatures for backward compat
- hermes_state.py: memory_engine and session_memory instances
- run_agent.py
- tests/tools/test_memory_tool.py
Architecture
For full architecture details, see MEMORY_V2_PLAN.md.
Key Features
- memory graph tool action
Breaking Changes
None. The flat-file to SQLite migration is automatic and backward-compatible:
- MemoryEngine detects existing flat-file memories and imports them into SQLite
- memory_tool.py actions (store, recall, list, delete) continue to work with the same signatures
- New actions (search, graph) are additive
Testing
- tests/tools/test_memory_engine.py
- tests/agent/test_memory_extractor.py
- tests/agent/test_memory_consolidator.py
- tests/agent/test_session_memory.py
- tests/agent/test_yake.py
- tests/tools/test_memory_tool.py
Provenance
This implementation draws from multiple sources:
- memory.rs in the MAGMA framework
Commit Log
Stats
Net delta: +6,293 lines