
Feature: Cognitive Memory Operations — LLM-Driven Encoding, Consolidation, Adaptive Recall & Extraction (inspired by CrewAI) #509

@teknium1


Overview

Hermes Agent's current memory system stores flat text entries (MEMORY.md and USER.md) with manual add/replace/remove operations. Every memory operation is a passive read or write: the agent alone decides what to save and where, and there is no contradiction detection, no confidence-aware retrieval, no automatic extraction, and no forgetting mechanism. As a result, the agent can store "We use PostgreSQL" on Monday and "We switched to MySQL" on Friday, and both entries coexist permanently until the agent notices the conflict and fixes it by hand.

CrewAI's Cognitive Memory (v1.10.1, MIT licensed) introduces a fundamentally different approach: memory as cognition rather than storage. Every memory operation — encode, consolidate, recall, extract, forget — is an active reasoning process powered by LLM analysis. When you store a memory, the system auto-classifies it, detects contradictions against existing knowledge, and resolves them. When you recall, the system evaluates its own confidence and searches deeper when unsure. The result is a self-maintaining knowledge base that compounds over time rather than accumulating contradictions.

This issue proposes adding a cognitive operations layer to Hermes Agent's memory system, building on the structured storage foundation proposed in #346. Where #346 defines the storage and retrieval infrastructure (typed nodes, vector search, LanceDB), this issue defines the intelligent operations that make memory behave like cognition.

Research source: CrewAI memory source code (~3,500 lines across 8 files) and the accompanying blog post. License: MIT.


Research Findings

CrewAI's Architecture: Five Cognitive Operations

CrewAI's Memory class exposes five cognitive operations, each backed by an LLM-powered pipeline:

1. Encode (remember())

When content is stored, an EncodingFlow runs a 5-step pipeline:

  1. Batch embed — One embedder call for all items (OpenAI text-embedding-3-small, 1536 dims)
  2. Intra-batch dedup — Cosine similarity matrix; drop items with ≥0.98 similarity (pure math, no LLM cost)
  3. Parallel find similar — Concurrent vector searches against existing memories
  4. Parallel analyze — Concurrent LLM calls for field resolution + consolidation, classified into 4 groups:
    • Group A: Caller provided scope/importance + no similar records → 0 LLM calls (fast insert)
    • Group B: Caller provided fields + similar records → 1 consolidation LLM call
    • Group C: Fields missing + no similar records → 1 field-resolution LLM call
    • Group D: Fields missing + similar records → 2 concurrent LLM calls
  5. Execute plans — Batch re-embed updates + bulk insert with write lock
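The two cost-saving steps above (intra-batch dedup and group routing) can be sketched in plain Python. This is a minimal illustration, not CrewAI's code: the helper names are hypothetical, the greedy keep-the-first-occurrence rule is an assumption about how a ≥0.98 pair is resolved, and the toy 2-d vectors stand in for real embeddings.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup_batch(items: list[str], embeddings: list[list[float]],
                threshold: float = 0.98) -> list[str]:
    """Step 2: drop near-duplicates within one batch, keeping the first occurrence."""
    kept: list[int] = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [items[i] for i in kept]


# Step 4 routing: number of LLM calls each group needs.
LLM_CALLS = {"A": 0, "B": 1, "C": 1, "D": 2}


def classify_group(fields_provided: bool, has_similar: bool) -> str:
    """Map an item to groups A-D from the list above."""
    if fields_provided:
        return "B" if has_similar else "A"  # B: consolidation call only
    return "D" if has_similar else "C"      # C: field resolution only; D: both


batch = ["We use MySQL", "We use MySQL.", "Deploys run on Fridays"]
vectors = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]  # toy 2-d embeddings
unique = dedup_batch(batch, vectors)
print(unique)  # ['We use MySQL', 'Deploys run on Fridays']
```

Only items that survive dedup reach the vector search and LLM stages, which is why dedup runs first.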

The LLM produces a MemoryAnalysis for each item:

from pydantic import BaseModel

class MemoryAnalysis(BaseModel):
    suggested_scope: str      # Hierarchical path, e.g. "/infrastructure/database"
    categories: list[str]     # Tags like ["postgresql", "migration"]
    importance: float         # 0.0 to 1.0
    extracted_metadata: dict  # Entities, dates, topics

This is the key insight: structure emerges from content. The agent doesn't need to specify types or categories because the system infers them, but the agent can still override them when it wants control.

2. Consolidate (triggered during encoding)

When similarity ≥0.85 is detected against existing memories, the LLM produces a ConsolidationPlan:

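from typing import Literal

from pydantic import BaseModel

class ConsolidationAction(BaseModel):
    record_id: str
    action: Literal["keep", "update", "delete"]
    updated_content: str | None = None  # replacement text when action == "update"

class ConsolidationPlan(BaseModel):
    actions: list[ConsolidationAction]
    insert_new: bool  # whether the incoming memory is also stored as a new record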

Example: "We use PostgreSQL for the user database" exists, and the new memory is "We migrated to MySQL last week." Consolidation detects the contradiction and produces a plan with actions=[{record_id: "abc", action: "update", updated_content: "We migrated from PostgreSQL to MySQL for the user database last week"}] and insert_new=False. Result: one coherent memory instead of two contradictory ones.
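Applying a ConsolidationPlan is mechanical once the LLM has produced it. The sketch below uses plain dataclasses that mirror the pydantic models above (so it runs without pydantic) and a dict as a stand-in store; needs_consolidation and apply_plan are hypothetical helper names, and the similarity scores in the usage are made up for illustration.

```python
from dataclasses import dataclass
from typing import Literal, Optional

CONSOLIDATION_THRESHOLD = 0.85  # similarity at which the LLM is asked to consolidate


@dataclass
class ConsolidationAction:
    record_id: str
    action: Literal["keep", "update", "delete"]
    updated_content: Optional[str] = None


@dataclass
class ConsolidationPlan:
    actions: list[ConsolidationAction]
    insert_new: bool


def needs_consolidation(similarities: dict[str, float],
                        threshold: float = CONSOLIDATION_THRESHOLD) -> list[str]:
    """IDs of existing records similar enough to send to the consolidation LLM."""
    return [rid for rid, score in similarities.items() if score >= threshold]


def apply_plan(store: dict[str, str], plan: ConsolidationPlan,
               new_id: str, new_content: str) -> None:
    """Apply an LLM-produced plan to a simple id -> content store."""
    for act in plan.actions:
        if act.action == "update":
            if act.updated_content is None:
                raise ValueError(f"update for {act.record_id} has no content")
            store[act.record_id] = act.updated_content
        elif act.action == "delete":
            store.pop(act.record_id, None)
        # "keep" leaves the record untouched
    if plan.insert_new:
        store[new_id] = new_content


store = {"abc": "We use PostgreSQL for the user database"}
print(needs_consolidation({"abc": 0.91, "xyz": 0.40}))  # ['abc']
plan = ConsolidationPlan(
    actions=[ConsolidationAction(
        "abc", "update",
        "We migrated from PostgreSQL to MySQL for the user database last week")],
    insert_new=False,
)
apply_plan(store, plan, "new-1", "We migrated to MySQL last week.")
print(store)
# {'abc': 'We migrated from PostgreSQL to MySQL for the user database last week'}
```

Because insert_new is False, the incoming text is folded into the existing record rather than stored alongside it.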

3. Recall (recall()) — Adaptive Depth

The RecallFlow implements confidence-based retrieval routing:

  1. Analyze query — LLM distills the query into targeted sub-queries (skipped for queries <200 chars)
  2. Filter and chunk — Select candidate scopes to search
  3. Search chunks — Parallel vector search across (embeddings × scopes)
  4. Confidence routing:
    • confidence ≥ 0.8 → synthesize and return results
    • confidence < 0.5 and budget > 0 → explore deeper (broader scopes, different strategies)
    • complex query + confidence < 0.7 → explore deeper
  5. Synthesize — Deduplicate, composite-score, rank
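The routing rules can be sketched as a small loop. Assumptions not stated in the source: the budget also gates the complex-query branch, "explore deeper" is modeled as calling a hypothetical search(query, depth) with a larger depth, and deciding whether a query is complex is left to the caller.

```python
from typing import Callable

QUERY_ANALYSIS_MIN_CHARS = 200  # shorter queries skip LLM query analysis


def needs_query_analysis(query: str) -> bool:
    return len(query) >= QUERY_ANALYSIS_MIN_CHARS


def route(confidence: float, budget: int, is_complex: bool) -> str:
    """Decide whether to synthesize now or explore deeper."""
    if confidence >= 0.8:
        return "synthesize"
    # Assumption: budget > 0 is required for both exploration branches.
    if budget > 0 and (confidence < 0.5 or (is_complex and confidence < 0.7)):
        return "explore"
    # Assumption: anything else (e.g. 0.5-0.8 on a simple query) is returned as-is.
    return "synthesize"


SearchFn = Callable[[str, int], tuple[list[str], float]]


def adaptive_recall(query: str, search: SearchFn, is_complex: bool,
                    budget: int = 2) -> tuple[list[str], float]:
    """Search, then widen the search (depth + 1) while routing says explore."""
    depth = 0
    results, confidence = search(query, depth)
    while route(confidence, budget, is_complex) == "explore":
        budget -= 1
        depth += 1
        results, confidence = search(query, depth)
    return results, confidence


def fake_search(query: str, depth: int) -> tuple[list[str], float]:
    # Toy search whose confidence improves as it explores more broadly.
    confidences = [0.3, 0.6, 0.9]
    return [f"result@depth{depth}"], confidences[depth]


print(adaptive_recall("which database?", fake_search, is_complex=False))
# (['result@depth1'], 0.6)
```

The loop always terminates because each exploration spends one unit of budget and route() never explores with an empty budget.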

Composite scoring formula:

score = (semantic_weight × similarity) + (recency_weight × decay) + (importance_weight × importance)
decay = 0.5^(age_days / recency_half_life_days)

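Defaults: semantic=0.5, recency=0.3, importance=0.2, half_life=30 days. Because recency carries only 30% of the weight, importance and relevance can outweigh freshness: a critical architecture decision from 6 months ago can outrank a trivial note from yesterday that happens to mention the same keyword, provided the decision is clearly the better semantic match. At equal similarity, the fresher note still wins regardless of importance, since the recency gap (about 0.29) exceeds the maximum importance contribution (0.2).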
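A worked example makes the trade-off concrete. The function below is a direct transcription of the formula and defaults above; the two memories and their similarity and importance values are hypothetical.

```python
def recency_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """decay = 0.5 ** (age_days / half_life_days); 1.0 for a brand-new memory."""
    return 0.5 ** (age_days / half_life_days)


def composite_score(similarity: float, age_days: float, importance: float,
                    semantic_weight: float = 0.5, recency_weight: float = 0.3,
                    importance_weight: float = 0.2,
                    half_life_days: float = 30.0) -> float:
    return (semantic_weight * similarity
            + recency_weight * recency_decay(age_days, half_life_days)
            + importance_weight * importance)


# Critical decision: 180 days old (about 6 months), importance 1.0.
# Trivial note: 1 day old, importance 0.1. Similarities are illustrative.
decision = composite_score(similarity=0.9, age_days=180, importance=1.0)
note = composite_score(similarity=0.6, age_days=1, importance=0.1)
print(round(decision, 3), round(note, 3))  # 0.655 0.613

# Same similarity for both: the recency term decides.
print(round(composite_score(0.8, 180, 1.0), 3),
      round(composite_score(0.8, 1, 0.1), 3))  # 0.605 0.713
```

After 180 days the decay factor is 0.5^6 ≈ 0.016, so the old decision depends almost entirely on its similarity and importance terms.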

Each result includes match_reasons (why it scored high) and evidence_gaps (what information is still missing) — the system knows what it doesn't know.

4. Extract (extract_memories())

Decomposes raw text into atomic, self-contained facts using LLM:

raw = """After reviewing options, the team recommends PostgreSQL for JSONB support.
Estimated cost is $2,400/month on RDS. Compliance requires EU data residency.
DevOps prefers managed services."""

# memory is an existing CrewAI Memory instance
facts = memory.extract_memories(raw)
# → ["Team recommends PostgreSQL for user database due to JSONB support",
#    "Estimated database cost is $2,400/month on RDS",
#    "Compliance requires all user data to remain in EU regions",
#    "DevOps prefers managed services over self-hosted"]

Each extracted fact enters the full encoding pipeline independently. This is what powers automatic memory capture from task outputs — the agent doesn't need to decide what's worth remembering.
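A minimal sketch of the extract-then-encode loop, assuming an LLM exposed as a plain prompt-in, text-out callable (the auxiliary client's real interface may differ). The prompt wording, the JSON-array reply format, and the fallback of storing the whole blob as one memory are assumptions in the spirit of the graceful-degradation principle, not CrewAI's exact behavior.

```python
import json
from typing import Callable

# Hypothetical prompt; {text} is the only format placeholder.
EXTRACTION_PROMPT = (
    "Split the text below into atomic, self-contained facts. "
    "Each fact must make sense on its own. "
    "Reply with a JSON array of strings only.\n\nText:\n{text}"
)


def extract_memories(raw: str, llm: Callable[[str], str]) -> list[str]:
    """Ask an LLM for atomic facts; fall back to the whole blob on failure."""
    fallback = [raw.strip()] if raw.strip() else []
    try:
        facts = json.loads(llm(EXTRACTION_PROMPT.format(text=raw)))
    except Exception:  # LLM error or unparseable reply
        return fallback
    if not isinstance(facts, list):
        return fallback
    return [f.strip() for f in facts if isinstance(f, str) and f.strip()]


def capture(raw: str, llm: Callable[[str], str],
            encode: Callable[[str], None]) -> int:
    """Run every extracted fact through the encoding pipeline independently."""
    facts = extract_memories(raw, llm)
    for fact in facts:
        encode(fact)
    return len(facts)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model; note the blank entry, which gets filtered out.
    return ('["Estimated database cost is $2,400/month on RDS", " ", '
            '"DevOps prefers managed services over self-hosted"]')


stored: list[str] = []
print(capture("...", fake_llm, stored.append))  # 2
```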

5. Forget (forget())

Targeted purging by scope, categories, age, or specific record IDs:

from datetime import datetime, timedelta

memory.forget(scope="/project/alpha", older_than=datetime.utcnow() - timedelta(days=30))
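Purging is a filter over stored records. The sketch assumes filters combine with AND, that a category filter matches any shared tag, and that a call with no filters is refused rather than wiping everything; Record and the keyword names beyond those in the call above are hypothetical. It uses a fixed timezone-aware clock instead of datetime.utcnow(), which is deprecated as of Python 3.12.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


@dataclass
class Record:
    id: str
    scope: str
    categories: list[str]
    created_at: datetime


def in_scope(path: str, scope: str) -> bool:
    """True if path is scope itself or lies beneath it ("/a" covers "/a/b", not "/ab")."""
    scope = scope.rstrip("/")
    return path == scope or path.startswith(scope + "/")


def forget(records: list[Record], *, scope: Optional[str] = None,
           categories: Optional[Iterable[str]] = None,
           older_than: Optional[datetime] = None,
           record_ids: Optional[Iterable[str]] = None) -> tuple[list[Record], int]:
    """Return (kept records, number purged). All given filters must match (AND)."""
    if scope is None and categories is None and older_than is None and record_ids is None:
        raise ValueError("refusing to forget everything: pass at least one filter")
    cats = set(categories) if categories is not None else None
    ids = set(record_ids) if record_ids is not None else None

    def matches(r: Record) -> bool:
        return ((ids is None or r.id in ids)
                and (scope is None or in_scope(r.scope, scope))
                and (cats is None or bool(cats & set(r.categories)))
                and (older_than is None or r.created_at < older_than))

    kept = [r for r in records if not matches(r)]
    return kept, len(records) - len(kept)


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
records = [
    Record("r1", "/project/alpha", ["decision"], now - timedelta(days=45)),
    Record("r2", "/project/alpha", ["note"], now - timedelta(days=2)),
    Record("r3", "/project/alphabet", ["note"], now - timedelta(days=45)),
]
kept, purged = forget(records, scope="/project/alpha", older_than=now - timedelta(days=30))
print([r.id for r in kept], purged)  # ['r2', 'r3'] 1
```

The segment-aware in_scope check is what keeps /project/alphabet from being purged along with /project/alpha.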

Hierarchical Scopes

Memories are organized in filesystem-like paths: /infrastructure/database, /compliance/eu, /project/alpha/decisions. The LLM auto-assigns scopes during encoding, building a self-organizing hierarchy.

Two access patterns:

  • MemoryScope — Restricts an agent to a subtree (e.g., memory.scope("/agent/researcher"))
  • MemorySlice — Reads from multiple disjoint branches (e.g., memory.slice(["/compliance", "/security"]))
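Both access patterns reduce to path-prefix checks on segment boundaries. The classes below borrow the MemoryScope and MemorySlice names for readability, but their constructors and methods are hypothetical and not CrewAI's API; the in-memory list stands in for a real store, and the record contents are made up.

```python
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    scope: str
    content: str


def is_under(path: str, root: str) -> bool:
    """Segment-aware prefix test: "/compliance" covers "/compliance/eu", not "/compliance2"."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


class MemoryScope:
    """Read/write view confined to one subtree (hypothetical sketch)."""

    def __init__(self, store: list[MemoryRecord], root: str):
        self.store = store
        self.root = root.rstrip("/")

    def remember(self, relative_scope: str, content: str) -> MemoryRecord:
        # Every write is prefixed with the root, so it lands inside the subtree.
        path = f"{self.root}/{relative_scope.strip('/')}".rstrip("/")
        record = MemoryRecord(path, content)
        self.store.append(record)
        return record

    def records(self) -> list[MemoryRecord]:
        return [r for r in self.store if is_under(r.scope, self.root)]


class MemorySlice:
    """Read-only view over several disjoint branches."""

    def __init__(self, store: list[MemoryRecord], roots: list[str]):
        self.store = store
        self.roots = roots

    def records(self) -> list[MemoryRecord]:
        return [r for r in self.store
                if any(is_under(r.scope, root) for root in self.roots)]


store = [
    MemoryRecord("/compliance/eu", "User data must stay in EU regions"),
    MemoryRecord("/security/keys", "Rotate API keys every 90 days"),
    MemoryRecord("/compliance2/misc", "Unrelated branch"),
    MemoryRecord("/project/alpha/decisions", "Use MySQL"),
]
researcher = MemoryScope(store, "/agent/researcher")
researcher.remember("notes", "Prefer primary sources")
print([r.scope for r in researcher.records()])  # ['/agent/researcher/notes']
audit = MemorySlice(store, ["/compliance", "/security"])
print([r.content for r in audit.records()])
# ['User data must stay in EU regions', 'Rotate API keys every 90 days']
```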

Key Design Decisions

  1. Structure emerges from content — No predefined schema. The LLM infers scope, categories, importance. This avoids the "8 fixed types" problem where the agent needs to learn a taxonomy.
  2. Consolidation over contradiction edges — Instead of maintaining "Contradicts" graph edges (Spacebot), resolve conflicts at write time via LLM. Cleaner data, fewer stale edges.
  3. Confidence-aware retrieval — The system tells you when it's not sure, rather than silently returning low-quality matches.
  4. Non-blocking writes: remember_many() runs encoding in a background thread; recall() auto-drains pending writes first. Good for performance-sensitive paths.
  5. Graceful degradation — Every LLM call in the pipeline has a fallback (safe defaults on failure). If the analysis LLM is down, memories still get stored with default importance/scope.
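Design decisions 4 and 5 combine naturally: a worker thread drains a write queue, recall() waits for that queue to empty, and a failed analysis call falls back to defaults. Below is a minimal sketch using Python's standard queue and threading modules; BackgroundMemory, the analyze callable, and the default values are hypothetical, and CrewAI's actual threading model may differ.

```python
import queue
import threading
from typing import Callable


def default_analysis() -> dict:
    # Hypothetical safe defaults used when the analysis LLM fails.
    return {"scope": "/", "categories": [], "importance": 0.5}


class BackgroundMemory:
    """Non-blocking writes with recall-time draining and LLM fallbacks."""

    def __init__(self, analyze: Callable[[str], dict]):
        self._analyze = analyze
        self._records: list[dict] = []
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _encode(self, content: str) -> None:
        try:
            analysis = self._analyze(content)
        except Exception:
            analysis = default_analysis()  # graceful degradation: still store it
        with self._lock:
            self._records.append({"content": content, **analysis})

    def _worker(self) -> None:
        while True:
            content = self._pending.get()
            try:
                self._encode(content)
            finally:
                self._pending.task_done()  # lets join() in recall() return

    def remember_many(self, contents: list[str]) -> None:
        for content in contents:
            self._pending.put(content)  # returns immediately; encoding runs later

    def recall(self, predicate: Callable[[dict], bool]) -> list[dict]:
        self._pending.join()  # drain pending writes so recall sees them
        with self._lock:
            return [r for r in self._records if predicate(r)]


def flaky_analyze(content: str) -> dict:
    if "timeout" in content:
        raise TimeoutError("analysis LLM unavailable")
    return {"scope": "/infrastructure/database", "categories": ["mysql"], "importance": 0.9}


mem = BackgroundMemory(flaky_analyze)
mem.remember_many(["We use MySQL", "timeout: this one falls back"])
print(sorted(r["importance"] for r in mem.recall(lambda r: True)))  # [0.5, 0.9]
```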

Current State in Hermes Agent

What We Have

| Component | Current Implementation | Gap |
|---|---|---|
| Memory write | Manual add with flat text entry | No auto-classification, no contradiction detection |
| Memory read | Substring matching (old_text) | No semantic search, no confidence scoring |
| Session recall | FTS5 keyword search + LLM summarization | Good for sessions, not for structured memory |
| Memory organization | Two buckets: memory / user | No hierarchy, no categories, no scopes |
| Capacity | Hard 2200/1375 char limits | Forces manual pruning, currently at 97% |
| Conflict resolution | Manual replace by agent | Agent must notice and resolve conflicts itself |
| Compaction | Summarizes old context | Doesn't extract memories from compressed content |
| Forgetting | Manual remove | No automatic decay or pruning |


Implementation Plan

Skill vs. Tool Classification

This should be a core codebase change because:

  • It extends tools/memory_tool.py with new operations (recall, extract, forget)
  • It requires LLM integration inside the memory tool (not just storing text the agent provides)
  • It modifies the memory storage layer (new fields: scope, categories, importance, embeddings)
  • It integrates with context compression (automatic memory extraction)
  • It needs embedding infrastructure as a core dependency
  • It must work across all platforms (CLI, Telegram, Discord) uniformly

What We'd Need

  1. Embedding infrastructure — Local (fastembed/sentence-transformers) or API-based (OpenAI). LanceDB as vector store (Apache 2.0, already validated by both Spacebot and CrewAI).
  2. Auxiliary LLM calls — Use the existing auxiliary client (agent/auxiliary_client.py, same pattern as session_search) for encoding analysis, consolidation, extraction, and query analysis.
  3. Extended memory tool — New actions: recall (semantic search), extract (decompose text), forget (targeted purge). Existing actions (add/replace/remove) remain backward compatible.
  4. Encoding pipeline — On add, optionally run LLM analysis for auto-scope/categories/importance and check for contradictions.
  5. Memory schema upgrade — Extend SQLite storage (or migrate to LanceDB) with scope, categories, importance, embedding, timestamps.
  6. Compaction integration — During context compression, run extract_memories on the compressed content and store atomic facts.
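For the schema upgrade in item 5, one possible SQLite layout is sketched below using only the standard library. The table and column names are hypothetical (Hermes Agent's actual state.db layout may differ); embeddings are stored as packed float32 blobs, which keeps the sketch dependency-free but means similarity search would still run in Python or in LanceDB rather than in SQL.

```python
import json
import sqlite3
import struct
import time
from typing import Optional

# Hypothetical table for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS memories (
    id          INTEGER PRIMARY KEY,
    content     TEXT NOT NULL,
    scope       TEXT NOT NULL DEFAULT '/',
    categories  TEXT NOT NULL DEFAULT '[]',  -- JSON array of tags
    importance  REAL NOT NULL DEFAULT 0.5
                CHECK (importance BETWEEN 0.0 AND 1.0),
    embedding   BLOB,                        -- little-endian float32, NULL until embedded
    created_at  REAL NOT NULL,               -- Unix time, seconds
    updated_at  REAL NOT NULL,
    deleted_at  REAL                         -- set by soft-delete pruning
);
CREATE INDEX IF NOT EXISTS idx_memories_scope ON memories(scope);
"""


def pack_embedding(vector: list[float]) -> bytes:
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> tuple[float, ...]:
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def insert_memory(conn: sqlite3.Connection, content: str, scope: str = "/",
                  categories: tuple[str, ...] = (), importance: float = 0.5,
                  embedding: Optional[list[float]] = None) -> int:
    now = time.time()
    blob = pack_embedding(embedding) if embedding is not None else None
    cur = conn.execute(
        "INSERT INTO memories (content, scope, categories, importance,"
        " embedding, created_at, updated_at) VALUES (?, ?, ?, ?, ?, ?, ?)",
        (content, scope, json.dumps(list(categories)), importance, blob, now, now),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rid = insert_memory(conn, "We use MySQL for the user database",
                    scope="/infrastructure/database", categories=("mysql",),
                    importance=0.8, embedding=[0.5, -1.0])
row = conn.execute("SELECT scope, categories, embedding FROM memories WHERE id = ?",
                   (rid,)).fetchone()
print(row[0], json.loads(row[1]), unpack_embedding(row[2]))
# /infrastructure/database ['mysql'] (0.5, -1.0)
```

The CHECK constraint rejects out-of-range importance values at write time, so a misbehaving analysis LLM cannot store importance=1.5.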

Phased Rollout

Phase 1: Cognitive Encoding — Auto-Classification + Contradiction Resolution

  • Extend memory_tool.py with optional LLM analysis on add (auto-infer scope, categories, importance if not provided by agent)
  • Add consolidation check: before inserting, search for similar existing memories. If similarity ≥0.85, run LLM consolidation to resolve conflicts.
  • Add scope and importance parameters to the memory tool schema
  • Migrate from flat files to SQLite-backed storage with scope, categories, importance columns
  • Keep backward compatibility: existing add/replace/remove still work
  • Configurable: memory.cognitive: true|false in config.yaml (default: true)
  • Deliverable: Memory entries auto-classified with scope/importance, contradictions auto-resolved

Phase 2: Semantic Recall + Composite Scoring

  • Add embeddings to memory entries (LanceDB or SQLite + fastembed)
  • New recall action: semantic search with composite scoring (similarity × w_sim + recency × w_rec + importance × w_imp)
  • Confidence-based depth: simple queries do pure vector search, complex queries use LLM query analysis
  • Results include match_reasons and evidence_gaps
  • Expose configurable weights in config.yaml
  • Deliverable: Agent can recall semantically (not just substring match), with confidence-aware retrieval

Phase 3: Automatic Memory Extraction + Forgetting

  • extract action: decompose text blobs into atomic facts
  • Integration with context compression: when compacting, extract memories from compressed content
  • Integration with task completion: option to extract memories from tool outputs
  • forget action: targeted purge by scope, age, categories
  • Automatic importance decay: periodic maintenance (daily) that decays stale memories
  • Pruning: soft-delete memories below importance threshold after configurable days
  • Deliverable: Self-maintaining memory system that captures knowledge from operations and prunes stale information
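The maintenance job could look like the following. Every parameter (30 days before a memory counts as stale, a 0.9 multiplier per run, a 0.1 prune threshold) is a placeholder for the configurable values mentioned above, and multiplying stored importance is an assumed mechanism; it is separate from the recall-time recency decay in the composite score.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class StoredMemory:
    content: str
    importance: float
    last_accessed: datetime
    deleted_at: Optional[datetime] = None


def daily_maintenance(memories: list[StoredMemory], now: datetime,
                      stale_after_days: int = 30, decay_factor: float = 0.9,
                      prune_threshold: float = 0.1) -> int:
    """Decay importance of stale memories and soft-delete those that fall too low.

    Returns the number of memories soft-deleted in this run.
    """
    pruned = 0
    for m in memories:
        if m.deleted_at is not None:
            continue  # already soft-deleted
        if now - m.last_accessed < timedelta(days=stale_after_days):
            continue  # recently used: leave importance alone
        m.importance *= decay_factor
        if m.importance < prune_threshold:
            m.deleted_at = now  # soft delete: row is kept but hidden from recall
            pruned += 1
    return pruned


now = datetime(2025, 3, 1, tzinfo=timezone.utc)
memories = [
    StoredMemory("old trivial note", 0.105, now - timedelta(days=40)),
    StoredMemory("old architecture decision", 0.9, now - timedelta(days=40)),
    StoredMemory("fresh low-importance note", 0.05, now - timedelta(days=1)),
]
print(daily_maintenance(memories, now))  # 1
```

Soft deletion keeps the row, so a mistaken prune can be undone by clearing deleted_at.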

Phase 4: Scoped Access + Multi-Agent Memory

  • Hierarchical scope paths with MemoryScope (restrict to subtree) and MemorySlice (read from multiple branches)
  • Per-user memory scoping for gateway platforms (Telegram/Discord users get their own memory subtrees)
  • Integration with #377 (Feature: Shared Memory Pools Between Sub-Agents in Workflows, inspired by CAMEL-AI): sub-agents can share a memory scope
  • Memory tree visualization: tree command showing scope hierarchy with counts
  • Deliverable: Multi-agent, multi-user memory with access control
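The tree command's output can be derived from scope paths alone. A sketch, assuming each memory contributes one count to its own scope and to every ancestor; the function name and output format are hypothetical.

```python
from collections import Counter


def scope_tree(scopes: list[str]) -> str:
    """Render scope paths as an indented tree; counts include all descendants."""
    counts = Counter()
    for scope in scopes:
        parts = [p for p in scope.split("/") if p]
        for i in range(1, len(parts) + 1):
            counts["/" + "/".join(parts[:i])] += 1
    lines = []
    # Sort by path segments so children follow their parent ("/a/b" before "/a-b").
    for path in sorted(counts, key=lambda p: p.split("/")):
        depth = path.count("/") - 1
        lines.append(f"{'  ' * depth}{path.rsplit('/', 1)[-1]} ({counts[path]})")
    return "\n".join(lines)


print(scope_tree([
    "/infrastructure/database",
    "/infrastructure/database",
    "/project/alpha/decisions",
    "/project/alpha",
]))
```

This prints infrastructure (2) with database (2) nested under it, then project (2) > alpha (2) > decisions (1).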

Pros & Cons

Pros

  • Self-maintaining knowledge — Contradictions auto-resolved instead of silently accumulating. The agent stops having to manually detect and fix conflicting memories.
  • No capacity ceiling — Importance decay + pruning replaces hard character limits (currently at 97% capacity). Knowledge compounds instead of hitting walls.
  • Semantic retrieval — Find relevant memories by meaning, not just keyword overlap. "What database are we using?" matches "We migrated to MySQL" even without shared keywords.
  • Automatic knowledge capture — Memory extraction from task outputs and context compression means less information is lost between sessions.
  • Confidence awareness — The system tells you when retrieval is uncertain, enabling better decision-making.
  • Backward compatible — Phase 1 maintains existing add/replace/remove interface. Cognitive features are additive.
  • MIT-licensed reference — CrewAI's Python implementation (3,500 lines) can be freely studied and adapted.
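  • Proven at scale — CrewAI reports billions of agentic executions across the framework. That figure is not specific to the Cognitive Memory design, so "battle-tested" applies to CrewAI overall more than to this particular memory pipeline.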
  • Configurable — Weights, thresholds, half-life, and depth are all tunable per deployment.

Cons / Risks

  • LLM cost on every write — Encoding analysis adds up to 2 LLM calls per remember() (Group D). For high-volume scenarios, this could be expensive. Mitigations: use cheap models (gpt-4o-mini), let the zero-call Group A path handle writes where the caller supplies scope/importance and no similar records exist, and batch writes.
  • Latency on writes — LLM calls add ~200-500ms per memory write. Mitigation: non-blocking writes for batch operations.
  • Complexity jump — Moving from flat text files to LLM-powered cognitive pipelines is a major architectural change. Must be phased carefully to avoid breaking the current simple workflow.
  • Dependency on auxiliary LLM — Encoding/consolidation/extraction require a working LLM. If the auxiliary model is unavailable, cognitive features degrade to simple storage. CrewAI handles this with graceful fallbacks — we should too.
  • Embedding model dependency — Vector search requires either an API-based embedder (adds API cost) or a local model (adds ~100MB disk + memory). Decision needed.
  • Testing complexity — Cognitive operations are non-deterministic (LLM-dependent). Need mocked tests + integration tests with real models.
  • Risk of over-classification — LLM might assign incorrect scopes or importance. Bad classifications could make recall worse. Mitigated by configurable overrides and fallback defaults.

Open Questions

  • Embedding model: local vs. API? Local (fastembed with all-MiniLM-L6-v2, 384 dims, ~100MB) is private and fast. API (OpenAI text-embedding-3-small, 1536 dims) is higher quality but adds cost and latency. Could support both with a config flag.
  • Should cognitive encoding be opt-in or opt-out? Default enabled with memory.cognitive: true in config, or default disabled requiring explicit opt-in? Given the LLM cost, opt-in might be safer initially.
  • Consolidation threshold? CrewAI uses 0.85 cosine similarity. Too low = false positives (merging unrelated memories). Too high = missed contradictions. Should be configurable.
  • How should scoped memory interact with current MEMORY.md/USER.md distinction? Replace the two-file split with scopes (/agent/notes and /user/profile) or keep them as special-case scopes?
  • Integration with session_search? Should recall also search session transcripts, or keep structured memory and session recall as separate tools?
  • Storage backend? LanceDB (validated by both Spacebot and CrewAI) vs. extending the existing SQLite state.db with pgvector-like vector operations?
