Feature: Cognitive Memory Operations — LLM-Driven Encoding, Consolidation, Adaptive Recall & Extraction (inspired by CrewAI)

## Overview

Hermes Agent's current memory system stores flat text entries (MEMORY.md and USER.md) with manual add/replace/remove operations. Every memory operation is a passive read or write: the agent decides what to save and where, and there is no contradiction detection, no confidence-aware retrieval, no automatic extraction, and no forgetting mechanism. As a result, the agent can store "We use PostgreSQL" on Monday and "We switched to MySQL" on Friday, and both entries coexist until the agent happens to notice the conflict and fixes it manually.

[CrewAI's Cognitive Memory](https://github.com/crewAIInc/crewAI/tree/main/lib/crewai/src/crewai/memory) (v1.10.1, MIT licensed) introduces a fundamentally different approach: **memory as cognition rather than storage**. Every memory operation — encode, consolidate, recall, extract, forget — is an active reasoning process powered by LLM analysis. When you store a memory, the system auto-classifies it, detects contradictions against existing knowledge, and resolves them. When you recall, the system evaluates its own confidence and searches deeper when unsure. The result is a self-maintaining knowledge base that compounds over time rather than accumulating contradictions.

This issue proposes adding a cognitive operations layer to Hermes Agent's memory system, building on the structured storage foundation proposed in #346. Where #346 defines the storage and retrieval infrastructure (typed nodes, vector search, LanceDB), this issue defines the **intelligent operations** that make memory behave like cognition.

**Research source:** [CrewAI memory source code](https://github.com/crewAIInc/crewAI/tree/main/lib/crewai/src/crewai/memory) (~3,500 lines across 8 files), [Blog post](https://blog.crewai.com/how-we-built-cognitive-memory-for-agentic-systems/). License: MIT.

---

## Research Findings

### CrewAI's Architecture: Five Cognitive Operations

CrewAI's `Memory` class exposes five cognitive operations, each backed by an LLM-powered pipeline:

#### 1. Encode (`remember()`)

When content is stored, an `EncodingFlow` runs a 5-step pipeline:

1. **Batch embed** — One embedder call for all items (OpenAI text-embedding-3-small, 1536 dims)
2. **Intra-batch dedup** — Cosine similarity matrix; drop items with ≥0.98 similarity (pure math, no LLM cost)
3. **Parallel find similar** — Concurrent vector searches against existing memories
4. **Parallel analyze** — Concurrent LLM calls for field resolution + consolidation, classified into 4 groups:
   - Group A: Caller provided scope/importance + no similar records → 0 LLM calls (fast insert)
   - Group B: Caller provided fields + similar records → 1 consolidation LLM call
   - Group C: Fields missing + no similar records → 1 field-resolution LLM call
   - Group D: Fields missing + similar records → 2 concurrent LLM calls
5. **Execute plans** — Batch re-embed updates + bulk insert with write lock
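
Step 2 of the pipeline above needs no LLM at all. A minimal sketch in pure Python, assuming embeddings are plain lists of floats: `cosine_similarity` and `dedup_batch` are illustrative names, not CrewAI's actual functions, and only the 0.98 threshold comes from the description above.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def dedup_batch(embeddings: list[list[float]], threshold: float = 0.98) -> list[int]:
    """Return indices of items to keep, dropping later near-duplicates."""
    kept: list[int] = []
    for i, candidate in enumerate(embeddings):
        # Keep the item only if it is below the threshold against every kept item
        if all(cosine_similarity(candidate, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


batch = [
    [1.0, 0.0, 0.0],
    [0.999, 0.01, 0.0],  # near-duplicate of item 0
    [0.0, 1.0, 0.0],
]
print(dedup_batch(batch))  # [0, 2]
```

The pairwise loop is quadratic in batch size, which is acceptable for small batches; a real implementation would more likely compute the similarity matrix with a vectorized library.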

The LLM produces a `MemoryAnalysis` for each item:
```python
from pydantic import BaseModel


class MemoryAnalysis(BaseModel):
    suggested_scope: str   # Hierarchical path, e.g. "/infrastructure/database"
    categories: list[str]  # Tags like ["postgresql", "migration"]
    importance: float      # 0.0 to 1.0
    extracted_metadata: dict  # Entities, dates, topics
```

This is the key insight: **structure emerges from content**. The agent doesn't need to specify types or categories because the system infers them, but it can still override any field when it wants control.

#### 2. Consolidate (triggered during encoding)

When similarity ≥0.85 is detected against existing memories, the LLM produces a `ConsolidationPlan`:

```python
from typing import Literal

from pydantic import BaseModel


class ConsolidationAction(BaseModel):
    record_id: str
    action: Literal["keep", "update", "delete"]
    updated_content: str | None = None

class ConsolidationPlan(BaseModel):
    actions: list[ConsolidationAction]
    insert_new: bool
```

**Example:** "We use PostgreSQL for the user database" exists. New memory: "We migrated to MySQL last week." Consolidation detects the contradiction and produces: `[{record_id: "abc", action: "update", updated_content: "We migrated from PostgreSQL to MySQL for the user database last week"}]`, `insert_new: false`. Result: one coherent memory instead of two contradictory ones.

#### 3. Recall (`recall()`) — Adaptive Depth

The `RecallFlow` implements confidence-based retrieval routing:

1. **Analyze query** — LLM distills the query into targeted sub-queries (skipped for queries <200 chars)
2. **Filter and chunk** — Select candidate scopes to search
3. **Search chunks** — Parallel vector search across (embeddings × scopes)
4. **Confidence routing:**
   - confidence ≥ 0.8 → synthesize and return results
   - confidence < 0.5 and budget > 0 → explore deeper (broader scopes, different strategies)
   - complex query + confidence < 0.7 → explore deeper
5. **Synthesize** — Deduplicate, composite-score, rank
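
The routing rules in step 4 can be sketched as a single function. The thresholds come from the list above; two details are assumptions of this sketch rather than documented behavior: the complex-query rule also requires remaining budget, and any case the rules do not cover falls through to synthesis. `route` is a hypothetical name.

```python
def route(confidence: float, budget: int, is_complex: bool) -> str:
    """Return "synthesize" or "explore" for one round of recall."""
    if confidence >= 0.8:
        return "synthesize"
    if budget > 0 and confidence < 0.5:
        return "explore"
    # Assumption: exploring a complex query also consumes budget
    if budget > 0 and is_complex and confidence < 0.7:
        return "explore"
    # Assumption: everything else (including an exhausted budget) is synthesized
    return "synthesize"


print(route(confidence=0.9, budget=2, is_complex=False))  # synthesize
print(route(confidence=0.4, budget=2, is_complex=False))  # explore
print(route(confidence=0.6, budget=2, is_complex=True))   # explore
print(route(confidence=0.4, budget=0, is_complex=True))   # synthesize
```

Tying exploration to a budget keeps the loop bounded: each extra round costs additional searches, so a low-confidence query cannot explore indefinitely.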

Composite scoring formula:
```
score = (semantic_weight × similarity) + (recency_weight × decay) + (importance_weight × importance)
decay = 0.5^(age_days / recency_half_life_days)
```

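Defaults: semantic=0.5, recency=0.3, importance=0.2, half_life=30 days. Importance lets a critical architecture decision from 6 months ago outrank a trivial note from yesterday that merely mentions the same keyword, provided the older memory is also the clearly better semantic match. Because the default recency weight (0.3) exceeds the importance weight (0.2), at equal similarity the day-old note still scores higher; the 6-month-old decision (decay ≈ 0.016) needs a similarity advantage of roughly 0.2 to win.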
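
A short worked example of the formula with the default weights. The function names and the sample similarity and importance values are illustrative assumptions; only the formula and the defaults come from the text above.

```python
def decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential recency decay: 1.0 for a brand-new memory, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def composite_score(similarity: float, age_days: float, importance: float,
                    semantic_weight: float = 0.5, recency_weight: float = 0.3,
                    importance_weight: float = 0.2,
                    half_life_days: float = 30.0) -> float:
    return (semantic_weight * similarity
            + recency_weight * decay(age_days, half_life_days)
            + importance_weight * importance)


# A critical decision from ~6 months ago that matches the query well
old_decision = composite_score(similarity=0.9, age_days=180, importance=1.0)
# A trivial note from yesterday that only loosely matches
fresh_note = composite_score(similarity=0.6, age_days=1, importance=0.1)
print(round(old_decision, 3), round(fresh_note, 3))  # 0.655 0.613

# Same comparison with equal similarity: recency now dominates
print(composite_score(0.8, 180, 1.0) < composite_score(0.8, 1, 0.1))  # True
```

With these defaults the old decision wins only because it is the better semantic match; at equal similarity the fresh note scores higher. Deployments that want long-lived decisions to survive may need a larger importance weight or a longer half-life.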

Each result includes `match_reasons` (why it scored high) and `evidence_gaps` (what information is still missing) — the system knows what it doesn't know.

#### 4. Extract (`extract_memories()`)

Decomposes raw text into atomic, self-contained facts using LLM:

```python
raw = """After reviewing options, the team recommends PostgreSQL for JSONB support.
Estimated cost is $2,400/month on RDS. Compliance requires EU data residency.
DevOps prefers managed services."""

facts = memory.extract_memories(raw)
# → ["Team recommends PostgreSQL for user database due to JSONB support",
#    "Estimated database cost is $2,400/month on RDS",
#    "Compliance requires all user data to remain in EU regions",
#    "DevOps prefers managed services over self-hosted"]
```

Each extracted fact enters the full encoding pipeline independently. This is what powers automatic memory capture from task outputs — the agent doesn't need to decide what's worth remembering.

#### 5. Forget (`forget()`)

Targeted purging by scope, categories, age, or specific record IDs:
```python
from datetime import datetime, timedelta
memory.forget(scope="/project/alpha", older_than=datetime.utcnow() - timedelta(days=30))
```
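
For Hermes, the filtering behind a call like this could look as follows. This is a sketch under stated assumptions: `Record`, `in_scope`, and this standalone `forget` are hypothetical, filters are combined with AND, and a call with no filters purges nothing.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Record:
    id: str
    scope: str
    created_at: datetime
    categories: list[str] = field(default_factory=list)


def in_scope(record_scope: str, scope: str) -> bool:
    """True if record_scope equals scope or lies inside its subtree."""
    scope = scope.rstrip("/")
    return record_scope == scope or record_scope.startswith(scope + "/")


def forget(records, scope=None, older_than=None, categories=None, record_ids=None):
    """Return the surviving records; a record is purged only if it matches every given filter."""
    if scope is None and older_than is None and categories is None and record_ids is None:
        return list(records)  # Assumption: no filters means nothing is purged

    def matches(r: Record) -> bool:
        if scope is not None and not in_scope(r.scope, scope):
            return False
        if older_than is not None and not r.created_at < older_than:
            return False
        if categories is not None and not set(categories) & set(r.categories):
            return False
        if record_ids is not None and r.id not in record_ids:
            return False
        return True

    return [r for r in records if not matches(r)]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    Record("r1", "/project/alpha/decisions", now - timedelta(days=45)),
    Record("r2", "/project/alpha", now - timedelta(days=5)),
    Record("r3", "/project/alphabet", now - timedelta(days=45)),
]
remaining = forget(records, scope="/project/alpha", older_than=now - timedelta(days=30))
print([r.id for r in remaining])  # ['r2', 'r3']
```

Matching on `scope + "/"` rather than a bare prefix is what keeps `/project/alphabet` from being purged along with `/project/alpha`.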

### Hierarchical Scopes

Memories are organized in filesystem-like paths: `/infrastructure/database`, `/compliance/eu`, `/project/alpha/decisions`. The LLM auto-assigns scopes during encoding, building a self-organizing hierarchy.

Two access patterns:
- **`MemoryScope`** — Restricts an agent to a subtree (e.g., `memory.scope("/agent/researcher")`)
- **`MemorySlice`** — Reads from multiple disjoint branches (e.g., `memory.slice(["/compliance", "/security"])`)
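
The two access patterns can be sketched over a flat list of `(scope, content)` pairs. `MemoryStore`, `ScopeView`, and `SliceView` below are hypothetical stand-ins written for this issue, not CrewAI's `MemoryScope` and `MemorySlice` classes, and treating a slice as read-only is an assumption.

```python
def under(path: str, root: str) -> bool:
    """True if path equals root or lies inside its subtree."""
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


class MemoryStore:
    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []  # (scope_path, content)

    def remember(self, content: str, scope: str) -> None:
        self.records.append((scope, content))

    def scope(self, root: str) -> "ScopeView":
        return ScopeView(self, root)

    def slice(self, roots: list[str]) -> "SliceView":
        return SliceView(self, roots)


class ScopeView:
    """Read/write view restricted to one subtree."""

    def __init__(self, store: MemoryStore, root: str) -> None:
        self.store = store
        self.root = root.rstrip("/")

    def remember(self, content: str, scope: str = "") -> None:
        # Paths are relative to the subtree root (opaque strings, no ".." handling)
        relative = scope.strip("/")
        full = f"{self.root}/{relative}" if relative else (self.root or "/")
        self.store.remember(content, full)

    def recall_all(self) -> list[str]:
        return [c for s, c in self.store.records if under(s, self.root)]


class SliceView:
    """Read-only view over several disjoint branches."""

    def __init__(self, store: MemoryStore, roots: list[str]) -> None:
        self.store = store
        self.roots = roots

    def recall_all(self) -> list[str]:
        return [c for s, c in self.store.records
                if any(under(s, r) for r in self.roots)]


store = MemoryStore()
store.remember("GDPR applies to EU users", "/compliance/eu")
store.remember("Rotate API keys quarterly", "/security/keys")
store.remember("Staging DB is MySQL", "/infrastructure/database")

researcher = store.scope("/agent/researcher")
researcher.remember("Found three candidate vector stores", "notes")

print(researcher.recall_all())
print(store.slice(["/compliance", "/security"]).recall_all())
```

The researcher view sees only its own note, while the slice returns the compliance and security entries and nothing from `/infrastructure`.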

### Key Design Decisions

1. **Structure emerges from content** — No predefined schema. The LLM infers scope, categories, importance. This avoids the "8 fixed types" problem where the agent needs to learn a taxonomy.
2. **Consolidation over contradiction edges** — Instead of maintaining "Contradicts" graph edges (Spacebot), resolve conflicts at write time via LLM. Cleaner data, fewer stale edges.
3. **Confidence-aware retrieval** — The system tells you when it's not sure, rather than silently returning low-quality matches.
4. **Non-blocking writes** — `remember_many()` runs encoding in a background thread; `recall()` auto-drains pending writes first. Good for performance-sensitive paths.
5. **Graceful degradation** — Every LLM call in the pipeline has a fallback (safe defaults on failure). If the analysis LLM is down, memories still get stored with default importance/scope.
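
Decision 4 is the least obvious to implement, so here is a minimal sketch of the pattern using only the standard library. `BackgroundMemory` is a hypothetical class, the `time.sleep` stands in for embedding and LLM analysis, keyword matching stands in for vector search, and the sketch assumes a single caller thread.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class BackgroundMemory:
    def __init__(self) -> None:
        self._records: list[str] = []
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: list[Future] = []

    def _encode_and_store(self, items: list[str]) -> None:
        time.sleep(0.05)  # stand-in for embedding + LLM analysis
        with self._lock:
            self._records.extend(items)

    def remember_many(self, items: list[str]) -> None:
        """Queue the batch for background encoding and return immediately."""
        future = self._executor.submit(self._encode_and_store, list(items))
        self._pending.append(future)

    def drain(self) -> None:
        """Block until every queued write has finished."""
        pending, self._pending = self._pending, []
        for future in pending:
            future.result()  # re-raises any exception from the worker

    def recall(self, keyword: str) -> list[str]:
        self.drain()  # reads always see previously queued writes
        with self._lock:
            return [r for r in self._records if keyword.lower() in r.lower()]

    def close(self) -> None:
        self._executor.shutdown(wait=True)


memory = BackgroundMemory()
memory.remember_many(["We use MySQL", "Compliance requires EU data residency"])
print(memory.recall("mysql"))  # ['We use MySQL']
memory.close()
```

Draining before every read trades a little recall latency for read-your-writes consistency, which matters when an agent stores a fact and queries it in the same turn.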

---

## Current State in Hermes Agent

### What We Have

| Component | Current Implementation | Gap |
|:---|:---|:---|
| **Memory write** | Manual `add` with flat text entry | No auto-classification, no contradiction detection |
| **Memory read** | Substring matching (old_text) | No semantic search, no confidence scoring |
| **Session recall** | FTS5 keyword search + LLM summarization | Good for sessions, not for structured memory |
| **Memory organization** | Two buckets: memory / user | No hierarchy, no categories, no scopes |
| **Capacity** | Hard 2200/1375 char limits | Forces manual pruning, currently at 97% |
| **Conflict resolution** | Manual replace by agent | Agent must notice and resolve conflicts itself |
| **Compaction** | Summarizes old context | Doesn't extract memories from compressed content |
| **Forgetting** | Manual `remove` | No automatic decay or pruning |

### Relevant Existing Issues
- **#346** — Structured Memory System: storage infrastructure (typed nodes, graph edges, hybrid search). This issue's cognitive layer builds on #346's foundation.
- **#377** — Shared Memory Pools: multi-agent memory sharing. Complementary — scopes/slices from this issue could enable shared pools.
- **#362** — PAHF Personalization Loop: learning from user feedback. Memory extraction could capture feedback as memories.
- **#480** — Context Condensation: LLM-based compaction. Memory extraction during compaction is a natural integration point.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **core codebase change** because:
- It extends `tools/memory_tool.py` with new operations (recall, extract, forget)
- It requires LLM integration inside the memory tool (not just storing text the agent provides)
- It modifies the memory storage layer (new fields: scope, categories, importance, embeddings)
- It integrates with context compression (automatic memory extraction)
- It needs embedding infrastructure as a core dependency
- It must work across all platforms (CLI, Telegram, Discord) uniformly

### What We'd Need

1. **Embedding infrastructure** — Local (fastembed/sentence-transformers) or API-based (OpenAI). LanceDB as vector store (Apache 2.0, already validated by both Spacebot and CrewAI).
2. **Auxiliary LLM calls** — Use the existing auxiliary client (`agent/auxiliary_client.py`, same pattern as session_search) for encoding analysis, consolidation, extraction, and query analysis.
3. **Extended memory tool** — New actions: `recall` (semantic search), `extract` (decompose text), `forget` (targeted purge). Existing actions (add/replace/remove) remain backward compatible.
4. **Encoding pipeline** — On `add`, optionally run LLM analysis for auto-scope/categories/importance and check for contradictions.
5. **Memory schema upgrade** — Extend SQLite storage (or migrate to LanceDB) with scope, categories, importance, embedding, timestamps.
6. **Compaction integration** — During context compression, run `extract_memories` on the compressed content and store atomic facts.

### Phased Rollout

**Phase 1: Cognitive Encoding — Auto-Classification + Contradiction Resolution**
- Extend `memory_tool.py` with optional LLM analysis on `add` (auto-infer scope, categories, importance if not provided by agent)
- Add consolidation check: before inserting, search for similar existing memories. If similarity ≥0.85, run LLM consolidation to resolve conflicts.
- Add `scope` and `importance` parameters to the memory tool schema
- Migrate from flat files to SQLite-backed storage with scope, categories, importance columns
- Keep backward compatibility: existing add/replace/remove still work
- Configurable: `memory.cognitive: true|false` in config.yaml (proposed default: true; see the opt-in vs. opt-out item under Open Questions)
- **Deliverable:** Memory entries auto-classified with scope/importance, contradictions auto-resolved

**Phase 2: Semantic Recall + Composite Scoring**
- Add embeddings to memory entries (LanceDB or SQLite + fastembed)
- New `recall` action: semantic search with composite scoring (similarity × w_sim + recency × w_rec + importance × w_imp)
- Confidence-based depth: simple queries do pure vector search, complex queries use LLM query analysis
- Results include `match_reasons` and `evidence_gaps`
- Expose configurable weights in config.yaml
- **Deliverable:** Agent can recall semantically (not just substring match), with confidence-aware retrieval

**Phase 3: Automatic Memory Extraction + Forgetting**
- `extract` action: decompose text blobs into atomic facts
- Integration with context compression: when compacting, extract memories from compressed content
- Integration with task completion: option to extract memories from tool outputs
- `forget` action: targeted purge by scope, age, categories
- Automatic importance decay: periodic maintenance (daily) that decays stale memories
- Pruning: soft-delete memories below importance threshold after configurable days
- **Deliverable:** Self-maintaining memory system that captures knowledge from operations and prunes stale information
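
The decay and pruning items in Phase 3 could be combined into one maintenance pass, sketched below. Every name and number here is an assumption for discussion: `Entry`, `run_maintenance`, the 90-day half-life, the 0.1 threshold, and the 30-day minimum age.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    content: str
    importance: float         # base importance assigned at encoding time
    days_since_access: float
    deleted: bool = False     # soft-delete flag; the row is kept for recovery


def run_maintenance(entries: list[Entry], half_life_days: float = 90.0,
                    prune_below: float = 0.1, min_age_days: float = 30.0) -> int:
    """Soft-delete stale, low-importance entries and return how many were pruned."""
    pruned = 0
    for entry in entries:
        if entry.deleted:
            continue
        # Effective importance is derived on the fly, so repeated runs do not compound
        effective = entry.importance * 0.5 ** (entry.days_since_access / half_life_days)
        if effective < prune_below and entry.days_since_access >= min_age_days:
            entry.deleted = True
            pruned += 1
    return pruned


entries = [
    Entry("Architecture decision: MySQL for user data", importance=0.9, days_since_access=180),
    Entry("Fixed a typo in the README", importance=0.15, days_since_access=120),
    Entry("Fresh low-value note", importance=0.05, days_since_access=2),
]
print(run_maintenance(entries))  # 1
print([e.content for e in entries if e.deleted])
```

Only the README note is pruned: the architecture decision is old but important (effective importance 0.225), and the fresh note is protected by the minimum age even though its importance is low.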

**Phase 4: Scoped Access + Multi-Agent Memory**
- Hierarchical scope paths with MemoryScope (restrict to subtree) and MemorySlice (read from multiple branches)
- Per-user memory scoping for gateway platforms (Telegram/Discord users get their own memory subtrees)
- Integration with #377 (shared memory pools) — sub-agents can share a memory scope
- Memory tree visualization: `tree` command showing scope hierarchy with counts
- **Deliverable:** Multi-agent, multi-user memory with access control
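
The `tree` command from Phase 4 is mostly string handling. A possible sketch, where `scope_tree` and the output format are assumptions, and each count covers a node's whole subtree:

```python
from collections import Counter


def scope_tree(scopes: list[str]) -> str:
    """Render scope paths as an indented tree with per-subtree record counts."""
    counts: Counter = Counter()
    for scope in scopes:
        parts = [p for p in scope.split("/") if p]
        # Count the record once for each ancestor prefix, including the scope itself
        for depth in range(1, len(parts) + 1):
            counts["/" + "/".join(parts[:depth])] += 1
    lines = [f"/ ({len(scopes)})"]
    # Sorting by path segments keeps each parent directly above its children
    for path in sorted(counts, key=lambda p: p.split("/")):
        depth = path.count("/")
        name = path.rsplit("/", 1)[1]
        lines.append(f"{'  ' * depth}{name}/ ({counts[path]})")
    return "\n".join(lines)


print(scope_tree([
    "/infrastructure/database",
    "/infrastructure/database",
    "/compliance/eu",
    "/project/alpha/decisions",
]))
```

For the four sample records this prints a root line `/ (4)` followed by the `compliance`, `infrastructure`, and `project` subtrees, with `infrastructure/ (2)` and `database/ (2)` reflecting the two database memories.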

---

## Pros & Cons

### Pros
- **Self-maintaining knowledge** — Contradictions auto-resolved instead of silently accumulating. The agent stops having to manually detect and fix conflicting memories.
- **No capacity ceiling** — Importance decay + pruning replaces hard character limits (currently at 97% capacity). Knowledge compounds instead of hitting walls.
- **Semantic retrieval** — Find relevant memories by meaning, not just keyword overlap. "What database are we using?" matches "We migrated to MySQL" even without shared keywords.
- **Automatic knowledge capture** — Memory extraction from task outputs and context compression means less information is lost between sessions.
- **Confidence awareness** — The system tells you when retrieval is uncertain, enabling better decision-making.
- **Backward compatible** — Phase 1 maintains existing add/replace/remove interface. Cognitive features are additive.
- **MIT-licensed reference** — CrewAI's Python implementation (~3,500 lines) can be freely studied and adapted.
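- **Proven at scale** — CrewAI reports billions of agentic executions, so the design comes from a widely deployed framework. The unified memory system itself is a newer addition (PR #4420), so its own track record is shorter than the framework's.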
- **Configurable** — Weights, thresholds, half-life, and depth are all tunable per deployment.

### Cons / Risks
- **LLM cost on every write** — Encoding analysis adds 1-2 LLM calls per `remember()`. For high-volume scenarios, this could be expensive. Mitigations: use cheap models (gpt-4o-mini), skip analysis when all fields provided (Group A path), batch processing.
- **Latency on writes** — LLM calls add an estimated ~200-500ms per memory write, depending on model and provider. Mitigation: non-blocking writes for batch operations.
- **Complexity jump** — Moving from flat text files to LLM-powered cognitive pipelines is a major architectural change. Must be phased carefully to avoid breaking the current simple workflow.
- **Dependency on auxiliary LLM** — Encoding/consolidation/extraction require a working LLM. If the auxiliary model is unavailable, cognitive features degrade to simple storage. CrewAI handles this with graceful fallbacks — we should too.
- **Embedding model dependency** — Vector search requires either an API-based embedder (adds API cost) or a local model (adds ~100MB disk + memory). Decision needed.
- **Testing complexity** — Cognitive operations are non-deterministic (LLM-dependent). Need mocked tests + integration tests with real models.
- **Risk of over-classification** — LLM might assign incorrect scopes or importance. Bad classifications could make recall worse. Mitigated by configurable overrides and fallback defaults.
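
Several of these risks share one mitigation: never let a failed or implausible LLM analysis block a write. A minimal sketch of that fallback, where `Analysis`, `analyze_with_fallback`, and the default values are assumptions rather than CrewAI's actual fallback code:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Analysis:
    scope: str = "/"
    categories: list[str] = field(default_factory=list)
    importance: float = 0.5
    degraded: bool = False  # True when safe defaults were used


def analyze_with_fallback(content: str,
                          analyze: Callable[[str], Analysis]) -> Analysis:
    """Run the LLM analysis, falling back to safe defaults on any failure."""
    try:
        result = analyze(content)
    except Exception:
        # Auxiliary model is down or returned unparseable output: store anyway
        return Analysis(degraded=True)
    # Clamp importance so a bad classification cannot leave the 0.0-1.0 range
    result.importance = min(1.0, max(0.0, result.importance))
    return result


def broken_llm(content: str) -> Analysis:
    raise TimeoutError("auxiliary model unavailable")


print(analyze_with_fallback("We use MySQL", broken_llm))
```

Marking the result as `degraded` lets a later maintenance pass re-run analysis on entries that were stored while the auxiliary model was unavailable.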

---

## Open Questions

- **Embedding model: local vs. API?** Local (fastembed with all-MiniLM-L6-v2, 384 dims, ~100MB) is private and fast. API (OpenAI text-embedding-3-small, 1536 dims) is higher quality but adds cost and latency. Could support both with a config flag.
- **Should cognitive encoding be opt-in or opt-out?** Default enabled with `memory.cognitive: true` in config, or default disabled requiring explicit opt-in? Given the LLM cost, opt-in might be safer initially.
- **Consolidation threshold?** CrewAI uses 0.85 cosine similarity. Too low = false positives (merging unrelated memories). Too high = missed contradictions. Should be configurable.
- **How should scoped memory interact with current MEMORY.md/USER.md distinction?** Replace the two-file split with scopes (`/agent/notes` and `/user/profile`) or keep them as special-case scopes?
- **Integration with session_search?** Should `recall` also search session transcripts, or keep structured memory and session recall as separate tools?
- **Storage backend?** LanceDB (validated by both Spacebot and CrewAI) vs. extending the existing SQLite state.db with vector operations (analogous to what pgvector adds to PostgreSQL, e.g. via an extension such as sqlite-vec)?

---

## References

- [CrewAI Cognitive Memory source code](https://github.com/crewAIInc/crewAI/tree/main/lib/crewai/src/crewai/memory) (~3,500 lines, MIT license)
- [CrewAI blog post: How we built Cognitive Memory for Agentic Systems](https://blog.crewai.com/how-we-built-cognitive-memory-for-agentic-systems/)
- [CrewAI PR #4420: New Unified Memory System](https://github.com/crewAIInc/crewAI/pull/4420)
- [LanceDB](https://github.com/lancedb/lancedb) (Apache 2.0, embedded vector database)
- [CrewAI Memory Documentation](https://docs.crewai.com/en/concepts/memory)
- Hermes Agent #346 — Structured Memory System (complementary storage layer)
- Hermes Agent #377 — Shared Memory Pools (complementary multi-agent memory)
- Hermes Agent #480 — Context Condensation (integration point for memory extraction)

