Feature: Subconscious Observer Agent — Background Memory Processing, Cross-Session Pattern Detection & Proactive Guidance (inspired by Letta claude-subconscious)

## Overview

[Letta AI's claude-subconscious](https://github.com/letta-ai/claude-subconscious) (MIT, TypeScript, v1.5.1) introduces a fundamentally different approach to persistent memory: instead of the main agent managing its own memory (our current approach), a **separate "subconscious" agent** observes session transcripts asynchronously, extracts patterns, and injects guidance — without the conscious agent needing to think about memory at all.

This is inspired by MemGPT's stateful agent architecture and Letta's [sleep-time compute](https://www.letta.com/blog/letta-code) concept, where agents process information during downtime to form new connections. The key architectural insight is **separation of concerns**: the conscious agent focuses on the task; the subconscious agent handles learning, pattern detection, and proactive guidance.

Hermes already has the foundational infrastructure to implement this natively — `auxiliary_client` for cheap background LLM calls, `flush_memories` for pre-compression processing, `session_search` for cross-session recall, and the memory tool for persistent storage. What's missing is the **observer architecture** that ties these together into an autonomous memory processing pipeline.

---

## Research Findings

### How claude-subconscious Works

The plugin registers 4 hook points in Claude Code's lifecycle:

1. **SessionStart** — Notifies the Letta agent, syncs initial memory blocks
2. **UserPromptSubmit** — Injects memory blocks + agent messages via stdout; sends user's prompt to Letta as "early notification" so it can start processing while Claude works
3. **PreToolUse (checkpoint)** — At natural pause points (when Claude is about to ask the user a question), sends the full transcript to Letta and **blocks for 2-5 seconds** waiting for guidance. The guidance is injected as `additionalContext` before the tool executes, giving the subconscious advisory power at decision points
4. **Stop** — Fire-and-forget background worker sends the full transcript to Letta asynchronously (detached Node process that survives the hook's exit)

The Letta agent has **8 structured memory blocks** (each limited to 20K characters):
- `core_directives` — Role and behavior rules
- `guidance` — Active advice for the current session (the primary output channel)
- `user_preferences` — Learned coding styles
- `project_context` — Architecture decisions, key files
- `session_patterns` — Recurring struggles, time-based patterns
- `pending_items` — Unfinished work, TODOs
- `self_improvement` — Meta-learning guidelines
- `tool_guidelines` — How to use its own tools effectively
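
To make the block structure concrete, here is a small Python sketch of labeled, size-limited blocks. The `MemoryBlock` class and its `update` method are hypothetical, and rejecting oversized writes is an assumption: the source does not say how Letta handles a block that exceeds its limit.

```python
from dataclasses import dataclass

BLOCK_CHAR_LIMIT = 20_000  # per-block limit reported above


@dataclass
class MemoryBlock:
    label: str
    value: str = ""
    limit: int = BLOCK_CHAR_LIMIT

    def update(self, new_value: str) -> None:
        """Replace the block's content, rejecting oversized values (assumed policy)."""
        if len(new_value) > self.limit:
            raise ValueError(
                f"{self.label}: {len(new_value)} chars exceeds limit {self.limit}"
            )
        self.value = new_value


# The eight block labels listed above.
BLOCK_LABELS = [
    "core_directives", "guidance", "user_preferences", "project_context",
    "session_patterns", "pending_items", "self_improvement", "tool_guidelines",
]

blocks = {label: MemoryBlock(label) for label in BLOCK_LABELS}
blocks["guidance"].update("This project uses pnpm, not npm.")
```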

**Transcript processing** is efficient: thinking is truncated to 500 chars, tool results to 1500 chars, and tool inputs are reduced to their most informative field (file operations → file path, bash → command, search → query). The formatted transcript is sent as XML:

```xml
<claude_code_session_update>
  <session_id>...</session_id>
  <transcript>
    <message role="user">...</message>
    <message role="claude_code">...</message>
  </transcript>
  <instructions>You may provide commentary or guidance...</instructions>
</claude_code_session_update>
```
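
The truncation rules and the XML envelope above could be reproduced in Python roughly as follows. This is a sketch under assumptions: the message dictionary shape (`role`, `kind`, `content`, `tool`, `args`) and the argument keys `file_path`, `command`, and `query` are hypothetical, and the instructions text is left elided as in the source.

```python
from xml.sax.saxutils import escape, quoteattr

THINKING_LIMIT = 500       # chars, per the description above
TOOL_RESULT_LIMIT = 1500   # chars, per the description above


def truncate(text: str, limit: int) -> str:
    """Cut text to `limit` chars, marking the cut with an ellipsis."""
    return text if len(text) <= limit else text[:limit] + "..."


def summarize_tool_input(tool: str, args: dict) -> str:
    """Reduce a tool call to one informative field (key names are hypothetical)."""
    for key in ("file_path", "command", "query"):
        if key in args:
            return f"{tool}: {args[key]}"
    return tool


def format_session_update(session_id: str, messages: list[dict]) -> str:
    """Build the XML payload, applying per-kind truncation to each message."""
    lines = [
        "<claude_code_session_update>",
        f"  <session_id>{escape(session_id)}</session_id>",
        "  <transcript>",
    ]
    for msg in messages:
        kind = msg.get("kind", "text")
        if kind == "thinking":
            body = truncate(msg["content"], THINKING_LIMIT)
        elif kind == "tool_result":
            body = truncate(msg["content"], TOOL_RESULT_LIMIT)
        elif kind == "tool_use":
            body = summarize_tool_input(msg["tool"], msg["args"])
        else:
            body = msg["content"]
        role = quoteattr(msg["role"])  # adds the surrounding quotes
        lines.append(f"    <message role={role}>{escape(body)}</message>")
    lines += [
        "  </transcript>",
        "  <instructions>You may provide commentary or guidance...</instructions>",
        "</claude_code_session_update>",
    ]
    return "\n".join(lines)
```

Escaping matters here because transcripts routinely contain `<`, `>`, and `&` from code snippets, which would otherwise corrupt the envelope.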

### Key Design Decisions

1. **Diff-based memory injection**: First prompt gets full memory blocks. Subsequent prompts only show line-level diffs of changed blocks, minimizing context waste.

2. **Conversation multiplexing**: One Letta agent serves ALL projects. Each Claude Code session gets its own Letta "conversation" (thread), but memory blocks are shared globally. Learning from project-A automatically benefits project-B.

3. **Three operating modes**: `whisper` (messages only), `full` (blocks + messages), `off`. Default is whisper — minimal intrusion.

4. **The subconscious has personality**: System prompt establishes it as a "persistent presence that builds rapport" — not a logging service. It can "share partial thoughts, have opinions, express curiosity."

5. **Checkpoint intervention**: The blocking checkpoint on `AskUserQuestion` is the most interesting pattern. The subconscious can modify Claude's behavior at decision points by injecting advisory context before Claude asks its question.
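
Diff-based injection (decision 1) can be illustrated with Python's standard `difflib`. This is a sketch of the general technique, not the plugin's actual diff format, which the source does not specify; `render_injection` is a hypothetical name.

```python
import difflib


def render_injection(previous: dict[str, str], current: dict[str, str]) -> str:
    """Emit full text for blocks not seen before, line diffs for changed ones."""
    parts = []
    for label, value in current.items():
        old = previous.get(label)
        if old is None:
            # First time the agent sees this block: inject it whole.
            parts.append(f"<{label}>\n{value}\n</{label}>")
        elif old != value:
            diff = difflib.unified_diff(
                old.splitlines(), value.splitlines(),
                fromfile=label, tofile=label, lineterm="", n=0,
            )
            parts.append("\n".join(diff))
        # Unchanged blocks contribute nothing to the prompt.
    return "\n\n".join(parts)


seen = {"guidance": "Project uses npm\nRun tests before committing"}
now = {"guidance": "Project uses pnpm\nRun tests before committing"}
update = render_injection(seen, now)
```

Here `update` contains only the removed `npm` line and the added `pnpm` line plus diff headers, so an unchanged second line costs no context.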

---

## Current State in Hermes Agent

**What we have:**
- **MEMORY.md + USER.md**: Flat-file, free-form entries separated by `§`, character-limited (~3.5K chars total). The main agent manually decides what to save via the `memory` tool. ([tools/memory_tool.py](https://github.com/NousResearch/hermes-agent/blob/main/tools/memory_tool.py))
- **flush_memories**: Pre-compression mechanism that injects a system message asking the agent to save anything worth remembering, then makes ONE LLM call with only the memory tool available. ([run_agent.py](https://github.com/NousResearch/hermes-agent/blob/main/run_agent.py))
- **Memory nudges**: Every N turns, appends a system reminder to consider saving memories.
- **Frozen snapshot pattern**: Memory is loaded once at session start; writes go to disk but don't update the system prompt mid-session, which keeps the prompt prefix stable so the KV cache can be reused.
- **Session search**: FTS5-indexed SQLite database of all past conversations, searchable via `session_search` tool with LLM-summarized results.
- **Skills**: Procedural memory as SKILL.md files with templates and scripts.
- **auxiliary_client**: Cheaper model (e.g., Gemini Flash) already available for background LLM tasks.
- **Honcho integration**: Optional external cross-session user modeling.

**What's missing (the gap):**
- No **automatic** memory processing — the agent must consciously decide to use the memory tool
- No **pattern detection** across sessions — the agent must manually search past sessions
- No **structured memory blocks** — everything is free-form text
- No **proactive guidance** — no mechanism to prepare insights for the next session
- No **background processing** — flush_memories runs synchronously in the main loop
- No **diff-based injection** — full memory snapshot injected every time
- The current `flush_memories` is a single rushed API call at compression time. It is a "last chance save", not a thoughtful observer

**Related existing issues:**
- #346 — Structured Memory System (typed nodes, graph edges, hybrid search) — provides the **storage layer** this observer would write to
- #509 — Cognitive Memory Operations (encode, consolidate, recall, extract, forget) — provides the **operations** the observer would perform
- #500 — Proactive Agent Context Loop (signal detection, context injection) — provides **real-time perception** signals
- #362 — PAHF Personalization Loop (preference learning from feedback) — provides **preference extraction** logic

**How this issue differs**: All existing issues improve memory *primitives* but assume the **same agent** manages its own memory within the conversation loop. This issue proposes a **separate observer process** that runs asynchronously, processing transcripts with a dedicated LLM call and updating memory independently. It's an architectural pattern that layers on top of whatever memory storage and operations we build.

---

## Implementation Plan

### Skill vs. Tool Classification

This should be a **core codebase change**, not a skill or tool. Reasons:
- It requires deep integration with the agent lifecycle (session start, compression, session end)
- It needs access to the full conversation transcript and session database
- It runs automatically in the background without explicit invocation
- It modifies the system prompt injection (structured blocks, diffs)
- It manages the auxiliary_client for background LLM calls

The observer would be a new module (e.g., `agent/subconscious.py`) integrated into `run_agent.py`'s lifecycle hooks.

### What We'd Need

- New module: `agent/subconscious.py` — Observer agent logic
- Extended memory storage: Structured blocks instead of/alongside free-form entries
- Modified prompt injection: Block-based memory with optional diff mode
- New config section: `subconscious` with enable/disable, block definitions, processing triggers
- Integration points in `run_agent.py`: post-session, pre-compression, session-start

### Phased Rollout

**Phase 1: Post-Session Observer (MVP)**
- After session end or context compression, spawn an auxiliary LLM call (Gemini Flash / cheap model)
- Feed it the full conversation transcript, truncated and summarized as claude-subconscious does (thinking to 500 chars, tool results to 1500 chars, tool inputs summarized)
- Prompt it to extract and categorize information into structured blocks:
  - `user_preferences` — coding style, tool preferences, communication patterns
  - `project_context` — architecture decisions, key files, tech stack
  - `pending_items` — unfinished work, TODOs, follow-ups
  - `session_patterns` — recurring topics, common errors, time patterns
  - `guidance` — proactive suggestions for the next session
- Write extracted information to structured memory storage (could start as extended MEMORY.md sections or new files in `~/.hermes/memories/`)
- Replace the current one-shot `flush_memories` with this richer processing pipeline
- **Config**: `subconscious.enabled: true`, `subconscious.model: auto` (uses auxiliary_client), `subconscious.blocks: [user_preferences, project_context, pending_items, session_patterns, guidance]`
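
A rough Python sketch of the Phase 1 pipeline follows. Everything named here is an assumption for illustration: `observe_session`, the prompt wording, the `call_llm` callable (a stand-in for however `auxiliary_client` is actually invoked), and the one-file-per-block layout. For brevity it overwrites each block instead of merging with prior content.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

BLOCKS = ["user_preferences", "project_context", "pending_items",
          "session_patterns", "guidance"]

PROMPT = (
    "Read the session transcript and return a JSON object with the keys "
    + ", ".join(BLOCKS)
    + ". Each value is a short string; use an empty string when nothing applies.\n\n"
)


def observe_session(
    transcript: str,
    call_llm: Callable[[str], str],  # hypothetical wrapper around auxiliary_client
    memory_dir: Path,
) -> dict[str, str]:
    """Extract structured blocks from a transcript and write one file per block."""
    try:
        raw = json.loads(call_llm(PROMPT + transcript))
    except Exception:
        return {}  # observer failure must not affect the main agent
    if not isinstance(raw, dict):
        return {}
    extracted = {}
    for label in BLOCKS:
        text = str(raw.get(label) or "").strip()
        if text:
            extracted[label] = text
    memory_dir.mkdir(parents=True, exist_ok=True)
    for label, text in extracted.items():
        (memory_dir / f"{label}.md").write_text(text + "\n", encoding="utf-8")
    return extracted


# Demo with a stub model so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return json.dumps({"pending_items": "Finish auth refactor", "guidance": ""})


with tempfile.TemporaryDirectory() as tmp:
    result = observe_session("user: ...", fake_llm, Path(tmp))
    saved = (Path(tmp) / "pending_items.md").read_text(encoding="utf-8")
```

Unknown keys in the model's reply are ignored and empty values are dropped, so a malformed or partial response degrades to "nothing learned" instead of corrupting stored memory.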

**Phase 2: Cross-Session Pattern Detection + Guidance Injection**
- At session start, the observer reviews the last N sessions (via session DB) and produces a `guidance` block
- Pattern detection: "You've been debugging auth for 3 sessions", "User always runs tests before committing", "This project uses pnpm not npm"
- Guidance injected into system prompt as a structured block (separate from MEMORY/USER sections)
- Diff-based injection: Track what the agent has already seen, only inject changes
- `pending_items` block highlights unfinished work from the previous session
- **Config**: `subconscious.session_start_review: true`, `subconscious.review_depth: 3` (sessions)
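
One simple way to approximate the pattern detection described above is to count topics that recur across the last `review_depth` sessions. The sketch below assumes each session has already been reduced to a set of topic tags (for example by the Phase 1 observer); `recurring_topics`, `build_guidance`, and the `min_sessions` threshold are hypothetical.

```python
from collections import Counter


def recurring_topics(sessions: list[set[str]], review_depth: int = 3,
                     min_sessions: int = 2) -> list[str]:
    """Topics present in at least `min_sessions` of the last `review_depth` sessions."""
    if review_depth <= 0:
        return []
    recent = sessions[-review_depth:]
    counts = Counter(topic for topics in recent for topic in topics)
    return sorted(t for t, n in counts.items() if n >= min_sessions)


def build_guidance(sessions: list[set[str]], pending_items: list[str],
                   review_depth: int = 3) -> str:
    """Render a guidance block from recurring topics and unfinished work."""
    lines = [f"- Recurring topic across recent sessions: {t}"
             for t in recurring_topics(sessions, review_depth)]
    lines += [f"- Unfinished from last session: {item}" for item in pending_items]
    return "\n".join(lines)


history = [{"auth", "tests"}, {"auth"}, {"auth", "docker"}, {"docker"}]
guidance = build_guidance(history, ["Finish auth refactor"])
```

A real implementation would likely let the auxiliary model phrase the guidance; the counting step only decides which patterns are worth mentioning.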

**Phase 3: Real-time Advisory + Sleep-time Compute**
- During long sessions, periodically process the current transcript (every N turns or at compression)
- Inject mid-session guidance: "Consider saving this helper function as a skill", "The user seems frustrated — the error is probably in X"
- **Sleep-time compute**: Schedule a cron job that processes recent sessions during downtime, consolidates memory blocks, resolves contradictions, and prepares briefings
- Optional: Checkpoint-style intervention before `clarify` tool calls (like claude-subconscious's AskUserQuestion hook)
- **Config**: `subconscious.realtime: false` (opt-in), `subconscious.sleep_compute: false` (opt-in)
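
The "every N turns or at compression" trigger can be sketched as a small counter. `RealtimeTrigger` and the default of 10 turns are hypothetical; the `enabled` flag mirrors the opt-in `subconscious.realtime` setting above.

```python
class RealtimeTrigger:
    """Decides when the observer should re-read the live transcript."""

    def __init__(self, every_n_turns: int = 10, enabled: bool = False):
        self.every_n_turns = every_n_turns
        self.enabled = enabled  # mirrors subconscious.realtime (opt-in)
        self._turns_since_run = 0

    def on_turn(self, compressing: bool = False) -> bool:
        """Call once per completed turn; True means the observer should run now."""
        if not self.enabled:
            return False
        self._turns_since_run += 1
        if compressing or self._turns_since_run >= self.every_n_turns:
            self._turns_since_run = 0  # restart the interval after each run
            return True
        return False


trigger = RealtimeTrigger(every_n_turns=3, enabled=True)
decisions = [trigger.on_turn() for _ in range(6)]
```

Resetting the counter on compression avoids running the observer twice in quick succession when a compression happens just before the turn threshold.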

---

## Pros & Cons

### Pros
- **Zero cognitive overhead**: The main agent focuses on tasks instead of memory management. No more "did I remember to save that?" — the observer handles it automatically
- **Richer memory**: A dedicated observer with the full transcript extracts more information than the current rushed flush_memories single-call approach
- **Cross-session continuity**: Pattern detection and guidance blocks help the agent pick up where it left off, which is especially valuable for gateway/messaging sessions that reset frequently
- **Cheap to run**: The observer uses `auxiliary_client` (Gemini Flash or similar), so the cost should be on the order of pennies per session. Transcript truncation keeps the input small
- **Builds on existing infrastructure**: No new external dependencies. Uses auxiliary_client, session DB, memory tool, and config system already in place
- **Complementary**: Layers cleanly on top of whatever memory storage (#346) and cognitive operations (#509) we build. The observer is the "when/how to process" layer; those issues define "what to store/retrieve"
- **Proven pattern**: Letta's implementation validates the architecture. Their blog reports meaningful improvements in cross-session task continuation
- **Cache-friendly**: Structured blocks injected at the start of the system prompt (stable position) are more cache-friendly than free-form entries that change unpredictably

### Cons / Risks
- **Extra API cost**: Each session end triggers a background LLM call. For heavy users (50+ sessions/day on gateway), this adds up. Mitigation: configurable, use cheapest available model, skip short sessions
- **Latency at session start**: Phase 2's session-start review adds an LLM call before the first response. Mitigation: pre-compute during sleep-time, cache results, make it async
- **Memory staleness**: If the observer runs only at session end, its knowledge is always one session behind. Phase 3 addresses this with real-time processing but adds complexity
- **Contradictory guidance**: The observer might provide guidance that contradicts the user's current intent (e.g., "you usually use npm" when the user has switched to pnpm). Mitigation: recency-weighted extraction, user can override via memory tool
- **Complexity**: Another moving part in the agent loop. Must be well-tested and fail gracefully (observer failure should never block the main agent)
- **Block size management**: Structured blocks need size limits and consolidation logic to prevent unbounded growth (claude-subconscious uses 20K per block — we'd want something smaller for context efficiency)
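
Three of the mitigations above (skip short sessions, fail gracefully, cap block size) are simple enough to sketch in Python. The thresholds `MIN_TURNS` and `BLOCK_CHAR_LIMIT` are hypothetical values, as are the function names, and `trim_block` assumes newer entries are appended at the end of a block.

```python
import logging
import threading
from typing import Callable, Optional

log = logging.getLogger("subconscious")

MIN_TURNS = 4            # hypothetical threshold for a "short" session
BLOCK_CHAR_LIMIT = 2000  # hypothetical; well below claude-subconscious's 20K


def trim_block(text: str, limit: int = BLOCK_CHAR_LIMIT) -> str:
    """Keep the most recent lines that fit within `limit` characters."""
    kept, size = [], 0
    for line in reversed(text.splitlines()):
        if size + len(line) + 1 > limit:  # +1 accounts for the newline
            break
        kept.append(line)
        size += len(line) + 1
    return "\n".join(reversed(kept))


def run_observer_safely(observer: Callable[[], None],
                        turn_count: int) -> Optional[threading.Thread]:
    """Run `observer` on a daemon thread; skip short sessions and log errors."""
    if turn_count < MIN_TURNS:
        return None

    def _guarded() -> None:
        try:
            observer()
        except Exception:
            log.exception("observer failed; main agent unaffected")

    thread = threading.Thread(target=_guarded, daemon=True)
    thread.start()
    return thread
```

A daemon thread will not survive process exit, so a production version would likely need something closer to the detached worker that claude-subconscious uses for its Stop hook.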

---

## Open Questions

- **Block granularity**: Should we start with the full 8-block structure from claude-subconscious, or begin with fewer blocks (e.g., just `guidance`, `user_preferences`, `pending_items`) and expand based on real usage?
- **Storage format**: Extend MEMORY.md with structured sections? New files per block in `~/.hermes/memories/`? Or wait for #346's structured storage?
- **Interaction with existing memory**: Should the observer replace MEMORY.md/USER.md entirely, or coexist alongside them? The manual memory tool is useful for explicit "remember this" — perhaps keep USER.md manual and make other blocks observer-managed?
- **Gateway vs CLI**: Gateway sessions are typically shorter and more frequent. Should the observer behave differently (e.g., batch-process recent sessions instead of processing each one)?
- **Multi-user**: For gateway deployments with multiple users, should each user get their own observer state? (Probably yes — this aligns with existing per-user session isolation)
- **Relationship to Honcho**: If Honcho integration is enabled, should the observer write to Honcho instead of/in addition to local blocks? Honcho already does some cross-session user modeling

---

## References

- [letta-ai/claude-subconscious](https://github.com/letta-ai/claude-subconscious) — MIT, TypeScript, v1.5.1. The plugin that inspired this issue
- [Letta V1 Agent Architecture](https://www.letta.com/blog/letta-v1-agent) — Blog post on the evolution from MemGPT to V1, including sleep-time compute
- [Letta Code](https://www.letta.com/blog/letta-code) — Memory-first coding agent with `/init`, `/remember`, and skill learning
- [MemGPT Paper](https://arxiv.org/abs/2310.08560) — Original research on LLM agents with persistent memory and self-editing context
- Related issues: #346 (Structured Memory), #509 (Cognitive Memory Ops), #500 (Proactive Context Loop), #362 (PAHF Personalization)
