[Critical UX] Memory persistence, token waste from session replay, state.db corruption, and environment hallucination  field report from heavy production use

## Context & Acknowledgment

First, I want to say: **Hermes is an extraordinary piece of work.** The skill system, persistent memory, session search, delegate_task subagents, the gateway architecture — it's the most capable CLI AI agent I've used. I run it daily for production software development (orchestrating a 3-actor email processing pipeline with DBOS, PostgreSQL, S3, Gmail API), and it consistently delivers. The team at Nous Research has built something genuinely special.

That said, after 3 weeks of heavy daily use (8+ hours/day on Claude Opus), I've hit a cluster of interrelated issues around **memory persistence and context management** that together cause severe token waste and, in one case, actual hallucination about the execution environment. I'm reporting them as one issue because they compound each other.

## Problem Summary

During a ~12-hour intensive session (Apr 5, 2026), I lost approximately **2.6M tokens (~69% of total consumption)** to context replay overhead, and at one point Hermes hallucinated that it was running in a cloud container with "outdated information" — after hours of productive work on my local WSL2 machine.

## Issue 1: Session Fragmentation Causes Exponential Token Replay

### What happens
Long CLI conversations get silently fragmented into multiple sessions. Each new session replays the **entire conversation history** as input tokens.

### Observed data (single day, Apr 5 2026)

**Conversation A** ("stale checker refactoring"):
- 15 sessions with the same first user message
- User message count grows: 10 → 11 → 15 → 20 → ... → 54 → 57
- File sizes: 170KB → 259KB → 434KB → 576KB → 728KB (each is the FULL history)
- **~1.9M tokens consumed, only ~190K were necessary (89% waste)**

**Conversation B** ("checker moving emails"):
- 9 sessions, same pattern
- **~1.1M tokens consumed, only ~165K necessary (84% waste)**

### Why it happens
When a session ends (auto-review trigger, max iterations reached, error, etc.), the next session replays the full message history to maintain context. The user doesn't see this — the conversation appears continuous in the terminal.

### Impact
With Claude Opus pricing, this turned a productive 12-hour workday into burning through an entire monthly API budget in one sitting. The user has no visibility into when session boundaries occur or how much replay is happening.

### Suggested fixes
- Show a visible indicator when a session boundary occurs ("Session resumed — N tokens of context replayed")
- Implement incremental context (compressed summary of prior turns instead of full replay)
- Warn when context replay exceeds a configurable threshold (e.g., 100K tokens)
- Related existing issue: #4379 (token overhead analysis), #2667 (context buffer)

## Issue 2: state.db Corruption Kills session_search

### What happens
The SQLite `state.db` database becomes corrupted during normal use, making `session_search` completely non-functional. The `PRAGMA integrity_check` reports malformed B-tree pages.

### Observed data
```
state.db (24MB) — integrity_check FAILED
  "Tree 12 page 5541: btreeInitPage() returns error code 11"
  Multiple corrupted pages in the messages table and FTS index
```

### Recovery
Manual recovery via `.dump` + filtering corrupted rows + FTS rebuild recovered 110/128 sessions and 6,645 messages. 18 sessions were permanently lost from the DB (though JSON session files on disk were intact).

### Why it matters
`session_search` is the ONLY way for Hermes to recall cross-session context. When it breaks, the agent loses all long-term recall, forcing the user to manually re-explain project context every session. For complex multi-day projects, this is devastating.

### Likely cause
WAL mode + concurrent writes from CLI + gateway + subagent processes accessing the same DB file. The symlink setup (state.db → hermes-sync/state.db) may contribute.

### Suggested fixes
- Add periodic `PRAGMA integrity_check` and auto-repair (the JSON session files can serve as source of truth)
- Use WAL2 mode or ensure proper locking across all processes accessing the DB
- Add a `hermes db repair` CLI command
- Consider making JSON session files the primary store with SQLite as a search index that can be rebuilt

## Issue 3: MEMORY.md Size Limit (2,200 chars) is Critically Small

### What happens
The persistent memory store (`MEMORY.md`) has a ~2,200 character limit. For a complex multi-service project (3 actors, PostgreSQL, S3, Gmail API, stale checker with 5 checks, multiple status enums, credential resolution patterns), this forces extreme compression that loses critical context.

### Real example
My MEMORY.md at 90% capacity contains compressed fragments like:
```
PG+S3, NO KAFKA/REDIS. PALS=orchestrator locks, DBOS partition_queue(concurrency=1). 
workflows/: reader_response.py+doctype_response.py+common.py. 
stale_checker.py: pg_try_advisory_lock(900100001), 5 checks(...)
```

This is the ENTIRE project architecture compressed into telegram-style abbreviations. Critical details that should be in memory (like "classification_status uses 'completed' NOT 'classified'") barely fit alongside everything else.

### Workaround
Skills serve as extended memory (~20KB+ each), but they're loaded on-demand and require trigger matching. They don't replace the "always present" nature of memory.

### Suggested fixes
- Increase default limit significantly (8KB+ for memory, 4KB+ for user profile)
- Related existing issue: #5320 (raise memory_char_limit defaults)
- Consider tiered memory: hot (always injected, small) + warm (loaded on keyword match, larger)

## Issue 4: Environment Hallucination in Long Sessions

### What happens
After hours of continuous work on a local WSL2 machine (`terminal.backend: local`), Hermes told the user they were running in a "cloud container with outdated information" — which was completely false.

### Root cause analysis
The terminal tool description contains phrases like:
> "cloud sandboxes may be cleaned up, idled out, or recreated between turns"

And `execute_code` runs in `/tmp/hermes_sandbox_*` paths. After 700K+ tokens of context, the model appears to confuse **tool description warnings** with its **actual execution environment**. Additionally, when subagents modify files but the main conversation has stale `read_file` results from earlier turns, the model may interpret the discrepancy as "being in a different environment" rather than "my cached context is stale."

### Impact
The user spent hours working productively, only to be told (incorrectly) that none of the work was reliable because "we're in a cloud environment with outdated files." This destroyed confidence in the session's output.

### Suggested fixes
- Inject a clear, authoritative `[ENVIRONMENT: local]` marker in each turn (not just in the tool descriptions)
- When `terminal.backend: local`, strip/modify the cloud sandbox warnings from tool descriptions
- Add a "context freshness" indicator — flag when file reads are older than N minutes in the conversation

## Environment

- Hermes Agent v0.6.0+
- Model: Claude Opus 4 (anthropic/claude-opus-4.6) via Anthropic API
- OS: Ubuntu 24.04 (WSL2 on Windows 11)
- Terminal backend: local
- Usage pattern: 8+ hours/day, heavy delegate_task usage, complex multi-file codebase work


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Critical UX] Memory persistence, token waste from session replay, state.db corruption, and environment hallucination field report from heavy production use #5563

Context & Acknowledgment

Problem Summary

Issue 1: Session Fragmentation Causes Exponential Token Replay

What happens

Observed data (single day, Apr 5 2026)

Why it happens

Impact

Suggested fixes

Issue 2: state.db Corruption Kills session_search

What happens

Observed data

Recovery

Why it matters

Likely cause

Suggested fixes

Issue 3: MEMORY.md Size Limit (2,200 chars) is Critically Small

What happens

Real example

Workaround

Suggested fixes

Issue 4: Environment Hallucination in Long Sessions

What happens

Root cause analysis

Impact

Suggested fixes

Environment

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Critical UX] Memory persistence, token waste from session replay, state.db corruption, and environment hallucination field report from heavy production use #5563

Description

Context & Acknowledgment

Problem Summary

Issue 1: Session Fragmentation Causes Exponential Token Replay

What happens

Observed data (single day, Apr 5 2026)

Why it happens

Impact

Suggested fixes

Issue 2: state.db Corruption Kills session_search

What happens

Observed data

Recovery

Why it matters

Likely cause

Suggested fixes

Issue 3: MEMORY.md Size Limit (2,200 chars) is Critically Small

What happens

Real example

Workaround

Suggested fixes

Issue 4: Environment Hallucination in Long Sessions

What happens

Root cause analysis

Impact

Suggested fixes

Environment

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions