Context & Acknowledgment
First, I want to say: Hermes is an extraordinary piece of work. The skill system, persistent memory, session search, delegate_task subagents, the gateway architecture — it's the most capable CLI AI agent I've used. I run it daily for production software development (orchestrating a 3-actor email processing pipeline with DBOS, PostgreSQL, S3, Gmail API), and it consistently delivers. The team at Nous Research has built something genuinely special.
That said, after 3 weeks of heavy daily use (8+ hours/day on Claude Opus), I've hit a cluster of interrelated issues around memory persistence and context management that together cause severe token waste and, in one case, actual hallucination about the execution environment. I'm reporting them as one issue because they compound each other.
Problem Summary
During a ~12-hour intensive session (Apr 5, 2026), I lost approximately 2.6M tokens (~69% of total consumption) to context replay overhead, and at one point Hermes hallucinated that it was running in a cloud container with "outdated information" — after hours of productive work on my local WSL2 machine.
Issue 1: Session Fragmentation Causes Exponential Token Replay
What happens
Long CLI conversations get silently fragmented into multiple sessions. Each new session replays the entire conversation history as input tokens.
Observed data (single day, Apr 5 2026)
Conversation A ("stale checker refactoring"):
- 15 sessions with the same first user message
- User message count grows: 10 → 11 → 15 → 20 → ... → 54 → 57
- File sizes: 170KB → 259KB → 434KB → 576KB → 728KB (each is the FULL history)
- ~1.9M tokens consumed, only ~190K were necessary (89% waste)
Conversation B ("checker moving emails"):
- 9 sessions, same pattern
- ~1.1M tokens consumed, only ~165K necessary (84% waste)
Why it happens
When a session ends (auto-review trigger, max iterations reached, error, etc.), the next session replays the full message history to maintain context. The user doesn't see this — the conversation appears continuous in the terminal.
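Here is a back-of-the-envelope model of why the cost compounds. It is an assumption, not a description of Hermes internals: every new session re-sends the full history accumulated so far, and the tokens genuinely needed are roughly one pass over the final history. The function name `replay_overhead` is hypothetical. The example uses only the five Conversation A file sizes listed above, with KB of session file standing in for tokens.

```python
def replay_overhead(history_sizes):
    """Return (total_sent, needed, waste_fraction) for per-session history sizes.

    Assumes each session re-sends its full history, and that only the final
    history actually needed to be sent once.
    """
    if not history_sizes:
        return 0, 0, 0.0
    total_sent = sum(history_sizes)      # every session replays everything so far
    needed = history_sizes[-1]           # assumption: one pass over the final history
    waste = 1 - needed / total_sent
    return total_sent, needed, waste


# File sizes (KB) of the five Conversation A sessions listed above.
sizes_kb = [170, 259, 434, 576, 728]
total, needed, waste = replay_overhead(sizes_kb)
print(f"sent {total} KB, needed {needed} KB, waste {waste:.0%}")
```

This prints `sent 2167 KB, needed 728 KB, waste 66%`. The gap to the reported 89% is expected, because the other 10 sessions add replayed volume without changing the final history. If each session adds roughly the same amount of new history, n sessions send about n(n+1)/2 times that amount, so the overhead grows with the square of the number of session boundaries.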
Impact
With Claude Opus pricing, this meant a single productive 12-hour workday burned through an entire monthly API budget. The user has no visibility into when session boundaries occur or how much history is being replayed.
Suggested fixes
Issue 2: state.db Corruption Kills session_search
What happens
The SQLite state.db database becomes corrupted during normal use, making session_search completely non-functional. The PRAGMA integrity_check reports malformed B-tree pages.
Observed data
state.db (24MB) — integrity_check FAILED
"Tree 12 page 5541: btreeInitPage() returns error code 11"
Multiple corrupted pages in the messages table and FTS index
Recovery
Manual recovery (a .dump, filtering out the corrupted rows, then an FTS rebuild) recovered 110 of 128 sessions and 6,645 messages. 18 sessions were permanently lost from the DB, although the JSON session files on disk were intact.
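A minimal sketch of the check-then-salvage idea using Python's standard `sqlite3` module. `integrity_problems` and `salvage_copy` are hypothetical helper names, and this is not the exact procedure used above: it replays the database's SQL dump into a fresh file and skips statements that fail. If a corrupt page makes `iterdump()` itself raise, the copy simply stops at that point. Any FTS index still needs rebuilding afterwards, and the sqlite3 command-line shell's `.recover` command is a more thorough tool for badly damaged files.

```python
import sqlite3
from contextlib import closing


def integrity_problems(path):
    """Run PRAGMA integrity_check and return the reported problems ([] if healthy)."""
    with closing(sqlite3.connect(path)) as conn:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    return [] if rows == ["ok"] else rows


def salvage_copy(src_path, dst_path):
    """Replay src's SQL dump into a fresh database at dst_path, skipping bad statements.

    Returns (applied, skipped). dst_path should not already contain the same tables.
    """
    applied = skipped = 0
    with closing(sqlite3.connect(src_path)) as src, closing(sqlite3.connect(dst_path)) as dst:
        try:
            for statement in src.iterdump():
                try:
                    dst.execute(statement)
                    applied += 1
                except sqlite3.DatabaseError:
                    skipped += 1  # e.g. a row that no longer satisfies the schema
        except sqlite3.DatabaseError:
            pass  # reading src hit a corrupt page; keep whatever was copied so far
        dst.commit()
    return applied, skipped
```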
Why it matters
session_search is the ONLY way for Hermes to recall cross-session context. When it breaks, the agent loses all long-term recall, forcing the user to manually re-explain project context every session. For complex multi-day projects, this is devastating.
Likely cause
WAL mode + concurrent writes from CLI + gateway + subagent processes accessing the same DB file. The symlink setup (state.db → hermes-sync/state.db) may contribute.
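For the "proper locking" direction, here is a sketch of how every process could open the shared database the same way. The names `open_state_db` and `write` are hypothetical, and Hermes's actual connection code is not shown in this report. The sketch resolves the symlink once so all processes pass SQLite the same real filename, enables WAL, sets a busy timeout so writers wait rather than fail, and starts writes with `BEGIN IMMEDIATE` so the write lock is taken up front. These settings reduce lock contention but are not a guaranteed fix for corruption. SQLite's documentation also notes that WAL does not work over a network filesystem, so it may be worth confirming that hermes-sync/ sits on a local disk.

```python
import os
import sqlite3


def open_state_db(path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open the shared state DB with settings intended for multi-process access."""
    # Resolve any symlink (e.g. state.db -> hermes-sync/state.db) so every
    # process hands SQLite the same real filename.
    real_path = os.path.realpath(path)
    # timeout= sets SQLite's busy timeout: wait on a locked DB instead of failing at once.
    conn = sqlite3.connect(real_path, timeout=busy_timeout_ms / 1000)
    # WAL mode is persistent for the database file; re-setting it on open is harmless.
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def write(conn: sqlite3.Connection, sql: str, params=()) -> None:
    """Run one write in a BEGIN IMMEDIATE transaction so the write lock is taken up front."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(sql, params)
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```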
Suggested fixes
- Add a periodic PRAGMA integrity_check and auto-repair (the JSON session files can serve as the source of truth)
- Use WAL2 mode or ensure proper locking across all processes accessing the DB
- Add a hermes db repair CLI command
- Consider making JSON session files the primary store with SQLite as a search index that can be rebuilt
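A sketch of the last suggestion: treat the JSON session files as the source of truth and rebuild a disposable SQLite FTS5 index from them on demand. The JSON shape (`session_id` plus a `messages` list of `role`/`content` objects), the table name `message_index`, and both function names are hypothetical, since the real session file format is not described here. It also assumes the Python `sqlite3` build includes FTS5, which most do.

```python
import json
import sqlite3
from pathlib import Path


def rebuild_search_index(sessions_dir, db_path):
    """Drop and rebuild an FTS5 index from JSON session files; return messages indexed.

    Hypothetical file shape: {"session_id": "...", "messages": [{"role": ..., "content": ...}]}
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS message_index")
        conn.execute(
            "CREATE VIRTUAL TABLE message_index "
            "USING fts5(session_id UNINDEXED, role UNINDEXED, content)"
        )
        count = 0
        for path in sorted(Path(sessions_dir).glob("*.json")):
            try:
                session = json.loads(path.read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError):
                continue  # skip an unreadable file rather than abort the whole rebuild
            sid = session.get("session_id", path.stem)
            for msg in session.get("messages", []):
                # str() crudely flattens structured content blocks into searchable text.
                conn.execute(
                    "INSERT INTO message_index (session_id, role, content) VALUES (?, ?, ?)",
                    (sid, msg.get("role", ""), str(msg.get("content", ""))),
                )
                count += 1
        conn.commit()
        return count
    finally:
        conn.close()


def search(db_path, query, limit=10):
    """Return (session_id, content) rows matching an FTS5 query, best match first."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            "SELECT session_id, content FROM message_index "
            "WHERE message_index MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        conn.close()
```

Because the index is fully derived, a corrupted copy can be deleted and rebuilt without losing any session history.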
Issue 3: MEMORY.md Size Limit (2,200 chars) is Critically Small
What happens
The persistent memory store (MEMORY.md) has a ~2,200 character limit. For a complex multi-service project (3 actors, PostgreSQL, S3, Gmail API, stale checker with 5 checks, multiple status enums, credential resolution patterns), this forces extreme compression that loses critical context.
Real example
My MEMORY.md at 90% capacity contains compressed fragments like:
PG+S3, NO KAFKA/REDIS. PALS=orchestrator locks, DBOS partition_queue(concurrency=1).
workflows/: reader_response.py+doctype_response.py+common.py.
stale_checker.py: pg_try_advisory_lock(900100001), 5 checks(...)
This is the ENTIRE project architecture compressed into telegram-style abbreviations. Critical details that should be in memory (like "classification_status uses 'completed' NOT 'classified'") barely fit alongside everything else.
Workaround
Skills serve as extended memory (~20KB+ each), but they're loaded on-demand and require trigger matching. They don't replace the "always present" nature of memory.
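The difference between memory and skills can be modelled with a toy prompt assembler. This is a hypothetical sketch of the behaviour described above, not Hermes's code: `Skill`, `build_context`, and the substring trigger matching are illustrative, and only the 2,200-character figure comes from this report. Memory is included on every turn, while a skill's body is included only when the user's message hits one of its triggers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Skill:
    name: str
    body: str
    triggers: List[str] = field(default_factory=list)

    def matches(self, message: str) -> bool:
        # Hypothetical trigger rule: case-insensitive substring match.
        text = message.lower()
        return any(trigger.lower() in text for trigger in self.triggers)


def build_context(memory: str, skills: List[Skill], user_message: str,
                  memory_limit: int = 2200) -> str:
    """Assemble one turn's context: memory always, skills only on a trigger hit."""
    if len(memory) > memory_limit:
        raise ValueError(f"memory is {len(memory)} chars, limit is {memory_limit}")
    parts = [memory]                                               # present every turn
    parts += [s.body for s in skills if s.matches(user_message)]   # conditional
    parts.append(user_message)
    return "\n\n".join(parts)
```

Under this model, a question such as "why are emails stuck?" would not pull in a skill triggered by "stale checker", even if the answer lives in that skill.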
Suggested fixes
Issue 4: Environment Hallucination in Long Sessions
What happens
After hours of continuous work on a local WSL2 machine (terminal.backend: local), Hermes told the user they were running in a "cloud container with outdated information" — which was completely false.
Root cause analysis
The terminal tool description contains phrases like:
"cloud sandboxes may be cleaned up, idled out, or recreated between turns"
And execute_code runs in /tmp/hermes_sandbox_* paths. After 700K+ tokens of context, the model appears to confuse tool description warnings with its actual execution environment. Additionally, when subagents modify files but the main conversation has stale read_file results from earlier turns, the model may interpret the discrepancy as "being in a different environment" rather than "my cached context is stale."
Impact
The user spent hours working productively, only to be told (incorrectly) that none of the work was reliable because "we're in a cloud environment with outdated files." This destroyed confidence in the session's output.
Suggested fixes
- Inject a clear, authoritative [ENVIRONMENT: local] marker in each turn (not just in the tool descriptions)
- When terminal.backend: local is set, strip or modify the cloud sandbox warnings in the tool descriptions
- Add a "context freshness" indicator — flag when file reads are older than N minutes in the conversation
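A sketch of how these three suggestions could fit together. All function and class names, the phrase list, the `[STALE READS: ...]` wording, and the 10-minute default for "N minutes" are hypothetical. Only the `[ENVIRONMENT: local]` marker, `terminal.backend: local`, and the quoted cloud-sandbox wording come from this report.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical list of phrases that only make sense for cloud backends.
CLOUD_ONLY_PHRASES = ("cloud sandboxes may be cleaned up",)


def tool_description_for_backend(description: str, backend: str) -> str:
    """Drop cloud-only sentences from a tool description when the backend is local."""
    if backend != "local":
        return description
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    kept = [s for s in sentences if not any(p in s.lower() for p in CLOUD_ONLY_PHRASES)]
    return " ".join(kept)


@dataclass
class FileReadTracker:
    """Remembers when each file was last read so stale reads can be flagged."""
    max_age_s: float = 600.0  # hypothetical "N minutes" = 10
    last_read: Dict[str, float] = field(default_factory=dict)

    def record_read(self, path: str, now: Optional[float] = None) -> None:
        self.last_read[path] = time.time() if now is None else now

    def stale_paths(self, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        return sorted(p for p, t in self.last_read.items() if now - t > self.max_age_s)


def decorate_turn(user_message: str, backend: str, tracker: FileReadTracker,
                  now: Optional[float] = None) -> str:
    """Prefix a turn with an authoritative environment marker and any stale-read warning."""
    lines = [f"[ENVIRONMENT: {backend}]"]
    stale = tracker.stale_paths(now)
    if stale:
        lines.append("[STALE READS: " + ", ".join(stale) + "; re-read before trusting]")
    lines.append(user_message)
    return "\n".join(lines)
```

Putting the marker in every turn, rather than only in tool descriptions, keeps the authoritative statement close to the most recent context, which matters most after hundreds of thousands of tokens.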
Environment
- Hermes Agent v0.6.0+
- Model: Claude Opus 4 (anthropic/claude-opus-4.6) via Anthropic API
- OS: Ubuntu 24.04 (WSL2 on Windows 11)
- Terminal backend: local
- Usage pattern: 8+ hours/day, heavy delegate_task usage, complex multi-file codebase work