Problem
The state.db file grows rapidly due to dual FTS5 indexes on the messages table. On a moderately used instance (42K messages, 2.4K sessions):
| Component | Size | % of DB |
|---|---|---|
| messages data (content, tool_calls, reasoning) | 99MB | 19.6% |
| sessions data (system_prompt) | 45MB | 8.9% |
| FTS indexes (fts + fts_trigram) | 358MB | 70.8% |
| Other (indexes, overhead) | 3MB | 0.7% |
| Total | 505MB | 100% |
The messages_fts_trigram table alone consumes 247MB (49% of the entire DB), about 2.2x the size of the primary FTS index (111MB).
Root Cause
- Dual FTS indexes: Every message is indexed twice:
  - messages_fts (porter tokenizer): 111MB
  - messages_fts_trigram (trigram tokenizer): 247MB
- Trigram tokenizer is expensive for CJK: Chinese text produces significantly more trigram tokens than English, inflating the index. The trigram index is 2.2x larger than the porter stemmer index despite indexing the same data.
- system_prompt stored per-session: 2.4K sessions × ~17KB system_prompt = 38.6MB. Many sessions share nearly identical prompts (same model + similar config), but each stores a full copy.
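For reference, here is a minimal sketch of what each tokenizer provides and a rough way to compare their cost. The table names `demo_fts` and `demo_fts_trigram` and all helper functions are hypothetical and not taken from the Hermes schema. It assumes an SQLite build with FTS5 and the trigram tokenizer (SQLite 3.34 or newer). `index_bytes` sums the FTS5 `_data` shadow table, which only approximates the full on-disk footprint.

```python
import sqlite3


def build_demo_indexes(messages):
    """Index the same messages twice, mirroring the dual-FTS layout described above."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table names; 'porter unicode61' and 'trigram' are built-in FTS5 tokenizers.
    conn.execute(
        "CREATE VIRTUAL TABLE demo_fts USING fts5(content, tokenize='porter unicode61')"
    )
    conn.execute(
        "CREATE VIRTUAL TABLE demo_fts_trigram USING fts5(content, tokenize='trigram')"
    )
    rows = [(m,) for m in messages]
    conn.executemany("INSERT INTO demo_fts(content) VALUES (?)", rows)
    conn.executemany("INSERT INTO demo_fts_trigram(content) VALUES (?)", rows)
    conn.commit()
    return conn


def match_count(conn, table, phrase):
    """Count rows matching `phrase`, passed to FTS5 as a quoted string."""
    query = '"' + phrase.replace('"', '""') + '"'
    sql = f"SELECT count(*) FROM {table} WHERE {table} MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


def index_bytes(conn, table):
    """Approximate index size: total bytes stored in the FTS5 `<table>_data` shadow table."""
    sql = f"SELECT coalesce(sum(length(block)), 0) FROM {table}_data"
    return conn.execute(sql).fetchone()[0]


conn = build_demo_indexes(["The state.db file grows rapidly", "数据库文件增长过快"])

# Porter stemming lets an English query match a different word form.
porter_hits_en = match_count(conn, "demo_fts", "growing")
# unicode61 does not segment unspaced CJK text, so a substring query finds nothing.
porter_hits_cjk = match_count(conn, "demo_fts", "文件增长")
# The trigram index supports substring queries of three or more characters.
trigram_hits_cjk = match_count(conn, "demo_fts_trigram", "文件增长")

print("porter (English):", porter_hits_en)
print("porter (CJK substring):", porter_hits_cjk)
print("trigram (CJK substring):", trigram_hits_cjk)
print("porter index bytes:", index_bytes(conn, "demo_fts"))
print("trigram index bytes:", index_bytes(conn, "demo_fts_trigram"))
```

Running `index_bytes` against a copy of a real state.db (with the actual table names substituted) should give a per-index comparison similar to the figures reported here.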
Growth Rate
- Daily: ~150 new sessions + ~2000 messages → +20MB/day
- At this rate: 600MB/month, 7.3GB/year
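The arithmetic behind these figures, as a small sketch. It assumes strictly linear growth at the observed 20MB/day, a 30-day month, and 1GB = 1000MB. The function names are illustrative only.

```python
import math

DAILY_GROWTH_MB = 20  # observed daily growth reported above


def projected_growth_mb(days, daily_mb=DAILY_GROWTH_MB):
    """Linear projection: growth is assumed constant from day to day."""
    return days * daily_mb


def days_until(target_mb, current_mb, daily_mb=DAILY_GROWTH_MB):
    """Whole days until the database reaches target_mb at the assumed rate."""
    return math.ceil(max(0, target_mb - current_mb) / daily_mb)


monthly_mb = projected_growth_mb(30)          # 30 * 20 = 600 MB
yearly_gb = projected_growth_mb(365) / 1000   # 365 * 20 = 7300 MB = 7.3 GB
days_to_1gb = days_until(1000, 505)           # from the current 505 MB

print(monthly_mb, yearly_gb, days_to_1gb)
```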
Suggested Fixes
- Make trigram FTS optional: The porter stemmer FTS handles most English queries well. The trigram index is only needed for CJK substring search (3+ chars). Consider:
  - Adding a config option to disable trigram indexing
  - Or only building it on demand when CJK search is used
- Normalize system_prompt storage: Store a deduplicated system_prompts table with a foreign key from sessions, which could remove much of the ~38MB currently spent on repeated copies.
- Add VACUUM/PRAGMA: Consider PRAGMA auto_vacuum = INCREMENTAL or periodic VACUUM to reclaim space after session deletion. Note that on an existing database a changed auto_vacuum mode only takes effect after a full VACUUM, and in INCREMENTAL mode pages are freed only when PRAGMA incremental_vacuum is run.
- Add a session retention/cleanup mechanism: Currently sessions accumulate indefinitely. A configurable TTL or max session count would help long-running instances.
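A rough sketch of how the prompt normalization and TTL cleanup could fit together. The schema, column names, and functions below are hypothetical and do not reflect the actual Hermes tables. Keying on a SHA-256 of the prompt text only deduplicates byte-identical prompts; "nearly identical" prompts would still be stored separately.

```python
import hashlib
import sqlite3

# Hypothetical schema: one row per distinct prompt, referenced from sessions.
SCHEMA = """
CREATE TABLE system_prompts (
    id      INTEGER PRIMARY KEY,
    sha256  TEXT NOT NULL UNIQUE,
    content TEXT NOT NULL
);
CREATE TABLE sessions (
    id               INTEGER PRIMARY KEY,
    started_at       REAL NOT NULL,
    system_prompt_id INTEGER NOT NULL REFERENCES system_prompts(id)
);
"""


def intern_prompt(conn, prompt):
    """Return the id of the stored prompt, inserting it only if it is new."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO system_prompts(sha256, content) VALUES (?, ?)",
        (digest, prompt),
    )
    row = conn.execute(
        "SELECT id FROM system_prompts WHERE sha256 = ?", (digest,)
    ).fetchone()
    return row[0]


def add_session(conn, prompt, started_at):
    """Create a session that references a shared prompt row."""
    cur = conn.execute(
        "INSERT INTO sessions(started_at, system_prompt_id) VALUES (?, ?)",
        (started_at, intern_prompt(conn, prompt)),
    )
    return cur.lastrowid


def purge_expired(conn, now, ttl_seconds):
    """Delete sessions older than the TTL, then drop prompts nothing references."""
    cur = conn.execute(
        "DELETE FROM sessions WHERE started_at < ?", (now - ttl_seconds,)
    )
    deleted = cur.rowcount
    conn.execute(
        "DELETE FROM system_prompts "
        "WHERE id NOT IN (SELECT system_prompt_id FROM sessions)"
    )
    return deleted


DAY = 86400
now = 1_700_000_000.0  # fixed timestamp so the example is deterministic

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_session(conn, "You are a helpful agent.", now - 40 * DAY)
add_session(conn, "You are a helpful agent.", now - 1 * DAY)
add_session(conn, "You are a coding agent.", now - 50 * DAY)

prompts_before = conn.execute("SELECT count(*) FROM system_prompts").fetchone()[0]
deleted = purge_expired(conn, now, ttl_seconds=30 * DAY)
prompts_after = conn.execute("SELECT count(*) FROM system_prompts").fetchone()[0]
sessions_after = conn.execute("SELECT count(*) FROM sessions").fetchone()[0]
print(prompts_before, deleted, prompts_after, sessions_after)
```

On a file-backed database, the space freed by such a purge is only returned to the filesystem after VACUUM, or after PRAGMA incremental_vacuum when auto_vacuum is set to INCREMENTAL.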
Environment
- Hermes Agent v0.13.0 (2026.5.7)
- macOS, Python 3.11.15
- state.db: 505MB, 42155 messages, 2375 sessions
- Primary models: MiniMax-M2.7 (1915 sessions), glm-5-turbo (458 sessions)