Problem
session_search uses SQLite FTS5 for full-text search. The default FTS5 tokenizer (unicode61) does not segment Chinese text: because there are no spaces between words, a run of consecutive Chinese characters is indexed as a single token. This causes multi-character Chinese queries to fail.
Example: searching for "记忆断裂" only matches a message in which those 4 characters form a token of their own (for example, set off by spaces or punctuation), not one where they appear inside a longer run of text. Although the data exists (LIKE finds 20+ matches), FTS5 returns 0 results.
This affects all CJK (Chinese, Japanese, Korean) users.
Reproduction
import os
import sqlite3

from hermes_state import SessionDB

db = SessionDB()
# FTS5 search: returns 0 results
results = db.search_messages(query="记忆断裂", limit=5)
print(len(results))  # 0

# But the data exists (sqlite3 does not expand "~", so expand it explicitly)
conn = sqlite3.connect(os.path.expanduser("~/.hermes/state.db"))
count = conn.execute(
    "SELECT count(*) FROM messages WHERE content LIKE '%记忆断裂%'"
).fetchone()[0]
print(count)  # 20+
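The same symptom can probably be reproduced without a Hermes installation. The following is a minimal sketch using an in-memory database; the table name messages_fts and the sample sentence are made up for illustration and are not the actual Hermes schema. It assumes the Python build of SQLite has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical FTS5 table with the default tokenizer (not the real Hermes schema).
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
conn.execute(
    "INSERT INTO messages_fts(content) VALUES (?)",
    ("昨天的对话出现了记忆断裂的问题",),  # sample text that contains the query
)

query = "记忆断裂"

# Full-text search through the FTS5 index.
fts_hits = conn.execute(
    "SELECT count(*) FROM messages_fts WHERE messages_fts MATCH ?",
    (query,),
).fetchone()[0]

# Plain substring search over the same rows.
like_hits = conn.execute(
    "SELECT count(*) FROM messages_fts WHERE content LIKE ?",
    (f"%{query}%",),
).fetchone()[0]

print(fts_hits, like_hits)  # 0 1
```

The row is present and LIKE finds it, but the FTS5 MATCH comes back empty, which mirrors what we see against the real state.db.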
Environment
- Hermes Agent v0.10.0
- macOS, Python 3.11
- SQLite FTS5 with default tokenizer
Suggested fix
Add a LIKE fallback in SessionDB.search_messages(): when FTS5 returns no results and the query contains CJK characters, retry with WHERE content LIKE ?. This preserves FTS5 performance for English while ensuring CJK queries work.
We have a working implementation and can submit a PR.
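For discussion, here is a rough sketch of what such a fallback could look like. It is not the implementation mentioned above and does not use the real SessionDB internals: search_with_fallback, contains_cjk, escape_like and the messages_fts table are hypothetical names, and the CJK check covers only the most common Unicode ranges. It also assumes the query has already been sanitized for FTS5 syntax.

```python
import re
import sqlite3

# Rough CJK detection (assumption: these ranges are enough for a first pass):
# Hiragana/Katakana, CJK Unified Ideographs (plus Extension A), Hangul syllables.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def contains_cjk(text: str) -> bool:
    return _CJK_RE.search(text) is not None


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so the user's query is matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def search_with_fallback(conn: sqlite3.Connection, query: str, limit: int = 5):
    """Try FTS5 first; retry with LIKE only for CJK queries that found nothing."""
    rows = conn.execute(
        "SELECT rowid, content FROM messages_fts "
        "WHERE messages_fts MATCH ? LIMIT ?",
        (query, limit),
    ).fetchall()
    if rows or not contains_cjk(query):
        return rows  # FTS5 path: unchanged for English and for CJK hits
    return conn.execute(
        "SELECT rowid, content FROM messages_fts "
        "WHERE content LIKE ? ESCAPE '\\' LIMIT ?",
        (f"%{escape_like(query)}%", limit),
    ).fetchall()


# Usage against a throwaway in-memory table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
conn.executemany(
    "INSERT INTO messages_fts(content) VALUES (?)",
    [
        ("昨天的对话出现了记忆断裂的问题",),
        ("the agent lost its memory of the session",),
    ],
)
cjk_rows = search_with_fallback(conn, "记忆断裂")
english_rows = search_with_fallback(conn, "memory")
print(len(cjk_rows), len(english_rows))  # 1 1
```

The LIKE path is a full scan, so it only runs when FTS5 has already returned nothing for a CJK query. Escaping % and _ keeps a query that contains those characters from acting as a wildcard.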
Related