session_search: FTS5 returns empty results for Chinese/CJK queries #11511

@iamagenius00

Problem

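session_search uses SQLite FTS5 for full-text search. The default FTS5 tokenizer (unicode61) does not segment Chinese text into words: it splits only on whitespace, punctuation, and other non-alphanumeric characters. Chinese has no spaces between words, so each unbroken run of CJK characters is indexed as one long token. A multi-character Chinese query therefore matches only when it equals an entire run, not when it appears inside one.

Example: searching "记忆断裂" looks for a token that is exactly 记忆断裂. Messages where the phrase sits inside a longer sentence never match, so FTS5 returns 0 results even though the data exists (LIKE finds 20+ matches).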

This affects all CJK (Chinese, Japanese, Korean) users.
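
The mismatch between FTS5 and LIKE can be reproduced without Hermes. The following is a minimal sketch against an in-memory database; the table name `msgs`, the sample sentence, and the helpers `fts_count` and `like_count` are made up for illustration, and it assumes the Python build's SQLite has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical FTS5 table using the default tokenizer.
conn.execute("CREATE VIRTUAL TABLE msgs USING fts5(content)")
conn.execute(
    "INSERT INTO msgs(content) VALUES (?)",
    ("我们昨天讨论了记忆断裂的问题",),
)


def fts_count(query: str) -> int:
    """Number of rows matched by an FTS5 MATCH query."""
    sql = "SELECT count(*) FROM msgs WHERE msgs MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


def like_count(query: str) -> int:
    """Number of rows matched by a plain substring search."""
    sql = "SELECT count(*) FROM msgs WHERE content LIKE ?"
    return conn.execute(sql, (f"%{query}%",)).fetchone()[0]


print(fts_count("记忆断裂"))   # 0: the phrase is only part of the stored text
print(like_count("记忆断裂"))  # 1: substring search finds it
# Querying with the whole unbroken sentence does match.
print(fts_count("我们昨天讨论了记忆断裂的问题"))  # 1
```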

Reproduction

from hermes_state import SessionDB
db = SessionDB()

# FTS5 search: returns 0
results = db.search_messages(query="记忆断裂", limit=5)
print(len(results))  # 0

# But the data exists
import os
import sqlite3
# sqlite3.connect() does not expand "~", so expand it explicitly
conn = sqlite3.connect(os.path.expanduser("~/.hermes/state.db"))
count = conn.execute(
    "SELECT count(*) FROM messages WHERE content LIKE ?", ("%记忆断裂%",)
).fetchone()[0]
print(count)  # 20+

Environment

  • Hermes Agent v0.10.0
  • macOS, Python 3.11
  • SQLite FTS5 with default tokenizer

Suggested fix

Add a LIKE fallback in SessionDB.search_messages(): when FTS5 returns no results and the query contains CJK characters, retry with WHERE content LIKE ?. This preserves FTS5 performance for English while ensuring CJK queries work.

We have a working implementation and can submit a PR.
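
A rough sketch of what such a fallback could look like is below. It is not the Hermes implementation: the standalone `search_messages(conn, query, limit)` function, the `contains_cjk` helper, the `messages_fts` table, and the sample rows are all assumptions made for illustration, and the CJK check covers only the most common Unicode ranges.

```python
import re
import sqlite3

# Rough CJK detector (assumption): Hiragana/Katakana, Han ideographs,
# and Hangul syllables. Rarer extension blocks are not covered.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def contains_cjk(text: str) -> bool:
    return _CJK_RE.search(text) is not None


def search_messages(conn: sqlite3.Connection, query: str, limit: int = 5) -> list[str]:
    # Quote the query as an FTS5 string so punctuation is not parsed as syntax.
    fts_query = '"' + query.replace('"', '""') + '"'
    rows = conn.execute(
        "SELECT content FROM messages_fts WHERE messages_fts MATCH ? LIMIT ?",
        (fts_query, limit),
    ).fetchall()
    if rows or not contains_cjk(query):
        return [r[0] for r in rows]

    # Fallback: substring search, with LIKE wildcards in the query escaped.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT content FROM messages WHERE content LIKE ? ESCAPE '\\' LIMIT ?",
        (f"%{escaped}%", limit),
    ).fetchall()
    return [r[0] for r in rows]


# Hypothetical schema and data, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
for text in ["我们昨天讨论了记忆断裂的问题", "the memory gap was discussed yesterday"]:
    conn.execute("INSERT INTO messages(content) VALUES (?)", (text,))
    conn.execute("INSERT INTO messages_fts(content) VALUES (?)", (text,))

print(search_messages(conn, "记忆断裂"))  # served by the LIKE fallback
print(search_messages(conn, "memory"))    # served by FTS5
```

A LIKE pattern with a leading wildcard cannot use an ordinary index, so the fallback scans the table. Restricting it to CJK queries that returned nothing from FTS5 keeps that cost off the common path.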
