Skip to content

Fix FTS5 corruption detection: use MATCH query instead of rowid LIMIT 1#30514

Open
bgriffin83 wants to merge 1 commit into
NousResearch:mainfrom
bgriffin83:fix/fts5-corruption-verify-match-query
Open

Fix FTS5 corruption detection: use MATCH query instead of rowid LIMIT 1#30514
bgriffin83 wants to merge 1 commit into
NousResearch:mainfrom
bgriffin83:fix/fts5-corruption-verify-match-query

Conversation

@bgriffin83

Copy link
Copy Markdown

Summary

Fixes silent FTS5 corruption where session_search returns zero results even though the messages table is intact.

Root Cause

The FTS5 corruption detection in _ensure_fts_table() used:

cursor.execute(f"SELECT rowid FROM {table_name} LIMIT 1")

SELECT rowid only reads from the FTS5 content table (a regular b-tree), not the inverted-index b-trees. FTS5 corruption from stale WAL after an unclean shutdown lives exclusively in the index b-trees. So the check passed every time even when the index was trashed, and session_search silently returned nothing.

Fix

Changed the verify query to:

cursor.execute(f"SELECT rowid FROM {table_name} WHERE {table_name} MATCH 'the' LIMIT 1")

MATCH 'the' forces SQLite to traverse the FTS5 inverted-index b-tree. A corrupted index throws OperationalError immediately, which triggers the existing drop-recreate-backfill path — auto-healing on next restart.

Additional Improvements

  • Refactored the duplicate FTS5 setup code for messages_fts and messages_fts_trigram into a shared _ensure_fts_table() helper
  • Added backfill verification after schema init: compares row counts between the FTS table and messages table, and backfills any missing rows — catches partial index rebuilds that the simple existence check misses

Testing

  • 228/228 existing state DB tests pass (test_hermes_state.py + test_hermes_state_wal_fallback.py)
  • Verified against a corrupted state.db (integrity-check errors on Trees 5 & 20) — the new query correctly detects corruption, drops, recreates, and backfills

The _ensure_fts_table() verification used SELECT rowid FROM {table} LIMIT 1
which only reads from the FTS5 content table (a regular b-tree), not the
inverted-index b-trees. FTS5 corruption from stale WAL after unclean shutdown
lives in the index b-trees, so the check passed even when the index was trashed.

This caused silent search failures — session_search returned nothing while the
messages table remained intact.

Fix: use WHERE {table} MATCH 'the' LIMIT 1, which forces SQLite to traverse
the FTS5 inverted-index b-tree. A corrupted index throws OperationalError
immediately, triggering the drop-recreate-backfill path.

Also refactored the duplicate FTS5 setup code for messages_fts and
messages_fts_trigram into a shared _ensure_fts_table() helper, and added
a backfill verification step after schema init to catch partial index
rebuilds.
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels May 22, 2026
@bgriffin83

Copy link
Copy Markdown
Author

My agent became "dumb" wrt memory from yesterday.. turns out some corruption on the sqlite DB was the cause. Not a pro, but hoping this PR will help others

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/agent Core agent loop, run_agent.py, prompt builder P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants