fix(state): survive broken trigram fts index #30082

Draft
0-CYBERDYNE-SYSTEMS-0 wants to merge 1 commit into NousResearch:main from 0-CYBERDYNE-SYSTEMS-0:codex/state-db-trigram-repair

@0-CYBERDYNE-SYSTEMS-0

What does this PR do?

Fixes a state.db failure mode where a broken optional messages_fts_trigram FTS5 virtual table can prevent SessionDB from initializing, even though the canonical sessions and messages tables are intact.

This matters because the gateway now relies on state.db as canonical transcript storage. If SessionDB init fails, the agent can appear to have no conversation history despite ordinary transcript rows still being present.

The approach is intentionally narrow:

  • catch sqlite3.DatabaseError, not only OperationalError, when probing the optional trigram FTS table;
  • try to rebuild the optional trigram FTS table and backfill it from messages;
  • if sqlite blocks schema repair for a broken vtable, disable only trigram search for that connection instead of failing the whole DB open;
  • make append_message() create a missing session row defensively so gateway flush races do not drop messages.
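The probe, rebuild, and degrade steps above can be sketched roughly as follows. This is a minimal illustration, not the code in hermes_state.py: the helper names probe_trigram and ensure_trigram, the messages column layout, and the single-column FTS schema are all assumptions made for the example.

```python
import sqlite3

TRIGRAM_TABLE = "messages_fts_trigram"


def probe_trigram(conn):
    """Return True if the optional trigram FTS table can be queried."""
    try:
        conn.execute(f"SELECT * FROM {TRIGRAM_TABLE} LIMIT 0")
        return True
    except sqlite3.DatabaseError:
        # OperationalError is a subclass of DatabaseError, so this also
        # covers "no such table" as well as other database-level failures.
        return False


def ensure_trigram(conn):
    """Try to make trigram search usable; return False to signal 'disabled'.

    Assumes the caller has no uncommitted work on this connection,
    because a failed repair ends with a rollback.
    """
    if probe_trigram(conn):
        return True
    try:
        # Hypothetical schema: one indexed column, rowid mirrors messages.id.
        conn.execute(f"DROP TABLE IF EXISTS {TRIGRAM_TABLE}")
        conn.execute(
            f"CREATE VIRTUAL TABLE {TRIGRAM_TABLE} "
            "USING fts5(content, tokenize='trigram')"
        )
        conn.execute(
            f"INSERT INTO {TRIGRAM_TABLE}(rowid, content) "
            "SELECT id, content FROM messages"
        )
        conn.commit()
        return True
    except sqlite3.DatabaseError:
        # Repair was blocked or FTS5/trigram is unavailable: keep the
        # connection open and let the caller fall back to other search paths.
        conn.rollback()
        return False
```

The point of returning a flag rather than raising is that the canonical sessions and messages tables stay usable even when the optional index cannot be repaired.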

This is related to #27770, but not a duplicate: #27770 makes trigram FTS optional for size/configuration. This PR handles the corrupt-vtable constructor failure path that prevents state.db from opening.

Related Issue

Fixes #
Refs #27770

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • hermes_state.py
    • Added trigram FTS backfill/rebuild helpers.
    • Treats broken optional trigram FTS as recoverable during schema init.
    • Falls back by disabling trigram search on the connection if schema repair is blocked.
    • Routes CJK trigram search through LIKE fallback when trigram search is unavailable.
    • Makes append_message() insert a missing sessions row with source "unknown" before inserting the message.
  • tests/test_hermes_state.py
    • Added regression coverage for missing-session append self-healing.
    • Added regression coverage for trigram FTS rebuild preserving messages.
    • Added regression coverage for blocked schema repair degrading instead of failing DB open.
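Two of the hermes_state.py changes listed above, the defensive session insert and the LIKE fallback, might look roughly like this. It is a hedged sketch only: the function signatures, the column names (id, source, session_id, role, content), and the like_search helper are assumptions for illustration and are not taken from the patch.

```python
import sqlite3


def append_message(conn, session_id, role, content):
    """Insert a message, creating a placeholder session row if none exists."""
    # INSERT OR IGNORE is a no-op when the session row is already present,
    # so an existing session keeps its real source value.
    conn.execute(
        "INSERT OR IGNORE INTO sessions (id, source) VALUES (?, 'unknown')",
        (session_id,),
    )
    cur = conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()
    return cur.lastrowid


def like_search(conn, query):
    """Substring search used when trigram FTS is unavailable."""
    # Escape LIKE wildcards so user text is matched literally.
    escaped = (
        query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT id FROM messages WHERE content LIKE ? ESCAPE '\\' ORDER BY id",
        (f"%{escaped}%",),
    )
    return [row[0] for row in rows]
```

A LIKE scan is slower than an FTS lookup, but it needs no index, so it still works for CJK substrings when the trigram table is disabled for the connection.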

How to Test

  1. Run syntax checks:
    python -m py_compile hermes_state.py tests/test_hermes_state.py
  2. Run focused state DB tests:
    python -m pytest tests/test_hermes_state.py -q
  3. Optional real-world regression check: open a copy of a state.db where the query "SELECT * FROM messages_fts_trigram LIMIT 0" raises "vtable constructor failed: messages_fts_trigram"; SessionDB should still open and ordinary writes should still work.
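For step 3, a standalone inspection script along these lines can confirm that a suspect database has intact canonical tables before it is opened through SessionDB. This is an illustrative sketch using only the sqlite3 standard library; inspect_state_db_copy is a hypothetical helper, not part of the repository, and it copies only the main database file, so recent writes held in WAL sidecar files would not be included.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def inspect_state_db_copy(db_path):
    """Copy a state.db and report integrity, row counts, and the trigram probe."""
    with tempfile.TemporaryDirectory() as tmp:
        copy_path = Path(tmp) / "state-copy.db"
        # Work on a copy so the original file is never modified.
        shutil.copy2(db_path, copy_path)
        conn = sqlite3.connect(str(copy_path))
        try:
            report = {
                "integrity": conn.execute("PRAGMA integrity_check").fetchone()[0],
                "sessions": conn.execute("SELECT COUNT(*) FROM sessions").fetchone()[0],
                "messages": conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0],
            }
            try:
                conn.execute("SELECT * FROM messages_fts_trigram LIMIT 0")
                report["trigram_error"] = None
            except sqlite3.DatabaseError as exc:
                # Record the failure text instead of aborting the inspection.
                report["trigram_error"] = str(exc)
            return report
        finally:
            conn.close()
```

A non-empty trigram_error alongside healthy integrity and row counts is the situation this PR targets: the transcript data is present, and only the optional index is unusable.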

Local verification:

  • python -m py_compile hermes_state.py tests/test_hermes_state.py passed.
  • python -m pytest tests/test_hermes_state.py -q passed: 218 passed in 1.45s.
  • A copy of an actual broken DB opened successfully after this patch: integrity ok, sessions 446, messages 27893, and a new message write succeeded with trigram search disabled for the connection.
  • scripts/run_tests.sh was attempted. It completed with unrelated environment/live-system failures in this macOS sandbox: 24354 passed, 333 failed; failures were in socket binding, live-system guard, systemd/platform, OAuth/env tests, and unrelated web provider expectations.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(state): survive broken trigram fts index)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS, Python 3.11.9

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — SQLite-only recovery path, no platform-specific calls
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

$ python -m pytest tests/test_hermes_state.py -q
218 passed in 1.45s

@alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/agent (Core agent loop, run_agent.py, prompt builder) labels on May 21, 2026