fix(doctor): detect and repair state.db FTS indexes #32589
Closed
plcunha wants to merge 1 commit into
This was referenced May 28, 2026
OutThisLife added a commit that referenced this pull request on Jun 9, 2026:

fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (#43149)

* fix(state.db): recover from malformed sqlite_master so hidden sessions reappear

The corruption class behind "Desktop/Dashboard show no sessions while hundreds of session files sit on disk" is a malformed sqlite_master, most often a duplicate object row (e.g. two CREATE VIRTUAL TABLE messages_fts entries), surfacing as:

    sqlite3.DatabaseError: malformed database schema (messages_fts) - table messages_fts already exists

SQLite parses the whole schema while preparing the FIRST statement on a connection, so on this class every statement fails before it runs: PRAGMA journal_mode (which is where SessionDB.__init__ actually trips, in apply_wal_with_fallback, BEFORE _init_schema), PRAGMA integrity_check, and even DROP TABLE. The only operations that still work are PRAGMA writable_schema=ON plus direct sqlite_master surgery. A plain FTS-index rebuild at the _init_schema layer therefore cannot reach or fix this; the canonical sessions/messages rows are intact and only the derived schema is broken.

Add a dedicated recovery that operates where the failure actually happens:

- hermes_state.repair_state_db_schema(): backs up the raw file first, then applies a least-destructive ladder: (1) de-duplicate sqlite_master keeping the lowest rowid per object (preserves the existing FTS index), escalating to (2) drop every messages_fts* schema object + VACUUM and let the next open rebuild the FTS index from messages. sessions/messages are never modified. Plus is_malformed_db_error() to discriminate this class.
- SessionDB.__init__ auto-heals: on a malformed-schema open error it repairs once (process-guarded against loops / concurrent web_server opens) and reopens, so Desktop/Dashboard recover on their own instead of silently showing "no sessions".
- hermes doctor --fix detects the malformed class and repairs it (reporting the recovered session count + backup name).
- hermes sessions repair [--check-only] [--no-backup] runs on the raw file path, since SessionDB() itself cannot open a malformed DB.

Supersedes #32589 and #33869: both targeted FTS corruption but gated their repair behind statements (integrity_check / SELECT / DROP TABLE) that themselves fail on this class, and neither addressed the apply_wal_with_fallback open-time failure. Credit preserved via Co-authored-by.

Closes #33865.

Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com>
Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>

* test(state.db): cover strat-B escalation + unrepairable safe-fail paths

---------

Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com>
Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>
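The first rung of the repair ladder described in the commit message (de-duplicate sqlite_master, keeping the lowest rowid per object) can be sketched roughly as follows. This is a minimal illustration, not the code from #43149: the function name `dedupe_schema_rows` and the `backup_path` parameter are hypothetical, and it assumes no other process has the database open and that the SQLite build allows writable_schema (i.e. defensive mode is off).

```python
import shutil
import sqlite3


def dedupe_schema_rows(db_path: str, backup_path: str) -> int:
    """Delete duplicate sqlite_master rows, keeping the lowest rowid per object.

    Hypothetical helper for illustration; returns the number of rows removed.
    """
    # Copy the raw file first. A SQL-level backup is not an option here because
    # it would need a schema that parses. Assumption: no -wal/-shm sidecar files
    # hold unflushed data (nothing else has the database open).
    shutil.copy2(db_path, backup_path)

    # isolation_level=None puts the connection in autocommit mode, so each
    # statement below is committed as soon as it finishes.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # This pragma does not require the schema to be loaded, and it lets
        # later statements read and modify sqlite_master directly.
        conn.execute("PRAGMA writable_schema=ON")
        cur = conn.execute(
            "DELETE FROM sqlite_master WHERE rowid NOT IN ("
            " SELECT MIN(rowid) FROM sqlite_master GROUP BY type, name)"
        )
        removed = cur.rowcount
        conn.execute("PRAGMA writable_schema=OFF")
        return removed
    finally:
        conn.close()
```

The table rows themselves are never touched; only schema entries are removed. A caller would typically reopen the database afterwards and fall back to the more destructive second rung only if the reopen still fails.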
Contributor
Closing as superseded by PR #43149. Your doctor-side FTS repair work was directionally right, but the live failure class needs a lower-level recovery path because SQLite fails while parsing the schema, before […]
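To make the "fails while parsing the schema" point concrete, here is a rough sketch of how an open path might recognise this error class and retry exactly once after a repair. The names `looks_like_malformed_schema` and `open_with_one_repair` are hypothetical and are not the project's `is_malformed_db_error()` or `SessionDB.__init__`; the `repair` callable is supplied by the caller.

```python
import sqlite3
from typing import Callable


def looks_like_malformed_schema(exc: BaseException) -> bool:
    """Heuristic: is this the 'malformed database schema (...)' error class?"""
    return (
        isinstance(exc, sqlite3.DatabaseError)
        and "malformed database schema" in str(exc).lower()
    )


def open_with_one_repair(path: str, repair: Callable[[str], object]) -> sqlite3.Connection:
    """Open the database; on a malformed-schema error, repair once and retry."""
    for attempt in range(2):
        conn = sqlite3.connect(path)
        try:
            # Any statement that needs the schema forces SQLite to parse it,
            # so a malformed schema surfaces here rather than at connect().
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
            return conn
        except sqlite3.DatabaseError as exc:
            conn.close()
            if attempt == 0 and looks_like_malformed_schema(exc):
                repair(path)  # caller-supplied repair; invoked at most once
                continue
            raise
    raise RuntimeError("unreachable: loop either returns or raises")
```

Limiting the retry to a single attempt mirrors the "repairs once" guard described in the commit message, so a repair that does not help cannot loop.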
itskaism pushed a commit to itskaism/hermes-agent that referenced this pull request on Jun 10, 2026:
fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (NousResearch#43149) (cherry picked from commit 218452b)
wachoo pushed a commit to wachoo/hermes-agent that referenced this pull request on Jun 10, 2026:
fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (NousResearch#43149)
changman pushed a commit to changman/hermes-agent that referenced this pull request on Jun 10, 2026:
fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (NousResearch#43149)
alt-glitch pushed a commit that referenced this pull request on Jun 14, 2026:
fix(state.db): recover from malformed sqlite_master so hidden sessions reappear (#43149)
Summary
Adds hermes doctor coverage for a real-world state.db failure mode where the canonical sessions/messages tables are readable, but the derived FTS5 indexes used by session search are malformed or out of sync.

When ~/.hermes/state.db exists, doctor now:
- runs PRAGMA integrity_check instead of only counting sessions;
- probes messages_fts and messages_fts_trigram with MATCH so SQLite traverses the inverted index, not just the content rows;
- […] messages;
- with hermes doctor --fix, creates a consistent SQLite backup and rebuilds the derived FTS indexes from messages.

This keeps messages as the source of truth and only rebuilds derived search indexes.

Motivation
I hit this locally on a live Hermes install:
The message history itself was intact, but session search/recall could become unreliable. The existing doctor check only did SELECT COUNT(*) FROM sessions, so it reported state.db as healthy.

Existing PRs checked
I found related open PRs and made this intentionally complementary rather than a duplicate:
- […] SessionDB schema init using a MATCH probe.
- […] SessionDB open.

This PR focuses on the user-facing diagnostics/repair path in hermes doctor, so users can detect and repair the issue without waiting for a normal SessionDB startup path to trip over it.

Tests
Result:
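
For readers who want a feel for the checks listed in the Summary, the sketch below shows one way they could look: an integrity check, a MATCH probe per FTS table, and a backup-then-rebuild step. It is an illustration only, not the code in this PR. The function names, the `probe_term` default, and the `tables` parameter are hypothetical, and the rebuild step assumes the FTS5 tables are defined so that the FTS5 'rebuild' command repopulates them from messages.

```python
import sqlite3
from typing import List, Sequence

# Table names taken from the PR description.
FTS_TABLES = ("messages_fts", "messages_fts_trigram")


def check_state_db(path: str, probe_term: str = "test") -> List[str]:
    """Return a list of human-readable problems; an empty list means healthy."""
    problems: List[str] = []
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        if rows != [("ok",)]:
            problems.extend(f"integrity_check: {row[0]}" for row in rows)
        for table in FTS_TABLES:
            try:
                # MATCH forces SQLite to walk the full-text index itself
                # rather than only reading the content rows.
                conn.execute(
                    f"SELECT rowid FROM {table} WHERE {table} MATCH ? LIMIT 1",
                    (probe_term,),
                ).fetchall()
            except sqlite3.Error as exc:
                problems.append(f"{table}: {exc}")
    except sqlite3.DatabaseError as exc:
        problems.append(f"integrity_check failed: {exc}")
    finally:
        conn.close()
    return problems


def rebuild_fts(path: str, backup_path: str, tables: Sequence[str] = FTS_TABLES) -> None:
    """Take a consistent backup, then ask FTS5 to rebuild each index."""
    src = sqlite3.connect(path)
    try:
        dst = sqlite3.connect(backup_path)
        try:
            src.backup(dst)  # SQLite online backup API: a consistent snapshot
        finally:
            dst.close()
        for table in tables:
            # FTS5 special command: discard the index and rebuild it from
            # the table's content source.
            src.execute(f"INSERT INTO {table}({table}) VALUES('rebuild')")
        src.commit()
    finally:
        src.close()
```

Note that in this sketch a missing FTS table is reported as a problem as well, since the MATCH probe fails with "no such table". Neither function helps with the malformed-schema class described in the superseding commit, where even PRAGMA integrity_check cannot run.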